One-way ANOVA (post-hoc)

Dr. Elijah Meyer

NC State University
ST 511 - Fall 2025

2025-10-22

Checklist

– Exam-1 (in-class) is posted

  > Solutions on Moodle 
  > Regrade requests open Wednesday after class (until Sunday)
  

– I’m grading your take-homes now

– Quiz released Wednesday (due Sunday)

– Homework released Friday (due next Friday)

– Statistics experience released (we will talk about it)

Last Time

Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.

Why can we not answer this question using difference in means?

# A tibble: 3 × 3
  group count  mean
  <fct> <int> <dbl>
1 ctrl     10  5.03
2 trt1     10  4.66
3 trt2     10  5.53

Last time

Where did all of these values come from? What do they mean?

            Df Sum Sq Mean Sq F value Pr(>F)  
group        2  3.766  1.8832   4.846 0.0159 *
Residuals   27 10.492  0.3886                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Definitions

The Sum of Squares (SS) is a measure of total variability (between or within)

The Mean Square (MS) is an estimate (\(s^2\)) of the population variance (\(\sigma^2\))

It’s a generalized form of \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\)

Deviations divided by degrees of freedom!

New statistic

\(F = \frac{\text{MS}_{\text{Between}}}{\text{MS}_{\text{Within}}}\)

\(\mathbf{F} = \frac{\frac{\text{SS}_{\text{Between}}}{k - 1}}{\frac{\text{SS}_{\text{Within}}}{N - k}}\)

\(\text{SS}_{\text{Between}} = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X}_{\text{grand}})^2\)

\(\text{SS}_{\text{Within}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2\)
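To connect these formulas back to the ANOVA table, here is a sketch that computes the sums of squares and the F statistic by hand for the PlantGrowth data (the dataset behind the output above):

```r
# Hand-compute SS_Between, SS_Within, and F for PlantGrowth (built into R)
data(PlantGrowth)

grand_mean  <- mean(PlantGrowth$weight)
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
n_i         <- tapply(PlantGrowth$weight, PlantGrowth$group, length)

ss_between <- sum(n_i * (group_means - grand_mean)^2)            # 3.766
ss_within  <- sum((PlantGrowth$weight -
                     group_means[PlantGrowth$group])^2)          # 10.492

k <- 3; N <- 30
f_stat <- (ss_between / (k - 1)) / (ss_within / (N - k))         # 4.846
```

These match the Sum Sq and F value columns of the ANOVA table above.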

So what’s actually different?

Tukey’s HSD

The main idea is to compute the honestly significant difference (HSD) between all pairwise comparisons

Why do we need to use this thing called Tukey’s HSD? Why can’t we just conduct a bunch of individual hypothesis tests?

Type 1 error

\(\alpha\) is our significance level. It’s also our Type 1 error rate.

A Type 1 error is:

– rejecting the null hypothesis

– when the null hypothesis is actually true

Family-wise error rate

FWER = \(1 - (1-\alpha)^m\)

where m is the number of comparisons
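With \(\alpha = 0.05\) and \(m = 3\) comparisons, for example:

```r
# FWER: chance of at least one Type 1 error across m = 3 independent
# tests, each run at alpha = 0.05
alpha <- 0.05
m <- 3
1 - (1 - alpha)^m   # 0.142625 -- almost triple the nominal 5%
```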

Family-wise error rate

“k choose 2” = \(\frac{k(k-1)}{2}\) (write this down)
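With \(k = 3\) groups, for example:

```r
# Number of pairwise comparisons among k = 3 groups
choose(3, 2)      # 3
3 * (3 - 1) / 2   # 3, same thing via k(k-1)/2
```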

Chocolate study

Article

Learning objectives

– Why we use Tukey’s HSD

– What distribution we use to account for the inflated type-1 error rate

– How to read output

Pairwise comparisons

We commonly report confidence intervals to estimate which means are actually different (software also reports p-values).

Tukey’s HSD (technically Tukey-Kramer with unequal sample sizes)

\(\bar{x}_i - \bar{x}_j \pm \frac{q^*}{\sqrt{2}} \sqrt{\text{MSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\)

Where

\(q^*\) is a value from the studentized range distribution

– (MSE) is the average of the squared deviations of individual data points from their respective group means, divided by the residual degrees of freedom. It is also the estimate of the common variance obtained by pooling all the groups.

\[\text{MSE} = \frac{\sum_{i=1}^{k} (n_i - 1)s_i^2}{N - k}\]
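As a check, this pooled-variance form gives exactly the Residuals Mean Sq from the ANOVA table; a sketch with the PlantGrowth groups:

```r
# MSE as a pooled variance across the k = 3 PlantGrowth groups
s2_i <- tapply(PlantGrowth$weight, PlantGrowth$group, var)     # group variances
n_i  <- tapply(PlantGrowth$weight, PlantGrowth$group, length)  # group sizes
sum((n_i - 1) * s2_i) / (sum(n_i) - 3)   # 0.3886, the Residuals Mean Sq
```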

Pairwise comparisons

\(q^*\) can be found using the qtukey function in R, or here

Typically, we use pre-packaged functions/applets to do the above calculations for us.

qtukey(p = 0.95, nmeans = 3, df = 27)
[1] 3.506426

Test statistic

\[Q = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\frac{MSE}{2} \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}\]
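For example, for the trt2 vs. trt1 comparison in PlantGrowth (group means roughly 5.526 and 4.661, MSE = 0.3886 from the ANOVA table):

```r
# Q for trt2 vs. trt1; compare against q* = 3.506 from qtukey()
diff_means <- 5.526 - 4.661   # = 0.865
mse <- 0.3886
n_i <- n_j <- 10
diff_means / sqrt((mse / 2) * (1 / n_i + 1 / n_j))   # about 4.39 > 3.506
```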

Results

anova_model <- aov(weight ~ group, data = PlantGrowth)

plant_tukey <- TukeyHSD(anova_model)

plant_tukey
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = weight ~ group, data = PlantGrowth)

$group
            diff        lwr       upr     p adj
trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
trt2-ctrl  0.494 -0.1972161 1.1852161 0.1979960
trt2-trt1  0.865  0.1737839 1.5562161 0.0120064
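The trt2-trt1 row can be reproduced by hand from the Tukey-Kramer interval formula:

```r
# Rebuild the trt2-trt1 confidence interval from the pieces
diff   <- 0.865
q_star <- qtukey(p = 0.95, nmeans = 3, df = 27)      # 3.506
mse    <- 0.3886
moe    <- (q_star / sqrt(2)) * sqrt(mse * (1/10 + 1/10))
c(diff - moe, diff + moe)    # about (0.174, 1.556) -- matches lwr and upr
```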

Summary

Tukey’s Honest Significant Difference (HSD) test is a post hoc test commonly used to assess the significance of differences between pairs of group means. Tukey’s HSD is often a follow-up to a one-way ANOVA, when the F-test has revealed a significant difference between some of the tested groups.

More than one way to do this

Bonferroni

Tukey’s HSD is best for making all possible pairwise comparisons, while the Bonferroni correction is more general and suitable for a small number of pre-planned comparisons

Bonferroni

You conduct individual t-tests, but divide the original significance level \(\alpha\) by the number of tests being performed (\(m\)).

What does this do in terms of evidence to reject the null hypothesis?
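In R, a Bonferroni-adjusted version of the pairwise tests can be run with the built-in pairwise.t.test function; with \(m = 3\) comparisons, each test is effectively held to \(\alpha / 3\):

```r
# Bonferroni-adjusted pairwise t-tests on the PlantGrowth data;
# reported p-values are multiplied by m = 3 (capped at 1), which is
# equivalent to comparing unadjusted p-values against alpha / 3
pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                p.adjust.method = "bonferroni")
```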

Questions?

ANOVA vs. Post-Hoc

A conversation

There is (some) debate about whether an ANOVA is necessary before using Tukey’s HSD to find which means differ.

ANOVA first

Logical guardrail

Applying a global test first is pretty solid protection against the risk of (even inadvertently) uncovering spurious “significant” results from post-hoc data snooping.

It is known that a global ANOVA F test can detect a difference of means even in cases where no individual test of any of the pairs of means will yield a significant result. In other words, in some cases the data can reveal that the true means likely differ but it cannot identify with sufficient confidence which pairs of means differ.

Example

              Df Sum Sq Mean Sq F value Pr(>F)  
factor(group)  2 11.132   5.566    11.3 0.0401 *
Residuals      3  1.478   0.493                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ factor(group))

$`factor(group)`
           diff         lwr      upr     p adj
2-1  2.90306081 -0.03000355 5.836125 0.0513476
3-1  2.87565627 -0.05740809 5.808721 0.0526193
3-2 -0.02740454 -2.96046891 2.905660 0.9991602

Questions?

ae-anova