Joe Mudge wrote a guest post for the blog here in March of this year. The post dealt with our use of α=0.05 for hypothesis testing. The following is another guest post by Joe Mudge in which he presents evidence of the benefit of using the optimal alpha level approach. Please chime in with your thoughts.
By: Joseph F. Mudge, Ph.D. candidate (biology), University of New Brunswick, Saint John, N.B., Canada
In March 2012, I wrote a blog post on the use of null hypothesis significance testing in ecology (here), attempting to address why these tests remain in widespread use despite widespread criticism. I argued that null hypothesis significance tests remain useful in ecology as a statistical decision-making tool, and that much of the criticism revolves around the routine use of an arbitrary value (α=0.05) as the decision-making threshold. I outlined a recently described approach (Mudge et al. 2012a) for calculating study-specific α levels that achieve the best possible compromise between Type I errors and biologically relevant Type II errors. The optimal α approach can minimize the combined probabilities or costs of errors. It also avoids the implicit, unexamined assumptions about biological relevance and about the relative costs of Type I and Type II errors that come with using α=0.05.
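One way to write the criterion described above is as a weighted average of the two error probabilities, with β(α) evaluated at the critical effect size. The notation is illustrative rather than taken from Mudge et al. (2012a). Here c_I and c_II are the relative costs of Type I and Type II errors, and P(H₀) and P(H₁) are the prior probabilities of the hypotheses. Equal weights reduce ω to the plain average (α + β)/2. The remaining lines assume, for tractability only, a one-sided z-test whose statistic has noncentrality δ at the critical effect size.

```latex
\begin{aligned}
\omega(\alpha) &= \frac{w_{\mathrm{I}}\,\alpha + w_{\mathrm{II}}\,\beta(\alpha)}{w_{\mathrm{I}} + w_{\mathrm{II}}},
  \qquad w_{\mathrm{I}} = c_{\mathrm{I}}\,P(H_0),\quad w_{\mathrm{II}} = c_{\mathrm{II}}\,P(H_1) \\
\beta(\alpha) &= \Phi\left(z_{1-\alpha} - \delta\right) \quad \text{(one-sided } z\text{-test)} \\
\frac{d\omega}{d\alpha} = 0 \;&\Longleftrightarrow\; \frac{w_{\mathrm{I}}}{w_{\mathrm{II}}}
  = \frac{\phi\left(z_{1-\alpha} - \delta\right)}{\phi\left(z_{1-\alpha}\right)}
  = \exp\!\left(z_{1-\alpha}\,\delta - \frac{\delta^{2}}{2}\right) \\
z^{*} &= \frac{\delta}{2} + \frac{1}{\delta}\ln\frac{w_{\mathrm{I}}}{w_{\mathrm{II}}},
  \qquad \alpha^{*} = 1 - \Phi\left(z^{*}\right)
\end{aligned}
```

With equal weights, z* = δ/2 and α* = β* = 1 − Φ(δ/2). The optimal α therefore falls as δ grows, that is, as sample size increases or variability decreases.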
For a given combination of inputs (the critical effect size and, if known, the relative error costs and/or the relative prior probabilities of the null and alternative hypotheses), the optimal α approach can be objectively applied post hoc to re-evaluate test conclusions made using α=0.05. This raises the question, “How much would the optimal α approach affect test conclusions, on average?” To address this question, my colleagues and I recently compared test conclusions under α=0.05 and under optimal α for more than 1200 environmental monitoring tests. These tests looked for differences in fish condition, liver size and gonad size upstream and downstream of pulp mills, and were conducted under Canada’s Environmental Effects Monitoring program from 1992 to 2003 (Mudge et al. 2012b).
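Below is a minimal sketch of how a study-specific α could be computed from these inputs. It assumes a two-sided, two-sample comparison approximated by a z-test with equal group sizes and a standardized critical effect size (critical difference divided by the standard deviation). The published method may use exact test distributions rather than this approximation, so treat it only as an illustration. The weights `w1` and `w2` are hypothetical stand-ins for "relative cost times prior probability" of each error type, and the grid search is a simple stand-in for a proper optimizer. `outcome_flips` is a hypothetical helper that checks whether a p-value leads to the opposite decision under the optimal α than under α=0.05.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def beta_two_sided(alpha, effect_size, n_per_group):
    """Type II error at the critical effect size for a two-sided,
    two-sample z-test with equal group sizes (normal approximation)."""
    delta = effect_size * (n_per_group / 2) ** 0.5  # noncentrality
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    power = (1 - STD_NORMAL.cdf(z_crit - delta)) + STD_NORMAL.cdf(-z_crit - delta)
    return 1 - power


def weighted_error(alpha, effect_size, n_per_group, w1=1.0, w2=1.0):
    """Weighted average of Type I (alpha) and Type II (beta) error."""
    beta = beta_two_sided(alpha, effect_size, n_per_group)
    return (w1 * alpha + w2 * beta) / (w1 + w2)


def optimal_alpha(effect_size, n_per_group, w1=1.0, w2=1.0, steps=6000):
    """Grid search over log-spaced alphas from 1e-6 up to just below 1."""
    grid = [10 ** (-6 + 6 * i / steps) for i in range(steps)]
    return min(grid, key=lambda a: weighted_error(a, effect_size, n_per_group, w1, w2))


def outcome_flips(p_value, alpha_star, alpha_conventional=0.05):
    """True if the reject/retain decision differs between the two alpha levels."""
    return (p_value < alpha_conventional) != (p_value < alpha_star)


# Hypothetical test: standardized critical effect of 1.0, 6 fish per site.
a_star = optimal_alpha(effect_size=1.0, n_per_group=6)
print(f"optimal alpha = {a_star:.3f}")
print("decision flips at p = 0.10:", outcome_flips(0.10, a_star))
```

Raising `w1` relative to `w2` (treating Type I errors as costlier) pushes the optimal α down, and vice versa.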
Depending on the sample size and the variability of the data for each test, optimal α levels were sometimes very small and sometimes very large. They ranged from several orders of magnitude smaller than 0.05 (when sample sizes were large and variability low) to as large as 0.3 (when sample sizes were small and variability high).
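The following minimal sketch shows why the optimal α moves with sample size and variability. It relies on simplifying assumptions that are not the paper's exact method: a one-sided, two-sample z-test with n fish per site, equal weights on Type I and Type II errors, and a hypothetical critical difference of 1 unit. Under those assumptions, minimizing (α + β)/2 places the critical value halfway to the noncentrality δ = (Δ/σ)√(n/2), so α* = 1 − Φ(δ/2).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def optimal_alpha_one_sided(critical_diff, sd, n_per_group):
    """Equal-weight optimal alpha for a one-sided two-sample z-test (sketch)."""
    # Noncentrality of the test statistic when the true difference equals
    # the critical effect size.
    delta = (critical_diff / sd) * (n_per_group / 2) ** 0.5
    # (alpha + beta) / 2 is minimized where the critical z equals delta / 2.
    return 1 - STD_NORMAL.cdf(delta / 2)


# Hypothetical grid: larger n or smaller SD gives a smaller optimal alpha.
for sd in (0.5, 1.0, 2.0):
    row = [optimal_alpha_one_sided(1.0, sd, n) for n in (5, 20, 80)]
    print(f"sd={sd}: " + ", ".join(f"{a:.2e}" for a in row))
```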
By comparing the outcome of each test under its optimal α with the outcome under α=0.05, we found that the optimal α approach frequently changed test outcomes. Of 1256 environmental monitoring tests, 148 (12%) reached the opposite conclusion under the optimal α from the one reached under α=0.05.
After calculating the statistical power of each test at α=0.05, we calculated, for each test, the average of the Type I and Type II error probabilities associated with using α=0.05. We then compared this with the minimized average of the Type I and Type II error probabilities associated with that test's optimal α. The median reduction in the average of the Type I and Type II error probabilities from switching to an optimal α was 16%.
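The comparison above can be illustrated for a single test with a minimal sketch. It assumes a one-sided z-test with noncentrality δ at the critical effect size and equal weights on the two error types; the paper's tests and calculations may differ. The sketch computes the average of α and β at α = 0.05 and at the equal-weight optimum α* = 1 − Φ(δ/2), then reports the proportional reduction. The 16% figure above is a median over real tests and is not reproduced here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def average_error(alpha, delta):
    """Mean of Type I and Type II error for a one-sided z-test (sketch)."""
    # Reject H0 when Z > z_{1-alpha}; under H1, Z ~ N(delta, 1).
    beta = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) - delta)
    return (alpha + beta) / 2


def reduction_from_optimal_alpha(delta, alpha_conventional=0.05):
    """Proportional drop in average error when switching to the optimal alpha."""
    alpha_star = 1 - STD_NORMAL.cdf(delta / 2)  # equal-weight optimum
    before = average_error(alpha_conventional, delta)
    after = average_error(alpha_star, delta)
    return 1 - after / before


for delta in (1.0, 2.0, 5.0):
    print(f"delta={delta}: reduction={reduction_from_optimal_alpha(delta):.1%}")
```

Under these assumptions, the reduction is zero only when α = 0.05 already happens to be optimal (δ ≈ 3.29). It is positive for any other δ.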
By assuming that α=0.05 was the optimal α for each test (that is, the level that minimized the combined cost of Type I and Type II errors for that study), we could infer the relative error costs implied by each test's design. Tests for differences in fish gonad size and liver size were typically designed in a way that implied Type I errors (reflecting unnecessary extra monitoring expenses for industry) were more serious than Type II errors (reflecting unidentified environmental impacts). Tests for differences in fish condition implied the reverse.
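This inference step can be sketched under the same kind of simplifying assumptions: a one-sided z-test with noncentrality δ at the critical effect size, and equal prior probabilities so that the weights reflect costs alone. It is an illustration, not the calculation used by Mudge et al. (2012b). If α = 0.05 minimizes c_I·α + c_II·β, the slope of β(α) at 0.05 must balance the cost ratio, which gives c_I/c_II = φ(z_0.95 − δ)/φ(z_0.95). The function name `implied_cost_ratio` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def implied_cost_ratio(delta, alpha=0.05):
    """Ratio c_I / c_II implied if `alpha` were the optimal level (sketch).

    Assumes a one-sided z-test with noncentrality `delta` at the critical
    effect size and equal prior probabilities of H0 and H1. Values above 1
    mean Type I errors are implicitly treated as the more costly error.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha)
    # Optimality: c_I = c_II * (-d beta / d alpha) = c_II * phi(z - delta) / phi(z)
    return STD_NORMAL.pdf(z - delta) / STD_NORMAL.pdf(z)


for delta in (1.0, 2.0, 4.0, 5.0):
    ratio = implied_cost_ratio(delta)
    heavier = "Type I" if ratio > 1 else "Type II"
    print(f"delta={delta}: c_I/c_II = {ratio:.3f} ({heavier} treated as costlier)")
```

In this sketch, low-power tests (small δ) imply that Type I errors are the costlier mistake, and high-power tests (large δ) imply the opposite. This is one way a single fixed α can encode inconsistent cost assumptions across endpoints.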
Optimal α re-analysis of the Canadian Environmental Effects Monitoring dataset shows that α=0.05 is rarely the optimal α for a null hypothesis significance test. As a consequence, optimal test outcomes frequently differ from those obtained using α=0.05. Beyond potentially yielding different test outcomes, the optimal α approach offers substantially lower average Type I and Type II error probabilities than α=0.05. The optimal α approach can also reveal the inconsistent relative costs of Type I and Type II errors that the consistent use of α=0.05 implies. Had the Canadian Environmental Effects Monitoring program used the optimal α approach, it would have had lower combined error rates. Its statistical decisions would also have been consistent with explicitly considered and transparently stated critical effect sizes and relative costs of Type I and Type II errors.
Mudge JF, Baker LF, Edge CB, Houlahan JE. 2012a. Setting an optimal α that minimizes errors in null hypothesis significance tests. PLoS ONE 7: e32734. dx.doi.org/10.1371/journal.pone.0032734
Mudge JF, Barrett TJ, Munkittrick KR, Houlahan JE. 2012b. Negative consequences of using α=0.05 for environmental monitoring decisions: A case study from a decade of Canada’s Environmental Effects Monitoring program. Environmental Science and Technology 46: 9249-9255. dx.doi.org/10.1021/es301320n