On reporting and interpreting statistical significance and p values in medical research

Herman Aguinis; Matt Vassar; Cole Wayant

doi:10.1136/bmjebm-2019-111264


EBM opinion and debate


Herman Aguinis¹, Matt Vassar², Cole Wayant² (ORCID: http://orcid.org/0000-0001-8829-8179)

¹ Management, The George Washington University, Washington, District of Columbia, USA
² Psychiatry and Behavioral Sciences, Oklahoma State University Center for Health Sciences, Tulsa, Oklahoma, USA

Correspondence to Cole Wayant, Oklahoma State University Center for Health Sciences, Tulsa, OK 74107, USA; cole.wayant{at}okstate.edu


Recent proposals to change the p value threshold from 0.05 to 0.005 or to retire statistical significance altogether have garnered much criticism and debate.1 2 As of the writing of our manuscript, the proposal to eliminate statistical significance testing, backed by over 800 signatories, achieved record-breaking status on Altmetrics, with an attention score exceeding 13 000 derived from 19 000 Twitter comments and 35 news stories. We appreciate the renewed enthusiasm for tackling important issues related to the analysis, reporting and interpretation of scientific research results. Our perspective, however, focuses on the current use and reporting of statistical significance and where we should go from here.

We begin by saying that p values themselves are not flawed. Rather, the flaw lies in the use, misuse or abuse of p values in ways antithetical to rigorous scientific pursuits. If p values are a hammer, scientists are the hammer wielders. One would not discard the hammer if the wielder repeatedly missed the nail. Nor would one discard it if the wielder used it for a purpose it does not suit, such as driving a screw. Rather, one would conclude that the fault lies with the wielder and recommend ways to refine the hammer’s use. In the same way, we argue that abandoning statistical significance because scientists misuse p values does not address the underlying problem of statistical negligence, nor does it address the incorrect belief that statistical significance equates to clinical significance.3 Statistical significance testing is a tool that can be used well, misused or even abused, so a focus on education and reform may be more helpful than abandoning it.

The a priori significance level (ie, alpha, the type I error rate) and the precise observed probability values (ie, p) should be explicitly stated and justified in protocols and published reports of medical studies. We have examined current guidance on p value reporting in influential sources in medicine (table 1). Generally, this guidance supports reporting exact p values but fails to issue direction on specifying the a priori significance level. The ‘conventional’ a priori significance level in many scientific disciplines is 0.05, an arbitrary choice. Two issues arise when scientists arbitrarily default to an a priori significance level: results become misleading, and the relative seriousness of making a type I (‘false-positive’) or type II (‘false-negative’) error is ignored.

Table 1 Guidance on p value, alpha prespecification and effect size reporting from influential sources in medicine

First, misleading results may fall on either side of the conventional 0.05 threshold, with scientists blindly rejecting or accepting the null hypothesis while failing to consider sample size, measurement error and other factors that affect observed p values but are unrelated to the size of the effect in the population. Commenting on the dichotomous interpretation of a truly continuous probability, Rosnow and Rosenthal4 sarcastically lamented that ‘Surely, God loves the 0.06 nearly as much as the 0.05’. Second, the choice of an a priori significance level should be made in the context of the potential for type II error. When researchers arbitrarily default to a type I error rate of 0.05, the corresponding type II error rate has been calculated to be approximately 60%, because statistical power (ie, the probability of correctly rejecting a false null hypothesis) is usually insufficient given small sample sizes and the pervasive and unavoidable use of less-than-perfectly reliable measures.5 6 In other words, while authors focus on whether their results show an acceptably small type I error rate, type II error (the probability of erroneously accepting the null hypothesis and incorrectly concluding that an effect is absent) looms large. Do authors, peer reviewers, editors and readers of studies that fail to reach statistical significance consider the probability that the results are falsely ‘negative’?
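The trade-off between type I and type II error described above can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the cited studies: it assumes a two-sided comparison of two group means, uses a normal approximation to the t test, and the function name `two_sample_power`, the standardised effect size of 0.5 and the sample sizes are all hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def two_sample_power(effect_size_d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation to the t test, so the result is
    slightly optimistic for very small samples.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)      # critical value for the chosen alpha
    shift = effect_size_d * sqrt(n_per_group / 2)    # expected value of the z statistic
    # Probability that the statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


d = 0.5  # hypothetical standardised mean difference (assumption, not from the article)
for alpha in (0.05, 0.005):
    for n in (20, 64, 200):
        power = two_sample_power(d, n, alpha)
        print(f"alpha={alpha:<5} n/group={n:<3} power={power:.2f} type II error={1 - power:.2f}")
```

Under these assumptions, 20 participants per group and alpha = 0.05 give power of roughly 0.35, so the type II error rate is far larger than the type I error rate that authors usually focus on. Lowering alpha without increasing the sample size reduces power further.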

A second limitation of the current guidance is its inconsistency in mandating the reporting of effect sizes, which describe the strength of the relationship or effect found. The only information to be gleaned from a p value is whether the observed data would be likely were the null hypothesis (that no effect exists) true. Therefore, a p value without an effect size is like peering into a pool of murky water: one cannot determine the depth, only say that a pool is likely to exist. Consider interventions for improving medication adherence for patients with hypertension. A recent systematic review of medication adherence interventions found that the overall standardised mean difference for systolic blood pressure was 0.235, which corresponds to a 3 mm Hg difference.7 Translating mean differences into clinical differences assists in determining the practical value of the intervention. In this example, the clinician must consider whether a 3 mm Hg reduction in systolic blood pressure is clinically meaningful and weigh this reduction against the factors associated with enacting the intervention, as well as whether other interventions might yield a more clinically meaningful improvement. Some of the influential guidance (or omission thereof) provided to authors in medicine (table 1) may serve to promote the poor statistical practices that readers work to mitigate. Therefore, it is our perspective that all guidance should not only emphasise reporting effect sizes but also explain how to interpret and report them in a meaningful way. For example, one may report the absolute difference between groups and the number needed to treat for a medical intervention. Readers may be incapable of determining the meaningfulness of a p value but are well equipped to interpret an absolute difference in effectiveness.
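To illustrate why an absolute difference and a number needed to treat (NNT) can be more informative than a p value alone, the sketch below summarises two hypothetical trials with identical event rates but different sample sizes. The function name `summarise_two_arms`, the event counts and the choice of a pooled two-proportion z test are assumptions made for illustration, not methods taken from the article or its references.

```python
from math import inf, sqrt
from statistics import NormalDist


def summarise_two_arms(events_control, n_control, events_treated, n_treated):
    """Return the absolute risk reduction, NNT and a two-sided p value.

    The p value comes from a pooled two-proportion z test (normal
    approximation), chosen here only for illustration.
    """
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    arr = risk_control - risk_treated            # absolute risk reduction
    # NNT is the reciprocal of the absolute difference; if the treated arm
    # does worse, the same number reads as a number needed to harm.
    nnt = inf if arr == 0 else 1 / abs(arr)

    pooled = (events_control + events_treated) / (n_control + n_treated)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treated))
    z = arr / se if se > 0 else 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"arr": arr, "nnt": nnt, "p": p_value}


# Hypothetical trials: event rates of 30% v 20% in both, but 50 v 500 per arm.
small = summarise_two_arms(15, 50, 10, 50)
large = summarise_two_arms(150, 500, 100, 500)
for label, result in (("small trial", small), ("large trial", large)):
    print(f"{label}: ARR={result['arr']:.2f} NNT={result['nnt']:.0f} p={result['p']:.4f}")
```

In this hypothetical comparison the absolute risk reduction (10 percentage points) and the NNT (10) are the same in both trials, yet only the larger trial falls below the conventional 0.05 threshold. The p value reflects sample size as well as effect size, whereas the absolute difference speaks directly to clinical relevance.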

Taken together, reporting (1) precise observed p values (rather than whether they are larger or smaller than arbitrary cutoffs), (2) effect sizes and (3) the practical importance of effect sizes (ie, their interpretation for clinical practice) would improve our understanding of the meaning of study findings. Let us not throw out the baby with the bathwater.

References

1. Benjamin DJ, Berger JO, Johannesson M, et al. Redefine statistical significance. Nat Hum Behav 2018;2:6–10. doi:10.1038/s41562-017-0189-z
2. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature 2019;567:305–7. doi:10.1038/d41586-019-00857-9
3. Aguinis H, Werner S, Lanza Abbott J, et al. Customer-centric science: reporting significant research results with rigor, relevance, and practical impact in mind. Organ Res Methods 2010;13:515–39. doi:10.1177/1094428109333339
4. Rosnow RL, Rosenthal R. Statistical procedures and the justification of knowledge in psychological science. American Psychologist 1989;44:1276–84. doi:10.1037/0003-066X.44.10.1276
5. Sedlmeier P, Gigerenzer G. Do studies of statistical power have an effect on the power of studies? Psychol Bull 1989;105:309–16. doi:10.1037/0033-2909.105.2.309
6. Aguinis H, Stone-Romero EF. Methodological artifacts in moderated multiple regression and their effects on statistical power. J Appl Psychol 1997;82:192–206. doi:10.1037/0021-9010.82.1.192
7. Conn VS, Ruppar TM, Chase J-AD. Blood pressure outcomes of medication adherence interventions: systematic review and meta-analysis. J Behav Med 2016;39:1065–75. doi:10.1007/s10865-016-9730-1
8. New England Journal of Medicine. Submitting to NEJM, 2019. Available: https://www.nejm.org/author-center/new-manuscripts [Accessed 1 Oct 2019].
9. Journal of the American Medical Association. Instructions for authors: statistical methods and data presentation, 2019. Available: https://jamanetwork.com/journals/jama/pages/instructions-for-authors#SecStatisticalMethodsandDataPresentation [Accessed 1 Oct 2019].
10. The Lancet. Information for authors, 2019. Available: https://els-jbs-prod-cdn.literatumonline.com/pb/assets/raw/Lancet/authors/tl-info-for-authors-1568214645933.pdf [Accessed 1 Oct 2019].
11. Lang TA, Altman DG. Basic statistical reporting for articles published in Biomedical Journals: The “Statistical Analyses and Methods in the Published Literature” or the SAMPL Guidelines. Int J Nurs Stud 2015;52:5–9. doi:10.1016/j.ijnurstu.2014.09.006
12. Annals of Internal Medicine. Information for authors - general statistical guidance, 2019. Available: https://annals.org/aim/pages/AuthorInformationStatisticsOnly [Accessed 1 Oct 2019].
13. International Conference on Harmonisation. ICH harmonised tripartite guideline: statistical principles for clinical trials E9, 1998. Available: https://database.ich.org/sites/default/files/E9_Guideline.pdf [Accessed 1 Oct 2019].
14. Schulz KF, Altman DG, Moher D, et al. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c332. doi:10.1136/bmj.c332

Footnotes

Twitter @ColeWayant_OK
Contributors HA and MV conceptualised the paper. CW extracted all the data. HA, MV and CW wrote the manuscript and approve of it in its final form.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.

