Testing the Newcastle Ottawa Scale showed low reliability between individual reviewers

Lisa Hartling; Andrea Milne; Michele P Hamm; Ben Vandermeer; Mohammed Ansari; Alexander Tsertsvadze; Donna M Dryden

doi:10.1016/j.jclinepi.2013.03.003

J Clin Epidemiol. 2013 Sep;66(9):982-93. doi: 10.1016/j.jclinepi.2013.03.003. Epub 2013 May 16.

Authors

Lisa Hartling¹, Andrea Milne, Michele P Hamm, Ben Vandermeer, Mohammed Ansari, Alexander Tsertsvadze, Donna M Dryden

Affiliation

¹ Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 4-472 Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta, Canada T5G 1C9. hartling@ualberta.ca

PMID: 23683848

Abstract

Objectives: To assess inter-rater reliability and validity of the Newcastle Ottawa Scale (NOS) used for methodological quality assessment of cohort studies included in systematic reviews.

Study design and setting: Two reviewers independently applied the NOS to 131 cohort studies included in eight meta-analyses. Inter-rater reliability was calculated using kappa (κ) statistics. To assess validity, within each meta-analysis, we generated a ratio of pooled estimates for each quality domain. Using a random-effects model, the ratios of odds ratios for each meta-analysis were combined to give an overall estimate of differences in effect estimates.
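As a rough illustration of the reliability calculation described above, the following sketch computes an unweighted Cohen's kappa for two reviewers rating the same studies on one item. The abstract does not say whether the kappa was weighted or unweighted, so the unweighted form is an assumption, and the function name `cohen_kappa` and the ratings shown are hypothetical, not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: unweighted kappa; the abstract does not specify the variant.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of items where both raters gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of each rater's marginal proportions, summed
    # over categories (a category unused by either rater contributes zero).
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for one NOS item across eight studies (not study data).
reviewer_1 = ["star", "star", "none", "none", "star", "none", "star", "none"]
reviewer_2 = ["star", "none", "none", "star", "star", "none", "none", "none"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

Kappa is 1 when the reviewers agree on every study, 0 when agreement is no better than chance, and negative when agreement is worse than chance, which is how values such as κ = -0.03 can arise.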

Results: Inter-rater reliability varied from substantial for length of follow-up (κ = 0.68, 95% confidence interval [CI] = 0.47, 0.89) to poor for selection of the nonexposed cohort (κ = -0.03, 95% CI = -0.06, 0.00) and for demonstration that the outcome was not present at the outset of the study (κ = -0.06, 95% CI = -0.20, 0.07). Reliability for overall score was fair (κ = 0.29, 95% CI = 0.10, 0.47). In general, reviewers found the tool difficult to use and the decision rules vague even with additional information provided as part of this study. We found no association between individual items or overall score and effect estimates.
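The validity analysis behind the "no association" finding (a ratio of odds ratios per meta-analysis, combined with a random-effects model) can be sketched as follows. The abstract does not name the random-effects estimator, so the DerSimonian-Laird method used here is an assumption, as is treating the two pooled estimates within a meta-analysis as independent. The function names are hypothetical and no study data are reproduced.

```python
import math


def log_ratio_of_odds_ratios(or_a, se_log_a, or_b, se_log_b):
    """Return log(OR_a / OR_b) and its standard error.

    Assumption: the two pooled estimates are independent, so the variances
    of their log odds ratios add.
    """
    log_ror = math.log(or_a) - math.log(or_b)
    se = math.sqrt(se_log_a ** 2 + se_log_b ** 2)
    return log_ror, se


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling of estimates (e.g. log RORs).

    Assumption: DerSimonian-Laird estimator of between-study variance.
    Returns (pooled estimate, standard error, tau squared).
    """
    if len(estimates) != len(std_errors) or len(estimates) == 0:
        raise ValueError("need equal-length, non-empty inputs")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q and the method-of-moments estimate of tau squared.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau squared to each within-study variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2
```

Exponentiating the pooled log ratio gives an overall ratio of odds ratios; a value near 1 with a confidence interval spanning 1 would correspond to no detectable difference in effect estimates between studies rated differently on a quality domain.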

Conclusion: Variable agreement and lack of evidence that the NOS can identify studies with biased results underscore the need for revisions and more detailed guidance for systematic reviewers using the NOS.

Keywords: Cohort studies; Internal validity; Methodological quality; Reliability; Systematic reviews; Validity.

MeSH terms

Cohort Studies
Humans
Meta-Analysis as Topic*
Observer Variation*
Reproducibility of Results
Research Design
Review Literature as Topic*