Special tests for assessing meniscal tears within the knee: a systematic review and meta-analysis
  1. Benjamin E Smith1,
  2. Damian Thacker2,
  3. Ali Crewesmith1,
  4. Michelle Hall3
  1. 1Department of Physiotherapy Outpatients, London Road Community Hospital, Derby Hospitals NHS Foundation Trust, Derby, UK
  2. 2Department of Physiotherapy Outpatients, Ashfield Health Village, Kirkby-In-Ashfield, Nottingham, UK
  3. 3School of Health Sciences, Clinical Sciences Building, University of Nottingham, Nottingham, UK
  1. Correspondence to Benjamin E Smith, Department of Physiotherapy Outpatients, London Road Community Hospital, Derby Hospitals NHS Foundation Trust, London Road, Derby DE1 2QY, UK; benjamin.smith3{at}


Background Musculoskeletal knee pain is a large and costly problem, and meniscal tears make up a large proportion of diagnoses. ‘Special tests’ to diagnose torn menisci are often used in the physical examination of the knee joint. A large number of publications within the literature have investigated the diagnostic accuracy of these tests, yet despite the wealth of research their diagnostic accuracy remains unclear.

Aim To synthesise the most current literature on the diagnostic accuracy of special tests for meniscal tears of the knee in adults.

Method An electronic search of MEDLINE, Cumulative Index to Nursing and Allied Health Literature (CINAHL), The Allied and Complementary Medicine Database (AMED) and SPORTDiscus databases was carried out from inception to December 2014. Two authors independently selected studies and independently extracted data. Methodological quality was evaluated using the Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS) 2 tool.

Results Nine studies were included (n=1234) and three special tests were included in the meta-analysis. The methodological quality of the included studies was generally poor. McMurray's had a sensitivity of 61% (95% CI 45% to 74%) and a specificity of 84% (95% CI 69% to 92%). Joint line tenderness had a sensitivity of 83% (95% CI 73% to 90%) and a specificity of 83% (95% CI 61% to 94%). Thessaly 20° had a sensitivity of 75% (95% CI 53% to 89%) and a specificity of 87% (95% CI 65% to 96%).

Conclusions The accuracy of the special tests to diagnose meniscal tears remains poor. However, these results should be used with caution, due to the poor quality and low numbers of included studies and high levels of heterogeneity.


The authors are grateful to Arthritis Research UK and the Chartered Society of Physiotherapy Charitable Trust for providing the funding for BES to complete this study.


The lifetime prevalence of musculoskeletal knee pain within England is 54%,1 with the point prevalence in working adults over 40 years of age 28%.2 Within the UK, 24% of workers between the ages of 16 and 65 present with musculoskeletal knee pain lasting up to 2 years, with 12% of all workers saying they needed time off within the past 12 months due to knee pain.3 Despite its importance to clinicians and patients, there is a paucity of information on the epidemiology of meniscal tears.4 The incidence of specific meniscal tears within the Netherlands is 2/1000/year,5 and they account for 25 000 hospital admissions a year in the UK.6 Although the prevalence of specific meniscal injuries within the UK is unknown, the point prevalence has been recorded as 57% in symptomatic knees and 36% in asymptomatic knees in Switzerland,7 and 32% in symptomatic knees and 23% in asymptomatic knees in the USA.8

‘Special tests’ have been a historical part of the physical examination during the clinical assessment of musculoskeletal knee pain,9 and a number of these special tests are thought to diagnose torn menisci. Apley's, McMurray's and joint line tenderness (JLT) are commonly used in practice,10 with Thessaly's being considered a new dynamic test with high diagnostic accuracy.11 The diagnostic accuracy of these special clinical tests for the detection of meniscal tears has been examined quite extensively within the literature, yet still remains unclear.9 ,12–15 Previous systematic reviews have not limited the age range of included participants to adults only, with many of the studies including children within their data. In addition, there exists some confusion over the definitions of the test procedures.9 ,12–15 For example, McMurray's test was originally described with the knee being tested from full flexion to 90°,16 but its use and application now varies widely.17 Similarly, Apley's test is originally described as only applying a lateral rotation force,18 but is often described with a lateral and medial rotational force.11 ,19–21 Four of the five previous systematic reviews do not make clear their specific definitions of their test procedure, and nor do they make an attempt to analyse or categorise their included studies by their definition of the clinical special tests used.12–15 Hegedus et al9 did attempt to subcategorise and analyse studies by test definition. However, it is unclear how much investigative work was carried out. For example, they included Karachalios et al11 in which clear contradiction exists for two of their special tests, since they reference two publications that describe them in different ways. Hegedus et al9 did not state in their data synthesis how this confusion was dealt with.

The last systematic review on the diagnostic accuracy of special tests for meniscal tears was conducted almost 6 years ago, with unclear results.15 Since then, the literature has grown considerably; new standards of methodological quality against which systematic reviews are measured have been introduced,22 and the statistical method by which meta-analyses are carried out for diagnostic accuracy studies has improved with the unification of the bivariate model.23 ,24

The main objective of this review was to synthesise the most up-to-date literature for diagnostic accuracy studies for meniscal tears of the knee for adults specifically and, if the data allowed, pool results into a meta-analysis.


The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were utilised during the search and reporting phase of this systematic review/meta-analysis.22

Search strategy

An electronic database search of titles and abstracts was conducted from inception to December 2014 using MEDLINE, Cumulative Index to Nursing and Allied Health Literature (CINAHL), The Allied and Complementary Medicine Database (AMED) and SPORTDiscus. The specific search strategy differed depending on the database being searched (see online supplementary table S1 for the MEDLINE keywords and search strategy). Relevant articles, titles and abstracts were identified and screened and the reference lists of retrieved articles were also searched for additional references. An attempt was made to identify unpublished studies by emailing all authors from retrieved studies and previous systematic reviews.

Eligibility criteria

All studies examining the accuracy of special tests in diagnosing meniscal tears of the knee in adults (16 years of age or older) were included. The study must have had at least one clinical special test, must have reported specificity and sensitivity and been written in English. Special tests included McMurray's test,16 Apley's test,18 Thessaly's test11 or JLT.10 The tests must not have been carried out under anaesthetics or on cadavers, or been part of a composite examination. Clinical diagnosis by MRI or arthroscopy surgery was considered the gold standard reference test.

Study selection

One reviewer (BES) conducted the initial database searches and screened the titles and abstracts. Full copies of potentially eligible papers were retrieved and independently screened by two reviewers (BES and DT). The initial percentage agreement was 95%, with a Cohen's κ of 0.87, which is considered near-perfect agreement.25 ,26 Any disagreements were resolved through discussion, without the need for the third reviewer (AC), who was available.
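As an illustration of how percentage agreement and Cohen's κ relate, the following minimal Python sketch computes both from a 2×2 table of include/exclude decisions. The function name and the split of decisions are hypothetical; the review does not report the underlying counts, and these were chosen only to show the arithmetic.

```python
def cohens_kappa(both_include, only_first, only_second, both_exclude):
    """Return (observed agreement, Cohen's kappa) for two raters."""
    n = both_include + only_first + only_second + both_exclude
    observed = (both_include + both_exclude) / n
    # Chance agreement, from each rater's marginal inclusion rate.
    first_inc = (both_include + only_first) / n
    second_inc = (both_include + only_second) / n
    expected = first_inc * second_inc + (1 - first_inc) * (1 - second_inc)
    return observed, (observed - expected) / (1 - expected)


# Hypothetical split of 43 full-text decisions (not reported in the review).
agreement, kappa = cohens_kappa(9, 1, 1, 32)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

With these assumed counts, agreement is about 95% and κ about 0.87, which shows how κ discounts the agreement expected by chance when most papers are excluded by both reviewers.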

Data extraction

One reviewer (BES) independently extracted data regarding study design, participant information, gold standard test used, clinical special test information, setting and outcome data.27 All data were independently checked by a second reviewer (AC). A description of the examination protocols for each special test is included in table 1.

Table 1

Test procedures of included studies

In order to complete pooling of data through a meta-analysis, the raw 2×2 data are required.28 Of the included studies, five had incomplete data to allow for this.11 ,21 ,29–31 All five were contacted, two responded and provided the raw data,30 ,31 one responded but advised that they no longer had the data,21 and two failed to respond.11 ,29 Sensitivity, specificity and likelihood ratios were calculated and summarised in table 2 along with the study characteristics.

Table 2

Included studies characteristic summary

Quality assessment

The methodological quality of the included studies was assessed independently by two reviewers (BES and DT) using the Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS) 2 tool.32 Although QUADAS 2 has no point scoring system, authors may restrict primary analysis to studies showing a low risk of bias only; however, this is not thought to be best practice, and subgroup analysis following investigation of heterogeneity is considered optimal.32 Disagreements were resolved through discussion, with a third reviewer available (AC). Results were presented through graphs and tables provided by QUADAS via their website.32

Statistical analysis

All data were analysed using the OpenMetaAnalyst software.28 Heterogeneity between studies was assessed through the I² statistic, with this systematic review considering 25% low, 50% moderate and 75% high.33 2×2 data tables were created in order to perform a meta-analysis. If data were unavailable, then studies remained within the review for qualitative analysis. As recommended by Harbord's unification of models for meta-analysis of diagnostic accuracy studies, the bivariate model was used for pooling of data with their corresponding 95% CIs.23 ,24 The bivariate model is recommended for diagnostic accuracy studies where inherent heterogeneity exists between studies, for example, in threshold effects, study populations and index testing protocols, since it uses a model similar to the random effects model used for treatment efficacy meta-analyses.24
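For readers unfamiliar with the I² statistic, the sketch below shows one common way to derive it from Cochran's Q using inverse-variance weights. It is an illustration only, not the OpenMetaAnalyst implementation; the function name is hypothetical, and the inputs are assumed to be per-study estimates on whatever scale is pooled (for example, logit-transformed sensitivities) with their variances.

```python
def i_squared(estimates, variances):
    """Percentage of total variation attributed to between-study heterogeneity."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    # I^2 is truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical values for illustration; not data from the included studies.
print(i_squared([0.0, 1.0], [0.1, 0.1]))
```

Identical study estimates give an I² of 0%, and the value rises towards 100% as the spread between studies grows relative to their within-study variance.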

Sensitivity, also called the true positive rate, is the measure of true positives actually identified (eg, the percentage of people with a meniscal tear who are correctly diagnosed as having a tear). Specificity, also called the true negative rate, is the measure of true negatives actually identified (eg, the percentage of people who do not have a meniscal tear, who are correctly diagnosed as not having one). These two measures are combined to give likelihood ratios. The positive likelihood ratio (LR+) is a measure of how much the probability of having a tear increases in the presence of a positive test result. An LR+ of 1 indicates that the post-test probability is exactly the same as the pretest probability, and greater than 1 indicates that the probability has increased.34 The higher the LR+, the greater the probability increase.34 The negative likelihood ratio (LR−) is a measure of how much the probability of having a tear decreases in the absence of a positive test result.35 An LR− of below 1 indicates that the post-test probability has decreased, and the smaller the LR− the greater the decrease in probability.34
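The definitions above can be sketched in a few lines of Python. The class name and the counts are hypothetical and are not taken from any included study; the formulas are the standard ones for a 2×2 table of index test result against the reference standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table (index test vs reference standard)."""
    tp: int  # test positive, tear confirmed
    fp: int  # test positive, no tear
    fn: int  # test negative, tear confirmed
    tn: int  # test negative, no tear

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def lr_positive(self) -> float:
        # How much more likely a positive result is with a tear than without.
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        # How much more likely a negative result is with a tear than without.
        return (1 - self.sensitivity()) / self.specificity()


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=60, fp=15, fn=40, tn=85)
print(example.sensitivity(), example.specificity())
print(example.lr_positive(), example.lr_negative())
```

With these assumed counts, sensitivity is 60%, specificity is 85%, LR+ is about 4.0 and LR− is about 0.47.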

Where primary analysis was restricted to studies only showing a low risk of bias, the robustness of our results was tested through a sensitivity analysis.


Study identification

The initial database search produced 739 citations, with 6 further studies found through reference list searches. Only one unpublished trial was identified, but its authors declined to allow it to be included within this review. After duplicates were removed, 43 were appropriate for full-text review (see figure 1 for study selection process).

Figure 1

Study selection process (CINAHL, Cumulative Index to Nursing and Allied Health Literature; AMED, The Allied and Complementary Medicine Database).

After full-text review, 26 studies were excluded due to participants not meeting the criteria (all due to participants not exclusively being 16 years or older);36–61 9 due to the study design not meeting the criteria;52 ,55 ,58–60 ,62–64 and 2 due to no outcome data being recorded.65 ,66 Some of the studies were excluded for more than one reason. Nine studies remained for full inclusion in this review.11 ,21 ,29–31 ,53 ,67–69

Characteristics of included studies

There were heterogeneous populations within the included studies with regard to age, duration of symptoms and sex (table 2). Age of participants of the studies varied widely, with the mean age ranging from 19.2 to 39 years of age.30 ,53 One study did not publish age ranges, but confirmed that all were adults via email.31 Five studies did not specify the mean duration of symptoms,11 ,21 ,53 ,67 ,69 and in the remaining four studies the symptom duration ranged from 14 to 52 months.30 ,31 Heterogeneity also existed with regard to the description and specific manoeuvre of these special tests (table 1).

Study quality and bias

Percentage agreement between the two reviewers for the overall score using the QUADAS 2 tool was 75%, with a κ of 0.43, which is considered moderate to fair.25 ,26 No study received a score of high risk of bias in more than one of the four categories (figure 2).

Figure 2

Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS) 2 score breakdown.

The greatest risk of bias was with the index test and patient selection. There was a great amount of uncertainty with regard to reference testing, as most studies failed to explicitly say that this was carried out without knowledge of the index test results. There were often inappropriate exclusions,21 ,29 and inappropriate patient selection.30 Flow and timing issues were associated with poor documentation, as most failed to specify the length of time between index and reference test, or failed to confirm that all patients received the reference test and were included within the analysis. With all but one study,11 the greatest source of bias was with regard to verification bias, as all participants were referred for secondary care with knee symptoms and suspected meniscal tears. All nine studies were included within quantitative data synthesis.

Data synthesis

Nine studies, with 1234 participants, were included within this review. Apley's test18 was investigated by two studies,11 ,21 both of which failed to supply the raw 2×2 data. Thessaly at 5°11 was investigated by two studies,11 ,53 but only Konan et al53 supplied the raw 2×2 data. Therefore, there were insufficient data to pool results for Apley's and Thessaly at 5° into a meta-analysis. All nine studies were included in the meta-analysis to some degree (see table 3).

Table 3

Summary of sensitivity, specificity, likelihood ratios and heterogeneity

Three special tests were included in the meta-analysis: McMurray's,16 JLT10 and Thessaly at 20° knee flexion11 (table 3). McMurray's had a pooled sensitivity of 61% (95% CI 45% to 74%) and a pooled specificity of 84% (95% CI 69% to 92%). JLT had a pooled sensitivity of 83% (95% CI 73% to 90%) and a pooled specificity of 83% (95% CI 61% to 94%). Thessaly 20° had a pooled sensitivity of 75% (95% CI 53% to 89%) and a pooled specificity of 87% (95% CI 65% to 96%).

The pooled LR+ values were 3.2, 4.0 and 5.6, and the pooled LR− values were 0.52, 0.23 and 0.28, for McMurray's, JLT and Thessaly 20°, respectively (see online supplementary figures S1–3). LR− values of between 0.2 and 0.5 indicate only small shifts in probability post-test.34
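To make the size of these probability shifts concrete, the sketch below converts a pretest probability into a post-test probability through odds, using the pooled likelihood ratios reported above. The function name and the 50% pretest probability are assumptions chosen for illustration; the review does not specify a pretest probability.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled likelihood ratios as reported in the text: (LR+, LR-).
pooled = {
    "McMurray's": (3.2, 0.52),
    "JLT": (4.0, 0.23),
    "Thessaly 20": (5.6, 0.28),
}

PRETEST = 0.50  # hypothetical pretest probability, not from the review

for name, (lr_pos, lr_neg) in pooled.items():
    after_pos = post_test_probability(PRETEST, lr_pos)
    after_neg = post_test_probability(PRETEST, lr_neg)
    print(f"{name}: positive -> {after_pos:.2f}, negative -> {after_neg:.2f}")
```

Under this assumed pretest probability, a positive McMurray's test raises the probability of a tear to roughly 76% and a negative one lowers it to roughly 34%, so neither result settles the diagnosis.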

Two of the tests, JLT and Thessaly 20°, had a high heterogeneity I² score, with McMurray's having a moderate between-study heterogeneity I² score. These data, coupled with the relatively low shifts in probability with the likelihood ratios,70 show that the three tests analysed will not accurately diagnose a torn meniscus.

Apley's test had a combined (medial and lateral) sensitivity of 84% and 20% and specificity of 79% and 84% (Rinonapoli et al21 and Karachalios et al,11 respectively). Thessaly 5° had a combined (medial and lateral) sensitivity of 35% and 65% and specificity of 89% and 82% (Konan et al53 and Karachalios et al,11 respectively).


The main objective of this systematic review was to synthesise the most up-to-date literature for diagnostic accuracy studies for meniscal tears of the knee for adults. The overall results of the three tests that were included within the meta-analysis (McMurray's, JLT and Thessaly 20°) indicate that they have poor accuracy. The pooled meta-analyses indicate that McMurray's will diagnose 61% of people presenting with a meniscal tear, Thessaly better at 75% and JLT best at 83%. False-positive findings are likely to be approximately 20% for all three tests. However, these results should be used with caution, due to the low number of included studies, poor quality of the studies and high levels of heterogeneity.

Apley's test, not included within the meta-analysis, was investigated by two studies. Performance varied considerably between the two; for example, sensitivity varied from 84%21 to 41%.11 One possible cause for this difference is that Karachalios et al11 included participants with no knee symptoms, and Rinonapoli et al21 suffered from verification bias, which could overestimate the sensitivity of a test, since prevalence within the sample is larger.71

Combined lateral and medial sensitivity of the Thessaly 5° test varied from 35%53 to 65%.11 Reasons for this are unclear as patient selection and exclusion criteria were similar in both studies. One explanation could be the different reference standards used. Karachalios et al's11 study was the only study that used MRI. Karachalios et al11 also developed the Thessaly test, and therefore may have interpreted results differently or may have biased results inadvertently towards their own test. Karachalios et al11 give the Thessaly test substantially higher sensitivity and specificity scores than all the other studies that investigated it, and their study was one of those that failed to supply raw 2×2 data for meta-analysis and data checking despite being contacted for these data.

One study subcategorised results by concomitant injuries (anterior cruciate ligament tears) and found that the accuracy of the individual tests (JLT, McMurray's and Thessaly) was lower.53

Limitations of included studies

For the meta-analysis of McMurray's, JLT and Thessaly 20°, moderate to high levels of heterogeneity existed. I² scores were 51%, 83% and 94%, respectively. This reduced the robustness and usefulness of the pooled data. In general, widely varying test procedures were applied to a wide variety of patients, differing in age, sex ratio and duration of symptoms. Despite trying to limit heterogeneity by excluding studies with children, there still existed a wide range of ages and sexes. One study, for example, only had participants between the ages of 18 and 20,30 while the majority of studies had participants up to 40 and 50 years of age, with one study's eldest participant being 73 years old.29

There was also wide variation in how the special tests were performed (see table 1). Considering that there are poor levels of inter-rater reliability found with McMurray's test when examiners have agreed on the test procedure,40 ,72 it is plausible that this accounts for the majority of the heterogeneity.

Another possible cause of heterogeneity between included studies is the differing prevalence rates within each sample. Verification bias can exaggerate the prevalence of the disease within the sample and, as a consequence, overestimate the sensitivity and underestimate the specificity.71 The prevalence rates of meniscal tears (as confirmed by the reference test) across the three tests included within the meta-analysis varied hugely. For example, prevalence for McMurray's ranged from 35%67 to 88%.68 Prevalence for JLT ranged from 31%30 to 64%31 and Thessaly 20° ranged from 21%11 to 49%.53 ,69 Another limitation of the included studies is that all but one study used arthroscopy as the gold standard test, and it is thought that this also introduces verification bias.
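The direction of this bias can be illustrated with a simple worked example. The Python sketch below assumes a partial-verification scenario in which patients with a positive index test are more likely to receive the reference test than those with a negative index test. All names and numbers are hypothetical and are not drawn from the included studies.

```python
def observed_accuracy(sens, spec, prevalence, verify_pos, verify_neg):
    """Apparent accuracy when only some patients receive the reference test.

    verify_pos and verify_neg are the assumed fractions of index-test-positive
    and index-test-negative patients who go on to the reference test.
    """
    # Verified fractions of the whole population, by true status and result.
    tp = prevalence * sens * verify_pos
    fn = prevalence * (1 - sens) * verify_neg
    fp = (1 - prevalence) * (1 - spec) * verify_pos
    tn = (1 - prevalence) * spec * verify_neg
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / (tp + fn + fp + tn),
    }


# Hypothetical test (60% sensitive, 80% specific) in a 30% prevalence population.
full = observed_accuracy(0.60, 0.80, 0.30, verify_pos=1.0, verify_neg=1.0)
partial = observed_accuracy(0.60, 0.80, 0.30, verify_pos=1.0, verify_neg=0.2)
print(full)
print(partial)
```

When only one in five test-negative patients is verified in this assumed scenario, the apparent prevalence rises from 30% to about 45%, apparent sensitivity rises from 60% to about 88%, and apparent specificity falls from 80% to about 44%.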

Although all studies scored ‘low risk’ in total on the QUADAS 2 tool, no study received a score of low risk in all categories for the risk of bias. The main methodological errors were with index test description, no confirmation of blinding for the reference test, poor description of flow and timing, poor details given with regard to dropouts and the number of patients being included within the analysis.

Limitations of this review

An extensive literature search was carried out. To reduce risk of bias, two reviewers screened full texts independently for inclusion. An attempt was made to source unpublished trials; however, it is possible that not all publications were retrieved. Furthermore, language bias remained, since no attempt was made to source studies published in any other language than English.

Other limitations of this review are that dichotomised subgroup analyses by prevalence rates and clinical test definitions were not carried out. These may have reduced the between-study heterogeneity and improved robustness of the data synthesis. However, as many of the included studies lacked a clear definition and/or contradicted themselves, this was not considered possible.

Statistical pooling of data for sensitivity and specificity may not represent an accurate estimate, and clinicians should be aware of this and interpret with caution.

Our inter-rater agreement of the QUADAS 2 scores was only moderate to fair.25 ,26 However, as the main analysis was carried out with all included studies and the QUADAS 2 tool was not used to subcategorise studies, these difficulties would not have affected the conclusions of this review.

Comparison with other reviews

Despite updating the data with more narrow inclusion criteria, and including four more studies,21 ,31 ,53 ,69 our main findings differ very little from the previous four systematic reviews that included a meta-analysis9 ,12 ,13 ,15 (see online supplementary table S2).

The robustness of our results and conclusion are greater than those of the other systematic reviews. Our study was the first to use the QUADAS 2 tool, the first to perform a meta-analysis for Thessaly's test and the first to limit our search to adults only. Furthermore, the methodological quality of this review is guided by the PRISMA statement.22

Clinical and research implications

It is known that levels of pain do not correlate with the presence of meniscal tears of the knee,7 ,8 ,73 and that both peripheral and central sensitisation can be an underlying mechanism for people with chronic knee pain,74–76 and musculoskeletal pain in general.74 A recent systematic review showed that arthroscopic meniscectomy for degenerative meniscal tears is no more effective than sham/placebo surgery or conservative treatment.77 This starts to question the clinical need for such a diagnosis, since it is conceivable that ‘confirmed’ meniscal tears on MRI may be incidental, and therefore play no part in the development of pain or loss of function. Given that the prevalence of meniscal tears increases with age, and is almost double in people with radiological evidence of osteoarthritic changes,73 the accuracy may be different in different age groups, but the usefulness of this in relation to their management is questionable. Incidental MRI findings in the spine are almost as common as meniscal tears of the knee.73 ,78 It is thought that these incidental spinal findings may actually have adverse effects for the patient, leading to long-term fear avoidance.79 No study has been found that looks at the implications of this for patients presenting with meniscal tears, but it is conceivable that false-positive findings could increase fear avoidance and limit restoration of normal knee function.

Primary clinicians must still be aware of the need to recognise if conservative treatment is not appropriate, such as in cases of a ‘locked knee’ or true giving way.80 A ‘mechanical’ block to a full range of movement or a ‘mechanically’ unstable knee would usually indicate an MRI and a surgical opinion.81 In these situations, the principal aim of the assessment is not to diagnose the specific tissue structure at fault, but to identify patients who are not appropriate for conservative rehabilitation in a timely manner, and the special tests would not support the clinical decision-making for this. It is likely that this would hold true for acute and degenerative tears.

In clinical practice, the tests are often not used in isolation, but in combination with each other. Using the tests this way may produce a more accurate diagnosis; however, this cannot be confirmed or concluded from the current data. The current authors believe that clinicians should abandon tests based on a pathological model that lacks validity and reliability. Future research should focus on identifying which clinical characteristics might be useful either as prognostic indicators or in guiding the choice between conservative and surgical management.


The results of this systematic review indicate that the accuracy of McMurray's, Apley's, JLT and Thessaly to diagnose meniscal tears remains poor. This conclusion must be taken with caution since frequent methodological design flaws exist within the included studies, most studies suffered from various biases, and between-study heterogeneity makes pooled data unreliable.

The latest research on meniscal tears in asymptomatic patients, together with modern thinking on pain and the lack of efficacy of surgical treatment, starts to challenge the need for such a diagnosis and the use of special tests. Having a diagnosis of a meniscal tear is unlikely to help with the rehabilitation process and may induce fear avoidance.

This review cannot recommend the use of special tests for diagnosing meniscal tears. It is unclear if further research would considerably alter this conclusion.




  • Contributors BES was responsible for conception and design, publication screening, acquisition of data, analysis and interpretation, as well as drafting and revising the manuscript. DT was responsible for publication screening, analysis, reviewing and revising the manuscript. AC was responsible for acquisition of data, as well as reviewing and revising the manuscript. MH was responsible for reviewing and revising the manuscript. All authors have read and approved the final manuscript.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.