Introduction
The peer-reviewed medical literature has grown exponentially. Through mobile devices, previously unavailable clinical resources have now become freely accessible worldwide. These include abstracts and some full-text articles of published studies. Through PubMed/MEDLINE, abstracts can now be conveniently accessed at the point of care.1,2 The universal accessibility of abstracts as a knowledge resource in many parts of the world with mobile Internet access has sparked interest in using journal abstracts as evidence-based sources when full-text articles are unavailable to clinicians, especially in geographic areas with limited resources.1,3 Hence, it was recently proposed that a clinician-oriented web application formatted for mobile devices, such as ‘Consensus Abstracts’, could be used to search and review multiple concurring abstracts in MEDLINE/PubMed to inform clinical decisions.1
The abstract is the most often read section of a research article.4 Journal abstracts and related bottom-line summaries are appealing because they are easy to read and give a quick digest of the article.1,4–8 As a short summary of the full-text article,8 the abstract incorporates a significant amount of information when written well and is frequently the only part perused by the reader.9 It is essential therefore that the abstract accurately reflects the contents of the full-text article.9 It should be a concise, clear and informative representation of the results or interpretation of the full text.10
Searching abstracts is convenient because they contain most of the article's relevant keywords.11 Many clinicians continue to depend on journal abstracts in seeking answers to clinical questions despite the increasing availability of full-text articles from online archives like PubMed Central12 and other similar repositories. Clinicians and other healthcare practitioners also rely exclusively on abstracts due to lack of time to read the full-text article, poor critical appraisal skills or limited access to full-text articles, especially in resource-constrained settings.5–7,13–15 But there are inherent problems with this approach because dependence on abstracts alone assumes that they are complete and accurate. Incorrect reporting of data in the abstract could bias the reader and lead to misinterpretation of research findings.16
Over the years, several efforts have been undertaken to make the abstract more informative and minimise inaccuracies between the abstract and the full text of an article. Most journals have adopted the use of structured abstracts. A structured abstract's distinctly labelled sections enhance comprehension of the article.17 Compared with the unstructured format, structured abstracts are of higher quality and are preferred for providing informative summaries of published studies.18–21 Advantages of using structured abstracts also include: expediting the peer review process prior to publication, assisting health professionals in finding clinically relevant journal articles and conducting more detailed literature searches.22,23 In 1987, a standard abstract format for articles reporting clinical studies was introduced.24 This structured abstract format was comparable with the introduction, methods, results and discussion (IMRAD) format.25 In effect, IMRAD has now become a de facto standard for the full text of scientific journal articles and is also commonly used as a structure for journal article abstracts since it follows the course of scientific discovery.26–28 The Consolidated Standards of Reporting Trials (CONSORT) group and the International Committee of Medical Journal Editors (ICMJE) recommend the use of structured abstracts, although there may be differences in the precise format from one journal to the next.29,30 This has led to a steady increase in the number of journal articles with structured abstracts in MEDLINE citations. A recent study by the National Library of Medicine reported that the proportion of structured abstracts added yearly to MEDLINE increased from 0.4% in 1991 to 23.0% in 2008 and the number of journals publishing structured abstracts expanded from 78 in the 1989–1991 period to 3166 in the 1992–2006 period.31
However, even with structured abstracts, Pitkin et al9 observed that it was common for medical journals to contain data in abstracts that were inconsistent with or absent from the full-text article. They found a persistence of inaccurate data in the abstracts (18–68%) in six general medical journals. Another study discovered that in one medical specialty journal, at least 25% of manuscripts that were returned after revision contained data in the abstracts that were unverifiable in the full text.32 A study by Boutron et al33 found ‘spin’ (defined as reporting strategies that emphasise the beneficial effect of the experimental treatment even with statistically non-significant primary outcomes or divert the reader from results with no statistical significance) in the results (37.5%) and conclusions (58.3%) sections of abstracts of randomised controlled trial (RCT) reports published in 2006. A study by Yavchitz et al16 recently identified ‘spin’ in 47% of press releases and media coverage, with ‘spin’ in the abstract conclusions of RCTs as the only associated factor. This is a significant finding because clinicians often base their initial assessment of a clinical trial on the abstract,29 and in some geographic areas, the abstract may be all that clinicians will have access to.
In this study, we aimed to determine the accuracy of data contained in structured abstracts and their usefulness for guiding clinical decisions, by attempting to answer the following questions: (1) Do the data in the structured abstract accurately reflect that in the full text? Are the numbers reported in the structured abstract the same as in the full text? In this comparison, we concentrated on the numerical data. (2) If there is a discrepancy in the data between the structured abstract and full text, is it clinically significant? (3) Would the structured abstract alone be an adequate guide for clinical decision-making? For items 2 and 3, we tried to evaluate whether data discrepancies found between a structured abstract and its corresponding full text affected the conclusion and clinical ‘bottom line’ of the published research.
Methods
Between 11 February and 14 March 2011, 60 articles listed as the ‘most read’, ‘most cited’ or ‘most e-mailed’ were selected from six clinical journals34: American Journal of Emergency Medicine, British Medical Journal, Journal of the American Medical Association, Lancet, New England Journal of Medicine and Obstetrics and Gynaecology. These journals were chosen because they were highly regarded34,35 and ‘most read’ in the medical community.
Using the IMRAD format as a model, we extracted and stored the data in an Excel spreadsheet. If a category did not exactly match the IMRAD format, we assigned it to the cell that most closely matched the category. We independently examined the structured abstract for numerical data and verified its accuracy against the corresponding section in the body of the article, including tables and figures. Differences in assessments were resolved through consensus. Structured abstracts were considered inaccurate if they contained data that were either inconsistent with or missing from the full-text article. We did not consider rounded-off values in the abstract as discrepancies if the precise value was found in the full text, including tables and figures, and if the rounding was done correctly and consistently. We then decided if a discrepancy was: (1) clinically significant or (2) clinically not significant. We defined a clinically significant discrepancy as data that would require a clinical response or action if encountered in a clinical setting; was misleading; or would result in any misinterpretation of the conclusions of the study. We also considered data similar to ‘spin’, as described in a previous work,16 to be a clinically significant discrepancy, such as misinterpretation of statistically significant results as showing no significant intervention effect (either neutral or non-beneficial) or misinterpretation of statistically non-significant results as showing comparable effectiveness or equivalence of treatment.
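The rounding rule described above can be illustrated with a minimal sketch. This is not the procedure the evaluators used (the comparison in the study was done by hand); the function names `is_correct_rounding` and `classify`, the three labels and the choice of half-up rounding are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def is_correct_rounding(abstract_value: str, full_text_value: str) -> bool:
    """True if the abstract figure equals the full-text figure rounded
    (half up, an assumed convention) to the decimal places shown in the abstract."""
    abstract = Decimal(abstract_value)
    full_text = Decimal(full_text_value)
    # One unit in the last decimal place shown in the abstract, e.g. 0.1 for "53.3".
    quantum = Decimal(1).scaleb(abstract.as_tuple().exponent)
    return full_text.quantize(quantum, rounding=ROUND_HALF_UP) == abstract


def classify(abstract_value: str, full_text_values: list[str]) -> str:
    """Label one abstract figure against the figures found in the full text.

    The labels are hypothetical; an empty list stands for a figure that is
    missing from the full text, which the study counted as a discrepancy.
    """
    if any(Decimal(abstract_value) == Decimal(v) for v in full_text_values):
        return "exact match"
    if any(is_correct_rounding(abstract_value, v) for v in full_text_values):
        return "rounded"
    return "discrepancy"


if __name__ == "__main__":
    print(classify("53.3", ["53.33"]))   # abstract rounds the full-text value
    print(classify("0.35", ["0.3405"]))  # abstract value cannot be derived by rounding
```

Values are passed as strings so that the number of decimal places printed in the abstract is preserved; whether a flagged discrepancy is clinically significant remains a judgement that a sketch like this cannot make.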
The proportions of articles containing discrepancies were compared across journals using Fisher's exact test. We attempted to determine whether there was a significant relationship between the journal that published the article and whether an article contained discrepancies. We also calculated the Fleiss κ coefficient to assess the degree of agreement among the three evaluators. Statistical analyses were performed using R Statistical Software V.2.15.2.36
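For readers unfamiliar with the Fleiss κ coefficient, the following is a minimal sketch of the standard formula in plain Python. It is not the authors' R code, and the ratings in the example are hypothetical values chosen only to show the input layout: one row per article, one column per category, each cell holding the number of evaluators who chose that category.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    category_share = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in category_share)

    # Undefined (division by zero) if every rating falls in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical ratings: 5 articles, 3 evaluators,
    # columns = [discrepancy, no discrepancy].
    example = [[3, 0], [3, 0], [0, 3], [2, 1], [0, 3]]
    print(round(fleiss_kappa(example), 3))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance.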
Results
The six journals examined were high impact journals. At the time of review, their median (IQR) impact factor was 37.99 (4.04–42.033).
Table 1 shows a comparison of the articles with and without discrepancies for each journal. Data inaccuracy was observed in 53.33% of the articles. Analysis showed that there was no significant relationship between the journal that published the article and whether an article contained discrepancies (Fisher’s exact p value=0.3405). The Fleiss κ coefficient was 0.869 (z=12, p=0.001), indicating high agreement37 among the three evaluators.
Table 2 shows the proportion of articles with discrepancies between the structured abstract and the full text by IMRAD sections. The proportion of sections of the structured abstract with discrepancies ranged from 3.33% (Discussion) to 45.00% (Results). Compared with the other sections, the Results section showed the largest number of discrepancies, although these were mostly clinically not significant. In only one article (1.67%) was a clinically significant difference found between the structured abstract and full text. This was observed in the Results section, which contained the highest percentage of inaccuracies.
Table 3 shows the list of discrepancies found between the structured abstracts and the full texts. Of the 32 articles with discrepancies, we found seven articles (11.67%) with discrepancies in at least two different IMRAD sections. The two most common discrepancies found were clinically not significant. These were numbers or percentages in the structured abstract not matching those in the full text (11.67%) and numerical data or calculations found in the abstract but not explicitly mentioned in the full text (40%). The discrepancy pertaining to mismatched numbers or percentages included: mismatched p values, inconsistent range of values and mismatched numbers or percentages of variables. However, the mismatched values did not adversely affect the final interpretation of results and were still considered as clinically not significant. For the discrepancy related to numerical data or calculations mentioned in the abstract but not directly mentioned in the full text, we performed recalculations and observed that the numbers, percentages or calculations found in the structured abstract were consistent with what the data presented in the full text would have yielded.
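The percentages reported in this section can be cross-checked against the sample of 60 articles in the same spirit as the recalculations described above. The sketch below is illustrative only: the counts 32, 7 and 1 are stated in the text, whereas the counts returned by the hypothetical helper `implied_count` are inferred from the reported percentages and are not reported by the study.

```python
TOTAL_ARTICLES = 60  # sample size stated in the Methods


def percent(count: int, total: int = TOTAL_ARTICLES) -> float:
    """Percentage of the sample, rounded to two decimals as in the text."""
    return round(100 * count / total, 2)


def implied_count(reported_percent: float, total: int = TOTAL_ARTICLES):
    """Whole-number count that reproduces a reported percentage, or None.

    This is an inference from the percentage, not a figure from the study.
    """
    candidate = round(reported_percent * total / 100)
    if percent(candidate, total) == round(reported_percent, 2):
        return candidate
    return None


if __name__ == "__main__":
    # Counts stated in the text, paired with the percentages reported for them.
    stated = {
        "articles with any discrepancy": (32, 53.33),
        "discrepancies in two or more IMRAD sections": (7, 11.67),
        "clinically significant discrepancy": (1, 1.67),
    }
    for label, (count, reported) in stated.items():
        print(label, percent(count), percent(count) == reported)

    # Percentages reported without a count: the implied counts are inferred.
    for reported in (45.00, 3.33, 40.0):
        print(reported, "implies", implied_count(reported), "of", TOTAL_ARTICLES)
```

A result of `None` from `implied_count` would mean that no whole number of articles out of 60 produces the reported percentage.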
Discussion
The proportion of discrepancies we found between the structured abstracts and full texts in the six journals examined was quite large (53.33%), but consistent with previous observations by Taddio et al18 and Wong et al.38 For us, this was surprising considering that these were highly regarded medical journals with a high median impact factor. The discrepancies found were mostly due to inconsistencies between numerical data in the abstract and those in the full text, but were regarded as clinically not significant. We deemed these minor discrepancies, which could be explained either by a mismatch in numbers or percentages or by statistical calculations that appeared in the abstract but were missing from the full text. Recalculations showed that the numbers presented in the structured abstract were indeed correct and consistent with the data presented in the full text. There were no discrepancies attributed to data found only in the abstract. The discrepancies found were not considered to be misleading, nor would they result in any misinterpretation of the findings on the part of the reader. Overall, we observed that the structured abstracts were still appropriate surrogates10 of full-text articles.
To practice evidence-based medicine (EBM), clinicians need to critically appraise full-text articles to guide their clinical decision-making.1 Applying EBM in clinical practice encourages the use of timely and relevant information to complement one's expertise.39 Clinicians look for significant relevant findings in treatment protocols, diagnostic examinations and outcomes of certain interventions. In this study, most of the discrepancies found between the structured abstract and the full text were not significant clinically. The sole clinically significant discrepancy was a misinterpretation of a significant intervention effect, explicitly stated as ‘HbA1c was significantly lower (p=0.008)’ in the Results section of the full-text article but reported as ‘no difference in HbA1c among agents on the intensive group’ in the abstract. However, this discrepancy did not have any effect on the article's conclusion and overall interpretation. There were no clinically significant discrepancies observed in other sections of the article.
Using medical literature to guide clinical decisions has been shown to be effective.40–44 To be most useful, current clinical evidence needs to be conveniently accessible.45 When provided access to either abstracts or full-text articles, improvements in clinical decisions have been observed.3 With a high percentage of clinicians (69%) using abstracts only in seeking answers to clinical questions,3 it is crucial that published abstracts meet a high standard of quality. A 10-year follow-up study found that using structured abstracts, regardless of precise formats, has helped maintain a high quality of available abstracts.38
Our study had several limitations. The journal articles selected may not be representative of all scientific articles in the medical literature. The small sample of articles examined limits the generalisability of the findings in this study. We also only evaluated the utility of structured abstracts in guiding clinical decisions. We did not include research articles with unstructured abstracts. More reviewers would enhance the reliability of this type of evaluation. Further analysis with a broader sample of medical journals to include articles with unstructured abstracts is needed.
In this study, we still found a large proportion of mostly clinically not significant inaccuracies between the abstracts and full texts, consistent with previous reviews. These discrepancies did not seem to affect the clinical bottom line. This study may provide some support that evidence derived from structured abstracts may be another reliable resource for clinicians. Nevertheless, the high proportion of inaccuracies suggests that there is still room for improvement. Greater attention needs to be devoted by authors, reviewers and editors to improving the quality of abstracts. Clinicians must also remember that one abstract alone may not be enough in making clinical decisions46,47; hence, the need to review many, perhaps using tools that facilitate such review, such as the web application ‘Consensus Abstracts’.1
References
Footnotes
Contributors PF designed the study, participated in analysis of the results and helped draft the manuscript. AG participated in the study design and analysis of the results. RFS conducted statistical and qualitative analysis of the results and drafted the manuscript.
Funding This research was supported by the Intramural Research Programme of the National Institutes of Health (NIH), National Library of Medicine (NLM) and Lister Hill National Center for Biomedical Communications (LHNCBC). This research was also supported in part by an appointment to the NLM Research Participation Programme, administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the US Department of Energy (DoE) and the NLM.
Disclaimer The views and opinions of the authors expressed herein do not necessarily state or reflect those of the National Library of Medicine, National Institutes of Health or the US Department of Health and Human Services.
Competing interests None.