Data extraction is the stage of a systematic review that occurs between identifying eligible studies and analysing the data, whether that synthesis is qualitative or quantitative, the latter involving the pooling of data in a meta-analysis. The aims of data extraction are to obtain information about the included studies in terms of the characteristics of each study and its population and, for quantitative synthesis, to collect the data necessary to carry out a meta-analysis. In systematic reviews, information about the included studies will also be required to conduct risk of bias assessments, but these data are not the focus of this article.
Following good practice when extracting data will help make the process efficient and reduce the risk of errors and bias. Failure to follow good practice risks basing the analysis on poor quality data, and therefore providing poor quality inputs, which will result in poor quality outputs, with unreliable conclusions and invalid study findings. In computer science, this is known as ‘garbage in, garbage out’ or ‘rubbish in, rubbish out’. Furthermore, providing insufficient information about the included studies for readers to be able to assess the generalisability of the findings from a systematic review will undermine the value of the pooled analysis. Such failures will cause your systematic review and meta-analysis to be less useful than it ought to be.
Some guidelines for data extraction are formal, including those described in the Cochrane Handbook for Systematic Reviews of Interventions,1 the Cochrane Handbook for Diagnostic Test Accuracy Reviews,2 3 the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines for systematic reviews and their protocols4–7 and other sources.8 9 These formal guidelines are complemented by informal advice, in the form of examples and videos, on how to avoid possible pitfalls and how to carry out data extraction more efficiently.10–12
Guidelines for data extraction involve recommendations for:
Ideally, at least two reviewers should extract data independently,1 2 9–12 particularly outcome data,1 as data extraction by only one person can generate errors.1 13 Each reviewer should extract data from the same sources into identical data extraction forms. If time or resources prevent independent dual extraction, one reviewer should extract the full data and another should independently check the extracted data for both accuracy and completeness.8 In rapid or restricted reviews, an acceptable level of verification of the first reviewer's data extraction may be achieved by a second reviewer extracting a random sample of the data.14 Before the extracted data are compared and a consensus sought, the consistency of coded (categorical) data extracted by two different reviewers may be measured using kappa statistics,1 2 12 15 or Fleiss' kappa statistics when more than two people have extracted the data.16 Formal comparisons are not routine in Cochrane Reviews, and the Cochrane Handbook recommends that if agreement is to be formally assessed, it should focus only on key outcomes or risk of bias assessments.1
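As an illustration of the agreement statistic mentioned above, the following is a minimal sketch (not from the article) of Cohen's kappa for two reviewers' categorical codes; the study codes shown are hypothetical. Fleiss' kappa extends the same idea to more than two reviewers.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: chance-corrected agreement between two reviewers."""
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    # Observed proportion of studies on which the reviewers agree
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement if each reviewer coded at random with their own marginal frequencies
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes from two reviewers for 10 studies
reviewer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
reviewer_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # → 0.58
```

In practice, reviewers would typically use an established implementation (eg, `cohen_kappa_score` in scikit-learn) rather than hand-rolling the calculation.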
Reviewers should anticipate a number of problems that may arise during data extraction. The study protocol should prespecify how these problems will be addressed:
Disagreement between reviewers when extracting data. Some differences in extracted data are simply due to human error, and such conflicts can be easily resolved. Conflicts and questions about clinical issues, about which data to extract, or whether the relevant data have been reported can be addressed by involving both clinicians and methodologists in data extraction.3 12 The protocol should set out the strategy for resolving disagreements between reviewers, using consensus and, if necessary, arbitration by another reviewer. If arbitration fails, the study authors should be contacted for clarification. If that is unsuccessful, the disagreement should be documented and reported.1 6 7
Outcome data being reported in different ways, which are not necessarily suitable for meta-analysis. Many resources are available for helping with data extraction, involving various methods and equations to transform reported data or make estimates.1 2 10 The protocol may acknowledge this by stating that any estimates made and their justification will be documented and reported.
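One common transformation of the kind referred to above is recovering a standard deviation when a study reports only a mean with its 95% confidence interval; the approach below follows the standard back-calculation described in the Cochrane Handbook, and the numbers used are hypothetical.

```python
import math

def sd_from_ci(lower, upper, n, z=1.96):
    """Recover the standard deviation of a single-group mean from its 95% CI.

    The CI half-width equals z * SE, so SE = (upper - lower) / (2 * z),
    and SD = SE * sqrt(n). Assumes a normal-approximation CI; for small
    samples a t-distribution critical value should replace z.
    """
    se = (upper - lower) / (2 * z)
    return se * math.sqrt(n)

# Hypothetical study: mean 12.0, 95% CI 10.2 to 13.8, n = 50
print(round(sd_from_ci(10.2, 13.8, 50), 2))
```

Any such derived value, and the formula used, should be documented so that the estimate can be checked when the data are later analysed.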
Including estimates and alternative data. It is also important to anticipate the roles that extracted data will play in the analysis. Studies should be highlighted when multiple sets of outcome data are reported or when estimates have been made in extracting outcome data.9 Clearly identifying these studies during the data extraction phase will ensure that the studies can be quickly identified later, during the data analysis phase.
Risk of double counting patients. Some studies involve multiple reports, but the study should be the unit of interest.1 Tracking down multiple reports and ensuring that patients are not double-counted may require good detective skills.
Risk of human error, inconsistency and subjectivity when extracting data. The protocol should state whether data extraction will be independent and carried out in duplicate, whether a standardised data extraction form will be used, and whether it will be piloted. The protocol should also state any special instructions, for example, extracting only prespecified eligibility criteria.1 2 6–9 11 12
Ambiguous or incomplete data. Authors should be contacted to seek clarification about data and make enquiries about the availability of unreported data.1 2 9 The process of confirming and obtaining data from authors should be prespecified6 7 including the number of attempts that will be made to make contact, who will be contacted (eg, the first author), and what form the data request will take. Asking for data that are likely to be readily available will reduce the risk of authors offering data with preconditions.
Extracting the right amount of data. Time and resources are wasted extracting data that will not be analysed, such as the language of the publication and the journal name when other extracted data (first author, title and year) adequately identify the publication. The aim of the systematic review will determine which study characteristics are extracted.16 For example, if the prevalence of a disease is important and is known to vary across cities, the country and city should be extracted. Any assumptions and simplifications should be listed in the protocol.6 7 The protocol should allow some flexibility for alternative analyses by not over-aggregating data, for example, collecting data on smoking status in categories ‘smoker/ex-smoker/never smoked’ instead of ‘smoker/non-smoker’.11
Guidelines recommend that the process of extracting data should be well organised. This involves having a clear plan, which should feature in the protocol, stating who will extract the data, the actual data that will be extracted, and details about the use, development and piloting of a standardised data extraction form,1 6–9 and having good data management procedures,10 including backing up files frequently.11 Standardised data extraction forms can provide consistency in a systematic review, while at the same time reducing biases and improving validity and reliability. It may be possible to reuse a form from another review.12 It is recommended that the data extraction form is piloted and that reviewers receive training in advance,1 2 12 and that instructions are given with extraction forms (eg, about codes and definitions used in the form) to reduce subjectivity and to ensure consistency.1 2 12 It is recommended that instructions be integrated into the extraction form, so that they are seen each time data are extracted, rather than kept in a separate instruction document, which may be ignored or forgotten.2 Data extraction forms may be paper based or electronic, or may involve sophisticated data systems. Each approach has advantages and disadvantages.1 11 17 For example, using a paper-based form does not require internet access or software skills, but using an electronic extraction form facilitates data analysis. Data systems, while costly, can provide online data storage and automated comparisons between data that have been independently extracted.
Data extraction procedures and preanalysis calculations should be well documented9 10 and based on ‘good bookkeeping’.5 10 Having good documentation supports accurate reporting, transparency and the ability to scrutinise and replicate the analysis. Reporting guidelines for systematic reviews are provided by PRISMA,4 5 and these correspond to the set of PRISMA guidelines for protocols of systematic reviews.6 7 In cases where data are derived from multiple reports, documenting the source of each data item will facilitate the process of resolving disagreements with other reviewers, by enabling the source of conflict to be quickly identified.10
Data extraction is both time-consuming and error-prone, and automation of data extraction is still in its infancy.1 18 Following both formal and informal guidelines for good practice in data extraction (table 1) will make the process efficient and reduce the risk of errors and bias when extracting data. This will contribute towards ensuring that systematic reviews and meta-analyses are carried out to a high standard.
Contributors KST and KRM conceived the idea of the series of which this is one part. KST wrote the first draft of the manuscript. All authors revised the manuscript and agreed the final version.
Funding This research was supported by the National Institute for Health Research Applied Research Collaboration Oxford and Thames Valley at Oxford Health NHS Foundation Trust.
Disclaimer The views expressed in this publication are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.
Competing interests KRM and JKA were associate editors of BMJ Evidence Medicine at the time of submission.
Provenance and peer review Commissioned; internally peer reviewed.