Informatics hygiene to support reuse of routinely collected health care data for evidence-based practice
Elmer V Bernstam (1, 2), Alejandro Araya (1), Matthew Decaro (1), Todd R Johnson (1)

(1) D Bradley McWilliams School of Biomedical Informatics, The University of Texas Health Science Center, Houston, Texas, USA
(2) Division of General Internal Medicine, Department of Internal Medicine, John P and Kathrine G McGovern Medical School, The University of Texas Health Science Center, Houston, Texas, USA

Correspondence to Dr Elmer V Bernstam, D Bradley McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA; Elmer.V.Bernstam{at}

Healthcare data are increasingly collected in transactional systems such as electronic health records (EHRs) and aggregated into analytical systems such as clinical data warehouses (figure 1). Consequently, there is substantial interest in reusing these data for evidence-based practice, research, biosurveillance, quality improvement and other purposes. Despite this enthusiasm and multiple successful use cases,1 2 reuse of clinical data remains challenging.3 The challenges arise at three levels: data (ie, the bits and bytes stored in computer systems), information (ie, the meaning of those bits and bytes) and knowledge (ie, conclusions drawn from analyses of information).4 5 For example, data may be corrupted by a hardware or software malfunction (a data problem). Alternatively, the data may be correct, but their meaning or context may be lost (an information problem). A well-known example of information corruption occurred when institutional EHR data were transferred into a commercial personal health record.6 In that case, billing codes were transmitted accurately but interpreted as diagnoses, resulting in alerts for conditions that the patient did not have.
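The difference between a data problem and an information problem can be illustrated with a short Python sketch. It is a minimal, hypothetical illustration, not a model of the systems in the cited case: the `CodedEntry` record, its `purpose` field, the labels "billing" and "diagnosis", and the placeholder codes `X01` and `X02` are all assumptions. A checksum confirms that the bits arrived intact (the data level), yet it cannot tell whether the receiving system reads a billing code as a diagnosis (the information level).

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CodedEntry:
    """Hypothetical coded entry that keeps its context alongside the code."""
    code: str     # hypothetical placeholder code, not a real terminology code
    purpose: str  # assumed context labels: "billing" or "diagnosis"


def checksum(entries: List[CodedEntry]) -> str:
    """SHA-256 digest of a simple serialization; detects bit-level (data) corruption."""
    payload = "\n".join(f"{e.code}|{e.purpose}" for e in entries).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def naive_problem_list(entries: List[CodedEntry]) -> List[str]:
    """Information error: every received code is treated as a diagnosis."""
    return [e.code for e in entries]


def context_aware_problem_list(entries: List[CodedEntry]) -> List[str]:
    """Uses the preserved context: only codes recorded as diagnoses are kept."""
    return [e.code for e in entries if e.purpose == "diagnosis"]


if __name__ == "__main__":
    sent = [CodedEntry("X01", "diagnosis"), CodedEntry("X02", "billing")]
    received = list(sent)  # transferred without any bit-level change

    # Data level: the checksums match, so no corruption is detected.
    print(checksum(sent) == checksum(received))  # True
    # Information level: the naive reading turns a billing code into a diagnosis.
    print(naive_problem_list(received))          # ['X01', 'X02']
    print(context_aware_problem_list(received))  # ['X01']
```

The two checks operate at different levels: a matching digest says nothing about whether the receiver honours the `purpose` field, so both kinds of safeguard are needed.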

Figure 1

Data flow from data-generating process into transactional systems and databases to analytical systems via an extract-transform-load (ETL) process and on to data reuse (adapted from Bates et al and Lui et al17 18). CDW, Clinical Data Warehouse; EHR, electronic health record.

Informatics hygiene

Reuse of clinical data requires good data quality or, more accurately, information quality, defined as ‘fitness for purpose’.7 8 Furthermore, all healthcare data are collected and recorded in context. Context, defined as the assumptions built into the data, must be preserved for information to be fit for purpose. As a simple example, a patient’s weight may be recorded as 150, but one must know whether it was measured in kilograms or pounds. The units (kilograms vs pounds) are an important aspect of the context. Other important aspects might include scale calibration, …
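One way to keep the units attached to a value is sketched below in Python: a weight is stored together with its unit, and the conversion refuses to interpret a value whose unit is missing. This is a minimal sketch under stated assumptions: the `Measurement` class, the `weight_in_kg` helper and the unit labels "kg" and "lb" are hypothetical, and only units are modelled; other aspects of context, such as scale calibration, are not.

```python
from dataclasses import dataclass
from typing import Optional

# Exact conversion factor: 1 international avoirdupois pound = 0.45359237 kg.
LB_TO_KG = 0.45359237


@dataclass(frozen=True)
class Measurement:
    """A value stored together with its unit, one aspect of its context."""
    value: float
    unit: Optional[str]  # assumed labels "kg" or "lb"; None means the context was lost


def weight_in_kg(m: Measurement) -> float:
    """Normalize a weight to kilograms, refusing to guess when the unit is unknown."""
    if m.unit == "kg":
        return m.value
    if m.unit == "lb":
        return m.value * LB_TO_KG
    raise ValueError(f"cannot interpret weight {m.value}: unit is {m.unit!r}")


if __name__ == "__main__":
    print(weight_in_kg(Measurement(150, "kg")))            # 150
    print(round(weight_in_kg(Measurement(150, "lb")), 2))  # 68.04
    try:
        weight_in_kg(Measurement(150, None))
    except ValueError as err:
        print(err)  # cannot interpret weight 150: unit is None
```

Raising an error rather than assuming a default unit reflects the point above: a bare 150 is data, but without its context it is not yet usable information.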

  • Contributors EVB and TRJ conceptualised the paper. All authors contributed to data interpretation. EVB wrote the first draft of the manuscript. All authors provided substantive feedback on the manuscript and have read and approved the final version. EVB is the guarantor and attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Funding This work was supported by National Institutes of Health/National Center for Advancing Translational Sciences grant number UL1 TR003167 and the Reynolds and Reynolds Foundation.

  • Competing interests None declared.

  • Provenance and peer review Commissioned; externally peer reviewed.