7 Papers as living documents: using literate programming to produce fully transparent, reproducible research manuscripts
  1. Matthew Parkes (1, 2)
  1. Arthritis Research UK Centre for Epidemiology, University of Manchester, Manchester, UK
  2. NIHR Manchester Musculoskeletal Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK


Objectives The existing model of academic writing and publishing in medical research has not strayed far from its correspondence-based letter writing origins. Authors frequently complain that this system is out of date and restrictive. Currently, articles lack transparency - it is difficult to fully and concisely explain complex analyses without presenting full code and dataset structure. Analyses are also fixed and limited - should readers wish to test assumptions, conduct additional tests, or amalgamate study data with other datasets, they are limited to that which is published, or to contacting authors for datasets and code (often unsuccessfully).

Literate programming is an approach which allows manuscripts to be more transparent, reproducible, and interactive. Free, open-source tools (e.g. RMarkdown and Pweave) allow entire research manuscripts to be generated end-to-end as one continuous, interwoven document containing live code that runs upon opening the document, which can then be compiled into .pdf or HTML.
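To make the idea of interwoven prose and live code concrete, the following is a minimal sketch of a "weave" step written with the Python standard library only. It is not how RMarkdown or Pweave are implemented; the `<<code>>` / `<<end>>` chunk markers, the `{{ ... }}` inline syntax, and the function name `weave` are all assumptions invented for this illustration, and the numbers are made up.

```python
import contextlib
import io
import re

# Hypothetical syntax for this sketch only:
#   a code chunk sits between "<<code>>" and "<<end>>" lines,
#   an inline value is written as {{ expression }}.
TOKEN = re.compile(
    r"<<code>>\n(?P<chunk>.*?)<<end>>\n?|\{\{(?P<expr>.*?)\}\}",
    re.DOTALL,
)


def weave(source: str) -> str:
    """Run chunks and inline expressions top to bottom, splicing results into the prose."""
    namespace = {}  # shared state, so later chunks can use earlier variables

    def run(match):
        chunk = match.group("chunk")
        if chunk is not None:
            # Execute the chunk and keep anything it prints.
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec(chunk, namespace)
            return buffer.getvalue()
        # Otherwise evaluate the inline expression and insert its value.
        return str(eval(match.group("expr").strip(), namespace))

    return TOKEN.sub(run, source)


MANUSCRIPT = """Results
<<code>>
scores = [4, 8, 6]  # made-up numbers, not trial data
mean_score = sum(scores) / len(scores)
<<end>>
The mean score across {{ len(scores) }} participants was {{ mean_score }}.
"""

print(weave(MANUSCRIPT))
```

Because the reported numbers are computed at weave time, changing the data in the chunk changes the sentence that reports them, which is the property the literate programming model relies on. A sketch like this executes arbitrary code, so it should only be run on documents from a trusted source.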

Method This abstract demonstrates how manuscripts can be written using the literate programming model. Using a cleaned, publicly available dataset taken from Vickers’ 2006 paper (Vickers Trials 2006, 7:15), we use the knitr, rmarkdown, and rticles packages in RStudio to write a fully transparent example trial results paper. This paper is compiled into .pdf format using the PLOS LaTeX template (Public Library of Science 2018), a commonly used manuscript template which is available under a Creative Commons licence.

Results The resulting .pdf manuscript has a format familiar to readers, yet has increased transparency and interactivity, as the code and dataset are distributed with, and are integrated in, the manuscript. The script can therefore be downloaded and altered to test assumptions, derive values not presented in the paper (to be used in meta-analyses, for example), and reproduce results.
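As an illustration of deriving a value that a paper might not report, the sketch below computes a between-group difference in means and its standard error, two quantities a meta-analyst typically needs. It is a hedged example only: the function name `mean_difference_with_se` is hypothetical, the unpooled standard error is one common choice rather than the analysis used in the example paper, and the scores are made up rather than taken from the trial dataset.

```python
import math
import statistics


def mean_difference_with_se(treatment, control):
    """Return (difference in means, standard error) for two independent groups."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    # Unpooled standard error: each group's sample variance divided by its size.
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return diff, se


# Made-up outcome scores for illustration only; not values from the trial.
treatment = [12.0, 15.0, 11.0, 14.0]
control = [16.0, 18.0, 15.0, 19.0]

diff, se = mean_difference_with_se(treatment, control)
print(f"difference = {diff:.2f}, SE = {se:.2f}")
```

With the dataset distributed alongside the manuscript, a reader could add a chunk like this to the script and recompile, rather than requesting the values from the authors.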

The compiled paper will be available at the author’s GitHub repository:

Conclusions These ‘living papers’, in which code, data, and interpretation are all interwoven into one live-compiled document, have numerous applications for meta-level analyses and offer a significantly greater level of transparency. They circumvent the limitations of the current static publishing model discussed above, allowing readers to interact with and unpick papers in a way that is currently not possible with manuscripts that are divorced from their datasets and analysis code.
