Duke University, Department of Biostatistics & Bioinformatics
2024-08-29
Literate Programming
First introduced in 1984 by Donald Knuth (author of “The Art of Computer Programming”)
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.
Literate Programming: Enhances traditional software development by embedding code in explanatory essays, and encourages treating the act of development as one of communication with future maintainers.
Reproducible Research: Embeds executable code in research reports and publications, with the aim of allowing readers to re-run the analyses described.
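To make the literate programming idea concrete, here is a minimal sketch in Python. It assumes an invented chunk syntax (the <<code>> and <<end>> markers are hypothetical, not any real tool's format) and borrows the names "tangle" and "weave" from Knuth's WEB system: tangling extracts the program for the computer, weaving produces the text for the human reader.

```python
import re

# A hypothetical literate document: prose with code chunks between
# <<code>> and <<end>> markers (the marker syntax is invented for this sketch).
DOCUMENT = """\
We want the area of a circle, so we first define the radius.
<<code>>
radius = 2.0
<<end>>
The area follows from the usual formula, with pi approximated.
<<code>>
area = 3.14159 * radius ** 2
<<end>>
"""

# Matches one code chunk; group 1 is the code between the markers.
CHUNK = re.compile(r"<<code>>\n(.*?)<<end>>", re.DOTALL)


def tangle(document):
    """Extract the code chunks, in document order, into one program."""
    return "".join(CHUNK.findall(document))


def weave(document):
    """Produce the reader-facing text, with code shown as indented listings."""
    def indent(match):
        lines = match.group(1).splitlines(True)
        return "".join("    " + line for line in lines)
    return CHUNK.sub(indent, document)


def run(document):
    """Execute the tangled program and return the variables it defines."""
    namespace = {}
    exec(tangle(document), namespace)
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


if __name__ == "__main__":
    print(weave(DOCUMENT))
    print(run(DOCUMENT))
```

The same source yields both outputs, so the explanation and the program cannot drift apart. Real tools handle chunk naming, ordering, and output capture far more carefully than this sketch does.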
Code, not paper, is the scholarship
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and complete set of instructions which generated the figures.
Research Compendium
Encapsulates the actual work, not just an abridged version;
Allows different levels of detail in different renderings;
Easy to re-run by anyone;
Provides explicit computational details, enabling others to adapt and extend the reported computational methods;
Enables programmatic construction and clear provenance of plots and tables.
Part of Figure 1, Gentleman and Temple Lang (2007)
Reproducible research lens
Embeds executable code in research reports and publications, with the aim of allowing readers to re-run the analyses described (Schulte et al., 2012).
Important concepts:
Ties together narrative (the “why”), code that implements it, and the results from running the code.
Can be executed (“executable paper”), in its original or modified form.
Provenance of tables and charts is clear and verifiable.
Copying values into tables, or pasting plots in as figures, breaks the provenance chain.
Provenance: Information about the entities, activities, and people involved in producing data or other results, which is necessary to assess their quality, reliability, or trustworthiness. [1]
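The concepts above can be illustrated with a small Python sketch of an "executable" report. Every number in the rendered text is computed from the data when the report is built, and a simple provenance record is attached. The data values, function names, and choice of provenance fields (an input hash and the Python version) are assumptions made for illustration, not a prescribed standard.

```python
import hashlib
import platform
import statistics

# Hypothetical toy data standing in for a study's raw measurements.
RAW_DATA = [4.1, 5.0, 3.8, 4.6, 5.2]


def analyze(values):
    """Compute the summary statistics that the report will cite."""
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 2),
        "sd": round(statistics.stdev(values), 2),
    }


def provenance(values):
    """Record what produced the results: an input fingerprint and the runtime."""
    digest = hashlib.sha256(repr(values).encode("utf-8")).hexdigest()
    return {"data_sha256": digest, "python": platform.python_version()}


def render_report(values):
    """Tie narrative, computed results, and provenance into one text."""
    stats = analyze(values)
    prov = provenance(values)
    return (
        f"We measured {stats['n']} samples; the mean was {stats['mean']} "
        f"(SD {stats['sd']}).\n"
        f"Input SHA-256: {prov['data_sha256'][:12]}, "
        f"Python {prov['python']}"
    )


if __name__ == "__main__":
    print(render_report(RAW_DATA))
```

Because the narrative pulls its numbers from analyze() rather than from hand-copied values, re-running the report on modified data updates both the text and the fingerprint, so the provenance chain stays intact.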