4 Literate Programming

Warning

This chapter is in an early draft form, and is incomplete.

Traditionally, if you needed to do science on a computer, you ran your analyses in one program, and then copied over any outputs you wanted into a separate document where you wrote your results.

The separation of analysis and communication creates two main problems:

Copy errors: it is easy to forget exactly how or when results were generated when they are in a separate place on your computer. Furthermore, if you update the analyses halfway through writing, you need to remember to copy over any of the new outputs into the written document, and it’s all too easy to miss something.
Discourages reproducibility: Typically just the final written document will be shared with other scientists, and not the underlying analyses that generated the results. Although the data/code repositories mentioned in the previous section can help to prevent this, it’s an extra step.

One solution is an approach to programming with the fancy name of literate programming, which just means combining writing and code in the same document. Then you render the document, which runs the code, generates the output of any analyses, and embeds it within the text.

Having the analyses next to your written interpretation makes it easy for us to verify that we are writing about the analysis we think we are! It also means that your results will automatically be updated to the latest version whenever you compile the document. And finally, you can share this document and (theoretically) any other scientist should be able to compile it and hopefully understand the relationship between your analysis and results more easily.

Literate programming is not a new idea. It was first suggested by a prominent computer scientist called Donald Knuth in 1984 (Knuth 1984). It didn’t immediately take off, because the early implementations were clunky and significantly harder to use than the usual method of writing code by itself in its own file.

However, literate programming has taken strides forwards in the past decade. The R programming language has been a major part of this development, and includes one of the most developed systems for literate programming: RMarkdown, which appeared in 2014.

Prior to RMarkdown, we typically wrote R code in source files that could only contain code (you may encounter these files: they end with the .R extension). RMarkdown is an R package (i.e. an extension to the R language, see Section 3.9) that allows sections of R code to be included in text documents.

Recently, an alternative to RMarkdown has appeared: Quarto.

Quarto is an attempt to expand RMarkdown to support other programming languages and types of input document. As we will see in the rest of this chapter, RMarkdown and Quarto are extremely similar, and you can achieve almost identical results with either system.

This book is written using literate programming, using a combination of RMarkdown and Quarto files.

In the next few sections, we will go over how to create and edit RMarkdown and Quarto files. Most of the advice will be the same for each file type. Where the two systems differ, I will highlight this and show you both approaches. I will end the chapter with a more advanced discussion of the differences, in case you are planning a larger project and need to pick between the two (spoiler: it doesn’t matter too much).

4.1 RMarkdown and Quarto files

RMarkdown files are text files that end in the file extension .Rmd. For example, a file called my_homework.Rmd is an RMarkdown file. Quarto files are similar, but end with the extension .qmd.

We can create both types of files from within RStudio. Click on the File menu in the top left of RStudio¹, then pick the New File option, and then select either:

RMarkdown… to create an RMarkdown file, or
Quarto document… to create a Quarto file.

When you create new files in this way, RStudio will include some default contents to give an example of how each type of file works. (We will also see how to create empty files, with no default contents.)

Since the results are slightly different, let’s look at each filetype in turn.

4.1.0.1 New RMarkdown files

When you create a new RMarkdown file, you will be presented with the following wizard:

Figure: RMarkdown creation wizard.

You can accept the defaults in the wizard as shown above, and modify the rest later. Or you can enter a title, author name, and/or date now if you prefer. Click OK to create the file (which will contain some default contents).

Figure: Example default RMarkdown file.

To create an empty file without any default contents, click the Create Empty Document button instead.

The file will automatically be opened in the Editor pane of RStudio.

4.1.0.2 New Quarto files

Quarto files come with a similar set-up wizard:

Figure: Quarto creation wizard.

Again, it is fine to just click OK with the defaults, but you can put in a title and author name if you like.

I also like to deselect the Editor checkbox, but you can easily turn the visual editor on or off once you have created the file (Section 4.10).

If you would like an empty Quarto document without any default contents, click the Create Empty Document button instead of OK.

The file will automatically be opened in the Editor pane of RStudio.

Figure: Side-by-side examples of default RMarkdown & Quarto files, with key features highlighted.

4.2 Rendering to an output

Note that the files you create are not saved when you create them. To save them, make sure the file is the current one open in the editor pane and either press the Save icon in the top toolbar, or File > Save in the top menu, or click anywhere in the open file and then press the Ctrl-S keys at the same time.

To render the saved RMarkdown/Quarto files to an output:

In RMarkdown, click the Knit button at the top of the open file in the Editor pane.
In Quarto, click the Render button.

In each case, the default output filetype will be created, such as a PDF or an HTML file (a webpage). Later on we’ll see how to change the type of output file you render to (Section 4.6).
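
If you prefer typing to clicking, rendering can also be started from the Console. The following is a minimal sketch: it assumes the rmarkdown package is installed (and, for the Quarto line, the quarto R package plus Quarto itself), and the file names are hypothetical stand-ins for files you have already saved.

```r
# Hypothetical file names: replace them with files you have saved.
rmd_file <- "my_homework.Rmd"
qmd_file <- "my_homework.qmd"

# Render only if the file exists in the current working directory,
# so this snippet does nothing (rather than failing) otherwise.
if (file.exists(rmd_file)) {
  rmarkdown::render(rmd_file)      # renders the file, much as the Knit button does
}
if (file.exists(qmd_file)) {
  quarto::quarto_render(qmd_file)  # renders the file, much as the Render button does
}
```

The buttons are usually more convenient while you are writing; a function call like this is mainly useful when you want to render from a script.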

The rendering process will be shown in the bottom-right pane of RStudio while it happens. Sometimes a problem will occur which prevents the file from rendering. Common reasons for this are covered in the Frequently Encountered Problems section.

Once rendering is finished, the file should open up automatically. A PDF will typically open in your computer’s default PDF viewer. An HTML output file will open up in your browser. If the output file doesn’t open automatically, you can open it manually by clicking on the file in the Files tab in the bottom-right pane of RStudio.

4.3 Text

If you look at Fig. X (rmd/qmd side-by-side), you can see that both Quarto and RMarkdown files are divided into two sections. At the top of each file is a header section, which begins and ends with a line of three dashes: ---. This is where we can specify settings for the file, which we will learn more about in Section 4.6.

Below that is the main section of the document. Here we can add text that will appear in our output when we render.

We can try adding text to a document to see what effect it will have on the output. On the left of Fig. X, we can see RMarkdown and Quarto files with a brief header section (the same contents work just as well in either file type). Below that is a line of text that says Hello, world!. On the right are the outputs that we get when we render these files to HTML.

In the outputs, the text Hello, world! has been formatted as a paragraph of text.

Figure: Side-by-side examples of RMarkdown and output, and Quarto and output (hello world).

What about if we want to format contents as something other than regular paragraph text? For example, you might want section headings (like this book contains), or italics or bold text. In a program like Microsoft Word or Google Docs, you can click a button to apply these formatting styles.

RMarkdown and Quarto files cannot contain this type of embedded formatting, because they are just text documents (i.e. the same thing as a file with a .txt extension). But we can specify that we want their output to contain formatting. We do this by including symbols in the input file that will be interpreted as formatting instructions when we render.

For example, we can create italicized text by putting the words inside a pair of asterisks *italicized text* or a pair of underscores _italicized text_. The asterisks or underscores will not appear in the output, but the output will instead have the formatting you have specified.

This method of applying formatting in a text document is called Markdown. You may have encountered something similar online, as websites like Reddit and Discord allow you to use symbols like this to apply formatting. It is also where RMarkdown files got their name from.

There are a lot of different symbols we can use to format text with Markdown.

As mentioned above, we can use a pair of asterisks or a pair of underscores to italicize text. In the same way, we can use double asterisks or underscores to get bold text, and three to make text both italicized and bold.

Write this in your file	To get this when you render
`*italics*` or `_italics_`	italics or italics
`**bold**` or `__bold__`	bold or bold
`_**bold and italicized**_`	bold and italicized

We create section headings by putting the hashtag/pound symbol # at the start of a line that contains the text of a heading. Two hashtags create a sub-heading (e.g. for a subsection), three hashtags create a sub-sub-heading, and so on. Note that there should be no spaces before the hashtags (it should be the first character on a line), but there should be a space between the hashtags and the first character of the text of the heading.

# Heading

Heading

## Sub-heading

Sub-heading

### Sub-sub-heading

Sub-sub-heading

Clickable links to webpages are created by putting the text of the link (what you will see) in square brackets, immediately followed by the URL (web address) of the webpage in parentheses. For example, here is a link to the Google homepage:

[Link text](https://www.google.com)

Link text

We can also create two types of list: unordered (i.e. bullet points) and ordered.

An unordered bullet point list is created by starting a line with an asterisk symbol * and then one or more spaces before the text of the item.

An ordered list is created by starting the line with a number followed by a period, and then one or more spaces before the text. You can also use roman numerals or letters of the alphabet to use different ordering symbols.

* Bullet point item
* Second item
  * Nested list, indented by spaces
  * Another item
    - Another nested list, 
      with a different marker

becomes

Bullet point item
Second item
- Nested list, indented by spaces
- Another item
  - Another nested list, with a different marker

1. An ordered list
2. Second item
   i. A nested list that uses 
       roman numerals
      a. A nested-nested list
   ii. Back to the roman numerals
   
       Second paragraph of item (ii).

becomes

An ordered list
Second item
i. A nested list that uses roman numerals
  a. A nested-nested list
ii. Back to the roman numerals

  Second paragraph of item (ii).

One important thing to note is that the number of spaces at the start of a list item matters.

Nested list items need to be preceded by enough spaces to make them flush with the first character of the parent item’s main text.

If a list item has multiple paragraphs, then (1) these paragraphs need to be separated by a blank line, and (2) all paragraphs after the first need to be indented with enough spaces so that their first character is flush with the first character of the first paragraph’s main text.

The importance of lines and spaces

In general, text on consecutive lines will be bundled up into one line when you render Markdown into another format. For example,

This paragraph is written on two consecutive lines.
It will be compiled into a single paragraph because there is no blank line between them.

This will be a second paragraph, since there is a blank line between it and the previous text.

This can cause problems if you forget to put a blank line where you need it, e.g. before a heading line. For example,

some regular text
### Important Heading

becomes

some regular text ### Important Heading

Spaces can also affect formatting.

If you forget a space between a heading’s hashtags and its text, you will render it as normal text (with hashtags).

###Heading Text

becomes

###Heading Text

The same thing can happen if you forget a space between a list symbol and the text of that list item.

Putting four spaces at the start of a line will cause it to be rendered like code:

    Indented by four spaces.

becomes

Indented by four spaces.

This has its uses if you need to render something like code, but is not appropriate for regular text. It is not only harder to read than properly formatted text, but it also will not be wrapped if the line is long (i.e. a long line will just run off the right side of the page instead of continuing onto the next line).

This brings us to the most important rule of writing and rendering Markdown-based files:

Warning

Always proofread the rendered output and check that the formatting is correct!

You can’t always tell how the rendered output will look just from the contents of your RMarkdown/Quarto file. It’s also very easy to make typos that will cause unintended formatting.

4.4 Code chunks

The great power of RMarkdown and Quarto documents is that we can also include sections of code within the document. These sections are called code chunks.

A code chunk begins and ends with a line of three backticks: ```. Note that a backtick is not a single quotation mark. On a US-style keyboard, it is the symbol on the key in the top-left part of the keyboard.

For example:

```
code goes here
```

Backticks alone will format the code chunk like code, but you will not be able to run the code.

To indicate the programming language, and thus enable the code to be run, we need to put the programming language’s symbol inside curly braces at the end of the opening line of backticks. For example, this is an R code chunk:

```{r}
code goes here
```

A major reason for using RStudio to write Rmd/Qmd documents is that you can run code chunks from within the file editor tab. Either click on the green “Play” triangle in the top right of the code chunk, or click within the code chunk so that the cursor is flashing on a line of code and then simultaneously press the following keys: Ctrl + Shift + Enter (Windows/Linux) or Cmd + Shift + Enter (Mac).

When you run a code chunk interactively from within the document, the code is sent to the Console to be executed, but any output is then displayed back below the code chunk. This allows us to preview what the output of a code chunk will look like without having to knit the entire document (which is a lot slower).

Because code in code chunks is sent to the Console, it runs in the same place (called an Environment) as non-code chunk code that you type directly into the Console itself. This also means that all code chunks share this same environment, and the results of one code chunk are available to another.

For example, if you create an R variable in one code chunk, a different code chunk can then use that variable.
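
As a small illustration, the R code below stands in for two separate code chunks (the comments mark where each chunk would begin). The variable names and measurements are hypothetical.

```r
# --- First code chunk: create a variable ---
plant_heights <- c(12.5, 14.1, 13.3)  # hypothetical measurements (cm)

# --- Second code chunk, later in the document: use that variable ---
mean_height <- mean(plant_heights)
mean_height
```

The second chunk only works because the first one has already put plant_heights into the shared Environment.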

There’s just one problem: order matters. You have to run the code chunks in order. I.e. you would need to run the code chunk that creates the variable before you run the code chunk that references that variable. Otherwise the variable doesn’t exist yet in the Environment and you will get an error.

There are a few ways that new R programmers run into errors due to this code chunk execution in the shared environment:

You mix up the order of the code chunks within your file. You can run them interactively out of order, and so avoid the error, but when you try to knit you get an error. This is because the knitting process runs the code chunks from scratch (in a new Environment), in the order that they appear in the file.
You delete a code chunk that creates something another (not deleted) code chunk needs. That thing will stick around in your RStudio Environment, and so be available to any code chunks running interactively, but when you knit you get the same error as in the previous point (for the same reason).
You restart RStudio after a break and your most recently written code chunk doesn’t work. This is because anything R creates in the Environment is temporary. It will only last for as long as R or RStudio is running on the computer. When you restart RStudio and reopen an Rmd/Qmd file, you will need to re-run any code chunks whose output you need. (It’s often easiest to just rerun all the code chunks, which you can do by clicking the “Run” button in the top right of the file editor pane and then selecting the last option to “Run All”.)
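
The "works interactively, fails when knitting" problem can be imitated in plain R. This is only a sketch of the idea, not how knitting is actually implemented: an empty environment (a hypothetical fresh_session) plays the role of the new Environment, and total is a made-up variable name.

```r
# An empty environment stands in for the fresh session used when knitting.
# (parent = baseenv() so that objects in your own workspace are not visible.)
fresh_session <- new.env(parent = baseenv())

# A chunk that uses `total` before any chunk has created it fails:
too_early <- tryCatch(
  eval(quote(total * 2), envir = fresh_session),
  error = function(e) conditionMessage(e)
)
too_early  # an error message saying that `total` could not be found

# Once the chunk that creates `total` has run, the same code works:
eval(quote(total <- 21), envir = fresh_session)
in_order <- eval(quote(total * 2), envir = fresh_session)
in_order
```

If knitting fails with a message like the first one, a good first check is whether the chunk that creates the missing object still exists and comes earlier in the file.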

4.5 Inline code

We can also include code within paragraphs of text. Let’s say you calculated some important number in your analyses, and you want to discuss that number. You could type in the number by hand, but this is prone to errors where the calculated number changes for some reason (maybe you adjust the analysis’s code, for example) but then you forget to update the number in the text.

We can instead just include the variable’s value directly in the text using inline code, for example:

My results showed that the answer to life, the universe, and everything is `r variable_name`.

Here, we use a single pair of backticks around the code to indicate where it starts and ends, and an r immediately after the first backtick to indicate that this is R code. When the document is rendered, the value assigned to the variable called variable_name will be rendered within the sentence instead of the code. For example (if variable_name is 42),

My results showed that the answer to life, the universe, and everything is 42.

We can put more complicated expressions within the inline code backticks. For example, you could put a calculation to get the same result:

My results showed that the answer to life, the universe, and everything is `r 40 + 2`.

or we could get the date at the time that the document is rendered by using the Sys.Date() function that is part of the R programming language:

This document was created on `r Sys.Date()`.

which would be rendered as:

This document was created on 2025-04-16.
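
Calculated values often have more decimal places than you want to show in a sentence. One approach, sketched below with hypothetical data and variable names, is to do the rounding in a code chunk and keep the inline code itself short.

```r
# Hypothetical measurements; in practice these would come from your analysis.
reaction_times <- c(0.52, 0.61, 0.47, 0.60)

# Round once in a code chunk so that the inline code stays short.
mean_rt <- round(mean(reaction_times), 2)

# In the text you could then write:
#   The mean reaction time was `r mean_rt` seconds.
mean_rt
```

If the data change, both the calculation and the sentence update the next time the document is rendered.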

4.6 The header section

We may wish to customize how our input document is rendered to an output file, instead of using the defaults. For example, you may wish to change the font size, or make all website links blue.

In both RMarkdown and Quarto documents we do this in the header section. The general structure of a header section is similar in both cases, but the details can vary.

The header section occurs before the actual contents of the document (i.e. before the text and code), and begins and ends with a line of three dashes: ---. You may have already seen a header section if you created an Rmd or Qmd file in RStudio, because the example files that RStudio creates come with simple header sections.

A simple example of a header section might be:

---
title: "Computing and Data for Scientists"
author: "Dominic White"
---

The header section is written in another markup language called YAML.² YAML uses a different format to the Markdown that we used for writing regular text. YAML is much better suited to writing a list of options, such as the rendering options you want to apply to your output file.

There are a few things to note about header sections:

The example header section above would work in either an RMarkdown or Quarto file, because both have a title and author option. However, many of the options are different, either because they only exist in one of the two file types or because they are called different things!

For example, in both document types we can specify the type of output file we want to render to.

This is how we could modify the example header section above to render to a PDF from an RMarkdown file:
```
---
title: "Computing and Data for Scientists"
author: "Dominic White"
output: pdf_document
---
```
And this is how we would write the same thing in a Quarto file:
```
---
title: "Computing and Data for Scientists"
author: "Dominic White"
format: pdf
---
```
Some options will be shared between all output file types (like title and author), but other options are specific to a particular type of output file.

For simplicity, let’s consider just RMarkdown. We can specify multiple output formats with their own options (the first format will be the default), e.g.
```
---
title: "Computing and Data for Scientists"
author: "Dominic White"
output: 
  pdf_document:
    keep_tex: true
  html_document:
    theme: cosmo
    code_folding: hide
---
```
What does all this mean?

First we have specified options for rendering to both PDF (pdf_document) and HTML (i.e. web pages, with html_document).

keep_tex is a PDF-specific option. A .tex file is an intermediate file that is created from the RMarkdown file, which can then be rendered as a PDF. By default, that intermediate file is deleted once the PDF is created. Here we’ve set an option to keep it instead.

We’ve also set two HTML-specific options. theme allows us to pick a style for our webpage (e.g. colors, font, etc.) from several pre-existing options. code_folding will hide the code of any code chunks, but create a button that can be used to expand the code (this is not something we could do in a PDF, since PDFs are static).
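
For comparison, a Quarto header expressing roughly the same choices might look like the sketch below. The option names (keep-tex, theme, and code-fold) are not taken from this chapter; they are Quarto’s counterparts as far as I am aware, so check them against the Quarto documentation before relying on them.

```yaml
---
title: "Computing and Data for Scientists"
author: "Dominic White"
format:
  pdf:
    keep-tex: true
  html:
    theme: cosmo
    code-fold: true
---
```

Note that the formats are nested under format rather than output, and that these Quarto option names use hyphens where the RMarkdown ones use underscores.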

These examples also highlight a bit of YAML formatting syntax (applicable in both Rmd and Qmd files).

All of our options above have been specified in the format of key: value where key is the name of the option and value is what we set it to.
We can nest options under other options, as we did with the format-specific options. In that case, we need to indent those nested lines two more spaces than the parent:
```
key: 
  value:
    nested_key: nested_value
```
Sometimes we want to provide a list of values to an option. For example, in RMarkdown header sections, we can provide multiple authors as a list to the author option.

We either enclose them in square brackets:
```
author: ["Joe Bloggs", "Anna Smith"]
```
or list them on separate lines (indented 2 spaces, and starting with a hyphen):
```
author: 
  - "Joe Bloggs"
  - "Anna Smith"
```

A detailed list of header section options for RMarkdown can be found in the book R Markdown: The Definitive Guide (available at https://bookdown.org/yihui/rmarkdown/). For Quarto, the official documentation at https://quarto.org provides a similar list.

4.7 Images & figures

If you have an image in a separate file that you wish to include in the rendered output, then you can do so using the following Markdown syntax:

![](image_file_name.jpg)

You will note that this is very similar to the syntax for a web link (square brackets then parentheses), but with an exclamation mark at the front.

We can add alternative text within the square brackets like this:

![description of image](image_file_name.jpg)

Alternative text is a description of the image that will be displayed if the image can’t be loaded. It’s a good idea to include alternative text even if you don’t think that’s a problem, because alternative text is used by blind or visually impaired users to understand what’s in an image.³

We can also include images using R code in a code chunk. This makes use of the knitr package:

```{r}
knitr::include_graphics("image_file_name.jpg")
```

4.8 Tables

You can create Markdown tables using the following syntax:

| column 1 heading | column 2 heading  |
| ---------------- | ----------------- |
| row 1 content    | other content     |
| row 2 stuff      | other row 2 stuff |

which creates a table like this:

column 1 heading	column 2 heading
row 1 content	other content
row 2 stuff	other row 2 stuff

A few things to note:

Vertical bars | divide columns. The hyphens --- on the second line of the table divide the headings row from the data rows.
Nice formatting is not required by Markdown, but it helps anyone reading the source file understand the text. However, you are not required to align the column separators | on each line, nor do you need more than one hyphen below each heading.

For example, this is a valid way of writing the same table as before:
```
| column 1 heading | column 2 heading |
|-|-|
| row 1 content | other content |
| row 2 stuff | other row 2 stuff |
```
(Although you would hopefully agree that it’s much harder to understand!)
You can adjust the alignment of content in each column (aligning it either to the left, the right, or centered) using colons : in the line of dashes.

For example:
```
| column 1     | column 2 | column 3      |
| :----------- | :------: | ------------: |
| left aligned | centered | right aligned |
```
The contents of a table cell have to be on a single line. You cannot put a linebreak in the source file to make it easier to read.

Tables can also be created by code chunks, typically by including a line of code that returns a dataframe, e.g.:

head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Here we are using the head() function to retrieve just the first six rows of the iris dataframe.

We could have included a line of code that was just the dataframe name:

```{r}
iris
```

However, this would print out the entire dataframe, which has 150 rows! Since large dataframes make output documents hard to read, you generally don’t want to display the original data.
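
When a dataframe is printed directly, it appears in the output as plain console text. If you would like it formatted as a proper table, one option is the kable() function from the knitr package (the same package used for images above). The snippet below is a minimal sketch; the variable name and caption text are just illustrations.

```r
# Keep only the rows to display (head() returns the first six by default).
first_rows <- head(iris)

# knitr::kable() turns the dataframe into a Markdown table,
# which is then formatted like any other table when the document is rendered.
knitr::kable(first_rows, caption = "The first six rows of the iris dataset.")
```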

4.9 Math

We often want to include mathematical equations or other notation in scientific documents. Markdown does not include support for this, but RMarkdown and Quarto both allow us to include math written in LaTeX notation.

To do so, we put the math between dollar symbols: $.

A single pair of dollar symbols allow us to write inline math, within a paragraph of regular text, for example:

The formula for the area of a circle is $A = \pi r^2$.

which becomes

The formula for the area of a circle is $A = \pi r^2$.

You may also want to include a block of math as its own entity, separate from the rest of the text. To do so, we use a pair of double dollar symbols, e.g.

$$A = \pi r^2$$

which becomes

\[A = \pi r^2\]

Note how the math block is centered.

Here are some useful mathematical symbols for various operations in LaTeX, and what they render as in the output document.

Operation	LaTeX symbols	Example
Arithmetic	`+`, `-`, `=`	$4 + 2 - 1 = 5$
Basic division	`/`	$1/2$
Fancy division	`\frac{...}{...}`	$\frac{1}{2}$
Greek letters	`\alpha`, `\pi`	$\alpha \pi$
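
These pieces can be combined into larger expressions. As an illustration, here is one way to write the formula for a sample mean as a math block. It uses three commands that are not in the table above: `\bar{x}` (a bar over x), `\sum` (a summation sign), and `_` (a subscript).

```latex
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
```

As with regular text, it is worth checking the rendered output, since a missing brace or backslash can change how an equation appears.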

4.10 Visual editor

So far we have written the contents of our input file in plain text, with markup symbols to indicate what the rendered output should look like. This can take some getting used to, especially if you are more familiar with word processing software like Microsoft Word or Google Docs where you can see the formatting as you edit the document. It can also sometimes be fiddly to edit markup symbols.

To help, RStudio includes a visual editor in which you can get a live preview of what your document will look like as you write it.

To switch to the visual editor when you have a source file open in RStudio, click the Settings dropdown (the white cog wheel) at the top of the file editor pane and select the first option (Use Visual Editor).

The visual editor can be useful if you find it easier to work with a visual representation of the output document. However, you should remember a few things when working with the visual editor:

The underlying file is still an RMarkdown or Quarto file.
The visual editor view is what the output will look like if rendered to HTML (e.g. to a webpage). If you render to a different output once you are finished (e.g. to a PDF), the output might look slightly different.
There may be times when you will need to switch back to the Source mode to edit the underlying source file, e.g. if you are trying to add some sort of advanced content that is not supported by the visual editor.

4.11 RMarkdown versus Quarto

Aside from the differences covered in the previous sections:

RMarkdown has an extensive ecosystem of extensions developed over its long history. Quarto can be extended, but seems to be designed to contain most features within the main system. Because it’s newer, it also has a smaller (albeit growing) set of extensions.
Both can support multi-file projects (e.g. if you wanted to write a book, with a separate source file for each chapter), but Quarto knits each source file separately, whereas RMarkdown makes the code and results of each source file available to later files in the rendering process.

For example, if you were writing a multi-file book and you create a variable in the Chapter 1 file, you can then access it in a separate Chapter 2 file if you are using RMarkdown. In Quarto, you would need to repeat all the code to recreate the same variable in different chapter files.
Personally I have found Quarto to be a little less flexible and more prescriptive, but not deal-breakingly so. It has some nice new features that are not standard in RMarkdown, and is being actively developed and improved (whereas RMarkdown is relatively stable at this point).

This book is written principally in Quarto, but due to the similarities, it’s easy to switch between them, or even include RMarkdown files within Quarto files (which this book does).

¹ Or click the New File icon in the top toolbar.
² If you are wondering how many markup languages this book is going to require you to learn, you may be dismayed to learn that YAML originally stood for Yet Another Markup Language (its official expansion is now the recursive “YAML Ain’t Markup Language”). Also, two. And you don’t need to learn much YAML.
³ Such users may read your document with screen reading software that reads the contents of the file out loud, including the alternative text that accompanies any images.