Writing a thesis in markdown
In my dark and murky past as a full-time PhD student, and in my current alter ego as someone writing up a PhD thesis on evenings and weekends, I have spent a lot of time writing things. A lot of academic writing occurs in either Word or LaTeX and since my undergraduate degree I've been firmly in the LaTeX camp, using it to write papers, essays, etc. When I started my PhD I was originally planning to produce my thesis in LaTeX and actually wrote the first drafts of my initial few chapters in it.
I ended up migrating away from LaTeX and these days I, like many others online, try to do most things in Plaintext or Markdown. I don't want to spend too much of this post saying why as there are entire blogs dedicated to this. Suffice to say that it dovetails very nicely with my views on minimalism and simplicity and allows me to focus on the writing. Just as LaTeX got out of my way when trying to write before, so markdown gets out of my way even more than LaTeX does. In this instance it has made an otherwise troubled PhD experience much more pleasant than if I were to attempt to finish my thesis by other means.
If you're a general fan of either plaintext or markdown then chances are you're familiar with the majority of these tools.
The core toolset:
- Atom for my text editor (although any text editor would do)
- Markdown for markup, using the Pandoc markdown flavour
- BibTeX for my references / bibliography storage
- Pandoc for converting to various output formats
- Zotero and its associated Firefox plugin to manage my bibliography and export BibTeX
- Git for tracking changes and publishing the source
- PureCSS for styling the HTML output
- BetterBibTeX is a Zotero plugin that generates nicer BibTeX citation keys for citing things
These may seem quite numerous and complex but thankfully I've been working with each of these tools independently for years and it was very straightforward to put them together. It may be strange to hear a stack of 8 tools being described as "simple" or "minimalist" but the benefit of these is that they're each very good at one specific job and ultimately they get out of my way when it's time to write, which is something that word processors just don't do. Whether it's MS Word or even LibreOffice Writer, I just can't seem to master the art of sitting in front of a word processor and writing. I'm constantly fighting with formatting, pasting, and images jumping around. Not to mention the crashing.
Both PureCSS and BetterBibTeX literally disappear once you've added them to the toolchain. There's an initial 2 minute setup where you install BetterBibTeX into Zotero and, maybe, adjust the citation key format to your preferences. After that it just kind of fades away as you benefit from nicer citation key exports.
Zotero and its connector would be part of any academic toolchain as an alternative to proprietary systems so I'm not sure they count as additional burden to be honest. That said once the Zotero Firefox connector is installed it becomes second nature to hit the button and grab the citation for writing.
Git is effectively just my cloud storage and back-up solution. If you're using a word processor to manage this you probably have back-ups on USB keys (good) and a cloud solution (also good) such as (probably) Dropbox, OneDrive, or Google Drive, which also gives you syncing. Since my thesis is so tiny, effectively being plaintext, Git handles all of this without any complaint and also lets me track changes to individual files. The thesis is stored online in a GitLab repo.
Pandoc facilitates the conversion between the markdown source and the formats that people want to read it in. For fun and convenience I wrote a small build script that allows me to build the thesis quickly, since pandoc commands can become quite long. I run this once at the end of every writing session.
This is all well and good but what does it look like in practice?
Here's my folder structure:
* thesis/
    * notes/
        * note-files.md
    * out/
    * src/
        * figs/
            * fig-files.svg
        * bibliography.md
        * chapter-01.md
        * chapter-02.md
        * chapter-03.md
        * chapter-04.md
        * chapter-05.md
        * chapter-06.md
        * chapter-07.md
        * chapter-08.md
        * harvard-newcastle-university.csl
        * index.md
        * thesis.bib
        * web.css
    * templates
The notes folder is just that. If I'm working something through, or want to take extensive notes to keep alongside the thesis that wouldn't make sense in it or would clutter it when it came to reading a draft of a section, they go here.
The out folder isn't actually included in the git repo as it is where the "builds" of the thesis end up. When you run the build script it automatically generates the thesis in this location.
The src folder is the actual content of the thesis. It only has one subfolder called figs for, you guessed it, figures. Each chapter has its own file which is pretty straightforward.
index.md contains some front-matter for configuring the builds and adding metadata. This effectively just makes it easier to manage the pandoc commands. It looks like this:
    title: A Rough, Transparent, Draft of my PhD Thesis
    author: Matt Marshall
    bibliography: src/thesis.bib
    css:
      - https://firstname.lastname@example.org/build/pure-min.css
      - src/web.css
    link-citations: true
    csl: src/harvard-newcastle-university.csl
thesis.bib and web.css should be pretty self-explanatory as files: the former is my BibTeX library generated from Zotero and the latter is some custom CSS that I apply on top of PureCSS to make the HTML version look prettier.
The templates folder contains a template for an HTML front page used by the build script. In the future it may contain custom pandoc templates for LaTeX or such to generate a thesis with some obligatory front matter such as a Newcastle University logo (blerugh).
That's all there is to it really. 99% of the time I just live in a markdown file for each chapter, and then run a build script to build the thesis in my desired output format.
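As a rough illustration of what such a build script might boil down to, here is a minimal Python sketch. It is not my actual script. The function names `build_command` and `build` are made up for this example, the input order (index.md, then the chapters, then bibliography.md) is an assumption, and `--citeproc` assumes pandoc 2.11 or newer.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, out: Path, fmt: str = "html") -> list[str]:
    """Assemble one pandoc invocation covering the whole thesis."""
    # index.md goes first so its metadata (bibliography, csl, css) applies
    # to everything that follows. Sorting keeps the chapters in order.
    chapters = sorted(src.glob("chapter-*.md"))
    # Assumption: bibliography.md belongs at the end of the document.
    inputs = [src / "index.md", *chapters, src / "bibliography.md"]
    return [
        "pandoc",
        *[str(path) for path in inputs],
        "--standalone",  # emit a complete document, not a fragment
        "--citeproc",    # resolve @key citations against the bibliography
        "--output", str(out / f"thesis.{fmt}"),
    ]


def build(src: Path = Path("src"), out: Path = Path("out"), fmt: str = "html") -> None:
    """Run pandoc from the thesis root, creating out/ if it is missing."""
    out.mkdir(exist_ok=True)
    subprocess.run(build_command(src, out, fmt), check=True)
```

Calling `build()` from the thesis root would write out/thesis.html. Keeping the command assembly in its own function makes it easy to print the command and check it before anything runs.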
When I need to reference something I need to interact with Zotero but it's so simple it's almost embarrassing.
- In my web browser I hit the Zotero connector button to trigger saving the reference to my Zotero library
- In Zotero the reference is already highlighted so I'll check it has all the information it needs
- BetterBibTeX has already done its thing so I copy the citation key over into my document using the pandoc citation syntax, i.e. an @ followed by the key, usually inside square brackets
- Atom's autosuggest magically starts suggesting it to me whenever I start typing @ in case I need to type it again.
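One thing the steps above leave open is a citation key that made it into a chapter but not into the exported BibTeX file. The Python sketch below is a hypothetical helper, not part of my toolchain. It pulls `@key` citations out of markdown text and compares them with the entry keys in a BibTeX string. The regular expressions are deliberately simple and only assume keys made of letters, digits, underscores, colons, dots and hyphens.

```python
import re

# Pandoc citations: an @ not preceded by a word character, then the key.
CITED = re.compile(r"(?<!\w)@([A-Za-z0-9_][\w:.-]*)")
# BibTeX entries: @type{key, at the start of a line.
BIB_ENTRY = re.compile(r"^\s*@\w+\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def cited_keys(markdown: str) -> set[str]:
    """Collect citation keys, trimming punctuation that merely ends a sentence."""
    return {match.group(1).rstrip(".:-") for match in CITED.finditer(markdown)}


def bib_keys(bibtex: str) -> set[str]:
    """Collect the entry keys defined in a BibTeX source string."""
    return set(BIB_ENTRY.findall(bibtex))


def missing_keys(markdown: str, bibtex: str) -> list[str]:
    """Keys cited in the markdown that have no matching BibTeX entry."""
    return sorted(cited_keys(markdown) - bib_keys(bibtex))
```

Feeding it the text of a chapter and of thesis.bib returns a sorted list of keys that still need exporting from Zotero.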
When I'm done writing for a bit or want to check how a paragraph reads I'll export the Zotero collection used for my thesis into thesis.bib and run the build script.
Working with others
When you're writing a thesis it's generally recommended that you send your work to your supervisor and hopefully they'll get back to you with comments and opinions on it.
Unfortunately my supervisor isn't really a markdown person, so I was initially worried that there would be a tool/workflow gap. Thankfully, from my writing-papers-in-LaTeX days there was the well-established practice of using Pandoc to convert the document into a Word file and sending it over to receive feedback, which is what we've landed on.
Originally I was going to try to get dokieli set up on the web version of my thesis to facilitate feedback there however I didn't want to create any additional hoops to jump through. I landed on the workflow of sending my supervisor my chapter in a DOCX file and then receiving that file back with comments which I keep open while I work on the changes in markdown.
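For the single-chapter DOCX hand-off, the pandoc call is short enough to sketch in Python. The function name `chapter_docx_command` is made up for this example. Passing index.md first, so the bibliography and CSL settings are picked up, is an assumption about how the metadata is shared, and `--citeproc` again assumes pandoc 2.11 or newer. Pandoc infers the DOCX format from the output file extension.

```python
from pathlib import Path


def chapter_docx_command(chapter: int,
                         src: Path = Path("src"),
                         out: Path = Path("out")) -> list[str]:
    """Build the pandoc command that turns one chapter into a Word file."""
    name = f"chapter-{chapter:02d}"
    return [
        "pandoc",
        str(src / "index.md"),      # shared metadata: bibliography, csl
        str(src / f"{name}.md"),    # the chapter going out for comments
        "--citeproc",               # render citations and the reference list
        "--output", str(out / f"{name}.docx"),
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` from the thesis root would produce out/chapter-03.docx for `chapter=3`.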
I don't store the feedback in the git repository as this would get bulky quite quickly and I feel that's a separate concern. I manage feedback by sticking the feedback into a folder that's synced to my NextCloud instance.
There is one very distinct area that I've found a challenge when choosing to write my thesis in markdown, which is automatic numbering for sections, tables, and figures. Sadly Pandoc doesn't support this to my knowledge. There is a fork of Pandoc called Scholdoc which is purported to understand Scholarly Markdown, a markdown flavour that is purpose-built for academic writing. Its syntax includes provisions for figures and float environments, which is pretty neat, and the output formats are limited to HTML 5 and DOCX, which are fine by me. Theoretically it is exactly what I needed.
Sadly I never got Scholdoc to work and it looks like the last update to the GitHub repo was way back in 2015, so I suspect it may be abandonware. My solution thus far in my thesis has been to manually number figures by chapter, e.g. Chapter 3 Figure 1 is Fig 3.1, but it would've been nice to have this done automatically and updated as I add/remove/adjust figures.
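One hypothetical way to automate the manual scheme would be a small preprocessing pass that runs before pandoc. This is not something I currently do. The `{fig:label}` placeholder is an invented convention for this sketch, not Pandoc or Scholdoc syntax. Each distinct label is numbered by its first appearance in the chapter.

```python
import re

# Invented placeholder convention: {fig:some-label}
FIG_TOKEN = re.compile(r"\{fig:([A-Za-z0-9_-]+)\}")


def number_figures(text: str, chapter: int) -> str:
    """Replace {fig:label} tokens with 'Fig <chapter>.<n>'.

    Labels are numbered in order of first appearance, so every later
    mention of the same label gets the same number.
    """
    numbers: dict[str, int] = {}

    def substitute(match: re.Match) -> str:
        label = match.group(1)
        if label not in numbers:
            numbers[label] = len(numbers) + 1
        return f"Fig {chapter}.{numbers[label]}"

    return FIG_TOKEN.sub(substitute, text)
```

Because numbering follows first appearance, a figure mentioned in the text before its caption would still get the lower number. That is a simplification a real solution would need to handle.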
If I'm honest it doesn't bother me too much and forces me to keep it simple and not rely too much on figures in a chapter. If it becomes a problem in later chapters when it comes to crunch time I may introduce an intermediary step where the thesis is converted to LaTeX and tweaked before being transformed into its final PDF form although that would sadly clash with my original plan of using print styles on HTML to manage this.
I've put together a very simple toolkit and structure to write my PhD thesis in markdown. This enables quite a nice and relatively natural rhythm for writing as well as allowing me to present the thesis in various forms for the web and for collaboration with my supervisor. There are still challenges and I lose the automatic numbering that I would get with LaTeX, but overall it has resulted in a very nice writing experience. I'd recommend this to anyone.
In fact I wrote the first draft of this blog post before I searched the web for writing a thesis in markdown and it turns out this is already an established practice. I'm glad to say that, at a brief glance over the landscape, many of the same things I've said are shared experiences. I'll stick to my own toolchain here but I recommend people look at Tom Pollard's PhD Markdown Thesis template and I found this post from The Urbanist a pleasant read as well.