Matt Marshall

July 2020

Writing a thesis in markdown

In my dark and murky past as a full-time PhD student, and in my current alter ego as someone writing up a PhD thesis on evenings and weekends, I have spent a lot of time writing things. A lot of academic writing happens in either Word or LaTeX, and since my undergraduate degree I've been firmly in the LaTeX camp, using it to write papers, essays, and so on. When I started my PhD I planned to produce my thesis in LaTeX, and I actually wrote the first drafts of my first few chapters in it.

I ended up migrating away from LaTeX, and these days I, like many others online, try to do most things in plaintext or markdown. I don't want to spend too much of this post explaining why, as there are entire blogs dedicated to that. Suffice it to say that it dovetails very nicely with my views on minimalism and simplicity and lets me focus on the writing. LaTeX got out of my way when I was writing before, and markdown gets out of my way even more. In this instance it has made an otherwise troubled PhD experience much more pleasant than it would have been had I tried to finish my thesis by other means.

My toolkit

If you're a fan of plaintext or markdown, chances are you're already familiar with most of these tools.

The core toolset:

  • Atom for my text editor (although any text editor would do)
  • Markdown for markup, using the Pandoc markdown flavour
  • BibTeX for my references / bibliography storage
  • Pandoc for converting to various output formats
  • Zotero and its associated Firefox plugin to manage my bibliography and export BibTeX

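Additions:

  • PureCSS for styling the HTML output
  • BetterBibTeX, a Zotero plugin, for nicer citation key exports
  • Git for version control and backup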

These may seem quite numerous and complex, but thankfully I've been working with each of these tools independently for years, so putting them together was very straightforward. It may be strange to hear a stack of eight tools described as "simple" or "minimalist", but each of them is very good at one specific job, and ultimately they get out of my way when it's time to write, which is something word processors just don't do. Whether it's MS Word or even LibreOffice Writer, I just can't seem to master the art of sitting in front of a word processor and writing. I'm constantly fighting with formatting, pasting, and images jumping around. Not to mention the crashing.

Both PureCSS and BetterBibTeX effectively disappear once you've added them to the toolchain. There's an initial two-minute setup where you install BetterBibTeX into Zotero and, maybe, adjust the citation key format to your preferences. After that it just fades into the background while you benefit from nicer citation key exports.
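The citation key used later in this post, strohmayerTechnologiesSocialJustice2017, follows a recognisable shape: the lower-cased author surname, the first few significant title words in camel case, then the year. The Python sketch below approximates that shape to show what "nicer citation keys" means in practice. It is not BetterBibTeX's actual implementation, the `citation_key` function is hypothetical, and the set of function words it skips is an assumption.

```python
import re

# Hypothetical approximation of BetterBibTeX-style keys; NOT the plugin's code.
# Which words count as "function words" to skip is an assumption.
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "for", "to", "with"}


def citation_key(surname: str, title: str, year: int, title_words: int = 3) -> str:
    """Build a key shaped like 'surnameFirstThreeWords2020'."""
    # Lower-case the surname and drop anything that isn't an ASCII letter.
    author = re.sub(r"[^A-Za-z]", "", surname).lower()
    # Split the title into alphabetic words and skip short function words.
    words = [w for w in re.findall(r"[A-Za-z]+", title) if w.lower() not in FUNCTION_WORDS]
    # Capitalise the first few significant words and join them (camel case).
    short_title = "".join(w.capitalize() for w in words[:title_words])
    return f"{author}{short_title}{year}"


if __name__ == "__main__":
    # Made-up example entry, not a real reference.
    print(citation_key("Smith", "A Study of Markdown and Theses", 2020))
```

Run as a script, this prints smithStudyMarkdownTheses2020. Accented surnames simply lose their non-ASCII letters here; a real tool needs to handle them more carefully.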

Zotero and its connector would be part of any academic toolchain as an alternative to proprietary systems, so to be honest I'm not sure they count as an additional burden. Once the Zotero Firefox connector is installed, hitting the button to grab a citation for writing becomes second nature.

Git is effectively just my cloud storage and backup solution. If you're using a word processor, you probably keep backups on USB keys (good) and in a cloud service (also good), most likely Dropbox, OneDrive, or Google Drive, which also handles syncing. Because my thesis is plaintext it's tiny, so Git handles it without complaint, and it also lets me track changes to individual files. The thesis is stored online in a GitLab repo.

Pandoc converts the markdown source into the formats people want to read it in. For fun and convenience I wrote a small build script that builds the thesis quickly, since pandoc commands can get quite long. I run it once at the end of every writing session.
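A build script along these lines might look like the Python sketch below. It is not the script from this post, which isn't shown. It assumes the folder structure described in the next section, that it is run from the repository root (so the src/... paths in index.md's metadata resolve), that bibliography.md belongs at the end, and pandoc 2.11 or later for the `--citeproc` flag. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumed layout, mirroring the folder structure in this post.
SRC = Path("src")
OUT = Path("out")


def input_files(src: Path = SRC) -> list[Path]:
    """index.md first (it holds the metadata), then chapters, then bibliography.md."""
    chapters = sorted(src.glob("chapter-*.md"))  # zero-padded names sort correctly
    return [src / "index.md", *chapters, src / "bibliography.md"]


def pandoc_command(inputs: list[Path], fmt: str, out: Path = OUT) -> list[str]:
    """Build (but don't run) the pandoc argument list for one output format."""
    return [
        "pandoc",
        *map(str, inputs),
        "--standalone",
        "--citeproc",  # assumes pandoc >= 2.11; older versions used pandoc-citeproc
        "--output",
        str(out / f"thesis.{fmt}"),  # pandoc infers the writer from the extension
    ]


def main(formats: list[str]) -> None:
    OUT.mkdir(exist_ok=True)  # out/ isn't in the repo, so it may not exist yet
    inputs = input_files()
    for fmt in formats:
        # check=True stops the build at the first failing conversion.
        subprocess.run(pandoc_command(inputs, fmt), check=True)


if __name__ == "__main__":
    main(sys.argv[1:] or ["html", "docx"])
```

Saved as, say, build.py (a hypothetical name), `python build.py html` would build only the HTML version, and running it with no arguments builds both HTML and DOCX. Keeping the command construction separate from running it makes the long pandoc invocation easy to print or test.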

In Practice

This is all well and good but what does it look like in practice?

Here's my folder structure:

* thesis/
  * notes/
    * note-files.md
  * out/
  * src/
    * figs/
      * fig-files.svg
    * bibliography.md
    * chapter-01.md
    * chapter-02.md
    * chapter-03.md
    * chapter-04.md
    * chapter-05.md
    * chapter-06.md
    * chapter-07.md
    * chapter-08.md
    * harvard-newcastle-university.csl
    * index.md
    * thesis.bib
    * web.css
  * templates/

The notes folder is just that. If I'm working something through, or want extensive notes kept alongside the thesis that wouldn't make sense in it (or would clutter a section when it came to reading a draft), they go here.

The out folder isn't included in the Git repo because it's where the "builds" of the thesis end up. Running the build script automatically generates the thesis in this location.

The src folder holds the actual content of the thesis. Its only subfolder is figs for, you guessed it, figures. Each chapter has its own file, which is pretty straightforward. index.md contains front matter that configures the builds and adds metadata, which mostly just keeps the pandoc commands manageable. It looks like this:

title: A Rough, Transparent, Draft of my PhD Thesis
author: Matt Marshall
bibliography: src/thesis.bib

css:
 - https://unpkg.com/purecss@1.0.0/build/pure-min.css
 - src/web.css

link-citations: true
csl: src/harvard-newcastle-university.csl

thesis.bib and web.css should be pretty self-explanatory: the former is my BibTeX library exported from Zotero and the latter is some custom CSS I apply on top of PureCSS to make the HTML version look prettier.

The templates folder contains a template for an HTML front page used by the build script. In the future it may contain custom pandoc templates for LaTeX or similar, to generate a thesis with obligatory front matter such as a Newcastle University logo (blerugh).

That's all there is to it really. 99% of the time I just live in a markdown file for each chapter, and then run a build script to build the thesis in my desired output format.

Referencing workflow

When I need to reference something I have to interact with Zotero, but it's so simple it's almost embarrassing.

  1. In my web browser I hit the Zotero connector button to trigger saving the reference to my Zotero library
  2. In Zotero the reference is already highlighted so I'll check it has all the information it needs
  3. BetterBibTeX has already done its thing so I copy the citation key over into my document using the pandoc citation syntax e.g. [@strohmayerTechnologiesSocialJustice2017]
  4. Atom's autosuggest then magically offers the key whenever I start typing @, in case I need to cite it again.

When I'm done writing for a bit, or want to check how a paragraph reads, I export the Zotero collection used for my thesis into thesis.bib.
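Because the export to thesis.bib is a manual step, it's easy to cite a key that hasn't been exported yet. A small Python check like the sketch below could flag those before a build. The regular expressions are simplified approximations of pandoc's citation syntax and of BibTeX entry headers, not full parsers, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Simplified pandoc citation pattern: @ followed by word characters, allowing
# internal : . - / only when more word characters follow (so a trailing full
# stop isn't swallowed). The lookbehind skips e-mail addresses.
CITATION = re.compile(r"(?<!\w)@(\w+(?:[:.\-/]\w+)*)")

# BibTeX entry headers such as "@article{smith2020," -> "smith2020".
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(markdown: str) -> set[str]:
    return set(CITATION.findall(markdown))


def bib_keys(bibtex: str) -> set[str]:
    return set(BIB_ENTRY.findall(bibtex))


def missing_citations(src: Path) -> dict[str, set[str]]:
    """Map each chapter file to the keys it cites that aren't in thesis.bib."""
    known = bib_keys((src / "thesis.bib").read_text(encoding="utf-8"))
    report = {}
    for chapter in sorted(src.glob("chapter-*.md")):
        missing = cited_keys(chapter.read_text(encoding="utf-8")) - known
        if missing:
            report[chapter.name] = missing
    return report


if __name__ == "__main__":
    for name, keys in missing_citations(Path("src")).items():
        print(f"{name}: {', '.join(sorted(keys))}")
```

Anything else in the chapters that starts with @ will also be reported, so treat the output as a prompt to re-export from Zotero rather than a definitive list.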

Working with others

When you're writing a thesis it's generally recommended that you send your work to your supervisor and hopefully they'll get back to you with comments and opinions on it.

Unfortunately my supervisor isn't really a markdown person, so initially I worried there would be a tool/workflow gap. Thankfully, from my writing-papers-in-LaTeX days we already had the well-established practice of using Pandoc to convert the document into a Word file and sending it over for feedback, which is what we've landed on.

Originally I planned to set up dokieli on the web version of my thesis to collect feedback there, but I didn't want to create any additional hoops to jump through. Instead I landed on sending my supervisor each chapter as a DOCX file and receiving it back with comments, which I keep open while I work on the changes in markdown.

I don't store the feedback in the Git repository, as it would get bulky quite quickly and I feel it's a separate concern. Instead I keep it in a folder that's synced to my Nextcloud instance.

Challenges

There is one distinct area I've found challenging when writing my thesis in markdown: automatic numbering for sections, tables, and figures. Sadly, to my knowledge, Pandoc doesn't support this. There is a fork of Pandoc called Scholdoc which is purported to understand Scholarly Markdown, a markdown flavour purpose-built for academic writing. Its syntax includes provisions for figures and float environments, which is pretty neat, and its output formats are limited to HTML5 and DOCX, which is fine by me. In theory it is exactly what I needed.

Sadly, I never got Scholdoc to work, and the last update to its GitHub repo appears to have been way back in 2015, so I suspect it may be abandonware. My solution thus far has been to number figures manually by chapter, e.g. Figure 1 in Chapter 3 is Fig 3.1, but it would've been nice to have this done automatically and updated as I add, remove, or adjust figures.

If I'm honest, it doesn't bother me too much, and it forces me to keep things simple and not rely too heavily on figures in any one chapter. If it becomes a problem in later chapters when crunch time comes, I may introduce an intermediate step where the thesis is converted to LaTeX and tweaked before being turned into its final PDF form, although sadly that would clash with my original plan of using print styles on the HTML to manage this.
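For what it's worth, pandoc can number sections with its `--number-sections` option, and the third-party pandoc-crossref filter adds numbered figures and tables with cross-references. Failing those, the per-chapter scheme described above (Fig 3.1 and so on) is small enough to automate with a pre-processing step. The Python sketch below assumes a hypothetical markup, an image followed by `{#fig:label}` and references written as `@fig:label`, loosely modelled on pandoc-crossref's convention; it is not that filter's implementation, and `number_figures` is a hypothetical name.

```python
import re

# Hypothetical markup (loosely modelled on pandoc-crossref's convention):
#   ![Caption](figs/x.svg){#fig:x}   declares a figure
#   @fig:x                           refers to it in the prose
FIGURE = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)\{#fig:([\w-]+)\}")
REFERENCE = re.compile(r"@fig:([\w-]+)")


def number_figures(chapter: int, text: str) -> str:
    """Number figures per chapter (Fig 3.1, Fig 3.2, ...) in order of declaration."""
    numbers: dict[str, str] = {}
    for _caption, _path, label in FIGURE.findall(text):
        if label in numbers:
            raise ValueError(f"duplicate figure label: {label}")
        numbers[label] = f"{chapter}.{len(numbers) + 1}"

    def caption(match: re.Match) -> str:
        alt, path, label = match.groups()
        # Prefix the caption with its number and drop the label attribute.
        return f"![Fig {numbers[label]}: {alt}]({path})"

    def reference(match: re.Match) -> str:
        label = match.group(1)
        if label not in numbers:
            raise KeyError(f"undefined figure label: {label}")
        return f"Fig {numbers[label]}"

    # Rewrite captions first, then every @fig:label reference in the prose.
    return REFERENCE.sub(reference, FIGURE.sub(caption, text))
```

Running each chapter file through this before pandoc (passing its chapter number) would renumber everything on every build, so adding or reordering figures no longer means hand-editing references. An undefined label raises an error rather than silently leaving "@fig:..." in the output.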

Summary

I've put together a very simple toolkit and structure for writing my PhD thesis in markdown. It gives me quite a nice, relatively natural writing rhythm and lets me present the thesis in various forms, for the web and for collaborating with my supervisor. There are still challenges, and I lose out on the automatic numbering I'd get with LaTeX, but overall it has made for a very nice writing experience. I'd recommend it to anyone.

In fact, I wrote the first draft of this post before searching the web for writing a thesis in markdown, and it turns out this is already an established practice. I'm glad to say that, at a brief glance over the landscape, many of the things I've said here are shared experiences. I'll stick to my own toolchain, but I recommend looking at Tom Pollard's PhD Markdown Thesis template, and I found this post from The Urbanist a pleasant read as well.

Happy writing.

minimalism phd markdown technology thesis plaintext

Last night I had a dream in which landlords had started installing vending machines in people's front rooms to capitalise on snacking during lockdown. I can't help feeling it wouldn't be surprising if it actually happened.

capitalism life dreams landlords

Kicking myself that I've (begrudgingly) been using Google Chrome for work to keep my work and personal lives separate, when Chromium, which is friendlier and open source, is right there.

Luckily, migrating was fine. It's still a Google product and a little too integrated with Google services for my taste, but it's a step in the right direction.

life google degoogle work FLOSS open source web

Your brain on liberalism (a quick socialist rant)

I've just discovered, via r/swoletariat, this absolutely fucking unreal 2018 Guardian editorial from Zoe Williams:

Do you boast about your fitness? Watch out – you’ll unavoidably become rightwing

Great start there. Bit rich coming from someone whose paper's editorial board publishes transphobic shit and Israeli apartheid apologism.

Yesterday was Fitness Day. Sorry, let me give that its proper title: #FitnessDay. The space bar is always the first casualty of a manufactured social media movement.

Sweet, hot take! It's not like hashtags are the most basic way of linking together commentary on a topic in our modern age. Hypertext is based on linked documents, Zoe.

Do too much, and the self-love develops a carapace of self-sufficiency. This is especially a problem for cyclists, who come to think of themselves as an off-grid warrior class, having performed their commute drawing on no more resources than their own glutes, and maybe a sports drink. Unavoidably, over time, this makes you more rightwing, as you descend into an aerobics-powered moral universe where only the weak need each other, and all the strong need is a waterpouch in their backpack that pipes straight into their mouths.

Bit of a fucking leap there imho. How does that work? I enjoy exercise for a variety of spiritual and physical reasons. Not once have I thought of myself as self-sufficient or as a "warrior" (in any non-daydreamy, non-roleplay sense). I also don't own a water pouch. Rude.

How heroic do you find the armed forces? And is that just those in active combat, or also the ones who fix army IT and count parachutes? I found the questions on YouGov’s recent poll peculiar, but I often do when they ask us to make qualitative judgments about one another (do benefit claimants want to work? Are migrants ambitious? – there is no possible answer beyond “I’d have to take this on a case-by-case basis”).

So we've tried to draw a straight line from liking exercise to soldier-hero worship? Sweet. No problem there at all.

From the people who brought you the Ostrich Pillow – which lets you nap anywhere, the next best thing to being a baby – comes the three-way hood: you can wear it as a hood, or as a snood, but its unique selling point is “eclipse mode”, where you pull it right over your face and that alerts people to the fact that you don’t want to talk to them. So, someone has just reinvented a pillow case, for a generation of people who have forgotten how to deploy a simple, offputting grumpy face. It’s the hood that says hell-in-a-handcart.

Wait, what? This is the conclusion of the article. I'm really confused now. What does this have to do with anything? Are we just gluing together random pieces of "individualism bad"? I get the sentiment: rugged individualism is misconceived at best and outright fascist propaganda at worst. But, as mentioned before, we're hardly the voice of solidarity, are we, The Guardian? That concluding paragraph suggests this is nothing more than a strung-together vitriolic ramble. What the hell?

Don't fucking read The Guardian, folks. It's centrist tripe.

liberalism socialism The Guardian rants