Writing a Ph.D. thesis with Org Mode (github.com)
136 points by quazar 63 days ago | 27 comments

I have been writing my thesis as an org-mode file in git that is automatically published [0] as a pdf every time I commit, using a bash script tangled as a commit hook. Figures are (or can be) computed from live data, and org-ref can automatically manage and format all my references as I go, without the insanity-inducing workflows that one usually has to resort to. Another amazing thing about org-mode is that I can write a code block to hit a remote API (e.g. Google Docs) and fill and format entire chapters from a collaborative editing source that my advisor is comfortable using: C-c C-c and it is embedded in the document. Perfect for stitching papers together from a variety of sources.

While there is a learning curve, the peace of mind of having a single tool that enables me to use any tool I need to use was such a relief after years of other painful writing and publishing workflows.

If you are starting a PhD today and are going to do anything at all with code, I seriously suggest you learn org-mode. By the time you finish, not only will you have a PhD, but even if you don't you will have the power of one of the most amazing authoring tools under your command! (I would say, "Learn org-mode, simplify your life!" but emacs tends to lead to other complications).

0. https://orgmode.org/manual/Publishing.html

A while back when I still worked on academic papers, I had a great workflow where I had a Makefile that knew how to build everything (deriving .eps files from xfig diagrams, deriving a main pdf file from all the .tex files etc), and I used inotifytools to notice when any of the build dependencies changed, and that triggered a run of `make`.
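
A minimal sketch of that watch-and-rebuild loop, for readers without inotifytools. It is an assumption-laden stand-in, not the original setup: it polls modification times instead of using inotify, and the function names (`snapshot`, `changed`, `watch`) are made up for illustration.

```python
import os
import subprocess
import time


def snapshot(paths):
    """Map each existing path to its last-modified time in nanoseconds."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}


def changed(before, after):
    """Return the paths that were added, removed, or modified between snapshots."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))


def watch(paths, command=("make",), interval=1.0):
    """Poll `paths` forever and run `command` whenever any of them changes.

    Polling is a swapped-in technique here; the comment above used inotify.
    """
    seen = snapshot(paths)
    while True:
        time.sleep(interval)
        now = snapshot(paths)
        if changed(seen, now):
            # check=False: a failed build should not kill the watcher.
            subprocess.run(list(command), check=False)
            seen = now
```

Something like `watch(["thesis.tex", "figures/plot.fig"])` (hypothetical file names) would then re-run `make` on every save.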

xpdf has (... had?? apparently it got removed in the 4.0 series to get a release out the door) a "remote" option where you can send commands to a running xpdf instance, so I would have the above makefile notify the running xpdf to re-read the file.

Put $EDITOR on one side and xpdf on the other, and you've got a near-realtime-updated view of what you're working on, just save the file and see the rebuilt pdf. It usually took around a second or two depending on the complexity of what I was working on.

I didn't use emacs at the time, but if I had, I would try to integrate all of the above with flymake-mode to automatically build without me needing to save the file. (there are probably problems with this because if it builds while you're in the middle of typing a latex command, the latex process will fail to build)

Back in the stone age when I did mine, my approach was more local, but similar.

I wrote all the text in LaTeX using the AUCTeX package, but had a makefile and a commit hook (to CVS, later SVN; this was pre-git). All figures, tables, etc. were generated from other make targets, which chained back to a mix of (usually) Lisp code, MATLAB, or Maple/Mathematica. Oh, and figures generated by things like MetaPost. Structuring things as chapter-level .tex files made it all pretty manageable.

I had figure dependencies on both input data and executable targets/scripts, so changing either would trigger a recompute, with the exception of some long-running stuff. Of course, I also had global rebuild rules. I wrote many papers this way too; it was nice to know everything was synchronized properly.
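
The rule make applies here (a figure is stale if it is missing or older than anything it depends on) can be sketched in a few lines. This is only an illustration of the idea, not the original makefile; `needs_rebuild` is a hypothetical name.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency.

    Dependencies can be input data files and the scripts that generate the
    figure alike. A missing dependency raises FileNotFoundError, which is
    roughly what make does when it has no rule for a prerequisite.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime_ns
    return any(os.stat(dep).st_mtime_ns > target_mtime for dep in dependencies)
```

Listing the plotting script next to the data file as a dependency is what makes a code change trigger a recompute, not just a data change.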

I did use org-mode to organize my work though.

These days it would be on github and I probably wouldn't have lost it in a move.

How do you deal with different sections from google docs? As in, how do you format these into markdown or latex sections?

I've been working with psychologists and I've taught them the basics of a latex document, but I see how extracting the sections they have written from a google doc could be useful.

I wrote a docs api v1 -> org converter in python. I just pulled it into its own repo and it still has a bunch of dependencies related to how I handle auth for other projects, but if you want to take a look it is at [0]. There is an example of how I use the Docs class in an org source block in the readme (note that I use evil mode, and that you need to view raw to see everything that is going on with the headers). It currently requires a little dance to get everything in the right place at the right indent level, but it could be fully automated. I'm in the process of writing a converter going the other direction in elisp, and I will probably rewrite the python into elisp at some point as well, but for now, following the keystrokes is faster for me than trying to reimplement it to integrate with the org-mode api.

As an overview, I use the heading level in the google doc as the corresponding * level (e.g. h1 -> *, h2 -> **, etc.). Nearly every construct in the google doc has a 1:1 mapping with some org mode convention (there are some things that I still write in org-syntax in the google doc, such as figure captions and the like).
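
To make the heading mapping concrete, here is a tiny sketch. It is not code from the converter linked in this comment; `org_heading` is a hypothetical helper written only to show the "heading level N becomes N stars" rule.

```python
def org_heading(level, title):
    """Render a heading at the given level (1 for h1) as an Org headline."""
    if level < 1:
        raise ValueError("heading level must be >= 1")
    # Org headlines are N asterisks, one space, then the title text.
    return "*" * level + " " + title.strip()
```

So an h2 titled "Methods" would come out as `** Methods`.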

If I need to export to markdown from org I usually try pandoc (I very rarely do this though). For importing from markdown I use pandoc as well; newer versions do a pretty good job and only need a few tweaks from time to time. LaTeX export is straightforward using org-mode's built-in exporter [1] via the export dispatcher [2]. For full pdf export I use lualatex via texlive, with all the pain that that entails.
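
For the pandoc step, a small wrapper along these lines would do. It is a sketch under assumptions: pandoc is on the PATH, the helper names are invented, and the file names in the usage note are placeholders. The `--from`, `--to` and `--output` options and the `org` and `markdown` format names are standard pandoc.

```python
import subprocess


def pandoc_command(source, target, from_format, to_format):
    """Build the pandoc argument list for converting `source` into `target`."""
    return ["pandoc", "--from", from_format, "--to", to_format,
            "--output", target, source]


def convert(source, target, from_format, to_format):
    """Run pandoc and raise CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target, from_format, to_format),
                   check=True)
```

`convert("chapter.org", "chapter.md", "org", "markdown")` covers the export direction; swapping the two formats covers the import.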

0. https://github.com/tgbugs/gdocorgpy
1. https://orgmode.org/manual/LaTeX-export.html
2. https://orgmode.org/manual/The-Export-Dispatcher.html#The-Ex...

It could be done, but it does not feel like the appropriate tool. Org mode translates to basic LaTeX only, which is a huge waste of TeX's power.

I did my thesis entirely in LaTeX, so I have some experience with the topic, and it was a rollercoaster. There is always some really special table, graph, arrangement, or extra spacing need that you can only handle by programming in the real thing. Doing the same with Org mode would have been the stuff of nightmares (even if Org text has some nice qualities that LaTeX does not have, like quickly collapsible text).

I missed only one thing: being able to do in-place calculations in LaTeX (filling columns and totals in tables, Excel-style). But LaTeX is not really the right tool for that, so you can't blame TeX. The R + LaTeX combination filled the gap nicely.
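
The "compute the totals outside LaTeX, then emit the table" idea looks roughly like this. Note the swap: the comment used R, while this sketch uses Python, and `latex_table_with_totals` is a hypothetical helper, not part of any package.

```python
def latex_table_with_totals(header, rows):
    """Return a LaTeX tabular with a Total column and a Total row.

    `header` names the numeric columns; `rows` is a list of
    (label, values) pairs with one value per numeric column.
    """
    end = r" \\"
    lines = [r"\begin{tabular}{l" + "r" * (len(header) + 1) + "}"]
    lines.append(" & ".join(["", *header, "Total"]) + end)
    lines.append(r"\hline")
    for label, values in rows:
        cells = [label, *map(str, values), str(sum(values))]
        lines.append(" & ".join(cells) + end)
    lines.append(r"\hline")
    # Column totals, followed by the grand total in the last cell.
    column_totals = [sum(col) for col in zip(*(values for _, values in rows))]
    cells = ["Total", *map(str, column_totals), str(sum(column_totals))]
    lines.append(" & ".join(cells) + end)
    lines.append(r"\end{tabular}")
    return "\n".join(lines)
```

The returned string can be written to a .tex file and pulled into the thesis with `\input`, so the numbers are never typed by hand.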

Apart from this, the TeX ecosystem has already planned for and built nearly anything you could dream of when writing a PhD thesis, from the basic to the really esoteric. It can be difficult sometimes, but it is awesome.

You can insert any LaTeX markup in org mode as well as just using their syntax.

This is more fragile than you would expect. If most of what you're doing is going to be LaTeX, it's easier just to write in .tex and eliminate the .org middle man. (I say this with a great love of Org mode, which I use everywhere else.)

Org mode would (probably) make a mess of R chunks included in the .tex file, for example. You can't compile from R to Org to LaTeX to PDF, so this can be a problem. The Python-to-LaTeX path has the same problem if you put Org in the middle.

By the way, I use .org files a lot too. They are handy for small and simple documents.

Even with plain LaTeX I've had trouble with Org mode trying to decide when to 'smartly' switch into maths mode, and what should be literal, e.g. `\latexfunction{...}` is sometimes translated as `\\\latexfunction{...}` because Org decides it should escape the backslash.

And you can inline display latex fragments. Not the prettiest display but it does the job!

When I was doing my thesis I was a big fan of Mathematica's right click -> copy as LaTeX feature. It saved a lot of time entering formulas and copying tables from the notebooks I was using for analysis.

+1 for this, it really helped me.

Also, Mathpix is neat - especially if you're citing results and need to include some equations from those papers.

With arXiv papers, I usually download the TeX file and copy the formulas from there. Besides, sometimes you find interesting ideas that were commented out for lack of space.

What's the benefit of LaTeX over LyX? With LyX you can insert LaTeX if you need it, but otherwise you are in a very nice WYSIWYM environment.

Really just a difference in preference in my experience. I personally really like declarative workflows.

Essentially, Emacs + makefiles :)

LyX is great for shorter things. I use it for presentations, where being able to see the images, colors, etc. really helps.

For longer stuff, with a lot of text, a nice editor has its advantages. And you can easily split off chapters into separate files and all that.

You can see the images in emacs with org previews. http://kitchingroup.cheme.cmu.edu/blog/2016/11/06/Justifying...

Writing LaTeX in emacs is faster

Back in the late 2000s I looked around for the right tool for my Ph.D. dissertation and quickly ruled out Word, Libre (or Open Office) Writer, or any WYSIWYG tool, because they 1) lacked fast, easy ways to work at the structural (tree) level of the document, which is important when you have many sections and subsections to organize thoughts and arguments; 2) couldn't do mathematical notation well; 3) didn't integrate well with the reference management tool I was using then, Mendeley.

As an Emacs lover, I soon found Org, and it was (and remains) the perfect tool for working in plain text, which will never be obsolete and works easily with git or any other version control. Then and now, nothing could match Org's speed and flexibility: structural editing (creating nodes, moving nodes, promote/out-dent a node, demote/in-dent a node) was and is fundamental to Org (unlike Markdown etc.), and it's ridiculously easy to reorganize thinking and writing as you go. You can export to LaTeX (or HTML) and customize formatting as needed, while also including code blocks from multiple languages. Integration with BibTeX was tight, and made handling hundreds of references easy.

Where other writing tools for complex documents previously made me cringe and cuss, Org makes writing a pure joy, freeing the mind to work entirely on content and its structure.

I'm not sure about other fields, but most theses in computer science these days are usually a stitch-up of your papers + some glue text/backstory. I.e. 80% of your thesis is already in LaTeX format. Org Mode only adds complexity to your workflow.

This was common in the hard sciences way back when. I remember fellow students saying things like: "I have my three papers, that's three chapters, now I just need my intro and conclusions." In psychology, they had a phrase, which was: "Published and outa here."

Some advantages: It meant your work was demonstrably publishable, you got 3 pubs out of it. Disadvantages: You were basically a hostage until your advisor agreed to let you publish 3 papers. It probably tended to flood the literature with papers of lesser quality.

Unless (like me) you write your papers in org-mode too!

I doubt that anyone can be familiar with "most theses in computer science these days".

Depends on your threshold for "familiarity". Anyone currently working on theirs will likely have a number of friends/colleagues working on ones of their own. Include second- and third-hand knowledge accumulated from conferences, and any modestly sociable CS doctoral candidate would likely know at least the loose structure, if not the 30-second blurb, of quite a large number of theses in progress.

That is very much dependent on department.

I liked the LaTeX template. Could you give more details about it, such as the font name and so on?

