Efficiency Comparison of Document Preparation Systems – LaTeX and Word (plosone.org)
73 points by thesumofall on Dec 25, 2014 | 60 comments

I question the validity of their methodology.

At no point in the paper is exactly what is meant by a "formatting error" or a "typesetting error" defined. From what I gather, the participants in the study were required to reproduce the formatting and layout of the sample text. In theory, a LaTeX file should strictly be a semantic representation of the content of the document; while TeX may have been a raw typesetting language, this is most definitely not the intended use case of LaTeX and is overall a very poor test of its relative advantages and capabilities.

The separation of the semantic definition of the content from the rendering of the document is, in my opinion, the most important feature of LaTeX. Like CSS, this allows the actual formatting to be abstracted away, allowing plain (marked-up) content to be written without worrying about typesetting.

Word has some similar capabilities with styles, and can be used in a similar manner, though few Word users actually use the software properly. This may sound like a relatively insignificant point, but in practice, almost every Word document I have seen has some form of inconsistent formatting. If Word disallowed local formatting changes (including things such as relative spacing of nested bullet points), forcing all formatting changes to be done in document-global styles, it would be a far better typesetting system. Also, the users would be very unhappy.

Yes, LaTeX can undeniably be a pain in the arse, especially when it comes to trying to get figures in the right place; however, the advantages of combining a simple, semantic plain-text representation with a flexible and professional typesetting and rendering engine are undeniable and completely unaddressed by this study.

It seems that the test was heavily biased in favor of WYSIWYG.

Of course that approach makes it very simple to reproduce something, as has been tested here. Even simpler would be to scan the document and run OCR. The massive problem with both approaches (WYSIWYG and scanning) is that you can't generalize any of it. You're doomed to repeat it forever.

(I'll also note the other significant issue with this study: when the ratings provided by participants came out opposite of their test results, they attributed it to irrational bias.)

> At no point in the paper is exactly what is meant by a "formatting error" or a "typesetting error" defined.

No, and they clearly don't mean it in any way that a designer would find intelligible. Let's compare the two in terms of things like kerning, hyphenation, text figures, ligatures . . .


In some sense, it's not even a fair comparison. TeX is a typesetting system, which Word makes no claim to be. You can do some primitive "formatting" in Word, but you can't lay out a book or an article to the standards required by contemporary book/journal design.

I have lots of books on my shelf that were designed using either TeX or LaTeX (though InDesign is far more common). I have exactly none that were designed using Word.

> Word, but you can't layout a book or an article to the standards required by contemporary book/journal design.

Of course you can. Plenty of books are done in Word, and many journals and conferences release Word templates for paper submissions.

Many books are indeed initially written and edited using Word, but aren't they generally passed on to someone who does the final layout in InDesign/QuarkXpress before printing?

In contrast, authors using LaTeX often produce camera-ready copy themselves.

For submission. The designer is not using Word. Ever.

This is quite interesting, and given my experience with LaTeX (I am a post-doc and prefer working with LaTeX), I often wonder if it is all worth the trouble. Although I have only used Word (and LibreOffice Writer, odd the article doesn't even mention it) sporadically over the last years (and only for simple documents), I do wonder how Word and friends perform in settings that were not part of the experiment that is reported here:

* Large documents (50+ pages) (I remember having to deal with file corruptions, figures appearing at random places, formatting suddenly has a free will, ...)

* Lots of figures that get updated during the writing process

* Collaborating: merging several documents into one big report, especially if other authors do not follow formatting guidelines etc.

* Citations and references

* The authors mention it in the conclusions, but I think the test should also have included a scenario based on using templates instead of building something from scratch.

Other aspects why I prefer LaTeX:

* Version control with plain text files is rather convenient. And so is collaborating.

* Comparing different versions of the same document(s) is much easier with plain text (diff), although you can do something similar with PDFs

* I agree to a certain extent with the authors that scientific content is more important than the form, but I do prefer a traditional LaTeX look over Word documents. By far.

* I always use templates, and this speeds up the writing process significantly. Ideally, you can forget about the formatting in those cases.
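To illustrate the plain-text diff point in the list above, here is a minimal Python sketch using the standard-library difflib module. The file names and the sample sentences are made up for illustration; nothing here comes from the paper.

```python
import difflib


def tex_diff(old: str, new: str) -> str:
    """Return a unified diff between two versions of a .tex source."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="paper_v1.tex",  # hypothetical file names
        tofile="paper_v3.tex",
        lineterm="",
    )
    return "\n".join(lines)


# Hypothetical document versions, only for demonstration.
v1 = "\\section{Results}\nWord users were faster.\nSee Figure 1.\n"
v3 = "\\section{Results}\nWord users were faster on plain text.\nSee Figure 1.\n"

print(tex_diff(v1, v3))
```

Any two revisions can be compared this way, not just adjacent ones, which is the same thing `diff` or `git diff` gives you on the command line.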

I think the authors make a good point though. Maybe we should invest in smarter/better/more productive LaTeX editors?

edit: formatting, added citations point...

i used word for a large project (undergrad project report of 100+ pages with 4 member team) in 1997 and even then word had features for most (if not all) of the items in your list:

* large docs can be handled by splitting them into master-child documents. You can format across all child documents from the master.

* figures can be edited in-place or embedded from original source.

* collaboration was possible then with child and shared docs, it should only be better now.

* citations and refs are supported, although I don't know if all styles of citations are.

* templates have been in word from a long time and imo are quite natural because they're prototype-based (ie, you can make any document AFTER you create it into a template. other documents that use that as a template inherit all styles and so forth)

* the visual "View changes" mode in word is quite natural and even allows for some offline discourse with your collaborators as each user's comments and changes are marked with a different color and comments are allowed.

* word's symbol editor (which also existed pre-1997) is quite up to the task of most equations (again, imo; i've not done a lot of hairy equations)

word is just a better tool for large documents, and i say this as an ardent anti-ide guy. my preferred setup for code documentation is sublime text and markdown; but when you want to just write a document, word it is.

Citations and refs are good until someone tries to copy-paste them between documents. This wouldn't work in LaTeX either, but the visual temptation to do so in Word is much easier to fall into. What is worse, the pasted ref looks OK at first, but once you try to update reference numbers your copied references break.

Track changes is a pest, and its use has actually been banned at all the companies I worked for. It only works when you want to show the most recent changes, and it actually breaks standard document comparison, which means you can no longer compare version 1 with version 3 of a document. I won't even comment on how horrendous the diff view in Word is; even on a QHD display the 4 small panes you get are so confusing that I'd rather diff it manually with 2 documents open side by side.

The participants were instructed to reproduce the source text within thirty minutes.

Given such a visual task, it's no wonder a WYSIWYG editor like Word would make it much easier to get things looking exactly as they were instructed to. In other words, many of the LaTeX users probably spent a lot of time trying to "reverse-engineer" the formatting, something that very very rarely occurs in practice.

Exactly. If you ask a Word user to reproduce a typical research paper from its PDF, they might end up spending days without success. Getting pretty 2-column layouts, properly aligned headers, spacing, figure numbering etc. that don't look like a letter from your bank is very hard, if not impossible. Exactly reproducing a Word-like document in TeX would be similarly hard because you might also have to reproduce all the alignment oddities and layout ugliness.

As a long-time LaTeX user, I'm a little disappointed to read these results, but I fully believe them. I've spent countless hours fighting against the layout algorithm to get images to go in the right place, among other frustrating issues.

Efficiency aside, using LaTeX is expected in fields like physics and math, and if you write your paper in Word, readers will be biased against it (consciously or subconsciously). On the ArXiv, the vast majority of papers are typeset using LaTeX, and of the non-LaTeX papers, a large fraction of them are low-quality or written by cranks, hence the negative association.

> if you write your paper in Word, readers will be biased against it (consciously or subconsciously)

I've found that happens mainly with papers that are not only in Word but formatted a bit weirdly. Those can be avoided by people who use Word regularly, though, and in that case you usually have to look pretty closely to tell if something was done in LaTeX or Word, if they both use the same template (e.g. the Word vs. LaTeX versions of the ACM paper template).

Telltale "Wordisms" that I run across fairly often, and probably do have a negative reaction to: 1) large spaces due to justification in two-column formats without hyphenation (solution: turn on auto-hyphenation); 2) a paragraph being in a totally different font or font size from those around it (solution: paste without formatting); and 3) PDF title set to something like Paper.docx (solution: set a title when exporting to PDF).

> I've spent countless hours fighting against the layout algorithm to get images to go in the right place

This probably means you were using it wrong :) LaTeX almost always knows better than you, and you should trust it. You are much better off "guiding" the image placement rules than trying to set them manually!

HCI is almost entirely Word; it really varies by field in computer science.

The point of LaTeX is to not worry about format, just the content. "Reproducing a document" is rarely done by normal LaTeX users. Most people just use a template that is appropriate for the task, and let the algorithm take care of the format.

Basically, the study set up a straw man to attack, and it doesn't have any ecological validity.

Looking over their data, it seems that they coded incomplete text as an error. So, for example, participants 34 and 37 completed none of the table exercise and were coded as having produced 513 errors (both were LaTeX users). This accounts for the vast majority of the variation in the observed errors. I think they've convincingly shown that Word is more productive than LaTeX for some of the study tasks, but not that the resulting output is more correct.

That being said, I don't doubt that LaTeX is harder to use and more error prone than Word. Since I find myself frequently writing mathematics-heavy text, I personally prefer LaTeX ...
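To see why coding an unfinished task as errors can dominate the spread, here is a rough Python sketch. Only the figure 513 comes from the comment above; the error counts for the other participants are invented purely for illustration and are not the study's data.

```python
from statistics import pvariance

# Error count assigned to each of the two participants who left the
# table exercise blank (figure quoted in the comment above).
INCOMPLETE_ERRORS = 513

# HYPOTHETICAL error counts for participants who finished the task.
# These numbers are made up; they are not taken from the paper.
finished = [3, 7, 5, 9, 4, 6]
everyone = finished + [INCOMPLETE_ERRORS, INCOMPLETE_ERRORS]

var_finished = pvariance(finished)
var_everyone = pvariance(everyone)

print(f"variance, finished only:      {var_finished:.1f}")
print(f"variance, incl. blank tables: {var_everyone:.1f}")
```

With any plausible small counts for the finished participants, the two blank submissions account for almost all of the variance, which is the pattern described above.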

It's strange they didn't show side-by-side comparisons of the resulting PDFs. Do Word and (La)TeX produce equally pleasant results? Last I checked, Word-papers looked pretty amateur.

Also, it came across as two people with an axe to grind. For example, are the LaTeX people using editors with automatic spell checking or correction? Which system was used to come up with the examples? Which system did the person creating the examples normally use?

Almost all scientific papers have more than one author, so collaboration and version control features are also important, but were not considered in this study. (For example, since .tex files are plain text they can be conveniently managed with git)

Also, if the intent is to determine which system should be used to "save time and money", the cost of licensing proprietary software must also be considered.

Although I mainly use LaTeX and prefer its semantic markup, the revision control is one thing I do prefer about Word, at least for my workflow. There are cases where I could imagine the git tooling/workflow being superior, but for me personally it hasn't worked well. Whereas Word's "track changes" mode works well for me as a way of suggesting, commenting on, and integrating revisions (replacing a lot of email traffic).

Agreed! "Track Changes" and master-child documents made it really easy for me to work on 50+ page documents with more than just another collaborator.

Personally, after doing several large documents in LaTeX and only then trying Word, I'd go with Word most times.

Well, it's not just "git"... it's any of the tools people use to compare and collaborate on text files. Throw it in a Rietveld instance or Gerrit, and you can review changes and make comments like with any other code review. If you prefer just throwing it into kdiff3 or something, you can do that, too. If you like Word's "Track Changes", LyX has that (though that means everyone has to use LyX as their editor).

The point is that a simple text-based source format opens up the possibility for lots of different collaboration models that the Word software monoculture prevents.

> (For example, since .tex files are plain text they can be conveniently managed with git)

Unfortunately, it is a rare occasion that all of the coauthors know how to use git.

What's still missing in the LaTeX ecosystem (to my knowledge) is a visually pleasing writing environment. The end result might look great, but code editors are simply not made for prose. Even editors specifically designed for LaTeX do not invite you to write; they invite you to code.

Until then I stick to my iA Writer --> Markdown --> pandoc --> Word workflow.

"What's still missing in the LaTeX ecosystem (to my knowledge) is a visually pleasing writing environment."

I assume you're not aware of LyX, then? http://www.lyx.org/

My own preference is to edit in Emacs/Orgmode, which automatically exports to LaTeX and processes to pdf. Somewhat similar to what other people have mentioned with markdown and pandoc, except that working with Orgmode allows you to insert LaTeX directly if you want, although you can certainly do an entire document without any LaTeX at all, and it will still export to LaTeX/pdf beautifully. E.g., https://www.mail-archive.com/emacs-orgmode@gnu.org/msg04582....

Actually, there are a variety of LaTeX editors which I think must meet most people's standards. When I decided to write a book last year, I couldn't face doing it in Word (or anything which would not allow me to view the formatting codes). I was spoiled for choice when it came to LaTeX editors. I settled for Texmaker, and within 6 months I had produced a 500-page book, complete with tables, charts and several thousand footnotes.

I'd never used LaTeX before in my life. But I wanted the book output as PDF and it was important that the book looked professionally produced. I was really quite happy with the results. Before biting the bullet and using LaTeX itself, I had looked at pandoc and other technologies which could output to PDF (e.g. Python's reST). But after some initial tests, I realised they were quite limited in their ability to output a complex PDF document.

Texmaker allows one to have the LaTeX source and the PDF end result open in adjacent windows. MiKTeX/TeXworks was another editor which functioned in a similar fashion. With Texmaker I could jump from a line in the PDF to the originating line in the LaTeX file (and vice versa). One can go to images.google.com and search for screenshots of LaTeX editors to see the variety available.

The only major enhancement I would like is for the LaTeX editor to allow one to hide all the LaTeX codes (perhaps colour-coding the text in the editor to indicate that a paragraph section contains hidden formatting codes). I do find the LaTeX markup distracting as I try to read the text. However, being able to read the PDF output and then jump to the LaTeX source does mitigate this to some extent (I have a large monitor which permits me to have 2 A4 document windows open side by side).

That's exactly what I'm talking about: Distracting formatting codes, too wide text width (by default), mono-spaced fonts (by default), ... Yes all of this can be dealt with but it would still be great to see a LaTeX editor that assumes writers and not coders as its user.

I really like Vim for writing LaTeX stuff. It has automatic text wrapping (:set textwidth=80 or whatever width you prefer) and the quick movement/editing commands make writing prose a dream. Being able to sling around words, sentences, paragraphs quickly makes my writing much faster.

On the other hand, I suppose whether Vim is visually pleasing depends on preference.

I use text editors for editing, but I still use the workflow of "Markdown with embedded latex --> pandoc". That way, you can write readable plain text when possible, and restrict the coding to formulas where latex layouts shine.

One important point about software that is forever sidelined is that the emotional experience is as important as the usability side.

Was this properly weighted?

The abstract does mention this with a single line:

> however, more often report enjoying using their respective software.

Whereas the results involving usability/functional output accounted for about 90% of the text summary.

They attribute the increased emotional satisfaction of LaTeX users to cognitive dissonance. Page 12:

"A striking result of our study is that LaTeX users are highly satisfied with their system despite reduced usability and productivity. From a psychological perspective, this finding may be related to motivational factors, i.e., the driving forces that compel or reinforce individuals to act in a certain way to achieve a desired goal. A vital motivational factor is the tendency to reduce cognitive dissonance. According to the theory of cognitive dissonance, each individual has a motivational drive to seek consonance between their beliefs and their actual actions. If a belief set does not concur with the individual’s actual behavior, then it is usually easier to change the belief rather than the behavior [6]. The results from many psychological studies in which people have been asked to choose between one of two items (e.g., products, objects, gifts, etc.) and then asked to rate the desirability, value, attractiveness, or usefulness of their choice, report that participants often reduce unpleasant feelings of cognitive dissonance by rationalizing the chosen alternative as more desirable than the unchosen alternative [6, 7]. This bias is usually unconscious and becomes stronger as the effort to reject the chosen alternative increases, which is similar in nature to the case of learning and using LaTeX."

I find that conclusion gratuitous. It's much more likely that there are legitimate reasons underlying the satisfaction which they didn't consider.

Ironically, the authors' move to rationalize away this unexpected finding as cognitive dissonance could itself be characterized as an instance of cognitive dissonance. The authors probably started out with a particular belief, and the findings didn't agree; instead of accommodating the data, they attempt to neutralize it.

So their conclusion is that LaTeX is best saved for documents with much mathematical content. But it's also worth considering that LaTeX workflows offer better compatibility with revision control systems. The change tracking features in MS Word have never impressed me the few times I've worked with them.

Fascinating! I wonder if these findings scale to larger documents. As a novice user (both self-professed, and by the standards of the report) of both packages I've long found that Word becomes rather inconvenient once you pass a certain number of pages. Its internal structure is rather opaque, so sometimes when formatting goes awry it turns out to be quite difficult to fix. And it won't put non-text items in sensible places itself, though of course if you don't mind carefully positioning each one by hand then you can do that.

I actually much preferred using latex, even though it was so comically horrible in many respects (my Makefile had to run pdflatex 3 times, for example, and needed a whole other step to pre-convert PNGs into PDFs beforehand). More upfront investment, but less ongoing bother.

Or cognitive dissonance, perhaps that was it?
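For anyone wondering about the three pdflatex runs: cross-references and the table of contents are resolved through auxiliary files written on the previous pass, so one run is often not enough. Below is a minimal Python sketch of a build loop that reruns only while the log asks for it. It assumes pdflatex is on the PATH, that the log lands in the current directory, and that the rerun warning appears unwrapped in the log; the file name paper.tex is hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Text LaTeX writes to the log when labels changed during a pass.
RERUN_MARKER = "Rerun to get cross-references right"


def needs_rerun(log_text: str) -> bool:
    """True if the log asks for another pass."""
    return RERUN_MARKER in log_text


def build(tex_file: str, max_passes: int = 3) -> int:
    """Run pdflatex until the log stops asking for a rerun.

    Returns the number of passes that were run.
    """
    # Assumption: pdflatex writes <jobname>.log into the working directory.
    log_path = Path(Path(tex_file).with_suffix(".log").name)
    for n in range(1, max_passes + 1):
        subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", tex_file],
            check=True,
        )
        if not needs_rerun(log_path.read_text(errors="replace")):
            return n
    return max_passes


if __name__ == "__main__":
    # Only attempt a build if the tool and the (hypothetical) file exist.
    if shutil.which("pdflatex") and Path("paper.tex").exists():
        print(build("paper.tex"))
```

This does not cover the PNG-to-PDF conversion step mentioned above, and a bibliography would need its own extra step between passes.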

When formatting goes awry, press Shift+F1 to open the 'Reveal Formatting' pane.

That just shows you the properties though, doesn't it? My usual problem is that the style properties seem OK, but the styles themselves are being applied to the wrong bits of text. Then when I try to fix it by removing the unwanted style from the incorrectly-styled portion, the section that was OK loses its style too! It's as if Word's internal record of where each block of formatting begins and ends has got out of whack somehow, and when called upon to change a part of one styled section ends up changing the whole thing. (Or perhaps my mental model is wrong? But most of the time, this sort of thing does work...)

This doesn't have to happen that many times to become really annoying...

It was critical for my mental model to note that Word applies styles to the trailing carriage return.

The paper does not mention Word Perfect, which some people seem to prefer in some cases. I'm not a user (or affiliated in any way) but I was researching document editors for a project when I found this epic comparison of Word vs WordPerfect in real world use:


25 years ago, I was a secretary. I used WordPerfect, Ami Pro, Word, DisplayWrite and other WP programs. Microsoft Word was just about the worst of the bunch. If it hadn't been for Microsoft's criminal monopolistic practices (OS + office suite), Word would never have become the de facto word processor in the world.

Even 15 years ago I was working as a systems administrator in a company which still insisted on using WordPerfect because of its superiority. A Microsoft fanboy became the IT Director, and WP was phased out for Word.

Read the article quickly and they mention that they actually tested LaTeX with the editor used by the participants (not just LaTeX), but they don't mention if they used the orthographic and grammar correction in the editors. I know the one provided by Word is excellent, so if the other editors' are not as good, that might explain some of the difference in textual errors.

This is my experience with LaTeX, it's very powerful but I spend more time thinking about formatting in it than I do in Word.

Try using Word with a makefile. I published a book using LaTeX with a makefile / scripts. When the printer came back and said two pages were off (wanted colour pages at the centre of the book), a simple update did the trick. Copy-and-paste with Word? No thanks.

I don't understand LaTeX or TeX or their relationship or how to use them. I tried to. I'm on Windows, so latex-project.org suggests [1] I download proTeXt.

The proTeXt website [2] tells me that "the self-extracting protext.exe file ... is well over 1GB". In fact it is 1.7GB.

Considering Office 2013 Professional Plus x64 clocks in at under 1GB I rofl'd my way out of there straight back to Word. And Excel, Outlook, PowerPoint and OneNote.

[1] http://latex-project.org/ftp.html

[2] http://www.tug.org/protext/

Most of the size comes from optional packages. There are lean and mean distributions under 15MB of download.

And most of that is documentation of the optional packages IIRC.

"The participants were instructed to reproduce the source text within thirty minutes."

As a LaTeX user I would never use it for writing a 30-minute text. I would use LibreOffice for that.

If the source text was made in Word, then of course the reproduction is going to have errors, when an error is defined as "an outcome different from what the original Word document generates".

We DO use LaTeX for writing 300+ page books, and LaTeX is great for it:

- It is not proprietary.

- It is very easy to program, and to use different UI programs to modify the same document. We write mostly by voice and edit visually.

- It is very easy to interface with our own software, for example for creating automatic graphs from sensors.

Efficiency in LaTeX vs Word always depends on what you are doing. At one job I was using bash, Powershell, pdftk and LaTeX to extract data from PDFs, VLookup using Powershell, and then annotate the PDF with the result using LaTeX transparencies (remember to run twice). Really simple stuff. It was a shit task to do before it was scripted with LaTeX. (five hours of mind numbing work vs 15 minutes of run time, plus it gave me a window of time for a coffee run). Use the right tool for the job, which in my case, turned out to be a Rube-Goldberg machine.

no. I like being able to manage the data as text readable. I like fine control over my formatting... and I especially like not being tied into a clunky 'ide' to manage the document.

How is being tied to a clunky text editor different from being tied to a clunky GUI? No matter which alternative text editor you use, apparently it's still clunky (slow) according to that somewhat limited research.

Text editors are clunky? If a text editor is 'clunky', it is almost assuredly not clunky in the same sense as an IDE.


Write it in markdown/commonmark (possibly with an extension for tables a la knitr).

Pick your favorite text editor and just type away merrily. Use latex ($\gamma$) when you want formulas.

Then compile it to PDF with pandoc (which will deal with the intermediate latex step). Then, resist the urge to endlessly play with the formatting and just work on the simple markdown text.

Does pandoc support commonmark?

I don't think it currently does, but it almost certainly will in the future. The primary author of Pandoc is the primary author of CommonMark.

Yes, I've been wanting to ask John for quite a while now about it, just always kept forgetting. The reason I doubted is because there's already pandoc-markdown...

Has he stated anything about the future between pandoc and commonmark? I imagine support for commonmark is likely, but is pandoc-flavoured markdown then doomed?

(I don't know how similar commonmark is to pandoc-markdown, but when I checked out commonmark at an early stage, it did handle lists differently, so I have reason to believe there would be more differences. [But I actually prefer how commonmark handles lists though.])

What I miss in the discussion is that with LaTeX you concentrate on the content and not on the formatting. If you are looking for good LaTeX editors, there is one that outshines them all: Emacs.

Yes, that's the theory, but in practice I find that I spend much more time futzing with the formatting in LaTeX than in Word, especially in two-column mode with figures. The article and various comments here suggest I'm not alone in this...

I'd love to have a system that actually delivers on the latex philosophy of separating content from formatting.

I find in Word I don't try too hard on formatting, especially in 2 column situations, because it is just going to look ugly no matter what I do. Using latex, in contrast, I really focus on aesthetics, which means some more time spent for decent results.

Why in the world would a person who is going to write an academic paper with charts, tables, graphs and stuff choose LaTeX? What is the point of comparing them? Even LaTeX says that it is not a word processor. Do not compare them, dammit.

Quote from LaTeX website: "LaTeX is not a word processor! Instead, LaTeX encourages authors not to worry too much about the appearance of their documents but to concentrate on getting the right content."

If you are going to write a philosophical article or a big novel, choose LaTeX; if you want to draw charts and graphs, choose Word. Crystal clear.

If you draw graphs/charts, then use a WYSIWYG program like Word. If you generate graphs/charts from data, then use a compiler like TeX, which nicely integrates into a build process.
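As a small illustration of the "generate from data" half of that, here is a Python sketch that turns measurements into a LaTeX tabular, which a build step could write to a file and pull into the document. The function name, column headers and readings are all made up for the example.

```python
def latex_table(rows, header=("Time (s)", "Reading")):
    """Render (time, value) pairs as a two-column LaTeX tabular."""
    lines = ["\\begin{tabular}{rr}", "\\hline"]
    lines.append(" & ".join(header) + " \\\\")
    lines.append("\\hline")
    for t, value in rows:
        # Two decimal places for the measured value.
        lines.append(f"{t} & {value:.2f} \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


# Hypothetical sensor readings, only for demonstration.
readings = [(0, 20.5), (1, 20.75), (2, 21.0)]

print(latex_table(readings))
```

If the output is saved as, say, readings.tex, the main document can include it with `\input{readings}`, so rerunning the build after new data arrives updates the table without any manual editing.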

word is easier to get started with; however, as the document grows, you will miss latex badly. image and table positioning in word is just as annoying as in latex, if not worse.
