Writing a Book with Unix (joecmarshall.com)
132 points by webappsecperson 49 days ago | 78 comments



I will not name the company, but a book deal I was working on fell through because I refused to install MS Office (since I didn't have a Mac and I refused to install Windows), they refused to accept markdown or LaTeX, and I couldn't get their template working in LibreOffice.

The part I find funny was that the book was about doing network server development with Haskell...on Linux.


There is a reason for Word: reviewers are used to Word's versioning and change tracking system, where they can suggest edits and the author can accept them or not. My publisher also offered using .prn files but then the reviewers wouldn't have done the full job ... (in the end I cancelled the contract for other reasons, so no idea how well that would have worked out)


Sometimes I wish non-programmers knew how to use Github...


I will admit, after a few years of writing papers on GitHub with co-authors, that I do occasionally miss the MS Word track changes. It's not how I like composing my papers overall, but I don't think I'll ever be happy using Git to properly version prose.


Can Google Docs replace MS Word in this workflow? It seems intuitive enough and supports real-time collaboration. This might allow authors to avoid installing MS Windows and MS Word.


I don't have experience with Google docs, but I don't think they produce layouts which are good enough™ for book printing.

Google Docs change tracking could probably be taught, but you are dealing with people who have invested a lot of time in their workflow and who are reluctant to adapt for a single author. Mind that those reviewers and editors switch from book to book depending on how frequently each author sends feedback. The author is just one among many ...


I beg your pardon? I have never heard of a decent publisher doing the publishing in MS Word, i.e. the process from formatting to printing. The editing most often gets done in Word. As far as I know, Word lacks the formatting capabilities to do professional publishing. But maybe I just misunderstood you, or things have thoroughly changed in the last couple of years.


My story was ~10 years ago. The publisher asked me to install a specific printer driver (as Word's rendering depends on printer settings) and to use a specific Word template (defining margins, formatting, ...); the final result would be produced by printing to .prn files.

If you look at many contemporary books, it's clear that typesetting is not a priority for many.


Have you seen https://bookiza.io?


The idea is very interesting, but it seems broken on Edge and not fully working on Chrome: https://imgur.com/a/fL6MyiT


My wife and most of her colleagues have moved all their collaborative academic writing to Google Docs, but she still has to copy and paste the final text into Word and fix the citations in EndNote to send off to the publisher.


This. I am regularly working on co-authored papers both in Google Docs and Overleaf, and they both have pros and cons.

Google Docs have a nice change-tracking and commenting system and next to no learning curve. The downsides are the formatting, which is rather unpredictable for complex-ish documents (figures are always a mess), and citations.

Overleaf helps get formatting and bibliography out of the way, but everybody must know at least some LaTeX, and you have to come up with your own comment-and-response system to keep track of what's going on. There is something baked in, but it is not satisfactory. On the other hand, it is very easy to just comment out stuff that is no longer needed in the main text but may still be useful or just serves as easy-to-see revision history.


You could have installed them on a VM for the sole purpose of writing the book. You didn't really wish to see the book published.


I mean, I don't know that I'm fully qualified to psychoanalyze myself, but I was going through a bit of a Stallman-esque phase where I said "No Windows in my apartment!"

I realize it was petty, and there are some regrets in how I handled it, I just thought the anecdote was amusing.


At this point I've decided to refuse to deal with publishers/journals/etc. who don't accept .tex. (Unless it's at the explicit request of a close colleague.) .doc(x) is not a professional file format. (Nor is .odt for that matter.)


Did you end up publishing the manuscript elsewhere?


No; I never finished the book as a consequence of the deal falling through. The work I have is probably hidden somewhere on my NAS, though I suspect now the work would make me cringe since I have improved in my programming skills immensely in the last four years.

If you're asking because you want help with some network programming you are doing in Haskell then you can PM me on some kind of social media.


I suggest publishing it. Or publishing something else that's more relevant to your current or future work. And self-publish. A $30 book that you give to a prospective client is like a $30 business card. There's a wide range of self-publishing methods and services, but there are books on that subject that can help you make that decision, how to price it, and market it, etc. If you can provide a sane PDF, you've done most of the work a printing company cares about; the companies that assist in the self-publishing process should be able to accept almost anything, that's their job.

Write it as a reference that even you'd find useful, with emphasis on defining the basics and getting them right, and increasingly lighter when it comes to intermediate and advanced concepts. Those can either form future books or consulting or both. Since you've started writing, it's worth it to go through the whole process and publish. I did it with a conventional big publisher some time ago, but I wouldn't do that again today. The idea a big publisher would require you use Word is very familiar to me, and I think it's ridiculous.


Alternatively, if you don't want to spend any more effort on it, why not find a coauthor. Someone can take the text you have written, update it and refactor it and generally fix it up.


Why didn't you just download virtualbox, download a Windows 10 iso, and pirate >= office 2016 just to please them? If windows deactivates before the book is finished just run kmspico one more time.


Because one shouldn't be forced to illegally obtain and run a second-rate operating system to run a third-rate application?


Umm, because it was illegal? I didn't really want to do for-profit stuff on pirated software.


Lol the office file won't know that it's pirated. Noone will know just get virtualbox, free, win10 iso, free, and pirate kmspico and office 2016 or newer. As far as anyone will ever know you've paid for all of it. Just truck it.


I've been writing on Linux/Unix systems since the late 1980s, including now a few book-length projects. Tools have varied over the years, including some uni work in nroff, HTML, and more recently, LaTeX and Markdown, among other markup languages.

LaTeX strikes me as the ultimate tool, and far less intimidating than most people seem to think (Lamport's book is an excellent intro), though for most purposes, Markdown is more than sufficient. In practice, I tend to use Markdown and either include inline LaTeX as needed, or convert to LaTeX and continue editing in that where finer-grained control is necessary.

I'd add pandoc to the toolkit, as well as GNU Make. With the two, I've got a standard makefile that can output a wide range of formats (I refer to them as "endpoints") ranging from ASCII text to standalone or snippets of HTML, PDFs, PS, MS Word, OSX, Mediawiki, and others. Adding a new endpoint is a simple matter of tweaking the makefile.

One piece of organisational advice: Do NOT apply your chapter numbers to your filenames. Instead, allow your principal document outline, using an include structure, to define the flow of the text. Depending on the project size and complexity, I'll either directly include chapters within that level, or have a top level of parts with chapters specified (as second-level includes) within those.

As you decide you need to re-work flow, this becomes much easier to manage and rearrange than if you'd pre-labelled the files themselves.
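
A minimal sketch of the outline-driven approach described above, written as a shell script rather than the makefile the comment mentions (that makefile is not shown, so this is an assumption about its general shape, not a copy of it). The file name `outline.txt`, the function names, and the output name `book.<format>` are all hypothetical. The only external tool assumed is pandoc, which concatenates multiple input files in the order they are given.

```shell
#!/bin/sh
# Sketch: the outline file, not the filenames, defines chapter order.
# The outline (hypothetical name: outline.txt) lists one Markdown file
# per line; blank lines and lines starting with '#' are ignored.

# Print the chapter files named in an outline, one per line, in order.
chapters_in_order() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Build one endpoint (for example html, docx, epub) from the outline.
# pandoc infers the output format from the file extension.
build_endpoint() {
    outline=$1
    format=$2
    # Word splitting is intentional here: this sketch assumes chapter
    # filenames contain no whitespace.
    # shellcheck disable=SC2046
    pandoc --standalone -o "book.$format" $(chapters_in_order "$outline")
}
```

With this layout, `build_endpoint outline.txt html` would produce `book.html`, and reordering chapters only means moving lines in the outline. A `pdf` endpoint would additionally need a LaTeX engine installed.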


TeX is necessary when you want to go to print and have a high level of quality, especially if any math is involved. For screen display Markdown is sufficient.


In addition to HTML for online viewing, I tend to produce PDFs or ePub formats for reading, either onscreen, on a tablet, or (very rarely) printed. I've discovered that consistent pagination really matters to me for content retention, an argument which still favours PDFs over many alternatives (though Postscript and DJVU offer similar capabilities).

Markdown is very nearly fully sufficient, and for almost any nontechnical work with minimal art or layout, will suffice. It falls flat in some interesting areas:

There's no underbar/underline markup.

There is no native colour markup.

There is no formula support -- not something typically encountered in most texts, but when you need it, you need it.

There is no fine-grained placement control for callouts, boxes, figures, images, etc. They simply appear where they happen to be dropped on a page.

(In several of these cases, you can revert to embedded HTML or styles, which are fine when rendering to HTML, but this won't be picked up by all Pandoc endpoints.)

Mind, if a work consists of nothing more than text, bold, italic, strikethrough, super/sub-script, lists, tables, sections, footnotes/endnotes, and images whose placement is not critical, Markdown is entirely sufficient.

But if you find you need more control, exporting to LaTeX and doing your final editing there will buy you a great deal more control.


I'm glad there's always LaTeX.

For any serious writing, Microsoft Word is one of the worst pieces of tools out there. It starts showing its ugly side when you start using anything remotely advanced.


> For any serious writing, Microsoft Word is one of the worst pieces of tools out there

Why? I'm no big fan of Word and doing any sort of layout in Word is painful, but for actual writing I don't see a problem. It has good tools for outlines, TOCs, reviews and comments, tracking changes, footnotes and citations (when combined with EndNote) and just about anything else I've ever needed. Hell the equation editor even lets you use LaTeX if you're into that sort of thing. What makes Word so terrible in your eyes?

And yes I wrote my Masters thesis and most of my other university work in LaTeX, so I know what I'm comparing it to.


A couple of things I've noted over the years:

- Fields, or anything that uses fields: They're completely broken. They only seem to function in the straightforward case. For example, it's very easy for sequences to go completely out of whack.

- File comparisons: this feature can only tolerate up to a certain number of pages. It will crash and burn for large documents. I sometimes wonder why this is even offered!

- Formatting: No matter how carefully paragraph spacing etc. is controlled there are always instances where things won't work as intended. A good example is cover pages.

- Hanging and freezing on large documents while Word figures out how to render them.

- Font rendering: it's never great. It never looks the same as what it would on a printer (irrespective of the printer). The Mac version does a decent job, but the Windows version, even with ClearType configured, is not great.

- Lastly, my biggest issue: feature incompatibilities between the versions of Word available on Mac and Windows. The Mac version (the latest O365 release) doesn't have a lot of advanced features that are available on Windows, such as document signing and style breaks.


Word is horrible, but then word-processors are generally a horrible paradigm for anything significant. (They can handle basic things, but they're such a hostile environment for creating text, I don't understand how so many novelists work in them.)


Maybe some people actually like a graphical UI that does not require learning arcane syntax or keyboard shortcuts, provides a passing resemblance to actual paper and allows them to be productive (if not proficient)?

As for "anything significant", even OP shows one file per chapter. That's very likely what most users of word processors do, I'd imagine, even though Word (for example) had master/child documents support in the late nineties.


No. I get that people think that, but it's a mistaken belief. Word processors are at least as arcane, but they disguise and hide their arcane bits. That is, word processors give an illusion of being easier, but really they end up much more complicated.

Graphical UI? There are dozens of text editors that work with TeX and most of them provide graphical UIs, with menus and buttons similar to a word processor. In fact, as soon as you want to do anything even mildly 'advanced', word processors end up being visibly more complicated.

In a word processor, I can click the 'bold' button, or press Ctrl-b to switch into 'bold mode', or highlight some text, and use the button/keyboard shortcut to bold that text.

In a TeX editor, I can also click the 'bold' button or press a shortcut to auto-create a LaTeX bold command `\textbf{}` with my cursor placed in-between the {}s. Alternatively, I can highlight text, e.g. 'my text', and click the bold button or press the shortcut and the editor will wrap `\textbf{}` around the text, producing '\textbf{my text}'.

Up to this point the two approaches are equivalent. But now say that I want to make all instances of 'important phrase' bold. With the TeX/editor approach, it's just like any other search and replace, I tell the editor to replace all instances of 'important phrase' with '\textbf{important phrase}'. In the word processor, I have to figure out how to click into an advanced search-and-replace and choose something about replace/add styles etc.

In LaTeX, for something complicated, I can figure it out and write my function(s) for it, which are easily re-usable. In a word processor, what one 'knows' in the case of doing something complicated is a series of mouse clicks through menus - which is not only more arcane than an explicit function, but is likely to be disrupted by version changes.
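
The search-and-replace case above can also be done outside the editor. A rough sketch using `sed`, under the assumption that the phrase contains no `/` and no regular-expression metacharacters; the function name `bold_phrase` is made up for illustration:

```shell
#!/bin/sh
# Wrap every occurrence of a phrase in \textbf{...}.
# Reads LaTeX source on stdin and writes the result to stdout.
# Assumption: $1 is plain text with no '/' or regex metacharacters,
# because it is interpolated into the sed expression as-is.
bold_phrase() {
    # '&' in the replacement stands for the matched text;
    # the four backslashes end up as one literal backslash in the output.
    sed "s/$1/\\\\textbf{&}/g"
}
```

For example, `bold_phrase 'important phrase' < chapter.tex > chapter.new.tex` would rewrite a whole file, where `chapter.tex` is a placeholder name. Note that running it twice would nest the command, so it is best applied once to a clean source.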


The vast majority of users will forever be beginners - they'll learn a few techniques and won't bother with the rest - simply because that's not the only thing they do. For such users, a canned, ready-to-understand UI that looks like paper is pragmatically better than providing the capability to do whatever they want at the cost of becoming a power user in word processing.

People use spaces for tabs and newlines instead of page breaks despite there being solutions for them even on mechanical typewriters. As long as it works for them, that's ok.


Making a full fledged latex document is as easy as making a plain text file with the following content:

\documentclass{article}

\begin{document}

Your whole life story goes here.

\end{document}

That's fscking difficult.


The real answer is products like FrameMaker and Oxygen XML.


Those look horrifying.


Only for those doing documents as if we didn't progress.

I was using LaTeX for university reports 20+ years ago.


Yes, being locked into some rigid, fragile proprietary system is such great progress.

There's a reason that LaTeX was used 20+ years ago and is still the gold standard: the other options really suck.


DITA and DocBook are not fragile proprietary systems; quite the contrary.

There is a reason LaTeX is barely used out of academia.


> There is a reason LaTeX is barely used out of academia.

That's simply untrue. Look at any serious programming books. And LaTeX is used as the back-end for quite a number of text-processing tools.


Those written by academics, published by academic related editors, in old style layouts, black and white.

Professional publishing has long ago switched to DTP tooling, able to handle layouts and colouring in modern presses.


That's not a change. 'Professional' publishing has long used (inferior) DTP solutions.


Where is the LaTeX superior color management solution for typography press?


Great, you've got colour management with lousy type-setting!


Yep, better tell the professional book editors about the big mistake they are making in their high-profile printing.


I've used DTP tools before too (though it's been a number of years), so I'm not unfamiliar with the differences.

To be fair, the use cases are rather different. Regular fiction books are not so complicated in terms of layout. Magazines generally involve much more complicated layouts - where there are questions of getting colours right, having text wrap, and so on.

But for technical work, it's insane to use anything not TeX-based (e.g. Scribble), even if you don't use TeX directly.


So DITA and DocBook are other names for FrameMaker and Oxygen XML? No, of course they're not. I don't know why you continue to switch the topic.


They are industry standards for document interchange of books and technical documentation, handled by FrameMaker and Oxygen XML, among many other tools that support them.


Was hoping that this would be using the original UNIX typesetting tools like troff. Also, surprised to see that asciidoc wasn't deployed over markdown.


i was recently turned on to asciidoc from seeing that it's supported at github

https://github.com/github/markup

and ironically development seems to have stalled on markdown with the advent of commonmark

https://github.com/commonmark/CommonMark/issues/558

https://github.com/commonmark/CommonMark/issues/559

https://github.com/commonmark/CommonMark/issues/560


Emacs Org-mode is also supported on GitHub and can be integrated into some pretty fancy workflows involving inline LaTeX rendered in Emacs. Org-Babel allows for a “notebook”-esque interface with embedded code samples, which show up non-interactively on GitHub.


I don't understand how these links indicate that markdown development has stalled.

All these issues were only created recently, by one person, and are actually not in the right place for the type of discussion they're trying to prompt. CommonMark have a separate [forum](http://talk.commonmark.org/) for feature discussion, which seems quite active.


Looking at https://spec.commonmark.org/ , it does seem it has stalled. They'll need to put in another gear if they're ever going to reach 1.0 (if that's a goal..?).


I like that you mentioned `bat` and `ag`, two of my favorite CLI tools.

Others you might want to check out, not necessarily for writing a book but for general CLI pleasantness:

- fzf (https://github.com/junegunn/fzf)

- autojump (https://github.com/wting/autojump)

- jq (https://stedolan.github.io/jq/)

- fd (https://github.com/sharkdp/fd)


Many interested in autojump could probably get what they want out of nice, pure https://github.com/rupa/z

Here's an alternative to fzf, for comparison's sake:

https://github.com/jhawthorn/fzy


If you use and like `ag`, I suggest taking a look at ripgrep (`rg`). It seems to be by far the fastest of the three (`ack`, `ag`, `rg`). And it has a pretty interesting codebase (written in Rust).


If you're working in a git repository then IMO the most appropriate search tool is simply `git grep`. I don't think there's any reason to use ripgrep, ag, ack etc in that situation. (Personally, if I'm working with text files, then I'm nearly always in a git repo.)


(author of ripgrep here)

Well at least one reason is because ripgrep is faster. On simple literal queries they'll have comparable speed, but beyond that, `git grep` is _a lot_ slower. Here's an example on a checkout of the Linux kernel:

    $ time rg '\w+_PM_RESUME' | wc -l
    8
    
    real    0.127
    user    0.689
    sys     0.589
    maxmem  19 MB
    faults  0
    
    $ time LC_ALL=C git grep -E '\w+_PM_RESUME' | wc -l
    8
    
    real    4.607
    user    28.059
    sys     0.442
    maxmem  63 MB
    faults  0
    
    $ time LC_ALL=en_US.UTF-8 git grep -E '\w+_PM_RESUME' | wc -l
    8
    
    real    21.651
    user    2:09.54
    sys     0.413
    maxmem  64 MB
    faults  0
ripgrep supports Unicode by default, so it's actually comparable to the LC_ALL=en_US.UTF-8 variant.

There are other reasons. It is nice to use a single tool for searching in all circumstances. ripgrep can fit that role. Maybe you don't know, but ripgrep respects your .gitignore file.


Thanks! I knew ripgrep was praised in particular for its performance but I didn't know the difference was that large. The repo I usually work in has 8.7M lines of code and I had been finding `git grep` performance very adequate (I use it in combination with the Emacs helm library where it forms part of an incremental search UI, and hence gets called multiple times in quick succession in response to changing search input.) It looks like it will be fun to try swapping in ripgrep as the helm search backend; I'll try it.


jq's a useful utility, but I'm curious as to how you're using a JSON query tool in writing.


Org mode and pandoc work quite well for a similar workflow. The ability to move around chapter trees in org mode is a godsend. It's crazy to see how far the art of "word processing" deviated from WordStar days. MS Word's proprietary doc binary didn't help either (people would mess up formatting and lose entire documents). It's nice to see the focus come back to content and streamlining production with reproducible formatting.


I love articles like this. I write quick daily notes on my computer in markdown and back them up in a GitHub repo. (Using this fun little script: https://github.com/dcchambers/note-keeper) It's worked really well for me and helps me easily synchronize my notes between systems.

I love the elegance and simplicity of plain-text notes.


vscode has these two phenomenal plugins[1] that together convert vscode into a true journal with basic check lists and the ability to add arbitrary markdown notes to any particular entry.

It works wonderfully as a programmer journal since I generally have vscode open anyway (for gitlens even when I'm working in intellij) the friction is close to zero.

[1] https://marketplace.visualstudio.com/items?itemName=pajoma.v... and https://marketplace.visualstudio.com/items?itemName=Gruntfug...


i wonder why he has the "wrapper over wc" which does just what wc does?

    #!/bin/sh

    total=0

    for FILE in `find . -type f -name "*.txt"`

    do
        wc -w $FILE
        words=`wc -w < $FILE | tr -d ' '`
        total=$(($total + $words))
    done

    printf "%'d" $total

    echo " words"
all this achieves is

    wc -w $(find . -type f -name "*.txt") | sed '$s/total/words/'
and frankly, i'm not sure the total->words substitution is worth the trouble.

then there's the inefficiency of running wc twice per file. while this is not exactly bitcoin-level disaster, it rubs me the wrong way...

    wc -w $(find . -type f -name "*.txt") |
    awk -v t=0 '
      # skip the "total" line printed by wc so it is not counted twice
      $2 != "total" { print; t += $1 }
      END { print t, "words"; }
    '
personally i'd just do this (in zsh):

    wc -w **/*.txt(.D)
the (.D) is two "glob qualifiers": the . (dot) limits the result to plain files, the D turns GLOB_DOTS on for the pattern.


Of possible interest is my open-source, Java-based desktop Markdown editor with live preview and variable interpolation.

* https://github.com/DaveJarvis/scrivenvar

* https://github.com/DaveJarvis/scrivenvar/blob/master/USAGE.m...

The software provides a simple way to include variables in technical documentation. It also integrates with an R engine for editing R Markdown files, which can also use variables sourced from an external YAML file. (Editing XML documents that have stylesheets is possible, too.)

My authoring workflow involves Scrivenvar, Markdown, pandoc, knitr, and ConTeXt. As Markdown separates content from presentation, I prefer ConTeXt to LaTeX for the same reason.


I am writing an introductory book on theoretical computer science in markdown and use pandoc to transform it into HTML, LaTeX (and from there to PDF) and MS Word. (The latter format is rather buggy at the moment, but I am including it because I've heard from visually impaired students that it is often the easiest format to read as you can control the font size.)

I've now put my scripts on https://github.com/boazbk/tcs/tree/master/scripts in case anyone finds them useful. (This is not a "plug and play" package that you can install and use, but people that are better programmers than me might be able to adapt it and improve on it.)


Was hoping to see some sed tricks, and would have settled for some vim, but I guess I need to stop being such a gatekeeping grump about stuff. Hell, maybe I'd like SublimeText if I tried it.


Looks like the author is just using some apps that happen to run on Unix.

I write all my books using vi (not vim!), troff and friends, make, and ghostview (gv) for the layout. Plus a couple of shell/awk/sed scripts for making the TOC, index, etc. I cannot imagine any better tools for the job. I tried LaTeX, which only got me into trouble, and Lout, which was fun, but too complex in the end. After 20-something books, the above turns out to be the sweet spot.


Looks like everything he does can easily be accomplished using emacs orgmode and pandoc

Ripgrep is maybe an alternative for ag


Any sufficiently advanced text management system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Org mode.


Or even just Org Mode.


Is a better ag.


> I could use a git repo to keep a backup of the book and ... ag (basically a faster grep)

IMO if you are using git then you should use `git grep` rather than ag/ripgrep/ack/grep etc.


On the "txt" script I want to note that this does the job for files like those in the example that have no whitespace:

  wc -w $(find . -type f -name "*.txt")
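
For filenames that do contain whitespace, one possible variant (a sketch, not the article's script) is to let `find` pass the names directly to `cat` instead of going through command substitution. The function name `count_words` is hypothetical, and this version prints only the grand total, not the per-file counts:

```shell
#!/bin/sh
# Print the total word count of all *.txt files under a directory
# (default: the current directory). Filenames with spaces are safe
# because find hands them to cat as separate arguments.
count_words() {
    find "${1:-.}" -type f -name '*.txt' -exec cat {} + | wc -w | tr -d ' '
}
```

Called as `count_words` or `count_words path/to/book`, it prints a bare number; the `tr` step only strips the padding that some `wc` implementations add.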


I will give bat a try. That looks nice :)


> I will give bat a try. That looks nice :)

Two of the three comments already referenced bat. It's also what impressed me the most. It looks great. Link below:

https://github.com/sharkdp/bat


Misleading title; I thought he was writing his book on an ancient AT&T UNIX mainframe. He's using Linux or MacOS, bfd. Why people throw around the word Unix to describe Linux is beyond me. MacOS claims it's Unix too; they paid some organization to get them a Unix cert, but we all know it's a BSD/Mach/Darwin rewrite. Nobody is really using the original Unix.


It's because most people don't confuse the etymology or history of a term with its actual use and current meaning -- and aren't stuck up with BS pedantic distinctions.

There's no "original Unix" (except the first Unix back in the day). There's a lineage of operating systems.

Heck, even the people who actually created UNIX in the 70s and early 80s don't have such hang-ups.



