The future of education is plain text (simplystatistics.org)
368 points by simplystats2 160 days ago | 334 comments

Plaintext certainly seems more attractive the more docs I write. Over the years, with both work and personal projects, I've used every format, including:

- Notepad
- Microsoft Word
- TWiki
- Various proprietary WYSIWYG editors that compile to HTML
- Raw HTML
- Markdown (several flavors)

With nearly every kind of migration, there are numerous pain points. The "raw" formats are a nightmare to edit and update, and the compiled ones require several hours of changing syntax, image locations, etc.

I've grown so tired of redoing things on different platforms that more of my docs now start as plaintext, with pseudocode markup for the areas I know will change on every platform (e.g. generating a table of contents, image tags, etc.).
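A minimal sketch of that idea in Python, assuming two made-up markers (`[[toc]]` and `[[img:path|alt]]`) that are not part of any real standard. The same source gets a different expansion per target platform:

```python
import re

# Hypothetical pseudocode markers, not part of any real markup standard:
#   [[toc]]            -> table of contents built from "# " heading lines
#   [[img:path|alt]]   -> an image, rendered in the target's own syntax
IMG_MARKER = re.compile(r"\[\[img:([^|\]]+)\|([^\]]*)\]\]")


def render(text: str, target: str) -> str:
    """Expand the placeholder markers for one output target ("html" or "md")."""
    headings = [line[2:].strip() for line in text.splitlines() if line.startswith("# ")]
    if target == "html":
        toc = "<ul>" + "".join(f"<li>{h}</li>" for h in headings) + "</ul>"
        img = r'<img src="\1" alt="\2">'
    elif target == "md":
        toc = "\n".join(f"- {h}" for h in headings)
        img = r"![\2](\1)"
    else:
        raise ValueError(f"unknown target: {target}")
    # Only the markers are expanded; the rest of the text passes through untouched.
    return IMG_MARKER.sub(img, text.replace("[[toc]]", toc))
```

Adding a platform then means adding one branch, rather than hand-editing every document.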

Having just coded an entire website from scratch that was basically just documentation, Markdown comes remarkably close to doing what I want, except when the common format fails to meet my needs, which forces me to then have to switch to a specific flavor of Markdown in order to get something as basic as tables.

The docs of mine that seem most resilient to platform shifts (other than plaintext) are the ones that are written in or compiled to longstanding formats like LaTeX or HTML.

So perhaps my takeaway is, write in something readable that compiles to something widely available. That will provide the least headache.

If you're interested in adding LaTeX to that list, you might like to try Overleaf [1] -- it's an online collaborative LaTeX editor we built to try to lower the learning curve (I'm one of the founders).

It includes a rich text mode for easier collaboration with non-LaTeX users [2], and you can also write in Markdown if you like :) [3].

Feedback always appreciated if you do give it a try, thanks.

[1] https://www.overleaf.com

[2] https://www.overleaf.com/blog/81

[3] https://www.overleaf.com/blog/501

Thank you so much for Overleaf; as a new user, I think it's great!

I really like how easy it is to experiment with the LaTeX formatting (something I wasn't super familiar with) and immediately see the output.

I had someone send me their resume template on Overleaf, and it was super easy to get a similar product with my personal touch.

The only feedback I would have is that it was a little awkward to make folders within folders (this was a few months ago), and I had to figure out the "put a path separator in the name" trick before I got it.

Thanks for the feedback - great to hear you've been finding it useful and yes, improving the file tree is definitely on our list.

Good luck with your future projects!

Overleaf was invaluable while I wrote my thesis. Thank you!

Congratulations on your thesis -- it's a lot of work, and I'm glad Overleaf helped :)

Have you tried Org files? Even if you're not into Emacs, they might be worth a try, given your needs.

Org mode format is great. Pandoc works with all of those formats, including org mode.


Pollen seems really cool.

I've found that Pandoc [1] is a wonderful converter if you want to write in a future-proof format (like markdown) but also generate nice PDFs.

[1]: http://pandoc.org/

I've been writing daily notes in the same text file since 1996. Fascinating to look back through old entries.

Format is dirt simple: date alone on a line is the only special element. The main thing I really miss is the ability to scribble drawings inline.
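Because the only structure is a date alone on a line, the file is easy to split back into entries. A Python sketch, assuming ISO dates (YYYY-MM-DD); the comment doesn't say which date format is actually used:

```python
import re
from datetime import date

# Assumption: each entry starts with an ISO date (YYYY-MM-DD) alone on a line.
DATE_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def parse_notes(text: str) -> dict[date, str]:
    """Split a single notes file into {date: entry text}."""
    entries: dict[date, list[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if DATE_LINE.match(stripped):
            current = date.fromisoformat(stripped)
            entries.setdefault(current, [])
        elif current is not None:
            entries[current].append(line)
        # Lines before the first date line are ignored.
    return {d: "\n".join(lines).strip() for d, lines in entries.items()}
```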

I just use markdown for what it's good at and 'drop' to HTML for the stuff I can't do in 'plain' markdown.

Python has a lovely library for it, and combined with Jinja you can get a hell of a long way with a couple of hundred lines of Python as a static site generator.

> Having just coded an entire website from scratch that was basically just documentation, Markdown comes remarkably close to doing what I want, except when the common format fails to meet my needs, which forces me to then have to switch to a specific flavor of Markdown in order to get something as basic as tables.

I just did essentially the same, in reStructuredText.

The battle is so lost that I don't even bother fighting it, but there are many nice things about reST: you have to explicitly escape HTML to get it passed through as HTML (feature/bug), tables are easy, especially the list tables, I'm used to it (feature? bug?), ...
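For reference, a reST list table is the docutils `list-table` directive whose body is a two-level bullet list, which also makes it easy to generate from data. A small Python sketch (the `list_table` helper is hypothetical, not part of docutils):

```python
def list_table(rows: list[list[str]], title: str = "", header_rows: int = 1) -> str:
    """Render rows as a reST list-table directive.

    Each row becomes a "* -" bullet and each further cell a "  -" sub-bullet,
    all indented three spaces under the directive.
    """
    out = [f".. list-table:: {title}".rstrip(), f"   :header-rows: {header_rows}", ""]
    for row in rows:
        first, *rest = row
        out.append(f"   * - {first}")
        out.extend(f"     - {cell}" for cell in rest)
    return "\n".join(out)
```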

AsciiDoc seems the way to go then.

It's like DocBook with Markdown-like syntax.

Btw, AsciiDoc can be used with gitbook-cli. It's not Sphinx, but it's still powerful enough for all kinds of documentation.

hmm..... having seen a bunch of demos of almost-compiling LaTeX produced by neural networks, I wonder if there's a good space for building seq2seq document format converters...

(also makes me wonder what style transfer from latex to markdown would look like; automatically convert your years-in-the-making research project into a snarky blog post!)

Plaintext / basic Markdown with an "organizer" like Ulysses [0] or Quiver [1] can serve just about all needs.

[0] https://www.ulyssesapp.com

[1] http://happenapps.com/#quiver

No Windows or Linux? Okay.

Ulysses (and probably Quiver too) can just give you a "pretty view" of your existing folder structure, and since it's all plaintext/MD you can work with your data on Windows/Linux just fine, and there's nothing keeping someone from making similar apps on the other OSes.

I used to use OneNote for everything, until I moved to Macs... Not only was there no Mac version at first, but OneNote STILL doesn't let you import old local notebooks into the Mac version.

With Ulysses/Quiver, at least I know that what I write on the Mac will still be readable on Windows/Linux.

>With nearly every kind of migration, there are numerous pain points. The "raw" formats are a nightmare to edit and update, and the compiled ones require several hours of changing syntax, image locations, etc.

And plain text is itself a nightmare to parse, or to do anything with beyond reading it as a block.

How is it harder to parse than a PDF or a docx?

I'd argue that it's hard to parse because it's so open-ended. E.g. Markdown is only one example of how to structure textual data in plain-text with additional "syntax".

Imagine it wasn't decided on Markdown. E.g.: I don't use markdown when making plain-text notes. I make it all hierarchical and indent using "- ". I.e. headings are top-level, sub-headings are indented by one, and so on. Eventually, you realize that all things can be represented as just lists of nested lists. No need for some arbitrary "heading" type, just nest it all based on the concept.
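That "lists of nested lists" convention is straightforward to parse, because indentation depth alone decides parent and child. A Python sketch, assuming consistent space indentation and skipping any line that isn't a `- ` bullet:

```python
def parse_outline(text: str) -> list[dict]:
    """Parse '- ' bullets nested by indentation into a tree of {'text', 'children'}."""
    root: list[dict] = []
    stack: list[tuple[int, list[dict]]] = [(-1, root)]  # (indent, children list)
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if not stripped.startswith("- "):
            continue  # this sketch ignores anything that isn't a bullet
        indent = len(line) - len(stripped)
        node = {"text": stripped[2:].strip(), "children": []}
        # Pop back to the nearest shallower bullet: that one is this node's parent.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((indent, node["children"]))
    return root
```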

Certainly, but once you obey some conventions (generally accepted or your own), it's easy to parse and transform.

I mean compared to structured file formats like JSON, YAML, TOML, XML and whatever (including binary such).

Obviously it's not harder to parse than a PDF.

Uh, "plain text" is not a pair of words that I would use to describe "structured data" at all. I don't even know what someone would intend to mean in such a situation.

If someone told me they needed me to parse a "plain text" file, to me it sounds like they'd want me to provide some statistics on an unstructured text blob.

Free form text does have properties, such as byte length, character count, word count, line count, sentence count, paragraph count, and so on. Most of which is simply variations on whitespace delimiters.

But an individual plain text file... all on its own? Could be a shell script? Could be a newspaper clipping? Could be a dictionary file? Could be an array of XYZ vertices, edges and faces? Could be all of the above, and an email signature at the end?

Someone tells me: "here have this plain text object," and I presume it is a monolithic blob of ASCII, if I'm lucky, and maybe it's War and Peace or Moby Dick expressed in emoji if I'm not.
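The whitespace-based properties listed above are the easy part of "parsing" an unknown text blob. A Python sketch (sentence counting is left out, since it needs real tokenization rather than whitespace splitting):

```python
import re


def text_stats(text: str) -> dict[str, int]:
    """Rough statistics for a free-form text blob, mostly via whitespace splitting."""
    return {
        "bytes": len(text.encode("utf-8")),
        "characters": len(text),
        "words": len(text.split()),
        "lines": len(text.splitlines()),
        # Paragraphs: runs of non-blank text separated by one or more blank lines.
        "paragraphs": len([p for p in re.split(r"\n\s*\n", text) if p.strip()]),
    }
```

Note that bytes and characters differ as soon as the text leaves ASCII, which is one reason "plain text" is less simple than it sounds.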

>Uh, "plain text" is not a pair of words that I would use to describe "structured data" at all. I don't even know what someone would intend to mean in such a situation.

Well, most would agree markdown is a plain text format for example, but it's not totally unstructured.

Still it's a bitch to parse compared to something like JSON.

I'm building course notes right now for a college class using Markdown & MathML/LaTeX, which I'll be hosting on GitHub Pages for the students to browse. It's by far the best balance that I've come across yet.

That's.... pretty much my exact experience (oh, I get to add multiple years of latex use).

HTML or plaintext is all I trust now. A nice editor on top of it and I'm good. If you have to do larger document writing, I recommend MadCap Flare.


what, recommending just one more format to learn? I don't think that's what the parent post is looking for : )

AsciiDoc/Asciidoctor is unambiguous (unlike Markdown), works with different workflows, lets you use different editors (unlike Org mode), and is easy to learn (unlike LaTeX). reStructuredText is the second closest, but you need Python for it to be useful, so as I see it AsciiDoc is the only plausible universal text markup.

I couldn't have said it better.


I'm going to second that.

reStructuredText is very powerful and has an official specification, which is basically what Markdown is missing. Yet it is plain text and easy to parse.

I like reStructuredText, but the format is grating compared to Markdown. The two things I do the most are sectioning and links. Inline links are OK in rST, but I prefer Markdown's. Sectioning, though, is awful. I hate having to hold the damn == or -- or ~~ down and getting the length right. It's easier to parse when you read the plaintext, but it makes me loathe writing it. Most of the time I deal with MD or rST, they end up formatted anyway, which is the point IMO.
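Getting the adornment length right (docutils requires the underline to be at least as long as the title text) is the kind of chore a small helper can take over. A Python sketch with a hypothetical helper name; it assumes narrow characters, since `len()` counts code points rather than display columns:

```python
def rst_heading(title: str, char: str = "=", overline: bool = False) -> str:
    """Build a reST section title whose adornment exactly matches the title length."""
    rule = char * len(title)  # assumes one column per character (no wide CJK glyphs)
    lines = [rule, title, rule] if overline else [title, rule]
    return "\n".join(lines)
```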

Personally I don't think sectioning is that bad, a little annoying when working with other people that don't do it, though...

... the easy and built-in extensibility is a huge plus though, as is Sphinx and the pretty good output writers for many formats. Sphinx btw. works with Markdown, too.

Quite possibly my biggest annoyance with rST/Sphinx is the hanging toctree issue, and poor support for multi-format images/figures.

The 100% offline search in Sphinx is somewhat of a mixed bag and doesn't always give good results, though it's fine for reference works.

obligatory xkcd: https://xkcd.com/927/

Plaintext OrgMode notebooks exhibit every benefit listed in Minimaxir's [0] post, and have the additional advantage of powerful editor integration beyond R code.

Here is an example of using the IPython kernel to evaluate inline Python code within an OrgMode document.[1][2]

More information on how to create multi-language notebooks with OrgMode Babel here[3]

[0] http://minimaxir.com/2017/06/r-notebooks/

[1] http://kitchingroup.cheme.cmu.edu/blog/2017/01/29/ob-ipython...

[2] https://github.com/gregsexton/ob-ipython

[3] http://orgmode.org/worg/org-contrib/babel/

Org format is great, but the designs of many of its more advanced features rely heavily on the Emacs editor's ability to collapse metadata. Once you've built a file that uses drawers, tags, some spreadsheets, and several levels of nesting, the resulting file can be too cluttered with metadata to be easily understandable in other plaintext editors.

Meh. Most of the metadata is actually not that hard to understand in source form.

Though, I think this is a valid concern (and I hate that). It is just like how people used to have markup in their documents to see what they were typing, but people are lured by anything that hides this markup. See, nobody actually likes typing \bold{hello} instead of just highlighting and bolding. What's more, people just want the markup to be hidden.

Orgmode shines not because it hides anything. But because it made a very coherent set of macros to type the markup that I want to use. That and source blocks. (Ok, mainly source blocks.)

> the resulting file can be too cluttered with metadata to be easily understandable in other plaintext editors.

This is basically why I switched to Asciidoc. And if you need to collaborate with others you can point to different editors that people can use, there is even AsciidocFX which is made specifically for new users.

Not using those other editors is an added benefit.

Not unless you work alone. When you work with other people, it's important to use a common format for the benefit of the group. Trying to force other people to use emacs won't win you any friends.

Org exports to anything including utf8 plain text with a nice style reminiscent of IEEE RFCs.

Without using Emacs Org-mode however, you're stuck with something that only you can properly enjoy and edit, so everyone who doesn't like Emacs is SOL.

Using emacs is a giant drawback however.

If you are more of a vim person and don't like emacs modifier keys, I suggest checking out spacemacs.

Oh no, I'd rather use a modern editor. Both are cults that need to die.

There is an open source Javascript package for OrgMode file parsing and converting here[0]

Example demo javascript webapp[1]

Orgmode Vim[2]

OrgMode IntelliJ [3]

OrgMode Atom [4]

OrgMode Sublime [5]

OrgMode VSCode [6]

[0] https://github.com/mooz/org-js

[1] http://mooz.github.io/org-js/

[2] https://github.com/jceb/vim-orgmode

[3] https://github.com/skuro/org4idea

[4] https://atom.io/packages/organized

[5] https://github.com/danielmagnussons/orgmode

[6] https://github.com/jsonreeder/vscode-org-mode

Unfortunately org-mode is not backwards-compatible and there have been many changes which break things unnecessarily.

For example, I once attempted to run this org-mode "notebook", ironically titled "Reproducible Research with Emacs Org-mode", and found I had to make significant cosmetic changes to get it to build: https://github.com/eliask/orgmode-iKNOW2012

Maybe things have improved since, but backwards compatibility is important for these kinds of formats.

Plain text: so that no one can own the presentation method.

Plain text: so that no one can own the distribution method.

Plain text: so that no one can own the creation method.

Plain text: so normal people can recover data even when partially corrupted.

Plain text: so you aren't forced to see jarring ads.

Plain text: so that there are no tracking pixels.

Plain text: because connecting information with hyperlinks doesn't require all of HTML or even computers.

Plain text: because it's good enough for metadata.

My future and knowledge is in YAML-fronted markdown and YAML metadata for binaries. Let's take back our data. Look out for Optik.io.
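YAML front matter is conventionally a block delimited by `---` lines at the top of the Markdown file. A stdlib-only Python sketch that handles flat `key: value` pairs only; lists, nesting, and quoting would need a real YAML parser:

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited front matter from a Markdown body.

    Only flat 'key: value' lines are understood; this is not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter: treat everything as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])
```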

Started with a Stallman-esque rant I could get down with and then ended in a very un-Stallman-esque and opaque ad. :(

I was half expecting it to end in a plug for GOPHER.

Ok fair, this is the first time I've gotten public interest! Awesome. Please hold shortly, website incoming...

(Just for reference, I do want to build a business on open data, but not by owning your data. I would like to help people manage their data.)

EDIT: Website is live! Try https://www.optik.io until DNS catches up, if optik.io is not responding for you.

Plain text: so when Optik.io melts down immediately from an HN "hug of death" one can use high-availability services instead.

There's nothing to melt down, there is no (public) website there. I can barely get a conversation started on this topic without HN's cynicism, as you demonstrate.

But as evidenced by this and other articles, as everyone is getting fed up with the current state of data, people are coming around to the idea.

> There's nothing to melt down, there is no (public) website there

You might want to be a bit more clear in your communication. When someone links a URL/domain, you'd expect that there's something there.

I mean, why not put up a splash page on your domain, telling something about your product (what is it? Something with plain text? A Markdown-based static site generator, maybe? Who knows.), or maybe a form to subscribe to a mailing list.

Did you think people would put that domain in their bookmarks and re-visit it themselves?

Thanks for your feedback, HN does not pull any punches. I've been pitching this slowly for a while as I honed the product.

I want to help organize data using an open format. I think user control of presentation is crucial as AR and VR work further into daily life -- especially considering accessibility.

Splash page is now up, including the product description, and form to subscribe to a mailing list. Please check it out, I'd love to hear what you think!


There's no product description whatsoever...

How do you plan to monetize plain text?

As places to hustle for your project go, HN is probably one of the gentler ones. Just build first, THEN plug.

> There's nothing to melt down

"This site can't be reached. optik.io took too long to respond."


The thing about jarring ads is a great point. I have lynx installed for those times when I've been jarred by one too many. It can be refreshing to navigate the web in a console window.

Nope, because plain-text is incompatible with centuries of mathematical notation.


My wife is a math teacher, and the piss-poor experience of writing mathematical notation outside of MS Word keeps her department stuck on Word. As much as she and I love LaTeX's math notation, she can't get her colleagues on board with that kind of syntax.

They need to maintain notes, guides, tests, quizzes, etc. and basically running a team OneDrive is really the only option because all the alternatives utterly fail a group of non-technical mathies.

Fun fact!

Microsoft Office has two internal math formats. One is Ecma Math[0]; the other is the "Unicode Nearly Plain-Text Encoding of Mathematics"[1], which is published by Unicode as a technical note (UTN #28) and uses only standard Unicode characters. It has the unfortunate property of being hard to type on its own (the integral character isn't on most people's keyboards), but it is pretty easy to read as plain text. If you copy an equation out of Word, you'll get something like this: ∫(x^2/2)

[0] https://blogs.msdn.microsoft.com/murrays/2006/10/06/mathml-a...

[1] http://unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf

Disclaimer: I work @Microsoft improving on some math features, and we are the main implementors of the spec. Which makes me sad, since it is an open spec and it is really powerful!

Funny that it's not ∫(x²/2), though!

OK, I went and looked at the standard, and it seems like they've thought about this: they allow input as either ∫(x²/2) or ∫(x^2/2), but they apparently output ∫(x^2/2) because it's more general and editable, e.g. if you wanted to change the exponent to something that didn't have a predefined Unicode superscript.
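The one-way prettifying step described above can be sketched in Python for the simplest case, plain digit exponents. This only illustrates the idea; it is not the UnicodeMath build-up algorithm or Microsoft's implementation. Exponents with no predefined superscript form stay in `^` form:

```python
import re

# Unicode superscript digits (¹ ² ³ are in Latin-1, the rest near U+2070).
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

# A '^' followed by a run of ASCII digits, unless the run continues as a decimal.
DIGIT_EXPONENT = re.compile(r"\^([0-9]+)(?![0-9]|\.[0-9])")


def prettify_exponents(expr: str) -> str:
    """Rewrite x^2 as x²; exponents without a Unicode superscript form keep the '^'."""
    return DIGIT_EXPONENT.sub(lambda m: m.group(1).translate(SUPERSCRIPT_DIGITS), expr)
```

Keeping `^` as the stored form and prettifying only for display matches the editability argument: `x^n` or `x^2.5` survive unchanged.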

I like that, except I prefer "unnecessary" parentheses to be explicit about the order of operations.

That is, assume only the precedence of + - and * / is unambiguous, and be explicit about the ordering of everything else.

EG: ∫((x^2)/2)

It's alright, somebody has to do it first. Though you shouldn't be surprised that a specification written by an employee at Microsoft is still primarily implemented at Microsoft.

Maybe it could be integrated into the Unicode shaper (HarfBuzz/Uniscribe/AAT) with a language code for "maths".

Since you're here, is something like that relatively easy to use in a UWP application? I have an idea for a chemistry app and am looking for a good way to show formulas.

> Since you're here, is something like that relatively easy to use in a UWP application? I have an idea for a chemistry app and am looking for a good way to show formulas.

Unfortunately not! I am wondering if I should make requests to the Windows team internally; the managed wrapper class they ship has the math features compiled out. :( I'm in Office, where we use an internal, UWP-safe C++ version of Rich Edit. We bundle it with our AppX and load it like any other C++ DLL.

So in other words, I shouldn't hold my breath for this to appear on NuGet. That's a shame. It seems like it would play nicely with Markdown.

I fired off an email to the team responsible. :)

Please post over on https://wpdev.uservoice.com/ and get others to upvote! If internal and external asks line up, it becomes much easier to argue in favor of doing a feature.

Ugh ugh ugh no.

Storytime. A couple years ago, I was in the midst of copyedits on a book with a bunch of math in it. The copyeditors were using a different version of Word.

When they sent the edits back to me, the math was gone. Completely.

Not only that, but when I tried to copy-paste the math in from a prior draft, the Word file refused to save.

Eventually, I had to reconstruct every one of the damn things, by hand.

(Happy consequence: I caught and corrected an error doing so. But still!)

That would never happen in a plain text format. And this was the experience that made me abandon word processors for good.

"What You See Is Not What Others Get"

Well, one advantage of OneDrive is you have full history.

Then try AsciiMath[0]. It does a fine job of representing math as plain-text, and I find it far more sane than LaTeX for my typical use cases.


For example, a matrix whose rows LaTeX writes as

    0 & 1 \\
    1 & 0 \\

is written in AsciiMath as

    [[0,1], [1,0]]

[0]: http://asciimath.org/

Neat! I wonder if AsciiMath could be extended to an "intermediate form". Just as there are two forms of Markdown headers, there's the easy-to-type

    # this is a header

and the more elaborate

    this is a header
    ================

which conveys more "headerness" and is more readable than the first.

The same could be done with something like AsciiMath, where a plain-Unicode representation could be an intermediate form between ASCII and MathML. Why? Because keyboards don't type in Unicode, but you don't want to store only the final output. Storing an intermediate Unicode form seems best, assuming you can keep modifying it using ASCII and then go ASCII + Unicode => Unicode.

Try LyX. It's a front-end for LaTeX. Its UI for math notation is very similar to MS Word's. (IIRC; I used them about a decade apart.)

Something like Overleaf [1] might help -- it's an online collaborative LaTeX editor with a rich text mode [2] to help make it easier for non-LaTeX users to collaborate on LaTeX docs. If you do use it, feedback would be appreciated -- I'm one of the founders :)

We also published a short post on 'the stoic resilience of the PDF within the digital ecosystem' recently [3], which seems relevant...although it's just a short background piece.

[1] https://www.overleaf.com

[2] https://www.overleaf.com/blog/81

[3] https://www.overleaf.com/blog/509

> piss-poor experience of writing mathematical notation outside of MS Word

Mind blown. I only write a little math, and I can't imagine my first choice not being MathJax/LaTeX in a plain text file.

I use Microsoft OneNote to take notes. The math notation is almost exactly the same as in LaTeX, and you basically compile it on the fly with the spacebar. <ctrl>+'=' toggles math mode on and off on the Mac. There are some hiccups, especially with OneNote keeping track of fonts (in my experience), but you get used to them.

This is mostly a social problem, not technical. It's a shame, as MS Word has at best mediocre support for typesetting mathematics.

TeX users would like a word. Of course, that requires learning TeX notation for those reading it without an interpreter.

TeX is a joke.

I mean, it's not a joke for its intended purpose; that's fine. It typesets the #$@! out of documents. But TeX is a joke for mathematical notation in particular. Put another way: the best way to understand what an arbitrary mathematical expression in LaTeX really means, is to render it as an image and read that image. TeX can be understood best as a concise way of writing a certain class of vector images, and when you are reading TeX you are reading a computer program which generates an image.

I'm not saying it can't be used for this context, of course it can, I have used it a ton and found it quite enjoyable. The fact that it's an image-based representation makes it very easy to switch from `\int_A dx~\int_B dy~f(x,y)` to `\iint_{A\times B} dx~dy~f(x,y).` However let's not mistake the fact that TeX does not know and does not want to know how you are using the `\int` and `\iint` symbols, is 100% OK with omitting those `~` characters, and has no semantic conception of what `dx` and `dy` are. If TeX were the CAS that it doesn't claim to be, as far as it's concerned that expression could be canonically refactored to `\iint d^2fxy(x,y),` since no one wrapped the `dx` and `dy` in curly braces. TeX claims to be a typesetting and layout system, and it does that well; it's not trying to be a universal mathematical notation.

The contention of the original link posted is, all of these image-based formats like PDF and lecture videos are going away. This may or may not be true, but if it is true then TeX is not going to survive the death of images, precisely because it is a programming language for a class of images.

Maybe something else will survive. The biggest player right now is the Wolfram language, of course, but that can look terribly unwieldy too.

Very well said. I only use TeX over Word for writing math because emacs 1. makes writing it easy with macros and helper functions, and 2. is able to display the rendered visual snippets inline in the text. If I could not have the second feature, TeX would be borderline unusable, and this dependence on editors and tooling means it's not truly open. Can't envision what could be better for mathematical notation in plain text, though.

I've been using LaTeX for about 5 years now. I don't recommend picking up TeX notation for reading maths in plaintext. It's the nicest way to produce nice looking mathematics PDFs or MathJax, if you have a powerful text editor. And after a while, you learn things like "underscore is for subscript" and "^ is for superscript".

But the point is, mathematics written as a plaintext document needs an interpreter, unless you have been writing papers with math notation for some years. But it is still far more cumbersome (and looks ugly) than writing formulas by hand. And that's not exactly plaintext learning material anymore, then, even though the sources are text files (unlike Word documents).

My ideal workflow (the one I dream about) would be a document-camera-like setup that parses math I write on paper (or a blackboard) into a LaTeX- (or MathML-) style format and almost immediately renders it as a beautiful document (with MathJax or a similar tool). Like Overleaf, but without typing. (And of course I could then open the source file in a text editor to make edits.)

Not sure if you are aware of detexify [0], but it recognizes my trackpad-scratch quite well. The backend and all of the training data are open source, so you could indeed implement this yourself if you wanted.

I'd buy any mobile LaTeX OCR app.

[0] http://detexify.kirelabs.org

Yes, detexify is amazing! That's probably how I got the idea in the first place... "Imagine if detexify would automatically read all of my math writing, not just the symbols I draw one by one with a mouse."

TeX is completely avoidable. I have never used it apart from at university decades ago, before I learned Word. I have done proper science jobs where papers get written but still not needed TeX. Furthermore, in all my years surfing the internet, I have never downloaded a TeX file willingly or unwittingly.

It varies by field. Try publishing in Math, CS, or Physics without using TeX.

Yeah, go teach a 60-year-old math teacher to use TeX.

I'm over 60 and have been using TeX since '85. Same manual after all these years... I still seem to be able to learn new stuff, though it's not always clear it's worth the bother.

So I'm guessing you were 28-38 when you learned TeX, rather than 60+.

If you were over 60 in 1985 and are still posting to HN, that would be really cool.

Indeed, I was 30 and my brain was still plastic. Really, the only notable thing in my post was the fact that I'm still using the same LaTeX manual after all these years. Kinda ragged now, but still useful.

I will be 60 in a few months. I am a math[s] teacher. I have been using TeX since late 80s/early 90s when I started using small computers.

why would someone with 40+ years of math experience have such a hard time learning a new notation?

I don't think their assertion is that someone with 40+ years of math experience would have difficulty learning new notation; it's that LaTeX is composed of a lot more than straightforward notation. LaTeX can get pretty complicated pretty quickly. While its basic math notation is quite fantastic, it's the basic document formatting that I think would give most people trouble when learning.

Because LaTeX's error messages are truly awful, and even just putting an underscore or a less-than sign in plain text is painful.
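Most of that pain comes from a fixed set of characters that are special in LaTeX text mode. A Python sketch of escaping them (the helper is hypothetical; the replacements are standard LaTeX commands):

```python
# Characters that are special in LaTeX text mode, mapped to safe replacements.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    # With the default OT1 font encoding, a bare '<' or '>' prints as '¡' or '¿'.
    "<": r"\textless{}",
    ">": r"\textgreater{}",
}


def latex_escape(text: str) -> str:
    """Escape plain text for LaTeX text mode, one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)
```

Escaping one character at a time avoids double-escaping, e.g. re-escaping the backslash of an already inserted `\&`.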

Assuming they have something that already works for them, lack of motivation.

Because knowledge about maths is independent of computer and programming literacy.

if we use a liberal definition of "computer and programming literacy", then it's just as much a requisite for math as it is TeX.

if we use a conservative definition of "computer and programming literacy", then it's not a requisite for either.

markup languages are not programming languages. most people have some familiarity with some markup language.

The 60-year-old math professors at my faculty read and wrote LaTeX perfectly fine. One time I helped find errors in the manuscript of a book one of my professors wrote in LaTeX by himself. He was 79 years old at the time.

Actually, the standardization of most symbols from mathematics and logic that we use today happened in the 20th and late 19th centuries. In my opinion, with UTF-8 one can represent most formulas in a very readable way, and also many diagrams. But if more expressive power is required, plain text TeX is available.

If you distribute things in Markdown with (e.g.) MathML markup, then the reader can format that nicely for display, with PlainText / Markdown being the distribution format.

I think it really stretches the definition of "plain text" by the time you need a parser.

I don't think it does, as I thought the title referred to the ubiquitous "view source" web tech, as opposed to binary blobs.

Unicode has some symbols. But there is limited ability to arrange them freely. e.g: representing a continued fraction in plain text seems a very futile exercise.

The "Unicode Nearly Plain-Text Encoding of Mathematics" standard[0] can do this just fine. :)

a_0+1/(a_1+1/(a_2+1/(⋱+(1/a_n ) )))

Paste that into an editor that accepts the spec, and you'll get something like http://imgur.com/a/7hBwv

Disclaimer: I work @Microsoft improving on some math features, and we are the main implementors of the spec. Which makes me sad, since it is an open spec and it is really powerful!
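The linear form above is regular enough to generate mechanically from the terms [a_0; a_1, ..., a_n]. A Python sketch (function names are hypothetical) that builds the string and also evaluates it exactly with `fractions.Fraction`, assuming non-negative integer terms:

```python
from fractions import Fraction


def continued_fraction_linear(terms: list[int]) -> str:
    """Write [a0; a1, ..., an] in the linear style a0+1/(a1+1/(...+1/an))."""
    expr = str(terms[-1])
    for a in reversed(terms[:-1]):
        # Parenthesize the denominator unless it is a single (non-negative) number.
        inner = expr if expr.isdigit() else f"({expr})"
        expr = f"{a}+1/{inner}"
    return expr


def continued_fraction_value(terms: list[int]) -> Fraction:
    """Evaluate the same continued fraction exactly, from the innermost term outward."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value
```

For [1; 2, 2, 2] this gives `1+1/(2+1/(2+1/2))`, which evaluates to 17/12, an approximation of √2.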


PDF and OOXML are also open specifications.

Try to implement them though, you will run into a lot of issues.

I'm curious about that one - was the space necessary?

I just tried it out in Word, apparently it isn't. :)

Subscript and superscript are the bigger problem than fractions, also larger-than-one-line parentheses. Fractions can be aped in the ugly approach of dashes just like how Markdown underlines headers.

Matricial data is quite hard to explain well in a non-graphical mode, though.

To be honest, human math notation is a weak type system combined with confusing single-letter variables.

Some kind of plain text math notation using numpy expressions would be cool.

The single letter variables are opposite of confusing: they bring simplicity and clarity. This is not programming, where if an integer represents a destination address, you can meaningfully call the variable destination_address (or destinationAddress or dest_addr or whatever). In math, an integer is often just any integer, and rewriting formulas to replace "i" with "any_integer" will not make them any clearer.

If math notation is unclear to you, it's only because you have spent little time learning and using it. Mathematicians care about clarity even more than you do, as they actually spend large parts of their lives reading and writing math. They are very quick to adopt new notation if it brings meaningful benefits over the old one. For example, commutative diagrams and the language of category theory are now commonplace in all fields of algebra and topology, because it is much easier to draw a diagram and claim it commutes than to name all the maps involved and write down all the equalities.

Markdown (Pandoc-compatible) plus MathJax notation within dollar signs is future-proof: easily convertible to HTML, PDF, etc., and easily readable in raw form.
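A minimal sketch of picking out the dollar-delimited inline math in such a file, assuming Pandoc's `tex_math_dollars` conventions: the opening `$` must be followed by a non-space character, and the closing `$` must be preceded by a non-space character and not followed by a digit, so that "$20,000 and $30,000" is not treated as math. The regex and the name `inline_math_spans` are illustrative, not Pandoc's actual parser; display math (`$$...$$`) and escaped `\$` are not handled.

```python
import re

# Approximation of Pandoc's inline-math rule (an assumption, not its parser):
#   (?!\s)   opening $ followed by a non-space character
#   (?<!\s)  closing $ preceded by a non-space character
#   (?!\d)   closing $ not followed by a digit
INLINE_MATH = re.compile(r"\$(?!\s)([^$\n]+?)(?<!\s)\$(?!\d)")


def inline_math_spans(markdown_text):
    """Return the TeX source of every inline $...$ span, in order."""
    return INLINE_MATH.findall(markdown_text)


print(inline_math_spans(r"Euler: $e^{i\pi} + 1 = 0$."))
print(inline_math_spans("It costs $20,000 or $30,000."))  # no math here
```

The raw source stays readable because the TeX between the dollar signs is left untouched; MathJax or Pandoc does the rendering later.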

Then, mathematical notation should be simplified.

Perhaps we should also all learn Esperanto as well, since English is such an inefficient and inconsistent means of communication.

There's been some work on English-language versions of mathematical expressions, which at the very least can help with accessibility.

Handbook for Spoken Mathematics (1983) http://web.efzg.hr/dok/MAT/vkojic/Larrys_speakeasy.pdf

MathPlayer https://www.dessci.com/en/products/mathplayer/

TalkMaths http://talkmaths.sourceforge.net/

Language and Mathematics: Bridging between Natural Language and Mathematical Language in Solving Problems in Mathematics http://file.scirp.org/pdf/CE20100300008_45591409.pdf

I am not sure that is a good idea. The reduced entropy of our writing system, as easy to learn as it is, doesn't work very well with the density of mathematical formulas. If anything, I would favour more symbolic formulas.

Have you tried? You start to run into trouble the moment you have fractions and exponents.

There are plain text versions for that as well.

Not sure it's so simple. Is LaTeX plain text? Officially it is, but by the same standard so is PostScript or PDF.

Truly the limits are blurry. Even assembly code has an ASCII representation.

Well, plain LaTeX is quite readable, so IMO, it's good enough of a compromise. It does get difficult with a lot of parentheses though.

Which example did you have in mind?

http://asciimath.org/ is the best one I've seen, by far.

AsciiDoc with inline LaTeX?


You need to be a little more detailed with this.

As others have mentioned, once you dig into the innards and nuances of plain text representations and formats, things can get hairy. Still, I think the author is correct in that plain text formats are certainly a better base for sharing curricula, and for knowledge production in general, than something proprietary like Word docs, PDFs, etc.

I think markup languages like markdown which are both fairly easy to convert into other formats and deliciously human readable are the way to go.

PDFs are still ok though. There are many implementations of PDF parsers, many Open Source. It may not be as "universal" as plain-text, but it is definitely universal. At this point, I believe that all major operating systems ship with the ability to open PDFs for display.

PDFs presume that the writer has "control" rather than the reader. Actually I probably shouldn't put quotes around "control" -- formatting is rigidly constrained, on purpose.

In addition PDF adapts, by design, a model of paper to the web. It's a "horseless carriage" file format.

Font size, page width, cut/paste and presentation in general should be the reader's choice, not the writer's. The Web manages this, sort of.

The OP is right on in this regard. Even the TeXs of this world, while better than the binary formats, have upgrade complexity.

>should be the reader's choice, not the writer's

Sometimes. There are a lot more design options available if the creator of the content maintains control over layout, fonts, etc. Sometimes this doesn't matter--if it's a block of text for example. Fairly simple layouts also render pretty well on the web.

Different content works better or worse with different approaches. One isn't intrinsically superior.

On the other hand, how does one write a standard invoice or a tax form using only plain-text / markdown? Formats with control over the layout have their place.

Maybe not just writer/reader, but also a presenter. In the context of education, a teacher will often be using someone else's material but presenting in their own style.

"PDFs are still ok though."

They are OK in theory, for the reasons you state.

The problem is the gratuitous use of PDF which we have all experienced - here's a common (pathological) example:

Document author starts with plain text: no special formatting or fonts, no images, etc. Somehow their toolchain converts that into a PDF file that contains no text, but rather an image of text.

The result is a big, bloated, unnecessary use of PDF that cannot even be parsed or used with anything but a graphical PDF viewer since the text is now gone - there is nothing but a picture. Of text.

A PDF without text still kind of works. It's not accessible, which is a big problem. However, this happens when there is a mistake somewhere in the chain. That is all too common, but what is the correct way to handle tables, diagrams, formulas, images, graphs, and links in plain text? There is no right way to do these things in plain text. You will eventually need HTML or PDF for education.

Also: PDFs have pages and thus page numbers, which are an important tool. When working with long texts you need reference points to share your state.

That's what hyperlinks are. They even have the better property that they don't break if you change the font size for the whole text.

They don't mean much in a classroom of, say, 30 or 40. A lecturer will have n people with n texts and will have to synchronise them. In a typical lesson you also jump between pages non-sequentially. The best option is a PDF or similar that you can optionally print. Anything else is a recipe for chaos, especially if you're not in a CS classroom.

Are they? I can't even copy text from a PDF without the new lines and indentation getting completely messed up never mind the rest of the formatting...

PDFs are awful for the blind, most PDFs are very painful to extract text from.

    PDFs are awful.
Could have stopped right there.

I agree that PDF is effectively universal as a distribution format for graphical and formatted text. There are some cases where plain text is still nicer, like when trying to read documentation without an X server.

PDF is a terminal format.

Even Pandoc gives up on trying to read it. To be fair, though, I think that the office suite file formats are even worse.

I have done a lot of work with code for handling PDFs.

The spec is 1000+ pages, references other docs, has many omissions, and contains much that is apocryphal, or at least wildly inaccurate.


right now, today, tuesday june 13th, 2017, safari will not open a .pdf to a specific page or bookmark, neither on the desktop nor in ios.

chrome can. firefox can. but safari cannot. which means neither the iphone nor the ipad can do it.

imagine if -- on the web -- you couldn't deep-link to an anchor in the middle of a webpage, but merely to (the top of) the webpage itself.

that's not the only deficiency of the .pdf format. it's not even the most galling one. it's just the one that happens to be hamstringing a certain project of mine at the moment. and it's illustrative.
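For reference, the usual convention for opening a PDF at a given page is the `#page=N` URL fragment described in Adobe's "PDF Open Parameters" document; whether a given viewer honours it is up to the viewer, which is exactly the complaint above. A minimal sketch of building such a link with the standard library (the helper name `pdf_page_link` is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def pdf_page_link(url, page):
    """Return url with its fragment replaced by the open parameter page=N.

    Only builds the link; it cannot make a viewer that ignores the
    fragment jump to the page.
    """
    if page < 1:
        raise ValueError("PDF page numbers start at 1")
    parts = urlsplit(url)
    return urlunsplit(parts._replace(fragment=f"page={page}"))


print(pdf_page_link("https://example.com/notes.pdf", 12))
```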

PDFs seem to render poorly on ebook readers.

Mainly because the layout code (Postscript?) in the PDF file assumes an A4 or US letter sized page. Most PDFs just cannot handle being reflowed.

> Mainly because the layout code (Postscript?) in the PDF file assumes an A4 or US letter sized page

Technically, it could be anything. The layout is specified, and everything stems from this.

ebook readers yes. But they can work quite well on full-size tablets.

Kindles etc. are great when you're mostly reading a flow of text. For anything that benefits from design layout (positioned graphics, sidebars, footnotes, etc.), PDF on a 10" tablet is often better.

Ideally you'd provide a PDF as the most convenient format, but have a LaTeX or similar root file that could be processed into other formats, like .mobi or .epub.

> I think markup languages like markdown which are both fairly easy to convert into other formats and deliciously human readable are the way to go.

I believe that MediaWiki, AsciiDoc or LaTeX are particularly well-suited for this purpose.

MediaWiki is already widely known and widely used for knowledge accumulation, namely, in Wikipedia. The downside is, of course, that this wiki language has quite some limitations.

AsciiDoc is a well-designed language with a stable definition for years. (Compare this to Markdown: Are you using the original Markdown? Or a fork of some fork of Markdown?) Also, AsciiDoc can be converted to beautiful HTML as well as beautiful PDF. Also, the clean definition of AsciiDoc means you have no trouble with nesting. For example, in a table cell you can put everything: enumerations, code listings, and so on. You can even put a new table within a table cell if you need to.

LaTeX is the de-facto standard in mathematics and parts of computer science, and has proven to be a stable standard, too. For example, arXiv doesn't accept your generated PDF as a black box; they want your LaTeX source and generate the PDF themselves. (That way, they can, for example, automatically produce PDFs with hyperlinks from documents which originally had no hyperlinks.) The downside is, of course, that LaTeX is not as readable as a plaintext-like format.

So for any "serious" / rigorous documentation purposes, either AsciiDoc, MediaWiki or LaTeX are the way to go.

The big upside to Markdown is that unprocessed readability is fully first-class; if something doesn't provide at least some utility when reading the raw document, then it doesn't exist. Of course, that's also one of its downsides.

Markdown has CommonMark, and GitHub Flavored Markdown, which is based on it.

GitHub is widely used too, and since it's used alongside code, GFM is likely to appear in more places than just online wikis.

AsciiDoc has a serious adoption problem compared to Markdown.

Why is that? Given the comparatively low quality of Markdown (as a language), isn't GitHub the only reason why Markdown gets pushed so much? If GitHub chose AsciiDoc instead of Markdown as their base, we would be in a better position now.

> Given the comparatively low quality of Markdown (as a language)

Markdown stresses ease of readability. I don't find the language low quality at all, just limited. Limited isn't necessarily bad, depending on its purpose.

> isn't GitHub the only reason why Markdown gets pushed so much?

Reddit's use seems to significantly predate Github, and I would say reaches a far greater audience. Unique users visiting in April 2017 is 1.285 billion[1]. Some fraction of that is probably unique people (given anonymous desktop and mobile usage), but given how large the number is, I imagine it's still a very large number.

> If GitHub chose AsciiDoc instead of Markdown as their base, we would be in a better position now.

I agree that Github would be in a better place now, but I don't think that would really have changed anything for anyone else. Even if you want to make a case that Github would have influenced programmers who would have then used it in other projects, I think you need to account for Stack Overflow also. I think it was arguably much more popular and used by a far wider audience than Github for a long time (and may still be, I'm not sure).

Markdown was used widely because it was simple, and users would actually bother to learn and remember the very few options they had. People are more used to it now, and at this point, sure, they might accept something more complex (especially if it built on the rules they already internalized), but I don't think we can blindly assume they would have accepted something more complex.

1: https://www.statista.com/statistics/443332/reddit-monthly-vi...

Edit: s/Unique IPs/Unique users/

You act as if AsciiDoc is difficult to learn. It seems mostly equivalent to Markdown, just nicer in a lot of small ways.

AsciiDoc is much more complex, in that it supports many more things. Check out the user guide[1]. Markdown takes a couple paragraphs to describe. You can know everything there is to know about using markdown within a couple minutes at most.

1: http://asciidoc.org/userguide.html

Isn't that comparison unfair, given that AsciiDoc gives you more functionality in a curated whole?

Nobody forces you to use all of them. If you just want to use AsciiDoc as "Markdown with more coherent syntax", stick to a small subset that is equally trivial to learn.

And if you need more, AsciiDoc's comprehensive user guide is a huge advantage. In Markdown, you'll have to look around for all kinds of forks. And god forbid if you want to use two additional features that were not designed to work together in the first place. Compare this with AsciiDoc's clear extension system where you can hook up everything and it won't interfere with each other.

> Isn't that comparison unfair, given that AsciiDoc gives you more functionality in a curated whole?

I don't see how it's unfair. I think AsciiDoc is much more complex, by just about any metric you want to use. I'm not saying it's necessarily worse by many metrics, just that by the particular metric of getting average people to use it, and not just a random subset of it, Markdown's simplicity is beneficial.

> Nobody forces you to use all of them.

No, but for a random internet user there's a real trade-off between what you're trying to accomplish and what it takes to accomplish it, when you're just trying to write a simple comment. Reddit has a link that says "formatting help" below the comment box, and when you click it, it shows every option you have for special formatting with examples in a table with nine rows and two columns, including the header. They could have chosen AsciiDoc, but then they would have had to decide which features to elevate to the quick help and which not to, and very possibly which to disallow for their use case.

Markdown is simple for users to use, simple for users to understand, simple for developers to implement, and simple for developers to decide about. That simplicity is both why it was adopted by developers and why users bothered to learn and use it. As I alluded to before, AsciiDoc may have been a better choice in the end, but I don't think it's as simple as "AsciiDoc does more stuff, so it's better." It's all about the trade-offs, and there's been plenty of discussion on that[1] before, from both sides.

1: Just google "worse is better".

gruber's original markdown is "simple" because it's brain-dead.

which is why so many people had to "extend" it with different "flavors", which has now created a massive mess of inconsistencies.

sometimes worse is better. and sometimes it's just plain worse. and sometimes it's the worst kind of situation you could ever imagine.

There's a difference between something that's not good and something that just doesn't go far enough for your needs. If it were that brain-dead it wouldn't be extended, it would be replaced.

gruber's brain-dead version _has_ been replaced. by better versions. the problem is these "better versions" are all inconsistent with each other. and each of them has an installed base which insists that the egg be cracked on their preferred end.

if instead of adopting markdown, people would have extracted a small subset of asciidoc (which predated markdown) or restructured-text (which also predated markdown) to serve the brain-dead use-cases that markdown claimed, those subsets would've been just as "simple" to learn, but also leveraged more cleanly when people sought to extend the light-markup toolkit to longer-form documents.

but the blogosphere thought it was hot shit back then, and took great delight in pushing things viral. ergo markdown. so now we're stuck in a bad situation.

> gruber's brain-dead version _has_ been replaced. by better versions. the problem is these "better versions" are all inconsistent with each other.

They are all mostly consistent with the core markdown. They are inconsistent in their extensions. Markdown itself does have problems in that there was no formal spec, but that's mostly been resolved with CommonMark[1]. They even go so far as to document the different extensions that have been developed with their different syntaxes[2]. You might be tempted to call CommonMark a replacement, but it's not; it's really just a formal spec based on Markdown.pl and the test suite, which resolves some ambiguities.

> if instead of adopting markdown, people would have extracted a small subset of asciidoc (which predated markdown) or restructured-text (which also predated markdown)

In that case, why not Setext, which is from 1991? I'll tell you why, because Markdown was meant to codify already in use norms, and to emphasize readability over all else:

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters — including Setext, atx, Textile, reStructuredText, Grutatext, and EtText — the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

To this end, Markdown’s syntax is comprised entirely of punctuation characters, which punctuation characters have been carefully chosen so as to look like what they mean. E.g., asterisks around a word actually look like emphasis. Markdown lists look like, well, lists. Even blockquotes look like quoted passages of text, assuming you’ve ever used email. - "Daring Fireball – Markdown – Syntax", 2013-06-13.[3]

> those subsets would've been just as "simple" to learn

I think not. For some, including me, markdown was almost zero-cost. It's how I wrote email.

> but the blogosphere thought it was hot shit back then, and took great delight in pushing things viral. ergo markdown.

I think that's highly simplistic, and ignores the realities, one of which is that it was pushed on Reddit, which has become one of the largest and most used sites on the internet. I find it hard to believe the blogosphere opining on it (because it's not actually used on all that many blogs) has had more sway than them on this topic.

1: http://commonmark.org

2: https://github.com/jgm/CommonMark/wiki/Deployed-Extensions

3: http://daringfireball.net/projects/markdown/syntax#philosoph...

> They are all mostly consistent with the core markdown.

it's fairly easy to get the brain-dead part "right". even down to replicating gruber's original bugs and his corner-case complications.

> They are inconsistent in their extensions.

that's precisely my point. and the crux of the problem.

> Markdown was meant to codify already in use norms

markdown's markup did not differ significantly from that of asciidoc or restructured-text. all of them, including setext, leveraged existing conventions from e-mail and usenet.

> and to emphasize readability over all else

since nobody is meant to actually _read_ raw markdown, i've never understood why everyone cites that passage so religiously, other than that is part of the origin story mythology.

> I think that's highly simplistic, and ignores the realities.

due mostly to netnewswire, which installed gruber as its default mac-blogger, gruber's reach was phenomenal when blogging first went viral. if you don't understand the power of that reach at that time, it's probably because you weren't around. and that group of "cool internet kids" still flaunts itself, most notably recently in the nearly-immediate widespread uptake of json-feed.

the _only_ reason markdown was the choice of the masses was because it looked "easier" to a lazy tl;dr mentality. which is a false economy for which the light-markup revolution will have to continue to pay for years down the line.

well, that coupled with the fact that markdown has a catchy name. one cannot deny that. that helped too.

at any rate, kbenson, i'm off to a school reunion, so the last move here will be yours, if you choose to make it. we've hit the point of severely diminished returns anyway.

> since nobody is meant to actually _read_ raw markdown, i've never understood why everyone cites that passage so religiously, other than that is part of the origin story mythology.

Because that's not a universal feeling, and some people do read it. I normally write a subset of Markdown in plain text. I use asterisks for bold, a hash for section headings, and unordered and ordered lists as defined. I value that I write the same thing, and sometimes it's just text and sometimes it gets prettified, and I really don't need to care the majority of the time whether it does or not, because for the most part people understand the conventions used in the plain text.

Here's the kicker: in one job I designed a system to send email to customers that took advantage of this. If you supplied a text message to email and the Markdown-rendered version was different, it automatically generated a multipart email, with the plain text part being the Markdown and the HTML part being the generated output from the Markdown.
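A minimal sketch of that multipart approach using Python's standard `email` package. The original system's code isn't shown, so everything here is an assumption: `render_markdown_subset` is a hypothetical stand-in that handles only paragraphs, `*emphasis*` and `**strong**` (a real system would call a proper Markdown renderer), and "the Markdown version was different" is approximated by comparing the rendering with and without inline markup.

```python
import html
import re
from email.message import EmailMessage


def render_markdown_subset(text, inline=True):
    """Hypothetical stand-in for a real Markdown renderer.

    Handles paragraphs plus *emphasis* and **strong** only; with
    inline=False the markup is left as literal text.
    """
    blocks = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        para = html.escape(para)
        if inline:
            para = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", para)
            para = re.sub(r"\*(.+?)\*", r"<em>\1</em>", para)
        blocks.append(f"<p>{para}</p>")
    return "\n".join(blocks)


def build_message(subject, sender, recipient, markdown_body):
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The text/plain part is the Markdown source itself, readable as-is.
    msg.set_content(markdown_body)
    rendered = render_markdown_subset(markdown_body)
    # Only add an HTML alternative if the Markdown actually changed something.
    if rendered != render_markdown_subset(markdown_body, inline=False):
        msg.add_alternative(rendered, subtype="html")
    return msg


msg = build_message("Order", "shop@example.com", "you@example.com",
                    "Your order has **shipped**.")
print(msg.get_content_type())  # multipart/alternative
```

Because the plain-text part is the Markdown source, clients that ignore HTML still show something readable, which is the point of writing in Markdown in the first place.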

> due mostly to netnewswire, which installed gruber as its default mac-blogger, gruber's reach was phenomenal when blogging first went viral. if you don't understand the power of that reach at that time, it's probably because you weren't around. and that group of "cool internet kids" still flaunts itself, most notably recently in the nearly-immediate widespread uptake of json-feed.

I think you vastly overestimate the pull Gruber had over the general public at that time. I didn't know anything about him, but it wasn't because I wasn't around; I was already working in the industry. It was because I didn't have anything to do with Apple products and didn't care, which is the same for most people. We're talking about three years pre-iPhone here, before the unibody MacBook. Apple's core product that was tapping a wider audience was the iPod. If you weren't following Apple as a customer and fan, chances are you didn't know or care who Gruber was. I certainly didn't.

But Gruber wasn't the only author. Aaron Swartz invented it with him, and Aaron Swartz was helping out an early Reddit a year later. Again, I think you vastly overestimate Gruber's role compared with actual use on popular sites, such as Reddit, and later Stack Overflow.

> well, that coupled with the fact that markdown has a catchy name. one cannot deny that. that helped too.

I won't deny that at all! I think that probably has more to do with it than Gruber's advocacy as well. :)

> at any rate, kbenson, i'm off to a school reunion

Enjoy! I've got another year before I have my 20th.

> we've hit the point of severely diminished returns anyway.

Agreed. We're really just refining our prior points but not making any headway in persuading each other.

back from the school reunion.

my only note now is that i was never trying to "persuade" you. or anyone else. think whatever you like, wrong or right.

Markdown is also what RStudio uses for markup of comments in notebooks, so it is very common in the bioinformatics and statistics worlds.

Same in Python Jupyter Notebooks. I wish I could use rst, though. I have to google it every once in a while to remind myself that they won't support it (https://github.com/ipython/ipython/issues/888#issuecomment-2...)

You make some good points regarding LaTeX. In my opinion, anything that you can express in markdown is not that hard to read in TeX source. Most of md is centered around section headers, links, basic text formatting (bold/italic) and lists. Most of those are very readable in TeX source.

What important problem in this domain does the author think plain text uniquely solves? I'd say that the arguments aren't specific to education, and that they're also pretty weak.

Remember that one of the major breakthroughs of the World Wide Web was that HTML meant documents were no longer plaintext.

You could replace "plain text" in the article with "non-binary", and it would probably make more sense. Markdown is also not plain text in the strictest sense. Or HTML is plain text in a vague sense. In the end, what really matters is how easy it is to build parsers and tools for a format. Being non-binary is a huge plus. I think that was the point of the article, and I agree with that.

I don't think anyone would claim that HTML was easy to parse, would they? It took decades for the HTML5 consensus to emerge.

I like text-based formats, but I'm not convinced that "Being non-binary is a huge plus" for parsing. With binary formats you can assume that documents are generated by a tool, which is at least trying to be compliant with a spec, so barfing on noncompliance is more acceptable. With text you have to be prepared to cope with any kind of rat dance imaginable.

Parsing semistructured text as markup is a problem solved over 30 years ago [1].

SGML has the SHORTREF feature which allows custom Wiki syntaxes such as markdown, but also casual math. It works by applying a context-dependent (parent element dependent) mapping of tokens (such as the `_` token for markdown emphasis) to replacement text (eg. the `<b>` start-element tag). Within the `<b>` context, the `_` token is mapped to the `</b>` end-element tag, in turn, ending the emphasis. In combination with tag omission/inference (such as in HTML) and other markup minimization and processing features, SGML is a quite powerful plain text document authoring format.
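A toy simulation of the SHORTREF idea described above, in Python rather than SGML: each context (the currently open element) has its own map from short-reference tokens to replacement text, so the same `_` opens `<b>` outside bold text and closes it inside. This only illustrates the context-dependent mapping; it is not an SGML parser, the map contents are made up for the example, and closing an unterminated element at the end of input only loosely stands in for SGML's end-tag inference.

```python
# One short-reference map per context; None means "not inside any element".
SHORTREF_MAPS = {
    None: {"_": "<b>"},   # outside <b>, "_" starts the element
    "b": {"_": "</b>"},   # inside <b>, the same "_" ends it
}


def expand_shortrefs(text):
    """Replace short-reference tokens using the map of the current context."""
    context = None
    out = []
    for ch in text:
        replacement = SHORTREF_MAPS[context].get(ch)
        if replacement is None:
            out.append(ch)  # ordinary character, copied through
            continue
        out.append(replacement)
        # A start tag enters that element's context; an end tag leaves it.
        context = None if replacement.startswith("</") else replacement[1:-1]
    if context is not None:
        out.append(f"</{context}>")  # crude stand-in for end-tag inference
    return "".join(out)


print(expand_shortrefs("plain _bold_ plain"))  # plain <b>bold</b> plain
```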

[1]: https://en.wikipedia.org/wiki/Standard_Generalized_Markup_La...

We may be talking about different things. Parsing valid, standard-conforming HTML/Markdown/whatever is a solved problem. Getting multiple parsers to deal with arbitrary tag soup, authoring errors, variously-supported extensions etc in a consistent way is a lot uglier. The problems may be commercial/political/educational/organizational rather than technical, but that doesn't mean they aren't real.

I'd say that's exactly what SGML is about - a meta language to describe those things consistently.

What many developers either don't understand or refuse to accept is that when it comes to distribution you don't control formatting. It doesn't matter if that white space is explicit like white space characters or inferred from rules like CSS.

There is a naive assumption that all platforms and operating systems will treat your text (everything is either text or binary before it is parsed into something else) equally. This is false. When this fallacy becomes self-evident, many developers will refuse to modify their assumptions in the belief that consuming software will figure it out properly. Sometimes that is true, and sometimes it will absolutely break your code/prose/data. Clearly that assumption carries a heavy risk, but this is just data at rest.

When it comes to data moving over a wire, the risk increases substantially, because any software that processes that text may make custom modifications along the way. You don't see it so much when the protocol is primitive like HTTP, pub/sub, or RSS (but it still does happen frequently). There are many distribution protocols that are less primitive and absolutely will mutilate the formatting of your documents, such as email (which is why there are email attachments).

> There are many distribution protocols that are less primitive and absolutely will mutilate the formatting of your documents, such as email (which is why there are email attachments).

That's not entirely true. For email, the only characters that have special meaning are carriage return, line feed, period, and the null ASCII character.

Other than that, you can transfer data via SMTP without any issues.
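The period is on that list because of SMTP's end-of-data marker: the message body ends with a line containing only `.`, so RFC 5321 (section 4.5.2, "Transparency") has the client add an extra `.` to any line that starts with one, and the server strip it again. A minimal sketch of both sides, ignoring line-length limits, 8-bit content, and everything else a real client handles; the function names are hypothetical.

```python
def dot_stuff(body: str) -> str:
    """Client side: prepare a body for the SMTP DATA command."""
    lines = body.replace("\r\n", "\n").split("\n")
    # A line starting with "." gets one extra "." so the server never
    # mistakes it for the end-of-data marker.
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    # Lines are CRLF-terminated; a lone "." on its own line ends the data.
    return "\r\n".join(stuffed) + "\r\n.\r\n"


def dot_unstuff(data: str) -> str:
    """Server side: drop the terminator and one leading "." per line."""
    lines = data.split("\r\n")
    if lines[-2:] != [".", ""]:
        raise ValueError("missing end-of-data marker")
    return "\n".join(line[1:] if line.startswith(".") else line
                     for line in lines[:-2])


wire = dot_stuff("hi\n.hidden\nbye")
print(repr(wire))  # 'hi\r\n..hidden\r\nbye\r\n.\r\n'
```

The round trip restores the original text exactly, which is consistent with the point that the protocol itself does not mangle the body.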

That might be true in theory but it certainly isn't true in practice. I know because I have done this work before. Documents passed through email tend to get mutilated by each application that touches them, such as email servers, user agent applications, and sometimes network proxies and other application tools on the line. Microsoft applications were huge offenders, particularly MS Exchange, which added all kinds of crap to the document.

The worst was webmail, which is an email client embedded in a web page. The documents would have to be mutilated so that contents didn't leak outside of a bounded area on the page and visually kill any advertisements or other controls on the page.

If you embed HTML in email and then embed other grammars inside the HTML, these applications will brutalize your document at every step. If you are fortunate and extremely defensive, your document may arrive at its first destination mostly undamaged, but after that any retransmission will thoroughly crush its soul.

> That might be true in theory but it certainly isn't true in practice. I know because I have done this work before.

My testing was limited to three commercial SMTP servers that I had credentials for. One of them was the SMTP server that I could access using my Hotmail account credentials. Other than changing the Message-Id header that I had manually set in the test message I was sending, I wasn't able to see any other changes in the message that I had sent (a string of ASCII characters (0-255) excluding the ones I noted in my previous reply).

On the other hand, I have no idea what MAPI does with text.

After re-reading your original post, it appears that you're taking applications and the transfer protocol as a single unit rather than separating them out. If you use protocols like HTTP, IMAP, SMTP, or NNTP over telnet, you'll find that they don't typically mangle text (bytes) that you send outside of certain control characters like I mentioned above.

But you're definitely correct about the problems that applications pose in terms of preserving the text that they process.

Try putting HTML with some exotic CSS (inline of course because this is email). Ensure that CSS contains some position absolutes and other properties that allows presentation outside the accepted bounds.

Also put some JavaScript in there and see what it does.

I guarantee Hotmail will destroy the original document and do so in such a way that the document evolves from machine that touches the document. The document, from the perspective of SMTP (RFC 821/2821/5321 and so forth), is still just plain text.

> I guarantee Hotmail will destroy the original document [containing HTML/CSS and/or JavaScript] and do so in such a way that the document evolves from machine that touches the document.

I'm not disagreeing with you here and you're most likely correct. Exchange has historically not complied with SMTP RFCs. I'll try it out and see how it changes just to satisfy my curiosity.

I suspect that if you were to send email like you specified through Postfix, Sendmail, or Exim, it wouldn't be altered before being sent to the next MTA or delivered to the user's mailbox.

Text is text, isn't it?

It is. When a machine modifies it in transit it is still text... just not the text you thought you sent/requested.

And the future of operating systems is the command line.

We deserve better than this reductionist thinking. Constraints can breed innovation; but they can also just constrain.

it is, but computers aren't smart enough for the future to happen yet

The computer is, PEBKAC is still the name of the game.

It's the Unix/text-fetishizing developers who aren't smart enough, I think...

yet it is the unix/text fetish that forms the foundation for toasters, smartphones/watches, clouds, and supercomputers. you'd think that a superior alternative would be overtaking this clearly inferior symbolic text system. makes you think, don't it? not like people haven't been trying, but why is it so persistent? do you blame the fetish? why can't some other technically superior system win? WHY?

Systems are funny. People do what works. Don't think about it too hard.

It seems worth mentioning that plaintext is discussed here as the storage / source format.

That doesn't mean it has to be the distribution / consumption format.

One of the great things about something like Markdown is that it can be rendered to HTML trivially, to display video, equations, etc.

Same thing for ebooks, PDFs, whatever (thanks Pandoc!). It's also easy to translate between formats (e.g. .md, .org, .rst, etc.).

If a new format comes along that everyone wants, there's an extremely good chance that plain text can be rendered to it.

The reverse is not true.
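To make "rendered to HTML trivially" concrete, here is a minimal Python sketch that handles only two features, ATX-style `#` headings and blank-line-separated paragraphs. `render_markdown_subset` is a hypothetical helper for illustration; anything real would go through Pandoc or a CommonMark library instead.

```python
import html
import re

# A heading is 1-6 '#' characters, then spaces/tabs, then the title text.
HEADING = re.compile(r"(#{1,6})[ \t]+(.+)")

def render_markdown_subset(source: str) -> str:
    """Render a tiny Markdown subset (headings + paragraphs) to HTML.

    Everything that isn't a one-line heading is treated as paragraph text.
    This is an illustration, not a Markdown parser.
    """
    out = []
    for block in source.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        m = HEADING.fullmatch(block)
        if m:
            level = len(m.group(1))
            out.append(f"<h{level}>{html.escape(m.group(2))}</h{level}>")
        else:
            # Join soft-wrapped lines into a single paragraph.
            text = " ".join(line.strip() for line in block.splitlines())
            out.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(out)
```

Going the other way (recovering the clean source from arbitrary HTML) has no equally short equivalent, which is the asymmetry described above.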

A thought occurs to me about source versus consumption formats. It would seem to me that source code should be the distribution format. That way, the recipient can choose their own preferred consumption format.

When I first heard about HTML, described in some magazine article, it was touted as a way to give people a chance to have their own unique readers. For instance, a blind person might want to have their own HTML reader, that uses the hierarchy of header tags to help them navigate the document.

Today, my impression is that HTML and its successors are treated more like a general purpose programming language intended to drive visual-specific browsers. This is why we have to create target specific web pages (e.g., mobile and desktop), rather than let each target's browser render generic pages.
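The heading-based navigation idea is easy to prototype against generic HTML. A minimal sketch using Python's standard `html.parser` module; `HeadingOutline`, `outline`, and `print_outline` are hypothetical names, and a real screen reader does far more than this.

```python
from html.parser import HTMLParser

class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for <h1>..<h6> elements in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buf = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Collapse whitespace left over from nested tags and line breaks.
            text = " ".join("".join(self._buf).split())
            self.outline.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

def outline(html_text):
    parser = HeadingOutline()
    parser.feed(html_text)
    parser.close()
    return parser.outline

def print_outline(html_text):
    # Indent by heading level, like a table of contents.
    for level, text in outline(html_text):
        print("  " * (level - 1) + text)
```

This only works when pages use headings for structure rather than styled `<div>`s, which is the point about generic versus target-specific pages.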

Yes, browsers used to have "user style sheets" where you insert style rules for how big you wanted text to be, etc.


That's nice and all but in a classroom you want to be able to say "open page n". Anything else causes terrible confusion.

That's a good point. I suppose there has to be at least a preferred distribution format that everybody has access to.

Multiple formats can be available for distribution. I think that web pages make a sensible default, but having the option to take an ebook and whack it on an e-reader, or open the source in an editor (e.g. to run code snippets yourself) is great.

While plain text is compatible with virtually every OS ever, when I try to open a .txt file on my Windows machine it asks which application to open it in, since there are no default apps (Windows 7 and earlier). This might make plain text files appear to be a "special" format, because they don't open when you double-click on them.

I have sent plain text files to some of my colleagues in the past (so that there is a 100% chance that they could read the file), and they were unable to open them because of this issue with choosing the default application, and asked me what app they needed to download to view the files.

I'm pretty sure .txt opens in notepad..?

It does, but sometimes the line breaks don't show up so it's all on one line with what appears to be tabs. WordPad works fine though.

Save your .txt files with CRLF (Windows) line endings, not LF (Linux, macOS) or CR (classic Mac OS). Your .txt files should then work on all (relevant) OSes.
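If you'd rather fix the files than your colleagues' settings, the conversion is easy to script. A minimal Python sketch; `to_crlf` and `save_for_notepad` are made-up helper names, and UTF-8 is assumed as the encoding.

```python
def to_crlf(text: str) -> str:
    """Normalize any mix of CRLF, LF, or lone CR line endings to CRLF."""
    # Collapse everything to LF first, then expand LF to CRLF, so existing
    # CRLF pairs are not turned into CR CR LF.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")

def save_for_notepad(path: str, text: str) -> None:
    # newline="" tells Python not to translate line endings on write,
    # so the CRLFs produced above reach the file unchanged.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_crlf(text))
```

Normalizing to LF first makes the function safe to run on files that are already partly converted.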

May I share a related vignette?

I have a kid in middle school, and he has a tablet. These things are often pushed as "educational". Pop quiz: you walk by, and you need to determine, within a couple of seconds, if what he's doing is actually educational. Here's what you see:

- lots of graphics, whizzing around the screen

- black alphanumeric characters against a white background

Now, you don't actually know, but generally speaking, the second is a better indicator than the first.

I realized this applies to my own work as well. There are parts of my job that I consider extremely useful to the world, and parts that I really gotta wonder about.

If I'm looking at green or white alphanumeric characters against a black background (easier on the eyes), I'm probably at a UNIX prompt, writing code that is doing something very analytical, or writing SQL. If I'm looking at graphics whizzing by, I'm either trying to figure out how to get a drop down to repopulate with the right thing pre-selected in the latest javascript framework, or, worse, I'm so irritated with javascript frameworks that I've decided to browse the web.

Again, it's not a guarantee, but I'm starting to consider a very general guideline: if you are looking at symbols and alphanumeric characters, the odds that you are building something of lasting value are much higher than if you are looking at things with elaborate UI elements.

It's not 100%. My kid could be watching Citizen Kane and developing an interesting critical point of view. He could be reading 101 fart jokes. It's not a perfect match. There are worthy and unworthy things on both sides.

But as a general rule, for culture and career - if you're looking at plain text, that's a good sign.

Perhaps educational materials should be designed to benefit the student rather than somebody looking over the student's shoulder...

Or the student could be a visual learner.

The learning styles theory was recently called a "neuromyth" by thirty researchers in a public letter.


People are dogging on you for suggesting learning styles. Learning style aside, seeing something with your eyes helps you see it with your mind. I think very few people would find it easier to understand a paragraph or a few than a good illustration or demonstration.

Learning styles are a myth, don't you know? https://www.psychologicalscience.org/journals/pspi/PSPI_9_3....

And there are many more such studies available with a simple search.

But there's a reason scientific books aren't very rich in illustrations.

I'm coming in a little late here, but I really did mean it was just an indicator. Visualizations are still important. In fact, a moving graphic might mean that the kid on the Kindle is using a visual aid to conceptualize how backpropagation works in neural networks.

I also don't want to come off as knocking video games. I had an Atari in the 80s. Super fun. I would have played that thing 8 hours a day if my parents had let me. They put a cap on it, along with a rule that I also spend some time with this odd object involving black letters printed on a series of white pages if I wanted to play the video game.

The Atari is long gone (though you could say it's as present as ever, in a greatly enhanced form). But those black letters on white pages are still pretty excellent, and are identical to how they were 30+ years ago.

What, because all the good ones were written before scientists could afford books with pictures? :-p

Check this out:


I totally agree and I strongly prefer text. So much so that I made this: https://www.libre.university/ for me and my classmates to learn some topics. Note: the English one is mostly empty except for [1]; the main one is the Spanish page.

[1] https://en.libre.university/subject/4kitSFzUe/Web%20Programm...

Plain text would've been THE usual way to go for most things (1), but it's hidden away in the major OSs, even in Linux distros. Many people use Word or WordPad as a plain text editor because they don't know the difference between plain text and a Word document. Also, those programs have the page metaphor, which people like because we think of text in pages. This isn't that hard to emulate in plain text, but it isn't straightforward either. People often share documents, so pages and paragraphs must be the same for everyone (paragraph numbering might be a solution). We have the tooling for generating convenient consumption formats like PS and PDF, but what's needed is default tooling for plain text, and more visibility for it.

(1) When you compose text, you want to compose first and style later. WYSIWYG mixes the two, and most of the time you end up with crappy spelling and half-arsed formatting.
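Paragraph numbering, as a substitute for page numbers, takes only a few lines. A sketch assuming paragraphs are separated by blank lines; `number_paragraphs` is a hypothetical helper.

```python
def number_paragraphs(text: str) -> str:
    """Prefix each blank-line-separated paragraph with [1], [2], ...

    Unlike page numbers, these don't depend on font size, window width,
    or printer settings, so everyone sharing the file sees the same ones.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n\n".join(f"[{i}] {p}" for i, p in enumerate(paragraphs, start=1))
```

In a classroom, "see paragraph 12" then works the same way "open page n" does on paper.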

A long text explanation of how a piston engine works, or a short explanation and an animated GIF.

Which is better?

I would say the second one is more effective.

Depending on who's doing the advocating, plain text doesn't inhibit images or any other media, as any site with media written in markdown or reStructuredText shows. Lots of GitHub READMEs are written in Markdown and include images, for example.

There are a number of image formats that are commonly supported, as well as video and audio. Once you've decided to go beyond face to face speech in the same language, you have to make compromises. Just try to minimize those, and try to stay within conflicts that have compromises, like line endings.

I think starting from "authorship in plaintext," along with sharing plaintext files, would move things pretty far forward. (I guess there's some irony in there.)

Odd to see a statistics-focused site emphasize plain text as opposed to formats that promote usability/reproducibility, like notebooks such as Jupyter (or R Notebooks, as discussed last week: https://news.ycombinator.com/item?id=14522795)

The post correctly notes that R Markdown files are plain text, but the benefits of such (version control) are not discussed by the OP.

What are the possible benefits of using a JSON blob to do something that could be done in plaintext?
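Part of the cost of the JSON blob shows up in the format itself: a Jupyter notebook stores the cells you wrote alongside outputs, execution counts, and metadata, which often change between runs and clutter diffs. A minimal sketch that flattens an nbformat 4 notebook to plain text, assuming the standard `cells`/`cell_type`/`source` keys; `notebook_to_text` is a hypothetical name, and tools such as nbconvert or Jupytext handle this properly.

```python
import json

def notebook_to_text(ipynb_json: str) -> str:
    """Flatten a Jupyter (nbformat 4) notebook into diff-friendly plain text.

    Keeps only each cell's type and source; outputs, execution counts,
    and metadata are dropped.
    """
    nb = json.loads(ipynb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        kind = cell.get("cell_type", "unknown")
        chunks.append(f"# --- {kind} cell ---\n{source.rstrip()}\n")
    return "\n".join(chunks)
```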

One advantage of plain text formats that I haven't seen mentioned is that they're easily and meaningfully diff-able.

You can run two versions of a markdown file or a LaTeX source file through a diff, and see what's been changed. Try that with a PDF or Word file or what have you.

As I like to keep my files in version control, I use plain text formats as much as possible.
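Python's standard `difflib` module produces the familiar unified format, so the comparison is easy to script as well. A minimal sketch; `diff_text` and the sample strings are hypothetical.

```python
import difflib

def diff_text(old: str, new: str, old_name="v1.md", new_name="v2.md") -> str:
    """Return a unified diff of two plain-text documents, like `diff -u`."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )

old = "# Notes\nPlain text is portable.\nPDF is not.\n"
new = "# Notes\nPlain text is portable and diffable.\nPDF is not.\n"
print(diff_text(old, new))
```

The changed line shows up as a `-`/`+` pair with the unchanged lines as context, which is exactly what a binary Word or PDF file can't give you.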

People look at you like you're a damn wizard when you diff two texts to find the differences. I'm sure there are diff tools for Word as well, but almost no one uses them, because they aren't aware that such a tool could exist.

The average person runs around with a smartphone in their pocket, a marvel of engineering, and yet they still don't know how to use their computer to do trivial tasks.

Word has built in diff. Review -> Compare.

It's kind of slow, though, in my experience. Diffing plain text is amazing. I recently used it to diff a research paper I was peer reviewing, for which the TeX source was provided. I felt like a wizard.

If anything, the "track changes" feature in Word is actually nicer than diffing by hand (and this is coming from someone who positively hates Word!).

Does it track changes in one document (when enabled), or can you give it two documents and let it highlight changes?

(A fairly common occurrence when many people's distributed version control system consists of emailing around "thesis_v0.9.doc", "thesis_v1.0.doc", "thesis_final.doc", "thesis_final_v2.doc", etc. See also: http://phdcomics.com/comics/archive.php?comicid=1531 )

That is an excellent point. I recently switched from Google Docs to Org files with Unison and ediff for syncing. Having everything work offline and on, over multiple machines, and being able to review and merge changes really makes things much more convenient. I now have multiple local and remote backups and no longer have to worry about losing my Google account. Another common pattern is to diff files that come in over email, or to diff when someone takes my code changes and modifies them by hand/rebases them.

At this point I think that keeping plain text files in a distributed version control system that can also import/export patches for emailing (like darcs and git) is clearly superior to cloud-based document storage. The promises of the latter have just not panned out: what we got instead is vendor lock-in and the sword of Damocles of account lock-out/deletion/hacking. In many ways (UI responsiveness, control of data and privacy, service discontinuation due to the vendor shutting down the product, an acquisition, or bankruptcy, etc.), cloud apps are a big step backward from PC applications.

Text is also accessible to people with sensory disabilities (blind, deaf, deaf-blind...). Hypertext is also fine, of course; I won't insist on "plain" text. But this is in contrast with graphics, audio, and video.

The future of education might be plain text, but not for the reasons mentioned in the article. Also, I think something can be the future of XYZ if it solves the problems XYZ faces currently.

The current content for education is good, but it is definitely bandwidth-heavy and tough to maintain. But dropping to plain text will force us to let go of a few things that otherwise make learning more effective.

I think HTML (or WYSIWYG-style editors, which feel like plain text but can be powerful with images, videos, and animations when required) does the same things as plain text:

- it is always compatible (I'll grant you that it takes effort to run hi-fi stuff in browsers, but it's still better than plain text).

- it is easy to mix and match

- it is easy to maintain (thanks to many editors)

- it is lightweight (not compared with plain text, of course)

- it is forward compatible (unless all browsers decide to stop supporting HTML in its current state).

I appreciate the thought behind bringing this up. Writing something in plain English that can then be turned into super cool learning material that runs everywhere would be awesome: it would both solve the issues mentioned in the article and keep learning effective.

Disagree. HTML is a fine output format, but one that can always be generated from plain text. HTML is a poor storage format for the editable 'raw' content itself.

Images, videos, and animations can all be done in markdown -- they're just references to files.

Even something as small as the MathJax CDN getting retired means that I have old HTML notes which are now 'dead'. If I don't have the markdown/rmarkdown source for them, realistically, they're just not coming back.
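Because a Markdown image is just a reference, turning it into HTML is a text substitution, and the media files themselves never pass through the converter. A minimal sketch that handles only the basic `![alt](path)` form (no titles, no reference-style images); `render_images` and the sample path are hypothetical.

```python
import html
import re

# Basic Markdown image syntax: ![alt text](path/to/file)
IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")

def render_images(markdown: str) -> str:
    """Replace Markdown image references with HTML <img> tags."""
    def to_img(m):
        alt, src = m.group(1), m.group(2)
        return f'<img src="{html.escape(src)}" alt="{html.escape(alt)}">'
    return IMAGE.sub(to_img, markdown)
```

If the target format changes, only this substitution changes; the source text and the image files stay as they are.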

> If I don't have the markdown/rmarkdown source for them, realistically, they're just not coming back.

That's why I try to use HTML as the universal storage. It's not friendly to edit by hand, but with a basic, limited editor it's wonderful.

The problem is the lack of agreement on how to apply styles. Some tools do inline, some build stylesheets which use name mangling to format pretty names for CSS classes.

What about the existing "standard", PowerPoint slides and PDFs? Although most of my notes could be plain-text, any graphs or figures need pictures. ASCII art doesn't cut it.

Plain-text just needs to be the format. Check out (e.g.) Markdown editors like Typora. The underlying format is Markdown, but it can be edited and displayed in a more visual fashion.

With the text being in plain-text, it guarantees someone the base ability to just open up (e.g.) Notepad.exe if all else fails vs. trying to open up a Powerpoint '97 presentation with embedded RealMedia files in 2017.

You almost certainly can open PowerPoint 97 files today. Probably RealMedia too. Even better, going forward, pptx (and docx, etc.) are open standards.

"Open" yet there fails to be a compatible office suite that can render pptx and docx in the same way.

There are two answers to that. One is that to fully support these rich formats you have to have complete feature parity with Office, which isn't trivial. The other is that we're comparing to plain text, which doesn't even pretend to give you any control over how things look.

The alternative offered is Markdown. How can you say that it's OK for humans to revert to raw Markdown, but unacceptable to open a legacy .ppt file in an application that may not render it with 100% fidelity?

No one ever said that humans need to write raw Markdown. Take a look at Typora as an example Markdown editor. Raw Markdown would be the fallback scenario, which is still readable/grok-able. docx/pptx/xlsx don't have such a fallback scenario.

There are countless software packages that can edit and display Office XML files besides just the Office package, so I'd think the fallback scenario is using one of those.

Typora is surprisingly pleasant to use, so +1 for that.

If you're going to propose a standard for graphs and figures, please make it SVG. It's hard to find tools for editing PDF that support text and graphics, and PowerPoint exists only as a proprietary closed format.
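Part of SVG's appeal here is that it is itself plain text (XML), so a figure can be generated by a script, diffed, and kept in version control next to the prose. A minimal sketch of a bar chart; `bar_chart_svg`, the sizes, and the color are arbitrary choices for illustration.

```python
from xml.sax.saxutils import escape

def bar_chart_svg(values, labels, bar_width=40, gap=10, height=100):
    """Return a minimal SVG bar chart as a plain-text string."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    peak = max(values) or 1  # avoid dividing by zero when all values are 0
    width = len(values) * (bar_width + gap) + gap
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height + 20}">']
    for i, (v, label) in enumerate(zip(values, labels)):
        x = gap + i * (bar_width + gap)
        h = round(height * v / peak)  # bar height scaled to the tallest bar
        parts.append(f'  <rect x="{x}" y="{height - h}" width="{bar_width}" height="{h}" fill="steelblue"/>')
        parts.append(f'  <text x="{x + bar_width / 2}" y="{height + 15}" text-anchor="middle">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Changing one data point changes one `<rect>` line, so the figure diffs as cleanly as the text around it.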

unfortunately, since svg is a web standard, it can reference external objects

that severely limits its adoption with people that spend all their days defending the idea of 'attack surface'. so google draw uses svg, and exports svg, but won't import svg. github won't inline svgs in md, etc

i don't personally like the svg design; it's got a lot of weird corners and has the usual screwy relationship with the DOM. but having a fully neutral vector format would be such a massive win. by all means use svg if you can get it more widely adopted... maybe some sort of sanitized/sandboxed svg subset.

postscript should have been that format, but they were so focussed on driving rasterization that they made any kind of other interpretation (i.e. editing) impossible.
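A "sanitized SVG subset" could start by deleting the obvious vectors. A minimal sketch using Python's `xml.etree.ElementTree`, assuming the threat model is scripts, event handlers, and links to external resources. It does not handle CSS `url()` references or other tricks, and the standard library XML parsers are not hardened against malicious input (the Python docs point to `defusedxml` for that). `sanitize_svg` and `BLOCKED_TAGS` are hypothetical names.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
BLOCKED_TAGS = {"script", "foreignObject", "style"}

# Serialize with the usual prefixes instead of ns0:/ns1:.
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

def local_name(name):
    """Drop the '{namespace}' prefix ElementTree puts on qualified names."""
    return name.rsplit("}", 1)[-1]

def sanitize_svg(svg_text):
    root = ET.fromstring(svg_text)
    # Remove blocked elements (and everything inside them). Iterate over a
    # snapshot, since the tree is modified inside the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in BLOCKED_TAGS:
                parent.remove(child)
    # Remove event-handler attributes and links that leave the document.
    for el in root.iter():
        for name in list(el.attrib):
            value = el.attrib[name]
            if local_name(name).lower().startswith("on"):
                del el.attrib[name]  # e.g. onload, onclick
            elif local_name(name) == "href" and not value.startswith("#"):
                del el.attrib[name]  # keep only same-document "#id" links
    return ET.tostring(root, encoding="unicode")
```

An allowlist of permitted elements and attributes would be safer than this blocklist; the sketch only shows the shape of the idea.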

What does that link intend to prove? The Office Open XML standard relies heavily on hidden implementation details of the Microsoft Office software suite.

"The specification is inadequate to imitate Office" is a different claim from "it only exists as a proprietary, closed format."

From what I've heard, the MS Office applications do not even entirely comply with the published specification. So OOXML is still not a definition of the format with which PowerPoint stores its files.

The spec isn't actually self-consistent, so it's not clear that it can ever actually be implemented:



Your source claims the spec contradicts previously adopted ISO standards, not that it contradicts itself (which is what I assume you mean by it not being self-consistent). It does, however, raise the issue of things like legacy compatibility for Word 97 behavior and scripts not being documented.

Well, no, it is self-inconsistent in places [0]. However, as you mention, those are probably minor points compared to how it contradicts existing standards, like the Gregorian calendar [1], date representation, and language codes.

0: http://www.groklaw.net/articlebasic.php?story=20070123071154...

1: http://www.groklaw.net/articlebasic.php?story=20070123071154...

Those are annoyances and give increased complexity. They are not, as you described earlier, inconsistencies that make the spec impossible to implement.

So they're just like every other standard on the Web, basically.

Other standards, and web standards in particular, tend to have several open source implementations to guarantee that it's possible to build tools that support it without being the reference implementation. OOXML has no such thing, and the reference implementation is closed source, so it's basically impossible to replicate in any practical sense.

There aren't several implementations of Office Open XML formats in open source projects? I don't think that's accurate.

Not fully compatible with PowerPoint, no. That's why I said the PowerPoint format is proprietary; it's impossible to build a working interoperable tool, because either parts of its format remain secret and do not follow the published standard, or the format itself makes references to hidden implementation details.

For most purposes I think that's basically academic.

Having an open source tool that can open files created in PowerPoint without them becoming horribly mangled, or vice versa, is a very realistic concern. Microsoft Office file formats are well known for being extremely difficult to make portable beyond the very basic layout features.

Are you stating that there are people that like pdfs?

If your purpose is to exactly represent a printed page it's a great format.

Given a sufficient device it's actually a fairly decent format. The problem is that that device has to relatively closely correspond to printed paper sizes, within a factor of about 75% - 125%.

What PDF offers is a consistent, space-persistent, formatted output. For reading longer works, it actually does matter to me where a passage appears on a page, or within a work. Spatial memory is important that way.

I read a lot of material, in a lot of different formats: paged text (manpages, console-mode browsers), formatted HTML, ePubs, DJVU, image-scanned books.

If I'm reading on a largish (9-10") tablet, PDF in one-page-up format is actually really good. Fills the screen, is almost always suitably readable. Scans of old books (thank you, Internet Archive and Google) in particular are a delight -- there's something about reading a century-plus-old library copy with markings and (hopefully not too much) marginalia, as well as the original typesetting and images.

The main problem I have with fluid formats ultimately is their fluidity. I realise that that's perverse, and that there are times when it's a real benefit, but again, I can't seem to get away from that spatial memory thing.

If I'm extracting content from works, I prefer source (LaTeX, Markdown, DocBook, etc.). Though that's another story.

The ability to spin out formats on demand would be an ideal. I'm looking into ways of doing that.

dr. ed said:

> If I'm extracting content from works, I prefer source (LaTeX, Markdown, DocBook, etc.). Though that's another story.

except it's not actually another story. it's just a different part of the same story. and a format (like .pdf) which only handles one part of the story well (such as reading) but falls apart on another part (like text reuse) is not -- ultimately -- a good solution.

but that doesn't mean .pdf is worthless. yes, it's worthless as an archival format, and as a distribution format. (and those two are the ones which people commonly pitch as _strengths_ of .pdf, unfortunately, which is misguided.)

but .pdf is fine as a one-off output-format, spun out in an on-demand fashion by an end-user who wants .pdf for their own personal reasons (which require no justification to us). this is what you mention at the end of your comment, and i, too, am working on that...

As you note: if PDF is what you want, then the option to request it, or whatever other format is your preferred option, would be excellent user-centric behaviour.

The idea of requesting, say, <item>.<extension>, where extension is [html,pdf,epub,djvu,txt,json,tex,md,csv,dir,...] would be interesting.

This presumes that there's a way to represent the content as, say, a directory listing, CSV, or JSON archive.
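The `<item>.<extension>` idea amounts to a dispatch table from extension to renderer, all hanging off one plain-text source. A minimal sketch with a hypothetical in-memory `SOURCES` store and only a few trivial formats; treating `md` as the raw source assumes the source is already Markdown.

```python
import html
import json
from pathlib import PurePosixPath

# Hypothetical store of plain-text sources, keyed by item name.
SOURCES = {"notes": "Plain text in,\nany format out."}

def as_txt(text):
    return text

def as_html(text):
    return "<pre>" + html.escape(text) + "</pre>"

def as_json(text):
    return json.dumps({"lines": text.splitlines()})

RENDERERS = {"txt": as_txt, "md": as_txt, "html": as_html, "json": as_json}

def serve(request_path):
    """Resolve '<item>.<extension>' by rendering the item's plain-text source."""
    path = PurePosixPath(request_path)
    item, ext = path.stem, path.suffix.lstrip(".")
    if item not in SOURCES:
        raise KeyError(f"no such item: {item}")
    renderer = RENDERERS.get(ext)
    if renderer is None:
        raise ValueError(f"unsupported format: .{ext}")
    return renderer(SOURCES[item])
```

PDF, EPUB, DjVu and the rest would be further entries in `RENDERERS`, typically shelling out to a converter such as Pandoc.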

pdf is my preferred format for storing information, after markdown and source code, of course. It allows me to portably crystallize visual information without resorting to images, which completely wipe out textual information.

It's my preferred reading format for sheet music.

I think the concerns about them are entirely theoretical, yes.

