Don't write just in plain text (longevity vs. authenticity)

dang · on March 2, 2022

Related large thread from yesterday:

Write plain text files - https://news.ycombinator.com/item?id=30521545 - March 2022 (345 comments)

nonrandomstring · on March 2, 2022

This essay is actually deeper than its surface appearance, about text versus other formats. It's about semantics and richness of content, although I am not sure Miris fully grasps what s/he is wrestling with.

The author invokes the concept of "authenticity", and that's where it gets interesting.

I used to set my students a question about information content in a class on the philosophy of procedural representation.

We had a very high resolution photo of the aviation pioneer Amelia Earhart, and a short grainy video clip of her getting into a plane and smiling and waving.

My question was: Which one of these two media conveys more information about Amelia?

One gave extraordinary detail of her face and eyes, and seemed to many a much better "fidelity" document. Others noticed that although you couldn't see her face in the video, you could feel from her gait, waving, body language and the way she shook hands _much more_ about her than from the static photo.

Both files are the same size in bytes.

So which one has more "information"? Which one is more "authentic"?

Not to attempt to answer here with a deep dive into phenomenology, but each carries a different kind of information, which can be static, dynamic, or meta-dynamic in higher orders relative to a matrix of assumptions that must be carried forward in parallel by the culture that wants to decode the message later.

I like that Miris tries to explore this by questioning the richness of text. But maybe the question doesn't hold up well under those conditions of investigation - because one might say that a great poet using only a few words might capture a landscape better than a painting, but if our culture drifts toward a visual one where poetry is no longer understood we cannot say that the medium itself degraded.

II2II · on March 2, 2022

Then there is the value of the written word. While the grainy video may reveal more than the high resolution photograph, the written word may reveal more than a grainy (or high resolution) video. A diary is much better at exposing motivations, emotions, the perceived relevance of events. A newspaper article can articulate the events of a day much better than a film that captures a fragment of time in the framed image of space. It's not that written accounts are necessarily better on these accounts (one could, for example, have a video diary). It is simply that they turn out to be better than the often fragmentary accounts from higher fidelity sources.

(It is also worth noting that these higher fidelity sources are often left to decay or are intentionally destroyed due to the difficulty and expense of maintaining them.)

nicbou · on March 2, 2022

I agree. My written travel journals are far more interesting than the pictures, because they show how I actually felt while I was there, with all the little stories that go with the trip.

deltarholamda · on March 2, 2022

I get what the author is trying to say, and I certainly agree. Sometimes you simply need it to look a certain way or use an image that cannot be summarized in text in a meaningful way in order to get your point across.

All those monks slaving away doing doodles in the margins weren't just staving off carpal tunnel. There's meaning there you can't get any other way. Before Man could write, Man made art.

The other HN article about "plaintext only" I also agree with. HTML is the synthesis of the two. Sometimes I forget what a great idea and a blessing HTML is. Even if you don't have a browser that can render it, reading an HTML document isn't difficult if it isn't festooned with auto-generated nonsense.

Moru · on March 2, 2022

A digital format needs to be as simple as possible. If you need a 1000-page definition of the Word document format to be able to get the information out of it, it's not really good for anything once the format is forgotten.

causi · on March 2, 2022

> one might say that a great poet using only a few words might capture a landscape better than a painting, but if our culture drifts toward a visual one where poetry is no longer understood we cannot say that the medium itself degraded.

That has little to do with the medium. If the painter takes as much effort as the poet, and the viewer as much time and effort as the reader, just as much information and emotion can be gleaned from the painting as the poem.

jacobr1 · on March 2, 2022

The medium can matter in both cases, in the sense that the medium is not just the format, but also the cultural context of interpretation. There can be subtleties in word choice that evoke shared stories, or word connotations or otherwise make reference outside the work itself. You can have the same references in the form of visual symbols, stylistic choices or more. The viewer/reader must make much more effort to gain that cultural context for interpretation, which may very well be lost or degraded over time.

I would posit that while a painting can be very high context, the tendency is for poetry to be even more context dependent. Transplanted outside its native culture, I suspect visual works (again, on the margin) can be grasped with more depth by the viewer than the (on the margin) poem can be by a reader.

aasasd · on March 2, 2022

> very high resolution photo of the aviation pioneer Amelia Earhart

I thought photo quality was rather meh until the 50s or at least the 40s? Even with large films the results are often muddy in olden shots—while 70 mm movie film from the 60s will probably still be redigitized into super-duper-hd formats in the late 21st century (e.g. https://youtu.be/sCv-dIFGcd0).

adzm · on March 2, 2022

There are a bunch of crisp photographs from the late 30s. The hard part was getting the subject to stay still long enough and not blur the background, since the higher resolution / finer grain required a longer exposure time / wider aperture. You can check out various archives for examples, https://www.shorpy.com/ is one.

aposm · on March 2, 2022

This is not quite correct... For motion picture film, yes, anything older than the 70mm film you're talking about tends to be low-resolution because of the physical constraints of moving a foot or more of film through a camera each second. However, still photos from that era were much better: the limiting factor for film stocks was lighting, and with enough light to gather, a large format photo could be extremely sharp and high-resolution (if that was the priority). It sounds like this is referring to a formal portrait setting, with potentially very bright studio lighting and high quality film, so it could easily be sharper than all but very recent digital cameras (as large-format film still is).

kubb · on March 2, 2022

I think the thought experiment still "works" (as much as philosophers can say something useful about the question it poses) even if the photo was upscaled.

nonrandomstring · on March 2, 2022

Yeah, it was taken in like 1938 or something. Scanned using modern gear. Of course, it's surprising how much detail comes out of chemical photography of that era. But you're right, the "resolution" is arbitrary and probably oversampled.

nicbou · on March 2, 2022

Have a look at Ansel Adams's photography. The sharpness is outstanding.

InitialLastName · on March 2, 2022

It would be interesting to compare the video and the photo to a similarly-sized excerpt of her writing.

gilleain · on March 2, 2022

I'm curious: what is "meta-dynamic" information?

nonrandomstring · on March 2, 2022

Good question. So as I see it: something that's about generative behaviours. For example, the rules that describe the dynamics of generative systems. For Collatz (n%2==0 ? n' = n/2 : n' = 3*n+1) there are specific integers that make it work, while for a Lorenz attractor you'd have three floats that could take different initial values. That would be a conversation about "meta-dynamics".

Or maybe, what changes are behind systems that change?
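To make the two systems mentioned above concrete, here is a minimal Python sketch. The function names are made up for illustration, the Lorenz parameter values (sigma=10, rho=28, beta=8/3) are the conventional textbook ones rather than anything from the comment, and the forward-Euler step is a deliberately crude integrator. The point is only that the "rules" are a handful of constants sitting apart from the state that evolves under them.

```python
def collatz_step(n: int) -> int:
    """One step of the Collatz rule: halve if even, otherwise 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_orbit(n: int) -> list:
    """Iterate the rule from n until it reaches 1 (assumes n >= 1)."""
    orbit = [n]
    while n != 1:
        n = collatz_step(n)
        orbit.append(n)
    return orbit


def lorenz_step(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0, dt=0.01):
    """One forward-Euler step of the Lorenz equations (illustration only)."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)
```

Changing the state (the starting integer, or the initial x, y, z) gives a different trajectory under the same rules; changing the constants (the 2, 3 and 1, or sigma, rho and beta) gives a different system altogether.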

cupofpython · on March 2, 2022

ngl I am also curious about meta-dynamic information and I do not think this response helped me understand the definition much.

Do you mean the rules for Collatz are the meta-dynamic information? because from the rules we can generate additional information dynamically?

Are you able to describe it a different, less mathematical, way maybe?

for example, can meta-dynamic information be observed by seeing the way someone jumps up and down?

thomascgalvin · on March 2, 2022

This argument feels ... not quite like a strawman, but more pedantic than I think it needs to be.

I don't think anyone really argues that everything should be plain text, even if that's an easy shorthand. The real argument is "use the simplest, most open format possible."

Nobody is suggesting you go through all of your photos, transcribe your emotional reaction to each picture, and then delete the image. But, if you want to view those same photos when you're fifty years old, or seventy-five, you're better off storing them as a JPEG than a PSD, and you're better off storing them on a hard drive you have access to in addition to whatever cloud they're currently occupying.

"Write plain text" is a shorthand for "use open formats." Because so much of what this audience does is text-based, plain text is the most common format we use, from source code to journaling, but that message applies to pretty much anything: if you lock yourself into a proprietary format, or a proprietary editor, you will almost certainly lose data over the long term.

logifail · on March 2, 2022

> Nobody is suggesting you go through all of your photos, transcribe your emotional reaction to each picture, and then delete the image. But, if you want to view those same photos when you're fifty years old, or seventy-five, you're better off storing them as a JPEG than a PSD, and you're better off storing them on a hard drive you have access to in addition to whatever cloud they're currently occupying.

OTOH there are many photos I have, taken a decade or two ago, where I wish I'd written down my thoughts and reactions at the time, rather than just taken the picture. A picture may be worth a thousand words, but just having lots of pictures and no contemporaneous words, leaves more of a gap the longer ago it was.

spc476 · on March 2, 2022

I have a ton of family photos where I'm lucky if someone scribbled the year on the back; a bit more lucky if it's the month and year. The number of photos where names were written is way far less ("I remember Bob, but who is standing next to him?" "I don't know.").

Also, the 70s and early 80s were a bit more orange than I remember.

mekoka · on March 2, 2022

Spot on.

A few years ago, when on a hike, if I came upon some beautiful scenery, out came the camera. I'd spend most of the time capturing images with the device, rather than take in the landscape through my own senses.

Later, I'd look at those photos and notice that they failed to convey a great deal of the emotional dimension. Now, I spend more time looking at the landscape, trying to notice all the details, and only take one or two snapshots. The idea of writing down my thoughts and reactions is worthwhile, or for practicality, maybe just audio record them and transcribe later.

Vivtek · on March 2, 2022

This is an excellent point. I've been trying to go back and fix that, by correlating pictures with journal entries or blog posts and the like. I wish I'd taken better notes about who all these people were and what they meant to me at the time.

lisper · on March 2, 2022

This is the reason I'm still running OSX Mavericks. I have a HUGE investment in iPhoto in the form of curated albums and photo captions. All attempts to migrate the iPhoto albums to Photos have failed. So I keep Mavericks machines mainly so I don't lose my iPhoto albums.

lawkwok · on March 2, 2022

Apple Photos, and I’m sure many other photo clients, have a caption feature that lets you type notes for a picture. I haven’t run into any character limits, so you could presumably write a journal entry for each image.

Zak · on March 2, 2022

It's a response to this, which does advocate the use of literal plain text files where possible: https://sive.rs/plaintext

The author mentions converting to other open, text-based formats like HTML and LaTeX for publishing and writes:

> Keep your graphics files alongside your text files. But keep your text as plain text.

pdonis · on March 2, 2022

> It's a response to this

Seems more like a misunderstanding of it than a response. As you quote explicitly from the Sivers article, he is talking about keeping text as plain text, not about keeping images as plain text. And the Miris article is basically saying the same thing (at the end he even says plain text is still his first choice), yet appears to think he's giving some kind of opposing viewpoint.

freddie_mercury · on March 2, 2022

I think you've overcorrected in the other direction.

"Write plain text" is definitely not a shorthand for "use open formats".

PDF is an open format.

Approximately nobody who says "write plain text" thinks putting everything in PDF is an acceptable alternative.

They don't even want you writing in HTML, for that matter. They want Markdown.

They really do mean something fairly close to "plain text".

thomascgalvin · on March 2, 2022

To quote myself:

> The real argument is "use the simplest, most open format possible."

For most collections of words, that means Markdown, not PDF. But if the words you're saving are a mortgage document or power of attorney, PDF is actually a better choice.

selfhoster11 · on March 2, 2022

I'm never going to write directly in PDF, but if I want to preserve something? 100% I will save it as a PDF if it's document-like.

raman162 · on March 2, 2022

I got the same feeling as well. Open formats and avoiding proprietary lock-in is what the spirit of "write in plain text" is about.

mark-r · on March 2, 2022

I took a different conclusion from the plain text article. The argument isn't about open formats, it's really about plain text, simply because there's no need to have a tool to make use of it. Even open formats can get abandoned and become unusable.

necovek · on March 2, 2022

I took the original article on plain text to mean that you should aim for plain text formats which are human-readable even without specific tools to process them.

Thus HTML, Markdown and LaTeX make sense:

  \begin{document}
  Blah
  ...

Is completely understandable to a reader even 50 years down the line, even if they don't have LaTeX on-hand.

But, it does bring an interesting counter-point: what does $$\frac{1}{n}$$ mean (to not even bring up more complex examples)? It's probably no surprise that LaTeX is the lingua franca of math input because it brings in terseness, simplicity and some readability to plain text. Still, it's a programming language, so literally all bets are off in a document (you can redefine \frac to mean something else entirely).

I guess both articles, as noted elsewhere, attempt to nail down one familiar truth: use the simplest expression possible, but not simpler. One thinks that's always plain-text except for images, but there are just more contexts where this applies.

paxys · on March 2, 2022

> I don't think anyone really argues that everything should be plain text, even if that's an easy shorthand

Pretty sure this article is a rebuttal to the front page post on HN yesterday which said exactly that

CRConrad · on March 3, 2022

> Pretty sure this article is a rebuttal to the front page post on HN yesterday which said exactly that

* It may be an attempt at a rebuttal, but in actuality it mostly agrees. But yeah, rather obviously in reaction to that article.

* That article didn't say quite exactly that everything should be plain text; only that most text should be plain text.

Chris2048 · on March 2, 2022

> I don't think anyone really argues that everything should be plain text

  Plain text just works, everywhere, all the time.

  -- https://news.ycombinator.com/item?id=30525605

llarsson · on March 2, 2022

That "some" proprietary formats from the 80's and 90's are still readable is already causing real problems: because not *all* are. So text, possibly with Markdown or similar hints regarding emphasis and structure, is still vastly better than any alternative I can think of.

dhosek · on March 2, 2022

I bumped into this recently with a couple Kodak PhotoCDs I uncovered last month. Trying to get the pictures out of the PCD files is turning out to be more challenging than I expected.

js2 · on March 2, 2022

I converted mine years ago using iPhoto but according to Wikipedia, there are several programs that can do the conversion:

https://en.wikipedia.org/wiki/Photo_CD#Converting_Photo_CD_i...

dhosek · on March 3, 2022

iPhoto no longer exists, and its replacement, Photos, doesn't support it. A free command-line tool I found doesn't run any more without signing, and I need to find the dependent library for JPEG handling to use it. Other apps are paid.

dhosek · on March 3, 2022

And an online service that claims to do it only extracts the lowest resolution scan from the file.

mro_name · on March 2, 2022

what was the issue - hardware access (a CD reader), bit rot or else?

The image format was jpeg if I remember correctly, wasn't it?

mark-r · on March 2, 2022

I don't think it was JPEG, it was a custom format that contained multiple resolutions.

dhosek · on March 3, 2022

It's a custom undocumented format with a lot of oddities to it (particularly in its color profile handling, or more precisely lack thereof).

deltarholamda · on March 2, 2022

Is ImageMagick not doing the job? It's been a while, but Photoshop used to be pretty adept at this as well.

dhosek · on March 3, 2022

I haven't checked ImageMagick to see if it supports PCD, I'll have to see if I can do it in my wife's PhotoShop license.

dhosek · on March 3, 2022

And ImageMagick will only do the lowest resolution images from the files.

c0balt · on March 2, 2022

Additionally, the option to (relatively) easily transform Markdown or rich text to another format is great when you want to try a new tool and/or format.

eddieroger · on March 2, 2022

That's the real secret, I think, keeping content up to date. Text isn't the only medium that can be treated that way, too. My family converted lots of analog audio and video tapes to DVD some years ago, and I immediately turned around and ripped them to digital, lossless types and stored them on a few hard drives and eventually a cloud backup. Will FLAC and MP4 last forever? Nope, probably not, but if I check in on these files every few years, update the players that I've saved (VLC), and periodically convert them to newer formats, I feel comfortable that my grandchildren will be able to hear their great-great-grandparents' road trip to California, or see video of my bar mitzvah on whatever screen they're using years from now.

nayuki · on March 2, 2022

Nitpick - DVD is digital. When you said "digital", you probably meant "computer file based" format, as opposed to a physical disc format.

Similar: https://news.ycombinator.com/item?id=28682996

sleepycatgirl · on March 2, 2022

Yup. Or .org for that matter,

_lgrf · on March 2, 2022

I feel like a lot of "use plain text" proponents forget that outside of ASCII and now UTF-8, lots of alleged plain text documents with diacritics or non-Latin characters are at least slightly difficult to open because of their somewhat esoteric encodings. Plain text isn't as universal as is often claimed, although it is immensely simpler than some other formats.

But maybe we should all use monochrome bitmap files for everything? That would be very simple.

softwarebeware · on March 2, 2022

Yes, I feel this in my bones as someone who previously worked for a text messaging provider. Plain text has the deceptive appearance of simplicity, but it is actually one of the most maddening things to get right, especially if you intend to support the accurate transmission of said text to any possible text message receiving device in the world.

selfhoster11 · on March 2, 2022

If it's 2022 and someone is _still_ saving plaintext in a non-Unicode encoding where going with Unicode is a perfectly viable option, I will personally ensure that (figuratively) they are burnt at the stake.

In addition to UTF-8, my language happens to have ~2 additional code pages/Latin based encodings. Some websites still serve (or very recently used to serve) text files in such broken encodings, so I have to convert such files before use. It's deeply unpleasant. Windows has supported UTF-8 in some fashion for over 15 years, get with the program people.

(I would make an exception for preserving historical non-UTF-8 files in their original byte-exact form, for the same reason that I wouldn't digitise an analogue photograph and then burn the original - but let's be real, all such files have been created by now)
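For what "convert such files before use" can look like, here is a small Python sketch. The comment does not say which language or code pages are involved, so Windows-1250 (`cp1250`) and the Polish sample text are assumptions picked purely for illustration, and `to_utf8` is a hypothetical helper name. It also assumes you already know the source encoding, which is often the hard part.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Sample text stored the "old" way, in a single-byte code page (assumed: cp1250).
legacy = "zażółć gęślą jaźń".encode("cp1250")
converted = to_utf8(legacy, "cp1250")
```

The original bytes are left untouched, which fits the exception above about keeping historical files byte-exact and converting only a copy.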

jjav · on March 2, 2022

That is why I tend to always keep files in plain ASCII, even though two out of my three primary languages need characters not in ASCII.

File longevity wins over grammatical correctness most of the time for me. I have text files going back to the 80s, so I'm glad I didn't use any fancier software to write them as they'd be completely unreadable today.

smasher164 · on March 2, 2022

I think for a plaintext format to be "complete", it needs some mechanism of associating the language with some segment of text. Plaintext formats that don't acknowledge unified characters are just Latin-biased.

_lgrf · on March 2, 2022

that's basically the point - you can open an ascii file now because utf-8 is ascii oriented, but a utf-8 first editor will struggle with an old french text file for example. plain text has inbuilt biases which have changed over time, it's not as pure and simple as people say.
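A quick Python illustration of that asymmetry, under the assumption that the "old french text file" was saved as Latin-1 (ISO 8859-1); the sample strings and the helper name `try_utf8` are invented for the example.

```python
ascii_bytes = "plain old text".encode("ascii")
latin1_bytes = "café déjà vu".encode("latin-1")


def try_utf8(raw: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# ASCII bytes are valid UTF-8 byte for byte, so this decodes cleanly.
ascii_result = try_utf8(ascii_bytes)

# In Latin-1, 'é' is the single byte 0xE9. In UTF-8 that byte announces a
# multi-byte sequence, and the bytes that follow it here do not fit, so decoding fails.
latin1_result = try_utf8(latin1_bytes)
```

Here `ascii_result` is the original string and `latin1_result` is `None`; the same bytes decode fine once you tell the decoder they are Latin-1, but nothing in the file itself says so.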

tombert · on March 2, 2022

That's something I've always thought; plain text is pure and wonderful for me, an Anglo-writing American, because most of these formats were written for people like me.

I suspect for nearly every other language (or at least any language that doesn't use the ~100 characters/symbols used in the English alphabet), old ASCII text isn't terribly useful.

yumiris · on March 2, 2022

This was concocted at 5AM -- my apologies for any peculiar sentence structures or odd phrasing.

Will re-re-re-revise it again with fresh eyes after resting 'em!

aasasd · on March 2, 2022

I got quite a lot of use out of metadata over the years, such that now I'll probably get a nervous itch and tremors all over my body if I attempt to use just plain text. Specifically, the creation and modification times for each addition to my notes are rather valuable, especially with the work-from-home lifestyle aka ‘day fades into night into day’—with which more people are gonna be familiarized in these years.

Thankfully I'm using Org-mode these days, which is reasonably ‘plain text’ under practical definitions, but I make dozens of new headings every week, and each of them is stamped with the creation time. But boy do I miss having modification times too; I should probably finally set up automatic commits to Git. Also need to mess with Orgzly so that it marks notes that are created on the phone.

gilleain · on March 2, 2022

Indeed, digital archives use (I understand) various metadata standards such as:

https://www.dublincore.org/specifications/dublin-core/dcmi-t...

or 'Dublin core' which is RDF.

selfhoster11 · on March 2, 2022

I supplement my workflow with some judicious use of text-expander macros. I can type a total of three characters for the current date-stamp, or four for a date + timestamp. This makes it easy to reflexively date literally anything systemwide: from archive filenames, to code comments, to config file tweaks, to actual notes.
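The text-expander setup itself isn't described, so the following is only a guess at what such macros expand to: two hypothetical Python helpers producing a date stamp and a date + time stamp. The ISO-style formats are an assumption, chosen because they sort correctly as plain text.

```python
from datetime import datetime


def date_stamp(now=None) -> str:
    """Date stamp such as 2022-03-02 (format is an assumption)."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d")


def date_time_stamp(now=None) -> str:
    """Date plus time stamp such as 2022-03-02 09:05 (format is an assumption)."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d %H:%M")
```

Passing an explicit `datetime` makes the helpers testable; called with no argument they use the current local time.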

dv35z · on March 2, 2022

Can you touch on your org-mode journey, setup & current flow? I am just starting the journey - looking to have a fairly coherent notes/todo/planning/contacts/kb system, and have (portions of) it published out to a static website. Emacs is... something else.

aasasd · on March 2, 2022

Eeeh, I already see from your description that your needs are different from mine, and Emacs and Org-mode tend to be customized by everyone to their smallest wants. You won't find a shortage of articles about Org, including here on HN.

ParetoOptimal · on March 2, 2022

What are some things you use modification times for?

aasasd · on March 2, 2022

When I used Evernote and my notes were larger in scale, I mostly used the modification time to figure out how long a particular note was lying around without updates—so abandoned-forgotten projects and such stuff, basically tracking how much I actually use the notes.

(Evernote went to shit over the years, so don't take this as an endorsement.)

Sometimes it's also useful to figure out what I was doing when writing a note, by placing the time among my other activities. This gives some context for the thoughts.

Now that I migrated to outlines and the notes are much more granular, plus I started making more of them—they can often serve as a timestamped log of my day. When did I eat the breakfast—so I can put the dinner in the stomach before it begins an acid-fest? Well, I logged watching an episode of the series during the breakfast, so the creation time tells me the answer.

I'm scatterbrained, okay. Or rather, the notes are part of my ‘brain’ now.

In fact, I do miss granular times in other logs of my activity—ironically, in regard to privacy. I watched a video on a particular topic around last summer, and would like to find it now—but YT's ‘watch history’ is crude and just leafing through all of it is infeasible. (Actually, perhaps I should look into the ‘takeout’ dumps of activity for the timestamps, and make a list of the vids in a better format.)

brians · on March 2, 2022

“all the binary formats of the 1990s can be opened today”

Oh, sweet summer child. Scribe/mss. Koalapad. A bunch of Apple 2GS, Apple 3, and Lisa formats. Lotus Improv.

The points about semantics and authenticity are wonderful, but I think the presumption that all formats can be opened is mistaken exactly because those that can’t be opened become effectively invisible and lost.

selfhoster11 · on March 2, 2022

Emulation can bring them back in a limited fashion. Though obviously un-marrying them from the original system environment and making them accessible outside of the VM can be a challenge.

photojosh · on March 3, 2022

This just gave me the completely random idea... (as someone whose parents used to have tons of ClarisWorks docs)... build OCR into the emulator. :)

mark-r · on March 2, 2022

Survivorship bias it's called.

ggm · on March 2, 2022

he said.. in courier, monospaced paragraphs format, morally as close to "plaintext" as you can be with a couple of diagrams which could have been ASCII art...

ciphol · on March 2, 2022

Ironically, the "pro plain text" link posted earlier used lots of formatting.

I don't see why pure plain text is better in any way than plain text with formatting, like a simplified form of HTML (<a>, <b>, <sup>, some kind of table formatting, etc). The latter is non-proprietary, easily read and diffed, and communicates better than pure text.
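Sticking to a small tag set is at least easy to check mechanically. Below is a rough Python sketch using the standard library's `html.parser`; the allowlist is an assumption (only `<a>`, `<b>` and `<sup>` come from the comment, the paragraph and table tags are filled in for illustration), and `TagAudit` and `unexpected_tags` are made-up names.

```python
from html.parser import HTMLParser

# Assumed allowlist: a, b, sup are from the comment; the rest are illustrative.
ALLOWED_TAGS = {"a", "b", "sup", "p", "table", "tr", "td", "th"}


class TagAudit(HTMLParser):
    """Collects the names of any start tags that are not on the allowlist."""

    def __init__(self):
        super().__init__()
        self.unexpected = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag not in ALLOWED_TAGS:
            self.unexpected.add(tag)


def unexpected_tags(html_text: str) -> set:
    """Return the set of tag names used in html_text that fall outside the subset."""
    audit = TagAudit()
    audit.feed(html_text)
    audit.close()
    return audit.unexpected
```

An empty result means the document stays inside the chosen subset; this checks tag names only, not attributes or nesting.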

Images have their own value, as do animations and video on occasion. Here matters become more complicated - image formats are generally non-human-readable and non-diffable (though SVG or a similar format could solve those problems for schematic-type images) and image conversions generally involve data loss. For starters, though, one should at least use a non-proprietary format for images and video.

alanbernstein · on March 2, 2022

As the pro-plain-text post said: HTML, Markdown, JSON, LaTeX, and many other standard formats, are just plain text.

ciphol · on March 3, 2022

It depends. XML is generally not human-readable, it's got way too much programming code in relation to the amount of human content. But a simple subset of HTML is definitely human-readable.

mro_name · on March 2, 2022

No, not when it comes to what reading person they target.

Here text/plain is the only one that works for e.g. my mother.

bbarnett · on March 2, 2022

> like a simplified form of HTML (<a>, <b>, <sup>, some kind of table formatting, etc). The latter is non-proprietary, easily read and diffed, and communicates better than pure text.

Yes, but, the problem isn't typically being proprietary, when it comes to future use, but a closed, non standard, unknown format.

Yet you're creating a new standard here, with your own rules, which no one will understand, and which no automated tools can convert to another format.

(Eg some kind of table formatting)

Better to be 100% html than this.

(Maybe you meant that, but regardless, this is a good place for me to comment on standards being more important than anything else.)

mro_name · on March 2, 2022

> no one will understand

that's a bit harsh.

And frankly, I don't understand 100% English either, but still we use it to communicate.

bbarnett · on March 2, 2022

The point isn't that you can learn it, but instead, having to learn it by examining it in depth, always wondering if there are things not used yet (does the doc you look at, only show part of the standard in its formatting? Imagine a doc not using a tag, but the tag is in other docs...), and then, writing code to convert it.

Standards exist for a reason.

We already have issues with people not understanding specs, and writing data out of spec, even with that spec in RFCs!

mro_name · on March 2, 2022

my favourite is the spec that was changed after the fact:

> Applications using this syntax may choose, for the sake of readability, to specify a full-date and full-time separated by (say) a space character.

https://datatracker.ietf.org/doc/html/rfc3339#section-5.6
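What that allowance means for consumers, as a small Python sketch: a lenient parser such as the standard library's `datetime.fromisoformat` treats the "T" and space forms as the same instant, while a hand-rolled splitter that assumes "T" is always present silently fails on the second. The timestamps and the `split_on_t` helper are invented for the example.

```python
from datetime import datetime

# Both separators are accepted here; fromisoformat allows any single
# character between the date and the time.
with_t = datetime.fromisoformat("2022-03-02T12:30:00+00:00")
with_space = datetime.fromisoformat("2022-03-02 12:30:00+00:00")


def split_on_t(stamp: str):
    """A naive splitter that assumes the 'T' separator is always present."""
    date_part, sep, time_part = stamp.partition("T")
    return (date_part, time_part) if sep else None
```

The two parsed values compare equal, but `split_on_t` returns `None` for the space-separated form, which is the kind of out-of-spec surprise being discussed.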

But what harm does it do to restrict oneself to a few HTML tags?

lbriner · on March 2, 2022

Exactly. The OP said what they needed to in plain text because it captures what they wanted to say in the simplest format.

If they had needed to convey an image or contextual information like some rich API spec, they would presumably have used something else.

kleiba · on March 2, 2022

That is not a contradiction. The OP is just arguing that you should use the best medium + format for the job - and sometimes simple text is sufficient (as it says in the article).

gwern · on March 2, 2022

And if the diagrams had been ASCII art diagrams (eg https://twitter.com/thorstenball/status/1498541884796542977 ), OP could fix the typo in the diagram with a keystroke.

yumiris · on March 2, 2022

Cheers for the heads-up on the typo. The diagram's an SVG, with the labels being in plain ASCII. One keystroke is all it took indeed! :P

On the serious side, ASCII art diagrams are splendid and I very often use them myself, though they can get quite complex and thus messy to maintain. There comes a certain point where they lose their simplicity, sadly.

briandoll · on March 2, 2022

I assume this is a response to Derek Sivers's post: Write Plain Text Files https://sive.rs/plaintext

I've been using computers daily for about 35 years now and I have a _lot_ of plain text files that I regularly use -- notes, lists, outlines, quotes, links, etc. Does anyone who has been around a while, have a large multi-decade collection of texts that are _not_ plain text? What formats do you use? How do you maintain access to those files over time?

paxys · on March 2, 2022

My MP3 collection has been going on for at least 25 years and still works perfectly. Same for HTML pages (I have entire websites backed up from the early 90s). I still have Wordstar and Word 1.0 files which I can open and edit. I can't think of too many pieces of software or data formats from the last 30-40 years which achieved some threshold of popularity but have no support today.

briandoll · on March 2, 2022

MP3 is a good one. Although I had to develop a lot of perl to manage the various incantations of ID3 tags, especially when VBR became popular. MP3 files may still play, but the full experience (properly attributed w/ band, album, song title, song number, album art, etc.) is likely less than perfect over time.

Do Wordstar files open in modern Word applications, even on iOS? That's part of the access aspect over the long term -- files that can be used, everyday, with your daily-driver tools with minimal special software needed.

jjav · on March 2, 2022

> My MP3 collection has been going on for at least 25 years and still works perfectly.

Mine as well (maybe not quite 25 but close). But music isn't written word, clearly it wouldn't be in an ASCII text file.

The key is universal, non-proprietary formats that are supported by thousands of open source applications. Those are the formats that will last a lifetime and beyond. So, plain text for the written word (HTML counts as plain text, you can read and write it in any plain text editor), JPG for pictures, MP3 for music.

For video there doesn't seem to be an answer that is fully satisfactory, that I feel confident I can still view in 50 years. So I mostly take photos, not much video, since I can't trust the longevity of video.

titzer · on March 2, 2022

> What ultimately matters is that information is captured and preserved as thoroughly as possible. Between a picture that expresses a thousand words, and plain text file that sacrifices its detail and authenticity, why wouldn't we choose the former? Indeed, this question applies even the choice may sacrifice the longevity. What's the point of longevity, when the pursuit of it can compromise our ability to capture the information we may be afraid of possibly losing?

I would contend that capturing a picture is absolutely a massive distortion of reality because reality is three dimensional, exists in many spectra beyond visible light, has sounds, smells, taste, and feeling, and exists in a historical context. The selection of framing, distance, focus, all of these are biases of the photographer. A photo is a lie, too. Just because it's higher resolution doesn't mean it has indeed captured the right information.

Text is a lie too, granted. But in our current digitization zeitgeist, we have forgotten that our media (pictures, video, recordings, not just the TV, cable, and internet) lie to us. Our own bias towards slicing apart the world into computer-digestible bits is just us lying more convincingly to ourselves.

selfhoster11 · on March 2, 2022

By that definition, using literally any point of view to capture, measure or describe information is a lie.

I take issue with that. This strips the word "lie" of its time-honoured meaning (~"distorting or fabricating truths to influence decision making or perception"), and dilutes it for when we actually need to call out lies.

orzig · on March 2, 2022

Render to ASCII, everyone wins! (e.g. https://ascii-generator.site/)

copperx · on March 2, 2022

> but dismissing or abandoning media files is a much more guaranteed potential loss of information – information which plain text cannot capture due to its limitations.

Some examples are sorely needed. How is a Word/InDesign file more authentic than a plain text file? Or is the author talking about media? Is a ProTools session more authentic than Wav files?

coldtea · on March 2, 2022

>Is a ProTools session more authentic than Wav files?

Dunno about 'authentic', but since the part you've quoted specifically talks about "loss of information", the WAV files indeed incur loss of information compared to a ProTools session.

E.g. if it's a single stereo wav file render, it would miss all the individual channels, for starters.

If it's multiple wav files with all the channels as stems, it will still miss the effect chain settings (and hardcode them in the final result), the MIDI notes (hardcoded as the rendered VST output), session markers, tempo change tracks, and other such things.

falcolas · on March 2, 2022

> E.g. if it's a single stereo wav file render, it would miss all the individual channels, for starters.

A DAW session is like notes for writing a book. Not everything is going to make it in, and the choice of what does make it from the notes to the book, and how it's changed, is quite intentional. And I, personally, don't consider a book to be "lossy" or "unauthentic" because it doesn't also come with all the author's notes.

So, if it's not in the final mix, it's because it's not supposed to be in the final mix; it's not that the data is lost because of technical limitations. And like notes from a book, unless you throw them away, they're not going anywhere.

On a more technical note, underneath the hood, the recorded items are all stored as .wav files too...

coldtea · on March 2, 2022

>A DAW session is like notes for writing a book. Not everything is going to make it in, and the choice of what does make it from the notes to the book, and how it's changed, is quite intentional. And I, personally, don't consider a book to be "lossy" or "unauthentic" because it doesn't also come with all the author's notes.

Yeah, not really.

In music, for starters, a DAW session is like the recording reels from the analog days. And artists, producers, and studios go back to those reels a lot, for many reasons: to later clean up, rebalance, and release a "remastered version", to adapt to a new format (e.g. Apple/Dolby's spatial audio or some 5.1 surround mix), or simply to give individual parts to collaborators to make a remix of the track, or even just for the artists themselves to plunder it for parts to reuse in later works.

What your comment misses is that we're talking about the author here, not the reader.

The author (or in the DAW session case, the producer/artist) is the one who would have the original format, and the choice to keep their stuff as a ProTools session or wav stems, or as a text file or some proprietary format.

So while you "personally, don't consider a book to be 'lossy' or 'unauthentic' because it doesn't also come with all the author's notes", the author would indeed be furious if he needed his notes and couldn't open them because he wrote his first book and notes in some editor or format since discontinued, and he now has only the final printed or ebook text.

>On a more technical note, underneath the hood, the recorded items are all stored as .wav files too...

Which is neither here nor there regarding the things I've mentioned as lost (e.g. the fx chains used with their settings for re-toggleability, sequenced notes, automation curves, and so on). It is also not generally true across DAWs: whether a proprietary format is used depends on the DAW.

falcolas · on March 2, 2022

Perhaps this is a protools vs. Reaper thing, but none of that is ever lost in Reaper. So long as I don't get rid of the directory, I will not lose any of that. And, if we're talking about the producer, creating the final mix doesn't lose any of the authenticity of their session.

I'm probably missing something, but between wav files and the metadata (which is, IIRC, marked up text files in Reaper), the producer will never lose anything.

EDIT: Confirmed, .rpp files (and most of the plugins) are text files. So, best of both worlds - longevity AND authenticity (and infinite undo).

lallysingh · on March 2, 2022

I've read some docs with ASCII art diagrams far more complex than the medium really allowed.

I would have preferred PDF/A

jauco · on March 2, 2022

Real archivists (as in people that have archivist as a job description and work at places that have “storing data forever” as a mission statement) tend to store the data in multiple formats. The source + a few derivations. They also store a bunch of copies to ward against bitrot. And they periodically compare the copies.

Real archivists use a lot of data :)
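
The periodic comparison of copies mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions, not how any particular archive does it: the function names are made up, and the "majority hash wins" rule is one simple policy for deciding which copy has rotted.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_divergent_copies(copies: list[Path]) -> list[Path]:
    """Return the copies whose hash differs from the most common hash.

    Assumption: most copies are still intact, so the majority hash is
    treated as the reference. With fewer than three copies this cannot
    tell you which side is the damaged one.
    """
    if not copies:
        return []
    hashes = {copy: sha256_of(copy) for copy in copies}
    counts: dict[str, int] = {}
    for value in hashes.values():
        counts[value] = counts.get(value, 0) + 1
    majority = max(counts, key=counts.get)
    return [copy for copy, value in hashes.items() if value != majority]
```

Run from a scheduled job, a non-empty result is the cue to restore the flagged copies from a good one.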

selfhoster11 · on March 2, 2022

I think part of the job of an "archivist" archivist (as opposed to an amateur archivist) is making information accessible to others. For that, you need derivations, because nobody will necessarily know how to deploy a Mac OS 7 virtual machine, install Claris Works (or whatever it is), load the original file onto the machine, and then navigate the contemporary UI (with its unusual conventions) to get at the information they wanted. For personal data, I already know how to get an old environment up and running, so I'm happy enough to keep multiple copies of the original and of any software I need to open it.

davbryn1 · on March 2, 2022

"Prioritising the longevity of data can sacrifice the authenticity of what it tries to capture and preserve. When I say authenticity, I refer to how accurate and detailed the data in question preserves a particular state. An original raw image, for example, will capture a landscape much more authentically than written text would. Written text will inevitably comprise of ambiguity and even bias, if not distortion."

Or, you need to become a better writer.

selfhoster11 · on March 2, 2022

There's a reason why it's said that a picture is worth a thousand words. There are trade-offs, and at the end of the day some things are more efficiently described in text, and some visually.

Annatar · on March 2, 2022

"This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."

http://catb.org/~esr/writings/taoup/html/ch01s06.html

nicbou · on March 2, 2022

There doesn't need to be a compromise. You can have both if you keep your data in multiple formats. Storage is cheap and text files are small.

My timeline thing [0] keeps the original archives, stores the timeline entries in a database, and exports them hourly as JSON + files. If the code stops working or the database crashes, the files are still there. The automated backups are there too. No information is lost.

However, the richness is not lost in the process. This timeline has geolocation history, notebook scans and a bunch of other things that don't really translate to plain text.

The most important difference is that I can write to my timeline from my phone. Managing text files across devices is quite troublesome by comparison. If I want plain text out of it, I can write a new Destination that pipes entries to plain text files or to a fax machine.

[0] https://nicolasbouliane.com/projects/timeline

dorfsmay · on March 2, 2022

Whenever choosing a markup, image format, or other technologies, keep the Lindy effect in mind. A boring technology that has been around for a long time will survive a lot longer than a brand new shiny one.

https://en.wikipedia.org/wiki/Lindy_effect

writegit · on March 2, 2022

Or both?

I have a daemon that watches for binary changes in writing documents.

If changes are identified then it runs:

    $ libreoffice --headless --convert-to txt <CHANGED_FILES>

Then commits the plaintext to a git repo.

Allows for diffs, text search, and "longevity" across "authentic" docs.
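
A rough sketch of that loop in Python, for anyone who wants to try it. Everything here is an assumption for illustration rather than the actual daemon: it polls and compares content hashes instead of using a file-watching API, the function names and the set of watched suffixes are invented, and it assumes `libreoffice` and `git` are on the PATH and that `repo_dir` is already a git repository.

```python
import hashlib
import subprocess
from pathlib import Path

WATCHED_SUFFIXES = {".odt", ".docx"}  # assumption: which documents to track


def file_digest(path: Path) -> str:
    """Content hash used to detect binary changes between scans."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_documents(doc_dir: Path, seen: dict[str, str]) -> list[Path]:
    """Return documents whose content hash differs from the last scan.

    `seen` maps file path to the last known digest and is updated in place.
    """
    changed = []
    for path in sorted(doc_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in WATCHED_SUFFIXES:
            continue
        digest = file_digest(path)
        if seen.get(str(path)) != digest:
            seen[str(path)] = digest
            changed.append(path)
    return changed


def convert_and_commit(changed: list[Path], repo_dir: Path) -> None:
    """Convert changed documents to text inside repo_dir and commit them."""
    if not changed:
        return
    subprocess.run(
        ["libreoffice", "--headless", "--convert-to", "txt",
         "--outdir", str(repo_dir), *map(str, changed)],
        check=True,
    )
    subprocess.run(["git", "-C", str(repo_dir), "add", "--all"], check=True)
    # `git commit` fails when nothing is staged, so check the index first:
    # `git diff --cached --quiet` exits non-zero only if there are staged changes.
    staged = subprocess.run(
        ["git", "-C", str(repo_dir), "diff", "--cached", "--quiet"]
    )
    if staged.returncode != 0:
        subprocess.run(
            ["git", "-C", str(repo_dir), "commit", "-m", "Update plaintext exports"],
            check=True,
        )
```

Calling `convert_and_commit(changed_documents(doc_dir, seen), repo_dir)` every few seconds with a persistent `seen` dict gives roughly the behaviour described above.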

VariableStar · on March 2, 2022

IMO the question is more about which standards are used, rather than specifying a specific format. In particular, using open and free standards and formats increases the chance to retrieve and use data after long time storage. Different formats suit different data types.

highspeedbus · on March 2, 2022

The Obsidian/Markdown file structure is great for this. It could become a standard "Offline Hypertext" format.

Although plain text is fully portable, it is limited when you need to link an image or other files. People often forget how useful this concept is.

HTML is not a viable option as it is awfully verbose for taking a simple note.

Markdown adds just enough semantics while remaining perfectly readable, in anything from a hex editor to Microsoft Word.

We're in a somewhat critical moment, where markdown can either stay as it is, then dominate and become a godsend format of solid usability for decades, or a harmful feature is added that would slowly drag the whole thing down until the next Just Write Plain Text blog post.

ad404b8a372f2b9 · on March 2, 2022

I think longevity is not just an issue of the data format but more so of its organization. It so happens that text files organized using the file system are the most easily producible, maintainable and queryable form of data organization. But other media can have the same properties if they're organized using the file system rather than any complex tools. I have graphs and datasheets that have endured decades that I refer to often and are easily findable because they are well-named files in well-named folders, even though the formats are comparatively much more complex.

Beldin · on March 2, 2022

It seems the author overlooked the possibility of writing out the full binary string of whatever format he'd like (i.e., "zero one one ..."), prefaced by instructions on how to parse that.

That would give you great "authenticity" (in his definition) and great longevity.

Not practical for reading back, but that was not the point. With the help of a few simple scripts, writing is easy. So, in the end, not really an argument against storing information exclusively in plaintext.
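
For the curious, the "few simple scripts" could look something like this Python sketch. The header wording, the function names, and the choice of most-significant-bit-first order are assumptions made for the example; any convention works as long as the preface states it.

```python
WORDS = {"0": "zero", "1": "one"}
DIGITS = {word: bit for bit, word in WORDS.items()}

# One-line preface telling a future reader how to parse the body.
HEADER = (
    "Each word below is one bit. Group the bits in eights, most significant "
    "bit first, to recover the original bytes."
)


def bytes_to_words(data: bytes) -> str:
    """Spell out every bit of `data` as the word 'zero' or 'one'."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return HEADER + "\n" + " ".join(WORDS[bit] for bit in bits)


def words_to_bytes(text: str) -> bytes:
    """Invert bytes_to_words: skip the header line, then regroup the bits."""
    _, _, body = text.partition("\n")
    bits = "".join(DIGITS[word] for word in body.split())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

`words_to_bytes(bytes_to_words(blob)) == blob` holds for any byte string, so nothing is lost; the cost is paid entirely in size and readability.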

jjice · on March 2, 2022

We use Google Docs for pretty much all of our docs since they're easy to create, share, and modify, and it works pretty well. I just (selfishly) want a good integrated plain text editor as part of GSuite. Sharing code via Google Docs isn't great, and sometimes I don't want to think about headers and formatting, I just want to use tabs to separate my pieces. That said, I'm definitely in the minority of users and I'll deal with it, not that big of a deal.

thematrixadmin · on March 2, 2022

What about writing data in Markdown format, stored locally on the HDD? You can use a bunch of different tools, both online and local, which will probably stay supported in the future. There is also no problem with implementing your own Markdown editor (a nice side or pet project as well). I store the files and run a small server on my RPi, accessible through my phone and desktop. If I'd like to show the text to somebody, I can easily copy it as plain text or Word format, or export it to HTML or PDF.

happyglands · on March 2, 2022

I've struggled with this for quite some time now, and tried almost every tool out there. At the moment, I'm settling with Bear, writing my notes in Markdown. I prefer the ease of using nvAlt but I need the ability to store images and PDFs and I like the fact that it has some very nice export options should I eventually move to another tool, so I don't feel like I'm "locked in".

m348e912 · on March 2, 2022

This might be off topic but in terms of communication such as email, plain text seems the most authentic format to me. For example, if you are one of those sales guys that bolds and highlights the important parts of an email that you send, it's off-putting. The only exception I would give is if you wanted to add an inline image or an emoji -- everything else, plain text.

amiga1200 · on March 2, 2022

The Epic of Gilgamesh was written in plain text.

wl · on March 2, 2022

In a complicated, undocumented format that had to be reverse-engineered (Sumerian/Akkadian/Hittite cuneiform).

CRConrad · on March 3, 2022

And those clay tablets were all craggly-surfaced and far too thick to fit in my DVD player! Shabby archivists, those Babylonians.

jdvh · on March 2, 2022

Plain text is so compelling because it's as simple as it gets, you can bring your own editor, you own your own data, and you can use version control.

Text+ is compelling because you can have images and some kind of formatting. You want to store metadata and have backlinks and tags. Ideally with the possibility of collaborative editing.

There should be a way to fuse these two.

Geezus-42 · on March 2, 2022

The latter sounds like Obsidian or Logseq or most other markdown editors.

quasarj · on March 2, 2022

Wrote a whole article about not using plain text. Used plain text for everything except a useless image. A+++

chaxor · on March 2, 2022

I like the idea of making a binary file into a plaintext file - but you could store it as the ASCII characters "0000110100111011110001111100101..."

This would be great for many reasons. At the top of that list for example, is getting a lot more use out of those hard drives you paid for.
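
To put a number on it, here is a small Python sketch of the ASCII "0"/"1" encoding and the storage cost it implies. The function names are invented for the example; the arithmetic is simply that each original bit becomes one ASCII byte.

```python
def to_ascii_bits(data: bytes) -> str:
    """Encode bytes as a string of ASCII '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)


def from_ascii_bits(bits: str) -> bytes:
    """Decode a string produced by to_ascii_bits back into bytes."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


payload = bytes(range(256))
encoded = to_ascii_bits(payload)

# Every input byte turns into eight ASCII characters of one byte each.
expansion = len(encoded.encode("ascii")) / len(payload)
```

Here `expansion` is exactly 8.0, so a 1 GB file would occupy 8 GB as plaintext.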

dade_ · on March 2, 2022

MD for all things text and SVG journals for handwritten notes, diagrams, sketches, screenshots. Works great, but haven’t found a way to integrate them beyond using a common set of folders.

mxuribe · on March 2, 2022

> ...SVG journals for handwritten...

Would you kindly clarify this? Did you mean scan in handwritten material but save it in a scalable image format like SVG? I'm quite interested but maybe i'm not capturing what you mean here, because i have not had my breakfast. :-)

dade_ · on March 2, 2022

I use Write (iPad/Windows/Ubuntu/Android) to take handwritten notes and jot info. I first started using SVG on my Sony eReader, it used this vector format. http://www.styluslabs.com/

necovek · on March 2, 2022

Not GP and I have no idea if they meant this, but "smart" vectorization/tracing would be ideal when use of touchscreen pens is impossible (still doesn't capture the angle a pen is being held at, but we are getting close).

For the state of the art, look up "image tracing".

mxuribe · on March 2, 2022

Thanks, I'll look up image tracing.

anotherevan · on March 3, 2022

Reminds me of the Einstein quote: Make something as simple as possible, but no more so.

Paraphrased: Make your information capture format as simple as possible, but no more so.

gandalfff · on March 2, 2022

Plain text is fine for some things but lacking for others. I like GUIs for formatting. I wouldn't be surprised if my ODTs could be opened a thousand years from now.

CRConrad · on March 3, 2022

Betcha you'd be surprised to be around to find out, though.

a1445c8b · on March 2, 2022

s/comprise of/comprise/g