Always bet on text (2014)

dang · on Jan 7, 2023

Always bet on text (2014) - https://news.ycombinator.com/item?id=10284202 - Sept 2015 (69 comments)

Always bet on text - https://news.ycombinator.com/item?id=8451271 - Oct 2014 (196 comments)

zzo38computer · on Jan 7, 2023

I do think there are many benefits to using text, and often plain ASCII (without formatting) is suitable, too. It is not always better, but very often it is. I still use Usenet and NNTP and IRC, and many documents I write are written as plain text documents. It is true, text is usually much more efficient than other formats. The files can be smaller and take less time to transfer, too. Also, unlike videos, you can easily read it at whatever speed and in whatever direction you want. Printed copies of documents are also useful, and I usually make handwritten notes. I will usually prefer to have a transcript rather than an audio file, too. Searching can work better with text than with audio/video, and so can making logs of changes. In computer files, the end user can change the fonts if desired. As mentioned in the linked article, you can have annotations, etc. Some computer games are text-adventure games without graphics, or with optional graphics. Yes, I really think that text is usually much better than the alternatives.

ngcc_hk · on Jan 7, 2023

You know, the original internet protocol is so much easier to diagnose than ASN.1 that I think it is a key reason why internet protocol won. To this day, major communication formats like JSON are text-based.

Having said that, when you generalise as that article does, one has to think about what text is. Also, many texts (copper era) are still not deciphered. It is not all rosy.

And pictorial systems like the Egyptian alphabet, and in fact the Chinese non-alphabet, are picture-based. Hence what is really important might be to preserve some reference. Even binary can be deciphered.

Btw, how do we do Voyager …

layer8 · on Jan 7, 2023

> a key reason why internet protocol won

The internet protocol (IPv4/IPv6) is binary, as are the usual transport-layer protocols on top of it (TCP, UDP, QUIC, etc.). You probably mean the typical application-layer protocols (HTTP, SMTP, etc.), although some are still binary (e.g. NTP), not to speak of SSL/TLS.
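
To make the layer distinction concrete, here is a minimal sketch (not a real network stack) that packs a 20-byte IPv4 header next to a plain HTTP request. The addresses, the helper names `build_ipv4_header` and `is_printable_ascii`, and the zeroed checksum are assumptions for illustration only.

```python
import struct


def build_ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Pack a minimal 20-byte IPv4 header (checksum left at 0 in this sketch)."""
    version_ihl = (4 << 4) | 5  # version 4, header length of 5 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,                 # DSCP / ECN
        20 + payload_len,  # total length in bytes
        0,                 # identification
        0,                 # flags + fragment offset
        64,                # time to live
        6,                 # protocol number for TCP
        0,                 # header checksum (not computed here)
        bytes(int(part) for part in src.split(".")),
        bytes(int(part) for part in dst.split(".")),
    )


def is_printable_ascii(data: bytes) -> bool:
    """True if every byte is printable ASCII or common whitespace."""
    return all(32 <= b < 127 or b in (9, 10, 13) for b in data)


# Application layer: readable in any terminal or packet dump.
http_request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

# Network layer: fixed-width binary fields, meaningless without the field layout.
ip_header = build_ipv4_header("192.0.2.1", "198.51.100.7", len(http_request))

print(http_request.decode("ascii"))
print(ip_header.hex(" "))
```

The HTTP bytes can be read as-is, while the header only makes sense once you know which bits belong to which field.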

eternalban · on Jan 7, 2023

Your comment implies an understanding of text that only considers the surface level. Let's consider what is the fundamental characteristic of text:

Text is any expression of information that is conveyed as a linear sequence of units of a language.

Musical notation is in fact text. Any 'wire protocol' is by definition text. It may not be human readable but it remains text that is optimized for machine reading.

Text can be re-formatted. A musician following a horizontally scrolling sequence of 'notes' - units of the language - will play the same tune as the one who is following the 'square frame' chunks of linear sequences.

Since text is ultimately a 1D construct, it also affords us linear editing properties. We can insert text into text. We can re-arrange text. Text is exceptionally malleable. Sure, we can change aspect ratios of images - a 2D language - but not without distortion.

Text: A form of conveying information in 1D using a finite set of unit glyphs.

Bonus points for thinking clearly about text: for those who still debate "braces vs indentation", note that a "language" like Python (a 'pictorial' language ..) breaks this fundamental feature of text (i.e. it cannot be conveyed purely as a sequence) unless one also encodes the 'white space' and 'invisible characters'. So for those who think text is king, braces or some other symbols as first-class units of the alphabet of a programming language are desired and appreciated.
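
As an illustration of what "encoding the white space" can look like, Python's standard `tokenize` module turns indentation into explicit `INDENT` and `DEDENT` tokens, so the token stream is a purely linear sequence. The sample source string and the helper name `token_names` are assumptions made up for this sketch.

```python
import io
import tokenize

# A tiny program whose block structure exists only as leading whitespace.
SOURCE = "if x:\n    y = 1\nz = 2\n"


def token_names(source: str) -> list[str]:
    """Return the token type names for `source`, in stream order."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


names = token_names(SOURCE)
print(names)
```

In the resulting list the indented block is bracketed by one `INDENT` and one `DEDENT`, which play the role that braces play in other languages.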

syntheweave · on Jan 7, 2023

If we call any wire protocol "text", we're just restating information theory. The reason why text is text and not binary lies in the rules of presentation.

Text as 1D occurs in the trivial case of direct mapping of bytes to symbols, as in a hex editor; but one of the first things we look for in analysis of byte data is some kind of notion of word or line - beginnings and endings of a group of data. Once you add these breaks as a rendering effect, text becomes practical for purpose, because it now automatically describes some hierarchy. The functional difference between word and line is just in how it makes use of vertical and horizontal dimensions to present the information. Both are spatial and therefore suggest "pictorial" usages.

So in practice everything we call "text" is at least 1.5-dimensional; the viewing model and editing functions are aware of spatial elements as well as symbolic ones. Formatting rules around tabs, column limits and the like are present as global properties of text documents. It's not something specific to "pictorial" languages, because these formatting rules and discussions about them reoccur in every collaborative project, regardless of language.

I spent some time examining this in detail and noticed that written language encodes many forms of "textual break" through punctuation, paragraphs, and pagination. We don't feel a need to group every written sentence in brackets because we have the punctuation marks to describe the hierarchy efficiently, with spacing and line breaks acting as larger grouping signifiers with more visual impact. And if we instruct a computer to read a text document in that way, by describing "breaking tokens" with different rankings, it can immediately begin to navigate the document in a hierarchical manner by observing higher vs lower rankings within the tokens. The linear encoding is ultimately just that - an encoding. The actual structure of text documents is always more complex and draws on hierarchical elements, even if it doesn't encode them with robust or flexible primitives.
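
A rough sketch of the "breaking tokens with rankings" idea, assuming a hypothetical three-level ranking (blank line > sentence end > comma); the `BREAKS` list and the `to_tree` helper are invented for illustration and ignore real-world cases such as abbreviations or quoted text.

```python
# Breaking tokens, highest rank first (an assumed, simplistic ranking).
BREAKS = ["\n\n", ". ", ", "]


def to_tree(text: str, breaks: list[str] = BREAKS):
    """Split on the highest-ranked break, then recurse with the lower-ranked ones."""
    if not breaks:
        return text.strip()
    head, rest = breaks[0], breaks[1:]
    return [to_tree(part, rest) for part in text.split(head) if part.strip()]


doc = "Text is linear, mostly. Breaks add rank.\n\nA new paragraph, here."
tree = to_tree(doc)

# Navigate hierarchically: paragraph 0, sentence 0, clause 1.
print(tree[0][0][1])
```

The same linear string becomes navigable as paragraphs, sentences, and clauses purely by ranking its break tokens.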

A useful contrasting model is that of the spreadsheet. Spreadsheet cells are designed towards fungible congruency in that they can contain varying types and sizes of data, but are navigable within the document as interchangeable elements. The reason why so many information workers (myself included) reach for the spreadsheet for every task has to do with this fungibility: rather than describing spatial formatting through a linear encoding, the assumed spatiality of cells allows for arbitrary groupings of information, which is useful as an iterative process. I might have a few key words for an idea or pieces of data I've collected, but I don't necessarily want to encode them in a sentence or list specifically - I just want them close to each other. Therefore, even if all I'm doing is writing textual notes, I still want to have the cell abstraction so that I can defer encoding a linear structure at the top level and move around my thoughts in more arbitrary ways.

(Would I write code in a spreadsheet editor? Absolutely, given a language designed for it.)

eternalban · on Jan 7, 2023

(Appreciate the thoughtful reply.)

https://gwb.blob.core.windows.net/markpearl/Windows-Live-Wri...

Point taken re. binary & information theory. I'll step back from general wire protocols to fixed-frame, formatted wire protocols that have 'syntax'. I claim the above header is a kind of 'text'. The +0.5D you mention is sometimes internalized: in this case we have a fixed format; in other cases, we have internalized the syntax and grammar rules of a language.

The fact that text can be spatially arranged or annotated, for presentation or clarity, touches on the malleability characteristic of text. Consider text before information systems. (Per my understanding, most punctuation marks are rather 'modern' additions to writing systems, with earlier texts in the same language using only word and sentence delimiters.)

https://www.worldhistory.org/img/r/p/750x750/5037.jpg

Maybe these folks needed 'documents' to be of a certain shape (for storage, transit, etc.) so there are no extant examples of e.g. an elongated baked clay tablet for linear transcription. It is accepted that there is a 'windowing' aspect of any form of 'reading' text. I posit that all formatting meta-information is strictly a windowing concern and not fundamental to text itself.

alwayslikethis · on Jan 7, 2023

As a counterpoint to this, the book Amusing Ourselves To Death (1985) laments the decline of print, and partially of text, as a method of communication as it gave way to television, and explores the effect on culture, among other things. It's an interesting read.

4b11b4 · on Jan 7, 2023

Sure, but the brain is also very good at visual understanding and can pick out relationships that would be comparatively much more work to explain with text.

blacksqr · on Jan 7, 2023

Does OP use Tcl for programming?

philbo · on Jan 7, 2023

OP is Graydon Hoare. He created Rust and is more recently a core contributor to Swift.