
'Always bet on text' is a catchy slogan, but the author fails to define 'text'. This is confusing because the post contains a lot of pictures of things that I'm not sure we would all agree are text.

Let's start with a radical position. Is something text iff it can be directly encoded in UTF-8? What, then, about symbols that have not yet made their way into Unicode, such as an i dotted with a heart? Does it become text when the Unicode Consortium says so?

Nowadays memes tend to be distributed as (animated) bitmaps. But if we wanted to, we could encode them more efficiently. So are they text?

If 'text' = Unicode, then that would also mean that many mathematical expressions (matrices, fractions) are not text. Math texts written before symbolic notation existed were not very readable: http://www.boston.com/bostonglobe/ideas/brainiac/2014/06/bef...

ASCII-encoded math is not without problems either.

Does 'text' include semantic markup like 'emphasis', 'heading', or 'list-item'? Does it include visual markup like 'italic', 'underline', 'blue', or 'Times New Roman'?

Does 'text' include newline and tab characters? Is it correct to say that newlines and tab characters exist on paper? If they don't, then why do we use them to indent blocks of code?

If a sheet of paper with scribblings can be text, then can a bitmap be text too?

Now that I've brought up mathematics, HTML, and code, should we think of text as a linear medium or is it better to think of texts as trees?

What about handwritten class notes that include arrows that link together different text fragments? Are these arrows part of the text? Does that mean that texts are directed graphs?

I'm even wondering if the author might actually have meant 'always bet on language', although that seems kind of obvious.

Or perhaps he meant 'don't needlessly throw away information', which is what would be happening if your CMS served pages as HTML image maps.

That is to say, even if we're all inclined to say that text is awesome, which we probably are, we might still be saying quite different things.




Letters are just symbols that have some sort of meaning to you. Letters in a row are text. Some symbols have no meaning to you, such as Kanji or the Korean alphabet for some people, yet you still consider them text. Unicode has nothing to do with what is considered text. Humans have for millennia used pictures as symbols for writing, in Egypt for example.


Note that Unicode includes hieroglyphs: https://en.wikipedia.org/wiki/Egyptian_hieroglyphs#Unicode
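
For anyone who wants to check this locally, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe` is made up for illustration, and U+13000 is just the first code point of the Egyptian Hieroglyphs block.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of a single character."""
    # unicodedata.name() raises ValueError for unnamed code points,
    # so a fallback string is passed as the second argument.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


hieroglyph = "\U00013000"  # first code point in the Egyptian Hieroglyphs block
print(describe(hieroglyph))
print(hieroglyph.encode("utf-8"))  # outside the BMP, so UTF-8 needs four bytes
```

Whether having a code point is what makes something 'text' is of course the open question in this thread; the sketch only shows that the code point exists.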


Not just hieroglyphs... Unicode includes anything you can imagine.


It includes many things, but not, for example, an 'i' dotted with a heart. And to represent mathematics well you need to use MathML.


There was a request for it: http://www.evertype.com/standards/iso10646/pdf/n258a-heartdo... (but do note the date)


What Unicode does or does not include doesn't matter.


I just wanted to clarify this because I got the impression you were implying that hieroglyphs weren't included.


Would you consider integrals, powers, and fractions text? For readability, the usual notation uses two dimensions.

They can be written in one dimension of course, but so can anything that would count as language (given that all information can be written as a row of 0s and 1s).
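
To make the 'row of 0s and 1s' point concrete, here is a rough Python sketch that flattens a string containing math symbols into bits and back. The function names `to_bits` and `from_bits` are hypothetical, and going through UTF-8 is just one convenient choice of encoding.

```python
def to_bits(s: str) -> str:
    """Flatten a string into one row of 0s and 1s via its UTF-8 bytes."""
    return "".join(f"{byte:08b}" for byte in s.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Inverse of to_bits: regroup into 8-bit bytes and decode as UTF-8."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


expr = "∫ x² dx"
bits = to_bits(expr)
print(bits)
print(from_bits(bits) == expr)  # the round trip loses nothing
```

The round trip preserves the information, but the bit string is clearly not what anyone would call readable, which is why linearizability alone doesn't settle what counts as text.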


>I'm even wondering if the author might actually have meant 'always bet on language'

This was my thought too. The examples the author gives are good illustrations of the power and expressiveness of human language, not specifically of text.


I don't know why you're getting so heavily downvoted for this, because every question you raise is absolutely right.

There are a lot of Western, programmer-centric quirks being raised that don't necessarily translate across time or culture.



