I've used \v support, or lack thereof, to fingerprint web devices during security audits.
Whitespace is often defined as space, \t, \r, \n, and \v. However, many specs, like HTTP, sometimes exclude \v. Depending on the underlying functions products use in their HTTP parsers, you can fingerprint servers, WAFs, proxies, load balancers, whatever, by using \v to separate HTTP header lines or name/value pairs.
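A rough sketch of the idea (the host and the probe header are placeholders; real fingerprinting compares many such variations): send an otherwise identical request with \v injected into the header section and see whether the parser rejects it, strips it, or passes it through.

    # Sketch: compare how a server answers a clean request vs. one with \v
    # injected into a header line (example.com is a placeholder target).
    import socket

    def probe(host: str, raw: bytes) -> bytes:
        with socket.create_connection((host, 80), timeout=5) as s:
            s.sendall(raw)
            return s.recv(4096).splitlines()[0]   # just the status line

    clean = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    vtab  = b"GET / HTTP/1.1\r\nHost: example.com\r\nX-Probe:\x0bvalue\r\n\r\n"

    print(probe("example.com", clean))   # e.g. b'HTTP/1.1 200 OK'
    print(probe("example.com", vtab))    # 200, 400, or something else -- differs by parser

Different parsers disagree on whether \v counts as whitespace, and that disagreement is the fingerprint.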
> Here's my simple plea: stop it. Stop mentioning vertical tabs in tutorials and language references.
I'm failing to see why this is an issue. The example tutorials/references he gave only mention '\v' in tables of possible escapes. What is the downside to mentioning it: people spending 5 minutes googling what a vertical tab is?
Unless a language removes support for escaped string literals entirely, it seems odd to remove support for a particular standard, if mostly unused, escape sequence.
When I was a very green coder I saw the vertical tab in one of those tables and spent a fruitless afternoon trying to use it to align some text. I don't even remember what I thought it did but the afternoon could have been better spent if anything anywhere had said it does absolutely nothing on any modern system.
So the harm is that people see vertical tab and without the associated historical context come up with their own ideas of what it does and then waste time.
If you do include it at least show it in such a way that people know it's a historical oddity with really nothing but very obscure uses like the fingerprinting example mentioned elsewhere in the comments.
Reading https://en.wikipedia.org/wiki/C0_and_C1_control_codes now and getting this urge to write a pure ASCII editor, haha.. Seriously, it's actually suitable for more advanced editing than modern day "pure text" editors.
SOH + STX to set a document title. FF for page breaks. DLE to allow embedded (binary/uninterpreted) data like images and avoid printing garbage. FS, GS, RS, US to support tables.
I find it interesting that this coincidentally works complementary to syntax like Markdown. :p
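As a sketch (the data here is made up), the separator characters map quite naturally onto a little table structure:

    # US separates fields ("units"), RS separates records; GS would separate whole tables.
    US, RS = "\x1f", "\x1e"

    table = [["name", "role"], ["Ada", "engineer"], ["Grace", "admiral"]]
    encoded = RS.join(US.join(row) for row in table)

    decoded = [record.split(US) for record in encoded.split(RS)]
    print(decoded)   # [['name', 'role'], ['Ada', 'engineer'], ['Grace', 'admiral']]

No quoting or escaping rules needed, as long as the delimiters never appear in the data itself.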
Computing is full of crappy re-inventions of things that already existed. For instance, many old style terminals could do fields and validation of those fields.
The world is spiraling on itself so much it's crazy. Accepting it is one big step toward happiness.
Talking about old terminals: I worked at a tax office around 2005, when the web was getting trendy, so old AS400 applications were on the way out. I had to use them just before they were replaced by whatever webapp was coming. It was one of the best user experiences I've had. That old system was so on-point. It's funny because AFAIK the terminal code had close to no structure; it parsed fixed patterns on screen buffers, pretty archaic. But the software was hyper ergonomic, responsive, simple and solid. It did infer/complete lots of fields, find non-trivial issues and suggest corrections. I could barely imagine the amount of regressions that were about to hit the employees when the html/js version landed; it made me really sad.
I really hate the move to webapp everything. At my workplace we used to have this great terminal interface to our server database; it was easy to script with, automate with, and use. Need a server? Just query it in the terminal for data.
Now we have this shite webapp that only runs properly in IE. It's unusable for any sort of mass-querying, and it takes a good 10 or so clicks on buttons that resize themselves to find out all the info about a server, whereas the terminal program was just "ServerLookup X".
Me too. I'm not against newness, not even some regression. You gain things, you may lose some, but right now it seems like no gain, only pain. All so that you can evolve it, so that it's CSS3-cute. Nowadays at least you'll have responsive layouts to avoid mobile/desktop redundancy. But in 2005 there was none of that.
I remember an article in Wired about an (old?) company still using AS400 or punchcard in their accounting branch and resisting the migration. Can't find it at the moment though :(.
I wonder why these are no longer used. There is the obvious visibility issue, but tab delimited files are still quite common. I wonder if it was because languages (particularly C, and all languages which copy the C convention) don't provide escape characters for them?
Overloading whitespace is fine for machines, but it doesn't improve human readability. In the days when memory and storage were small and runtime compression would crush throughput, saving a byte here or there made sense in the mainstream. But these days? The critical reasons to use text tend to be either hitting an existing interface or human readability, and with 1GB of RAM about the price of a Happy Meal, the risks of invisible textual complexity probably outweigh any benefit.
Cynical me would say it was used a lot at first, but then ASCII became something of the past, and some company doing data work started using commas, it got trendy and ASCII RS went into the closet.
Wikipedia suggests CSV was already around at the time of punched cards; I guess people preferred commas over some obscure RS code.
Visible delimiters are easier to understand when folks look at the files later -- they are essentially self-documenting. Whitespace takes more work to decode.
Comma-separated values is a data format that pre-dates personal computers by more than a decade: the IBM Fortran (level G) compiler under OS/360 supported them in 1967.
CSV isn't something computer-specific, it's basic human grammar; CSV probably just put a tag on a common practice.
Is that a vertical tab, or a line break? It seems much more consistent with the behavior of <br> in HTML documents (i.e. new line but not new paragraph/bullet/whatever).
In the XML PowerPoint format (.pptx), it's a line break (inserts an <a:br> element between the two lines). There's no vertical tab character saved out to disk (who knows what PowerPoint does internally, though).
I don't have the time or the initiative to figure out what the old .ppt format does here.
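It's easy enough to check the .pptx case for yourself, though (file name here is a placeholder): a .pptx is just a zip, so you can look at the slide XML directly.

    # Sketch: confirm what PowerPoint writes to disk for a Shift+Enter line break.
    # "deck.pptx" is a placeholder; slide XML lives under ppt/slides/ inside the zip.
    import zipfile

    with zipfile.ZipFile("deck.pptx") as z:
        xml = z.read("ppt/slides/slide1.xml").decode("utf-8")

    print("<a:br" in xml)    # True: the break is stored as an <a:br/> element
    print("\x0b" in xml)     # False: no vertical tab character on disk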
I always understood this functionality as 'soft return' versus a 'hard return'. One adding a line break (like <br> in HTML) and the other starting a new paragraph (like <p>) with its own indent/margin/spacing rules.
After we remove support for vertical tab, can we drop octal literals from every language? As fun as it is to snicker at a young programmer who spent 2 hours debugging...
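Presumably the 2-hour bug is the classic leading-zero gotcha; a minimal illustration (Python 3 at least turned the old form into a syntax error):

    # In C-family languages a leading zero means octal, so 010 is 8, not 10.
    # Python 3 refuses that spelling entirely and requires an explicit 0o prefix.
    print(0o10)        # 8
    print(int("010"))  # 10 -- int() parses decimal, unlike a C octal literal
    # flags = 010      # SyntaxError in Python 3; in C this would quietly be 8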
There are often use cases for many of the ASCII control characters, which is why the escape sequence feature is there.
It would be odd to support an escape sequence feature for every ASCII control character _except_ vertical tab, or to support it but leave it out of the docs.
It's just coming along for the ride with the general escape-sequence feature for ASCII control chars.
What's the harm? In any software project, useless shit accumulates because it is much easier to add features than to remove them. Support for one useless feature such as \v will not turn a codebase into a mess of spaghetti, but support for enough useless features like \v absolutely will.
The burden for \v is not zero. Every programmer working on the string escapes part of the code has to read and understand the lines that implement it. And it has to be tested and documented and if your documentation comes in multiple languages, translators have to spend time translating text for a completely useless feature.
Writing software is like writing a book of code for other programmers to read. An author wouldn't leave meaningless chapters in the book they are writing because "what's the harm?", and neither should good programmers.
While your argument holds generally, let's remember that unless the software is horribly architected the difference between escape-everything-but-vertical-tab and escape-everything really should be trivial.
There was a Python web framework which used \v to separate code from HTML... does any of you remember the name?
EDIT: found it, it's called Aspen (http://aspen.io/simplates/). They actually were using form feed (^L or \f), but apparently have switched to an ASCII combination to separate code from presentation.
> If I could stealthily patch the compiler for any language supporting the "\v" escape so I'd receive mail whenever it occurred in source code, then I could trace actual uses of it. I'm willing to bet that all the mail would come from beginners trying to figure out what the heck "\v" actually does, and then giving up when they realize it doesn't do anything.
Whilst it's true that I've never actually used `\v`, I have included it in code before to cover genuine, necessary edge cases... For example:
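Say, a hypothetical sketch of such a case (not the original snippet): normalising untrusted whitespace, where forgetting \v leaves a gap.

    # Hypothetical edge case: collapse every ASCII whitespace character, \v included,
    # so downstream comparisons aren't fooled by "invisible" differences.
    import re

    def normalize_ws(s: str) -> str:
        return re.sub(r"[ \t\r\n\f\v]+", " ", s).strip()

    print(normalize_ws("admin\x0buser"))   # 'admin user' -- without \v it would slip through untouched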
Is \v being mentioned in specifications really causing a problem? I've never seen someone mistakenly use a vertical tab in code where they could be using something better.
Maybe it's useless, but I wouldn't say it's harmful. This is a pretty overzealous rant over nothing: it's almost as useless as the vertical tab.
"In practice, settable tab stops were rather quickly replaced with fixed tab stops, de facto standardized at every multiple of 8 characters horizontally, and every 6 lines vertically (typically one inch vertically)."
You could try testing this.
As for removing it, I'm not convinced that it's worth the effort to; it's basically a single case in an escape-handling switch. The article he links to in the first line can basically be summarised as "I don't understand escaping and want to replace it with something even more complex".
I think the article he links to can basically be summarised as "Escaping considered harmful, here's a better way to solve the same problem". I don't see any evidence that he doesn't understand escaping, and his claim that eliminating it would be a net win seems plausible (at least) to me.
I don't think what he proposes is really a better way, because requiring compilers to comprehend string concatenation and a few extra reserved words specifically for those characters is a more complex and less general solution than encoding using the string itself, which is what escaping does.
Escaping is amazingly elegant once you realise how general and simple it is, and it's also very important to understand it when designing things like data formats and protocols (length-delimited fields are the best, but they are not always possible). Ignoring escaping, which is what would otherwise happen, tends to cause rather horrible security issues.
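To make the contrast concrete, a minimal sketch (the framing scheme and names are made up):

    # Two ways to frame a payload that may contain the delimiter itself.

    def escape_frame(payload: bytes) -> bytes:
        # Escape the escape byte and the delimiter, then append the delimiter.
        return payload.replace(b"\\", b"\\\\").replace(b"\n", b"\\n") + b"\n"

    def length_frame(payload: bytes) -> bytes:
        # Length-delimited: a 4-byte big-endian length prefix, no escaping needed.
        return len(payload).to_bytes(4, "big") + payload

    msg = b"line one\nline two"
    print(escape_frame(msg))   # b'line one\\nline two\n'
    print(length_frame(msg))   # b'\x00\x00\x00\x11line one\nline two'

Escaping keeps the stream self-describing at the cost of a decode pass; length prefixes avoid that pass but require you to trust the declared length.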
I guess different people can look at the same thing and perceive an elegant solution OR an ugly mess. Note that the most important data protocols don't use escaping. I'm probably biased after years of DOS/Windows programming and having to remember to do things like "C:\\Users\\Bill\\documents" etc.
I am surprised that no one has commented on vertical languages. I am learning Japanese and I can see how this would be really useful in any vertical writing languages.
Had my first encounter with \v doing an import of a legacy database just a few weeks ago. The data was passed on to us in a batch of XML files. For some reason our XML parsing library would just ignore the rest of the file when it hit the \v character. Took me some time to find the culprit.
Edit: The \v character had somehow made it into one of the descriptions for one of the user profiles.
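For what it's worth, a strict parser rejects the character outright rather than silently truncating; a quick sketch with Python's standard library (the element names are made up):

    # \x0b (vertical tab) is not a legal XML 1.0 character, so a conforming parser errors out.
    import xml.etree.ElementTree as ET

    try:
        ET.fromstring("<profile><description>bio\x0btext</description></profile>")
    except ET.ParseError as e:
        print("rejected:", e)   # "not well-formed (invalid token): line 1, column ..."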
But you should have gotten an error, of course, not the silent truncation you imply.
If you need to salvage the character, your XML library may let you specify it as &#xb;. That is still a violation, but a lot of libraries seem to let it through: http://www.w3.org/TR/REC-xml/#sec-references (see "Well-formedness constraint"... you are specifically not allowed to use this to do what I'm suggesting here).
Anyways, the moral here is that XML CANNOT carry arbitrary binary, and EVERY TIME you output something in XML, something in the system needs to run some sort of encoding & illegal-character cleaning pass on the output text. The moral equivalent of "<tag>$content</tag>" in your language is ALWAYS wrong, unless you specifically processed $content into XML character content earlier. This is true even when you're really sure $content is "safe". Even if you're right... and statistically speaking, you're not... do it correctly anyhow and call the right encoding function.
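A sketch of that cleaning-plus-escaping pass (the regex and function name are mine; escape() is from the standard library):

    # Strip characters XML 1.0 cannot carry at all, then escape the markup characters.
    import re
    from xml.sax.saxutils import escape

    _XML_ILLEGAL = re.compile(
        "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
    )

    def xml_text(content: str) -> str:
        return escape(_XML_ILLEGAL.sub("", content))

    print(xml_text("bio\x0btext & <notes>"))   # 'biotext &amp; &lt;notes&gt;'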
I've dealt with vertical tabs and linefeeds by just Base64-encoding character data that might include them before stuffing it into a CDATA node in the XML doc.
It's a hack, sure, having to encode/decode all the time, but if you need to store those characters, it's the only bulletproof way I've found.
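Something like this, presumably (the element name is a placeholder):

    # Base64 round-trip so control characters like \v survive inside a CDATA section.
    import base64

    payload = "notes\x0bwith a vertical tab".encode("utf-8")
    node = "<blob><![CDATA[" + base64.b64encode(payload).decode("ascii") + "]]></blob>"

    # ...and on the way back out, the original bytes come back, \v intact:
    encoded = node[len("<blob><![CDATA["):-len("]]></blob>")]
    print(base64.b64decode(encoded) == payload)   # True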
I have to admit I'm still kind of split on whether XML made the right call here. It's tricky with character encodings to allow arbitrary binary in the characters, but something like CDATA could have permitted it, perhaps with a shell-like specification of a terminating byte sequence, or even with a UTF-8-style prefix number that indicates the length. This sounds great to me at first. But then I put on my security hat and consider what horrors would transpire in the bowels of programs unprepared to handle binary, or that could somehow be tricked during validation vs. parsing, or any number of other nightmares one could pull off with this, and I go back to neutral-at-best. (I'd go negative, but on the other, other hand [1], a lot of these things are already happening as people blithely stuff these things into XML documents anyhow, standard or no.)
[1]: No, not gripping hand... that's only for when the third choice is the dominant/default/obviously-correct-once-I-say-it choice.
Actually, glibc's argp library (a more featureful alternative to getopt) uses '\v' for a few things (though, not at all for what it is normally used to mean :P).