
Stop the Vertical Tab Madness (2010) - draegtun
http://prog21.dadgum.com/76.html
======
billyhoffman
I've used \v support, or lack thereof, to fingerprint web devices during
security audits.

Whitespace is often defined as space, \t, \r, \n, and \v. However, many specs,
like HTTP, will sometimes exclude \v. Depending on the underlying functions
products use in their HTTP parsers, you can fingerprint servers, WAFs,
proxies, load balancers, whatever when using \v to separate HTTP header lines
or name\value pairs
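
A minimal sketch of why this works, assuming two hypothetical parsers that differ only in how they trim a header value. HTTP's optional whitespace is just space and horizontal tab, while Python's `str.strip()` with no argument also removes \v, so the two disagree on the same bytes. The function names are made up for illustration.

```python
def trim_loose(value: str) -> str:
    # str.strip() with no argument removes every character for which
    # str.isspace() is true, and that includes \v (0x0B).
    return value.strip()


def trim_strict(value: str) -> str:
    # Only space and horizontal tab are treated as optional whitespace.
    return value.strip(" \t")


raw = "\vexample.com\v"
print(repr(trim_loose(raw)))   # 'example.com'
print(repr(trim_strict(raw)))  # '\x0bexample.com\x0b'
```

A device built on the loose variant accepts the request as if nothing were odd; one built on the strict variant sees a different value, and that difference is the fingerprint.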

~~~
KeytarHero

        name
            alue

pairs? ;)

------
jkire
> Here's my simple plea: stop it. Stop mentioning vertical tabs in tutorials
> and language references.

I'm failing to see _why_ this is an issue? The examples tutorials/references
he gave only mention '\v' in tables of possible escapes. What is the downside
to mentioning it, people spending 5 minutes googling for what a vertical tab
is?

Unless a language removes support for escaped string literals entirely it
seems odd to remove support for a particular standard, if mostly unused,
escape sequence.

~~~
zaphar
When I was a _very_ green coder I saw the vertical tab in one of those tables
and spent a fruitless afternoon trying to use it to align some text. I don't
even remember what I thought it did but the afternoon could have been better
spent if anything _anywhere_ had said it does absolutely nothing on any modern
system.

So the harm is that people see vertical tab and without the associated
historical context come up with their own ideas of what it does and then waste
time.

If you _do_ include it at least show it in such a way that people know it's a
historical oddity with really nothing but very obscure uses like the
fingerprinting example mentioned elsewhere in the comments.

------
GrantSolar
Ironically, this is the first time I've heard of `\v`

~~~
agumonkey
Old codes are full of gems. ASCII has field and record separators, for instance.
Free CSV.
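
A rough sketch of the "free CSV" idea, assuming the ASCII unit separator (0x1F) goes between fields and the record separator (0x1E) between records. `dump` and `load` are hypothetical helper names, not an existing format or library.

```python
US = "\x1f"  # ASCII unit separator, used here between fields
RS = "\x1e"  # ASCII record separator, used here between records


def dump(rows):
    """Join rows of string fields into one delimiter-separated string."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)


def load(text):
    """Split a string produced by dump() back into rows of fields."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]


rows = [["name", "note"], ["Ada", "likes commas, tabs\tand\nnewlines"]]
assert load(dump(rows)) == rows
```

Commas, tabs, quotes and newlines need no escaping here. The split calls name the separator explicitly because Python's argument-free `split()` and `strip()` treat 0x1C through 0x1F as whitespace.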

~~~
jug
Reading
[https://en.wikipedia.org/wiki/C0_and_C1_control_codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes)
now and getting this urge to write a pure ASCII editor, haha.. Seriously, it's
actually suitable for more advanced editing than modern day "pure text"
editors.

SOH + STX to set a document title. FF for page breaks. DLE to allow embedded
(binary/uninterpreted) data like images and avoid printing garbage. FS, GS,
RS, US to support tables.

I find it interesting that this coincidentally works complementary to syntax
like Markdown. :p

~~~
jimktrains2
I tried playing around with something like that.

[http://jimkeener.com/posts/ADF](http://jimkeener.com/posts/ADF)

I did complicate it a little by trying to insert field metadata into the file.

------
1wd
Random fact: Microsoft PowerPoint inserts a vertical tab when you press
Shift+Enter, e.g. to start a new line inside the current bullet point.

~~~
chias
Is that a vertical tab, or a line break? It seems much more consistent with
the behavior of <br> in HTML documents (i.e. new line but not new
paragraph/bullet/whatever).

~~~
JonathonW
In the XML PowerPoint format (.pptx), it's a line break (inserts an <a:br>
element between the two lines). There's no vertical tab character saved out to
disk (who knows what PowerPoint does internally, though).

I don't have the time or the initiative to figure out what the old .ppt format
does here.

~~~
1wd
I don't know about internally or disk formats. The vertical tab turns up when
copying and pasting to a text editor though.

------
unwind
This seems related: [http://www.gnu.org/prep/standards/standards.html#index-contr...](http://www.gnu.org/prep/standards/standards.html#index-control_002dL).

That's a part of the GNU Coding Standards, which says:

 _Please use formfeed characters (control-L) to divide the program into pages
at logical places (but not within a function)._

I always found that particularly archaic.

And yes, of course I realize that vertical tab and form feed are distinct
characters.

~~~
wtbob
^L is supported in most pagers and news clients to pause scrolling, so
this makes sense.

------
VeejayRampay
Used to be a way to detect IE back in the day.

[http://ajaxian.com/archives/ievv](http://ajaxian.com/archives/ievv)

------
zimbu668
After we remove support for vertical tab can we drop octal literals from every
language? As fun as it is to snicker at a young programmer who spent 2 hours
debugging

x = 0123

it's time for octal to go away.
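
For anyone who hasn't been bitten yet, a small Python 3 sketch of the trap. In languages that read a leading zero as octal, `0123` is 83, not 123; Python 3 rejects the literal and requires the `0o` prefix. The `compiles` helper is only there for illustration.

```python
def compiles(source: str) -> bool:
    """Return True if Python accepts the source text, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(int("0123"))           # 123: parsing text as decimal ignores the zero
print(int("0123", 8))        # 83: the same digits read as octal
print(0o123)                 # 83: explicit octal literal
print(compiles("x = 0123"))  # False: Python 3 refuses the ambiguous form
```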

~~~
KeytarHero
Same with trigraphs - as much fun as it is to joke about the ??!??!
operator[0]

[0] [http://stackoverflow.com/questions/7825055/what-does-the-c-o...](http://stackoverflow.com/questions/7825055/what-does-the-c-operator-do)

~~~
GFK_of_xmaspast
I think trigraphs are supposed to be going away in C++17.

~~~
KeytarHero
They are, but as far as I know C has no plans to remove them.

------
jrochkind1
There are often use cases for many of the ASCII control characters, which is
why the escape sequence feature is there.

It would be odd to support an escape sequence feature for every ASCII control
character _except_ vertical tab, or to support it but leave it out of the
docs.

It's just coming along for the ride with the general escape sequences for
ASCII control chars features.

------
bjourne
What's the harm? In any software project useless shit accumulates because it is
much easier to _add_ features than to _remove_ them. Support for one useless
feature such as \v will not make a codebase into a mess of spaghetti, but
support for enough useless features like \v absolutely will.

The burden for \v is not zero. Every programmer working on the string escapes
part of the code has to read and understand the lines that implement it. And
it has to be tested and documented and if your documentation comes in multiple
languages, translators have to spend time translating text for a completely
useless feature.

Writing software is like writing a book of code for other programmers to read.
An author wouldn't leave in meaningless chapters in the book they are writing
because "what's the harm?" and neither should good programmers.

~~~
couchand
While your argument holds generally, let's remember that unless the software
is horribly architected the difference between escape-everything-but-vertical-
tab and escape-everything really should be trivial.

------
1_player
There was a Python web framework which used \v to separate code from
HTML. Do any of you remember the name?

EDIT: found it, it's called Aspen
([http://aspen.io/simplates/](http://aspen.io/simplates/)). They actually were
using form feed (^L or \f), but apparently have switched to an ASCII
combination to separate code from presentation.

From the web archive:
[https://web.archive.org/web/20110412072653/http://aspen.io/p...](https://web.archive.org/web/20110412072653/http://aspen.io/page-break/)

There's an old HN discussion about it:
[https://news.ycombinator.com/item?id=2410221](https://news.ycombinator.com/item?id=2410221)

------
tom-lord
> If I could stealthily patch the compiler for any language supporting the
> "\v" escape so I'd receive mail whenever it occurred in source code, then I
> could trace actual uses of it. I'm willing to bet that all the mail would
> come from beginners trying to figure out what the heck "\v" actually does,
> and then giving up when they realize it doesn't do anything.

Whilst it's true that I've never actually _used_ `\v`, I have included it in
code before to cover genuine, necessary edge cases... For example:

[https://github.com/tom-lord/regexp-examples/blob/master/lib/...](https://github.com/tom-lord/regexp-examples/blob/master/lib/regexp-examples/constants.rb#L44)

------
Swizec
For what it's worth, I've never before heard of \v.

------
oneandoneis2
Perl staying ahead of the curve as usual ;)

------
kriro
The contrarian in me wants to update slide #1 to

>>> print('Hello,\vworld!');

now :D

Actually I think I'll include a "guess what this control sequence does" slide
before discussing them.

------
chjj
Is \v being mentioned in specifications _really_ causing a problem? I've never
seen someone mistakenly use a vertical tab in code where they could be using
something better.

Maybe it's useless, but I wouldn't say it's harmful. This is a pretty
overzealous rant over nothing: it's almost as useless as the vertical tab.

------
userbinator
From
[https://en.wikipedia.org/wiki/Tab_key](https://en.wikipedia.org/wiki/Tab_key)
:

"In practice, settable tab stops were rather quickly replaced with fixed tab
stops, de facto standardized at every multiple of 8 characters horizontally,
and _every 6 lines vertically_ (typically one inch vertically)."

You could try testing this.

As for removing it, I'm not convinced that it's worth the effort to; it's
basically a single case in an escape-handling switch. The article he links to
in the first line can basically be summarised as "I don't understand escaping
and want to replace it with something even more complex".
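
To make the "single case" point concrete, here is a toy escape decoder, assuming a table-driven design instead of a switch. It is a sketch, not how any particular compiler does it; `ESCAPES` and `unescape` are made-up names. Supporting \v is one entry in the table.

```python
# Map from the character after the backslash to the character it stands for.
ESCAPES = {
    "n": "\n",
    "t": "\t",
    "r": "\r",
    "\\": "\\",
    '"': '"',
    "v": "\v",  # the entire cost of supporting vertical tab
}


def unescape(text: str) -> str:
    """Replace backslash escapes in text using the ESCAPES table."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        code = next(chars, None)  # None means the text ended after a backslash
        if code not in ESCAPES:
            raise ValueError(f"unknown or incomplete escape: \\{code}")
        out.append(ESCAPES[code])
    return "".join(out)


assert unescape(r"a\tb\vc") == "a\tb\vc"
```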

~~~
billforsternz
I think the article he links to can basically be summarised as "Escaping
considered harmful, here's a better way to solve the same problem". I don't
see any evidence that he doesn't understand escaping, and his claim that
eliminating it would be a net win seems plausible (at least) to me.

~~~
userbinator
I don't think what he proposes is really a better way, because requiring
compilers to comprehend string concatenation and a few more extra reserved
words specifically for the characters is a more complex and _less_ general
solution than encoding using the string itself, which is what escaping does.

Escaping is amazingly elegant once you realise how general and simple it is,
and it's also very important to understand it when designing things like data
formats and protocols (length-delimited fields are the best, but it is not
always possible.) Ignoring escaping, which is what would otherwise occur,
tends to cause rather horrible security issues.

~~~
billforsternz
I guess different people can look at the same thing and perceive an elegant
solution OR an ugly mess. Note that the most important data protocols don't
use escaping. I'm probably biased after years of DOS/Windows programming and
having to remember to do things like "C:\\Users\\Bill\\documents" etc.

------
sophacles
Just for fun, I decided to see what the heck '\v' does....

    
    
       >>> print 'hello\vworld'
       hello
            world
       >>> print 'hello\v\vworld'
       hello
    
            world
    

Now I want to use it a bunch.

------
franciscop
I am surprised that no one has commented on vertical languages. I am learning
Japanese and I can see how this would be really useful in any vertical writing
languages.

------
muchcomment
Had my first encounter with \v doing an import of a legacy database just a few
weeks ago. The data was passed on to us in a batch of XML-files. For some
reason our XML parsing library would just ignore the rest of the file when it
came to the \v character. Took me some time to find the culprit.

Edit: The \v character had somehow made it into one of the descriptions for
one of the user profiles.

~~~
jerf
Just checked, and \v is illegal in the characters of an XML document:
[http://www.w3.org/TR/REC-xml/#dt-text](http://www.w3.org/TR/REC-xml/#dt-text)

But you should have gotten an error, of course, not the silent truncation you
imply.

If you need to salvage the character, your XML library may let you specify it
as &#x0B;. That is still a violation, but a lot of libraries seem to let it
through: [http://www.w3.org/TR/REC-xml/#sec-references](http://www.w3.org/TR/REC-xml/#sec-references)
(see "Well-formedness constraint"... you are specifically not allowed to use
this to do what I'm suggesting here).

Anyways, the moral here is that XML CAN NOT carry arbitrary binary, and EVERY
TIME you output something in XML, something in the system needs to run some
sort of encoding & illegal-character cleaning pass on the output text. The
moral equivalent of "<tag>$content</tag>" in your language is ALWAYS wrong,
unless you specifically processed $content into XML character content earlier.
This is true even when you're _really sure_ $content is "safe". Even if you're
right... and statistically speaking, you're not... do it correctly anyhow and
call the right encoding function.
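
A sketch of what such a pass might look like in Python, assuming it is acceptable to drop characters outside the XML 1.0 `Char` production (which is lossy) before escaping the markup characters. `to_xml_text` and `parses` are hypothetical helper names; the standard library calls are `re`, `xml.sax.saxutils.escape` and `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 Char production: tab, LF, CR, and the
# ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF are allowed.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def to_xml_text(value: str) -> str:
    """Drop characters XML 1.0 forbids, then escape &, < and >."""
    return escape(_ILLEGAL_XML_CHARS.sub("", value))


def parses(doc: str) -> bool:
    """Return True if ElementTree accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


raw = "a\vb <c> & d"
print(parses("<d>a\vb</d>"))                         # False
print(to_xml_text(raw))                              # ab &lt;c&gt; &amp; d
print(parses("<d>" + to_xml_text(raw) + "</d>"))     # True
```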

~~~
spdustin
I've dealt with vertical tabs and linefeeds by just Base64-encoding character
data that might include them before stuffing it into a CDATA node in the XML
doc.

It's a hack, sure, having to encode/decode all the time, but if you need to
store those characters, it's the only bulletproof way I've found.
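
Roughly what that looks like, as a sketch with made-up element and attribute names. The Base64 alphabet contains no markup characters, so this version doesn't even need the CDATA wrapper.

```python
import base64
import xml.etree.ElementTree as ET


def wrap(value: str) -> str:
    """Base64-encode value and place it in a (hypothetical) XML element."""
    payload = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f'<description encoding="base64">{payload}</description>'


def unwrap(doc: str) -> str:
    """Parse the element and decode its Base64 text back to a string."""
    element = ET.fromstring(doc)
    return base64.b64decode(element.text or "").decode("utf-8")


original = "line one\vline two"
assert unwrap(wrap(original)) == original
```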

~~~
jerf
I have to admit I'm still kind of split on whether XML made the right call
here. It's tricky with character encodings to allow arbitrary binary in the
characters, but something like CDATA could have permitted it, perhaps with a
shell-like specification of a terminating byte sequence, or even with a
UTF-8-style prefix number that indicates the length. This sounds great to me
at first. But then I put on my security hat and consider what horrors would
transpire in the bowels of programs that are unprepared to handle binary, or that
can somehow be tricked during validation vs. parsing, or any number of other nightmares one
could do with this, and I go back to neutral-at-best. (I'd go negative, but on
the other, other hand [1], a lot of these things are already happening as
people blithely stuff these things in to XML documents anyhow, standard or
no.)

[1]: No, _not_ gripping hand... that's only for when the third choice is the
dominant/default/obviously-correct-once-I-say-it choice.

------
halosghost
Actually, glibc's argp library (a more featureful alternative to getopt) uses
'\v' for a few things (though, not at all for what it is normally used to mean
:P).

------
lubujackson
\v makes a better separating character than commas or tabs since it won't
appear naturally in text.

~~~
kevin_thibedeau
RS is even better since it's, you know, the record separator.

