I've used \v support, or lack thereof, to fingerprint web devices during security audits.
Whitespace is often defined as space, \t, \r, \n, and \v. However, many specs, like HTTP, sometimes exclude \v. Depending on the underlying functions products use in their HTTP parsers, you can fingerprint servers, WAFs, proxies, load balancers, whatever, by using \v to separate HTTP header lines or name/value pairs.
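A rough sketch of the idea (the host and the probe header are placeholders; real fingerprinting compares many such variations): send an otherwise identical request with \v injected into the header section and see whether the parser rejects it, strips it, or passes it through.

    # Sketch: compare how a server answers a clean request vs. one with \v
    # injected into a header line (example.com is a placeholder target).
    import socket

    def probe(host: str, raw: bytes) -> bytes:
        with socket.create_connection((host, 80), timeout=5) as s:
            s.sendall(raw)
            return s.recv(4096).splitlines()[0]   # just the status line

    clean = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    vtab  = b"GET / HTTP/1.1\r\nHost: example.com\r\nX-Probe:\x0bvalue\r\n\r\n"

    print(probe("example.com", clean))   # e.g. b'HTTP/1.1 200 OK'
    print(probe("example.com", vtab))    # 200, 400, or something else -- differs by parser

Different parsers disagree on whether \v counts as whitespace, and that disagreement is the fingerprint.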
> Here's my simple plea: stop it. Stop mentioning vertical tabs in tutorials and language references.
I'm failing to see why this is an issue. The example tutorials/references he gave only mention '\v' in tables of possible escapes. What is the downside to mentioning it: people spending 5 minutes googling what a vertical tab is?
Unless a language removes support for escaped string literals entirely, it seems odd to remove support for a particular standard, if mostly unused, escape sequence.
When I was a very green coder I saw the vertical tab in one of those tables and spent a fruitless afternoon trying to use it to align some text. I don't even remember what I thought it did but the afternoon could have been better spent if anything anywhere had said it does absolutely nothing on any modern system.
So the harm is that people see vertical tab and without the associated historical context come up with their own ideas of what it does and then waste time.
If you do include it at least show it in such a way that people know it's a historical oddity with really nothing but very obscure uses like the fingerprinting example mentioned elsewhere in the comments.
Reading https://en.wikipedia.org/wiki/C0_and_C1_control_codes now and getting this urge to write a pure ASCII editor, haha.. Seriously, it's actually suitable for more advanced editing than modern day "pure text" editors.
SOH + STX to set a document title. FF for page breaks. DLE to allow embedded (binary/uninterpreted) data like images and avoid printing garbage. FS, GS, RS, US to support tables.
I find it interesting that this coincidentally works complementary to syntax like Markdown. :p
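As a sketch (the data here is made up), the separator characters map quite naturally onto a little table structure:

    # US separates fields ("units"), RS separates records; GS would separate whole tables.
    US, RS = "\x1f", "\x1e"

    table = [["name", "role"], ["Ada", "engineer"], ["Grace", "admiral"]]
    encoded = RS.join(US.join(row) for row in table)

    decoded = [record.split(US) for record in encoded.split(RS)]
    print(decoded)   # [['name', 'role'], ['Ada', 'engineer'], ['Grace', 'admiral']]

No quoting or escaping rules needed, as long as the delimiters never appear in the data itself.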
Computing is full of crappy re-inventions of things that already existed. For instance, many old style terminals could do fields and validation of those fields.
The world is spiraling on itself so much it's crazy. Accepting it is one big step toward happiness.
Talking about old terminals: I worked at a tax office around 2005, when the web was getting trendy, so old AS400 applications were on the way out. I had to use them just before they were replaced by whatever webapp was coming. It was one of the best user experiences I've had. That old system was so on-point. It's funny because AFAIK the terminal code had close to no structure; it parsed fixed patterns on screen buffers, pretty archaic. But the software was hyper ergonomic, responsive, simple and solid. It did infer/complete lots of fields, find non-trivial issues and suggest corrections. I could barely imagine the amount of regressions that were about to hit the employees when the html/js version landed; it made me really sad.
I really hate the move to webapp everything. At my workplace we used to have this great terminal interface to our server database; it was easy to script with, automate with, and use. Need a server? Just query it in the terminal for data.
Now we have this shite webapp that only runs properly in IE. It's unusable for any sort of mass-querying, and it takes a good 10 or so clicks on buttons that resize themselves to find out all the info about a server, whereas the terminal program was just "ServerLookup X".
Me too. I'm not against newness, not even some regression. You gain things, you may lose some, but right now it seems like no gain, only pain. All so that you can evolve it, so that it's CSS3-cute. Nowadays at least you'll have responsive layouts to avoid mobile/desktop redundancy. But in 2005 there was none of that.
I remember an article in Wired about an (old?) company still using AS400 or punchcard in their accounting branch and resisting the migration. Can't find it at the moment though :(.
I wonder why these are no longer used. There is the obvious visibility issue, but tab delimited files are still quite common. I wonder if it was because languages (particularly C, and all languages which copy the C convention) don't provide escape characters for them?
Overloading whitespace is fine for machines, but it doesn't improve human readability. In the days when memory and storage were small and runtime compression would crush throughput, saving a byte here or there made sense in the mainstream. But these days? The critical reasons to use text tend to be either hitting an existing interface or human readability, and with 1GB of RAM about the price of a Happy Meal, the risks of invisible textual complexity probably outweigh any benefit.
Cynical me would say it was used a lot at first, but then ASCII became something of the past, and some company doing data work started using commas, it got trendy and ASCII RS went into the closet.
Wikipedia suggests CSV was already around at the time of punched cards; I guess people preferred commas over some obscure RS code.
Visible delimiters are easier to understand when folks look at the files later -- they are essentially self-documenting. Whitespace takes more work to decode.
Comma-separated values is a data format that pre-dates personal computers by more than a decade: the IBM Fortran (level G) compiler under OS/360 supported them in 1967.
CSV isn't something computer-specific, it's basic human grammar; CSV probably just put a tag on a common practice.
Is that a vertical tab, or a line break? It seems much more consistent with the behavior of <br> in HTML documents (i.e. new line but not new paragraph/bullet/whatever).
In the XML PowerPoint format (.pptx), it's a line break (inserts an <a:br> element between the two lines). There's no vertical tab character saved out to disk (who knows what PowerPoint does internally, though).
I don't have the time or the initiative to figure out what the old .ppt format does here.
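It's easy enough to check the .pptx case for yourself, though (file name here is a placeholder): a .pptx is just a zip, so you can look at the slide XML directly.

    # Sketch: confirm what PowerPoint writes to disk for a Shift+Enter line break.
    # "deck.pptx" is a placeholder; slide XML lives under ppt/slides/ inside the zip.
    import zipfile

    with zipfile.ZipFile("deck.pptx") as z:
        xml = z.read("ppt/slides/slide1.xml").decode("utf-8")

    print("<a:br" in xml)    # True: the break is stored as an <a:br/> element
    print("\x0b" in xml)     # False: no vertical tab character on disk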
I always understood this functionality as 'soft return' versus a 'hard return'. One adding a line break (like <br> in HTML) and the other starting a new paragraph (like <p>) with its own indent/margin/spacing rules.
After we remove support for vertical tab, can we drop octal literals from every language? As fun as it is to snicker at a young programmer who spent 2 hours debugging...
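Presumably the 2-hour bug is the classic leading-zero gotcha; a minimal illustration (Python 3 at least turned the old form into a syntax error):

    # In C-family languages a leading zero means octal, so 010 is 8, not 10.
    # Python 3 refuses that spelling entirely and requires an explicit 0o prefix.
    print(0o10)        # 8
    print(int("010"))  # 10 -- int() parses decimal, unlike a C octal literal
    # flags = 010      # SyntaxError in Python 3; in C this would quietly be 8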
There are often use cases for many of the ASCII control characters, which is why the escape sequence feature is there.
It would be odd to support an escape sequence feature for every ASCII control character _except_ vertical tab, or to support it but leave it out of the docs.
It's just coming along for the ride with the general escape-sequence feature for ASCII control chars.
What's the harm? In any software project, useless shit accumulates because it is much easier to add features than to remove them. Support for one useless feature such as \v will not turn a codebase into a mess of spaghetti, but support for enough useless features like \v absolutely will.
The burden for \v is not zero. Every programmer working on the string escapes part of the code has to read and understand the lines that implement it. And it has to be tested and documented and if your documentation comes in multiple languages, translators have to spend time translating text for a completely useless feature.
Writing software is like writing a book of code for other programmers to read. An author wouldn't leave meaningless chapters in the book they are writing because "what's the harm?", and neither should good programmers.
While your argument holds generally, let's remember that unless the software is horribly architected the difference between escape-everything-but-vertical-tab and escape-everything really should be trivial.
There was a Python web framework which used \v to separate code from HTML... does any of you remember the name?
EDIT: found it, it's called Aspen (http://aspen.io/simplates/). They actually were using form feed (^L or \f), but apparently have switched to an ASCII combination to separate code from presentation.
> If I could stealthily patch the compiler for any language supporting the "\v" escape so I'd receive mail whenever it occurred in source code, then I could trace actual uses of it. I'm willing to bet that all the mail would come from beginners trying to figure out what the heck "\v" actually does, and then giving up when they realize it doesn't do anything.
Whilst it's true that I've never actually used `\v`, I have included it in code before to cover genuine, necessary edge cases... For example:
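Say, a hypothetical sketch of such a case (not the original snippet): normalising untrusted whitespace, where forgetting \v leaves a gap.

    # Hypothetical edge case: collapse every ASCII whitespace character, \v included,
    # so downstream comparisons aren't fooled by "invisible" differences.
    import re

    def normalize_ws(s: str) -> str:
        return re.sub(r"[ \t\r\n\f\v]+", " ", s).strip()

    print(normalize_ws("admin\x0buser"))   # 'admin user' -- without \v it would slip through untouched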
Is \v being mentioned in specifications really causing a problem? I've never seen someone mistakenly use a vertical tab in code where they could be using something better.
Maybe it's useless, but I wouldn't say it's harmful. This is a pretty overzealous rant over nothing: it's almost as useless as the vertical tab.
"In practice, settable tab stops were rather quickly replaced with fixed tab stops, de facto standardized at every multiple of 8 characters horizontally, and every 6 lines vertically (typically one inch vertically)."
You could try testing this.
As for removing it, I'm not convinced that it's worth the effort to; it's basically a single case in an escape-handling switch. The article he links to in the first line can basically be summarised as "I don't understand escaping and want to replace it with something even more complex".
I think the article he links to can basically be summarised as "Escaping considered harmful, here's a better way to solve the same problem". I don't see any evidence that he doesn't understand escaping, and his claim that eliminating it would be a net win seems plausible (at least) to me.
I don't think what he proposes is really a better way, because requiring compilers to comprehend string concatenation and a few extra reserved words specifically for those characters is a more complex and less general solution than encoding using the string itself, which is what escaping does.
Escaping is amazingly elegant once you realise how general and simple it is, and it's also very important to understand it when designing things like data formats and protocols (length-delimited fields are the best, but they are not always possible). Ignoring escaping, which is what would otherwise happen, tends to cause rather horrible security issues.
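To make the contrast concrete, a minimal sketch (the framing scheme and names are made up):

    # Two ways to frame a payload that may contain the delimiter itself.

    def escape_frame(payload: bytes) -> bytes:
        # Escape the escape byte and the delimiter, then append the delimiter.
        return payload.replace(b"\\", b"\\\\").replace(b"\n", b"\\n") + b"\n"

    def length_frame(payload: bytes) -> bytes:
        # Length-delimited: a 4-byte big-endian length prefix, no escaping needed.
        return len(payload).to_bytes(4, "big") + payload

    msg = b"line one\nline two"
    print(escape_frame(msg))   # b'line one\\nline two\n'
    print(length_frame(msg))   # b'\x00\x00\x00\x11line one\nline two'

Escaping keeps the stream self-describing at the cost of a decode pass; length prefixes avoid that pass but require you to trust the declared length.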
I guess different people can look at the same thing and perceive an elegant solution OR an ugly mess. Note that the most important data protocols don't use escaping. I'm probably biased after years of DOS/Windows programming and having to remember to do things like "C:\\Users\\Bill\\documents" etc.
I am surprised that no one has commented on vertical languages. I am learning Japanese and I can see how this would be really useful in any vertical writing languages.
Had my first encounter with \v doing an import of a legacy database just a few weeks ago. The data was passed on to us in a batch of XML files. For some reason our XML parsing library would just ignore the rest of the file when it hit the \v character. Took me some time to find the culprit.
Edit: The \v character had somehow made it into one of the descriptions for one of the user profiles.
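For what it's worth, a strict parser rejects the character outright rather than silently truncating; a quick sketch with Python's standard library (the element names are made up):

    # \x0b (vertical tab) is not a legal XML 1.0 character, so a conforming parser errors out.
    import xml.etree.ElementTree as ET

    try:
        ET.fromstring("<profile><description>bio\x0btext</description></profile>")
    except ET.ParseError as e:
        print("rejected:", e)   # "not well-formed (invalid token): line 1, column ..."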
But you should have gotten an error, of course, not the silent truncation you imply.
If you need to salvage the character, your XML library may let you specify it as &#xb;. That is still a violation, but a lot of libraries seem to let it through: http://www.w3.org/TR/REC-xml/#sec-references (see "Well-formedness constraint"... you are specifically not allowed to use this to do what I'm suggesting here).
Anyways, the moral here is that XML CANNOT carry arbitrary binary, and EVERY TIME you output something in XML, something in the system needs to run some sort of encoding & illegal-character cleaning pass on the output text. The moral equivalent of "<tag>$content</tag>" in your language is ALWAYS wrong, unless you specifically processed $content into XML character content earlier. This is true even when you're really sure $content is "safe". Even if you're right... and statistically speaking, you're not... do it correctly anyhow and call the right encoding function.
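A sketch of that cleaning-plus-escaping pass (the regex and function name are mine; escape() is from the standard library):

    # Strip characters XML 1.0 cannot carry at all, then escape the markup characters.
    import re
    from xml.sax.saxutils import escape

    _XML_ILLEGAL = re.compile(
        "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
    )

    def xml_text(content: str) -> str:
        return escape(_XML_ILLEGAL.sub("", content))

    print(xml_text("bio\x0btext & <notes>"))   # 'biotext &amp; &lt;notes&gt;'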
I've dealt with vertical tabs and linefeeds by just Base64-encoding character data that might include them before stuffing it into a CDATA node in the XML doc.
It's a hack, sure, having to encode/decode all the time, but if you need to store those characters, it's the only bulletproof way I've found.
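Something like this, presumably (the element name is a placeholder):

    # Base64 round-trip so control characters like \v survive inside a CDATA section.
    import base64

    payload = "notes\x0bwith a vertical tab".encode("utf-8")
    node = "<blob><![CDATA[" + base64.b64encode(payload).decode("ascii") + "]]></blob>"

    # ...and on the way back out, the original bytes come back, \v intact:
    encoded = node[len("<blob><![CDATA["):-len("]]></blob>")]
    print(base64.b64decode(encoded) == payload)   # True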
I have to admit I'm still kind of split on whether XML made the right call here. It's tricky with character encodings to allow arbitrary binary in the characters, but something like CDATA could have permitted it, perhaps with a shell-like specification of a terminating byte sequence, or even with a UTF-8-style prefix number that indicates the length. This sounds great to me at first. But then I put on my security hat and consider what horrors would transpire in the bowels of programs unprepared to handle binary, or that could somehow be tricked during validation vs. parsing, or any number of other nightmares one could pull off with this, and I go back to neutral-at-best. (I'd go negative, but on the other, other hand [1], a lot of these things are already happening as people blithely stuff these things into XML documents anyhow, standard or no.)
[1]: No, not gripping hand... that's only for when the third choice is the dominant/default/obviously-correct-once-I-say-it choice.
Actually, glibc's argp library (a more featureful alternative to getopt) uses '\v' for a few things (though, not at all for what it is normally used to mean :P).