Don't do this. TSV has won this race, closely followed by CSV. Anything else will cause untold grief for you and your fellow data scientists and programmers. I say this as someone who routinely parses 20GB text files, mostly TSVs and occasionally CSVs, for a living. The solution you are proposing is definitely superior, but it isn't going to get adopted any time soon.
I was surprised to see you list TSV as more common than CSV. I encounter CSVs on a pretty regular basis, but I don't think I've had to parse a TSV in the past 3 or 4 years. As a junior web developer, I don't have much experience, though. Nine times out of ten, the CSV is coming from or going to Excel, or a system that was designed to support Excel. If you don't mind my asking, what types of data do you regularly work with that are in TSV format?
Excel actually doesn't 'care'. It uses the list separator defined in your Windows "Regional Settings", and the default there differs by system locale (semicolon rather than comma in many European locales, for instance).
TSV is nicer for output (on stderr/stdout or a log file), so it tends to crop up when you want to parse the output or log file of something. I haven't seen Excel in use at my workplace yet.
Floating point values with a given precision and some integers. We’ll have to buy a proper supercomputer before the latter take more than seven digits :\
There tends to be less overhead in TSV. Unless you need to represent text with embedded tabs, anything more seems unnecessary. It works with standard *nix tools. Not a bad compromise, and part of the reason that people whose standard "file" is 100GB prefer it.
That's unfortunately a very accurate summary :)
Real estate data, traffic data, weather data, population demographics, stock prices, tweets - I've parsed all that and more. Every one of them was a giant TSV (except the finance ones, which were CSVs, because Excel). Say you purchase the database containing every single home sold or bought in California for the past decade. That's eleven 20GB TSVs with 250 tab-separated columns, plus one data dictionary that tells you what each of the 250 columns means. That's what Reology sells you: gigantic txt files with tabs, which are easy to handle with awk, cut, sed and more.
I could preface a lot of this with 'kids these days', but...
What you write is so true. So many large companies use text files to shuttle around data. I worked at one place that used pipe-delimited 10+GB files. It's not sexy, and using awk/sed/cut seems like a hack at first, but then you realize that it works and it is the simplest solution to the problem.
What would make one pair of ASCII characters (comma/linefeed or tab/linefeed) handle nesting any better than another pair (unit separator/record separator)?
Because CSV actually has three special characters. The field separator is ',' (or tab for TSV). The record separator is '\n' (or "\r\n"). And the quote/escape character is '"'. Commas or newlines that are part of a quoted string are data rather than control characters. Quote characters can be escaped by preceding them with a second quote character. There is an RFC (RFC 4180) that describes all of this.
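For anyone who hasn't read the RFC, here's a quick illustration using Python's csv module, whose default dialect follows essentially these rules (the field values are just made up for the example):

    import csv
    import io

    # One field contains the delimiter, a newline, and a quote character.
    row = ['id-1', 'He said, "hi"\nand left', '42']

    buf = io.StringIO()
    csv.writer(buf).writerow(row)   # QUOTE_MINIMAL: only the tricky field gets quoted,
                                    # and the embedded quotes are doubled
    print(repr(buf.getvalue()))     # 'id-1,"He said, ""hi""\nand left",42\r\n'

    buf.seek(0)
    print(next(csv.reader(buf)))    # round-trips back to the original three fields

The embedded comma, newline and quote all survive, which is exactly what plain split-on-delimiter parsing can't give you.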
You could use ESC (\x1b) to escape itself and either of your delimiter characters, but of course now you've gotten back all that complexity you were trying to avoid by using non-printable characters.
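As a sketch of what that scheme would look like (ESC/US/RS are the real ASCII code points, everything else here is invented for illustration):

    # ESC (0x1b) escapes itself, the unit separator (0x1f) and the record separator (0x1e).
    ESC, US, RS = '\x1b', '\x1f', '\x1e'

    def escape(field: str) -> str:
        out = []
        for ch in field:
            if ch in (ESC, US, RS):
                out.append(ESC)      # prefix any special character with ESC
            out.append(ch)
        return ''.join(out)

    def unescape(field: str) -> str:
        # Assumes well-formed input (no trailing lone ESC).
        out, it = [], iter(field)
        for ch in it:
            out.append(next(it) if ch == ESC else ch)
        return ''.join(out)

    record = US.join(escape(f) for f in ['plain', 'has a \x1f in it'])
    # record.split(US) is now wrong -- it also splits on the escaped separator --
    # so the reader needs a small state machine again, which was the whole complexity
    # you hoped to avoid.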
It's hardly too late. If you just look at the use case of application logging, for example, the latest fashion is logging to JSON-formatted log files, which is INSANE for a different set of reasons. I have servers that generate TBs of log files on a regular basis. An ASCII-delimited log file format standard could be adopted by the application logging space, which could result in some uniform tools that provide better streaming support for log shipping, and it could gain adoption in other adjacent use cases from there.
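Purely hypothetically, the writing side could be as dumb as this (the field names and their order are invented for the example, not part of any standard):

    import sys
    import time

    US, RS = '\x1f', '\x1e'   # ASCII unit separator / record separator

    def log(level: str, component: str, message: str) -> None:
        # One record: fixed field order, fields joined on US, record terminated by RS.
        fields = [time.strftime('%Y-%m-%dT%H:%M:%S'), level, component, message]
        sys.stdout.write(US.join(fields) + RS + '\n')   # trailing \n only so terminals stay readable

    log('INFO', 'auth', 'user logged in')

A shipper could then stream this by splitting the byte stream on RS and each record on US, with no quoting rules or JSON parsing in the hot path.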
I think CSV is more common, since it allows for escaping whereas with TSV I don't believe there is any method of escaping.
I had to deal with a system using TSV once, with that "feature", and because of it we had to do the escaping ourselves at a higher level, with \t and \\.
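Roughly the kind of higher-level escaping that ends up being necessary (a from-memory sketch, not that system's actual code):

    # Real tabs and backslashes in a value are written as the two-character
    # sequences \t and \\ ; backslashes must be escaped first.
    def escape(value: str) -> str:
        return value.replace('\\', '\\\\').replace('\t', '\\t')

    def unescape(value: str) -> str:
        out, i = [], 0
        while i < len(value):
            if value[i] == '\\' and i + 1 < len(value):
                out.append('\t' if value[i + 1] == 't' else value[i + 1])
                i += 2
            else:
                out.append(value[i])
                i += 1
        return ''.join(out)

    line = '\t'.join(escape(v) for v in ['a\tb', 'c\\d'])   # now safe to split on real tabs
    assert [unescape(v) for v in line.split('\t')] == ['a\tb', 'c\\d']

At which point you've reinvented a worse CSV, which I think was the parent's point.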