Hacker News

TLDR: it's superior, but don't do it...



That's unfortunately a very accurate summary :) Real estate data, traffic data, weather data, population demographics, stock prices, tweets: I've parsed all that and more. Every one of them was a giant TSV (except the finance ones, which were CSVs because Excel). Say you purchase the database containing every single home sold/bought in California for the past decade. That's eleven 20 GB TSVs with 250 tab-separated columns, plus one data dictionary that tells you what each of the 250 columns means. That's what Reology sells you: gigantic txt files with tabs that are easy to handle with awk, cut, sed and more.
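For anyone who would rather not reach for cut, here is a minimal Python sketch of the same idea: stream a tab-separated file one line at a time and keep only a few named columns. The column names (`apn`, `city`, `sale_price`) and the helper `select_columns` are made up for illustration, and it assumes the data has a header row and no tabs or newlines inside fields.

```python
import io


def select_columns(lines, wanted):
    """Yield only the wanted columns from tab-separated lines, one row at a time.

    Assumes the first line is a header and that fields never contain
    tabs or newlines (the usual TSV convention).
    """
    lines = iter(lines)
    header = next(lines).rstrip("\n").split("\t")
    # Look up each wanted column's position once, up front.
    positions = [header.index(name) for name in wanted]
    yield list(wanted)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield [fields[i] for i in positions]


# Hypothetical three-column sample standing in for a 250-column file.
sample = io.StringIO(
    "apn\tcity\tsale_price\n"
    "123\tFresno\t250000\n"
    "456\tOakland\t780000\n"
)
rows = list(select_columns(sample, ["city", "sale_price"]))
```

Because it is a generator over lines, passing an open file object instead of the `StringIO` sample should keep memory use flat even on a 20 GB file.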


I could preface a lot of this with 'kids these days', but...

What you write is so true. So many large companies use text files to shuttle around data. I worked at one place that used pipe-delimited 10+ GB files. It's not sexy, and using awk/sed/cut seems like a hack at first, but then you realize that it works and that it is the simplest solution to the problem.


awk, cut, sed and less


It's simply too late. There was MAYBE a chance 20 years ago to push adoption of this into major text editors and spreadsheet software.

But now it's like harping on the benefits of HD DVD or Blu-ray.


It is strictly less expressive, because it can't handle nesting. This makes it inferior.


What would make one pair of ASCII characters (comma/linefeed or tab/linefeed) handle nesting any better than another pair (unit separator/record separator)?


Because CSV actually has three special characters. The field separator is ',' (or tab for TSV). The record separator is '\n' (or "\r\n"). And the quote/escape character is '"'. Commas or newlines that are part of a quoted string are data rather than control characters. Quote characters can be escaped by preceding them with a second quote character. There is an RFC that describes this (RFC 4180).
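To make the quoting rules concrete, here is a small sketch using Python's standard `csv` module with its default dialect. The sample row is invented; it just packs a comma, a quote character, and a newline into its fields to show that all three survive a round trip as data.

```python
import csv
import io

# One record whose fields contain each of the three special characters.
row = ["Smith, John", 'said "hi"', "line one\nline two"]

# Write it out; the default dialect quotes only the fields that need it
# and doubles any embedded quote character.
buf = io.StringIO()
csv.writer(buf).writerow(row)
encoded = buf.getvalue()

# Read it back; newline="" leaves line endings untouched for the parser.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))
```

After this runs, `decoded` should equal `row`, even though the encoded text physically spans two lines.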

You could use ESC (\x1b) to escape itself and either of your delimiter characters, but of course now you've gotten back all that complexity you were trying to avoid by using non-printable characters.
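A rough Python sketch of that ESC scheme, to show where the complexity comes back. The names `escape`, `encode` and `decode` are mine, not from any library, and the sketch assumes every record has at least one field and ends with a record separator.

```python
ESC, US, RS = "\x1b", "\x1f", "\x1e"  # escape, unit separator, record separator
SPECIAL = {ESC, US, RS}


def escape(field):
    """Prefix every special character in a field with ESC."""
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in field)


def encode(records):
    """Join fields with US and terminate each record with RS."""
    return "".join(US.join(escape(f) for f in rec) + RS for rec in records)


def decode(text):
    """Inverse of encode: a character-by-character state machine."""
    records, fields, current = [], [], []
    chars = iter(text)
    for ch in chars:
        if ch == ESC:
            literal = next(chars, None)  # the next character is data
            if literal is None:
                raise ValueError("dangling escape at end of input")
            current.append(literal)
        elif ch == US:
            fields.append("".join(current))
            current = []
        elif ch == RS:
            fields.append("".join(current))
            records.append(fields)
            fields, current = [], []
        else:
            current.append(ch)
    return records


# Fields that deliberately contain the delimiter and escape characters.
data = [["a\x1fb", "c\x1b"], ["", "x\x1ey"]]
round_tripped = decode(encode(data))
```

Note that `decode` can no longer be a plain `text.split(RS)`: once escaping exists, the reader has to walk the input one character at a time, which is the same kind of parser CSV quoting requires.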



