
That's unfortunately a very accurate summary :) Real estate data, traffic data, weather data, population demographics, stock prices, tweets - I've parsed all of that and more. Every one of them was a giant TSV (except the finance ones, which were CSVs, because Excel). Say you purchase the database containing every single home sold or bought in California for the past decade. That's eleven 20 GB TSVs with 250 tab-separated columns, plus one data dictionary that tells you what each of the 250 columns means. That's what Reology sells you - gigantic txt files with tabs, which are easy to handle with awk, cut, sed and more.
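A sketch of what that looks like in practice (file name and column numbers are made up here; in reality the data dictionary tells you which column is which):

  # count sales over $500k per county, assuming column 4 is county
  # and column 12 is sale price in this hypothetical TSV
  awk -F'\t' '$12 > 500000 { n[$4]++ } END { for (c in n) print c, n[c] }' ca_sales_2014.tsv

No loading step, no schema migration - a 20 GB file streams through in one pass.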



I could preface a lot of this with 'kids these days', but...

What you write is so true. So many large companies use text files to shuttle data around. I worked at one place that used pipe-delimited files of 10+ GB. It's not sexy, and using awk/sed/cut seems like a hack at first; then you realize it works and is the simplest solution to the problem.
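The pipe-delimited case is just a different field separator; a made-up example (file name and field number invented):

  # sum field 7 across a multi-GB pipe-delimited file
  awk -F'|' '{ sum += $7 } END { print sum }' big_feed.txt
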


awk, cut, sed and less



