

Efficient Tabular Storage - ah-
http://matthewrocklin.com/blog/work/2015/08/28/Storage/

======
SloopJon
It looks like the NYCTaxi dataset is here:

[http://www.andresmh.com/nyctaxitrips/](http://www.andresmh.com/nyctaxitrips/)

Some background on this data:

[http://chriswhong.com/data-visualization/taxitechblog1/](http://chriswhong.com/data-visualization/taxitechblog1/)

And data for 2014 directly from the city:

[https://data.cityofnewyork.us/view/gn7m-em8n](https://data.cityofnewyork.us/view/gn7m-em8n)

------
TheGuyWhoCodes
Vertica has all those performance enhancements. It's a great DB, but I can't
recommend it because of pricing :(

~~~
beagle3
kdb+ answers the same description.

And it's a 300KB executable with no dependencies (other than glibc/MSVCRT).

------
owlish
How do databases like MySQL store data efficiently for querying? It seems like
something like protobuf would do well here, though you'd need to generate code
for each dataset.

~~~
kragen
Typically they use row-oriented binary storage, optionally with individual
columns or subsets of columns duplicated into indices for fast querying. Have
you tried protobufs? How many hundreds of megs per second do you get? I think
it is remarkably slow on the scales we're talking about here.
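
To make the row-oriented layout concrete, here is a minimal Python sketch. It is not how MySQL actually lays out pages; the three-field row format, the sample values, and the names `ROW`, `fare_index`, `fetch`, and `rows_with_fare` are all made up for illustration. It only shows the general idea: fixed-width binary rows packed back to back, plus one column duplicated into a separate index so a lookup does not have to scan every row.

```python
import struct

# Hypothetical fixed-width row: int32 trip id, float64 fare, int32 passengers.
# "<" means little-endian with no padding, so each row is 4 + 8 + 4 = 16 bytes.
ROW = struct.Struct("<idi")

# Made-up sample data, not taken from the taxi dataset.
rows = [(1, 12.5, 2), (2, 7.0, 1), (3, 30.25, 4), (4, 7.0, 3)]

# Row-oriented storage: whole records are packed back to back in one buffer.
data = b"".join(ROW.pack(*r) for r in rows)

# Secondary index: the fare column is duplicated outside the rows and
# mapped to the row numbers that hold each value.
fare_index = {}
for n, (_, fare, _) in enumerate(rows):
    fare_index.setdefault(fare, []).append(n)


def fetch(n):
    """Decode row number n straight from its byte offset."""
    return ROW.unpack_from(data, n * ROW.size)


def rows_with_fare(fare):
    """Use the index to find matching rows without scanning the buffer."""
    return [fetch(n) for n in fare_index.get(fare, [])]


print(rows_with_fare(7.0))
```

Because every row has the same width, row `n` lives at byte offset `n * ROW.size`, which is what makes the index lookup cheap. A real engine would keep the index in a sorted on-disk structure such as a B-tree rather than an in-memory dict.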

------
Little_Peter
How is it different from HDF5 (h5py and pytables)?

------
Ashim_Usmani
Great share

------
Sprint
This is super interesting, thanks!

