Hacker News new | past | comments | ask | show | jobs | submit login

Storing as parquet, and using hive path partitioning, you can get passable batch performance assuming you queries are not mostly scans and aggregates across large portions of the data. On the order of ten seconds for regex matching on columns for example.

I thought it was a great way to give analysts direct access to data to do adhoc queries while they were getting familiar with the data.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: