Hacker News
Ask HN: How to run queries on 20B clicks log
4 points by xstartup on Nov 10, 2017 | 6 comments
I have a click log with 20 billion clicks. I want to perform group-by, count, sum, and average queries on this data set and make the results available to all of our customers. Each record includes browser name, country, carrier, and campaign ID. Queries include, e.g.: a) get the number of clicks from X to Y date, for each specific browser. What options do I have?
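For concreteness, the kind of aggregate described in query (a) can be sketched against a tiny stand-in table. This uses an in-memory SQLite database purely to show the query shape; the column names (`ts`, `browser`, etc.) are assumptions based on the record fields listed above, and the real engine (BigQuery, Druid, ClickHouse) would accept essentially the same SQL:

```python
import sqlite3

# In-memory SQLite table standing in for the real 20B-row click log.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clicks (ts TEXT, browser TEXT, country TEXT,"
    " carrier TEXT, campaign_id INTEGER)"
)
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?, ?, ?)",
    [
        ("2017-11-01", "Chrome", "US", "ATT", 1),
        ("2017-11-02", "Chrome", "IN", "Jio", 1),
        ("2017-11-03", "Firefox", "US", "ATT", 2),
    ],
)

# Query (a): number of clicks per browser between dates X and Y.
rows = conn.execute(
    """
    SELECT browser, COUNT(*) AS clicks
    FROM clicks
    WHERE ts BETWEEN '2017-11-01' AND '2017-11-02'
    GROUP BY browser
    ORDER BY browser
    """
).fetchall()
print(rows)  # [('Chrome', 2)]
```

At 20B rows the hard part is not the SQL but the storage engine behind it, which is what the replies below are really debating.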



BigQuery would probably be the easiest, lowest-maintenance approach. Normally I would leverage the Hadoop ecosystem for this type of ad-tech/analytics problem, but I am not 100% certain of your use case, budget, etc. Providing clients read-only SQL access would be much better than a REST API.


You mention making it available to customers: are you looking for APIs?

https://github.com/o1lab/xmysql - one command to generate a REST API for any MySQL database in seconds. (Shameless plug - something I've been working on.)

It supports grouping, count, and where clauses as query params of the API. I'll be happy to help - please raise an issue if you get stuck. Thank you.


This is not going to perform well with this amount of data: HTTP responses for these queries are likely to time out. You need a job-based queue instead.
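The job-queue pattern mentioned here can be sketched in a few lines: rather than holding an HTTP connection open while a slow aggregate runs, the server enqueues the query, a worker executes it, and the client polls for the result by job id. A minimal in-process sketch (the real thing would use a broker like Celery/RQ and persistent result storage):

```python
import queue
import threading

jobs = queue.Queue()   # pending (job_id, callable) pairs
results = {}           # job_id -> completed result

def worker():
    # Pull queries off the queue and record their results.
    while True:
        job_id, run_query = jobs.get()
        try:
            results[job_id] = {"status": "done", "rows": run_query()}
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# Enqueue a stand-in aggregate query; a client would poll results["job-1"].
jobs.put(("job-1", lambda: [("Chrome", 2)]))
jobs.join()  # wait for the worker to finish the queued job
print(results["job-1"])
```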


I'd go for Druid + Pivot (or BigQuery if you happen to be in GCloud already).

http://druid.io/

https://docs.imply.io/pivot/
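For reference, query (a) from the question maps naturally onto a Druid native groupBy query. A sketch as a Python dict (the datasource name `clicks` and the dimension/interval values are assumptions based on the question):

```python
import json

# Druid native groupBy query: clicks per browser over a date range.
query = {
    "queryType": "groupBy",
    "dataSource": "clicks",
    "granularity": "all",
    "dimensions": ["browser"],
    "aggregations": [{"type": "count", "name": "clicks"}],
    "intervals": ["2017-11-01/2017-11-10"],
}
print(json.dumps(query, indent=2))
```

This JSON would be POSTed to the Druid broker; Pivot generates queries like this for you from a UI.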


Have you actually deployed this in production? Last year we had lots of issues with Druid. Do you know about ClickHouse?


Can you write briefly about the issues you had with Druid? We've been looking at bringing it into our stack for our OLAP queries and aggregates and would appreciate a war story.





