
Ask HN: How to run queries on 20B clicks log - xstartup
I have a click log with 20 billion clicks. I want to run group-by, count, sum, and average queries on this data set and make the results available to all of our customers. Each record includes browser name, country, carrier, and campaign ID. Queries include: a) get the number of clicks from X to Y date, for each specific browser. What options do I have?
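For concreteness, the query shape being asked about can be sketched like this. This uses SQLite on a few hypothetical toy rows purely to illustrate the SQL; the schema, dates, and values are made up, and 20 billion rows would obviously need one of the distributed engines discussed below.

```python
import sqlite3

# Toy in-memory table with the fields from the post (schema and rows
# are hypothetical, just to show the query shape).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE clicks (
        ts TEXT, browser TEXT, country TEXT, carrier TEXT, campaign_id TEXT
    )
""")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?, ?, ?)",
    [
        ("2017-11-01", "Chrome",  "US", "ATT",     "c1"),
        ("2017-11-02", "Chrome",  "IN", "Airtel",  "c1"),
        ("2017-11-03", "Firefox", "US", "Verizon", "c2"),
        ("2017-11-20", "Chrome",  "US", "ATT",     "c2"),
    ],
)

# a) number of clicks from X to Y date, per browser
rows = conn.execute("""
    SELECT browser, COUNT(*) AS clicks
    FROM clicks
    WHERE ts BETWEEN '2017-11-01' AND '2017-11-10'
    GROUP BY browser
    ORDER BY browser
""").fetchall()
print(rows)  # [('Chrome', 2), ('Firefox', 1)]
```

The same statement (with engine-specific date functions) works essentially unchanged in BigQuery, ClickHouse, or Druid SQL.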
======
andrewgrowles
BigQuery would probably be the easiest, lowest-maintenance approach. Normally I
would reach for the Hadoop ecosystem for this type of ad-tech/analytics
problem, but I'm not 100% certain of your use case, budget, etc. Giving
clients read-only SQL access would be much better than a REST API.

------
o1lab
You mention making it available to customers - are you looking for APIs?

[https://github.com/o1lab/xmysql](https://github.com/o1lab/xmysql) \- one
command to generate a REST API for any MySQL database in seconds. (Shameless
plug - something I've been working on.)

It supports grouping, count, and where clauses as query params of the API. I'll
be happy to help - please raise an issue if you get stuck. Thank you.

~~~
andrewgrowles
This is not going to perform well with this amount of data. HTTP response
times for these queries are likely to time out. You need a job-based queue.
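The job-queue pattern suggested here amounts to: the HTTP endpoint enqueues the query and returns a job ID immediately, a background worker runs the slow aggregate, and the client polls for the result. A minimal in-process sketch with the standard library (function names and the fake "slow query" are illustrative; a real deployment would use something like Celery or RQ with a durable results store):

```python
import threading, queue, uuid, time

jobs = {}               # job_id -> result, or None while still running
work_q = queue.Queue()  # pending (job_id, query) pairs

def worker():
    # Runs long queries off the request path so HTTP handlers return fast.
    while True:
        job_id, query = work_q.get()
        time.sleep(0.01)  # stand-in for a multi-second aggregate query
        jobs[job_id] = f"result of: {query}"
        work_q.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(query):
    # What a POST /jobs endpoint would do: enqueue, answer immediately.
    job_id = str(uuid.uuid4())
    jobs[job_id] = None
    work_q.put((job_id, query))
    return job_id

def poll(job_id):
    # What a GET /jobs/<id> endpoint would do: None means "still running".
    return jobs[job_id]

jid = submit("SELECT browser, COUNT(*) FROM clicks GROUP BY browser")
work_q.join()  # in real life the client polls instead of blocking
print(poll(jid))
```

The key property is that `submit` returns in microseconds regardless of how long the query takes, so no HTTP timeout is ever in play.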

------
ecesena
I'd go for Druid + Pivot (or BigQuery if you happen to be in GCloud already).

[http://druid.io/](http://druid.io/)

[https://docs.imply.io/pivot/](https://docs.imply.io/pivot/)

~~~
xstartup
Have you actually deployed this in production? Last year we had lots of issues
with Druid. Do you know anything about ClickHouse?

~~~
thisone
Can you write briefly about the issues you had with Druid? We've been looking
at bringing it into our stack for OLAP queries and aggregates and would
appreciate a war story.

