
Summarizing Data in SQL - elisebreda
http://blog.yhat.com/posts/summarizing-data-in-SQL.html
======
frik
> Percentile

> In MySQL, we can use local variables to keep track of the order (...) Here's
> the Postgres version.

I would be interested in the MySQL percentile snippet.
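The article's MySQL snippet isn't reproduced here. As a rough stand-in, here is a minimal sketch of the same idea in Python with the standard-library `sqlite3` module. It swaps in a plain nearest-rank percentile instead of the variable-based approach: sort the column, then take the value at 1-based position ceil(pct × n / 100) using `ORDER BY ... LIMIT 1 OFFSET ?`. The `orders` table, the `amount` column, the sample values and the `percentile_nearest_rank` helper are all hypothetical.

```python
import sqlite3


def percentile_nearest_rank(conn, table, column, pct):
    """Nearest-rank percentile of `column`, for an integer pct in 1..100.

    Hypothetical helper: sorts the non-NULL values and returns the one at
    1-based position ceil(pct * n / 100). `table` and `column` are pasted
    into the SQL text, so they must be trusted identifiers.
    """
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    # COUNT(column) ignores NULLs, matching the WHERE clause below.
    (n,) = conn.execute(f"SELECT COUNT({column}) FROM {table}").fetchone()
    if n == 0:
        return None
    # Integer ceiling division avoids float rounding in ceil(pct * n / 100).
    rank = -(-pct * n // 100)
    # Skip the first rank - 1 sorted rows and return the next one.
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
        f"ORDER BY {column} LIMIT 1 OFFSET ?",
        (rank - 1,),
    ).fetchone()
    return row[0]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(x,) for x in (40, 15, 50, 20, 35)],
)

print(percentile_nearest_rank(conn, "orders", "amount", 30))  # 20.0
print(percentile_nearest_rank(conn, "orders", "amount", 50))  # 35.0
```

Unlike the quoted MySQL approach, which uses variables to track row order, this lets the database skip rows for you. On Postgres, the built-in `percentile_disc(...) WITHIN GROUP (ORDER BY ...)` aggregate is the more idiomatic route.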

------
d33
Nice! Does anybody know of a project that combines these kinds of techniques in a
program that could be run across billions of rows, preferably something
that scales to multiple nodes?

~~~
TheLogothete
MPP (massively parallel processing) databases do this: Azure Data Warehouse, Amazon Redshift, Google BigQuery.

~~~
vgt
Shameless plug:

[https://cloud.google.com/blog/big-data/2016/01/anatomy-of-a-...](https://cloud.google.com/blog/big-data/2016/01/anatomy-of-a-bigquery-query)

~~~
TheLogothete
Heh, I'm gonna use this opportunity to make my case, then. Why is the Google
Analytics data export available only to premium customers? I'm sure it's a
compelling feature for nudging some clients to upgrade, but letting everybody
export GA data to BigQuery would generate a lot more revenue: BQ consumption,
VMs for R/Python/RapidMiner/Knime, databases, and imports of data from other
sources (which in turn means even higher BQ consumption), plus people moving
ERPs and other systems closer to that data. I actually think it would increase
demand for GA Premium too, since companies would feed an ever-increasing
volume of data into GA, requiring custom dimensions and higher event volume.

GA is very useful for consolidating multiple sources of marketing data.
However, that data is practically locked beyond retrieval. Hell, even charge
for exports; everybody will pounce on the opportunity. The current workflows
are very frustrating, requiring multiple tools and locking your data in silos.

------
alfanick
Cool article, but please update the code snippets to a consistent code style :)

