

Google launches BigQuery - Analyze big data on the cloud - neya
http://cloud.google.com/products/big-query.html

======
RealGeek
I tried running some simple SELECT queries on the sample Wikipedia dataset of
a few GBs. Most queries took 5 to 10 seconds, which is not exactly fast.

Moreover, it is super expensive with their 'data processed' based pricing. It
actually costs more than $1 to run a single query on a 30 GB table, so powering
my analytics app with 10,000 queries a day would cost around $300,000 a month.
This cost will skyrocket further if your database size is anywhere near a
terabyte.
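
A quick back-of-envelope sketch of that figure, assuming the roughly
$1-per-query number above holds:

```python
# Back-of-envelope check of the $300,000 figure. The ~$1/query cost is the
# assumption from above; 30 days/month is a round approximation.
cost_per_query = 1.00      # dollars of 'data processed' billing, assumed
queries_per_day = 10_000

daily_cost = cost_per_query * queries_per_day    # 10,000 dollars/day
monthly_cost = daily_cost * 30                   # 300,000 dollars/month
print(monthly_cost)
```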

~~~
haberman
Hi, I work on the BigQuery team.

Keep in mind that you only pay for the columns you query. The Wikipedia table
as a whole is 35GB, but if you only query one column, it might only need to
scan a couple of GB. If you can limit the columns you are querying, you can
save a lot of money.
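
A toy sketch of how column-based pricing changes the bytes scanned. The
column names and sizes here are invented for illustration, not the real
Wikipedia table's:

```python
# Hedged sketch: in a column store you only scan (and pay for) the columns
# the query touches. These per-column sizes are made-up illustrative numbers.
GB = 1024 ** 3
column_bytes = {
    "title": 2 * GB,
    "body": 30 * GB,
    "timestamp": 1 * GB,
    "contributor": 2 * GB,
}  # whole table: 35 GB

def bytes_scanned(columns):
    """Only the requested columns are read from storage."""
    return sum(column_bytes[c] for c in columns)

full_scan = bytes_scanned(column_bytes)   # SELECT * style: all 35 GB
narrow = bytes_scanned(["title"])         # one column: 2 GB
print(narrow / full_scan)                 # small fraction of the full cost
```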

If you have a high-traffic analytics app, it would probably make sense to
cache some of the materialized results, which are usually orders of magnitude
smaller than the source data. BigQuery supports writing its output to another
table, but it would probably be even faster for you to cache these results
client-side.
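
A minimal sketch of such a client-side cache, assuming a hypothetical
`run_query` callable that actually executes the SQL:

```python
import time

# Hedged sketch of client-side result caching: keep the (small) materialized
# result of an expensive aggregate query, keyed by the query text, with a TTL.
# run_query is a stand-in for whatever client actually executes the SQL.
_cache = {}

def cached_query(sql, run_query, ttl_seconds=300):
    now = time.monotonic()
    hit = _cache.get(sql)
    if hit and now - hit[0] < ttl_seconds:
        return hit[1]                 # serve the cached result, no scan billed
    result = run_query(sql)           # pay for the scan only on a cache miss
    _cache[sql] = (now, result)
    return result
```

On a miss the query runs and its result is stored; identical queries within
the TTL are then served locally instead of re-scanning the source data.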

~~~
alecco
What about query latency? 5-8 seconds?

~~~
nl
5-8 seconds isn't a problem at all.

It's Analytics, not an online database for website authentication or
something.

It competes with data warehouse solutions, where the typical reporting model
involves submitting a job, waiting an hour and then getting a report.

I'd expect a lot of the most useful applications running on this will use
queries that take many minutes (if not hours) to complete.

~~~
mark_integerdsv
"...waiting an hour..."

Uhm... what?

I work in Business Intelligence, and anything over 5 seconds to return a
typical report (e.g. year-to-date or daily sales) is unacceptable by my
standards.

Could you perhaps go into more detail on what you mean by 'job' in your post?
Are we talking about giant year-end actuarial runs here or something like
that?

~~~
nl
No, I'm talking about ad-hoc reports over large datasets.

For example: find the original source of all users who bought more than 6
different items over any 6-week period during the last 5 years, then find
every web page loaded by IPs from the same subnet as those users in the same
time periods.
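
A heavily simplified, hedged sketch of the first half of that report, using
sqlite3 as a stand-in engine (schema and data are invented):

```python
import sqlite3

# Hedged sketch: users who bought more than 6 distinct items inside some
# 6-week (42-day) window. "alice" buys 7 items in one week; "bob" buys 7
# items spread over years, so no single window qualifies him.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE purchases (user TEXT, item TEXT, day INTEGER)")
con.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("alice", f"item{i}", i) for i in range(7)]
    + [("bob", f"item{i}", i * 60) for i in range(7)],
)

# Self-join each purchase to every purchase by the same user within the
# following 42 days, then count distinct items per window.
rows = con.execute("""
    SELECT DISTINCT p1.user
    FROM purchases p1
    JOIN purchases p2
      ON p2.user = p1.user
     AND p2.day BETWEEN p1.day AND p1.day + 41
    GROUP BY p1.user, p1.day
    HAVING COUNT(DISTINCT p2.item) > 6
""").fetchall()
print(rows)  # only alice qualifies
```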

~~~
mark_integerdsv
Got it, makes sense.

------
hendler
Queries on hundreds of TB in seconds?

Anyone know what powers this? Is this custom SQL optimization on top of
BigTable and/or Map Reduce?

~~~
salsakran
Dremel, as described in <http://research.google.com/pubs/pub36632.html>

~~~
alecco
Cool. Columnar, read-only, nested records.
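
A toy illustration of row versus column layout (plain Python lists, nothing
Dremel-specific), showing why reading one field need not touch every record:

```python
# Hedged illustration of "columnar": the same records stored row-wise vs
# column-wise. Reading one field from a column store touches one contiguous
# array instead of every record.
records = [
    {"title": "Foo", "views": 10},
    {"title": "Bar", "views": 25},
    {"title": "Baz", "views": 3},
]

# Row layout: one tuple per record; any read pulls in whole records.
rows = [(r["title"], r["views"]) for r in records]

# Column layout: one array per field.
columns = {
    "title": [r["title"] for r in records],
    "views": [r["views"] for r in records],
}

# Summing views in the column layout reads only the "views" array.
print(sum(columns["views"]))
```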

------
3amOpsGuy
The ability to JOIN (even if the joined table is limited to 8 MB) is pretty
useful for a couple of specific use cases we came up against recently.

It can reduce the disk (and therefore, more importantly, cache) space
requirements of the materialised views you otherwise have to maintain with a
product like Cassandra (which is still ACE! IMHO).

<https://developers.google.com/bigquery/docs/query-reference#smalljoin>
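
A hedged sketch of why a small JOIN can replace a denormalised materialised
view, with sqlite3 standing in for BigQuery (tables and data invented):

```python
import sqlite3

# Hedged sketch: with JOIN available, a small dimension table can be joined
# at query time instead of being denormalised into a wide materialised view
# that you must keep updated alongside the fact data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, bytes INTEGER)")
con.execute("CREATE TABLE users (user_id INTEGER, country TEXT)")  # small side
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(1, 100), (1, 50), (2, 70)])
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(1, "AU"), (2, "NZ")])

rows = con.execute("""
    SELECT u.country, SUM(e.bytes)
    FROM events e JOIN users u ON u.user_id = e.user_id
    GROUP BY u.country
    ORDER BY u.country
""").fetchall()
print(rows)  # [('AU', 150), ('NZ', 70)]
```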

------
curiousfiddler
Considering Google started it, there is still scope for them to leverage the
MapReduce paradigm and build more products around it, the way the Hadoop
ecosystem has. This looks quite useful to start with.

~~~
yaroslavvb
This is not MapReduce, though; rather, it's an execution engine specialized
for data analysis: <http://research.google.com/pubs/pub36632.html>

------
wslh
How does it compare with <http://www.vertica.com> ?

