

Tenzing: A SQL Implementation On The MapReduce Framework - motter
http://research.google.com/pubs/pub37200.html

======
matclayton
Isn't this Google's version of Hive, which was open sourced by Facebook and
provides a SQL-style syntax on top of Hadoop? Queries aren't quick; it just
lets offline data crunching be coded quickly, without users having to write
lots of MapReduce jobs by hand. Cool concept, but don't expect to see the
online part of web apps powered by this.
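
To make the "code lots of MapReduce" point concrete, here is a toy sketch of how a query like `SELECT country, SUM(bytes) FROM logs GROUP BY country` might break down into map, shuffle, and reduce steps. It is plain single-process Python, not Hive's or Tenzing's actual planner, and the table, column, and function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input rows; the field names are assumptions for this sketch.
ROWS = [
    {"country": "US", "bytes": 120},
    {"country": "DE", "bytes": 80},
    {"country": "US", "bytes": 30},
]

def map_phase(rows):
    """Emit one (group key, value) pair per input row."""
    for row in rows:
        yield row["country"], row["bytes"]

def shuffle(pairs):
    """Collect values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Apply the aggregate (SUM here) to each key's values."""
    return {key: sum(values) for key, values in groups.items()}

def group_by_sum(rows):
    """Chain the three steps: roughly what GROUP BY ... SUM asks for."""
    return reduce_phase(shuffle(map_phase(rows)))

result = group_by_sum(ROWS)
print(result)
```

The SQL is one line; the hand-written version is three functions plus plumbing, which is the convenience a SQL layer is selling.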

~~~
wyuenho
Well, this is pretty much a given, isn't it? It's just a SQL implementation. It
doesn't say anything about the underlying storage and its guarantees. Given that
it runs on GFS and Bigtable, unless those technologies support ACID, don't expect
Tenzing to be able to support it either. Here's a quote from the paper:

    Tenzing is not ACID compliant - specifically, we are atomic, consistent and
    durable, but do not support isolation.

------
civild
It would be interesting to know the identity of the vendor for "DBMS-X". I
work in the "enterprise" data warehouse space and I'm trying to advocate
moving away from "database appliances" towards distributed computing, and
having a quotable source from Google would be very compelling.

~~~
pwang
I'm curious - what sorts of work do you do in the data warehousing space? Do
you work as a consultant, or as an implementor at a customer of data warehouse
products?

It seems to me that that whole industry (DW & ETL) is a dinosaur whose lunch
is about to get eaten by some upstarts.

~~~
jaylevitt
I've read a few books on data warehousing, and maybe you can confirm my
suspicion:

Isn't ETL just an acronym that means "I wrote this Perl script to populate the
database"?

How on earth is that even an industry?

~~~
karlmdavis
Simple ETL jobs are mostly just E & L: extract the data from one system, load
it into another.

Where things get complex is in the Transform aspect of some jobs. Mapping
disparate schemas is complex, often messy work, especially when one (or both)
sides of the ETL job have poor or no primary keys or foreign keys, or are even
just "mostly standard" CSV files [shudder].

Also: some ETL jobs can get quite large. I know one guy who had to create an
ETL system that continuously moved data from one 1200-table system into some
other system. Crazy.
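
A minimal sketch of why the T is the hard part, assuming a made-up "mostly standard" CSV export with padded headers and a row that has no key. The column names, the `COLUMN_MAP` mapping, and the dict used as a target are all hypothetical; a real job would have far more rules than this.

```python
import csv
import io

# Hypothetical source export: padded headers, stray whitespace, one row with no key.
SOURCE_CSV = """Cust ID, Full Name ,Signup
 17,Ada Lovelace,2011-01-05
,Missing Key,2011-02-01
18, Grace Hopper ,2011-03-09
"""

# Hypothetical mapping from normalized source columns to target columns.
COLUMN_MAP = {"cust id": "customer_id", "full name": "name", "signup": "signup_date"}

def extract(text):
    """Extract: read CSV rows as dicts, normalizing headers and trimming fields."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip().lower() for h in next(reader)]
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        yield dict(zip(header, (f.strip() for f in fields)))

def transform(rows):
    """Transform: drop rows without a key, rename columns, cast the key to int."""
    for row in rows:
        if not row.get("cust id"):
            continue  # no usable primary key in the source
        out = {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
        out["customer_id"] = int(out["customer_id"])
        yield out

def load(rows, target):
    """Load: upsert into the target, here just a dict keyed by customer_id."""
    for row in rows:
        target[row["customer_id"]] = row
    return target

warehouse = load(transform(extract(SOURCE_CSV)), {})
print(warehouse)
```

Extract and load are a few lines each; every quirk of the source ends up as another branch in `transform`.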

------
willvarfar
(The related article that likely prompted this link, but which has since fallen
off the HN front page just in time for the American audience, is:

<http://news.ycombinator.com/item?id=3503866> )

------
sigil
How does this relate to Dremel? I thought Dremel had a SQL frontend to
MapReduce that was already in wide use at Google.

<http://research.google.com/pubs/pub36632.html>

~~~
benmccann
Dremel is mostly used for SQL-like queries in log processing, while Tenzing is
largely used to run SQL-like queries on Bigtable.

------
JoachimSchipper
> Tenzing is currently used internally at Google by 1000+ employees and serves
> 10000+ queries per day

So that's 10 queries/employee/day. That screams "experimental". Still, this
would be very nice.

~~~
willvarfar
That doesn't say how _big_ these queries are.

And this was quite a while ago.

