

Dremel – Google's tool for analyzing trillion-row tables in seconds [pdf] - alexkon
http://sergey.melnix.com/pub/melnik_VLDB10.pdf

======
alexkon
Quick view:
[http://docs.google.com/viewer?url=http%3A%2F%2Fsergey.melnix...](http://docs.google.com/viewer?url=http%3A%2F%2Fsergey.melnix.com%2Fpub%2Fmelnik_VLDB10.pdf)

Abstract:

Dremel is a scalable, interactive ad-hoc query system for analysis of read-
only nested data. By combining multi-level execution trees and columnar data
layout, it is capable of running aggregation queries over trillion-row tables
in seconds. The system scales to thousands of CPUs and petabytes of data, and
has thousands of users at Google. In this paper, we describe the architecture
and implementation of Dremel, and explain how it complements MapReduce-based
computing. We present a novel columnar storage representation for nested
records and discuss experiments on few-thousand node instances of the system.

Overview by Greg Linden:
[http://glinden.blogspot.com/2010/12/papers-on-specialized-da...](http://glinden.blogspot.com/2010/12/papers-on-specialized-databases-at.html)
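
For anyone who wants a feel for the two ideas the abstract names (columnar layout plus a multi-level execution tree), here is a toy sketch in Python. It is not Dremel's implementation and ignores nested records entirely. The names `Partial`, `leaf_scan`, `combine`, and `run_query`, and the dict-of-lists shard format, are all made up for illustration. Each "leaf" reads only the one column the query needs, and partial aggregates are merged level by level until a single result reaches the root.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Partial:
    """Partial aggregate that can be merged in any order (hypothetical type)."""
    count: int = 0
    total: int = 0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total)


def leaf_scan(shard: Dict[str, List[int]], column: str) -> Partial:
    """Leaf level: touch only the requested column of a column-oriented shard."""
    values = shard[column]
    return Partial(len(values), sum(values))


def combine(partials: List[Partial]) -> Partial:
    """Intermediate level: merge the partial results of the child nodes."""
    result = Partial()
    for p in partials:
        result = result.merge(p)
    return result


def run_query(shards: List[Dict[str, List[int]]], column: str, fanout: int = 2) -> Partial:
    """Aggregate bottom-up through a tree whose nodes each have `fanout` children."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(s, column) for s in shards]
    while len(level) > 1:
        level = [combine(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else Partial()


if __name__ == "__main__":
    shards = [
        {"latency": [1, 2, 3], "bytes": [10, 20, 30]},
        {"latency": [4], "bytes": [40]},
        {"latency": [5, 6], "bytes": [50, 60]},
    ]
    result = run_query(shards, "latency")
    print(result.count, result.total)
```

The point of the tree is that COUNT and SUM merge associatively, so no single node ever has to see all the rows. In this sketch everything runs in one process. In a real system each level would sit on different machines.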

------
joshu
Yahoo also had a tool for interactively querying large datasets. Dremel was
way faster, by many orders of magnitude.

The ability to query and re-query and refine in real-time lets you learn so
much more about your data - the leverage is simply unreal.

------
jbeda
Dremel is the technology behind BigQuery. The talk from Google I/O is
informative.

<http://code.google.com/apis/bigquery/>

------
whiletruefork
Having fast turnaround on data of this size is immeasurably important and a
significant competitive advantage. I have worked on large search-engine
systems up until very recently and I can say without a doubt that turnaround
time is more important than overall dataset size.

Our not-so-awesome mitigations were to have strictly defined preprocessed
pipes to plug into. This meant having to either rigidly define queries or suck
it up and wait hours, if not days, for results.

------
ygd
Dremel, Sawzall, what's next?

~~~
sliverstorm
I don't know, but I wonder if this may get them in a little hot water. Both
Dremel and Sawzall are trademarked.

I also don't particularly like the use of the words. I am very attached to my
Dremel and I don't see how it applies to processing a database :)

~~~
btilly
Trademark restrictions are tied to the field of use.

If Google were releasing power tools and calling them Dremel, they would have a
problem. But they can release a query system called Dremel and are
likely to be legally in the clear.

(Or such is my non-lawyerly understanding. I am not a lawyer, this is not
legal advice.)

((The reason people say that about not legal advice is that if you give legal
advice and someone gets in trouble because of it - even if the advice was
correct and the person misunderstood - you are liable for the consequences of
that advice.))

~~~
Daishiman
But these are not products. They're internal code names and there's no chance
that Google will ever be releasing these pieces of software to the public.
Thus no possible trademark issues.

------
zzleeper
Interesting, but not very useful w/out code =)

~~~
mpk
The paper describes the algorithms, implementation, and operation in several
real-world scenarios. The essential routines are described using pseudocode
in the appendix.

Not really seeing the problem here. This is actually more useful than a dump
of some source code.

------
ryanhuff
It's certainly tough to come up with project names that are unique these days,
but Dremel? It's a well-known consumer brand of tools in the US. Did Google's
Dremel name derive from something other than the tool brand?

Google's project is certainly not in the same market as the Dremel tool
brand, but is it OK for a company like Google to borrow a brand name from a
different market?

~~~
zdw
Footnote on the first page:

    
    
    Dremel is a brand of power tools that primarily rely
    on their speed as opposed to torque. We use this
    name for an internal project only.

