
Columnar Storage - mattyb
http://the-paper-trail.org/blog/columnar-storage/
======
nwatson
I worked on a columnar database engine deployed on commodity Linux clusters in
2001, initial release 2002. This was at Sensage. We built a very fast DB
engine that could store log data with 40x compression over traditional RDBMSs
and ran log queries much more quickly than commercial DBs could at the time. I
think we were too early.

Current Sensage company blurb about the event-data warehouse:
<http://sensage.com/content/clustered-columnar-database> and
<http://sensage.com/content/why-columnar%E2%80%A6not-row-based>

Patent work: <http://www.patentgenius.com/patent/7024414.html>

The core engineering team was CTO + 3 engineers. Best engineering experience
of my life. I wasn't involved at the lowest DB storage level; the guys who did
that did a great job.

Michael Stonebraker, technical advisor to Sensage, learned from the Sensage
mistakes and built Vertica.

~~~
clouderashiring
Sounds like Impala could use your talent. You should talk with them.

~~~
nwatson
Thanks for the pointer! Looking at Cloudera's jobs board, most tech-heavy work
is in SF or Palo Alto. I live in San Mateo, convenient to both of those, but
... I'm likely leaving for the North Carolina RTP area soon.

~~~
hammerbacher
Cloudera has an office in RTP (Raleigh) as well:
<http://www.cloudera.com/content/cloudera/en/about/contact-us.html>

------
hkmurakami
When I hear columnar storage, I can't help but remember all the stories my
friends have told me about kx systems (I'd love to share them but my memory is
so fuzzy that I wouldn't be doing them justice. Hopefully some of the HNers
here can share some of theirs though!) [1][2].

[1] <http://kx.com/>

[2] <http://en.wikipedia.org/wiki/K_(programming_language)>

~~~
gruseom
Get your friends to join HN and tell us?

------
wazoox
Everything old is new again. I remember considering using Sybase IQ Multiplex
(columnar engine) back in 2000 in my startup. Just like "nosql" was all the
rage a few years ago, bringing back memories of the Pick databases of yore.

------
vinkelhake
Google's Dremel[0] also belongs in this category.

[0] <http://research.google.com/pubs/pub36632.html>

------
JoachimSchipper
((Morcane and diwa, you are [dead]. Consider creating a new account.))

------
toast0
Possibly dumb question: what's the difference between columnar storage and a
bunch of row-major tables, each with an id column and one column of data?

~~~
grundprinzip
The biggest advantage of a pure column-oriented DBMS comes from having the
positional information implicitly available, without the requirement to store
an explicit ID. During query execution the required position lists can then
be generated.
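
As a rough illustration of that point (a minimal sketch, not how any
particular engine does it): if every column is stored in the same row order,
the array index acts as the row id. The table, column names and helper
functions below are made up for the example.

```python
# Hypothetical toy table stored column by column. No id column is stored:
# position i in every list belongs to the same logical row.
columns = {
    "host":   ["a", "b", "a", "c"],
    "status": [200, 500, 200, 404],
}

def positions_where(column, predicate):
    """Scan one column and build a position list of matching rows."""
    return [pos for pos, value in enumerate(column) if predicate(value)]

def fetch(column, positions):
    """Read another column only at the positions produced by the scan."""
    return [column[pos] for pos in positions]

# Filter on "status", then look up "host" for the matching positions.
error_positions = positions_where(columns["status"], lambda s: s >= 400)
error_hosts = fetch(columns["host"], error_positions)
```

With row-major (id, value) tables, each of those lookups would instead have
to carry and match an explicit id per value.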

In addition, if you consider the record format of traditional row-oriented
databases, you will see that the overhead of storing a single-attribute record
is rather high. Since with a column-oriented DBMS it's all about IO performance
(Disk/Memory, Memory/CPU), such overhead can diminish the advantage.

Thus typical column stores tend to store each column as a single contiguous
run of sequential memory. This can be enhanced further by applying dictionary
compression, so that only integer values are stored. And modern CPUs are
good at processing lots of them.
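
A minimal sketch of dictionary compression on one column, assuming a simple
first-seen code assignment (real systems may sort the dictionary or bit-pack
the codes; the function names and sample data here are illustrative only):

```python
from array import array

def dictionary_encode(values):
    """Replace each value by a small integer code; return (dictionary, codes)."""
    dictionary = []        # code -> original value
    code_of = {}           # original value -> code
    codes = array("I")     # compact array of unsigned integer codes
    for value in values:
        code = code_of.get(value)
        if code is None:
            code = len(dictionary)
            code_of[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]

hosts = ["web1", "web2", "web1", "db1", "web1"]
dictionary, codes = dictionary_encode(hosts)

# An equality filter becomes an integer comparison over the code array.
target = dictionary.index("web1")
matches = [pos for pos, code in enumerate(codes) if code == target]
```

The string comparison happens once against the dictionary; the scan itself
only touches integers.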

------
toufique
Is this effectively a read-write inverted index?

