

Column datastores ... like making the finest buggy whip in 1920 - ableal
http://drcoddwasright.blogspot.com/2009/03/next-revolution-and-alternate-storage.html

======
fizx
Sounded crackpot-ish to me. Anyone know wtf he was talking about?

~~~
ableal
Far from 'crackpot-ish'. The author belongs to the hardcore DB-side hacker
party. He's putting forward the idea that SSDs (solid state disks), by
upsetting the memory hierarchy applecart, may have deep effects on our work
with storage.

(apologies for picking a provocative quote for the title, but it seemed to me
that the front page was lacking its recommended daily dose of hacking.)

~~~
AlisdairO
He's right, in some ways. Long term, SSDs are going to have a massive impact
on DBMS technologies. Disk latency is the dominant performance inhibitor for a
lot of DBMS applications. The area I'm working in, RDF triple stores, involves
large joins and relatively small amounts of total data transfer, resulting in
a lot of disk seeks per unit of data transferred. In this area, SSDs will be a
huge leap over disk technology.
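
To put rough numbers on the seek problem, here's a back-of-the-envelope
sketch in Python (the device figures are illustrative assumptions, not
measurements from any particular hardware):

    # Effective throughput when every small read pays one seek: seek latency,
    # not raw bandwidth, dominates seek-heavy workloads on spinning disks.

    def effective_mb_per_s(seek_s, transfer_mb_s, read_kb):
        """MB/s achieved if each read of read_kb KB costs one seek up front."""
        mb = read_kb / 1024.0
        return mb / (seek_s + mb / transfer_mb_s)

    # Illustrative figures: ~8 ms seek for an HDD vs ~0.1 ms access for an SSD,
    # both ~100 MB/s sequential.
    print("HDD:", round(effective_mb_per_s(0.008, 100, 4), 2), "MB/s")   # ~0.49
    print("SSD:", round(effective_mb_per_s(0.0001, 100, 4), 2), "MB/s")  # ~28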

He's missed the point by targeting column stores, however. Column stores are
primarily aimed at data warehouses, which (generally) have the following
characteristics:

\- _Extremely_ large amounts of data stored.

\- Queries tend to involve very large, unpredictable scans (aggregations,
etc).

\- Since queries are relatively unpredictable, there's a limit to the
effectiveness of caching.

While SSDs can have blistering transfer rates and are very suitable for large
scans, they're still vastly more expensive per gig than conventional disks.
The kinds of queries being done are good for disks: they involve long
continuous reads, so disk's primary disadvantage of slow seeks is not so
relevant here. Column stores are aimed at this market by enabling faster
effective transfer of data, since you're only reading data from the columns
you're interested in off the disk. There are other claimed advantages, like
better compression ratios, too.
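
To make that advantage concrete, here's a toy Python sketch (the table and
its contents are hypothetical, and string length stands in for on-disk size):

    # Scanning one column of a wide table: a row layout drags every field off
    # the disk, a column layout touches only the column you asked for.

    rows = [(i, "name%d" % i, i % 100, i * 2.5) for i in range(1000)]  # (id, name, age, balance)

    # Row layout: a scan of 'age' reads all four fields of every row.
    row_bytes = sum(len(str(field)) for row in rows for field in row)

    # Column layout: the same data stored one column per region; the scan
    # reads only the 'age' column.
    age_column = [row[2] for row in rows]
    col_bytes = sum(len(str(v)) for v in age_column)

    print(row_bytes, col_bytes)  # the column scan reads a small fraction of the bytes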

Data warehouses are one of the primary growth areas for the DBMS market at the
moment. There's a bunch of small players, who all have a chance to grow
because the major heavyweights like Oracle and MS don't have overly strong
products in the field - although that'll no doubt change with MS's acquisition
of DATAllegro, one of the aforementioned small players. I'm sure Stonebraker
and co will be keeping an eye on SSDs, but they're safe for some time to come
yet.

edit: formatting

~~~
jsrn

        He's missed the point by targeting column stores,
        however. Column stores are primarily aimed at data 
        warehouses, which (generally) have the following
        characteristics: [...]
    

He is not missing the point. Quote from the article:

    
    
        "My conclusion is, on the whole, no. The column
         store, when used to support existing petabyte
         OLAP systems may be worth the grief, but for 
         transactional systems, at which the TRM is
         aiming and from which column stores would extract,
         not so much."
    

OLAP = Online Analytical Processing = Data Warehousing, i.e. he primarily
questions the use of column stores for transaction processing (OLTP) and
recognizes the usefulness of column stores for large data warehouses.

~~~
AlisdairO
Then he's criticising the applicability of column stores to a mode of use for
which they're not designed. While there have been suggestions that correctly
implemented column stores might provide acceptable OLTP performance, I don't
see many people advocating them as a replacement for Oracle, DB2, or MSSQL.
That seems to me to be missing the point, rather.

" _My interest is this: given the Next Revolution, do either a TRM or column
store database have a purpose?_ "

His post minimises the significance of data warehousing when it's a
significant growth area, particularly for new players in the market. Column
stores are applicable to DWs much smaller than petabyte scale - and indeed, I
haven't seen any figures that suggest that Vertica has been scaled up that far
yet.

He's also ignored the impact of physical storage layer design on memory
organisation and CPU cache performance. The use of SSDs is not a panacea that
renders physical storage layout irrelevant.

edit: quotes, and the realisation that 'minimalises' is not, in fact, a word.

~~~
jsrn

        Then he's criticising the applicability of column stores to a mode of use for which they're not designed.
    

ah, ok. That makes sense, agreed.

Here is a question for you - not directly concerning the article - if we talk
about all those new (or not so new) hash databases (a.k.a. "key/value stores"
like Amazon SimpleDB, Google's equivalent which they make available with
Google App Engine, Berkeley DB, etc.) _and_ if we talk about OLTP: I have been
thinking lately that those stores, which often require the programmer to
denormalize and trade relational features for performance, could become
largely irrelevant with SSDs, and soon (OLTP databases are often not _that_
big in my experience [compared to OLAP databases]; many should fit into an SSD
today). In contrast to column stores, those key/value stores are explicitly
marketed for transaction processing (if an RDBMS doesn't scale enough).

As I understand it, most of the time performance and scaling problems with
relational databases result from the database being disk bound - a problem
that should largely vanish with SSDs.

Do you agree?

~~~
frig
Yeah, that guy's writing is muddled and obscure, but I think your points are
closer to (what I think is) his intended point:

Column-oriented dbs are essentially a "heroic engineering" way to optimize a
datastore for a particular workload -- continuous-read-heavy-batch-processing
-- by engineering around the performance peculiarities of traditional hard
drives.

Clearly this is a sensible strategy for optimizing a datastore for certain
workloads: at the moment that strategy delivers material performance
improvements in the scenarios it's designed for (material enough that if the
performance of your system on those workloads is economically important to
you, it's worth the cost to build or buy a system that'd significantly speed
things up).

What I think he's saying is that the rise of ssd may make this engineering
effort essentially useless outside of a handful of niches (essentially, the
niches reduce to: data volumes too large to economically fit into ssd within
the foreseeable future):

\- in a storage medium with heavy seek times, the engineering effort to
implement a column store (instead of just using an off-the-shelf rdbms) can
pay off in a somewhat broad range of usages

\- in ssd-ish storage media, the engineering effort isn't going to be worth
the benefit, most of the time, compared to just using a stock rdbms

The mention of the TRM fits into this picture of his intended claim.

If you're not familiar with it, the TRM is a mystery wrapped in an enigma: a
bunch of big, credible names in the database world claimed to have invented a
radical new way of implementing the backend of a relational database that
would've offered radically better performance characteristics (essentially,
it made joins so unbelievably 'cheap' that it was no longer necessary to
denormalize for performance; supposedly, the more normalized you went, the
better the TRM would perform).

The issue with the TRM (the transrelational model) is that:

\- the supposed core concept is patented, but the patent doesn't explain _in
toto_ how it'd work (for obvious reasons)

\- there's a ton of secrecy and NDAs and so on surrounding anyone and everyone
who got a good glimpse of the full picture -- the core inventors seem
extremely protective of their IP, to the point that pretty much nothing
material has leaked about its supposed workings

\- the company that was supposedly doing the first commercial implementation
folded, ostensibly for non-technical reasons, but again it's so secretive
that no one really knows what happened

So it's a big mystery. There are basically a few schools of thought on it:

\- it does actually work, but a comedy of errors / business climate /
personality conflicts / whatever has prevented it from either being
commercially implemented or having a fuller picture of its workings
disclosed publicly. Things have been quiet since then due to the
protectiveness and penchant for secrecy on the part of the principals.

\- it looked good on paper, but in doing the actual implementation some
unavoidable complication turned up that prevented it from attaining the needed
performance (either at the time -- 2005ish -- or forever). The big names
associated with it have kept quiet since then partly out of embarrassment
(publicly endorsing a flop, kinda like Hawking endorsing a free-energy
machine) and partly, again, out of concern for the principals' protectiveness.

\- it was some kind of hypey thing that blew up in their faces; essentially a
belief that they could attract funding and customers by virtue of their
reputations and claims of a revolutionary approach, combined with a belief
that they could do an awesome-enough implementation of a non-revolutionary
datastore approach -- basically a best-practices, clean-room build of the
current state of the art -- and no one would be the wiser.

I tend to think the second option is the likeliest story.

All of that is a long windup for a very quick pitch:

Assuming the TRM wasn't just bunk, it would then be one of two things:

\- heroic engineering to bring revolutionary performance gains to systems
built around disk-based storage

\- some kind of heroic datastore engineering that'd work better with a more
ram-like storage system (eg ssds)

That's a bit of a non-answer answer, but the connection to his main line of
reasoning is something like:

\- if it's the former, then it's another example of heroic engineering made
irrelevant by ssds, as even a 'traditional' rdbms can be made similarly
performant on an ssd without all that effort; this is my take on the author's
opinion

\- if it's the latter, then maybe there's something to be gleaned from it; I
don't think this is what the author thinks

So, yeah: from reading the rest of his blog (posts are wordy, but there's not
that many of them) I think he's got the following idea:

\- implementing a traditional rdbms is hard, but mainly b/c of all the work
you have to do to make it not perform like a dog under load

\- ssds radically shake up the assumed performance contours of your persistent
storage, enough so that a lot of the specifics of designing an rdbms for
performance might change

\- additionally, this guy has the impression this implementation might be
"easy"; that is, a lot less work to do compared to writing a traditional rdbms
from scratch

There you have it, I think.

~~~
neilc
_Column-oriented dbs are essentially a "heroic engineering" way to optimize a
datastore for a particular workload -- continuous-read-heavy-batch-processing
-- by engineering around the performance peculiarities of traditional hard
drives._

Another way to view this is simply that column stores are a more appropriate
storage technique for read-intensive workloads on magnetic disks. When you
characterize column stores as "engineering around" the "peculiarities" of
HDDs, you make it seem like column stores are a workaround and row stores are
the "natural" approach, which I don't think is the case. Implementing a column
store from scratch is not significantly harder or easier than building a row
store from scratch, AFAIK (albeit there is probably more expertise on how to
do the latter).

 _the rise of ssd may make this engineering effort essentially useless outside
of a handful of niches (essentially, the niches reduce to: data volumes too
large to economically fit into ssd within the foreseeable future)_

Given the daunting rise of data volumes in the data warehousing market, that
makes for a pretty big niche. It will be a _long_ time before SSDs are cost
effective for the DW market, I think.

~~~
frig
There's something of an argument for naturalness wrt rows-versus-columns, but
it's not conclusive.

When it's developed, it's usually stated like this: a column store entails a
multiplication of metadata. Suppose each row has some kind of row id, and you
want to support lookup of the value in a given column for a given row. In a
rowstore you minimally need only one lookup aide (to get you to where that
row is) to handle a lookup for any column, but in a column store you arguably
need one lookup aide per column (to tell you how to find the point in that
column's store where the value for that row is).

It's never been an amazingly compelling argument to me, either, but there it
is; the whole thing strikes me as 'semantical confusion', as what we're really
talking about (row-vs-column) is a consequence of not taking the relational
model to its logical conclusion:

\- a "table" like this:

the table T with row = (unique #, column A, column B, column C, ..., column N)

\- is really a materialized view of this view (V):

table_A has row (unique #, column A), ..., table_N has row (unique #, column N);

V = join table_A,...,table_N on 'unique #' in the obvious way

...and a column-oriented data store is just a DB that stores data in a more-
fully normalized form (often with tricks to minimize or eliminate the need to
include the unique# in some or all of the table_Is, keeping just some sort of
index).

OR: the distinction between row and column stores -- at a high enough level of
abstraction -- boils down to questions of how fully-normalized the physical
storage's data model is vis-a-vis how normalized the user-facing data model
is.
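
A toy Python sketch of that equivalence (table contents made up):

    # A row table T is the materialized join of one two-column table per
    # attribute, all keyed on 'unique #'.

    T = {1: ("alice", 42), 2: ("bob", 17)}   # unique# -> (column A, column B)

    # Decompose into per-column tables sharing the same key.
    table_A = {k: row[0] for k, row in T.items()}
    table_B = {k: row[1] for k, row in T.items()}

    # V = join table_A, table_B on 'unique #' in the obvious way.
    V = {k: (table_A[k], table_B[k]) for k in table_A}
    assert V == T  # the view reconstructs T exactly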

In the hard-disk world your choice of physical layout matters a lot; it's
possible the evolution of ssds will make the difference between 'row' and
'column' orientation a lot less material (as we've already talked about).

The really crazy thing you could consider doing is to go one step further and
do something like this:

say T = (unique #, customer_name, library_size (integer)). (library size is
like 'how many books does this dude own')

Step 1:

T -> V like above (table_a = (unique #, customer_name), table_b = (unique #,
library_size))

Step 2:

\- let table_c = (unique_library_size #, library_size), constructed from
table_b as basically 'select distinct library_size into table_c'

\- then let table_b' = (unique #, unique_library_size #)

(and a similar transform for table_a, but we'll stick with table_b for now)

Doing this is crazy talk on a disk-based system: you're adding a lookup, but
for what?

But if lookups are essentially free, this has the potential to heavily cut
down on the amount of data you need to store (in cases where you have N rows
but only K << N distinct values) even before you start applying the obvious
compression techniques to the stored data.
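
A toy Python sketch of that indirection (values made up; names follow the
example above):

    # Factor the K << N distinct library_size values into table_c and keep
    # only small references in table_b'.

    table_b = {1: 3000, 2: 120, 3: 3000, 4: 120, 5: 3000}  # unique# -> library_size

    # 'select distinct library_size into table_c'
    table_c = dict(enumerate(sorted(set(table_b.values()))))  # size# -> library_size
    size_id = {v: k for k, v in table_c.items()}              # reverse map for the rewrite

    table_b_prime = {k: size_id[v] for k, v in table_b.items()}  # unique# -> size#

    # Reading a row's library_size now costs one extra lookup -- nearly free on ssd:
    assert table_c[table_b_prime[3]] == 3000
    # Storage: N small references plus K full values, instead of N full values.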

As datasets grow very large, it'll often be the case that we have K << N; this
won't be the case for, eg, google's index of web page contents, but for
something like 'how many cases of X did walmart W sell on day D' or 'how many
billable seconds was call #1234567 on date DDMMYYYY' it's hard to imagine K
isn't << N much of the time for some of the columns.

I think that's what the author's trying to very obliquely get at; petabytes
might be a stretch, but with sufficiently cheap lookups a lot of compression-
by-indirection may become feasible, which might let you _really_ reduce the
amount of persistent storage you need, which'd make ssds extend into workloads
you might expect to remain out of cost-effectiveness for much longer.
Speculative? Heck yes.

~~~
neilc
_if lookups are essentially free, this has the potential to heavily cut down
on the amount of data you need to store_

Assuming that the size of unique_library_size # is smaller than library_size.
That's unlikely to be true for this example, but I get what you're driving at.
However, column stores basically do this already: for example, you might store
the library_size column ordered by size, and then compress it using either RLE
or differential encoding (data item i is stored as a delta against data item
i-1).
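
For concreteness, here are toy Python sketches of both encodings over a
sorted column (data made up):

    def rle(values):
        """Run-length encode a sorted column as [value, run_length] pairs."""
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return runs

    def delta(values):
        """Differential encode: first value, then item-to-item differences."""
        return [values[0]] + [b - a for a, b in zip(values, values[1:])]

    col = sorted([120, 120, 120, 3000, 3000, 3010])
    print(rle(col))    # [[120, 3], [3000, 2], [3010, 1]]
    print(delta(col))  # [120, 0, 0, 2880, 0, 10]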

~~~
frig
Oh, agreed (I was assuming I could easily use, eg, 20 unpadded bits or whatnot
for unique_library_size # but not necessarily get away with it so easily for
library_size; when that's the case it's always true that unique_library_size #
needs less storage than library_size).

Edit: to correct... it's always true that unique_library_size # needs less
storage than library_size except under very unlikely circumstances
(...circumstances which get ruled out anyways if you do a more granular
calculation of when this strategy would make sense to consider).
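
A quick back-of-the-envelope version of that calculation in Python (N, K, and
the raw value width are illustrative assumptions):

    # A reference needs ceil(log2(K)) bits for K distinct values; compare
    # N references + K full values against N full values.
    from math import ceil, log2

    N, K, raw_bits = 1_000_000, 50_000, 64   # rows, distinct values, raw width
    ref_bits = max(1, ceil(log2(K)))         # 16 bits here

    indirect = N * ref_bits + K * raw_bits   # table_b' references plus table_c values
    direct = N * raw_bits                    # raw value stored in every row

    print(ref_bits, indirect / direct)       # ~0.3x the storage, before compression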

------
joshu
Wow, he's wordy.

~~~
iigs
I'm having one of those moments where I can't tell if the guy is really smart
and "above my pay grade" or if he's stuffing his sentences with industry
terminology in order to make everyone think he's an SME.

The crux of the argument seems to be that SSDs and something called the TRM
will eliminate the need for column-oriented databases. Even then he softens
the argument by special-casing "existing petabyte OLAP systems". To me this
argument abstracts to "${advanced_technology} will eliminate the need for
${good_design}", which is almost never a winning bet when taken as an
absolute. After the "except for really big systems" softening it abstracts to
"${advanced_technology} will eliminate the need for ${good_design} except for
${unusual_workload}", which is basically the personal computer revolution in
one sentence.

------
vicaya
Column/hybrid datastores are about performance per dollar. He doesn't get it.

~~~
vicaya
The above statement lacks qualification. I meant for certain OLAP workloads,
where data sizes are in the petabytes.

