

Databases 102: Row versus Column Orientation (2014) - brudgers
http://www.ibmsystemsmag.com/Blogs/You-and-i/Archive/db-102-database-orientation-row-vs-column/

======
jandrewrogers
While row-oriented and column-oriented storage models as laid out in the
article are classic database organization patterns, modern databases engines
often use _neither_ as their primary storage model. They still have their use
cases, pure OLTP for row-oriented and cold storage for column-oriented, but
many workloads these days have a high insert rate concurrent with light
analytic queries. This workload shows poor efficiency on either storage model.

Designing a general-purpose database engine today, I would probably use some
variant of the vectorized row storage model. Rows are stored within a single
page but are decomposed within that page into vectors, the organization of
which can be quite complex and may not require any significant row
reconstruction. Page sizes are usually larger than OLTP (64-256k versus 4-16k)
which makes it suited to modern disk storage.

Vectorized row storage often approaches the insert performance of row-oriented
and the light analytics performance of column-oriented if designed and
implemented competently. Column-oriented is often more space efficient for
cold data storage but depending on the use case may not be significantly
faster for queries.

Classic row-oriented and column-oriented designs are at opposite extremes of a
set of tradeoffs. Many newer database engines sit at neither extreme while
enjoying most of the benefits of both.

------
contingencies
This is a terrible article.

While arguably a single infographic would have sufficed, in terms of text
[https://en.wikipedia.org/wiki/Column-
oriented_DBMS](https://en.wikipedia.org/wiki/Column-oriented_DBMS) is far
clearer.

People who sit down at your table at the beginning of a database project and
want to make a binary decision about row vs column orientation in an off the
shelf database selection process have simply drunk too much koolaid. Why?

Firstly, performance should generally be optimized post-facto (with real,
measured numbers on real systems) not prematurely in etch-a-sketch land.
Commercial solutions providers are usually happy to bring their systems in to
do battle with the enemy for a lucrative contract, just make sure you give
open source a fair seat at the table as well.

Secondly, the real driver for performance is the frequency and type of queries
(both read and write) which are almost always unpredictable in the
architecture (read: 'immaculate conception') stage, being well ahead of real
use, and thus must be continuously re-evaluated as a system evolves.

Perhaps instead, a key factor within modern database engine selection is
actually the degree of parallelism and consistency required. See
[https://en.wikipedia.org/wiki/Consistency_model](https://en.wikipedia.org/wiki/Consistency_model)

A good general purpose strategy for 90% of cases is to start with ACID and
perfect consistency on a general purpose RDBMS, and relax your constraints
over time based on measurements of real world performance as your use case
becomes more clearly defined. You can usually switch out an RDBMS for a NoSQL
solution fairly easily through code generation, since SQL is by definition
very well defined. Don't assume for 5-10% of cases that building on a
distributed database is a good idea from scratch, unless you know you really
need it; if your API is clean you can always retrofit parallelism.

A second tip is that sharding your database (splitting one fat RDBMS in to
multiple databases) is often a great workaround (if workable in your
consistency model) earlier on in a project to avert the well-known failings of
RDBMS overcentralization (performance problems: "I can only buy a server so
fat", downtime issues when full backups, upgrades or new replication models
are required, etc.) and can have the side benefit of helping to protect
against architectural lock-in to any one particular DB engine (ie. immobility
to NoSQL solutions if that becomes required later).

