
F1: A Distributed SQL Database That Scales - alec
http://research.google.com/pubs/pub41344.html
======
jacques_chester
Impressive, as one might expect. Some reactions:

-- Secret sauce

A lot of the magic of F1 comes from Spanner, the distributed storage system.
The name "F1" itself is an allusion to "inheriting" some of the properties of
Spanner.

-- Hierarchical tables

What they call hierarchical tables are, I think, best viewed as one-to-many
relationships. I guess they've privileged this model in their storage because
that's what a lot of their AdWords schema looks like.

-- Change History

I like the observation that keeping full histories is relatively
straightforward with atomic, granular timestamping and indeed that it should
be baked in. Every database schema I've ever worked with always goes through a
similar evolutionary cycle:

1. We only need to capture the current state of the model.

2. Wait, we do need to capture historical states of the model.

3. Wait, the model itself has changed; we need to capture historical states
along with the models that were current at the time.

(You can think of this as taking progressive differentials of incoming
transactions).

The F1 designers have baked that right into the database, where it belongs.
Weak temporal support has long been a sore point in SQL.
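For concreteness, here's a toy SQLite sketch of stages (1) and (2) of that cycle. The table, column names, and data below are entirely my own invention, not anything from F1; the point is just how a temporal predicate leaks into every query once history is kept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stage (1): only the current state of the model.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")

# Stage (2): keep historical states by appending timestamped rows
# instead of updating in place.
conn.execute("""
    CREATE TABLE account_history (
        id         INTEGER,
        balance    INTEGER,
        valid_from TEXT,   -- ISO-8601 timestamp
        valid_to   TEXT    -- NULL means "still current"
    )
""")
conn.executemany(
    "INSERT INTO account_history VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2013-01-01", "2013-06-01"),
        (1, 250, "2013-06-01", None),
    ],
)

# Reconstructing the state "as of" a point in time needs a temporal
# predicate in every single query -- the complexity referred to above.
row = conn.execute("""
    SELECT balance FROM account_history
    WHERE id = 1
      AND valid_from <= :t
      AND (valid_to IS NULL OR valid_to > :t)
""", {"t": "2013-03-15"}).fetchone()
print(row[0])  # prints 100
```

Stage (3) is worse still: the schema of `account_history` itself changes over time, which plain SQL gives you no primitive for.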

-- Remote data

I was struck by their observation that most database storage engines are built
around the concept of seeks and reads, whereas theirs is necessarily built
around batching and pipelining over a network. If I am reading them correctly,
their engine takes advantage of having multiple copies of data by sending
reads to multiple disk nodes and then working from the first copy that is
returned.
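That read strategy -- issue the same request to several replicas and take whichever answers first -- can be sketched in a few lines of Python. The replica simulation below is entirely made up for illustration; it just shows the racing pattern, not anything F1 actually does internally:

```python
import concurrent.futures
import random
import time

def read_from_replica(replica_id: int) -> str:
    """Simulate a read whose latency varies from replica to replica."""
    time.sleep(random.uniform(0.01, 0.1))
    return f"data-from-replica-{replica_id}"

# Send the same read to all three replicas and take the first response;
# the stragglers are simply ignored, hiding slow or loaded disk nodes.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(read_from_replica, i) for i in range(3)]
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    first = next(iter(done)).result()
    print(first)
```

Which replica wins is random here, but the caller only ever sees one answer, with latency bounded by the fastest copy rather than an average one.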

~~~
vosper
Your point about change history and its poor support in SQL is spot-on; at my
employer we've gone through (1) and (2) and are trying to figure out (3) at
the moment.

From what I understand of Datomic, temporal support is baked in, but it's
about the only database I can think of that does that, and I haven't heard
stories about it being used in production for large volumes of data.

~~~
jacques_chester
Snodgrass wrote an entire book about dealing with the temporal blindness of
SQL which might help you:

[http://www.cs.arizona.edu/people/rts/tdbbook.pdf](http://www.cs.arizona.edu/people/rts/tdbbook.pdf)

More generally, you can usually build a model that handles (2), the changing
state, by generalising your original model. I've done it in both of the
classical ways: having validity-period fields (thus pushing complexity into
every query), or having audit tables (thus relying on triggers).
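A toy illustration of the second approach, audit tables populated by triggers, in SQLite (all table, column, and trigger names here are invented for the example). The application's queries against `client` stay simple because the bookkeeping happens in the trigger:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT);

    -- Audit table: one row per change, written by triggers so that
    -- ordinary application queries need no temporal predicates.
    CREATE TABLE client_audit (
        id         INTEGER,
        name       TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
        operation  TEXT
    );

    CREATE TRIGGER client_update AFTER UPDATE ON client
    BEGIN
        INSERT INTO client_audit (id, name, operation)
        VALUES (OLD.id, OLD.name, 'UPDATE');
    END;

    CREATE TRIGGER client_delete AFTER DELETE ON client
    BEGIN
        INSERT INTO client_audit (id, name, operation)
        VALUES (OLD.id, OLD.name, 'DELETE');
    END;
""")

conn.execute("INSERT INTO client VALUES (1, 'Acme')")
conn.execute("UPDATE client SET name = 'Acme Corp' WHERE id = 1")
conn.execute("DELETE FROM client WHERE id = 1")

for row in conn.execute(
    "SELECT id, name, operation FROM client_audit ORDER BY rowid"
):
    print(row)  # the superseded value, as it was before each change
```

The cost is exactly the one named above: the history lives in a side table the triggers must be kept in sync with, and a schema change to `client` means changing the audit table and triggers too.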

But building a changing model is hard, because at the schema level, SQL only
supports (1). That is, it provides primitives to change the current state of
the model.

So you wind up having to build a meta-model. The deficiencies of SQL (and I'm
an RDBMS bigot) mean that we wind up building inner platforms. Or outer
platforms -- witness migration toolkits.

Yet support for changing models is _essential_. No model is ever constant.
Most of my work is for a public-sector client whose legal reporting
requirements are _constantly_ changing. You cannot throw away old reports
which were made under old regulations; you must be able to recreate the world
at any point in time. Which requires either a lot of bookkeeping code or
taking periodic snapshots.

Datomic looks neat. I see that I'm not the only one who made the leap from
transactional memory to transactional models.

Edit: I just found notes I wrote in 2009 on database languages while looking
for something else. Spooky: [http://chester.id.au/2013/08/28/notes-towards-a-
set-objectiv...](http://chester.id.au/2013/08/28/notes-towards-a-set-
objective-language/)

~~~
twic
Databases which use MVCC handle (2) - and if they have transactional DDL, then
they cover (3). But they don't keep the history permanently. Who wants all
that old data?

The one exception to this that i know of is Oracle, which although not
fashionable round here, does expose the MVCC log through "flashback queries",
where one can simply write "SELECT * FROM USER AS OF TIMESTAMP (SYSTIMESTAMP -
INTERVAL '60' MINUTE)":

[http://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_...](http://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_flashback.htm)

I don't know what the logistics of keeping an infinite history are. I do know
that an e-commerce system i worked with before kept a few days, and that that
saved our bacon on a couple of occasions.

~~~
jacques_chester
They have the technological underpinnings, but it's not properly exposed _via
SQL_. In practice you wind up having to do it all by hand.

What the F1 guys seem to have done is said "well that's just silly, let's take
advantage of it".

Oracle is my day job; I've been saved by a flashback query once, after a DBA
misunderstood a request I was making and applied a dev change to production.

> _I don't know what the logistics of keeping an infinite history are._

I'd say they're not much different from asking a database developer to be
able to generate reports for any given span of time. Or keeping a complete
audit trail of changes. Or, more broadly, complete website logs, complete
source revision history, etc.

That is: in theory it creates an infinite downside. In practice it is more
valuable than not, especially with the ever-plunging cost of storage.

------
fintler
I posted this question on StackOverflow regarding TrueTime (used by Spanner) a
few days ago and haven't received any responses:

[http://stackoverflow.com/questions/18384883/why-is-
googles-t...](http://stackoverflow.com/questions/18384883/why-is-googles-
truetime-api-hard-to-duplicate)

However, I think this HN thread seems like it might be a good place to get
comments on why my line of reasoning may be incorrect. Does anyone have any
thoughts on why building something like Spanner on top of basic Paxos quorums
and NTP would be a bad idea?

~~~
mad44
Check this out. [http://muratbuffalo.blogspot.com/2013/08/beyond-truetime-
usi...](http://muratbuffalo.blogspot.com/2013/08/beyond-truetime-using-
augmentedtime-for.html)

~~~
fintler
Thanks for the link -- I'm working through it now.

------
capkutay
Can someone enlighten me as to why Google doesn't want to make an enterprise
play? Many companies tout that they are offering data infrastructures similar
to those used at Google... why wouldn't Google commercialize that technology
themselves? It should be pretty obvious how that could be of huge financial
benefit to the company.

~~~
dragonwriter
> Can someone enlighten me as to why google doesn't want to make an enterprise
> play?

They do. They have a range of commercial, including enterprise, offerings
(many of which are based on originally-internal technologies, including their
MySQL-based distributed database that preceded F1.)

> why wouldn't google commercialize that technology themselves?

They probably will, just as they have commercialized their previous internal
storage technologies.

------
dvliman
This is not new by any means...

~~~
jahewson
The paper was only published in conference proceedings yesterday.

~~~
paulsamways
See
[http://research.google.com/pubs/pub38125.html](http://research.google.com/pubs/pub38125.html)

