
(Disclosure: I work on Google's Cloud Datastore.)

This looks super neat, and I can't wait to learn more about it, but just for the record: I'm pretty sure this isn't the first serverless cloud database. Both Firebase's Realtime Database and Cloud Datastore (which powers Snapchat and Pokemon Go) are serverless; you pay only for your ops and storage. They've been publicly available for several years.

Fair enough; I think it depends where you draw the line between key/value store and database.

Both of those depend on other distributed storage systems under the hood, as far as I am aware? Or is Datastore an end-to-end system? I know Firebase was backed by MongoDB.

Datastore runs on top of Megastore [1]. You can find out more about our data model here [2], but it's definitely not limited to key-value data.

Our end users don't have to think much about our storage system, though, if we're doing our jobs right. :)

[1] https://cloud.google.com/datastore/docs/articles/balancing-s... [2] https://cloud.google.com/datastore/docs/concepts/entities

An interesting observation that I can't seem to "un-observe" is that Megastore is actually a lot more like MongoDB than one would expect.

Both ostensibly work best when the application fits a hierarchical data model (entity groups vs. documents), and provide out-of-the-box strongly-consistent transactions for a single entity group. MongoDB feels like schemaless Megastore.
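
To make the entity-group comparison concrete, here is a minimal sketch of the single-group case described above. It assumes keys are hierarchical paths whose root element identifies the entity group. The function names and the key representation are hypothetical and are not the Megastore, Datastore, or MongoDB API.

```python
# Illustrative only: keys are modeled as tuples of (kind, id) pairs,
# and the first pair (the root) is assumed to identify the entity group.

def entity_group(key_path):
    """Return the root element of a hierarchical key path."""
    return key_path[0]


def is_single_group_transaction(key_paths):
    """True if every key in the write set shares one entity group."""
    return len({entity_group(path) for path in key_paths}) <= 1


# A user and that user's photo live under the same root, so they share a group.
alice_profile = (("User", "alice"),)
alice_photo = (("User", "alice"), ("Photo", 1))
# A photo under a different root belongs to a different group.
bob_photo = (("User", "bob"), ("Photo", 7))

same_group = is_single_group_transaction([alice_profile, alice_photo])
cross_group = is_single_group_transaction([alice_photo, bob_photo])
```

In this toy model, the first write set stays inside one group and the second does not, which mirrors how a document and its nested sub-documents would be updated together in a document store.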


Here's a paper with which you might already be familiar, but it's one of the citations for the Megastore paper: http://adrianmarriott.net/logosroot/papers/LifeBeyondTxns.pd.... You'll probably enjoy it (if you haven't already!).

If I remember correctly, Datastore is basically a thin layer on top of Megastore[1] (aka the precursor to Spanner).

1: https://static.googleusercontent.com/media/research.google.c...

It's a lot more than a thin layer, but like most systems at Google, it is a specialized layer built on top of more fundamental building blocks that are designed and proven to do a particular job really well.

In this case, Megastore provides the underlying multi-region/datacenter K-V replication services.

All the database features, like secondary & composite indexes, query language, multi-tenancy support, PAYG model, etc., are built in the Cloud Datastore layer.

> secondary & composite indexes

Interesting, so you don't use Megastore's indexes?

As you noted earlier, Megastore has a schema and we don't, so we have our own index implementation, yes. :)

In technology evolution, there are technologies that enable a new ecosystem, and then there are technologies that are built natively for that ecosystem. The previous generation of datastores enabled Lambda-style applications; the next generation of databases assumes they are the new normal.

The reasons FaunaDB fits serverless like a glove can be boiled down to a few points: pay-as-you-go pricing, database security awareness and object-level access control, and hierarchical multi-tenancy with quality-of-service management. Running on multiple clouds makes the serverless model more acceptable for risk-averse enterprises, and complements multi-cloud serverless FaaS execution environments nicely.

There's more to say; check out these posts on the blog: https://fauna.com/blog/serverless-cloud-database and https://fauna.com/blog/escape-the-cloud-database-trap-with-s...

I'm familiar with both; what disappoints me is the claim of novelty here with respect to autoscaling. That's just not true. To quote you:

"A serverless system must scale dynamically per request. Current popular cloud databases do not support this level of elasticity—you have to pay for capacity you don’t use. Additionally, they often lack support for joins, indexes, authentication, and other capabilities necessary to build a rich application."

That first criterion we absolutely meet, today. Cloud Datastore has been doing that for eight years now. We don't have joins, but we do have indexes, auth, multi-region replication and a whole lot more.

That's why it comes down to the details and the fit and finish. One neat feature is the ability to run Lambdas with a database access token corresponding to a particular user, which can then be passed through to sub-Lambdas (or it can even run with sub-permissions). Here is a blog post with quickstart instructions: https://serverless.com/blog/faunadb-serverless-authenticatio...

For instance, you could have a fire-and-forget self-service self-provisioning online shopping site builder, and bill database costs through to your customers (we give you that information in response headers).

You can also use FaunaDB to do consistent coordination between FaaS execution environments running in different clouds. So if you like a processing feature Azure makes available, but want to run your user facing servers in GCE, you can use FaunaDB to coordinate between the clouds.

Again, neat stuff, but that's not what this link claims:

  * The first serverless database
  * The first active-active multi-cloud database
  * The first strongly-consistent multi-region database available to the public

None of these are firsts. I don't know if our (GCP) services are themselves the first of their kind (it's an ambitious claim and, as an engineer, I try to be careful about those), but Datastore meets at least two of those three and predates FaunaDB by several years.

Maybe the first one.

GCP can't span multiple public cloud providers, or even different continents within GCP, apparently.

Indexes and cross-partition transactions aren't consistent, which doesn't meet, to us, the minimum bar for utility. Your docs say the consistent write throughput per entity group is 1 write per second?

Perhaps I've misunderstood you, but I'm pretty sure cross-partition (I assume you mean cross-entity-group in our terms) transactions are in fact consistent (not totally sure what you mean by transactions being consistent, per se; if you're talking about serializability, at least, we are). Explicitly (from [1]):

  Queries that participate in a transaction are always strongly consistent.

And the consistent write throughput to which you refer means sustainable write throughput per entity group. We can burst much higher.

It would be much easier to assess the relative consistency models of our products if FaunaDB had documentation with respect to its claims. We have a litany of pages about ours (e.g. [2] and [3]).

[1] https://cloud.google.com/datastore/docs/concepts/structuring...

[2] https://cloud.google.com/appengine/articles/transaction_isol...

[3] https://cloud.google.com/datastore/docs/concepts/transaction...

Those docs are coming. I still don't see how you can make any useful claim about consistency when indexes are never isolated or consistent, and sustained consistent write throughput can't exceed 1 wps.
