
Show HN: FaunaDB, a strongly consistent, globally distributed cloud database - evanweaver
https://fauna.com/blog/faunadb-serverless-cloud-launch-day
======
mdasen
I feel like I need to note the pricing: $0.01 per 1,000 queries. That doesn't
sound like much, but it adds up. Let's say you make 1,000 queries per second -
that's $0.01 per second. $0.01 * 60 seconds in a minute * 60 minutes in an hour
* 24 hours in a day * 30 days in a month = $25,920.

Is that a lot? I think it is. Google Cloud Spanner costs $0.90/hour per node
or around $650/mo. Each Cloud Spanner node can do around 10,000 queries per
second[1]. So, $650 to Google gets you 10x the queries that $25,920 to Fauna
gets you. I mean, for $25,920, you could get a Spanner cluster with 40
servers. Each of those servers would only have to handle 25 queries per second
to get you 1,000 queries per second.
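
Roughly, in code (a back-of-envelope sketch using only the prices quoted above,
not any official pricing calculator):

    # Back-of-envelope check of the numbers above; the rates are the ones
    # quoted in this comment, not an official price sheet.
    FAUNA_PRICE_PER_1K_QUERIES = 0.01   # $0.01 per 1,000 queries
    SPANNER_NODE_PER_HOUR = 0.90        # $0.90/hour per node
    SPANNER_QPS_PER_NODE = 10_000       # rough per-node throughput [1]

    def fauna_monthly_cost(qps, days=30):
        queries = qps * 60 * 60 * 24 * days
        return queries / 1000 * FAUNA_PRICE_PER_1K_QUERIES

    def spanner_monthly_cost(qps, days=30):
        nodes = max(1, -(-qps // SPANNER_QPS_PER_NODE))  # ceiling division
        return nodes * SPANNER_NODE_PER_HOUR * 24 * days

    print(fauna_monthly_cost(1_000))    # ~25,920
    print(spanner_monthly_cost(1_000))  # ~648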

I'm sure that people are going to question whether FaunaDB can actually do
what it claims. At this pricing, I can't imagine someone actually seeing if
they can live up to their claims. They have a graph showing linear scaling to
2M reads per second. Based on their pricing, that would be $630M per year. For
comparison, Snapchat committed to spending $400M per year on Google Cloud and
another $100M on AWS (and people thought the spend was outrageous even for a
company valued at tens of billions of dollars). This is more money for the
database alone.

Heck, it looks like one can get 5-20k queries per second out of Google's Cloud
SQL MySQL on a highmem-16 costing $1k/mo[2]. That would cost $130k-$500k on
FaunaDB. It seems like the pricing of FaunaDB is off by a couple orders of
magnitude.

Ultimately, Spanner is something built by people that published a notable
research paper and used by Google. Reading the paper, you can understand how
Spanner works and be saddened that you don't have TrueTime servers powered by
GPS and atomic clocks. FaunaDB has some marketing speak about how I'll never
have to worry about things ever again - without telling me how it will achieve
that.

It's also implemented in Scala. This isn't a dig on Scala or the JVM, but I
use three datastores on the JVM, and the only one that isn't worse off for it
is Kafka. But Kafka does very little in the JVM - it basically just leans on
sendfile to handle the heavy lifting, which means you don't get bad GC cycles
or lots of allocations and copying.

FaunaDB is a datastore without much information other than "it's great for
everything and scales perfectly". Well, at their pricing, they might be able
to make it happen. I mean, given the pricing, most customers would simply move
to something cheaper as they got beyond small amounts of traffic. 60,000
queries per second? That'll be $18M per year from FaunaDB or $50k per year
from Google. It's not even in the same ballpark. If you really need to scale
to 2M reads per second, $630M seems like a lot more than $1.6M for Spanner.

Maybe it's an easy way to get some money off people that "need a web scale
database", but are actually going to be serving like 10 queries per second and
are willing to spend $260/mo to serve that. If they hit it big, it shouldn't
be insane to scale it to 10,000 queries per second and milk $260k out of them
each month for a workload that can be handled by a single machine. That money
also pays for decent ops people to run a big box and consult with the customer
if they're going towards 100k queries per second with a $2.6M monthly payment.

EDIT: looking over Fauna's blog and some of their comments here, they seem to
understand more than their marketing lets on. Daniel Abadi is one of those
people whose name carries weight in the databases world (having been involved
with C-Store/Vertica, H-Store/VoltDB, and others). While I haven't read the
Calvin paper, it looks like a good read. I can see that they are using logical
clocks, and (I can't find it right now, but) I thought I saw that they don't
allow one to keep transaction sessions checked out - that all the operations
must be specified up front. So, it seems like there's some decent stuff in
there that's currently being obscured by marketing-speak. Still, the pricing
seems really curious.

[1] [https://cloud.google.com/spanner/docs/instance-
configuration](https://cloud.google.com/spanner/docs/instance-configuration)

[2] [https://www.pythian.com/blog/benchmarking-google-cloud-
sql-i...](https://www.pythian.com/blog/benchmarking-google-cloud-sql-
instances/)

~~~
evanweaver
Huge scale is what FaunaDB On-Premises is for; the pricing model is different.
That's what NVIDIA uses for example. Nevertheless, we will have volume
discounts and reserved capacity in Cloud too.

I see where you're coming from. People make the same argument against using
cloud services at all when you can buy hardware yourself and operate it. The
lack of flexibility is the hidden cost.

Our cloud pricing is competitive with other vendors, most of which require you
to massively over-provision in order to get high availability, especially
global availability, as well as predictable performance. In traditional cloud
databases, you have to provision for peak load. Usually this is an order of
magnitude difference from average load. An order of magnitude difference
happens to match your Spanner example exactly; however, with Spanner, you
still have to manage your capacity "by hand".

Architecture docs are on the way.

~~~
mdasen
You're right that it was a bit unfair to compare a flexible FaunaDB to
Spanner, which you'd need to provision for peak traffic. But even if it's an
order of magnitude more, $16M vs $630M is still quite a gap. It really doesn't
match the Spanner example. And if you're able to handle incredibly spiky
loads, information on how is kinda important. If I go from a steady state of
100 QPS to 15,000 QPS for a 20 minute period, will that just be pain?

You've said that Spanner makes you manage capacity by hand, but the marketing
copy says, "FaunaDB is adaptive, because it lets you change your
infrastructure footprint on the fly. Dynamically shift resources to critical
applications, elastically add capacity during peak events, and replicate data
around the world—all in a unified data fabric." So, if I'm expecting a burst
of traffic, do I have to "change my infrastructure footprint" manually? How
quickly can one "elastically add capacity"? I mean, I've seen plenty of
systems that one can add capacity to that, well, get humbled when copying data
to new nodes. Like, you had 10 nodes and now you want 15 because you're being
hammered. And wonderful, now it's trying to copy data to the new nodes while it
already has capacity issues, which only makes response times worse and error
rates higher. I'm not saying that will happen to you, but there's no
information to make me think that problem is addressed.

Honestly, people involved in FaunaDB seem to know enough about databases that
I'd just expect more real information on the website. When Kudu came out, they
published a paper that basically read like, "well, we created a column store
kinda like one would if you'd read the C-Store paper and these are the trade-
offs and we seem to have done reasonably" and I came away from reading it
thinking, "ok, these people know the score. It may or may not be executed well
enough, but there's an understanding." They led with a paper that might not
have been revolutionary, but really showed that they understood the space and
explained how it was designed such that someone with databases knowledge could
see that it was reasonable.

Introducing your database with so much, well, non-information doesn't help you
(in my opinion). Without digging, it looks like another DB vendor that
promises everything will be perfect and that it's great for any workload.

The whole "About FaunaDB" page doesn't tell me much. Like, there's a comment
in here that tells me you're using logical clocks, I can see from Daniel's
Twitter that you're using some of his research, etc. I mean, you actually have
cool technical details to highlight - details that make your DB seem a lot
more real. But the page makes it feel like you don't have cool technical
details - that you're trying to hide information because it's not good. I
mean, adding in some details about how things are achieved makes a product seem
a lot more real. I know what logical clocks are. Calvin is a research paper I
can read. I mean, finding that makes FaunaDB seem way more real - there's
something substantive. Like, I can read Calvin tomorrow and some of the ways
you're achieving things will come to light and I might be impressed.

But right now, it's really hard to find the information that would impress
technical readers.

~~~
evanweaver
I'm with you. That level of detail is coming soon.

------
gregwebs
I think Fauna is not very good at docs and communication yet, at least judging
by confusion from some of the comments and by reading their docs. But
launching will probably make them a lot better at it. Here are my notes which
may add clarity for some:

Similar to RethinkDB/MongoDB:

* Designed to be great for storing application data. Fields can be dynamically added (schemaless) and their values can be arrays so it is easy to maintain data locality according to your application use patterns.

* Uses a non-SQL query language

* Probably not great for ad-hoc reporting (arguably SQL is a requirement for that)

Unlike MongoDB: supports joins

Unlike RethinkDB: great support for transactions, just not SQL transactions
with an open session (which are unnecessary for an application)

Unlike most databases:

* cloud-hosted and pay-for-use (on-premise is on their roadmap)

* claims support for graph data by storing arrays of references

* QoS built-in so you could run a slow analytics query without disrupting your application

Cons

* Unfortunately just like MongoDB/RethinkDB they have no real database-level integrity of schema and foreign keys, but at least foreign keys are on their roadmap.

I am a huge fan of the cloud-hosted, pay-for-use aspect: I wonder why anyone
would design a DB today without this in mind. You can transfer your data from
a pay-for-use application DB (FaunaDB or Google DataStore) to a data warehouse
(Snowflake or Google BigQuery) which is also pay for use and gives you SQL
reporting abilities.

~~~
freels
Thanks, we realize the docs still have a long ways to go and are working to
improve them as fast as we can. This is a good summary.

We're definitely aware that the lack of schema definition is a problem for
certain use cases, and solving this is on our roadmap.

------
ccleve
You might consider removing the badge on the home page that says "Global
Latency 2.8 ms". Unless you really can give me latency across the globe of 2.8
ms, in which case your solution to the speed-of-light problem is quite
impressive :)

~~~
evanweaver
Reads do not require global coordination because of the way the transaction
log advances within each datacenter, but I take your point.

~~~
jazoom
I'm not sure if you did take the point intended. You would need a datacentre
in every city in the world to get that "global latency".

~~~
ccleve
Yes. Light travels 839 km or 521 miles in 2.8 ms. If you need a round-trip to
send a query and get a response, then you would have to be within a few
hundred miles of the datacenter, assuming perfect efficiency.

~~~
qaq
Could you point us toward the amazing switches and routers that have zero
processing latency? We would love to buy them.

------
akerl_
It seems like one of the big issues with the marketing copy here is some of
the word tricks being played:

Any first read of "The first serverless database" implies that the database
itself is serverless. Comments from FaunaDB folks on this page clearly indicate
that what they _mean_ is that it's the first database _for_ serverless, which
is a pretty bold claim, given that Google and AWS and any number of other
providers offer databases that are accessible from serverless things. So it
essentially boils down to "The first database that's marketed specifically to
serverless use cases", which is maybe true but also kind of not a useful trophy
to put on the mantle?

This is further muddled by the blog post linked to from the launch
announcement ([https://fauna.com/blog/escape-the-cloud-database-trap-
with-s...](https://fauna.com/blog/escape-the-cloud-database-trap-with-
serverless)), which includes "FaunaDB Serverless Cloud is an adaptive,
serverless database". Nobody is reading that and thinking "ah, an adaptive
database _for_ serverless apps".

To describe it as "The first active-active multi-cloud database", is possibly
true if you mean "the first time a single company has sold a publicly-
available database-as-a-service running on multiple cloud providers". But the
text says "database" where "public database-as-a-service" would be the
accurate term, leaving the reader with the impression that no existing
databases can be set up on multiple cloud providers in an active-active HA
config, which is absurd. Fixing the copy here should be pretty easy, and
they're already headed in the right direction with the next bullet point,
although it, too, refers to "database" where it means "database-as-a-service".

It feels like somebody on marketing really wanted to have a list of firsts, so
they toyed with definitions of words until they thought they could flex these
into being technically accurate. I get the same feel from the closing argument
in the linked blog post: "The query language, data model (including graphs and
change feeds), security features, strong consistency, scalability and
performance are best in class. There is no downside.". I don't think I want to
trust a database if the folks designing it couldn't think of any downsides.

~~~
michaelmior
I'm not aware of any other services which offer pricing that isn't based on
number and size of database servers.

Edit: Thanks to all the commenters who corrected me :)

~~~
edaemon
It's entertaining that one can make an innocent HN comment about something
esoteric like cloud databases and be quickly corrected by three different
Google engineers.

To provide a non-Google example, AWS DynamoDB's pricing is based on throughput
(similar to Google Cloud Datastore).

~~~
wsh91
Heh. :)

One technicality: DynamoDB's pricing is based on throughput, as you know, but
it's _provisioned_ throughput. (You manage your capacity unit allocations
yourself, at least that was the case a little while ago.) You're charged money
for how much you provision regardless of whether or not you actually consume
the capacity units. Our pricing model (like Fauna's) only takes ops and
storage into consideration.

~~~
bscanlan
Ah, so just like SDB. I wonder why AWS moved on from that model.

------
evanweaver
Hey everybody, today we launched FaunaDB Serverless Cloud, 4 years in the
making. FaunaDB is a strongly consistent, globally distributed operational
database. It’s relational, but not SQL.

We're excited to open our doors and explain more of our design decisions. Our
team is from Twitter, and that experience has deeply informed our interface
and architecture. Try it out and let us know what you think.

An on-premises release is coming later this year.

~~~
hoodoof
4 years seems a very long time for development.

I'd be interested to hear why, and what you would do differently if you were
to start from scratch.

~~~
the_duke
I don't think 4 years is long at all for developing an advanced, distributed
database solution.

It is a long time to go without real world use and user feedback, though.

~~~
evanweaver
The beta has been in production with cloud and on-premises customers for two
years.

------
wsh91
(Disclosure: I work on Google's Cloud Datastore.)

This looks super neat, and I can't wait to learn more about it, but just for
the record: I'm pretty sure this isn't the first serverless cloud database.
Both Firebase's Realtime Database and Cloud Datastore (which powers Snapchat
and Pokemon Go) are serverless; you pay only for your ops and storage. They've
been publicly available for several years.

~~~
evanweaver
Fair enough; I think it depends where you draw the line between key/value
store and database.

Both of those depend on other distributed storage systems under the hood, as
far as I am aware? Or is Datastore an end to end system? I know Firebase was
backed by MongoDB.

~~~
wsh91
Datastore runs on top of Megastore [1]. You can find out more about our data
model here [2], but it's definitely not limited to key-value data.

Our end users don't have to think much about our storage system, though, if
we're doing our jobs right. :)

[1] [https://cloud.google.com/datastore/docs/articles/balancing-s...](https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/)

[2] [https://cloud.google.com/datastore/docs/concepts/entities](https://cloud.google.com/datastore/docs/concepts/entities)

~~~
elvinyung
An interesting observation that I can't seem to "un-observe" is that Megastore
is actually a lot more like MongoDB than one would expect.

Both ostensibly work best when the application fits a hierarchical data model
(entity groups vs. documents), and provide out-of-the-box strongly-consistent
transactions for a single entity group. MongoDB feels like schemaless
Megastore.

~~~
wsh91
:)

Here's a paper with which you might already be familiar, but it's one of the
citations for the Megastore paper:
[http://adrianmarriott.net/logosroot/papers/LifeBeyondTxns.pd...](http://adrianmarriott.net/logosroot/papers/LifeBeyondTxns.pdf).
You'll probably enjoy it (if you haven't already!).

------
Efrim-Lipkin
Hi guys. Can I ask a simple question?

I understand that we are talking about a globally distributed, serverless and
yet consistent relational database.

My question is about latency. How long does it take for transactional
atomicity to become a consistent read on a globally distributed database? (1)
And what measures are taken between entry nodes to prevent clients from
receiving inconsistent data? (2)

As I ponder this, I am struck not by the consistency problem, as that is
solvable, but by the latency problem of assuring that all global queries are
consistent for some (any) time quantum. What sort of latency should be
expected?

both questions (1) and (2) are interesting, but (1) is critical while (2) is
academic.

Thanks, and very interesting work guys.

EL

~~~
freels
FaunaDB has a per-database replicated transaction log. Once a transaction has
been globally committed to the log, it is applied to each local partition that
covers part of the transaction. By this point, the transaction's order with
respect to others in the database, and its results, are determined. While writes
require global coordination to commit, reads across partitions are coordinated
via a snapshot time per query, which guarantees correctness.

In short, writes require a global round-trip through the transaction pipeline;
reads are local and low latency.
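
To illustrate the read side, here is a toy sketch of a snapshot-at-timestamp
read (not our actual code; the class and field names are only illustrative):

    import time

    class Partition:
        """One replica of a data partition that applies the replicated log in order."""
        def __init__(self):
            self.applied_ts = 0      # log position this replica has applied up to
            self.versions = {}       # key -> list of (ts, value), newest last

        def read_at(self, key, snapshot_ts):
            # Wait until this replica has applied the log up to the snapshot,
            # so it can never return data older than the chosen timestamp.
            while self.applied_ts < snapshot_ts:
                time.sleep(0.001)    # stand-in for blocking on log application
            # Return the newest version at or before the snapshot timestamp.
            for ts, value in reversed(self.versions.get(key, [])):
                if ts <= snapshot_ts:
                    return value
            return None

    def coordinated_read(partitions, keys, snapshot_ts):
        # The coordinator picks ONE snapshot timestamp and uses it for every
        # partition involved, so a multi-partition read is mutually consistent.
        return {k: partitions[hash(k) % len(partitions)].read_at(k, snapshot_ts)
                for k in keys}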

~~~
Efrim-Lipkin
This is a very good answer. So if I understand you correctly (please correct
if I do not), atomicity is handled on a per-connection basis (writes cannot be
distributed). And there may be high latency in distributing a transaction, but
read consistency is guaranteed by timestamp (equivalent to versioning).

Is this correct?

EL

~~~
freels
One thing that makes this easier is that FaunaDB does not support session
transactions, rather you must express your transaction logic as a single Fauna
query, which is executed atomically. Transactions can still involve arbitrary
keys, however.

And yes, for reads, by default the coordinating node chooses a timestamp and
uses that to query all involved data partitions. Each partition will respond
with the requested data as of that timestamp, or will delay responding
until it has caught up.

One nice thing about this approach is that any chosen timestamp is enough to
provide a consistent snapshot view of the dataset at that time. This ends up
being useful for bulk or incremental reads, where a longer running process
needs a stable view of the dataset.
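
For example, a read-modify-write like incrementing a counter becomes a single
query expression rather than a session. A rough sketch using the Python driver
(the builder names and the document ref below are illustrative assumptions;
check the driver docs for the exact signatures):

    from faunadb import query as q
    from faunadb.client import FaunaClient

    client = FaunaClient(secret="YOUR_SECRET")   # placeholder secret

    # The whole read-modify-write is one expression, executed atomically on
    # the server; the client never holds a transaction session open.
    counter = q.ref("classes/counters/1")        # illustrative document ref
    increment = q.let(
        {"doc": q.get(counter)},                 # read
        q.update(counter, {"data": {
            # modify: add 1 to the current count, then write it back
            "count": q.add(q.select(["data", "count"], q.var("doc")), 1)
        }}),
    )
    # client.query(increment)                    # runs as one atomic query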

~~~
ewrcoffee
Without session transactions, how does the application perform a transactional
read-modify-write?

------
danthemanvsqz
I think they missed Google's launch of Spanner, their distributed, strongly
consistent DB.

~~~
evanweaver
Spanner is great, but it's not pay as you go, or multi-region (yet), or multi-
cloud.

------
jazoom
$0.01 per simple operation sounds very expensive to me. This would add up very
quickly.

Edit: I misread it. Perhaps, instead of inventing your own points system that
you have to explain (and hope silly people like me don't mix up), you could
take a lesson from Google Cloud and just lay out the pricing in a table. If
you ever add another service, you'll have to integrate it into your made-up
points system as well.

~~~
evanweaver
It's $0.01 per thousand, not each.

~~~
jazoom
You're right. I realised that and came to correct it. Thanks for pointing it
out.

------
doublerebel
That pricing model and serverless model are why I've always chosen
CouchDB/Cloudant. If I'm doing the MB/hour to GB/month conversion correctly,
Fauna cloud is significantly cheaper.

I see Fauna has temporal queries, but receiving events is strictly pull; there
is no push or single feed?

~~~
jchrisa
Event push / feeds are on the roadmap. Currently we have everything
implemented at the data model level to do live query feeds; you just have to
do polling until we ship the feature.

I'm working on a follow-up example to this CRUD one that implements a multi-
user TodoMVC and will use event queries to keep the UI updated between tabs
and users. You can see the basic Serverless CRUD starter example here:
[https://fauna.com/blog/serverless-cloud-
database](https://fauna.com/blog/serverless-cloud-database)

------
jchrisa
There is a related technical blog post [1] and discussion [2]. Also, I've got
a companion blog post on the Serverless.com blog at [3].

[1] [https://fauna.com/blog/escape-the-cloud-database-trap-
with-s...](https://fauna.com/blog/escape-the-cloud-database-trap-with-
serverless)

[2]
[https://news.ycombinator.com/item?id=13877223](https://news.ycombinator.com/item?id=13877223)

[3] [https://serverless.com/blog/faunadb-serverless-
authenticatio...](https://serverless.com/blog/faunadb-serverless-
authentication/)

~~~
zenithm
What's the biggest deployment so far?

~~~
jchrisa
We have a bunch of customers listed in the press release. [0] Our managed
cloud installation is about the size of our larger customers' deployments.
NVIDIA launched their latest world-scale, user-facing service on top of FaunaDB.

Personally I'm more interested in helping people writing fresh apps use
FaunaDB, because while we can solve enterprise problems at scale, it's the
greenfield apps that will be able to best use our advanced features.

[0] [http://finance.yahoo.com/news/fauna-launches-faunadb-
serverl...](http://finance.yahoo.com/news/fauna-launches-faunadb-serverless-
cloud-131200188.html)

------
snackai
Serverless Database, Global Latency 2.8 ms, Relational but no SQL (whatever
sense that makes). BULLSHIT BINGO at its very best.

~~~
hoodoof
You could put this more nicely - they have spent a lot of time working on it.
I'm sure no one intends "BULLSHIT".

One way to handle it when something strikes you as not correct is to politely
ask for clarification.

~~~
snackai
Well no. A global latency of 2.8 ms is just bullshit. As others pointed out,
they did not beat the speed of light. I hate stuff like this. If you really
have a worthy product, just point out what it _really_ can do, not what worked
once in a lab environment.

~~~
btilly
I agree with the buzzword bingo complaint, but they really do have a claim
that is defensible here.

Their claim is that you can run an application distributed in a cloud around
the world with data in their database, and read queries to their database will
get results in an average of 2.8 ms.

This puts a lot of caveats on that 2.8 ms claim, and makes it something that
is both good and believable. Which makes the claim very much not bullshit.

------
z3t4
You should explain how it works. It's not like I'm going to steal your ideas
and spend five years implementing them ... or maybe I will if it's good ;)

~~~
jchrisa
The node that receives the query figures out which nodes it needs to talk to
for an answer, and then it adds the operations to the next batch for those
nodes. The batch dispatches, the nodes do their work, the batch commits, and
the client receives the response.

It's not really a full answer, but maybe it hints at the architecture.
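
As a toy model only (my pseudocode, not our actual internals), the flow is
roughly:

    from collections import defaultdict, namedtuple

    Op = namedtuple("Op", ["key", "value"])

    class Node:
        def __init__(self):
            self.data = {}
        def apply_batch(self, ops):
            # Each node applies its slice of the batch and acks the keys it wrote.
            for op in ops:
                self.data[op.key] = op.value
            return [op.key for op in ops]

    def coordinate(query_ops, nodes):
        # 1. The receiving node figures out which nodes own which keys and
        #    adds each operation to the next batch for that node.
        batches = defaultdict(list)
        for op in query_ops:
            batches[hash(op.key) % len(nodes)].append(op)
        # 2. The batch dispatches and the nodes do their work...
        results = {i: nodes[i].apply_batch(ops) for i, ops in batches.items()}
        # 3. ...the batch commits, and only then does the client get a response.
        return results

    print(coordinate([Op("a", 1), Op("b", 2)], [Node(), Node()]))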

------
anamoulous
I have been a fan of Evan going way back to the early Rails days. Congrats on
the launch.

------
elvinyung
I'm curious about the relational-ness of FaunaDB. e.g. How do you efficiently
maintain integrity of foreign key constraints across the entire system? How
fast and consistent are secondary indexes?

------
sushisource
So... where does the data go? Maybe a simpleton question but I couldn't easily
find an answer in the about section. If it's all function-based, where does
the data actually get persisted?

~~~
jchrisa
This is a database FOR serverless style applications. It runs on servers like
most databases, it's not made out of lambdas. But it's built so you don't have
to worry about the details. When a traffic spike hits your app we'll keep up.
And when your app is quiet, you don't pay for unused capacity.

~~~
sushisource
Ooooooooooh, that makes sense now. I thought it was a database _implemented
as_ a whole series of individual lambdas and I was having a hard time
figuring out how you guys pulled that one off.

Data stored "on the wire"? :D

~~~
inopinatus
Pretty sure this is how reality works: all matter is information, all
information is functional, hence all perception is the lazy evaluation of a
functional universe.

It merely remains to turn this into a startup.

------
mring33621
If my calculations are correct, that's about $87+ million USD to store 1 PB of
data for one year?

~~~
evanweaver
That doesn't seem right. Will investigate.

~~~
the_duke
1 [MB-hour] * 24 [hours] * 365 [days] * 10^9 [MB per PB] / 1,000 [points] *
$0.01 = $87.6 million

So unless you have some quantity discounts, that would seem to be the price
for storing 1 PB, without any querying.
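
In code, for both possible readings of the unit (my arithmetic only, using the
listed price of $0.01 per 1,000 points):

    HOURS_PER_YEAR = 24 * 365
    MB_PER_PB = 10**9
    GB_PER_PB = 10**6
    PRICE_PER_1000_POINTS = 0.01   # $0.01 per 1,000 points

    def cost_per_pb_year(units_per_pb, points_per_unit_hour=1):
        points = units_per_pb * points_per_unit_hour * HOURS_PER_YEAR
        return points / 1000 * PRICE_PER_1000_POINTS

    print(cost_per_pb_year(MB_PER_PB))  # ~87,600,000  if a point is a megabyte-hour
    print(cost_per_pb_year(GB_PER_PB))  # ~87,600      if a point is a gigabyte-hour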

~~~
evanweaver
Yeah it's our mistake. It's actually gigabyte-hour.

~~~
ryanworl
So in the OP's example, the cost is $87,000 per petabyte-year of storage?

EDIT: I see mentioned in another post this a per-replica cost. So it would be
roughly $87,000 times the number of replicas, ignoring the initial queries
that inserted the data in the first place?

~~~
evanweaver
I think so but we are going to run the cost regressions again and make sure
it's in line with market.

------
dsl
How are you running this serverless? Is it a thin application in front of AWS
or Google BigQuery?

~~~
jchrisa
No, this is not made OF serverless, it's made FOR serverless. We run on big
nodes in the same cloud as your app.

------
jimreedia
Cool. My team has been looking for something like this.

~~~
jchrisa
You can jump directly into the developer dashboard with only an email address
here, and start playing with queries: [https://fauna.com/serverless-cloud-
sign-up](https://fauna.com/serverless-cloud-sign-up)

~~~
jimreedia
My team wants to understand this in context of other databases. Are there any
architecture docs available?

~~~
evanweaver
We have a white paper coming soon...if you email us at priority@fauna.com we
can give you a preview.

------
pbgiese
This sounds a lot like google spanner. I'm no expert though. What's the
difference?

~~~
jchrisa
The big difference is that we use a logical clock and batches of operations so
that we aren't dependent on atomic clocks. We also have a different API style,
and plan to run on all the major cloud providers. You'll be able to do
consistent operations visible to application code running in different
providers.

------
nunzi
can't wait to learn more

~~~
evanweaver
The docs are now public...so you can.

We have a blog post by Daniel Abadi coming as well about the consistency
model.

~~~
btilly
Have you talked to Aphyr yet about testing it and having it become an entry in
[https://aphyr.com/tags/jepsen](https://aphyr.com/tags/jepsen)?

I've learned to not believe that distributed software will work in practice in
the way that its authors claim it will. The stronger the claims, the more
important it is to have an independent test validating it before I even think
of trusting it.

~~~
evanweaver
Totally understandable. We have been talking to Kyle. We have internal
verification systems and will be publishing more later this year to that
effect.

For what it's worth, we have built high-performance distributed systems
before....so it's not just wishful thinking.

------
hubert123
Do you support a hard limit on money spent? I would like to be able to say 30
bucks a month max or something

------
marknadal
If your database is not Open Source then your marketing lingo needs to be more
open, or else you'll make the same mistake as FoundationDB (which looked like
vapor-ware).

As a proprietary service, you are now competing against Cloud Spanner, which
(while people love the underdog) means you're toast, because they have Eric
Brewer to hand-wave away their marketing lingo.

On the flip side, you are competing against Cockroach, but they are Open
Source, so that puts you between a rock and a hard place. From previous
comments of mine, you may know I don't think Cockroach has much of a future
either, because Globally Consistent databases aren't going to cater to the
necessary P2P future of the web (5B+ new people coming online, 100B+ IoT
devices, the graph-enabled social web, Machine Learning, etc.), which is what
we, [http://gun.js.org/](http://gun.js.org/), cater to. We just successfully
ran load tests on low-end hardware doing 1.7K table inserts/sec across a
federated system, and we plan on getting this up to 10K inserts/second on
cheap (if not free) hardware.

Why are these systems going to fail to pick up the market? Because the best of
the best, both in engineering and as an Open Source community, RethinkDB
(which I praise highly) couldn't. At the end of the day, the few companies
that need globally consistent transactions will trust (for better or for
worse) Cloud Spanner, and the others who want to roll their own infrastructure
will try Cockroach but ultimately switch to RethinkDB in the end.

So on that note, as others have noted, don't use your /fantastic/ marketing
opportunities (top of HN) to make false claims about being "industry first";
it won't help you gather a developer community. Use this time to win
developers over like Firebase did (which itself now has its community scared
of when/if Google will shut it down; those developers are now flooding to
RethinkDB and ours, despite Firebase being one of the best - high praise for
them as well, like Rethink).

~~~
elvinyung
> Globally Consistent databases aren't going to cater to the necessary P2P
> future of the web

Well, that's an interesting assertion. Why do you think that?

~~~
marknadal
Because even a "3ms latency" (which was a problem, with respect to "global",
that other people have commented out) can absolutely kill the performance for
IoT data that may be emitting thousands of updates a second.

Those systems are largely highly localized, and so /strong eventual
consistency/ is more important than globally consistent blocking operations.

Also, again with 5B+ people coming online, Master-Slave systems (even
distributed ones) still have a huge bottleneck already in the present day. P2P
systems (master-master) will scale better in these settings.

~~~
elvinyung
I was more curious about the "necessary P2P future of the web" part.

I think there's an assumption here that _most_ of the responsibility for
storing the source of truth will move out to things like IoT devices (i.e. fog
computing).

And sure, there will probably be a need for that. But regarding the assumption
that most web services will go away, I don't think there's sufficient evidence
to bet on it happening anytime reasonably soon. Data centers and public clouds
will probably still be there in the next decade or two.

~~~
marknadal
Twitter is spending over $15M/month on server costs alone to support 333M
active monthly users.

Now compare to Pokemon Go's huge explosion of 20M daily users from a while
ago.

This problem is only going to get worse with another 5B+ people coming online
into the 2020s.

In order to scale, using (what you call) "fog computing" will be absolutely
necessary. Cloud services will still be used, of course, but they will be
built as P2P systems to take advantage of the "fog".

Cloud infrastructure will always be around, but how apps are built will be a
fundamentally different architecture. But when S3 goes out, like it did the
other week, we can't suffer worldwide downtime - that will be unacceptable.

Rethink's unfortunate failure to capitalize in this market is a signal that
Master-Slave databases (even the best of the best) will have a very small role
with respect to the total amount of data flowing through the internet.

My thoughts here: [https://hackernoon.com/the-implications-of-rethinkdb-and-
par...](https://hackernoon.com/the-implications-of-rethinkdb-and-parse-
shutdowns-c076460058f7)

As well as my interview on the Changelog podcast:
[https://changelog.com/podcast/236](https://changelog.com/podcast/236)

