
Serverless Databases: The Future of Event-Driven Architecture - marvinpinto
https://www.simform.com/serverless-databases/
======
hodgesrm
> I, personally, have seen many projects burning thousands a month in database
> costs because they prefer replicating the environment for testing branches.

If this is the premise for serverless databases, it's a weak start. If you
really need a lightweight DBMS for testing, just run MySQL or PostgreSQL in
Docker. If you really need access to production-like data (e.g., a lot of it
so you get realistic distributions) run the same DBMS on cheap hardware or
cheap instances. In both cases you can use persistent volumes and shut down
when things are not in use. Few people really care if it takes a few minutes
to spin up the test environment.

As for the main point of Aurora offering a "Serverless" architecture it looks
as if what they've really done is enabled the DBMS compute layer to scale up
and down quickly. I wonder if this optimization fell out of pushing redo log
management down into the storage layer. (See Section 3.1
[https://www.allthingsdistributed.com/files/p1041-verbitski.p...](https://www.allthingsdistributed.com/files/p1041-verbitski.pdf)
for details.)

~~~
marknadal
Exactly! It is also ridiculous that this "serverless" phrase is being used to
push this agenda.

"Serverless" DBs are also very pricey if you consider everything. Instead, it
isn't actually that hard to run some workloads at huge scale cheaply. Two
good articles on this (one from Discord, one ours):

\- [https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7](https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7)

\- [https://www.youtube.com/watch?v=x_WqBuEA7s8](https://www.youtube.com/watch?v=x_WqBuEA7s8)
(100M records/day for $10 total cost, server, disk, S3 backup!)

~~~
hodgesrm
Thanks for the references! Just read the blog article. It was kind of obvious
from the first paragraph that it would be a Cassandra deployment story. ;)

Gun is new to me. You have a good way of handling distributed consistency.
It's intriguing that the system still can reach consistency even if every node
loses network connectivity temporarily. Is there any academic work behind
this?

~~~
marknadal
Thank you! Yes, there is currently just a whitepaper we're publishing with
Stanford and that was also reviewed by colleagues at MIT. However it isn't up
to snuff yet to be posted publicly, so shoot me an email at mark@gunDB.io and
I'll send you the link.

For non-academics though, I did a comic strip explainer for the layperson (as
distributed systems are often hyped up with elitist jargon) here:
[http://gun.js.org/distributed/matters.html](http://gun.js.org/distributed/matters.html)!

The prototype shown in the video was specific to append-only data and was done
last year. We recently rewrote the system to be more generalizable using a
radix trie structure and have released an alpha of it with the Radix Storage
Engine (RSE) in the main repo:
[https://github.com/amark/gun](https://github.com/amark/gun)

Let me know if you need help with any/all of that, or more links/resources
(docs are somewhat scarce on it currently, sadly), and I'll do what I can!

That said, I can't resist leaving one last animated explainer GIF
that shows what a radix tree looks like:
[http://gun.js.org/see/radix.gif](http://gun.js.org/see/radix.gif)
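
For readers who prefer code to a GIF, here is a minimal sketch of a generic radix tree (compressed trie) in Python. It is not Gun's RSE code and makes no claim about how RSE is implemented; `RadixNode`, `insert`, and `lookup` are hypothetical names chosen for illustration. The idea it shows is that keys sharing a prefix share one edge, and an edge is split only at the point where two keys diverge.

```python
class RadixNode:
    """One node of an illustrative radix tree (hypothetical, not Gun's RSE)."""

    def __init__(self):
        self.children = {}      # edge label (str) -> RadixNode
        self.value = None
        self.has_value = False  # distinguishes "no key ends here" from a None value


def insert(root, key, value):
    """Insert key -> value, splitting an edge where the key diverges from it."""
    node = root
    while True:
        for label in list(node.children):
            # Length of the common prefix between the remaining key and this edge.
            n = 0
            while n < min(len(key), len(label)) and key[n] == label[n]:
                n += 1
            if n == 0:
                continue
            child = node.children[label]
            if n < len(label):
                # Partial match: split the edge into shared prefix + old remainder.
                mid = RadixNode()
                mid.children[label[n:]] = child
                del node.children[label]
                node.children[label[:n]] = mid
                child = mid
            node, key = child, key[n:]
            break
        else:
            # No edge shares a prefix with what is left of the key.
            if key:
                leaf = RadixNode()
                node.children[key] = leaf
                node = leaf
            node.value, node.has_value = value, True
            return


def lookup(root, key):
    """Return the stored value for key, or None if the key is absent."""
    node = root
    while key:
        for label, child in node.children.items():
            if key.startswith(label):
                node, key = child, key[len(label):]
                break
        else:
            return None
    return node.value if node.has_value else None
```

Inserting `alice` and then `alex` leaves a single `al` edge under the root with `ice` and `ex` below it, which is the prefix sharing the animation illustrates.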

------
dsr_
A comparison that only mentions pros and no cons is not a useful comparison.
It's a cheerleading piece.

------
jtwaleson
We're deploying thousands of tenants to their own Postgres RDS instances, which
are complete overkill for most scenarios. On average we use about 3% CPU...
We still do this for three reasons: security isolation, performance
isolation and monitoring. I think Aurora serverless will be a game changer for
us, but we would still need per-tenant monitoring.

~~~
icey
I do this with a side project, except it's SQLite databases in S3 buckets.
That's about as easy as it gets and doesn't require nearly as much
configuration overhead as Aurora does.

~~~
pletnes
How does that work? Download DB file, read/write, upload DB file (if written)?
What about concurrency?

~~~
icey
I wouldn't trust it for write concurrency, but it's been great for my use case
(reads are multiple orders of magnitude more frequent than writes, and the
writes can be queued). I'm using s3sqlite for this from the Zappa project:
[https://github.com/Miserlou/zappa-django-utils/blob/master/zappa_django_utils/db/backends/s3sqlite/base.py](https://github.com/Miserlou/zappa-django-utils/blob/master/zappa_django_utils/db/backends/s3sqlite/base.py).
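
The download / query / upload-if-written cycle asked about above can be sketched roughly like this. This is not the s3sqlite backend's code: `InMemoryBlobStore` is a hypothetical stand-in for an S3 bucket so the example runs without AWS credentials, and `run_on_remote_db` is an illustrative helper name. There is deliberately no locking, so two concurrent writers would overwrite each other (last upload wins), which is why writes need to be queued.

```python
import os
import sqlite3
import tempfile


class InMemoryBlobStore:
    """Hypothetical stand-in for an object store such as an S3 bucket."""

    def __init__(self):
        self._blobs = {}

    def get(self, key):
        return self._blobs.get(key)  # None if the object does not exist yet

    def put(self, key, data):
        self._blobs[key] = data


def run_on_remote_db(store, key, sql, params=(), write=False):
    """Download the DB file, run one statement, re-upload only if we wrote."""
    fd, path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)
    try:
        blob = store.get(key)
        if blob is not None:
            # Materialize the remote database as a local file.
            with open(path, "wb") as f:
                f.write(blob)
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute(sql, params).fetchall()
            conn.commit()
        finally:
            conn.close()
        if write:
            # Push the whole file back; readers never trigger an upload.
            with open(path, "rb") as f:
                store.put(key, f.read())
        return rows
    finally:
        os.remove(path)
```

For example, calling `run_on_remote_db(store, "tenant.db", "SELECT ...")` costs one download and no upload, which fits a read-heavy workload; every write re-uploads the entire file.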

------
peterevans
I was hoping the future of event-driven architecture is that we would call it
"event-driven" and not serverless; alas.

~~~
mjb
Event-driven and serverless are related, but distinct ideas. Event-driven
architectures can be built in a datacenter, on virts, in containers, or on
serverless. On the other hand, serverless architectures don't necessarily need
to be event driven. These patterns do have natural affinity, so they are often
referred to together.

Serverless is a set of financial, scaling and operational properties of an
architecture. One of those common properties is phrased as "scaled per
request", which is particularly interesting in event-driven architectures.

~~~
peterevans
I understand your point; my issue is that "serverless" as a term is a
misnomer. You cannot have a serverless architecture without some servers to
support it.

The "serverless" distinction is a useful one to make, because it implies
something about the architecture as you mention—my only point is that we could
have done much better with the name we use to reference said architecture.

~~~
amarkov
It seems like a pretty good name to me. The point of the term "serverless" is
that you aren't presented with any abstraction of a server; operations like
"spin up more servers to handle the load" or "SSH to the server to figure out
what's going on" or "reboot the server because it's acting weird" don't exist
for you.

The term does get abused to refer to any server cluster with a bit of
autoscaling logic, and in that sense it is a misnomer. But I don't think
that's what it originally meant.

~~~
styfle
But didn’t we already have this concept with Platform-as-a-Service (PaaS)?

What you described sounds like Heroku, Azure Apps, etc.

~~~
amarkov
In many simple cases, Heroku is effectively serverless. But there are
situations (in particular monitoring and billing) where you're forced to think
about the individual dynos your app is running on.

------
moocowtruck
databaseless serverless productless architecture

~~~
rb808
[https://github.com/kelseyhightower/nocode](https://github.com/kelseyhightower/nocode)

------
manigandham
This split of compute and storage into distinct layers is starting to become
more common, and it has some pretty neat advantages like much better
scalability and efficiency.

Google's BigQuery and Snowflake Data are examples of data warehouses, similar
to DIY presto/drill/spark on S3. Apache Pulsar brings that to messaging and
distributed logs. It'll be interesting to see how it applies to more OLTP
database engines, although there are examples like TiDB which seem to work
well enough.

~~~
vgt
BigQuery specifically pioneered many of these concepts since its release in
2012 - "serverless manageability", pay-per-query consumption-based pricing,
pure separation of compute and storage (no intermediate ssh mesh), and more
recently separation of compute and intermediate state (wonders for scalability
and complex query performance).

(work at G and used to work on BQ)

~~~
hueving
Separation of compute and storage has been around since before Google was even
relevant as a company (see NFS).

If you think BigQuery pioneered all of these concepts then your team did a
very poor job of researching prior art. Maybe that was intentional for a good
green-field design, but it's certainly not pioneering at that point.

------
jimbokun
“This isn’t the end, management teams are expecting lower resource costs and
higher return on investment for the technology projects.”

Wow, really? Bit of a radical assertion there, don't you think? I thought they might
want higher resource costs and lower returns on investment.

Maybe there is some good content in there, but I found it difficult to sift
through the banalities to find out.

------
matchagaucho
It makes me nervous when cloud companies release new features to select pilot
groups.

The Aurora Serverless signup form asks no questions related to DB scale or
capacity, so we're left to assume they're accepting pilot customers based on
region or company size?

------
gregwebs
There are other databases in this category.

Snowflake DB (a data warehouse, so it competes with Redshift rather than Aurora).
Google's other database offerings (Cloud Datastore and BigQuery).

------
stunt
DynamoDB has the same properties, but you will have more flexibility in Aurora
Serverless.

------
M_Bakhtiari
I still haven't been able to find someone who can explain what makes these
elaborate layers of indirection running on servers in datacentres exactly
'serverless'.

~~~
manigandham
Because you (the customer) don't have to think about them. It's similar to
PaaS, or rather between PaaS and SaaS as a finer granularity model.

------
arisAlexis
isn't this like Ethereum programming on a private blockchain?

