If this is the premise for a serverless database, it's a weak start. If you really need a lightweight DBMS for testing, just run MySQL or PostgreSQL in Docker. If you really need access to production-like data (e.g., a lot of it so you get realistic distributions), run the same DBMS on cheap hardware or cheap instances. In both cases you can use persistent volumes and shut down when things are not in use. Few people really care if it takes a few minutes to spin up the test environment.
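A rough sketch of the Docker route, assuming the official `postgres` image. The Python here only builds the command lines; the container name, volume name, password, and image tag are placeholders I picked for illustration, and the data path is the one the official image uses at this tag.

```python
import shlex


def postgres_run_command(name="test-pg", volume="test-pg-data",
                         password="change-me", host_port=5432,
                         image="postgres:16"):
    """Build a `docker run` command for a throwaway PostgreSQL test container.

    All defaults are hypothetical placeholders, not recommendations.
    """
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--env", f"POSTGRES_PASSWORD={password}",
        # A named volume keeps the data across stop/start cycles.
        "--volume", f"{volume}:/var/lib/postgresql/data",
        "--publish", f"{host_port}:5432",
        image,
    ]


def stop_command(name="test-pg"):
    """Shut the container down when the test environment is not in use."""
    return ["docker", "stop", name]


def start_command(name="test-pg"):
    """Bring the same container (and its volume) back later."""
    return ["docker", "start", name]


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for cmd in (postgres_run_command(), stop_command(), start_command()):
        print(shlex.join(cmd))
```

To actually run it you would pass each list to `subprocess.run(cmd, check=True)` on a machine with Docker installed.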
As for the main point of Aurora offering a "Serverless" architecture, it looks as if what they've really done is enable the DBMS compute layer to scale up and down quickly. I wonder if this optimization fell out of pushing redo log management down into the storage layer. (See Section 3.1 https://www.allthingsdistributed.com/files/p1041-verbitski.p... for details.)
"Serverless" DBs are also very pricey if you consider everything. Instead, it isn't actually that hard to do some work loads at huge scale for cheap, two good articles on this (one discord, one ours)
- https://www.youtube.com/watch?v=x_WqBuEA7s8 (100M records/day for $10 total cost, server, disk, S3 backup!)
Gun is new to me. You have a good way of handling distributed consistency. It's intriguing that the system can still reach consistency even if every node temporarily loses network connectivity. Is there any academic work behind this?
For non-academics though, I did a comic strip explainer for the layperson (as distributed systems are often hyped up with elitist jargon) here: http://gun.js.org/distributed/matters.html !
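For anyone wondering how replicas can converge after being offline, here is a minimal sketch of a generic last-write-wins merge. To be clear, this is not GUN's actual conflict-resolution algorithm, just the textbook idea, and the `(timestamp, value)` layout is my assumption for illustration. Because the merge takes a per-key maximum, it gives the same result regardless of the order or number of times replicas sync.

```python
def merge(local, remote):
    """Merge two replicas that each map key -> (timestamp, value).

    The larger (timestamp, value) tuple wins per key, so ties on the
    timestamp are broken deterministically by comparing the values
    (which therefore must be mutually comparable in this sketch).
    """
    merged = dict(local)
    for key, entry in remote.items():
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged


# Three replicas that diverged while disconnected from each other.
a = {"name": (1, "alice")}
b = {"name": (2, "alicia"), "city": (1, "Oslo")}
c = {"city": (3, "Bergen")}

# Sync order does not matter: every replica ends up with the same state.
one_order = merge(merge(a, b), c)
other_order = merge(c, merge(b, a))
assert one_order == other_order
```

Real systems have to worry about clock skew and about whether "latest timestamp wins" is even the right policy for their data; this sketch ignores both.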
The prototype shown in the video was specific to append-only data and was done last year. We recently rewrote the system to be more generalizable using a radix trie structure, and have released an alpha of it with the Radix Storage Engine (RSE) in the main repo: https://github.com/amark/gun
Let me know if you need help with any/all of that, or more links/resources (docs are somewhat scarce on it currently, sadly), and I'll do what I can!
Although I can't resist leaving one last animated explainer GIF that shows what a radix looks like: http://gun.js.org/see/radix.gif
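For readers who prefer code to GIFs, here is a minimal sketch of a generic radix tree (a trie whose edges carry whole substrings, so keys with a shared prefix share a path). This is not the Radix Storage Engine's implementation; the class and function names are made up for illustration.

```python
class RadixNode:
    def __init__(self):
        self.children = {}      # edge label (non-empty str) -> RadixNode
        self.value = None
        self.has_value = False  # distinguishes "no key here" from value None


def _common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, key, value):
    node = root
    while True:
        for label in list(node.children):
            n = _common_prefix_len(label, key)
            if n == 0:
                continue
            child = node.children[label]
            if n < len(label):
                # Key diverges partway along this edge: split the edge at n.
                mid = RadixNode()
                mid.children[label[n:]] = child
                del node.children[label]
                node.children[label[:n]] = mid
                child = mid
            node, key = child, key[n:]
            break
        else:
            # No edge shares a prefix with what is left of the key.
            if key:
                leaf = RadixNode()
                node.children[key] = leaf
                node = leaf
            node.value, node.has_value = value, True
            return


def lookup(root, key):
    node = root
    while key:
        for label, child in node.children.items():
            if key.startswith(label):
                node, key = child, key[len(label):]
                break
        else:
            return None
    return node.value if node.has_value else None


root = RadixNode()
insert(root, "alice", 1)
insert(root, "alex", 2)
insert(root, "bob", 3)
# "alice" and "alex" now share a single "al" edge that branches into "ice" and "ex".
```

The shared-prefix compression is what makes the structure attractive for key/value storage where many keys start the same way.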
He also seems to imply that you have to keep your test DBs running all day instead of simply paying for them by the hour. :|
You can also throw a light DB (MySQL, SQL Server Express, Postgres) on a VM that you already have running 24/7, which would make the additional cost zero.
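To put rough numbers on the by-the-hour point: the rate below is a made-up placeholder, not a real price from any provider, and the 8 hours x 22 working days schedule is just an assumption. Amounts are in integer cents to avoid floating-point noise.

```python
def monthly_cost_cents(hourly_rate_cents, hours_per_day, days_per_month):
    """Cost of an instance that is billed only for the hours it runs."""
    return hourly_rate_cents * hours_per_day * days_per_month


HOURLY_RATE_CENTS = 10  # hypothetical rate, purely for illustration

always_on = monthly_cost_cents(HOURLY_RATE_CENTS, 24, 30)      # running all day, every day
working_hours = monthly_cost_cents(HOURLY_RATE_CENTS, 8, 22)   # only while people are testing

print(f"always on:     ${always_on / 100:.2f}")
print(f"working hours: ${working_hours / 100:.2f}")
print(f"saved:         ${(always_on - working_hours) / 100:.2f}")
```

Whatever the real rate is, the ratio is what matters: under these assumptions the instance only runs 176 of 720 hours, so you pay roughly a quarter of the always-on bill.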
Of course you are still going to need to curate your dev/qa data (through replication or periodic backup/restore) because shit data is hard to code against, not to mention debug against.
Serverless is a set of financial, scaling and operational properties of an architecture. One of those common properties is phrased as "scaled per request", which is particularly interesting in event-driven architectures.
The "serverless" distinction is a useful one to make, because it implies something about the architecture as you mention—my only point is that we could have done much better with the name we use to reference said architecture.
The term does get abused to refer to any server cluster with a bit of autoscaling logic, and in that sense it is a misnomer. But I don't think that's what it originally meant.
What you described sounds like Heroku, Azure Apps, etc.
Then there is the problem of who it is 'serverless' for. If the developers use a Lambda-like service hosted by their own org in their datacenter (operated by separate ops people), is it still serverless? If so, then it has nothing to do with actual servers and it's all about making operating system runtime specifics transparent. If not, then it's just a marketing term for letting Amazon or Google run your company's hardware.
Google's BigQuery and Snowflake Data are examples of data warehouses, similar to DIY Presto/Drill/Spark on S3. Apache Pulsar brings that to messaging and distributed logs. It'll be interesting to see how it applies to more OLTP database engines, although there are examples like TiDB which seem to work well enough.
(work at G and used to work on BQ)
If you think BigQuery pioneered all of these concepts then your team did a very poor job of researching prior art. Maybe that was intentional for a good greenfield design, but it's certainly not pioneering at that point.
Wow, really? Bit of a radical assertion there, don’t you think? I thought they might want higher resource costs and lower returns on investment.
Maybe there is some good content in there, but I found it difficult to sift through the banalities to find out.
The Aurora Serverless signup form asks no questions related to DB scale or capacity, so we're left to assume they're accepting pilot customers based on region or company size?
Snowflake DB (data warehouse, so competes with Redshift rather than Aurora).
Google's other database offerings (Cloud Datastore and BigQuery).