With any database architecture, the goal is to make sure nothing hits your database if you can help it.
We served 13 billion requests last year, according to Fastly, and pushed about 3.84 PB of data.
Each of those requests generates a handful of metrics, depending on what's happening.
To back that we have an Aurora db.r6g.large, which we've been using for a few years now. We recently reworked the metrics, so the CPU is now running around 40% (it sat at about 6% for years). Basically, instead of pre-aggregating metrics we moved to more of a time-series structure.
The reason we can do that is simple: since we have no realtime requirement, we push all the requests through Fastly, which ships the logs to S3. Fastly uploads the logs every few minutes, and a Lambda chews through them whenever they show up. Usually there are 10-15 of them running at any given time.
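Here's a rough sketch of what that log-chewing Lambda looks like; it assumes newline-delimited JSON logs from Fastly and a MySQL-compatible Aurora cluster, and the bucket, table, and column names are all made up for illustration:

```python
# Sketch of an S3-triggered Lambda that parses Fastly log files and writes
# rows to a metrics table. Log format and table/column names are illustrative.
import gzip
import json
import os

import boto3
import pymysql  # assumes a MySQL-compatible Aurora cluster

s3 = boto3.client("s3")

def handler(event, context):
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASS"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor() as cur:
            for record in event["Records"]:
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
                if key.endswith(".gz"):
                    body = gzip.decompress(body)
                for line in body.decode("utf-8").splitlines():
                    entry = json.loads(line)  # one request per line
                    cur.execute(
                        "INSERT INTO request_metrics (ts, customer_id, path, status, bytes) "
                        "VALUES (%s, %s, %s, %s, %s)",
                        (entry["timestamp"], entry["customer_id"],
                         entry["url"], entry["status"], entry["bytes"]),
                    )
        conn.commit()
    finally:
        conn.close()
```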
So really, by removing the realtime requirement we've been able to get our metrics more or less for free. There's only 5-10 minutes of lag, but since our customers generally look at yesterday's numbers at 10 AM local time, it doesn't matter.
So how big of a database do you need? With Aurora's auto-scaling storage, you might as well use it. Do you need the data? At these prices, you might as well keep all of it. Why not? Archive everything to S3 so you can rebuild it later (this has come in handy a few times).
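The rebuild itself is nothing fancy; something like this, where load_log_file is standing in for the same parsing/insert logic the Lambda uses (bucket and prefix are made up):

```python
# Rough sketch of a rebuild: because the raw Fastly logs are archived in S3,
# the metrics tables can be regenerated by replaying the archive through the
# same parser the Lambda runs. Prefix and helper name are illustrative.
import boto3

s3 = boto3.client("s3")

def rebuild(bucket, prefix="fastly-logs/2024/"):
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            load_log_file(body)  # same parse/insert logic as the Lambda above
```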
On the UI side, don't issue queries until the user actually asks for the data. Then constrain those queries as much as possible: use indexes, and pre-aggregate your data where you can.
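This is the kind of query pattern I mean (table and column names are hypothetical): only run it when the user asks for a report, hit a pre-aggregated daily rollup, constrain by an indexed (customer_id, day) range, and cap the row count:

```python
# Illustrative UI query: bounded by an indexed (customer_id, day) range,
# reading from a pre-aggregated rollup table, with a hard LIMIT.
def daily_report(cur, customer_id, start_day, end_day, limit=1000):
    cur.execute(
        """
        SELECT day, SUM(requests) AS requests, SUM(bytes) AS bytes
        FROM daily_rollup
        WHERE customer_id = %s AND day BETWEEN %s AND %s
        GROUP BY day
        ORDER BY day
        LIMIT %s
        """,
        (customer_id, start_day, end_day, limit),
    )
    return cur.fetchall()
```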
For IoT, unless you need interactive responses, just send HTTP GET requests into a CDN. No MQTT, etc. Why bother? And you probably don't have to worry about intermediate network devices clipping your GET requests, since you're deploying into someone else's environment (make it a requirement for their network people).
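The device side really can be that dumb: fire a GET at a CDN-fronted endpoint with the readings in the query string and let the edge logs do the rest. Endpoint and parameter names here are made up:

```python
# Sketch of a device beacon: one HTTP GET, readings in the query string.
import time
import urllib.parse
import urllib.request

def report(device_id, temperature_c):
    params = urllib.parse.urlencode({
        "device": device_id,
        "temp": temperature_c,
        "ts": int(time.time()),
    })
    url = f"https://metrics.example.com/beacon?{params}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status  # the response body doesn't matter
```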
As for sizes and row counts, they weren't that big until recently. I think the biggest table has 20-30 million rows? We aggregated aggressively, so everything stayed relatively small.
We have a monolith that handles the UI, API endpoints, and some maintenance tasks, plus a bunch of Lambdas that handle the backend work. Overall it's been great; the backend scales out to whatever we need. The real issues with Lambdas are limiting concurrency so the DB doesn't get overwhelmed, and keeping an eye on memory and timeout settings. Oh, and making sure that when you hook them up to SQS you don't let AWS fire up 2,000 Lambdas at once (that kind of fan-out works great for S3 processing, though).
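The two knobs for that are reserved concurrency on the function and, for SQS-triggered functions, maximum concurrency on the event source mapping. A rough boto3 sketch, with made-up function names, queue ARN, and numbers:

```python
# Capping Lambda concurrency so the DB stays happy. Names/ARNs/limits are
# illustrative, not our actual values.
import boto3

lam = boto3.client("lambda")

# Limit how many copies of the metrics writer can run at once.
lam.put_function_concurrency(
    FunctionName="metrics-writer",
    ReservedConcurrentExecutions=15,
)

# For SQS-driven workers, also cap concurrency at the event source so a
# backed-up queue doesn't spin up hundreds of consumers.
lam.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:work-queue",
    FunctionName="queue-worker",
    BatchSize=10,
    ScalingConfig={"MaximumConcurrency": 20},
)
```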