
This is super interesting stuff.

First of all, I think a caching layer (which we currently don’t have) is going to be a necessity in the coming weeks as we scale for an additional project that will rely on this architecture.

Second, it is just PK lookups. We don’t actually have a single FK (the contractor did not set up any relations), which makes me think moving all of this replicated JSON data out of fields and into tables may help.

The queries that are currently causing issues are not filtering out any data but returning entire records. In ORM terms it is Video.objects.all(), with a URL param on our GET to the API limiting the number of entries returned. What’s interesting is that this latency scales linearly, and at the point where we ask for ~50 records we hit Postgres’s maximum raw memory allocation (1 GB), causing the entire app to crash.
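If the list endpoint doesn’t actually need the JSON blobs, one mitigation is to stop selecting them at all. This is a minimal sketch using sqlite3 as a stand-in for Postgres (the table and column names are invented for illustration); in Django terms, the slim query is roughly `Video.objects.defer("sensor_json")` or `.values("id", "title")`:

```python
import json
import sqlite3

# Toy stand-in for the Video table: sqlite3 here, Postgres in production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT, sensor_json TEXT)")

# Each record carries a large replicated JSON blob (sensor metadata).
blob = json.dumps({"gps": [51.5, -0.1], "temperature": 21.4, "samples": list(range(10_000))})
conn.executemany(
    "INSERT INTO video (title, sensor_json) VALUES (?, ?)",
    [(f"video-{i}", blob) for i in range(50)],
)

# Fetching whole rows drags every blob across the wire:
full_rows = conn.execute("SELECT * FROM video LIMIT 50").fetchall()

# Selecting only the columns the list view needs skips the blobs entirely:
slim_rows = conn.execute("SELECT id, title FROM video LIMIT 50").fetchall()

print(len(full_rows), len(slim_rows))  # same 50 rows, vastly different payload sizes
```

The linear latency you’re seeing is consistent with each row dragging its full blob along, so deferring the blob columns should flatten that curve even before any schema migration.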

The S3 blob store solution you propose is enormously fascinating. The one thing I’d mention is that these JSON fields on the Video table have a defined schema that is replicated in each Video record (this is video/sensor metadata, including things like GPS coords, temperature, and a lot more).

So retrieving a Video record retrieves those JSON fields, and not just the values but the entire nested blob, and it does so for each and every record if we are fetching more than one.

When you mention JSON schemas we control, would defining this schema with something like Marshmallow/JSON Schema be a good idea? And would it make sense to explicitly migrate those JSON fields to their own tables, replaced with an FK on the Video table?




I do want to emphasize that the S3 approach has a lot of trade-offs worth considering. There is something really nice about having all of your data in one place (transactions, backups, indexing, etc. all become trivial), and you lose that with the S3 approach. But in a lot of cases, splitting out blobs is fine. Just treat them as immutable, and write them to S3 before committing your DB transaction to help ensure consistency.
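The write-ordering point can be sketched concretely. This is a hedged toy version, with a dict standing in for S3 (in real life that would be boto3’s `put_object`) and sqlite3 standing in for Postgres; the function and key names are invented:

```python
import sqlite3
import uuid

# Stand-in blob store (a dict here; S3 via boto3 in real life).
blob_store = {}

def put_blob(data: bytes) -> str:
    """Write the immutable blob first and return its key."""
    key = f"videos/{uuid.uuid4()}.json"
    blob_store[key] = data
    return key

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT, blob_key TEXT)")

def create_video(title: str, metadata: bytes) -> str:
    # 1. Upload to the blob store BEFORE touching the DB.
    key = put_blob(metadata)
    # 2. Only then commit the row that references it. If this commit fails,
    #    the worst case is an orphaned blob, never a dangling DB pointer.
    with conn:
        conn.execute("INSERT INTO video (title, blob_key) VALUES (?, ?)", (title, key))
    return key

key = create_video("dashcam-01", b'{"gps": [51.5, -0.1]}')
row = conn.execute("SELECT blob_key FROM video").fetchone()
```

The design choice here is that an orphaned blob is cheap (a periodic cleanup job can sweep them), while a committed row pointing at a blob that was never written is a correctness bug, so the blob write always goes first.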

Regarding JSON schema: if you have a Marshmallow schema or similar, yes, that’s a wonderful starting point. This should map pretty closely to your DB schema (but may not be 1-to-1, as not every field in your DB will be needed in your API).
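Even without Marshmallow installed, a plain dataclass is enough to pin down a schema you control; the field names below are guesses at your sensor metadata, not your real schema, and a Marshmallow `Schema` would play the same role:

```python
from dataclasses import dataclass, fields
import json

# A controlled schema for per-video sensor metadata. Field names are
# illustrative; swap in a Marshmallow Schema for richer validation.
@dataclass(frozen=True)
class SensorMetadata:
    latitude: float
    longitude: float
    temperature: float

    @classmethod
    def from_json(cls, raw: str) -> "SensorMetadata":
        data = json.loads(raw)
        allowed = {f.name for f in fields(cls)}
        unknown = set(data) - allowed
        if unknown:
            raise ValueError(f"unexpected fields: {unknown}")
        return cls(**data)

meta = SensorMetadata.from_json('{"latitude": 51.5, "longitude": -0.1, "temperature": 21.4}')
print(meta.temperature)  # 21.4
```

Once the schema is explicit like this, the mapping to DB columns (and the decision about which fields the API exposes) falls out naturally.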

I’d suggest avoiding storing JSON at all in the DB unless you’re storing JSON that you don’t control.

For example, if the JSON you’re storing today has a nested object of GPS coords, temperature, etc., make that an explicit table (or tables) as needed. The benefits are many: indexing the data becomes easier, the data is stored more efficiently, the table takes up less storage, the columns are validated for you, you can choose to return a subset of the data, and more. You will not regret it.
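A rough sketch of that split, using sqlite3 for brevity (table and column names are hypothetical, and Postgres would be the real target):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The nested JSON becomes its own table with an FK back to video.
conn.executescript("""
CREATE TABLE video (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE sensor_reading (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES video(id),
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    temperature REAL NOT NULL
);
-- Indexing is easy once the data is columns, not a blob:
CREATE INDEX idx_reading_video ON sensor_reading(video_id);
""")
conn.execute("INSERT INTO video (id, title) VALUES (1, 'dashcam-01')")
conn.execute(
    "INSERT INTO sensor_reading (video_id, latitude, longitude, temperature) "
    "VALUES (1, 51.5, -0.1, 21.4)"
)
# Returning a subset of the data is now a plain column selection:
temps = [t for (t,) in conn.execute("SELECT temperature FROM sensor_reading WHERE video_id = 1")]
print(temps)  # [21.4]
```

The NOT NULL constraints and REAL types give you the column validation for free that a JSON blob never will.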



