
This is super interesting stuff.

First of all, I think a caching layer (which we currently don’t have) is going to be a necessity in the coming weeks as we scale for an additional project that will rely on this architecture.

Second, it is just PK lookups. We don’t actually have a single FK (the contractor did not set up any relations), which makes me think moving all of this replicated JSON data out of fields and into tables may help.

The queries that are currently causing issues are not filtering out any data but returning entire records. In ORM terms it is Video.objects.all(), with a URL param on our GET to the API limiting the number of entries returned. What’s interesting is that this latency scales linearly, and at the point where we ask for ~50 records we hit Postgres’s maximum raw memory allocation (1 GB), causing the entire app to crash.
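If the list endpoint doesn’t actually need the JSON blobs, one mitigation is to stop selecting them at all. This is a minimal sketch using sqlite3 as a stand-in for Postgres (the table and column names are invented for illustration); in Django terms, the slim query is roughly `Video.objects.defer("sensor_json")` or `.values("id", "title")`:

```python
import json
import sqlite3

# Toy stand-in for the Video table: sqlite3 here, Postgres in production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT, sensor_json TEXT)")

# Each record carries a large replicated JSON blob (sensor metadata).
blob = json.dumps({"gps": [51.5, -0.1], "temperature": 21.4, "samples": list(range(10_000))})
conn.executemany(
    "INSERT INTO video (title, sensor_json) VALUES (?, ?)",
    [(f"video-{i}", blob) for i in range(50)],
)

# Fetching whole rows drags every blob across the wire:
full_rows = conn.execute("SELECT * FROM video LIMIT 50").fetchall()

# Selecting only the columns the list view needs skips the blobs entirely:
slim_rows = conn.execute("SELECT id, title FROM video LIMIT 50").fetchall()

print(len(full_rows), len(slim_rows))  # same 50 rows, vastly different payload sizes
```

The linear latency you’re seeing is consistent with each row dragging its full blob along, so deferring the blob columns should flatten that curve even before any schema migration.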

The S3 blob store solution you propose is enormously fascinating. The one thing I’d mention is that these JSON fields on the Video table have a defined schema that is replicated in each Video record (this is video/sensor metadata, including things like GPS coords, temperature, and a lot more).

So retrieving a Video record retrieves those JSON fields, and not just the values but the entire nested blob, and it does so for each and every record if we are fetching more than one.

When you mention JSON schemas we control, would defining this schema with something like Marshmallow/JSON Schema be a good idea? And would it make sense to explicitly migrate those JSON fields to their own tables, replaced with an FK on the Video table?




I do want to emphasize that the S3 approach has a lot of trade-offs worth considering. There is something really nice about having all of your data in one place (transactions, backups, indexing, etc. all become trivial), and you lose that with the S3 approach. But in a lot of cases, splitting out blobs is fine. Just treat them as immutable, and write them to S3 before committing your DB transaction to help ensure consistency.
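The write-ordering point can be sketched concretely. This is a hedged toy version, with a dict standing in for S3 (in real life that would be boto3’s `put_object`) and sqlite3 standing in for Postgres; the function and key names are invented:

```python
import sqlite3
import uuid

# Stand-in blob store (a dict here; S3 via boto3 in real life).
blob_store = {}

def put_blob(data: bytes) -> str:
    """Write the immutable blob first and return its key."""
    key = f"videos/{uuid.uuid4()}.json"
    blob_store[key] = data
    return key

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT, blob_key TEXT)")

def create_video(title: str, metadata: bytes) -> str:
    # 1. Upload to the blob store BEFORE touching the DB.
    key = put_blob(metadata)
    # 2. Only then commit the row that references it. If this commit fails,
    #    the worst case is an orphaned blob, never a dangling DB pointer.
    with conn:
        conn.execute("INSERT INTO video (title, blob_key) VALUES (?, ?)", (title, key))
    return key

key = create_video("dashcam-01", b'{"gps": [51.5, -0.1]}')
row = conn.execute("SELECT blob_key FROM video").fetchone()
```

The design choice here is that an orphaned blob is cheap (a periodic cleanup job can sweep them), while a committed row pointing at a blob that was never written is a correctness bug, so the blob write always goes first.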

Regarding JSON schema: if you have a Marshmallow schema or similar, yes, that’s a wonderful starting point. This should map pretty closely to your DB schema (but may not be 1-to-1, as not every field in your DB will be needed in your API).
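Even without Marshmallow installed, a plain dataclass is enough to pin down a schema you control; the field names below are guesses at your sensor metadata, not your real schema, and a Marshmallow `Schema` would play the same role:

```python
from dataclasses import dataclass, fields
import json

# A controlled schema for per-video sensor metadata. Field names are
# illustrative; swap in a Marshmallow Schema for richer validation.
@dataclass(frozen=True)
class SensorMetadata:
    latitude: float
    longitude: float
    temperature: float

    @classmethod
    def from_json(cls, raw: str) -> "SensorMetadata":
        data = json.loads(raw)
        allowed = {f.name for f in fields(cls)}
        unknown = set(data) - allowed
        if unknown:
            raise ValueError(f"unexpected fields: {unknown}")
        return cls(**data)

meta = SensorMetadata.from_json('{"latitude": 51.5, "longitude": -0.1, "temperature": 21.4}')
print(meta.temperature)  # 21.4
```

Once the schema is explicit like this, the mapping to DB columns (and the decision about which fields the API exposes) falls out naturally.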

I’d suggest avoiding storing JSON at all in the DB unless you’re storing JSON that you don’t control.

For example, if the JSON you’re storing today has a nested object of GPS coords, temperature, etc., make that an explicit table (or tables) as needed. The benefits are many: indexing the data becomes easier, the data is stored more efficiently, the table takes up less storage, the columns are validated for you, you can choose to return a subset of the data, and more. You will not regret it.
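A rough sketch of that split, using sqlite3 for brevity (table and column names are hypothetical, and Postgres would be the real target):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The nested JSON becomes its own table with an FK back to video.
conn.executescript("""
CREATE TABLE video (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE sensor_reading (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES video(id),
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    temperature REAL NOT NULL
);
-- Indexing is easy once the data is columns, not a blob:
CREATE INDEX idx_reading_video ON sensor_reading(video_id);
""")
conn.execute("INSERT INTO video (id, title) VALUES (1, 'dashcam-01')")
conn.execute(
    "INSERT INTO sensor_reading (video_id, latitude, longitude, temperature) "
    "VALUES (1, 51.5, -0.1, 21.4)"
)
# Returning a subset of the data is now a plain column selection:
temps = [t for (t,) in conn.execute("SELECT temperature FROM sensor_reading WHERE video_id = 1")]
print(temps)  # [21.4]
```

The NOT NULL constraints and REAL types give you the column validation for free that a JSON blob never will.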



