I admire people for their inventiveness, but surely attempting to square-peg-round-hole something like this into AWS S3 is only going to end up biting them in the backside?
S3 is delivered over a stateless protocol (HTTP) and AWS makes no promises about any given request being fulfilled (the user is merely invited to retry). The safety of your data is then further dependent on which storage tier you chose.
There just seem to be so many footgun opportunities here. Why not just use the right tool for the job (i.e. a hosted DB or compute)?
> attempting to square-peg-round-hole something like this into AWS S3
I don't think it's like that. AWS already offers services like S3 Select[0] or Athena[1] that do something similar.
> the user is merely invited to retry
Another reason why I used s3fs instead of manually making requests.
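For anyone who does make the requests by hand, "retry" usually means something like the loop below. This is only a minimal sketch of retrying with exponential backoff and jitter; `with_retries` and `flaky_get` are hypothetical names, the simulated failure stands in for a real request, and none of this is a claim about how s3fs handles retries internally.

```python
import random
import time


def with_retries(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay each time and add jitter to avoid retry storms.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


# Hypothetical stand-in for a request that fails twice, then succeeds.
calls = {"count": 0}


def flaky_get():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return b"payload"


# Sleeping is stubbed out here so the demo runs instantly.
result = with_retries(flaky_get, sleep=lambda seconds: None)
```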
> Why not just use the right tool for the job?
I certainly have multiple use cases where creating an SQLite database locally and distributing it via S3 (for read-only usage) is orders of magnitude more convenient than the alternatives. It's hard to beat the developer experience of a local SQLite database.
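Roughly, the workflow looks like the sketch below, using only Python's standard `sqlite3` module. The table, file name and helper functions are made up for illustration, and the upload itself is left as a comment since it depends on your bucket and credentials.

```python
import os
import sqlite3
import tempfile


def build_database(path):
    """Create a small SQLite file locally; this is the artifact to distribute."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE logins (user TEXT, month TEXT)")
        conn.executemany(
            "INSERT INTO logins VALUES (?, ?)",
            [("alice", "2021-01"), ("bob", "2021-01"), ("alice", "2021-02")],
        )
    conn.close()


def open_read_only(path):
    """Open the file read-only via SQLite's URI syntax, as a consumer would."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


db_path = os.path.join(tempfile.mkdtemp(), "example.db")
build_database(db_path)
# Upload step omitted here, e.g. with boto3: s3.upload_file(db_path, bucket, key)

reader = open_read_only(db_path)
login_count = reader.execute("SELECT COUNT(*) FROM logins").fetchone()[0]
```

Opening with `mode=ro` means an accidental write on the consumer side fails with an error instead of silently diverging from the published copy.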
> Why not just use the right tool for the job (i.e. a hosted DB or compute)?
Because sometimes you want a database for 10000 rows, or you have 5 logins a month and you don't want a $30/month database running in the cloud.
There's a market out there for a real "server-less" database that charges you per row stored, is priced per read/write operation, and is reasonably priced for 10000 rows per month, without you having to calculate how many 4KB blocks you are going to read or write.
Maybe my 10000 rows are 5 tables, 2000 rows each, and I want to have some regular SQL features like joins? Can I do that without having to do joins in code? Because if I have to do joins in code, I can just store the data in flat files and process everything in memory. And what if I have an existing app designed to run against a database system?
If I have to start writing joins by hand, what's the point of even having a database?
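To make that concrete, here is the kind of thing I mean, sketched against an in-memory SQLite database. The `customers` and `orders` tables are invented for the example; the point is only that the engine does the join and the aggregation, not my application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 30), (2, 1, 12), (3, 2, 5);
    """
)

# One declarative query instead of a hand-written nested loop over two files.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```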
DynamoDB doesn't do well with complex queries. Imagine you've got 4 integer fields, any combination of which can be filtered by a range.
With DynamoDB, you're going to have a hard time querying when fields A, B, and D are all part of the filter. If you add a fifth field, it's going to be a pain to add it to historical data.
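For comparison, this is roughly what the same requirement looks like in a SQL engine. It is a sketch using SQLite with a hypothetical `items` table and columns `a` to `d`; the `query_ranges` helper is my own illustration, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, "
    "a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 10, 100, 1000), (2, 5, 20, 200, 2000), (3, 9, 30, 300, 3000)],
)


def column_names(conn):
    """Read the current column names so new columns are picked up automatically."""
    return {row[1] for row in conn.execute("PRAGMA table_info(items)")}


def query_ranges(conn, ranges):
    """ranges maps a column name to an inclusive (low, high) pair."""
    allowed = column_names(conn)
    clauses, params = [], []
    for field, (low, high) in ranges.items():
        if field not in allowed:  # never interpolate unchecked names into SQL
            raise ValueError(f"unknown column: {field}")
        clauses.append(f"{field} BETWEEN ? AND ?")
        params.extend((low, high))
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT id FROM items WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


# Filter on A, B and D only; C is simply left out.
matches = query_ranges(conn, {"a": (0, 6), "b": (5, 25), "d": (1500, 2500)})

# Adding a fifth field: existing rows pick up the default value.
conn.execute("ALTER TABLE items ADD COLUMN e INTEGER DEFAULT 0")
after_add = query_ranges(conn, {"a": (0, 6), "e": (0, 0)})
```

Any subset of the fields can go into `ranges`, and the historical rows are queryable on the new column straight away.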