Reddit’s database has two tables (2012) (burke.dev)
206 points by swyx on Aug 10, 2022 | 247 comments



I worked at a startup (now fairly popular in the US) where we had tables for each thing (users, companies, etc) and a “relationship” table that described each relationship between things. There were no foreign keys, so making changes was pretty cheap. It was actually pretty ingenious (the two guys who came up with the schema went on to get paid to work on k8s).

It was super handy to simply query that table to debug things, since by merely looking for a user, you’d discover everything. If Mongo had been more mature and scalable back then (2012ish), I wonder if we would have used it.
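A rough sketch of what that debug query looks like (hypothetical column name; the actual layout is a nullable id column per thing type):

    -- Sketch: find every relationship a given user participates in,
    -- which in this design is effectively "everything we know about the user".
    SELECT *
    FROM relationship
    WHERE user_id = 42;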


This is quite similar to the RDF triples model. Everything is a 'thing' including 'relationships' between things. So you just need a thing table and a relation table.

The issue with this is that schema management and rules get pushed to the application layer. You also need to deal with massive tables (in terms of # of rows) for the relationships table, which leads to potential performance issues.


let's call them "nodes" and "edges"


Right? What a groundbreaking idea! /s


other downsides I saw from an implementation of this in postgres:

1. you can't have strongly typed schemas specific to a 'thing' subtype; to store it all together, you end up sticking all the interesting bits in a JSON field

2. any queries of interest require a lot of self joins and performance craters

Please, please, never implement a "graph database" or "party model" or "entity schema" in a normal RDBMS, except perhaps at minuscule scale; the flexibility is absolutely not worth it.


That's why you have duplicated properties on things you want to query on.


That's why in RDF you should really be using a well-modelled ontology.

As for performance issues, using a dedicated Quad store helps there, rather than one built on top of an RDBMS (OpenLink Virtuoso excluded, because somehow they manage to make Quads on top of a distributed SQL engine blazing fast).

It's been about 6 years since I've been in that world, so things might have changed wrt performance.


Sounds like a 'star schema', just without foreign key constraints on the relationship ('fact') table.

I'm not really sure I understand why not having that constraint is helpful - what sort of 'changes' are significantly cheaper? Schema changes on the thing ('dimension') tables? Or do you mean cheaper in the 'reasoning about it' sense, you can delete a thing without having to worry about whether it should be nullified, deleted, or what in the relationship table? (And sure something starts to 404 or whatever, but in this application that's fine.)


> you can delete a thing without having to worry about whether it should be nullified, deleted, or what in the relationship table?

Pretty much this. You could just delete all the relationships, or just some of them. For example, a user could be related to many things (appointments, multiple vendors, customers, etc). Thus a customer can just delete the relationship between them and a user, but the user still keeps their other relationships with the customer (such as appointments, allowing them to see their appointment history, even though the customer no longer has access to the user's information).


wouldn't you be able to do the same even with FKs in place? What an FK would prevent you from doing is deleting a user before all of their relationships are removed. That should keep you from ending up with inconsistent data inside the tables.


In my experience with regular Postgres databases shared across many teams, inconsistent/broken references are seen as a much less risky thing than, say, cascading a delete. Usually a broken reference is of no consequence. And when you don’t use FKs, people tend to bake that into how they code things without much trouble.


I've oft wished for a system where FKs could be used to automatically mark rows as invalid instead of deleting them, and you could inspect the nature of the broken reference as a property of the row/object. Then broken references can be cleaned up when appropriate, possibly never depending on the application.

This gives the neat property of being able to do data entry in parallel, too - different people may insert records into two tables linked by a FK relationship, it's nice for this to happen in either order. It also allows the user to query for exactly what rows will be affected by a cascaded delete, or to perform the operation in steps.


Wouldn't you effectively get those features by just not using FKs? You'd get your broken-reference marker by `where foreign_id not in (select id from foreign_table)`. You could create that as a view or materialized view.
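A rough sketch of such a view (table and column names are hypothetical); NOT EXISTS avoids the NULL gotcha of NOT IN:

    -- Sketch: surface rows whose reference no longer resolves, without any FK constraint.
    CREATE VIEW broken_order_refs AS
    SELECT o.*
    FROM orders o
    WHERE o.user_id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM users u WHERE u.id = o.user_id);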


It would be nice to have the schema metadata, at the very least, to be able to analyze a given table's reverse dependencies without resorting to naming conventions.


> tables for each thing (users, companies, etc) and a “relationship” table that described each relationship between things

Sounds like objects and associations in the Tao model:

https://engineering.fb.com/2013/06/25/core-data/tao-the-powe...


So basically one table for each type of node, plus one table for all edges. Neat!

How big was the relationships table? How was performance?


It was pretty massive. I’m certain that it is probably sharded by now. Database performance wasn’t our bottleneck; we had NP-hard problems all over the place, which were our primary performance issues. For most customers, they never really had issues, but our larger customers def did. Those problems were amazingly fun to solve.


How do you shard a thing like this? (Apart from “with great difficulty.”) :)


Does this question imply sharding would be more difficult with this sort of table design, compared with other table designs?

Obviously sharding is difficult, but what would make it any more difficult with this table design?

Picking the sharding keys, avoiding hotspots, rebalancing, failover, joining disparate tables on different shards, etc... these are all complicated issues with far reaching implications, but I just don't understand how this table design would complicate sharding any more than another.

I'm not trying to sound condescending and suggest it's a dumb question... genuinely interested.


This would have happened after I left, so I'm just speculating based on experience.

1. You create a function in your application that maps a customer id to a database server. There is only one at first.

2. You set up replication to another server.

3. You update your application's mapping function to map about half of your ids to the new replica. Use something like md5(customer id) % 2. Using md5 or something else that has a good random distribution is important (sha-x DOES NOT have a good random distribution).

4. Your application will now write to the replica; turn off replication, and delete the ids from each server that no longer map to that server.

Adding a new server follows approximately the same lines, except you need to rebalance your keys which can be tricky.
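A sketch of the step 3 mapping written as a Postgres expression (assuming a numeric customer id; in practice this lives in application code):

    -- Sketch: hash the customer id with md5 and map it onto one of two servers.
    -- Only the first 4 hex chars (16 bits) are used so the cast stays a small positive int.
    SELECT customer_id,
           ('x' || left(md5(customer_id::text), 4))::bit(16)::int % 2 AS shard
    FROM customers;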


So I'm trying to learn about system design without having ever encountered large problems from any of my projects. What I'm reading keeps pushing for hash rings for these types of distribution problems.

Someone who does this stuff for real, do orgs actually just do a hash % serverCount? That's what a startup I worked at did 8 years ago. I thought nobody actually did this anymore though given the benefits of hash rings?


It depends on control and automation. If you can easily migrate things between servers, a hash ring makes more sense. If you are a scrappy startup who has too much traffic/data, you are doing everything manually, including rebalancing and babysitting servers.


Can you please clarify why md5 has better random distribution than sha?


It doesn't. I'm curious to hear how withinboredom arrived at that conclusion.


I arrived at this empirically doing some tests at work. In general, Sha and md5 BOTH present randomly distributed numbers. However, most IDs (and the assumption made here) are numeric. For the first billion or so numbers, sha doesn't create a random distribution, while md5 does.

Looking back, I don't remember my experimental setup to verify a random distribution and I recommend checking for your own setup. But that was the conclusion we arrived at.


Here's the fraction of 1s for each bit position output by MD5 and SHA-256 using the first million integers as inputs: https://pastebin.com/aVR3u8Vv


I quickly threw a few into excel with a pareto graph to show distribution. Granted, it is only the first 10,000 integers % 1000: https://imgur.com/a/HypGWP1

To my untrained eye, md5 and sha1 (and not shown, sha256) look remarkably similar to random. If you want truly even distributions, then going with a naive "X % Y" approach looks better.


If you do % 1000 there'll be a bit of bias from the modulus, but not anything meaningful for the use case, and it affects MD5 and SHA both.

No argument that for simple incrementing integers a raw modulus is better, though. Even if hashing is needed, a cryptographic hash function probably isn't (depending on threat model, e.g. whether user inputs are involved).


MD5 and ShaX are available in almost every language. Things that are probably better (like Murmur3) might not be, or may even have different hashes for the same thing (C# vs JS with Murmur3, for example). I had to reimplement Murmur3 in C# to get the hashes to match from JS, it boiled down to the C# version using longs and eventually casting to an int while JS can’t or didn’t do that final cast.


I wasn't thinking of anything exotic, just the built-in functions used for hash tables and random number generators (for integers and other inputs that can be interpreted as seeds, a random number generator is also a hash function, and of course in your case the hash function was a random number generator all along anyway!).


soooo basically there is no bias


Looks like it. Though I think md5 is faster, which is maybe what I’m remembering, now that I think about it. This was years ago, funny how memory gets tainted.


I think that's probably true of lots of masters of crafts, sometimes you have a gut feeling and don't even remember the formative moments that created the conclusion haha


If you worry about random distribution of the bits, wouldn't it be better to do (H(customer_id) % big_prime) % 2, so that the "% 2" does not just read the last bit?


A finite field would ALMOST work here. However, you're making the assumption that the user ids using your application are random. If you have an A/B test running using that same mapping, you'll end up with a server hotter than the other if one arm of the experiment is widely successful.


In practice I would have added a salt, but I did not consider that each independent sharding needs to have its own salt.


Thanks for that answer, it seems quite a specialist job. So many moving parts. Why would you need random distribution and not just customers 1 to 100k for example?


> Why would you need random distribution and not just customers 1 to 100k for example?

To keep the databases from having hotspots. For example, if your average customer churns after 6 months, you'll have your "old customer" database mostly just idling, sucking up $$ that you can't downgrade to a smaller machine because there are still customers on it. Meanwhile, your "new customer" database is on fire, swamped with requests. By using a random distribution, you spread the load out.

If you get fancy with some tooling, you can migrate customers between databases, and have a pool of small databases for "dead customers" and a more powerful pool for "live customers" or even have geographical distribution and use the datacenter closest to them. But you really do need some fancy tooling to pull that off, but it is doable.


> Meanwhile, your "new customer" database is on fire, swamped with requests. By using a random distribution, you spread the load out.

Great answer.


This depends on access patterns to your data.


I was wondering the same thing. But at the same time, it seems like having it all centralized in one place would make it really easy to write tooling to aid in such an endeavor.

For example, suppose it was decided that sharding would be by account ID. It would probably be very achievable to have some monitoring for all newly created relationships to check that they have an account ID specified, and break that down by relationship type (and therefor attributable to a team). Dashboards and metrics would be trivial to create and track progress, which would be essential for any large scale engineering effort.


Yeah it does seem you need to think ahead of what the future needs would be.


One neat thing about Mongo is the ObjectIDs:

We didn't know it at the time, but you can get the date of creation from them. Which means... If you sort by ObjectID, you're actually sorting by creation time. You can create a generic ObjectID using JavaScript and filter by time ranges. I still have a codepen proof of concept I did for doing ranges.

Anyway, the other thing is, and I'm not sure how many people use it this way: you can copy an ObjectID verbatim into other documents, like a foreign key, so you can reference between collections. If you do this, you'll save yourself a lot of headaches. Just because you can shove anything and everything into MongoDB documents, doesn't mean you absolutely should.


But why? Ok, no foreign key checking, so it’s fast, great. Why not include the foreign ID in the local table? What’s the purpose behind a global “relationships” table?


Guessing:

* Most relations are properly modeled as N:N anyways.

* Corollary: no "aargh, need to upgrade this relation to N:N" regrets.

* Corollary: no 1000 relationship tables to name & chase around the DB metadata.

* Low friction to add relations on-the-fly when developing code.

* Low friction to query all relations associated with an object.

* Low friction to update a relation, but keep the ex around.

Neat approach IMHO.


>Most relations are properly modeled as N:N anyways

I'm not sure that's true. Containment is a very common type of relationship.

>Corollary: no 1000 relationship tables to name & chase around the DB metadata.

That depends on whether relationships have associated attributes. If and when they do, you have to turn them into object types and create a table for them. And it's exactly the N:M relationships that tend to have associated information.

That's the conceptual problem with this approach in my view. Quite often, there isn't really a conceptual difference between relationship types and object/thing types. So you end up with a model that allows you to treat some relationships as data but not others.

>Neat approach IMHO.

It's a valid compromise I guess, depending heavily on the specifics of your schema. Ultimately, database optimisation is a very dirty business if you can't find a way to completely separate logical from physical aspects, which turns out to be a very hard problem to solve.


Great points! Indeed, hard to tell without knowing a bit more about their schema specifics.


* low friction till things go wrong and you're not even aware they went wrong because the relational mistake wasn't constrained

* aaargh we have trillion row tables that contain everything, how do we scale this (you don't)

* aaargh our tables indices don't fit into memory, what do we do (you cry)

* no at a glance metrics like table sizes to have an idea of which relations are growing

* oh no we have to have complex app code to re-implement for relationships everything a database gave for free

Reddit used a similar system in Postgresql and hit all of these issues before moving to Cassandra. Even then they still had to deal with constraints of the original model.

Edit: Didn't even notice the article was about Reddit; it pretty much confirms what I said. It took them something like 2 years iirc to migrate to Cassandra and the site was wholly unusable during that time. There is no doubt in my mind that if another Reddit-style site had existed during that time it would have eaten their lunch, same as Reddit ate Digg's lunch (for non-technical reasons).

Furthermore, back then iirc they only really had something like 4 tables/relations: comment, post, user, vote. The 'faster development time' for adding new relations was completely moot; they spent more time developing and dealing with a complex db query wrapper and performance issues (caching in memcache all over the place) than actual features.

The (app+query) code and performance for their nested comments was absolutely hilarious. Something a well designed table would have given them for free in one recursive CTE.
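For reference, the sort of recursive CTE meant here, assuming a conventional comments table with a self-referencing parent_id (not Reddit's actual schema):

    -- Sketch: fetch a post's entire comment tree in one query.
    WITH RECURSIVE thread AS (
        SELECT c.id, c.parent_id, c.body, 0 AS depth
        FROM comments c
        WHERE c.post_id = 123 AND c.parent_id IS NULL
      UNION ALL
        SELECT c.id, c.parent_id, c.body, t.depth + 1
        FROM comments c
        JOIN thread t ON c.parent_id = t.id
    )
    SELECT * FROM thread;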

It wasn't until they moved to Cassandra that they were able to let the db do that job and work on adding the shit ton of features that exist today.


> low friction till things go wrong and you're not even aware they went wrong because the relational mistake wasn't constrained

These are people-problems, and usually a dev made this mistake exactly once after being hired. Writing robust code isn't hard if the culture supports it.

> we have trillion row tables that contain everything, how do we scale this (you don't)

You do, by sharding the table.

> no at a glance metrics like table sizes to have an idea of which relations are growing

select count(*) from relationship where relation = 'whatever'; is pretty simple.

> we have to have complex app code to re-implement for relationships everything a database gave for free

These queries are no different than standard many-to-many type queries. In all, the logic is no different than anything else. You also still have to do the same number of inserts that you have to do for many-to-many, but now the order isn't important. You can even do a quick consistency check before committing the transaction at the framework level.


> These are people-problems, and usually a dev made this mistake exactly once after being hired.

Classifying bugs as "people problems" strikes me as an incredibly hostile work environment.

What happens to the poor developer who makes a particular mistake twice? They get fired?


Have you ever accidentally deleted a production table before? If so, I doubt you’d do it again if you are able to learn from your mistakes. No hostilities required. Hire good people who are smart and capable of learning.


Deleting a production DB is not the only way to mess up your data.

Smart people design systems that alert them when they're about to commit an error because they have 500 other priorities on their mind.


Yeah, we had that.

Even where I work now (which has almost no table constraints at all because there is no way to support them at our scale), we have tooling that prevents you from making dumb mistakes, plus alerting and backups if you manage to do it anyway. Having processes and tooling in place to deal with these kinds of issues is what I meant by "people problem." No constraint will keep a determined person from deleting the wrong thing (in fact, it might be worse if it cascades).

We had someone accidentally fight the "query nanny" once, thinking they were deleting a table in their dev environment... nope, they deleted a production table with hundreds of millions of rows. It took hours to complete the deletion, while all devs worked to delete the feature so we could restore from backup and revert our hacked out code. They still work there.

My point is, these are all "people problems" and no technology will save you from them. They might mitigate the issue, or slow people down to get them to realize what they are about to do ... but that is all you can really hope for. A constraint isn't necessarily required to provide that mitigation, but it helps when you can use them.


Thanks for clarifying, I had misinterpreted your statement.

I can see how at huge scales, DB constraints can be impossible to implement. But they do come (almost) for free in terms of development effort, whereas what you describe probably took a long time to implement.


"made a big mistake twice" does not correlate with "bad person who is not smart or capable of learning"—hope you're not in management


I never said it did. Just that people tend to learn from their mistakes. Are there people who will make the same mistake different ways over and over again? Yep (I’m one of them). It’s not an issue, I learn and get better at spotting the mistake when I review others’ code and others point it out when reviewing my code. That’s why I said “exactly once”, since once you know what it looks like, you can spot it in a review.


> It wasn't until they moved to Cassandra that they were able to let the db do that job and work on adding the shit ton of features that exist today.

But Cassandra is still a NoSQL database and doesn't give you the constraints of something like pg.


Can't help but read this and wonder if it was a strict architectural decision (I think this is what you imply?), or a "move fast and fix it later" side effect that hung around. To me it seems like you're discarding a lot of safety, but perhaps that's overstated in my head.


The relationships were enforced in-code, so the safety existed there instead of by the database. Even where I work now with sharding and such, we don’t have any foreign key constraints yet serve billions of requests. Code can provide just as much safety as the database, you just have to be smart enough to not try and override it.


> Code can provide just as much safety as the database

No, database constraints declaratively specify invariants that have to be valid for the entirety of your data, no matter when it was entered.

Checks and constraints in code only affect data that is currently being modified while that version of the code is live. Unless you run regular "data cleanup" routines on all your data, your code will not be able to fix existing data or even tell you that it needs to be fixed. If you deployed a bug to production that caused data corruption, you deploying a fix for it will not undo the effects of the bug on already created data.

Now, a new database constraint will also not fix previously invalid data, but it will alert you to the problem and allow you to fix the data, after which it will then be able to confirm to you that the data is in correct shape and will be continue to be so in the future, even if you accidentally deploy a bug wherein the code forgets to set a reference properly.

I'm fine with people claiming that, for their specific use case or situation, database constraints are not worth it (e.g. for scalability reasons), but it seems disingenuous to me to claim that code can provide "just as much safety".
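To be concrete about the "alert you to the problem" part: Postgres lets you add the constraint first and check existing data in a separate step, roughly (hypothetical table names):

    -- Sketch: add an FK without checking existing rows, then validate separately.
    -- VALIDATE CONSTRAINT fails if old rows violate it, which is your alert to go fix them.
    ALTER TABLE orders
        ADD CONSTRAINT orders_user_fk FOREIGN KEY (user_id)
        REFERENCES users (id) NOT VALID;

    ALTER TABLE orders VALIDATE CONSTRAINT orders_user_fk;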


> Code can provide just as much safety as the database, you just have to be smart enough to not try and override it.

This statement is absolutely false. Some application code that random developers write will never be a match for constraints in a mature, well tested database.


> Some application code that random developers write will never be a match for constraints in a mature, well tested database.

Well, lucky for you, most places don't hire random developers, only qualified ones. /s

Where I work has billions of database tables sharded all over the place. There simply isn't a way to define a constraint except via code. We seem to get along just fine with code reviews and documentation.


I’ve never seen a database with just one writer. I wouldn’t even bet on all the writers using the same language, much less reusing a single piece of code implementing constraints that never have changed and never will.


Can you tell me what the performance impact is with code logic and extra DB calls versus built-in constraint logic? Because I am fairly convinced that most modern DBs have almost zero impact from relationship constraints. Those are heavily optimised. I am curious about your study and want to know more.


The problem with sharding is that you can't join, as most databases don't support "distributed joins" so relationship constraints don't even make sense. Thus it has to be done in code and often requires multiple queries to various servers to get the full picture. Most of this can be optimized by denormalizing as much as possible for common queries and leaving normalized records for less common queries.

There's a performance impact, for sure, but there's really not much that can be done without switching to some database technology that does sharding for you and can handle distributed joins (rethinkdb comes to mind, but I'm not sure if that is maintained any more). But then you have to rewrite all your queries and logic to handle this new technology and that probably isn't worth it. Even if you were to use something like cockroachdb and postgres, cockroachdb doesn't support some features of postgres (like some triggers and other things), so you'll probably still have to rewrite some things.


> I am fairly convinced that most modern DBs have almost zero impact from relationship constraints

It cannot fundamentally be zero impact, the database needs to guarantee that the referred record actually exists at the time the transaction commits. Postgres takes a lock on all referenced foreign keys when inserting/updating rows, this is definitely not completely free.

Found this out when a plain insert statement caused a transaction deadlock under production load, fun times.


> There were no foreign keys

Then how did the relationship table refer to things in the thing tables?


I bet they meant no foreign key constraints.


OK, but that still doesn't make any sense because:

> we had tables for each thing

I take that to mean "a separate table for each class of thing." So the single "relationship" table had to somehow refer not just to rows in tables but to the tables themselves. AFAIK that is not possible to do directly in SQL. You would have to query the relationship table, extract a table name from the result, and then make a separate query to get the relevant row from that table. And you would have to make that separate query for every relationship. That would be horribly inefficient.


I don't see why you would have to include the table name. I presume the calling code knows what they are looking for.

For example, say we have a table of people and a table of books and a relationship table books-owned-by. If I want to see what's in your library I can select you in the relationship table and join with the books table to get a record set describing your library. Or is this not what the OP was describing?
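Something like the following, assuming a books-owned-by table of (person_id, book_id) pairs:

    -- Sketch: list one person's library via the relationship table.
    SELECT b.*
    FROM books_owned_by r
    JOIN books b ON b.id = r.book_id
    WHERE r.person_id = 42;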


The OP said, "we had tables for each thing (users, companies, etc) and a “relationship” table that described each relationship between things". I took that to mean that there was only one relationship table.


Oh I see. Hmm... I wonder what OP meant?


Yep, so a table for each thing with their own schemas. A user table, a company table, etc. Then a single relationship that defines all the relationships between all the objects in a giant many-to-many relationship.

So to query a full relationship, you'd do a join from A -> relationship -> B, probably an inner join. But most of the time, it was usually just a singular join (to get a list of things to show the user, for example) since you'd already be querying in the context of some part of the relationship.


Ah, that makes sense. So something like:

    select whatever from thingy, relation, widget
    where thingy.id = relation.id1
    and widget.id = relation.id2
    and relation.type1 = 'thingy'
    and relation.type2 = 'widget'


I'm a bit confused about how that relationship table would look. Is anybody kind enough to show a concrete example?


This type of setup is semi-common and I've seen it called "polymorphic relationships". The PHP framework Laravel has first-party support for them if you want a real example, although those are generally only polymorphic for one side of the relationship (i.e. building a CRM and having a single "comments" table with the ability to relate Customers, Opportunity, Leads, etc. to each comment without having a unique column for each type).

Generally, the relationship table will look like this:

  - relation1_id
  - relation1_type
  - relation2_id
  - relation2_type
Where *_id is the database ID and *_type is some kind of string constant representing another table. Generally you put an index on "relation#_id, relation#_type", but that's obviously use case dependent
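A rough DDL sketch of that layout (types and index choices are just one reasonable option):

    -- Sketch of the polymorphic relationship table described above.
    CREATE TABLE relationship (
        relation1_id   bigint NOT NULL,
        relation1_type text   NOT NULL,  -- e.g. 'user', 'company'
        relation2_id   bigint NOT NULL,
        relation2_type text   NOT NULL
    );

    CREATE INDEX ON relationship (relation1_id, relation1_type);
    CREATE INDEX ON relationship (relation2_id, relation2_type);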


Like others have said, you can store the table types with the ids.

But here's the neat part. That's often only for safety.

If I want to find all users and their orders, I can join the user table to the relationship table, then join that to the order table. Something like:

    SELECT <COLUMNS GO HERE>
    FROM USER
    INNER JOIN RELATIONSHIP ON USER.Id IN (RELATIONSHIP.LeftId, RELATIONSHIP.RightId)
    INNER JOIN ORDER ON ORDER.Id IN (RELATIONSHIP.LeftId, RELATIONSHIP.RightId)

So you're getting the Users, then filtering to the ones that have relationships, then filtering that to the ones that also have orders.

If you've ever managed an N:N relationship table, it's like that, but more generic.


IIRC, we had a nullable column for each ID type. So a column for user, company, appointment, etc. then we had an enumerated column for the relation (owner, patient, vendor, etc).


Sounds like the classic entity–relationship model

I am assisting a professor with his intro to databases class this semester and that was the first thing taught there

https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_mo...


How far is that from RDF triples and similar ideas? Just curious


I had never heard of this before! But yeah, you could use the relationship table to describe it as a RDF triplet. For example, relationships could be OWNER, or PATIENT, or something and could be seen as a predicate. So USER OWNER of COMPANY, or USER PATIENT of COMPANY, or RECORD BELONGS to COMPANY, or COMPANY VENDOR of COMPANY.


I don't understand this. Wouldn't the relationship rows need to use foreign keys to know which things they apply to?


what's the advantage (besides debugging) to have all relationships in one table, rather than having a separate table for each relationship? seems like it would be strictly worse


Funny. I’ve done exactly this elsewhere.


They were using it like a NoSQL database back in 2010, just one year after MongoDB started. So there were no options.

In 2022, there are so many more mature, reliable, battle-tested NoSQL options that could have solved their problem more elegantly.

But even today, if I were to build Reddit as a startup, I'd start with a Postgres database and go as far as I can. Postgres allows me to use relations when I want to enforce data integrity and JSONB when I just want a key/value store.


> But even today, if I were to build Reddit as a startup, I'd start with a Postgres database and go as far as I can. Postgres allows me to use relations when I want to enforce data integrity and JSONB when I just want a key/value store.

This is accepted wisdom on HN but I don't see it:

- True HA is very difficult. Even just automatic failover takes a lot of manual sysadmin effort

- You can't autoscale, and while scaling is a nice problem to have, having to not just fundamentally rearchitect all your code when you need to shard, but also probably migrate all your data, is probably the worst version of it.

- Even if you're not using transactions, you still pay the price for them. For example your writes won't commit until all your secondary indices have been updated.

- If you're actually using JSONB then driver/mapper/etc. support is very limited. E.g. it's still quite fiddly to have an object where one field is a set of strings / enum values and have a standard ORM store that in a column rather than creating a join table.


If I were to start building Reddit from scratch without knowing if it's going to be successful or not, I'd rather iterate quickly on features than spend time scaling something that will most likely fail before reaching that kind of scale.

>True HA is very difficult. Even just automatic failover takes a lot of manual sysadmin effort

Nearly all Postgres cloud providers such as RDS, CloudSQL, Upcloud, provide multi-region nodes and automatic failover.

Also, true HA databases are very expensive and have their own drawbacks. Not really designed for a Reddit-style startup.

>You can't autoscale, and while scaling is a nice problem to have, having to not just fundamentally rearchitect all your code when you need to shard, but also probably migrate all your data, is probably the worst version of it.

Hence why I said "as far as I can". I can keep upgrading the instance, creating materialized views, caching responses, etc.

Lastly, the most important reason is because I know Postgres and I can get started today. I don't want to learn a highly scalable, new type of database that I might need 5 years down the road.


> Nearly all Postgres cloud providers such as RDS, CloudSQL, Upcloud, provide multi-region nodes and automatic failover.

From memory you have to specify a maintenance window and failover is not instant. And relying on your cloud provider comes with its own drawbacks.

> Also, true HA databases are very expensive and have their own drawbacks. Not really designed for a Reddit-style startup.

Plenty of free HA datastores out there, and if we're assuming it'll be managed by the cloud provider anyway then that's a wash.

> Lastly, the most important reason is because I know Postgres and I can get started today. I don't want to learn a highly scalable, new type of database that I might need 5 years down the road.

That's a good argument for using what you know; by the same token I'd use Cassandra. Postgres is a very complex system with lots of legacy and various edge cases.


Does a startup in Reddit's niche demand HA?


I would think that if you want to create a habit where people check you frequently, avoiding downtime would be important. But IDK.


For something like reddit, who cares about consistency ? Under massive load no user cares about knowing the exact number of votes, all the replies to a comment, etc... It's also perfectly fine to refuse some writes if the load is too high to bear.


Reddit has gone through many periods of regular unplanned downtime as they hit various scale issues over the years. It's certainly something to avoid, but not necessarily something that's catastrophic - like most decisions there are tradeoffs.


you probably would have some CDN in front of it, so some database unavailability would not impact browsing that much.


On a side note, I find it a bit ironic how NoSQL was all the rage back in the last decade or so but in 2022, NoSQL DBs are racing to add SQL querying back to key value stores.

It turns out that SQL is crucial to non-app-developers such as business analysts and data scientists. Trying to set up an ETL to pull data from your MongoDB datastore into a Postgres DB so the analysts could generate reports is such a waste of time and resources. Or worse, the analysts have to request devs to write Javascript code to query the data they need. For this reason alone, I will always start with a DB that supports SQL out of the box.


People didn't leave RDBMS for NoSQL because they wanted to get rid of RDBMS features, they did it because they couldn't scale their applications on single instances anymore. So they had to give up transactions and foreign keys and schemas, etc, because those concepts just didn't exist across instances. Not because they wanted to, but because they had to.

The entire database world has been working ever since trying to get back those RDBMS features into their horizontally scalable solutions, it's just taken a while because the problems are exponentially harder in a network. It's not like people are using eventual consistency and a loss of referential integrity because they prefer it.


>People didn't leave RDBMS for NoSQL because they wanted to get rid of RDBMS features, they did it because they couldn't scale their applications on single instances anymore.

I agree that this should have been the reason for choosing something like MongoDB over MySQL/Postgres.

In reality, I saw a lot of companies switching to it or starting with it because their devs didn't have to learn SQL and could just stay in Javascript/whatever language they were using.


> In reality, I saw a lot of companies switching to it or starting with it because their devs didn't have to learn SQL and could just stay in Javascript/whatever language they were using.

Or because they wanted their data to be schemaless.

Now, a valid reason for wanting data to be schemaless is that you store free form documents, generic cache entries (Redis) or other such things where it really makes no sense to specify the shape of the document/entry, because at any point in time, different entries will have different shapes. Another can be scalability concerns or other reasons for why you consciously choose to have your data fully denormalised.

But invalid reasons, IMHO, include:

- "It's faster." It is, until you get data corruption and have to manually fix data on prod.

- "We don't know the schema yet." That's not "schemaless", because your application will still assume a schema, the schema is now just implicit. If you want to iterate on your schema, you can easily do so using DB migrations; this is actually easier because the DB will help you verify that you migrated your data properly when you changed its schema, as opposed to changing the schema, but still keeping data around that is invalid under the new schema.

- "Not all of our data has a fixed schema." This is now easily solvable in e.g. Postgres with JSONB columns, so you only need to fully specify the data that does have a clearly defined shape.


> Or because they wanted their data to be schemaless.

I once led a team creating a public-facing site based on its own custom forms/wizards. We had half a dozen internal departments wanting dozens of wildly varying forms defined, and decided we didn't want fixed schemas as future field requirements were unknown.

Rather than the corporate C#/SQL Server we went for Node/MongoDB, and we (successfully) ramped up a new system that transformed the power and cost-base of our public sector employer.

We were well chuffed, and it was deemed revolutionary (it also did cloud integration whilst pulling submissions, parsing/verifying, communicating, managing, and pushing into legacy IT systems, so not hyperbole).

And yet, despite the outcome, privately the team regretted the choice from about half-way through. I think for me the clue was when we introduced Mongoose to ... enforce a schema onto our schema-less Mongo collections and help out with the relational querying.

(Edit: I'm not saying schema-less is bad, I'm saying you need to be clear about when it's good.)


Yeah, that's what I was meaning to say. There's a difference between "the schema evolves" and "there is no schema (not even implicitly)". And it can also be "parts of the schema are fixed, others are not".

> I think for me the clue was when we introduced Mongoose to ... enforce a schema onto our schema-less Mongo collections and help out with the relational querying.

I agree that when you're doing things like that you're not really taking advantage of "schemaless", and that's probably a sign that that's not what you really want.


100%. You build something cool and want to get investment or understand how your marketing efforts are going so need to run some analytics. As soon as even the most basic queries require custom code it's obvious you've wasted a load of dev and analyst time.


We seem to be happy using Hevo + BigQuery + Metabase for analytics need. Hevo can pull from many sources.


> So there were no options.

Hbase was a pretty solid NoSQL option by 2010, no?


These days, for a new Web2 type startup, I would use SQLite.

Because it requires no setup and has been used to scale typical Web2 applications to millions in revenue on a single cheap VM.

Another option worth taking a look at is to use no DB at all. Just writing to files is often the faster way to get going. The filesystem offers so much functionality out of the box. And the ecosystem of tools is marvelous. As far as I know, HN uses no DB and just writes all data to plain text files.

It would actually be pretty interesting to have a look at how HN stores the data:

https://news.ycombinator.com/item?id=32408322


I sometimes see recommendations like this and I just don’t get it. It’s not like setting up a PostgreSQL database is hard at all? And using files? That just sounds like a horrible developer experience and problems waiting to happen. Databases give you so much for free: ACID, transactions, an easy DSL for queries, structure to your data, etc. On top of that every dev knows some SQL.

What am I missing?


So I am also "really people, use PostgreSQL", because you have way less tricks you have to play to get it working compared to SQLite, in serious envs. However, some challenges with Postgres, especially if you have less strict data integrity requirements (reddit-y stuff, for example):

- Postgres backups are not trivial. They aren't hard, but well SQLite is just that one file

- Postgres isn't really built for many-cardinal-DB setups (talking on order of 100s of DBs). What does this mean in practice? If you are setting up a multi-tenant system, you're going to quickly realize that you're paying a decent cost because your data is laid out by insertion order. Combine this with MVCC meaning that no amount of indices will give you a nice COUNT, and suddenly your 1% largest tenants will cause perf issues across the board.

SQLite, meanwhile, one file per customer is not a crazy idea. You'll have to do tricks to "map/reduce" for cross-tenant stuff, of course. But your sharding story is much nicer.

- PSQL upgrades are non-trivial if you don't want downtime. There's logical upgrades, but you gotta be real fucking confident (did you remember to copy over your counters? No? Enjoy your downtime on switchover).

That being said, I think a lot of people see SQLite as this magical thing because of people like fly posting about it, without really grokking that the SQLite stuff that gets posted is either "you actually only have one writer" or "you will report successful writes that won't successfully commit later on". The fundamentals of databases don't stop existing with a bunch of SQLite middleware!

But SQLite at least matches the conceptual model of "I run a program on a file", which is much easier to grok than the client-server based stuff in general. But PSQL has way more straightforward conflict management in general


SQLite backups aren't quite that easy.

Grabbing a copy of the file won't necessarily work: you need at atomic snapshot. You can create those using the backup API or by using VACUUM INTO, but in both cases you'll need enough spare disk space to create a fresh copy of your database.

I'm having great results from Litestream streaming to S3, so that's definitely a good option here.
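For reference, the VACUUM INTO form mentioned above is a one-liner (the target path is just an example):

    -- Sketch: write an atomic snapshot of the live SQLite database to a new file.
    VACUUM INTO '/backups/app-snapshot.db';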


That is a very good point! Obviously a lot of stuff is easier if you just allow yourself downtime, but at one point the file gets big enough to where this matters.


Just use zfs snapshot to backup. Let go of the old ways of dump commands.


"SQLite is just that one file"

How do you get enough time to back up that file when it's in use? Lock the file until you've copied it? zfs?


They actually address this problem.

https://www.sqlite.org/backup.html

> The Online Backup API was created to address these concerns. The online backup API allows the contents of one database to be copied into another database file, replacing any original contents of the target database. The copy operation may be done incrementally, in which case the source database does not need to be locked for the duration of the copy, only for the brief periods of time when it is actually being read from. This allows other database users to continue without excessive delays while a backup of an online database is made.


From the user perspective, then, a SQLite backup doesn't seem materially less complicated than a postgres backup. I understood GP's comment to mean that it's easier to back up SQLite because it's just a file [which you can just copy].


It is just a file that you can copy when nobody is modifying data. In most simple cases you can just shut down the webserver for a minute to do database maintenance. Things that you already know how to do them. You don't have to remember any specific database commands for that. So the real difference is in the entry barrier.


I'm quite confident that if you can snapshot the vm/filesystem (eg VMware snapshot, or zfs snapshot) - you will have a working postgresql backup too...


BTRFS or ZFS sounds just fine to me; you could even create a small virtual disk for it on a host FS (less integrity, and one ought to snapshot the whole system anyway, but hypothetically you could do this just for atomic snapshots as long as the rest of the system is cattle and those snapshots are properly backed up / replicated)


BTRFS recommends disabling CoW for DB files as it destroys performance.


I recommend that all developers should at least once try to write a web app that uses plain files for data storage. I’ve done it, and I very quickly realized why databases exist, and not to take them for granted.


Exactly. I'd written my now popular web app (most popular Turkish social platform to date) as a Delphi CGI using text files as data store in 99 because I wanted to get it running ASAP, and I thought BDE would be quite cumbersome to get running on a remote hosting service. (Its original source that uses text files is at https://github.com/ssg/sozluk-cgi)

I immediately started to have problems as expected, and later migrated to an Access DB. It didn't support concurrent writes, but it was an improvement beyond comprehension. Even "orders of magnitude" doesn't cut it because you get many more luxuries than performance like data-type consistency, ACID compliance, relational integrity, SQL and whatnot.


What made my implementation disastrous turned out to be that there was no enforcement of string encodings, and users started to see all kinds of garbled text.


One of my best decisions was to start out ASCII-only. It helped me manage that mess longer. :)


> write a web app that uses plain files for data storage ... very quickly realized why databases exist, and not to take them for granted

You haven't lived until you've built an in-memory database with background threads persisting to JSON files. Oh, the nightmares.


> What am I missing?

Different requirements, expertise, and priorities. I don't like generic advice too much because there are so many different situations and priorities, but I've used files, SQLite and PostgreSQL according to the priorities:

- One time we needed to store some internal data from an application to compare certain things between batch launches. Yes, we could have translated form the application code to a database, but it was far easier to just dump the structure to a file in JSON, with the file uniquely named for each batch type. We didn't really need transactions, or ACID, or anything like that, and it worked well enough and, importantly, fast enough.

- Another time we had a library that provided some primitives for management of an application, and needed a database to manage instance information. We went for SQLite there, as it was far easier to setup, debug and develop for. Also, far easier to deploy, because for SQLite it's just "create a file when installing this library", while for PostgreSQL it would be far more complicated.

- Another situation, we needed a database to store flow data at high speed, and which needed to be queried by several monitoring/dashboarding tools. There the choice was PostgreSQL, because anything else wouldn't scale.

In other words, it really depends on the situation. Saying "use PostgreSQL always" is not going to really solve anything, you need to know what to apply to each situation.


Thanks for the reply. All the situations that you describe sound very reasonable. I definitely wasn't trying to say "use PostgreSQL always", I too have used files and SQLite in various situations. For instance, your second example is IMO a canonical example of where SQLite is the right tool and where you make use of its strengths. My comment was more directed at the situations where PostgreSQL seems to be the right tool for the job, but where people still recommend other things.


>In other words, it really depends on the situation. Saying "use PostgreSQL always" is not going to really solve anything, you need to know what to apply to each situation.

We're responding to this below. The requirement is a Web2 app with "millions in revenue".

>Because it requires no setup and has been used to scale typcial Web2 applications to millions in revenue on a single cheap VM.


> The requirement is a Web2 app with "millions in revenue".

Millions in revenue doesn't really say anything about the performance required.


>Millions in revenue doesn't really say anything about the performance required.

The assumption is that this is a Web2 company which usually means user-generated content. I assumed that there'd be high reads, decent amount of writes.

We're not trying to write a detailed spec here. Just going off a few details and make assumptions.


It's "kick-the-can-down-the-road" engineering. If the server can't hold the database in RAM anymore, they buy more RAM. If the disks are too slow, they buy faster/more disks. If the one server goes down, the site is just down for an hour or two (or a day or two) while they build a new server. It works until it doesn't, and most people are fine with things eventually not working.

This has only been practical within the last 6 years or so. Big servers were just too expensive. We started using cheap commodity low-resource PCs, which are cheaper to buy/scale/replace, but limits what your apps can do. Bare metal went down hard and there wasn't an API to quickly stand up a replacement. NAS/SAN was expensive. Network bandwidth was limited.

The cloud has made everything cheaper and easier and faster and bigger, changing what designs are feasible. You can spin up a new instance with a 6GB database in memory in a virtual machine with network-attached storage and a gigabit pipe for like $20/month. That's crazy.


It's easy to install k8s and curl a helm chart too; doesn't mean you should.

What you're missing is complexity and second order effects.

Making decisions about databases early results in needing to make a lot of secondary decisions about all sorts of things far earlier than you would if you just left it all in sqlite on one VM.

It's not for everyone. People have varying levels of experience and needs. I'll almost always setup a MySQL db but I'd be lying if I said it never resulted in navel gazing and premature db/application design I didn't need.


>Making decisions about databases early results in needing to make a lot of secondary decisions about all sorts of things far earlier than you would if you just left it all in sqlite on one VM.

Like what? What could be easier than going to AWS, click a few things to spin up a Postgres instance, click a few more things to scale automatically without downtime, create read replicas, have automatic backups, one-click restore?

I feel like it's the opposite. Trying to put SQLite on a VM via Kubernetes will likely have a lot of secondary decisions that will make "scaling to millions" far harder and more expensive.


> Like what? What could be easier than going to AWS, click a few things to spin up a Postgres instance, click a few more things to scale automatically without downtime, create read replicas, have automatic backups, one-click restore?

Not do that and not have to maintain that? Just a regular AWS instance with SQLite will be far easier to setup, with regular filesystem backup which is easier to manage and restore.

> Trying to put SQLite on a VM via Kubernetes will likely have a lot of secondary decisions that will make "scaling to millions" far harder and more expensive.

The issue is assuming that "scaling to millions" is a design goal from the start. For a lot of projects, the best-case-scenario does not require it. For another big set of them, they will never get to it. For the few that do, it's possible that the cost of maintaining the more complex solution while it isn't needed plus the cost of adapting over time (because, let's be honest, it's almost impossible to get the scaling solution right from the start) is more than the cost of just implementing the complex solution when it's actually needed.


>Not do that and not have to maintain that? Just a regular AWS instance with SQLite will be far easier to setup, with regular filesystem backup which is easier to manage and restore.

You'd have less maintenance using RDS than you would with an SQLite hosted on a VM.

>The issue is assuming that "scaling to millions" is a design goal from the start.

I was responding to the OP who said you can scale to "millions of revenue" on SQLite. This part is true. You could. But it'd be easier to do it on a managed Postgres instance.


> You'd have less maintenance using RDS than you would with an SQLite hosted on a VM.

Considering that the maintenance I need to do for SQLite databases is practically zero...

> I was responding to the OP who said you can scale to "millions of revenue" on SQLite.

Millions of revenue doesn't necessarily need a high performance database. You could have millions on a page that deals with less than one request per second.


>Considering that the maintenance I need to do for SQLite databases is practically zero...

I also do zero maintenance on AWS RDS. Didn't have to set it up either. Just a few clicks, copy and paste connection string, done. No VM to configure. No SSL to configure. No backup to configure.

Two clicks for a read only replica. A few more clicks and I can have multi-node failover. You'd want a failover if you have "millions in revenue" no? How do you plan to set that up with SQLite on a VM?


If you're at millions of revenue and haven't made the switch to a more robust setup I'd have questions.

The question isn't about what you do when you get there; the question is whether you get there, or whether you flounder about deciding between self-managed MySQL, Postgres, RDS, MongoDB, GCP, AWS or Azure.


>If you're at millions of revenue and haven't made the switch to a more robust setup I'd have questions.

Companies with more revenue than "millions" are still running AWS RDS.

https://aws.amazon.com/rds/customers/


I think you misread me or I didn't word it well enough.

I meant switch off SQLite by the time you're running a real business-critical application off your db.


That's a cop-out answer. You're making the claim that simply using the filesystem compared to going with Postgres somehow has less complexity and fewer (or lesser) second order effects, but you don't even indicate how come. So here's my question:

how come?


> simply using the filesystem compared to going with Postgres somehow has less complexity

Postgres is also using the file system, it’s just got some extra layers in-front like networking and authentication.

People often refer to those “extra layers” as complexity.

> how come?

Why is it that more code is more stuff that can go wrong? That’s just how it is.


You are also missing the point and in the same way.

You are at A, you want to go to C. We are debating which B to pick between:

1. a filesystem + implementing library over that to cover the interface between A and C.

2. Postgres (which is the same as above - just not invented here)

Now, both of you are imagining a world where #2 somehow implies more complexity and also more second order effects than #1 would.

Nobody uses all of Postgres, but the cost of not using more than 1% of what's available is hardly larger than implementing that 1% yourself for most use cases. And please don't make the mistake of thinking that I'm trying to sell Postgres over any other database product - this is all to do with the worse-is-better idea that so many of you cultivate.


> We are debating which B to pick between:

No we are not.

Postgres isn't a webserver that can handle 1m http GET requests per second; it doesn't distribute data from 30 event collectors to a reporting database without configuration. *Other stuff needs to be written to satisfy the business.* That's life.

> both of you are imagining a world where #2 somehow implies more complexity and also more second order effects than #1 would.

Most studies into project resolution agree that there's zero difference (from a pass/fail perspective) between writing it yourself and trying to reuse third-party applications. Here's one from some quick DDGing: https://standishgroup.com/sample_research_files/CHAOSReport2... - see the resolution of all software products surveyed (page 6).

That means that, if you can, it is still cheaper to build new software and test it than to try to use Postgres and scale it: once software is written, its cost is (effectively) zero, whereas the cost of supporting Postgres is a body -- one I might only be able to use 1% of, one I might need two of so somebody can go on holiday sometime, and one with a real salary that we need to pay.

Now maybe you're working for a company whose software never gets finished. In that case, by all means use Postgres in your application, since you can share the churn on that part with every other person using Postgres in the world, but I'm going kayaking today.


Choosing a DB is not premature optimization; rather, it eliminates a whole host of data-related problems and smooths development.


> Choosing a DB is not premature optimization

That's your opinion.

> it eliminates a whole host of data related problems and smoothens development

I don't find this to be true, but maybe you have different problems than I do.


I built a gallery web app so I could tag images, for my own use.

At first I was writing to a large JSON file. Each time I made a change, I would write the file and close it.

It worked pretty well. I now use sqlite+dataset and the port was trivial.

I guess there are better ways to do it than open(), but using dataset is quite simple.

Maybe creating folders and duplicating data works, but it seems more complex than using sqlite and dataset.


You aren't missing anything.

Cloud SQL postgres on GCP with <50 connections is like ~$8/month and has automated daily backups.

The differences in SQL syntax between SQLite and Postgres, namely around date math (last_ping_at < NOW() - INTERVAL '10 minutes'), make SQLite a non-starter imho... you're going to end up having to rewrite a lot of SQL queries.
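
For example, the same "stale ping" check has to be phrased differently per engine (a rough sketch; the hosts table is made up, and SQLite is assumed to store the timestamp as ISO-8601 text):

  -- Postgres
  SELECT * FROM hosts WHERE last_ping_at < NOW() - INTERVAL '10 minutes';

  -- SQLite has no INTERVAL type; date math goes through datetime()/strftime()
  SELECT * FROM hosts WHERE last_ping_at < datetime('now', '-10 minutes');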


> On top of that every dev knows some SQL.

Where "some" may include "none" ;)


> What am I missing?

only that hn is full, FULL of trolls who give the absolute worst advice possible. it's a sport maybe or it's entirely innocent and related to the Dunning Kruger effect but nothing brings it out more than database threads.


You can get a managed Postgres instance for $15/month these days. Likely cheaper, more secure, faster, better than an SQLite db hosted manually on a VM.

>Because it requires no setup and has been used to scale typical Web2 applications to millions in revenue on a single cheap VM.

Plenty of setup. How would you secure your VM? SSL configuration? SSL expiration management? Security? How would you scale your VM to millions without downtime? Backups? Restoring backups?

All these problems are solved with managed database hosting.


I recently hopped on the SQLite train myself - I think it's a great default. If your business can possibly use sqlite, you get so much performance for free - performance that translates to a better user experience without the need to build a complicated caching layer.

The argument boils down to how the relative performance costs of databases have changed in the 30 years since MySQL/Postgres were designed - and the surprising observation that a read from a fast SSD is usually faster than a network call to memcached.

[1] https://fly.io/blog/all-in-on-sqlite-litestream/ [2] https://www.usenix.org/publications/loginonline/jurassic-clo...


my brother in Christ, SQLite is not a RDBMS. it runs on the same machine as your application. the scary, overwhelmingly complex problems you've enumerated should indeed be left to competent professionals to deal with, but they do not affect SQLite.


> it runs on the same machine as your application.

Better, it runs in the same process.


> the scary, overwhelmingly complex problems you've enumerated should indeed be left to competent professionals to deal with

You also aren’t getting that for $15/mo, either.


https://www.digitalocean.com/pricing/managed-databases

$15/month for Postgres managed hosting. I would consider DO as competent professionals.


And are they going to deal with that scary stuff for me for $15/mo? Or are they going to turn it off and on and tell me I need a bigger boat if it’s slow?


What scary stuff are you dealing with?


I love SQLite and use it whenever I can, but if you’re building the next Reddit, you obviously can’t live with the lack of concurrent writes. HN is fine as the write volume is really low, just a couple posts per minute, plus maybe ten times as many votes.


> couple posts per minute

There are hundreds of posts that people are reading at every moment. I don't think it's that low.


You don't have to guess. Just check https://news.ycombinator.com/new and https://news.ycombinator.com/newcomments for everything that's posted here.

Or take a quantitative approach. HN item IDs are sequential. Using the API:

  $ curl 'https://hacker-news.firebaseio.com/v0/maxitem.json'
  32409711
  $ curl -s "https://hacker-news.firebaseio.com/v0/item/32409711.json" | jq .time
  1660125598
  $ curl -s "https://hacker-news.firebaseio.com/v0/item/$(( 32409711 - 14400 )).json" | jq .time
  1660031940
  
so, 14400 items in ~26 hours, fewer than 10 items per minute on average. At the peak you'll see somewhat more than that.


What about one sqlite db per subreddit?


Where would you store the user data? Duplicate it across millions of SQLite databases?


Using files sounds insane. 100% chance you end up with a nasty data race.


Why sqlite and not postgres? every sqlite tutorial or documentation or anything at all that I've seen says not to use this tool for a large database


https://remoteok.com and https://nomadlist.com both use SQLite as the database. 100+ million requests a month and there's been no problems.


For what definition of "large"? SQLite surely has no problem holding gigabytes of data or millions of rows. In many cases it's much faster than most other kinds of databases. So, size is not the problem. Throughput on one machine with one active process on the same VM is not the problem. Having multiple processes on one VM in most cases is no problem.

But if you spread your compute across many machines that access a single database, it can get iffy. If you need the database accessible over the network, there are server extensions which use SQLite as the storage format, but then you're dealing with a server and could probably use any other database without much difference in deployment & maintenance.


This all sounds good until you consider high-availability, which IMO is absolutely essential for any serious business. How do you handle fail-overs when that cheap VM goes down? How do you handle regional replication?

You could cobble something together with rsync, etc, but then you have added a bunch of complexity and built a custom and unproven system. Another option is to use one of the SQLite extensions popping up recently, like Dqlite, but again, this approach is relatively unproven for basing your entire business on.

Or you could simply use an off the shelf DBMS like Postgres or even MySQL. They already solve all of these problems and are as battle-tested as can be.


> high-availability, which IMO is absolutely essential for any serious business

Depends very much on the business. You can have downtime-free deploys on a single node, and as long as you've set up a system to automatically replace a faulty node with a new one (which typically takes < 10min) then a lot of businesses can live with that just fine. It's not like that cheap VM goes down every day, but just in case you can usually schedule a new instance every week or so to reduce the chance of that happening.

> How do you handle regional replication?

For backup purposes, you'd use litestream. Very easy to use with something like systemd or as a docker sidecar.
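
A rough sketch of what that looks like (the paths and bucket name here are made up, not from the comment): Litestream tails the SQLite WAL and continuously replicates it to object storage with one long-running command, which is what you'd wrap in a systemd unit or sidecar:

  $ litestream replicate /var/lib/myapp/app.db s3://my-backup-bucket/app-db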

For performance purposes, if you do need that performance you'd obviously use something else. Depending on the type of service you have, though, you can get very far with a CDN in front of your server.

> Or you could simply use an off the shelf DBMS like Postgres or even MySQL.

If you need it, sure.


TBH, I've used Hypersonic SQL before. People thought it was crazy, but there was no concurrency, and backups were just copying a couple of files. Fixing them if they failed was easy too.

People get too caught up in assumptions without knowing the use case. There are a million use cases where a tool like SQLite would be bad, but also a million where it's likely the easiest and best tool.

Wisdom is knowing the difference.


> These days, for a new Web2 type startup, I would use SQLite.

There's a whole page in the docs explaining why this might be a bad idea for a write-heavy site:

https://www.sqlite.org/howtocorrupt.html

Section 2 is especially interesting. SQLite locks the whole database file - usually one file per database - and that is a major bottleneck.

> Another option worth taking a look at is to use no DB at all.

We tried that in a project I'm currently in as a means of shipping an MVP as soon as possible.

Major headache in terms of performance down the road. We still serve some effectively static stuff as files, but they're cached in memory, otherwise they would slow the whole thing down by an order of magnitude.


If you're doing .Net Core you could also take a look at LiteDB. Use SQL queries in a local (optionally encrypted) file using a single embedded nuget package. Also creates schemas on the fly when you first persist collection types as per traditional NoSQL systems.

I'm using it to create an embeddable out of the box passwordless auth system that has no external dependencies. Works great.

Admittedly I've not tried it under extreme load yet, but I'm happy to migrate it onwards if something I create goes massive.

(Edit: to be clear it is NoSQL, but you can query it using almost standard SQL which is what I do).


Sure, and I would recommend they use C to code the backend; they should not use a CDN either, to save some bucks. And on the front end there is no sense in using JavaScript, as everything can be done in the backend.


Exactly! No separate machine required just for the db. You can also back up the SQLite db asynchronously.


Completely disagree with the suggestion to just write files. The file system APIs are incredibly easy to misuse. A database with good transaction semantics and a schema is worth so much.


Facebook was very similar, I believe. The MySQL schema boiled down to two tables: nodes with an id, a type, and key/value properties; and edges (known internally as assocs) between pairs of ids, also with key/value properties and a type.

A simple schema helps when you just need to write stuff. You can make simple queries against a simple schema, directly.

For the juicier queries what you do is you derive new tables from the data using ETL or map/reduce. The complexity of the underlying schema doesn’t have to reflect the complexity of your more complicated queries. It only needs to be complex enough to let you store data in its base form and make simple queries. Everything else can be derived on top of that, behind the scenes.

Example: nodes are people and edges represent friendships, and then from this you could derive a new table of best_friend(id, bf_id, since).
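
A minimal sketch of that kind of derivation in SQL - the assoc table name, its columns, and the "highest weight wins" criterion are assumptions for illustration, not Facebook's actual schema:

  -- derive best_friend(id, bf_id, since) from a generic edge table
  CREATE TABLE best_friend AS
  SELECT id1 AS id, id2 AS bf_id, created_at AS since
  FROM (
    SELECT id1, id2, created_at,
           ROW_NUMBER() OVER (PARTITION BY id1 ORDER BY weight DESC) AS rn
    FROM assoc
    WHERE type = 'friendship'
  ) ranked
  WHERE rn = 1;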


After 15+ years of being an SRE in finance across firms ranging in size from 2 people to 350K people, I've come to the following conclusions:

1. There are no solutions, only tradeoffs.

Sure. You win in the OP's example with how easy it is to add things, but you lose on the relational aspects.

You could say the same thing in reverse: using an RDBMS makes it much easier to do join-based lookups, but then it's harder to update.

2. At the end of the day, it doesn't matter b/c some other layer will do what your base layer can't

I've seen some really atrocious approaches to storing data in systems that were highly regarded by the end users. How did this happen? Someone went in and wrote an abstraction layer that combined everything needed into an easy to query/access interface. I've seen this happen so many times that whenever people start trying to future proof something I mention this point. You will get the design wrong and/or things will change. That means you either change the base layer or you modify/create an abstraction layer.


Re #1: I agree. But with a pedantic semantic distinction:

There are a lot of objectively wrong ways to do something. But among all the right ways, there are trade offs.


But even if you do something objectively wrong, it can still work out anyway. Look at every Perl script written by non-programmers. Horrifying mess of bullshit, and you try to read the code and think this shouldn't even be parseable, but somehow the script works.

There is no right or wrong, only working and not-working.


If someone has written their own bubble sort to sort a list of numbers and you replace the custom bubble sort with a built-in sort - there really aren't tradeoffs to consider. Your method will be faster, easier to understand, more robust. Their method works, but is strictly inferior.


Actually, speaking as a mediocre programmer who got in to management and now only writes code for fun side projects, there are lots of times when I'd prefer a less-performant function that I understand to a more-performant-but-opaque builtin. What if the sort only handles 100 items but I need to be able to change its internal behavior? I'm a lot better off writing my own slow sort than trying to modify one written by a 10x Rockstar Developer whose code I barely understand.


For the same result, within the same accepted parameters, the 'inferior' method is really identical, with no tradeoffs. By measuring the result rather than the design, you see the true value rather than the theoretical value.

If you only need to move a 60lbs bag of sod across the yard, carrying it yourself, using a wheelbarrow, and using an F-150 Lightning all have exactly the same result and none is inferior. They can all be done in the same time by the same person with no real difference. It's only when the parameters change, like moving 400 bags of sod, that the first two require tradeoffs.


>For the same result

You're assuming that the "result" data structure is only a single boolean, is_task_done. That's not how most things work; GP's bubble sort will pretty damn well show up in locked-up UIs and injured-dog-slow bottlenecks, and that's as observable as the Sun at 9 AM. That would happen pretty fast too - don't delude yourself with "Move Fast And Fix Later" when it comes to quadratic algorithms - 1000 squared is a million, and at 10000 it is a hundred million.

>carrying it yourself, using a wheelbarrow, and using an F-150 Lightning all have exactly the same result and none is inferior

Of course not; the method used would show up in your health and exertion data, your fuel costs, the amount of noise and heat you emit while doing the work, etc... Every single way of observing a boolean is_bag_moved result can also be used to observe a metric measuring those things, and thus determine the method you used.

You're trying to defend an extremely strong claim: if algorithms A and B give the same output in memory, NOTHING else matters. Here's a concrete example; consider the following sequence of algorithms:

  Algorithm 0 : Print("Hello")
  Algorithm 1 : do {Nothing} then Print("Hello")
  Algorithm 2 : 2.times do {Nothing} then Print("Hello")
  ...
  Algorithm N : N.times do {Nothing} then Print("Hello")

This is an infinite sequence of algorithms that all have exactly the same behaviour in a time-and-space-agnostic manner. Would you say every algorithm in this sequence solves the problem of printing "Hello" exactly as well as every other algorithm? If yes... interesting. If no, then your original claim is false as-is; it needs qualifying with what "observable behaviour" means. As this example shows, that's a difficult notion to define exactly; a naive definition of "everything that can be observed in computer memory or on screen" would leave you open to ridiculous conclusions.


This is a thread about reddit and other scaled databases; it's given that the discussion is in 400+ sod bag territory...


When I was a Perl programmer, in our shop we amended the motto "There's more than one way to do it" to "There's more than one way to do it, but most of those ways are wrong."

The Perl community themselves eventually extended the motto to "There's more than one way to do it, but sometimes consistency is not a bad thing either"


You nailed it on the head (and people wonder why there are so many 'bad' laws passed).


In 2004, I started a community college program and had a professor teaching intro to databases. He was retired from industry, had lived through the rise of RDBMSs, and had made his money in that. Teaching was just his hobby. He liked to have fun with the class (120+ of us) by having us do call-and-response things as a group.

Prof: "Why do we normalize?"

Class, in unison: "To make better relations."

Prof: "Why do we denormalize?"

Class, in unison: "Performance."

It took a lot of years of work before the simplicity of the lesson stuck with me. Normalization is good. Performance is good. You trade them off as needed. Reddit wants performance, so they have no strong relations. Your application may not need that, so don't prematurely optimize.


"Performance is good" is I think something everyone could agree with just hearing it for the first time. There's no reason to prefer worse performance if you can have better performance for no additional cost. Is the same true for normalization? Why is just the state of being more normalized inherently better?


Denormalized means it has multiple copies of the same pieces of data. The application is responsible for keeping those copies in sync with each other. This is a messy process that is easy to get wrong now and then. Then your app starts saying things that are absurd. No one cares much if Reddit does that every now and then.


Some levels of normalisation don't allow nulls. This is absurd. Normalisation alone is a theoretical concept which should not be blindly pursued in practice, even without performance requirements.


because a normalized database repeats fewer things and is much easier to understand than a messy one.

oh, and also, the entire point is that neither is "inherently better"... both have pros and cons


I think my point was "performance" in the sense of faster response time is inherently good. All things being equal, if you have a more performant option you'll choose that. The only time you accept worse performance is in pursuit of some other metric.

On the flip side, "normalization" doesn't have that same inherent good-ness. All things being equal (including performance) there isn't any inherent drive toward more normalization, maybe because a faster performing page will have clear impact to the user while a more normalized data structure would be completely transparent to the user?


sure, from the perspective of the final user it doesn't matter much at all.

with my point about normalization being easier to understand I meant to say that it is easier for a developer learning about the schema.

if I recall correctly, I think there's also something about data de-duplication in a normalized database. Without normalization, you could get a lot of repetition, which would actually require you to update the same data in multiple places or find some other way to deal with conflicting data about the same entity (some sort of de-synchronization is more likely without normalized schemas).

in any case, I'll grant you that normalization should never be 'the' goal for a database.


Normalized schemas are less prone to certain kinds of hard-to-solve issues.

Example: When you have the same data in two places, which is authoritative? If they disagree, which one is right?

And querying data reliably is much easier when you've got good relations that can be relied on to be accurate. SQL is almost a masterpiece (though imho it should have been FROM, WHERE, SELECT).


> don't prematurely optimize

Premature optimization is the root of all evil


Careful with repeating mantras without reflection.

You may end up using them as arguments when they're just catchphrases that require context.

See people who used "goto considered harmful" as an absolute to never use goto (C).


“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”


How dare you bring full quotes and context to a discussion :)

Seriously true, though. What's usually missed is the first bit, about "small efficiencies", combined with the last bit, the "critical 3%". He's basically saying don't spend time in the mass of weeds until you know where the problems really are, but conversely take the time to optimise the smaller areas that are performance-critical.

Almost the opposite of the carte-blanche overly-minimal MVPs often thrown together by those of us who forget what the V stands for.


So is premature pessimization.


Tuple table performance, though, tends to degrade quickly when querying separate rows for every little attribute.

More _memory_ and row caching in RAM have allowed for a resurgence in denormalized schemas.

FB's underlying schema is similarly just a few tables. But the entire dataset is memcached.
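
To make the cost of per-attribute rows concrete, here's a hedged sketch (table and column names are illustrative, not anyone's real schema): rebuilding a single entity from a key/value table takes one self-join per attribute, each with its own index lookup:

  SELECT t.id,
         d_name.value  AS name,
         d_email.value AS email,
         d_karma.value AS karma
  FROM thing t
  JOIN data d_name  ON d_name.thing_id  = t.id AND d_name.key  = 'name'
  JOIN data d_email ON d_email.thing_id = t.id AND d_email.key = 'email'
  JOIN data d_karma ON d_karma.thing_id = t.id AND d_karma.key = 'karma'
  WHERE t.type = 'user';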


> Tuple table performance, though, tends to degrade quickly when querying separate rows for every little attribute.

Interesting. Do you know what causes this steep decline in performance?

A larger sequential read is typically not that much slower regardless of what sort of drive you are using. I'd have expected the random access pattern of a normalized table to lose out.



Thanks! Macroexpanded:

Reddit's database has only two tables - https://news.ycombinator.com/item?id=4468265 - Sept 2012 (79 comments)


We’re doing something similar at Codeball.

A single Aurora RDS database (with read replicas), with a single table called “blobs”. The blobs table has three columns: id, type, and json.

It’s scaling really really well, to many many millions of rows, and high traffic volumes (many thousands of writes per second).


Why even use RDS? There is nothing relational about that schema. Just use DynamoDB. It'll be cheaper, you can actually query the JSON, and if you decide to flesh out the documents and go all in on NoSQL then you can do that easily.


It's PostgreSQL, so we can and do query the JSON data, and update deeply nested fields in the JSON natively as well (no need to take a lock, read the entire blob, modify it in the application, save the data, and release the lock).

PostgreSQL supports setting values, appending to arrays, etc etc natively in an ACID compliant way.
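
For instance, something along these lines (a sketch assuming the blobs(id, type, json) table above with json as a jsonb column; the paths and values are made up):

  -- set a nested field in place
  UPDATE blobs
  SET json = jsonb_set(json, '{settings,theme}', '"dark"')
  WHERE id = 42;

  -- append an element to a nested array
  UPDATE blobs
  SET json = jsonb_set(json, '{tags}', (json->'tags') || '"reviewed"'::jsonb)
  WHERE id = 42;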


> so we can and do query the JSON data

Also supported in DynamoDB - it's a JSON-database after all.

> update deepely nested fields in the JSON natively as well

https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

> PostgreSQL supports setting values, appending to arrays, etc etc natively in an ACID compliant way.

https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-transac...

I'm not saying PostgreSQL is a bad choice by any means, or that DynamoDB is suitable for all projects. But good support for JSON is in no way a capability unique to PostgreSQL. At least ideally, a competently built document database should be at least as good as, if not better than, an SQL database emulating a document database.


DynamoDB doesn't let you store JSON natively. You have to translate that JSON into DynamoDB keys and attributes.


> Just use DynamoDB

Or, at that point, just use a CSV and slurp the whole thing into memory.


Here's where I've never been smart enough to use this model... explain to me how you deal with scenarios like these...

1) There's an incident and we are trying to replicate the issue, but our database is just shapeless blobs and we can't track down the data that is pertinent to resolving the issue.

2) Product management wants to better understand our users and what they are doing so that they can design a new feature. But they can't figure out how to query our database or how to make sense of the unstructured data.

3) QA wants to be able to build tests and set up test data sets that match those tests, but rather than an SQL script, they've got to roll all this JSON that doesn't have any constraints.

These are just some examples, but maybe you get the gist of why I don't quite understand how this model scales "outside" the database.


1) I think the biggest difference in how you use KV/NoSQL databases is that your objects tend to be much bigger than a "normal" SQL row. Each object can be a parent entity plus many sub-objects and sub-arrays of information. In a relational model, that data would have to go in other tables that you'd have to join, etc. I believe this model makes things much easier for us, as everything you need to debug a single failure is usually stored in a single object instead of spread out across the entire database.

2) Very true. We're using other tools (like PostHog) for collecting and analysing user behaviour. At a bigger scale, we'd probably ship off event data to something like BigQuery for analysis.

3) We don't have QA, but for testing, objects are created in code just like any other object, and unit tests (aka most of the tests), don't need the database at all. :-)


Why not just use DynamoDB at that point? It would likely be cheaper.


Yea, I agree. Seems like it should have been a NoSQL solution from the start.

Even today, it should be pretty straightforward to port the data over to a NoSQL db.


DynamoDB can be very painful with certain usage patterns or if you need to change/add usage patterns. For startups or MVP apps/features, it's a lot easier and cheaper to write ad-hoc SQL.


Sorry but I do not believe in this magic.

How many queries, in reality, do you need to fetch all the "page" data?


One! PostgreSQL has great support for indexing JSON data.
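
As a sketch (again assuming a blobs table with a jsonb column named json; the key is made up):

  -- GIN index for containment queries, e.g. WHERE json @> '{"owner_id": "123"}'
  CREATE INDEX blobs_json_gin ON blobs USING GIN (json);

  -- expression index for equality lookups on one hot key
  CREATE INDEX blobs_owner_idx ON blobs ((json->>'owner_id'));
  SELECT * FROM blobs WHERE json->>'owner_id' = '123';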


Looks like you built a Hello World website? Sorry if I am wrong.


Sounds like an entity-attribute-value type schema, but where the value is nested data.


This reminds me of when the /r/counting subreddit broke reddit:

https://www.reddit.com/r/counting/comments/ww3vr/i_am_the_be...

Their data model was not designed for very deep comment chains.


I wonder why they need to compute the complete comment tree to begin with


How has this evolved to today? Any growing pains or learning lessons? What stayed good about this as the team and product scaled?


This design is a fault of traditional RDBMS products that have strictly imperative and synchronous DDL commands.

The physical schema ought not be a transactional change, but it is in most products, with locks and everything. This isn't necessary, it's just an accident of history that nobody has gotten around to changing.

For example:

Rearranging the column order (or inserting a column) should just be an instant metadata update. Instead the entire table is rewritten... synchronously.

Adding NOT NULL columns similarly requires the value to be populated for every row at the point in time when the DDL command is run. Instead, the default function should be added to the metadata, and then the change should roll out incrementally over time.

Etc, etc...

Whatever it is that programmers do with simple "objectid-columnname-value" schemas to make changes "instant" is what RDBMS platforms could also do, but don't.


> Adding NOT NULL columns similarly requires the value to be populated for every row at the point in time when the DDL command is run. Instead, the default function should be added to the metadata, and then the change should roll out incrementally over time.

You mean just like MySQL, Oracle, Sybase, and MSSQL do?


"Amazon DynamoDB Deep Dive" https://www.youtube.com/watch?app=desktop&v=HaEPXoXVf2k&list...

why and how of the single table design approach


That's exactly what I thought of when reading the title.


I used to work at a company that had dozens of SQL tables for similar data - if there was a selection box in their app there was a table to represent it. I had a very productive meeting with the young (CSLA-trained) programmers on my way out, saying that we could have a single table for all lookup values. The schema went from a giant, whiteboard-covering diagram to a neat 8.5x11 single sheet diagram, resulting in a lot of reused code and data. Maybe there's more than Things and Data in real life, but there is a pretty good medium between "new table for each thing" and "each thing is just another thing row"
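
A rough sketch of that consolidation, with invented names - one table replaces the per-selection-box tables:

  CREATE TABLE lookup_value (
    id       INT PRIMARY KEY,
    category VARCHAR(50)  NOT NULL,  -- e.g. 'order_status', 'country', 'unit'
    code     VARCHAR(50)  NOT NULL,
    label    VARCHAR(200) NOT NULL,
    UNIQUE (category, code)
  );

  -- every dropdown in the app becomes one parameterized query
  SELECT code, label FROM lookup_value WHERE category = 'order_status';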


Yes, every production system could be built using two databases: one for metadata, one for dynamic data. Or two tables: one for metadata, one for dynamic data. What's strange here?


> What's strange here ?

strange, or completely atypical for 99% of web apps?

Not sure what web apps you've worked on, but the rest of the world doesn't default to anything like this. Even if it does make sense for the use case, even Reddit only evolved to this much later.


Furthermore, you can have several metadata tables that represent different types of graphs of the data tables, using all the relational features.

For example, one can implement role-based ACLs on a tree structure in metadata tables, with one data table. Then SQL can be used to get permissions, and even join on the data table, if that works for you.


Thank you so much for sharing this; I was exhausted from trying to make my data schemas perfect for the long term. I will use this technique for my next projects for sure. Thanks again.


How does this really work if a developer wants to pull the most recent discussions posted in X category, with the most replies, when I only have two tables?


something like this?

  SELECT TOP 50 *
  FROM [things] [post]
  JOIN [things] [category] ON [category].[id] = [post].[parent_id]
  WHERE [post].[type] = 'post'
    AND [category].[type] = 'category'
    AND [category].[slug] = 'animal-pictures'
  ORDER BY ([post].[child_count] / DATEDIFF(hh, [post].[created_date], GETUTCDATE())) DESC;


Makes sense. Rather than have an ever-expanding user table with multiple columns, save attributes as rows. Example: the thing table holds users 1, 2, 3; the data table holds (user 1, attribute "name", value "mike") and (user 1, attribute "address", value "Chicago").
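
A minimal sketch of that shape (column names assumed for illustration, not Reddit's actual DDL; there's deliberately no foreign key, matching the pattern described in the article):

  CREATE TABLE thing (
    id   BIGINT PRIMARY KEY,
    type VARCHAR(50)          -- 'user', 'link', 'comment', ...
  );

  CREATE TABLE data (
    thing_id BIGINT,          -- no FK constraint in this pattern
    key      VARCHAR(100),
    value    TEXT
  );

  -- user 1's attributes live as rows, not columns
  INSERT INTO data (thing_id, key, value) VALUES
    (1, 'name', 'mike'),
    (1, 'address', 'Chicago');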


The Thing model is almost just Third Normal Form, but where table is part of the key.


Inside of you are two tables. One is for things and the other is for data.


One is evil and one is good. Which will win? The one you feed.


Instructions unclear: I now have a million things and no data.


I've often seen a pattern where the database schema devolves into a "classifier" system, although perhaps a combination of EAV (entity attribute value) and OTLT (one true lookup table) would be a more apt descriptor.

Basically, you have tables a bit like these:

  - classifier_groups (e.g. product_classifiers, basket_classifiers)
  - classifiers (e.g. product_types, discount_types, delivery_methods, payment_methods)
  - classifier_values (e.g. regular_product, bid_product, timed_discount, loyalty_discount, postal_delivery, courier_delivery, credit_card, paypal)
  - classifier_value_attributes (e.g. product_name, product_weight, product_size, bid_start_date, bid_start_value, discount_start_date, discount_percentage, ...)
  - classifier_value_attribute_values (e.g. "Some product name", "0.25 kg", "2022.08.10 09:00", 5 EUR)

Though sometimes the approach folds in on itself and the attributes are classifiers/classifier_values themselves, or JSON is used, or more tables in between are used when you need to track where a particular usage is needed (e.g. when you define a template and then have an item that is based on the template but with changed data).

So the end result is that instead of 200 tables you have just a few, but the problem is that you can't really figure out what is happening from foreign keys alone (especially when you try to go the other way - referring from these classifiers to another table - since you clearly can't have a foreign key there and instead do the EAV thing, with columns like "link_type"/"link_table_name" and "referenced_id"). Oftentimes you can't really use the DB on its own, because the enums and other data you need in order to write queries are actually stored in the application code.
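
The "soft reference" bit looks roughly like this (a sketch with invented names), which is why the database can't enforce it with a real foreign key:

  CREATE TABLE classifier_value_attribute_values (
    id              BIGINT PRIMARY KEY,
    attribute_id    BIGINT,        -- points at classifier_value_attributes
    link_table_name VARCHAR(100),  -- e.g. 'products': which table is referenced
    referenced_id   BIGINT,        -- row id in that table; nothing enforces it
    value           TEXT
  );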

Regardless of what you call it, opinions on the approach, or at least its components, seem split, even back in the day:

1) https://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-bi...

2) (PDF) https://novicksoftware.com/wp-content/uploads/2016/09/Entity...

3) https://softwareengineering.stackexchange.com/questions/9312...

I've seen some people swear by it, though personally I really dislike it: any DB schema visualization tool just turns up a black box, and a dozen JOINs across these classifier tables will always be less clear than just referencing tables like products, bid_products, timed_discounts, loyalty_discounts, etc. So personally, I'd much prefer the 200 tables with foreign keys between them wherever necessary, with really dumb model bindings in the app, instead of a bunch of enums, complex service logic, and inevitable orphaned data. And yet, somehow people fear the idea of hundreds of tables but would prefer to shove this functionality into bunches of data instead.

(disclaimer: domain is made up so perhaps not the best example, also a proper term for such a pattern escapes my mind at the moment)


> There are no joins in the database and you must manually enforce consistency

In other words, you've manually developed a relational database on top of a relational database that you're using as a file system.


It's a relational database management system on a database management system. An RDBMSDBMS.


Indeed, I do wonder why the industry still defaults to SQL/relational instead of mature NoSQL options (there ain't just Mongo).


Because schemas let you agree on the shape of the data without having to pull in applications to query things for reporting, analysis, etc.

Additionally, forcing things into a correct schema also protects you from a lot of data inconsistencies out of the box.

Finally, the DBMS is going to be far better at JOINs than your application will be. Nearly every web application needs JOINs at some point. Developers just don't recognize that that's what they are doing, so they don't realize what they are missing out on by doing it at the application layer.


"At some point", but how much? I believe it is up to modeling skills

Anyway - Schema slows development as hell, you need to maintain mapper layer between domain models and db models unless you model your system with anemic model mixed with tech stuff(db), but it is mess

Additionally you deal with either stringly typed sql

Or ORMs that require months of practice to be proficient even in c# which has LINQ


Besides all the benefits of relational DBs, I'll give you my main reason for not starting with a NoSQL database:

Because eventually, you'd want to analyze that data and it's a huge pain to do it with NoSQL. Imagine if you hired a business analyst and she needs to look at the data. Are you going to expect her to learn Javascript to query the data? Are you going to write an ETL pipeline to move data from a NoSQL db to an SQL db?

Note: I know many NoSQL DBs now support SQL queries.


Personally it's a lack of trust.

I've set up databases, I've broken databases, I've restored databases, I've scaled databases, and I've migrated them. Never lost so much as a nibble of data.

My introduction to NoSQL was an article about Mongo losing someone's production data. Not even all of the data - partial data loss, the worst kind of data loss.

I'm just gonna head off that massive headache at day 0 and use an ACID compliant database.

From there it's lack of experience. Same reason I wouldn't decide to write my next thing in a new language unless I was specifically learning that language.


MongoDB supports a variety of ACID transactions in the current day and age:

https://www.mongodb.com/basics/acid-transactions


But as I previously said, NoSQL is not limited to Mongo.

People hate NoSQL because of Mongo when there are other dbs, e.g. RavenDB.

>I'm just gonna head off that massive headache at day 0 and use an ACID compliant database.

It is just ACID vs BASE


I don't hate nosql. I nothing it. I do not have any knowledge or skillset that includes it. I am most definitely naive towards any strengths and capabilities it may have over an SQL database.

I also have no reason to learn it at this time; an SQL database wins uncontested whenever I need to store data.

If I was to start a new project I wouldn't pick that moment to learn a new programming language. Same thing, different part of the stack

Honestly, what it'd take would be the next WordPress or whatever using NoSQL by default. Make it something that needs learning.


Do you have a reference to that article?


Honestly we're talking at least a decade ago or so at this point, sorry.

It was a first impressions thing that stopped any further investigation, not a comment on the current state of NoSQL


Imma guess

Jepsen


Because a relational database give you schemas, constraints and a query language out of the box.



