
I have been working as a database consultant for a few years. I am, of course, in my bubble, but there are a few things I really don't enjoy reading.

> No single data model can support all use cases. This is a major reason why so many different databases exist with differing data models. So it’s common for companies to use multiple databases in order to handle their varying use cases.

I hate that this is a common way of communicating this nowadays. Relational has been the mother of all data models for decades. In my opinion, you need a good reason to use something different. And this is also not an XOR. In the relational world, you can do K/V tables, store and query documents, and use graph functions for some DBs. And relational has so many safety tools to enforce data quality (e.g. ref. integrity, constraints, transactions, and unique keys). Data quality is always important in the long run.
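To make the "relational can do K/V, documents, and graphs" point concrete, here is a minimal sketch of my own (not from the parent or the article), using Python's bundled sqlite3. The json_* functions assume a SQLite build with JSON support, which recent versions include by default; Postgres offers richer equivalents (JSONB, recursive CTEs):

    import sqlite3

    db = sqlite3.connect(":memory:")

    # K/V table, with uniqueness enforced by the engine.
    db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")
    db.execute("INSERT INTO kv VALUES ('feature_flag', 'on')")

    # Documents: JSON text in a column, validity enforced by a CHECK constraint.
    db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT CHECK (json_valid(body)))")
    db.execute("""INSERT INTO docs (body) VALUES ('{"user": "ann", "tags": ["db"]}')""")
    print(db.execute("SELECT json_extract(body, '$.user') FROM docs").fetchone())  # ('ann',)

    # Graph traversal with a recursive CTE over a plain edge table.
    db.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    db.executemany("INSERT INTO edges VALUES (?, ?)", [("a", "b"), ("b", "c")])
    print(db.execute("""
        WITH RECURSIVE r(node) AS (
            SELECT 'a'
            UNION
            SELECT e.dst FROM edges e JOIN r ON e.src = r.node
        )
        SELECT node FROM r
    """).fetchall())  # [('a',), ('b',), ('c',)]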

> Every programmer using relational databases eventually runs into the normalization versus denormalization problem. [...] Oftentimes, that extra work is so much you’re forced to denormalize the database to improve performance.

I was never forced to denormalize anything. Almost always, poorly written SQL queries are the real problem. I guess this can be true for web hyperscalers, but those are the exceptions.



Completely agree. I remember when Mongo / NoSQL was peak hype cycle, and every new project "needed" to use it despite being a big step down in terms of features, ability and maturity. A few years later I ended up on a system with Mongo as the database, started when Mongo was at peak hype. It was every bit as bad as I expected.

I have never seen a Mongo-based system that ran on more than a single server, even though multi-server deployment is the one place where it actually would have had an advantage.


I was of roughly the same opinion, especially after getting familiar with relational theory. Despite all the horrors of SQL-the-language, the backing model is close to being universal for data storage and manipulation.

Having said that, we did encounter one good use case recently: storing tree-like documents complete with their change history. Mongo-like DBs are really good for this. While possible in relational DBs, it is super inconvenient.
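For illustration (names and fields are hypothetical, not from the parent's actual system), this is roughly the shape a document store makes cheap: the tree nests naturally and every revision is just another embedded snapshot, whereas a normalized schema needs a nodes table, a parent/child (or closure) table, and a versions table, plus joins to reassemble the whole thing:

    # Hypothetical versioned tree document, as it might live in a doc store.
    doc = {
        "_id": "contract-42",
        "current": {
            "title": "Master agreement",
            "children": [
                {"title": "Scope", "children": []},
                {"title": "Terms", "children": [
                    {"title": "Payment", "children": []},
                ]},
            ],
        },
        # Each revision is a complete snapshot, appended on every edit.
        "history": [
            {"rev": 1, "author": "ann",
             "snapshot": {"title": "Master agreement", "children": []}},
        ],
    }

    # Reading (or rolling back to) an old revision is a plain lookup, no joins:
    previous = doc["history"][0]["snapshot"]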

Notably, this has nothing to do with performance.


> I remember when Mongo / NoSQL was peak hype cycle, and every new project "needed" to use it despite being a big step down in terms of features, ability and maturity.

It won for one reason and one reason only: performance over everything else. Any time I'd suggest using MySQL or PG during this period, the reactions were as if I'd suggested storing data in a cucumber.


Stripe runs on Mongo, on multiple servers as you can imagine :)


That's wild to me. Payment data is about as relational as data gets. You need to learn 20 different data models and all their many interconnected relationships just to implement a custom API integration. Maybe that's why it sometimes feels like pulling teeth to get the data you want out of a particular Stripe flow.

I used a document db for our side of the billing integration and I really, really wish I hadn't.


I am trying to be constructive here: "relational" in RDBMS refers to relations in the mathematical sense (i.e., tables), not to relationships between tables, which is what you seem to be implying. It's a common misconception, honestly.

Anyway, many interconnected relationships become much simpler to model when you have the richness and flexibility of the document data model.


They have credit card numbers and transaction IDs in Mongo?


Servers had much less RAM and fewer cores, and there were no SSDs back then, so NoSQL was reasonable for many large-scale applications.


My mental model is to think of a relational database as an amazing Swiss Army knife. It can do almost anything I might need, but it's not awesome at any one thing. I can cut a branch with a Swiss Army knife, but it's much harder to open a can with a chainsaw. Unless you are 100% certain what problem you are solving, take the Swiss Army knife. Standing up multiple specialized tools, or trying to make them do things they weren't designed for, will just cost you time that you aren't spending on finding product-market fit, and then it won't matter whether your solution can scale to millions of users.


I think the point of "no single data model" is that you will have different call patterns for different parts of your data.

What's more, your aggregated data is also data. Don't assume you can just keep the "bare truth" data model and run aggregate queries on demand whenever you want. Instead, consider having aggregate queries run off of what is effectively a materialized view over the base data, and schedule it so that it only has to roll up new data every $period of time. There's no need to recalculate aggregates that you already knew yesterday.
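A minimal sketch of that rollup pattern (my own toy example, with made-up table names, using Python's sqlite3): a summary table plays the role of the materialized view, and each scheduled run only aggregates rows past the last watermark:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (day TEXT, amount INTEGER)")
    db.execute("CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total INTEGER)")
    db.executemany("INSERT INTO events VALUES (?, ?)",
                   [("2024-01-01", 5), ("2024-01-01", 7), ("2024-01-02", 3)])

    def rollup(db):
        # Watermark: the most recent day already aggregated. Assumes a
        # day is closed before it gets rolled up.
        (last,) = db.execute("SELECT COALESCE(MAX(day), '') FROM daily_totals").fetchone()
        db.execute("""
            INSERT OR REPLACE INTO daily_totals
            SELECT day, SUM(amount) FROM events WHERE day > ? GROUP BY day
        """, (last,))

    rollup(db)  # run once per $period; old days are never recomputed
    print(db.execute("SELECT * FROM daily_totals ORDER BY day").fetchall())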


The distinction is not between relational and non-relational, but rather normalized and denormalized. You can model relational data in a key-value store, but it isn't easy. However, it has the advantage of more predictable performance.
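As a toy illustration of that (a plain Python dict standing in for the KV store; the key layout is made up), the usual trick is composite keys, which buy predictable point lookups and prefix scans at the cost of doing the join logic by hand:

    # One-to-many relation (user -> orders) in a key-value store, modeled
    # with composite keys. A real store (e.g. RocksDB) would give ordered
    # prefix scans; a dict stands in here.
    kv = {
        "user:42": '{"name": "ann"}',
        "user:42:order:1001": '{"total": 30}',
        "user:42:order:1002": '{"total": 12}',
    }

    # The "join" is a prefix scan done by hand in application code:
    orders = {k: v for k, v in kv.items() if k.startswith("user:42:order:")}
    print(orders)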


The modern relational database is a near-miracle of human engineering. In the software field, it's probably only surpassed by the operating system, and maybe (or maybe not) the web browser.


I would hold SQLite up against anything.

Likely the second most widely deployed software ever. https://www.sqlite.org/mostdeployed.html

The best tested piece of software that I know of. https://www.sqlite.org/testing.html

People call it a toy database. Some toy.


I think there's a tendency to consider a technology "a toy" if it is, in a sense, too good or elegant.

In the case of databases, I can well imagine Oracle or even Postgres people thinking that SQLite must be a toy because otherwise all the faff they do to set up and admin (and pay obscene amounts for) the databases is actually pointless.


> People call it a toy database.

Stands to reason. It is something you want to play with, not a figurine to leave on the shelf to forget about like some of those other database solutions.


Next time a noob tells me test coverage is bad, I'll show them this.


SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.

(Taken from https://www.sqlite.org/whentouse.html)

A toy that can serve the vast majority of DB use cases. I get it, you can't build a massive-scale project with SQLite, but that doesn't exactly make it a toy DB...


Choose boring technology.


Don't waste your breath on these people, whether they're writing this blogspam garbage, reading it, or upvoting it. It's fashionable to use terms like "I/O bound" or "denormalisation" to the point where I'm no longer sure the majority of commentators really _think_ in these terms, or just _say_ them compulsively, almost mechanically, only to later fit/re-arrange the remaining sentence so as to accommodate them. I/O bound this, I/O bound that. Data access patterns, normalise here, denormalise there, best tool for the job!! No, please, you're not fooling anyone with your "measured" take that claims nuance while making no actual nuanced statements. When it comes to SQL, I'm not sure they even understand how it's not about performance, or "data access" for us, but rather a matter of sanity! I don't want to implement ETL routines for hours on end any time I'm tasked with data integration. Instead, I would probably just use Postgres foreign data wrappers, write some SQL queries, and be done with it. If I couldn't use temporary tables or materialised views, I would probably go insane doing any kind of data work. (Not to mention a bazillion other things that make up a modern relational database.) I straight up couldn't do my job, because I'd have to spend my time figuring out the tools when I could really be doing analysis.
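(The FDW point in miniature: mount the other database and query it where it lives, instead of writing ETL. The sketch below is my own and uses SQLite's ATTACH as a runnable stand-in for postgres_fdw; with Postgres you'd CREATE SERVER and IMPORT FOREIGN SCHEMA instead, but the shape is the same: foreign tables become queryable alongside local ones.)

    import sqlite3

    # Pretend this file belongs to another system we'd otherwise ETL from.
    other = sqlite3.connect("other_system.db")
    other.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)")
    other.execute("DELETE FROM customers")
    other.execute("INSERT INTO customers VALUES (1, 'ann')")
    other.commit()
    other.close()

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE invoices (customer_id INTEGER, total INTEGER)")
    db.execute("INSERT INTO invoices VALUES (1, 99)")
    db.execute("ATTACH DATABASE 'other_system.db' AS ext")

    # One query across both databases; no export/import step anywhere.
    print(db.execute("""
        SELECT c.name, i.total
        FROM ext.customers AS c JOIN invoices AS i ON i.customer_id = c.id
    """).fetchall())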

Oblivious as they are, the key takeaway from interacting with these people is: they're probably not doing real data work and don't really care about data, or analysis, for that matter. And this becomes even more obvious with all the talk about "audit tables" and such nonsense. No, please, no. We know how to manage data that changes; there's the Type 2 dimension, or whatever you want to call it, append-only tables, select distinct on. Downsample it all you want, compress it all you want. I digress. The best we can do is ignore these people. Let them write their shitty blogposts, let them get upvoted, and then, when the time comes, simply make sure not to hire them.

I say, never interrupt your enemy when he's making a mistake. One man's blogspam is another man's saving grace.


Yes, but performance does matter. It's just that the QL isn't the source of performance problems, so all the NoSQLs are a bit silly, and they tend to grow QLs eventually because of that.


You're looking at it from the database's point of view (makes sense, you consult on them), but there's a lot going on in the developer world. For instance, MongoDB exists and is popular because it centers not the database but the developer, and in particular the developer's need to burn down a backlog without the database slowing that down.

Other databases focus on and optimize around a certain set of problems which yes, not as many people have as they think, but they aren’t just reserved for Google either.

And then there’s the world of analytics and data science etc where a host of databases that are not SQL become useful tools.

I do agree, though, that SQL should be a first consideration. But having worked up and down the stack, from dev to devops, over the last 15 years, I've gone from skeptic to enthusiast about the range of choices.

Horses for courses.


It’s amazing the amount of work people will put into relating data in a non-relational database.

Maybe graph databases are what some folks are looking for.


> Maybe graph databases are what some folks are looking for.

Beware. These sometimes sell themselves as great for everything ("Look, if you squint, all your data's a graph! Clearly RDBMS is the wrong fit, use our graph database for all of it [and pay us money]!" — the general tone of marketing for at least one major graph database vendor) but tend to have to make interesting performance trade-offs to achieve the kind of quick graph-operation performance they like to showcase.

Step outside the narrow set of things they're good at (which sometimes doesn't even include all the reasonable "graphy" sorts of things one might want to do!) and performance may drop to the Earth's core.

They also may be way behind your typical mature RDBMS at things like constraints and transactions, or support far fewer datatypes, which can lead to some real pain.

(this basic pattern actually seems to hold for most attempted RDBMS "killers"—they get hyped as a replacement, but actually their best fit in a sane stack is as a supplement to RDBMS for specific purposes for very particular workloads, if you really, really need them. See also: the wild "NoSQL"/MongoDB hype of some years back)


Fair points.

The graph extension for Postgres is interesting.

As for pure graph DBs, options like TypeDB seem to be a bit different from the historical alternatives.



