
“Why didn’t we use an RDBMS in the first place?”

Because initial application specifications are sparse and definitely wrong. If your application is still up and running 5 years later and your data definition hasn't changed much in the past 3, then maybe refactor around an RDBMS. Designing around rigid structures during your first pass is costly. This is why there's been a rise in NoSQL and dynamic languages.

It seems the other way around to me. RDBMSes have well-defined ways of managing change in schema. Through a combination of modifying the schema itself and judicious use of views and stored procedures, it's often relatively easy to evolve the data model in ways that produce minimal disruption for the application's consumers.
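To make the views point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`users`, `users_v2`, `full_name`) are made up for illustration, and the name split assumes exactly one space per name. The idea is that the table changes shape while a view keeps presenting the old shape to existing readers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Original model (hypothetical): one free-form name column.
    CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
    INSERT INTO users (full_name) VALUES ('Ada Lovelace'), ('Alan Turing');

    -- Evolve the model: move the data into a table with split name columns.
    CREATE TABLE users_v2 (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL
    );
    INSERT INTO users_v2 (id, first_name, last_name)
    SELECT id,
           substr(full_name, 1, instr(full_name, ' ') - 1),
           substr(full_name, instr(full_name, ' ') + 1)
    FROM users;
    DROP TABLE users;

    -- Compatibility view: old readers still see a users relation with full_name.
    CREATE VIEW users AS
    SELECT id, first_name || ' ' || last_name AS full_name FROM users_v2;
""")

# Old consumers keep running their old query; new consumers use the new columns.
legacy_rows = conn.execute("SELECT id, full_name FROM users ORDER BY id").fetchall()
new_rows = conn.execute("SELECT first_name, last_name FROM users_v2 ORDER BY id").fetchall()
```

Writes through the view would need more work (triggers or stored procedures, depending on the engine), so treat this as the read-side half of the story.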

Contrast that with Cassandra. I was at a Cassandra workshop a few weeks ago, and the speaker, when asked directly, conceded that in Cassandra you really do need to nail your schema on the first try, because there are no great retroactive schema migration mechanisms, and any evolution is going to result in all consuming applications needing to know about all possible versions of the data model. That ends up being a huge source of technical debt. And heaven help you if you didn't get the indexing strategy right on the first try.

I think that this might point to the classic tension between easy and simple: RDBMSes are focused (admittedly to varying degrees of success) on trying to keep things simple, but there might be some work involved. NoSQL solutions are often sold as being easy to work with, and I don't deny that at all, but my experience is that, in the long run, they can become a huge source of complexity.

This isn't a tension that's unique to software. In my contractor days, we'd also do things one way when it was just a quick-and-dirty job, and a whole different way if we were looking to build something to last. E.g., you'll never catch me using a sprayer to paint my own house, no matter how much faster it is.

I've been trying to deal with migrating data in a Firebase database, and it's come down to exporting the data to JSON and searching through it with ripgrep and jq. Not ideal! And I've had to deal with entries missing "required" props, and entries that have keys with typos, etc. Luckily our dataset is still small enough that this is practical, but none of this work would be required if our product had been built on a SQL database in the first place (I wasn't around when that choice was originally made).
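For what it's worth, here is roughly what "none of this work would be required" looks like in practice. This is only a sketch with Python's built-in sqlite3 module and a made-up `customers` table, not anyone's real schema. The point is that a missing required field, a bad value, and a misspelled key all fail at write time instead of being discovered later with jq.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: email is required, plan is restricted to known values.
conn.execute("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        plan TEXT NOT NULL CHECK (plan IN ('free', 'pro'))
    )
""")

def try_insert(sql, params=()):
    """Run an insert and return the name of the error it raised, or 'ok'."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.Error as exc:
        return type(exc).__name__

results = {
    "valid row": try_insert(
        "INSERT INTO customers (email, plan) VALUES (?, ?)", ("a@example.com", "pro")),
    "missing required prop": try_insert(
        "INSERT INTO customers (plan) VALUES (?)", ("free",)),
    "bad value": try_insert(
        "INSERT INTO customers (email, plan) VALUES (?, ?)", ("b@example.com", "premuim")),
    "typo in key": try_insert(
        "INSERT INTO customers (emial, plan) VALUES (?, ?)", ("c@example.com", "free")),
}

# Only the valid row made it into the table.
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```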

Completely disagree with this. SQL is more flexible than NoSQL. NoSQL databases generally have poor query support, which means if you want to do complex queries you need to precompute the results. This requires you to know your requirements up front. If you're using SQL, then you can do these queries on the fly.
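A small illustration of the "queries on the fly" part, sketched with Python's built-in sqlite3 module. The `orders` table and the question being asked are invented for the example; the assumption is simply that nobody anticipated this query when the table was created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        country TEXT NOT NULL,
        total_cents INTEGER NOT NULL
    );
    INSERT INTO orders (customer, country, total_cents) VALUES
        ('ann', 'US', 1200), ('bob', 'DE', 450), ('ann', 'US', 800), ('cy', 'DE', 300);
""")

# An unplanned question: revenue per country, counting only
# customers who have placed more than one order.
revenue = conn.execute("""
    SELECT country, SUM(total_cents)
    FROM orders
    WHERE customer IN (
        SELECT customer FROM orders GROUP BY customer HAVING COUNT(*) > 1
    )
    GROUP BY country
    ORDER BY country
""").fetchall()
```

No precomputed result and no change to how the data is stored was needed to answer it.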

Sure, SQL databases make you define a schema. But it's very easy to change this when requirements change.

It's really not. That's a myth and you get all of the benefits that you want just by designing with a dynamic language.

It's less costly to create a relation table when you realize there may be multiple instances of a piece of data associated with a record than it is to just stick those pieces in an array.

Because when you don't use the relational data, you get the extra work of modifying all of your existing NoSQL records to use the array structure. And as a bonus, you make it easy to do queries in both directions with the relational data.
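A quick sketch of the "queries in both directions" bonus, using Python's built-in sqlite3 module. The `people` and `phones` tables are hypothetical stand-ins for "a record" and "a piece of data that turned out to have multiple instances".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The relation table: one row per (person, phone number) pair.
    CREATE TABLE phones (
        person_id INTEGER NOT NULL REFERENCES people(id),
        number TEXT NOT NULL
    );
    INSERT INTO people (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO phones (person_id, number) VALUES
        (1, '555-0100'), (1, '555-0101'), (2, '555-0199');
""")

# Direction 1: all numbers that belong to one person.
anns_numbers = [row[0] for row in conn.execute(
    "SELECT number FROM phones WHERE person_id = ? ORDER BY number", (1,))]

# Direction 2: which person owns a given number.
owner = conn.execute(
    "SELECT p.name FROM people p JOIN phones ph ON ph.person_id = p.id "
    "WHERE ph.number = ?",
    ("555-0199",)).fetchone()[0]
```

With the numbers embedded as an array inside each person record, the second lookup generally means scanning records or maintaining a separate index by hand.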

NoSQL offers virtually no efficiency benefit unless you're actually consuming unstructured and variable data.

With NoSQL solutions you're typically pushing data migrations to code. Yes, this is technical debt. But not having to deal with SQL-based data migrations is a pretty big time saver early on.
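For anyone who hasn't seen it, "pushing data migrations to code" tends to look something like the sketch below. The document shapes and the `normalize_user` helper are invented for illustration; the assumption is that old records are never rewritten, so every reader has to upgrade them on the way in.

```python
def normalize_user(doc):
    """Upgrade any historical document shape to the current one (hypothetical schema)."""
    doc = dict(doc)  # work on a copy, never mutate the stored record
    # Older records stored a single "phone" string; newer ones store a "phones" list.
    if "phones" not in doc:
        phone = doc.pop("phone", None)
        doc["phones"] = [phone] if phone else []
    # Some old records were written with a misspelled key.
    if "emial" in doc and "email" not in doc:
        doc["email"] = doc.pop("emial")
    return doc

# Three records as they might sit in the store, each in a different shape.
stored = [
    {"name": "Ann", "phone": "555-0100", "email": "ann@example.com"},
    {"name": "Bob", "phones": ["555-0199"], "emial": "bob@example.com"},
    {"name": "Cy"},
]
current = [normalize_user(d) for d in stored]
```

Every application that reads the data needs its own copy of this function, kept in sync, which is where the debt accumulates.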

Not really. Writing a SQL migration takes what, 10 minutes max? Or you add the columns as you go, and it all just merges into the normal dev time of the feature anyway.
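As a rough idea of the size of such a migration, here is a sketch with Python's built-in sqlite3 module. The `orders` table and the `currency` column are hypothetical; real projects would usually run the same statement through a migration tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
    INSERT INTO orders (total_cents) VALUES (1200), (450);
""")

# The migration itself: one new column with a default, so existing rows stay valid.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT NOT NULL DEFAULT 'USD'")

rows = conn.execute(
    "SELECT id, total_cents, currency FROM orders ORDER BY id").fetchall()
```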

You'll easily make up this lost time just in not having to immediately clean up crappy data that you've written to the database while you're developing the feature. That's been my experience anyway.

I mean, we're all programmers here, and we've all dealt with the growing pains of changing requirements.

But has an RDBMS ever been a major source of that pain? I can't say I've ever encountered a time when it has. If you need to change the structure, just write a script.

I'm not saying it's completely painless, but nothing is.

Before I started using migration scripts (and before there were off-the-shelf libraries for this), it was a bit of a pain. But these days, nope. MySQL is slightly more annoying than Postgres because it doesn't let you wrap DDL queries in transactions, so you can get left in an inconsistent state if your migration has a bug. Postgres is seamless.
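Here is a sketch of what transactional DDL buys you. It uses SQLite through Python's built-in sqlite3 module purely as a stand-in because it runs anywhere; SQLite, like Postgres, lets schema changes roll back. The `accounts` table and the deliberately buggy second step are made up for the example.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN/COMMIT/ROLLBACK statements below are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")

def column_names(table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

try:
    conn.execute("BEGIN")
    conn.execute("ALTER TABLE accounts ADD COLUMN email TEXT")  # step 1 succeeds
    conn.execute("ALTER TABLE accounts ADD COLUMN id TEXT")     # step 2 is the bug: duplicate column
    conn.execute("COMMIT")
except sqlite3.OperationalError:
    conn.execute("ROLLBACK")  # undoes step 1 as well

# The half-applied migration left no trace in the schema.
columns_after_failed_migration = column_names("accounts")
```

On an engine that implicitly commits each DDL statement, step 1 would have stuck and the schema would sit halfway between versions.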

IMO if your data model's still so nebulous that you're changing it so often that SQL migrations are a serious impediment to progress, you probably don't need any real datastore yet. You can usually figure out WTF you're going to do, broadly speaking, before persisting anything to a remote database. And if you can, you very much should.

Yes, yes, there are sometimes exceptions, one must repeat explicitly despite having already said it (see: "probably") because this is HN.

Can someone explain to me any circumstances where having no well defined data model is better than coming up with a clear relational model? Honest question. It seems like it would just be a huge hassle trying to deal with your dissimilar data.

When I'm designing an app from scratch I often think about the SQL tables first and how I'm going to build them, and it really sharpens my idea of what my program will be.

I don't see how skipping that process would make things easier.

I've built IoT platforms where data from any device must be accepted. This is largely where my preference for NoSQL comes from. A device created tomorrow will not have a schema I can predict or control. NoSQL allows the easy integration of that device while a traditional database will at worst require a migration for each new device you want to support.

Please correct me if I'm wrong, but that sounds like a very narrow use case, and also something that could be solved by simply stuffing JSON into a RDBMS.
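To show what "stuffing JSON into an RDBMS" might mean here, a minimal sketch with Python's built-in sqlite3 and json modules. The `readings` table, the device names, and the payload fields are all invented; the assumption is that every device at least has an ID and a timestamp, and everything else is opaque.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        id INTEGER PRIMARY KEY,
        device_id TEXT NOT NULL,
        received_at TEXT NOT NULL,
        payload TEXT NOT NULL  -- raw JSON, shape not known in advance
    )
""")

def store(device_id, received_at, payload):
    """Accept any payload shape; only the envelope fields are structured."""
    conn.execute(
        "INSERT INTO readings (device_id, received_at, payload) VALUES (?, ?, ?)",
        (device_id, received_at, json.dumps(payload)))

store("thermo-1", "2024-01-01T00:00:00Z", {"temp_c": 21.5})
store("door-7", "2024-01-01T00:00:05Z", {"open": True, "battery": 0.8})
store("thermo-1", "2024-01-01T00:01:00Z", {"temp_c": 21.7, "humidity": 40})

# Stable fields are queried in SQL; the amorphous part is decoded in the app.
thermo_payloads = [
    json.loads(row[0]) for row in conn.execute(
        "SELECT payload FROM readings WHERE device_id = ? ORDER BY received_at",
        ("thermo-1",))]

# Keys that are only sometimes present have to be checked for explicitly.
humidity_values = [p["humidity"] for p in thermo_payloads if "humidity" in p]
```

Many SQL engines also offer JSON functions for querying inside the payload on the server side; this sketch sticks to decoding in the application to stay portable.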

However, perhaps there are tools that NoSQL provides that are handy.

What's wrong is that you asked for any usecase and then critique one because it's not broad enough for you.

Sorry, I'm not trying to be argumentative, I just argue in order to understand better.

Then argue honestly and work to steelman others' arguments. To address your point, I'd hardly consider IoT platforms to be a "narrow" usecase. Smaller than the whole of computing, surely, but it's a growing field. The reality is that more and more devices will become available that generate all sorts of hard-to-predict data. Being able to handle those easily will be a large strength for platforms going forward. Dropping this hard-to-predict data as JSON into an RDBMS will certainly come back to haunt you in 5 years.

How will it come back to haunt? Again, just curious.

You'll have dumped it into a strict database, giving yourself a false sense of order and organization. But later when you need to query that amorphous data, you might be able to use OPENJSON or something else, but a NoSQL solution will have been built to handle this type of query specifically with utilities like key exists and better handling for keys missing or only sometimes being present.

You can't really design your tables well enough without knowing your UI; it's a back-and-forth, forth-and-back process.

And it's something you can do entirely on paper, too, before you go to code.

However I've never greenfielded an actually large application.

How does UI inform table design?

It will give you lots of insight when planning your data models/tables.

I've found that MongoDB is really the Visual Basic of databases: It lets you rapidly get something running where your persistent data looks very similar to your data structures.

But, more importantly, this quote really is right:

> The more I work with existing NoSQL deployments however, the more I believe that their schemaless nature has become an excuse for sloppiness and unwillingness to dwell on a project’s data model beforehand.

Far too many software engineers just don't understand databases and don't understand the long-term implications of their decisions. Good SQL is rather easy, but it does take a few more minutes than sloppy approaches.

Or, to put it differently, some NoSQL databases are great at rapid prototyping. But, they can't scale and hold little value for someone who will put in the extra hour or two upfront to use SQL.

If you embrace both structure and a defined path to change that structure, you get the best of all worlds. Database migrations and schema changes should be the norm, not a feared last resort.

Otherwise you end up with equivalent logic littered throughout multiple application codebases sharing the same database.

I feel like this is where ActiveRecord shines, as the initial development is thought of as objects and their relations to each other and it's very easy to alter data definitions. And in the end you get a reasonably normalized database underneath everything. The reality is most of your models aren't going to be undergoing wild schema changes and if they are or you need traversable unstructured data you can defer to json columns for those scenarios.

The lack of (or too-weak) safety & consistency guarantees will waste more time than it saved way before 5 years are up. More like within 3-6mo of going to production, if not before you hit production. Even more true if you're dynamic the whole way up (JS-to-JSON-to-[Ruby/Python/JS]-to-NoSQL and back again).

With the usual caveat that yes, some datasets & workloads probably do benefit from certain NoSQLs, but that's a pretty small minority.
