Hacker News
Ask HN: Might as well start with SQL and migrate to NoSQL later
22 points by Supermighty 2602 days ago | 30 comments
All of the articles I've read recently talk about the successful migration of very large websites to NoSQL solutions. The argument against is that NoSQL is only needed for a very small percentage of websites. That, coupled with the much larger SQL tooling and knowledge base and the uncertainty that any new website will ever need such scaling, leads me to my question.

Why not just start with SQL and plan a migration to a NoSQL solution later, when it's really needed?

It's certainly doable. I feel that the benefit of a NoSQL solution is, at this time, outweighed by the ease of an SQL solution for the vast majority of sites, so planning for a NoSQL solution from the beginning is overkill.

Thoughts?




Given that HN is primarily peopled by founders of small bootstrapped startups, it's surprising how many advocates of key/value stores you see around here.

Overgeneralizing, key/value datastores are useful for enormous things that operate at Twitter/Facebook scale and handle loads measured in thousands of operations per second. Looking at a few existing services that need that sort of throughput, you'll quickly notice that they're all backed by tens of millions in VC and have big teams they can devote to dealing with the overhead needed to build Big Scalable Things.

On the other hand, if you look at the dozens of "little services that charge people money to use them" that the typical bootstrapped startup founder should be striving to build, you'll notice that they all would work just fine on a relational database. So given that there are tools to get you shipped 10X as fast on an oldskool RDBMS, it's surprising to see that ANY of these lean little startups would choose to go with NoSQL right out of the gate.


I agree. There's absolutely no problem with using, for example, ActiveRecord and MySQL when using Rails because it's the default option and it's not a completely horrible option as some of the NoSQL crowd says.

Due to the vast number of libraries for any language, and the vast number of helpers/managers like ActiveRecord or DataMapper, I'm surprised nobody is advocating them more, as they do support the "lean, fast, daily updates" approach while you're still small. When you need a higher performing system, it's available, but take advantage of all the previous library research. You're doing yourself a disservice if you spend too much time shipping on an exotic system when you won't need it for another 1-3 years.


This is very close to my line of thinking.

What real benefit is there in starting out with NoSQL when most websites really don't need to scale that big?


Though I do think there will come a time when it will be just as easy to start with NoSQL as SQL.


I think another benefit of NoSQL datastores is the ease of making changes on a live system. There's a great presentation[0] by Steve Huffman (in part) about how reddit ended up using Postgres as a key-value store as much for ease of adding new features as performance. Go with what makes sense for your application and worry about scaling when you have users.

[0] http://news.ycombinator.com/item?id=1330552
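
To make the "relational database as a key-value store" idea concrete, here is a minimal sketch. It is an assumption-laden illustration, not reddit's actual schema: the `thing_data` table, its column names, and the helper functions are all hypothetical, and SQLite stands in for Postgres so the example runs with only the standard library.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Create one generic table: a row per (thing, attribute) pair."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thing_data ("
        " thing_id INTEGER NOT NULL,"
        " attr TEXT NOT NULL,"
        " val TEXT NOT NULL,"
        " PRIMARY KEY (thing_id, attr))"
    )
    return conn


def set_attr(conn, thing_id, attr, value):
    """Insert or overwrite one attribute; values are stored as JSON text."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO thing_data (thing_id, attr, val)"
            " VALUES (?, ?, ?)",
            (thing_id, attr, json.dumps(value)),
        )


def get_thing(conn, thing_id):
    """Reassemble all attributes of one thing into a dict."""
    rows = conn.execute(
        "SELECT attr, val FROM thing_data WHERE thing_id = ?", (thing_id,)
    )
    return {attr: json.loads(val) for attr, val in rows}


conn = open_store()
set_attr(conn, 1, "title", "Ask HN")
set_attr(conn, 1, "score", 22)
# A new feature needs a new attribute: no ALTER TABLE, just another row.
set_attr(conn, 1, "flair", "discussion")
thing = get_thing(conn, 1)
```

The trade-off is that the database no longer knows which attributes a thing is supposed to have, so that knowledge has to live in application code.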


Not using a relational database alone doesn't make changes any easier. Oftentimes the complexity is just moved to a different place or time. Of course a completely generic data model makes it easier to make a particular modification on a live system, but it makes understanding the system and the consequences of that change harder.

The question is really which parts of the entire system depend on particular invariants and constraints? Where are they checked and enforced? How are the rules represented? Moving lots of rules out of declarative and into procedural code won't make things simpler.
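
As a rough illustration of rules kept declarative, here is a small sketch using SQLite from Python. The `accounts` table, its columns, and the `try_insert` helper are made up for the example. The point is only that the uniqueness and non-negative-balance rules are stated once in the schema and enforced for every writer, rather than re-implemented procedurally in each application that touches the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The invariants live in the schema, not in application code.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
)


def try_insert(email, balance_cents):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO accounts (email, balance_cents) VALUES (?, ?)",
                (email, balance_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = try_insert("a@example.com", 500)      # accepted
duplicate_ok = try_insert("a@example.com", 100)  # violates UNIQUE
negative_ok = try_insert("b@example.com", -1)    # violates CHECK
row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

In a schemaless store the same two rules would have to be checked by every piece of code that writes accounts, which is the "moved to a different place" complexity described above.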

I have experimented a lot with all kinds of data representations, non-relational ones mostly, but I have to say, there are trade-offs that are rarely discussed in a NoSQL context. Probably because NoSQL was primarily the answer to "how do we do something incredibly simple on an incredibly huge scale?"

Many years of doing data integration have convinced me of one thing. Closely linking application code to data makes things simpler in the beginning but it leads to a completely unmaintainable mess rather more quickly than people anticipate. Data and applications have a very different life-cycle. Data lives longer and there is rarely a 1:1 relationship between application and data. That's actually where the real impedance mismatch originates, and I'm afraid it will never go away or become much simpler.


For me, half of the advantage of NoSQL is that it's much quicker and easier to build with, because you don't have to worry about schemas.

I'd go the opposite - start with NoSQL, and migrate to SQL if you need some more formalized relationships.


For Ruby, using datamapper can allow you to (for example) use both a relational database and MongoDB together. You could also have a migration strategy based on datamapper.

That said, I would suggest picking two data stores like PostgreSQL and MongoDB, writing some code examples, and playing around with data models for your app. If you spend several hours doing this then you can make a better decision.

BTW I have tried most of the NoSQL data stores and MongoDB is probably the easiest to use from a developer's point of view. I have used bindings for Ruby and Clojure, and being able to store things like Ruby or Clojure maps directly in documents is great. That said, PostgreSQL with built-in text indexing support, etc., etc. is also a great tool. I think that if you become fluent with PG and MongoDB then that is sufficient for a wide range of problems. For my development, I toss RDF data stores into the mix, but that might just complicate things for you.

Anyway, pick one of each, have fun playing with each one before making a big commitment.
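
One way to keep that kind of experiment (and a later migration) cheap is to hide the store behind a small interface. The sketch below is in Python rather than Ruby and is not DataMapper's API; `UserStore`, `SqlUserStore`, `DocumentUserStore`, and `rename_user` are hypothetical names, SQLite stands in for the relational side, and a plain dict stands in for a document store.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """The only surface the application code is allowed to depend on."""

    def save(self, user_id: int, profile: dict) -> None: ...
    def load(self, user_id: int) -> Optional[dict]: ...


class SqlUserStore:
    """Relational backend with a fixed schema."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users ("
            " id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
        )

    def save(self, user_id, profile):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO users (id, name, email) VALUES (?, ?, ?)",
                (user_id, profile["name"], profile["email"]),
            )

    def load(self, user_id):
        row = self.conn.execute(
            "SELECT name, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return None if row is None else {"name": row[0], "email": row[1]}


class DocumentUserStore:
    """Document-style backend: whole profiles stored as-is (dict stand-in)."""

    def __init__(self):
        self.docs = {}

    def save(self, user_id, profile):
        self.docs[user_id] = dict(profile)

    def load(self, user_id):
        doc = self.docs.get(user_id)
        return None if doc is None else dict(doc)


def rename_user(store: UserStore, user_id: int, new_name: str) -> dict:
    """Application logic written once, against the interface only."""
    profile = store.load(user_id)
    if profile is None:
        raise KeyError(user_id)
    profile["name"] = new_name
    store.save(user_id, profile)
    return store.load(user_id)


results = []
for store in (SqlUserStore(), DocumentUserStore()):
    store.save(1, {"name": "old", "email": "a@example.com"})
    results.append(rename_user(store, 1, "new"))
```

This only helps as long as the application sticks to operations both backends can express; joins or ad hoc queries leaking past the interface are what make a later switch expensive.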


First of all NoSQL isn't really that hard and the performance gains are indisputable.

For a bootstrapped startup nosql makes total sense b/c you get much more performance for your dollar.

Sure you can go the vanilla PHP/MySql or Django/Postgres route if you want to build your prototype but you will be in a bad position if your system can't scale with demand.

I would build the prototype first in your favorite framework/setup.. If it has market potential after some initial private beta testing I'd definitely go the nosql route.


I am basing my startup on MongoDB and Memcached as my storage choices, rather than MySQL, for performance reasons. I can fit nginx, 2 instances of my tornado application, a small memcached slice, and mongodb on a single 256MB memory cloud machine, and am confident that when I start driving a bit of traffic to it, I'll just need to bump it up to 512MB of memory to make sure I don't swap.

The real advantage will be that when I get to the point of seeing real traffic, I can scale horizontally on inexpensive machines. I'll be able to maximize all parts of physical machines when I make the move to maintaining my own infrastructure.

Honestly, I'm not sure why I'd want to use SQL at all, especially for a project in its infancy. With many NoSQL implementations you can adjust your schema as you go, instead of locking yourself early into a schema that may not fit a few years down the road, when changing it becomes a very difficult project.

The caveat being that everything I'm doing works with a key/value table. My data set and how I access it wouldn't take advantage of a lot of what SQL provides. So, if your application does benefit from SQL, then by all means use it.


The model for your data depends on your data. What data are you trying to store?


If you know SQL well, start with that.

If you think non-relational makes sense for your use case, try it out for smaller, non-critical parts of your system till your understanding grows. You can then decide more confidently.


It's not just about massive scale. I like SQL, but my experience is entirely on MS SQL, and hosting costs for that are pretty high. So I can either go with mysql/postgres, or a nosql platform. I have to learn something either way. It's looking to me like nosql can be simpler, with less administration, and very high performance on limited resources.

Redis, for example, is much more about making single boxes perform like crazy, rather than massive scaling (although they are working on clustering for version 2.0).

Cassandra is mostly about the scaling, but is also very fast (though not as fast as redis), and if you set it up on three nodes you've got high availability. Since I plan to keep my day job, I can't drop everything to deal with server issues. I want something that just keeps on trucking no matter what.


You kind of have to go with the tech you know best, otherwise getting started is going to take too long. If you already know a SQL database system, go with that. If you don't know an RDBMS or a NoSQL system, it might be worth learning the NoSQL thing.

The reason is, if you're successful, the cost of migrating later on is going to be larger than you expect. You're going to be spending a lot of time doing a lot of back-end work that users never see. Once you are dealing with huge amounts of data, it takes hours and possibly days for data migrations and conversions to even run. You screw up a data import that took 4 hours... 4 hours is now killed and you have to do it over again. The more data you have, the more potential time these types of issues will absorb.
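
One common way to soften the "4 hours is now killed" problem is to migrate in batches and record a checkpoint, so a failed run resumes instead of starting over. This is only a sketch under assumptions: the `users` table, the `user:<id>` key format, and the `FlakyStore` class (a dict that fails once to simulate an outage) are all hypothetical stand-ins for a real source database and target store.

```python
import sqlite3


def migrate_in_batches(conn, target, checkpoint, batch_size=1000):
    """Copy users into `target` in id order, resuming after checkpoint['last_id']."""
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (checkpoint["last_id"], batch_size),
        ).fetchall()
        if not rows:
            return
        for row_id, name in rows:
            target[f"user:{row_id}"] = {"name": name}
        # Record progress only after the whole batch has been written.
        checkpoint["last_id"] = rows[-1][0]


class FlakyStore(dict):
    """Hypothetical target that fails once, on its 7th write."""

    def __init__(self):
        super().__init__()
        self.writes = 0

    def __setitem__(self, key, value):
        self.writes += 1
        if self.writes == 7:
            raise OSError("simulated outage")
        super().__setitem__(key, value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 11)],
)

target = FlakyStore()
checkpoint = {"last_id": 0}  # in practice this would be persisted somewhere durable
try:
    migrate_in_batches(conn, target, checkpoint, batch_size=3)
except OSError:
    pass  # the first run dies partway through the third batch
checkpoint_after_failure = checkpoint["last_id"]

# The second run picks up from the last completed batch instead of from zero.
migrate_in_batches(conn, target, checkpoint, batch_size=3)
```

Writes need to be idempotent for this to be safe, since a batch that failed halfway gets replayed from its start. It does not remove the cost described above; it only caps how much work a single failure can throw away.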


The choice between SQL/NoSQL should be made based on the features of each and what you need, and it's definitely not an exclusive choice. Remember: NoSQL == "Not Only SQL". Use all tools where they make sense.

Construction workers use both nails and screws, even though they do much the same thing. It's just a question of what's appropriate for a given task.

Finally, if it's only a question of performance, I'd suggest tabling that question until it's a real issue. Scale when you need to, not before. Only once you have a "real" system with scaling stress can you actually see what needs to be optimized. Before that, it's all theory.


Get up and running and making money as fast as possible without being too stupid.

This might mean using a NoSQL database. MongoDB seems easy to try out, it could be something you'd use for some component.

But most projects fail before they ever reach some minimally useful state, not because they don't scale. Plan ahead for scaling where appropriate, but the main focus should be getting it out there in a useful state.


performance/scalability isn't the only reason people go with "nosql" solutions, some people just really don't like working with traditional rdbms's

a lot of these stores let you have tighter integration with your language (and its data formats), match your mental model better and sometimes have pretty revolutionary things that you just can't do easily any other way (couchapps + replication etc)


All good reasons, but for me and mine it's still easier to start with MySQL. Now I say that coming from a lazy standpoint. I use debian and do my best to not stray from the default install. Having NoSQL servers in the repo and books on the shelf goes a long way towards a rich and vibrant ecosystem that makes it easier to get started.

I know CouchDB is in debian stable, but it's at .8 last time I checked and >.9 breaks some things between the two. I would feel more comfortable once a solution is in the repo and basically feature stable.


CouchDB is pretty stable. The BBC are using it live, so it's safe for production use.


I'm not saying it isn't stable, rather it's still in development. I don't want to use couchdb .8 in the debian stable repo only to have to recode my app when I upgrade to .9 or .10 because of breaking changes.

http://wiki.apache.org/couchdb/Breaking_changes


That's a bit like saying you drive an automatic because it's too hard to learn to drive a car with a manual transmission.

First of all, you lock yourself out of a lot of useful things that other people take for granted that way. And second of all, it's really not that hard to learn.


I in no way said that people use "nosql" stores because they couldn't understand how to use RDBMS's

Anecdotally I would say the opposite, that people use "nosql" stores because they do know how to use an RDBMS and they have commonly found situations when relational databases are a square peg in a round hole.


Try one. If it's not doing what you want, try the other way. It's more important to start on something than to obsess over these things :)


It makes a lot more sense to start with a relational DB, because as pg notes, you are likely to make at least one change of direction in your startup, and various NoSQL databases are optimized for different things. Beginning with a common RDBMS like MySQL will help you get started quicker and you can optimize or upgrade later, if it ever becomes necessary.


You could always start with MySQL then migrate quite easily to clustrix since they support the MySQL API. They're expensive, but you may find that by the time you need to scale to that degree you can easily afford them.

http://www.clustrix.com/


Agreed. I think this is an area where we're going to see a lot of competition in the next years so prices should go down quickly.


quote from Peter:

There are 2 things you should ask yourself. First is the scale comparable – the recipes from Facebook, YouTube, Yahoo, are not good for like 99.9% of the applications because they are not even remotely close in size and so capacity requirements. Second if this “smart thing” was truly thought out architecture choice in beginning or it was the choice within code base constrains they had, and so you might not have.

http://www.mysqlperformanceblog.com/2009/03/01/kiss-kiss-kis...


Adapting your database models to an exotic NoSQL may eventually alter your functionality. Users don't like change.


reddit, digg and others were able to make the transition without altering their functionality. I don't see this as a very strong argument against it.

Even if it were the case, careful planning for the transition could mitigate distasteful changes.


Reddit uses Cassandra for a very small part of their functionality - their primary database is Postgres. Digg has yet to transition over to Cassandra, and their transition has been delayed (even though they have a lot more resources than a regular startup). Facebook only uses Cassandra for inbox search; their primary database is MySQL.

The bottom line is to use NoSQL where it makes sense and where it solves the job better than relational databases.



