

Ask HN: Might as well start with SQL and migrate to NoSQL later - Supermighty

All of the articles I've read recently talk about the successful migration of very large websites to NoSQL solutions. The argument against is that NoSQL is only needed for a very small percentage of websites. That, coupled with the much larger SQL tool and knowledge base and the uncertainty that any new website will need such scaling, leads me to my question.

Why not just start with SQL and plan a migration to a NoSQL solution later, when it's really needed?

It's certainly doable. I feel that the benefit of a NoSQL solution is, at this time, outweighed by the ease of an SQL solution for such a majority of sites that planning for a NoSQL solution from the beginning is overkill.

Thoughts?
======
jasonkester
Given that HN is primarily peopled by founders of small bootstrapped startups,
it's surprising how many advocates of key/value stores you see around here.

Overgeneralizing, key/value datastores are useful for enormulous things that
operate on Twitter/Facebook scale and handle loads measured in thousands of
operations per second. Looking at a few existing services that need that sort
of throughput, you'll quickly notice that they're all backed by tens of
millions of VC and have big teams they can devote to dealing with the overhead
needed to build Big Scalable Things.

On the other hand, if you look at the dozens of "little services that charge
people money to use them" that the typical bootstrapped startup founder should
be striving to build, you'll notice that they all would work just fine on a
relational database. So given that there are tools to get you shipped 10X as
fast on an oldskool RDBMS, it's surprising to see that ANY of these lean little
startups would choose to go with NoSQL right out of the gate.

~~~
Supermighty
This is very close to my line of thinking.

What real benefit is there in starting out with NoSQL when most websites really
don't need to scale that big?

~~~
Supermighty
Though I do think there will come a time when it will be just as easy to start
with NoSQL as SQL.

------
Zak
I think another benefit of NoSQL datastores is the ease of making changes on a
live system. There's a great presentation[0] by Steve Huffman (in part) about
how reddit ended up using Postgres as a key-value store as much for ease of
adding new features as performance. Go with what makes sense for your
application and worry about scaling when you have users.

[0] <http://news.ycombinator.com/item?id=1330552>
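
A minimal sketch of the "relational database as key-value store" idea, for illustration only. It is not reddit's actual schema: the table name `thing_data`, the column names, and the helper functions are all hypothetical, and SQLite stands in for Postgres so the example runs on its own.

```python
import json
import sqlite3

# Hypothetical schema: one generic table instead of one column per feature.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thing_data ("
    " thing_id INTEGER NOT NULL,"
    " attr TEXT NOT NULL,"
    " val TEXT NOT NULL,"
    " PRIMARY KEY (thing_id, attr))"
)


def set_attr(thing_id, attr, value):
    """Store one attribute; the value is JSON-encoded so any simple type fits."""
    conn.execute(
        "INSERT OR REPLACE INTO thing_data (thing_id, attr, val) VALUES (?, ?, ?)",
        (thing_id, attr, json.dumps(value)),
    )


def get_thing(thing_id):
    """Reassemble all attributes of one thing into a dict."""
    rows = conn.execute(
        "SELECT attr, val FROM thing_data WHERE thing_id = ?", (thing_id,)
    )
    return {attr: json.loads(val) for attr, val in rows}


set_attr(1, "title", "Ask HN: SQL or NoSQL?")
set_attr(1, "score", 42)
# A new feature needs a new attribute: no ALTER TABLE, just a new key.
set_attr(1, "flair", "discussion")
print(get_thing(1))
```

The trade-off is that the database no longer knows which attributes a thing is supposed to have, so that knowledge moves into application code.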

~~~
fauigerzigerk
Not using a relational database alone doesn't make changes any easier. Often
times the complexity is just moved on to a different place or time. Of course
a completely generic data model makes it easier to make a particular
modification on a live system, but it makes understanding the system and the
consequences of that change harder.

The question is really which parts of the entire system depend on particular
invariants and constraints? Where are they checked and enforced? How are the
rules represented? Moving lots of rules out of declarative and into procedural
code won't make things simpler.
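
To make the declarative versus procedural point concrete, here is a small sketch under assumed names (the `accounts` table, `try_insert`, and `put_account` are invented for illustration). SQLite is used only because it ships with Python; a plain dict stands in for a schemaless store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declarative: the rules live next to the data and apply to every writer.
conn.execute(
    "CREATE TABLE accounts ("
    " email TEXT NOT NULL UNIQUE,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('a@example.com', 10)")


def try_insert(email, balance):
    """Return True if the database accepted the row, False if a rule rejected it."""
    try:
        conn.execute("INSERT INTO accounts VALUES (?, ?)", (email, balance))
        return True
    except sqlite3.IntegrityError:
        return False


# Procedural: a schemaless store accepts anything, so every code path
# that writes has to remember to repeat the same checks.
store = {}


def put_account(email, balance):
    """Write an account to the dict store, enforcing the rules by hand."""
    if email in store:
        raise ValueError("duplicate email")
    if balance < 0:
        raise ValueError("negative balance")
    store[email] = {"balance": balance}
```

Any writer that skips `put_account` and assigns to `store` directly bypasses both rules, whereas no writer can bypass the table's constraints.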

I have experimented a lot with all kinds of data representations,
non-relational ones mostly, but I have to say, there are trade-offs that are
rarely discussed in a NoSQL context. Probably because NoSQL was primarily the
answer to "how do we do something incredibly simple on an incredibly huge
scale?"

Many years of doing data integration have convinced me of one thing. Closely
linking application code to data makes things simpler in the beginning but it
leads to a completely unmaintainable mess rather more quickly than people
anticipate. Data and applications have a very different life-cycle. Data lives
longer and there is rarely a 1:1 relationship between application and data.
That's actually where the real impedance mismatch originates, and I'm afraid
it will never go away or become much simpler.

------
japherwocky
For me, half of the advantage of NoSQL is that it's much quicker and easier to
build with, because you don't have to worry about schemas.

I'd go the opposite - start with NoSQL, and migrate to SQL if you need some
more formalized relationships.

------
mark_l_watson
For Ruby, using datamapper can allow you to (for example) use both a
relational database and MongoDB together. You could also have a migration
strategy based on datamapper.
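
The general idea of hiding the data store behind one interface can be sketched as follows. This is not DataMapper's API (DataMapper is a Ruby library); it is a hand-rolled Python illustration, and `UserStore`, `SqlUserStore`, `DictUserStore`, and `rename` are all hypothetical names. A dict stands in for a document store such as MongoDB.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """The only surface the application code is allowed to depend on."""

    def save(self, user_id: int, name: str) -> None: ...

    def load(self, user_id: int) -> Optional[str]: ...


class SqlUserStore:
    """Relational backend (SQLite here, purely for a runnable example)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def save(self, user_id: int, name: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name)
        )

    def load(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class DictUserStore:
    """Stand-in for a document or key/value backend."""

    def __init__(self) -> None:
        self.docs = {}

    def save(self, user_id: int, name: str) -> None:
        self.docs[user_id] = {"name": name}

    def load(self, user_id: int) -> Optional[str]:
        doc = self.docs.get(user_id)
        return doc["name"] if doc else None


def rename(store: UserStore, user_id: int, new_name: str) -> bool:
    """Application logic written once; it works against either backend."""
    if store.load(user_id) is None:
        return False
    store.save(user_id, new_name)
    return True
```

With this shape, a later migration touches the store classes rather than every call site, although it does nothing to move the existing data.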

That said, I would suggest picking two data stores like PostgreSQL and MongoDB,
writing some code examples, and playing around with data models for your app. If
you spend several hours doing this then you can make a better decision.

BTW I have tried most of the NoSQL data stores and MongoDB is probably the
easiest to use from a developer's point of view. I have used bindings for Ruby
and Clojure and being able to store things like Ruby or Clojure maps directly
in a document is great. That said, PostgreSQL with built in text indexing
support, etc., etc. is also a great tool. I think that if you become fluent
with PG and MongoDB then that is sufficient for a wide range of problems. For
my development, I toss RDF data stores into the mix, but that might just
complicate things for you.

Anyway, pick one of each, have fun playing with each one before making a big
commitment.

------
fjabre
First of all NoSQL isn't really that hard and the performance gains are
indisputable.

For a bootstrapped startup nosql makes total sense b/c you get much more
performance for your dollar.

Sure you can go the vanilla PHP/MySQL or Django/Postgres route if you want to
build your prototype but you will be in a bad position if your system can't
scale with demand.

I would build the prototype first in your favorite framework/setup. If it has
market potential after some initial private beta testing I'd definitely go the
nosql route.

------
jrussbowman
I am basing my startup on MongoDB and Memcached as my storage choices,
rather than MySQL, for performance reasons. I can fit nginx, 2 instances of my
tornado application, a small memcached slice, and mongodb on a single 256MB
memory cloud machine, and am confident that when I start driving a bit of
traffic to it, I'll just need to bump it up to 512MB of memory to make sure I
don't swap.

The real advantage will be when I get to the point of seeing real traffic: I
can scale horizontally on inexpensive machines. I'll be able to maximize all
parts of physical machines when I make the move to maintaining my own
infrastructure.

Honestly, I'm not sure why I'd want to use SQL at all, especially for a
project in its infancy. With many NoSQL implementations you can adjust your
schema as you go, instead of locking yourself early into a schema that may not
fit a few years down the road and that is then very difficult to change.

The caveat being that everything I'm doing works with a key/value table.
My data set and how I access it wouldn't take advantage of a lot of what SQL
provides. So, if your application does benefit from SQL, then by all means use
it.

------
epoweripi
If you know SQL well, start with that.

If you think non-relational makes sense for your use case, try it out for
smaller, non-critical parts of your system till your understanding grows. You
can then decide more confidently.

------
lsb
The model for your data depends on your data. What data are you trying to
store?

------
DennisP
It's not just about massive scale. I like SQL, but my experience is entirely
on MS SQL, and hosting costs for that are pretty high. So I can either go with
mysql/postgres, or a nosql platform. I have to learn something either way.
It's looking to me like nosql can be simpler, with less administration, and
very high performance on limited resources.

Redis, for example, is much more about making single boxes perform like crazy,
rather than massive scaling (although they are working on clustering for
version 2.0).

Cassandra is mostly about the scaling, but is also very fast (though not as
fast as redis), and if you set it up on three nodes you've got high
availability. Since I plan to keep my day job, I can't drop everything to deal
with server issues. I want something that just keeps on trucking no matter
what.

------
starkfist
You kind of have to go with the tech you know best, otherwise getting started
is going to take too long. If you already know a SQL database system, go with
that. If you don't know an RDBMS or a NoSQL system, it might be worth learning
the NoSQL thing.

The reason is, if you're successful, the cost of migrating later on is going
to be larger than you expect. You're going to be spending a lot of time doing
a lot of back-end work that users never see. Once you are dealing with huge
amounts of data, it takes hours and possibly days for data migrations and
conversions to even run. You screw up a data import that took 4 hours... 4
hours is now killed and you have to do it over again. The more data you have,
the more potential time these types of issues will absorb.
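
One common way to limit the cost of a failed run is to migrate in batches and record progress, so a rerun resumes instead of starting over. The sketch below assumes a hypothetical `users` table and uses a dict as the target store and another dict as the checkpoint; in a real migration the checkpoint would have to be persisted somewhere durable.

```python
import sqlite3


def migrate(conn, target, checkpoint, batch_size=1000):
    """Copy rows from the SQL table `users` into `target` in id order.

    `checkpoint` holds the last id copied, so a rerun after a failure
    continues from there instead of starting from the first row.
    """
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (checkpoint.get("last_id", 0), batch_size),
        ).fetchall()
        if not rows:
            return
        for row_id, name in rows:
            target[f"user:{row_id}"] = {"name": name}
        # Record progress only after the whole batch has been written.
        checkpoint["last_id"] = rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [(f"u{i}",) for i in range(2500)]
)

target, checkpoint = {}, {}
migrate(conn, target, checkpoint)
```

Because each write is keyed by the row id, replaying a partially written batch after a crash overwrites the same keys rather than duplicating data.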

------
MicahWedemeyer
The choice between SQL/NoSQL should be made based on the features of each and
what you need, and it's definitely not an exclusive choice. Remember: _NoSQL ==
"Not Only SQL"_ Use all tools where they make sense.

Construction workers use both nails and screws, even though they do much the
same thing. It's just a question of what's appropriate for a given task.

Finally, if it's only a question of performance, I'd suggest tabling that
question until it's a real issue. Scale when you need to, not before. Only
once you have a "real" system with scaling stress can you actually see what
needs to be optimized. Before that, it's all theory.

------
RobGR
Get up and running and making money as fast as possible without being too
stupid.

This might mean using a NoSQL database. MongoDB seems easy to try out, it
could be something you'd use for some component.

But most projects fail before they ever reach some minimally useful state, not
because they don't scale. Plan ahead for scaling where appropriate, but the
main focus should be getting it out there in a useful state.

------
daleharvey
performance/scalability isn't the only reason people go with "nosql" solutions,
some people just really don't like working with traditional rdbms's

a lot of these stores let you have tighter integration with your language (and
its data formats), match your mental model better and sometimes have pretty
revolutionary things that you just can't do easily any other way (couchapps +
replication etc)

~~~
Supermighty
All good reasons, but for me and mine it's still easier to start with MySQL.
Now I say that coming from a lazy standpoint. I use debian and do my best to
not stray from the default install. Having NoSQL servers in the repo and books
on the shelf goes a long way towards a rich and vibrant ecosystem that makes it
easier to get started.

I know CouchDB is in debian stable, but it's at .8 last time I checked and >.9
breaks some things between the two. I would feel more comfortable once a
solution is in the repo and basically feature stable.

~~~
peterbraden
CouchDB is pretty stable. The BBC are using it live, so it's safe for
production use.

~~~
Supermighty
I'm not saying it isn't stable, rather it's still in development. I don't want
to use couchdb .8 in the debian stable repo only to have to recode my app when
I upgrade to .9 or .10 because of breaking changes.

<http://wiki.apache.org/couchdb/Breaking_changes>

------
nadim
Try one. If it's not doing what you want, try the other way. It's more
important to start on something than to obsess over these things :)

------
billswift
It makes a lot more sense to start with a relational DB, because as pg notes,
you are likely to make at least one change of direction in your startup, and
various NoSQL databases are _optimized for different things_. Beginning with a
common RDBMS like MySQL will help you get started quicker and you can optimize
or upgrade later, if it ever becomes necessary.

------
whyme
You could always start with MySQL then migrate quite easily to clustrix since
they support the MySQL API. They're expensive, but you may find that by the
time you need to scale to that degree you can easily afford them.

<http://www.clustrix.com/>

~~~
cx01
Agreed. I think this is an area where we're going to see a lot of competition
in the next years so prices should go down quickly.

------
tszming
quote from Peter:

There are 2 things you should ask yourself. First is the scale comparable –
the recipes from Facebook, YouTube, Yahoo, are not good for like 99.9% of the
applications because they are not even remotely close in size and so capacity
requirements. Second if this “smart thing” was truly thought out architecture
choice in beginning or it was the choice within code base constrains they had,
and so you might not have.

<http://www.mysqlperformanceblog.com/2009/03/01/kiss-kiss-kiss/>

------
gorog
Adapting your database models to an exotic NoSQL may eventually alter your
functionality. Users don't like change.

~~~
Supermighty
reddit, digg and others were able to make the transition without altering
their functionality. I don't see this as a very strong argument against it.

Even if it were the case, careful planning for the transition could mitigate
distasteful changes.

~~~
amix
Reddit uses Cassandra for a very small part of their functionality - their
primary database is Postgres. Digg has yet to transition over to Cassandra
and generally, their transition has been delayed (even though they have a lot
more resources than a regular startup). Facebook only uses Cassandra for inbox
search, their primary database is MySQL.

The bottom line is to use NoSQL where it makes sense and where it solves the
job better than relational databases.

