
SparkFun's migration from MariaDB to PostgreSQL - Mister_Snuggles
https://www.sparkfun.com/news/1312
======
strlen
Not to bikeshed, but what about Postgres makes the _big_ scalability story any
different than the MySQL story, unless you're talking about a commercial
distributed RDBMS built on top of Postgres like EnterpriseDB?

In terms of scalability in the small, many of the MySQL 5.6 replication
features skew this comparison in MySQL's direction: multi-threaded (optionally
row-based) replication, global transaction ids, and the like make it easier to
improve availability, add read-slaves, and host multiple databases/shards on a
single machine (replication afaik is no longer per-server, it is now per-
database).

There's also been work on tuning innodb for SSDs (which is a near certain
recommendation for your dataset -- it gives plenty of breathing room to
forestall horizontal partitioning).

I'd also look very heavily into performance with large buffer caches,
compression (great way to reduce IOPS and squeeze the most out of SSD or
memory space), etc... I am curious to see if any of these were compared
between MySQL and Postgres. As far as I understand, innodb's compression is
somewhat more advanced than Postgres, but running either on ZFS is an even
better bet for this.

On the other hand, Postgres has a better query optimizer, supports more
complex data and relation formats, and so on. However, most of these features
aren't going to be used at scale (I am presuming you're talking about OLTP
workloads).

Honestly this is a bit of an unusual decision -- I'd probably start a project
with Postgres and avoid using MySQL until later, but choosing Postgres out of
scalability reasons seems a bit odd. I'm curious to know why!

~~~
gbog
Maybe it was not only for scalability, but also for reliability or embracing
the Do the Right Thing thing?

From my experience with PostgreSQL and MySQL, it is a bit like the difference
between python and php, one is a great work of craftmanship that is reliable,
predictible, coherent, enjoyable to work with and minimise the wtf/mn rate
(which is the best quality measure in software), while the other is a bunch of
hacks knit together to make it work asap and its wtf/mn is skyscraping.

~~~
csmuk
Exactly this.

To put it bluntly after managing both for years, Postgres doesn't scare the
shit out of me like MySQL does. I've had quite a few moments with MySQL doing
stupid things that aren't intuitive or right, particularly in the
backup/restore space. For example backup one schema and restore onto later
versions doesn't always work.

Postgres has the same feel as *BSD i.e. deterministic. It is well documented
and does exactly what the manual says and is devoid of surprises. It feels
engineered and I can provide reliably engineered solutions because of this.

Basically I can sleep at night.

I really give less of a crap about scalability. I'd throw a bigger box at the
problem or use heavy caching. I've built much larger ecommerce solutions with
orders of magnitude more hits/orders than sparkfun on much smaller kit

~~~
Negitivefrags
I have to say, my experience with Postgres is the opposite. Sometimes the
query planner will suddenly decide to pick a bad execution plan.

I have been woken up in the night multiple times and found that the problem is
that Postgres has arbitrarily decided to stop using an index and changed to do
a full table scan and the database has ground to a halt under the load.

Doing an ANALYZE sometimes fixes it. Increasing the statistics target for the
table in addition sometimes fixes it. One time we had to add an extra column
to an index to make it decide to use it even though it shouldn't have needed
it (and didn't a few hours before!).

The developers seem dead set against allowing you to override the query
planner to add determinism.

I'm not saying MySQL is any better, as I have not used that in production
before.

~~~
lucian1900
MySQL is much, much worse. It barely even has a query planner in the first
place and it will often ignore perfectly good indexes.

Query planner changing its mind can be a problem for most DBs, Postgres is not
special in this regard.

~~~
vidarh
But the difference with MySQL is that you can easily forcibly make MySQL use
the indexes you want it to, as long as it is possible.

With Postgres, while it usually seems to do a better job, when it refuses to
use your indexes it can be a massive pain to coax it into doing what you want
it to, even when you know it's making a bad choice.

Postgres has a number of issues like that, where ideological purity has a
negative impact on usability. It still mainly use Postgres these days, but I
regularly swear at it over things I know would be far simple than MySQL for
those reasons (replication used to be a huge one, though that is now finally
getting there, 8 years after I set up my first "large" replicated MySQL setup)

~~~
sergiosgc
Postgresql's explain command actually, while cryptic, contains the full
explanation behind the planners' decision. With that info, it's quite easy to
force it into a given path.

------
Mister_Snuggles
One of the links in this article goes into a little more detail of why they
switched:
[https://www.sparkfun.com/news/1239](https://www.sparkfun.com/news/1239)

    
    
      > See, MySQL and its forks, for all of the really huge 
      > and amazing stuff built on them in the last decade, 
      > have got some drawbacks. On this point, I risk triggering
      > a lot of strong feelings from the sort of people who 
      > have strong feelings about databases, so for the time 
      > being, let's just say that it started to seem like we'd
      > have a better time writing business software on a
      > database that embodied a more rigorous approach to 
      > formal types and relationships. And because of years 
      > of our nerd friends telling us we should probably just 
      > use PostgreSQL, it kind of seemed like PostgreSQL might 
      > be that database, especially since we already knew we 
      > were going to use something with open code.
    

The phrase "...we'd have a better time writing business software on a database
that embodied a more rigorous approach to formal types and relationships." is
what nails it for me. This isn't a technical feature, per se, of PostgreSQL,
it's more of an attitude that's embodied in the product.

------
duncan_bayne
One thing stood out for me in the article:

"As mentioned here before, SparkFun.com is really the tip of a much bigger
iceberg called Sparkle. Sparkle is an internal website where we hang every
tool we need. It does everything but the core accounting."

If you can share, what's the reason you folks went with this architecture?

Beyond DB choices, I'd be very concerned about a design that tied so many
important business functions into the one tech. stack, let alone the same
application.

I'd like to be able to do things like bring internal tools down for
maintenance without hosing the public-facing CMS. Or change database
technologies to suit whichever particular application I was optimising.

~~~
brennen
> If you can share, what's the reason you folks went with this architecture?

In its present state it's a fairly sane architecture for our purposes, but the
only honest & concise answer to this question is that we didn't know what we
were doing and it just kind of worked out that way.

Not coincidentally, this also describes the architecture of the business as a
whole.

~~~
duncan_bayne
Thanks for the reply - & I'm glad it's working out :)

------
lstamour
It sounds like they really wanted PostGIS. Though they never mentioned it. Or
how MySQL has spatial extensions.

Really, I'm at a bit of a loss as to why they switched, but congrats and good
luck. Glad to see another MariaDB mention. Just found Galera myself. Wish
WordPress supported Postgres.

~~~
Frencil
I presume you're recommending PostGIS because of the allusions to the
inventory location problem. While better modelling of geo data would be nice
it's kind of a different concern. For us inventory location isn't about where
on Earth our inventory is, or even where geometrically it is relative to
something else, it's about where in a hierarchy of named locations it is. For
instance: we have 10 of widget X in the storefront stock room, row F, rack 17,
shelf 3, bin 1. Or we have 22 of widget Y in the receiving room on the
quarantine table. Stuff like that. When modelling that many semantic nested
locations and all of the transfers of items between them the primary issue is
scalability and speed, not so much modelling geographic data.

~~~
lstamour
Thanks for the reply. I see in another comment thread you agreed with the
sentiment, "From my experience with PostgreSQL and MySQL, it is a bit like the
difference between python and php, one is a great work of craftmanship that is
reliable, predictible, coherent, enjoyable to work with and minimise the
wtf/mn rate (which is the best quality measure in software), while the other
is a bunch of hacks knit together to make it work asap and its wtf/mn is
skyscraping."

Makes sense. I guess I've always assumed that while MySQL has gotchas, so too
would database X. They come with the territory of databases, being such a
large part of any app, yet separate from it and usually infinitely more
complex than the apps that use them. To that end, I think it's why some people
end up writing their own simpler data stores or caching middleware between
databases/OLAP and front-ends. Such written-from-scratch solutions can be
easier to debug when things are mission-critical.

From what I've seen with databases, the really complicated part is "setting
them up correctly". If you start out with most defaults, something will
eventually become a gotcha forcing a schema rewrite or a database transplant
using an SQL dump. But that's just been my experience: with databases, expect
to not get it 100% right the first time (unless it's not your first time with
the technology, of course). And from that perspective, the technology used is
less important, it will equally have "gotchas".

That all said, I too will be playing around more with Postgres to see how it
compares to the InnoDB/XtraDB I've a hate/love relationship with. ;)

------
rpedela
So why did you migrate? There is a very brief mention of needing to scale an
order of magnitude or two. Could you elaborate?

~~~
brennen
Awright, so I didn't write the linked piece, but Chris isn't talking about
scaling our web traffic by an order of magnitude. We're a small-to-mid-size
web store & light manufacturer and we do decent business and very occasionally
get a bunch of traffic as far as mid-sized web stores go. We're not a big
service provider, and we're never going to be one in the sense where we have
to worry about scale the way that people with Real Problems do.

We do plan to store a bunch more data than we deal with now, but mostly we
migrated because we're looking to build a better ball of mud, to borrow a
perfectly legitimate analogy from elsewhere in the thread. What we'd like to
scale somewhat cleanly is a business which benefits from us building less
broken applications, and PostgreSQL is obviously a less broken foundation for
the kind of software where it matters if you are correct about things.

If anyone would like a reason to let us know at length what incomparable
buffoons we must be for any of writing PHP, rolling our own application
framework, still owning physical hardware, or getting a little overexcited
when the code doesn't break too badly, I will cop to all of those and more in
public, but please don't read the linked post as a strong technical claim
about Hard Problems in the Land of Webscalonia which demands & merits a
detailed refutation. Posts on sparkfun.com don't tend to be written for a web
nerd audience (we sell blinkenlights) and most of us survey a very different
conceptual landscape on a day-to-day basis from the average HN reader.

(For clarity here, I write web stuff at SparkFun, and most of my time lately
has been on this project. Author of linked piece is my boss.)

Edit: And my apologies if this reply to a rather simple question comes off a
bit reactive; consider this my reply to the thread as a whole?

~~~
falcolas
Wait...

> PostgreSQL is obviously a less broken foundation for the kind of software
> where it matters if you are correct about things.

and

> writing PHP, rolling our own application framework

In the same explanation? Seems like you still have some points where you can
make serious gains in your technology stack that would have bought you more
tangible benefits albeit not as much positive publicity.

~~~
brennen
Thanks. I never know I'm actually in a conversation about how I do my job
until this comment shows up.

~~~
falcolas
In my honest opinion, if you choose to change to PostgreSQL because it's "less
broken" but still use PHP as your front end, I feel completely justified in
asking WTF. Sure, you're securing the barn doors nicely, but the rest of the
barn might as well be made of tissue paper.

It is your job, and it's not my place to comment about how to do your job; but
when making such a public announcement, 'less broken' when wrapped with PHP
and custom frameworks isn't a very compelling message.

~~~
brennen
You can always ask WTF. What else is the internet for? I'll try to give you a
face-value answer.

If we could just sort of rewrite all of our code in a nicer language than PHP
(taking it as read that most of us can think of one or two of those) with a
commensurate effort, and the idea somehow just hadn't occurred to me until I
read your comment, I probably would have started the project over my morning
coffee.

In the real world, we have a bunch of existing software, a business to run on
it, and a well-developed understanding of the pitfalls of the language and
environment. There's no conceivable justification for the amount of effort it
would take to rebuild our stuff from scratch in one fell swoop. PHP is a pain
in my ass, I yield to almost no one in my visceral contempt for PHP, but PHP
is not my _problem_ , and it hasn't been for a long time.

What we can do, with a reasonable investment of time and energy, is move more
of our business concerns into a more rigorous data store, and explore building
new features against the data we model well using other languages and tools.
Right now we've got various bits of tooling written in (at least) Ruby, Go, C,
Perl, R, Python, and Node.js. Maybe someday we'll really zone in on something
and figure out that it's a good time to start slicing off chunks of
application and rebuilding them in [better language x] until the PHP has
evaporated. Maybe we'll decide that everything should live in [third party
system x]. Maybe we won't bother with any of this, because PHP gets a little
less heinous with each passing release, we already know what we're doing with
it, and there are bigger fish to fry.

Reasoning by analogy is frequently suspect anyway, but in your picture, I'm
not sure how databases are barn doors. They tend, at least for the kinds of
things we're doing, to be much more fundamental to the framing of the barn. Or
maybe they're the stuff stashed in the hayloft. I dunno. At any rate, I don't
think it's super coherent to argue that all the things must be fixed at once
in order to justify improving one of the things.

~~~
falcolas
> At any rate, I don't think it's super coherent to argue that all the things
> must be fixed at once in order to justify improving one of the things.

I don't disagree with anything you said here, especially this statement above,
but it seriously undermines the blog post and makes it sound more like
"PostgreSQL is the latest hawtness, so we're going with that and grandstand
about it for publicity".

I guess that's my real problem. It makes the whole thing sound like
rearchitecting for the sake of publicity, instead of to address an actual
issue.

------
jeffdavis
I don't like the "right tool for the job" analogy. When you use a hammer to
put in a nail, the hammer isn't a part of the building when you're done with
it. But when you choose a DBMS, you're stuck with it in your architecture for
a long time and it's a huge cost to migrate.

(I'm a postgres community member, so of course I'm glad it's working out for
them, but I just don't like the analogy.)

~~~
Snowda
I think its less hammer more nail vs screw. Both can hold wood to a wall and
will be part of the building when you are done. For example a nail can
splinter the wood which can cause a break down the line while a screw wouldn't
encounter such an issue.

~~~
girvo
That goes even further, a nail gun can be much faster than a drill when doing
large construction, but is more dangerous and easier to hurt yourself or mess
up with.

Ie. MySQL is easier to get started with (well, was, nowadays it's much of a
muchness), and is easier to scale in some situations but you trade for some
other features and useful things.

Screws and nails are both useful, but I wouldn't just rely on one for every
project :)

------
mkesper
As a sidequestion, what's a good ressource for getting into the postgres
world?

~~~
adamzegelin
The PostgreSQL documentation.
[http://www.postgresql.org/docs/9.3/static/index.html](http://www.postgresql.org/docs/9.3/static/index.html)

The introduction tutorial is concise but more than sufficient.

The reference guide is well written, laid-out and covers everything you need:
the database server, its configuration and tools, the SQL language, its
operators, data types and grammar, the built-in procedural languages, the low-
level client library, and all the database internals (system tables, data file
formats, network protocol, query optimiser, indexes, etc).

~~~
brennen
I'd second this. I've pretty much fallen in love with the official docs over
the last few months. They remind me of perldoc in terms of clarity and general
quality.

The mailing lists also seem to carry a lot of good material.

------
gnur
As a hobby-projects-only developer, would there be any reason to switch from
mariaBD to postgreSQL? Scalability is no issue for me (I don't have to scale).
And lately I hardly write any queries myself because of django..

~~~
andybak
Yep. South is much happier with postgrSQL as it can do schema changes in a
transaction and roll them back if there's a problem.

MySQL will just leave you with a half completed schema change which you'll
have to fix by hand.

~~~
gnur
Ah good to know, doing my research also tells me Django 1.7 will include
migrations out of the box with schema change transactions as well. Thanks!

------
icebraining
I have to wonder why are they trying to build an homegrown ERP system.
Inventory location is a solved problem, readily available as Free Software.

~~~
Frencil
Inventory location is one of many, many problems our ERP system is tasked
with. And we've looked at a _lot_ of software packages over the years, both
proprietary and open. In terms of open software the options are actually
pretty limited and have the potential to introduce more problems with
integration and customization than they would purport to solve by bringing in
new suites of features. What is less limited and truly abundant are examples
of other companies in our same position that went full-bore into an off-the-
shelf solution and ended up severely disrupting their business operations and
aborting transitions.

~~~
icebraining
Well, I'm biased since I'm paid to build those systems (we are OpenERP
partners), but if it works for you, I won't argue ;)

~~~
brennen
If nothing else, it's probably time we give OpenERP another look just to get
an idea of how problems are being handled there. Pointers to useful reading
would be welcome, if you've got any.

~~~
icebraining
The best introduction is probably _Integrate You Logistic Processes with
OpenERP_ , a free ebook written by OpenERP S.A. themselves[1]. It's slightly
outdated, since it was published in 2011 and meanwhile version 7.0 has been
released, but the changes to the stock area where limited (except for the UI).

We have a demo instance for version 7.0 [2], but it's in Portuguese (though
you can change the language). OpenERP S.A. offers a free trial.

[1] [http://www.brain-tec.ch/ebusiness-de-2/openERP/logistic-
proz...](http://www.brain-tec.ch/ebusiness-de-2/openERP/logistic-
prozesses.pdf)

[2]
[https://demo7.thinkopensolutions.com/?db=demo&user=demo&pass...](https://demo7.thinkopensolutions.com/?db=demo&user=demo&password=demo)

