
Urban Airship: Postgres to NoSQL and back - samstokes
http://wiki.postgresql.org/images/7/7f/Adam-lowry-postgresopen2011.pdf
======
donw
Maybe this is because of my Ops background, as I was a sysadmin for close to a
decade, but I don't understand the aversion to buying actual hardware.

When you're starting out, you go with something on the cloud -- EC2,
Rackspace, Softlayer, Linode, or whatever cheeses your hamburger.

That's just common sense. There's no reason for a fresh web or mobile startup
to have tens of thousands of dollars of server inventory, bought for the
anticipated rush of people that want to get in on the 'social synchronized
tuba-thumping' craze.

But when you hit the point where you're spending $10k a month on cloud hosting
_and_ your team is having to put out performance fires thanks to the shared
hardware, why not just buy servers and throw them in a datacenter?

Amortized over a year, you'll spend a lot less on capital (network and
hardware), and a lot more on labor, as you'll need at least one more team
member to handle the on-call rotation load.

Apologies if this is rambling -- it's about 7:30am here, and I'm a bit low on
sleep. :)

~~~
LeafStorm
Maybe datacenter companies could offer bare-metal cloud hosting. They rent out
the servers and keep the hardware physically working; your devops team handles
all the software aspects.

~~~
gtuhl
Softlayer offers this as well. You can spec out a box however you want, up to
pretty exotic configurations, get billed monthly, and SSH into it in 3-4
hours.

It is expensive, but not as expensive as something like Heroku or AWS for
equivalent cores/RAM, and you get real hard drives.

~~~
stoph
We are using Softlayer, and we found that on our server we were not
bottlenecked by memory, CPU, or disk I/O. However, we were bottlenecked by our
bandwidth, which sat around 5 MB/s. Perhaps we could upgrade to a higher-
throughput network card or something (I don't maintain our servers), but we
ended up just offloading static files to a CDN instead.

~~~
gtuhl
That does seem odd. I'd file a ticket and see what they have to say about it.

In certain datacenters (I think it might be dallas05?) you can even get a
10Gbps uplink brought to a box, so they have pretty big pipes available.

~~~
nivertech
Where can I get dedicated or cloud servers with one or multiple 10GbE NICs?

~~~
gtuhl
I just poked around with some of our servers to verify.

In Softlayer's Dallas facility (selected when ordering a server) 10 Gbps is an
option.

The maximum amount of bandwidth available is 20TB.

Sometimes you can get more options by contacting their sales people directly
instead of using the shopping cart.

------
Spyro7
This is a good presentation, and a solid demonstration that there is no magic
bullet for database scalability problems.

I love NoSQL as much as the next person, but turning straight to NoSQL when
you are faced with scalability problems in a more conventional relational
database is always going to be a mistake. Before you port everything over to a
NoSQL db, try dropping the ORM (or find another one), looking at your table
structures, or tuning your indexes. If you do this, there is a strong
possibility that you will save yourself some time, energy, and effort.

It pays to think deeply about your issue. NoSQL should be another tool in the
toolkit, and not the hammer to be used to drive all of your database issues
into the wall.

When you turn to a NoSQL solution, you should not be doing so with a mind to
find a "magic bullet". You should be doing so because you have thought deeply
about the problem and have found that a NoSQL solution answers a specific
need.

If everyone used NoSQL only when it is appropriate to do so, then there would
be fewer "horror stories" and more illustrations of valid use cases than we
have today.

~~~
julian37
_try dropping the ORM (or find another one)_

Dropping or replacing the ORM is a big undertaking in anything but small toy
projects.

Better advice is to run a profiler and crank up the ORM's verbosity in order
to determine the extent of overhead imposed by the ORM, and to use something
like pg_stat_statements and EXPLAIN ANALYZE (in the case of Postgres) to find
slow statements and see why they are slow. This will give you a much better
idea of where time is spent, to what extent things can be optimized, and
whether the ORM is to blame for any performance issues.
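
For instance, with SQLAlchemy you can log every statement the ORM emits while
pulling the worst offenders out of pg_stat_statements. A rough sketch (the DSN
and table name here are made up; pg_stat_statements must be installed, and its
column names vary across Postgres versions):

    # Surface slow queries before blaming (or absolving) the ORM.
    from sqlalchemy import create_engine, text

    # echo=True logs every SQL statement SQLAlchemy emits.
    engine = create_engine("postgresql://app@localhost/appdb", echo=True)

    with engine.connect() as conn:
        # Requires the pg_stat_statements contrib module to be loaded.
        worst = conn.execute(text("""
            SELECT query, calls, total_time
            FROM pg_stat_statements
            ORDER BY total_time DESC
            LIMIT 10
        """))
        for row in worst:
            print(row)

        # Then dig into an individual offender with EXPLAIN ANALYZE.
        plan = conn.execute(text(
            "EXPLAIN ANALYZE SELECT * FROM devices WHERE app_id = 42"))
        for line in plan:
            print(line[0])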

------
iamelgringo
An EC2 XL with 15GB of memory is roughly $500 a month.

A 90 GB database fits in a single $150 SSD. You can get 1 TB of SSD storage
for $3,000.

"One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz
2007 Opteron or 2007 Xeon processor. This is also the equivalent to an
early-2006 1.7 GHz Xeon processor referenced in our original documentation. "
[1]

You can get a 6-core AMD Phenom II which runs at 3.2GHz for $180.

16GB of RAM will set you back $400.

From the sound of what they went through, spending $10k on decent hardware
might have saved them a man-year or two of developer time.

Granted that's not nearly as fun or sexy as trying to use MongoDB, Cassandra
or HBase in production. And, saying that you're going to use actual hardware
is soooo old school.

ref: [1]
[http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_...](http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it)

~~~
robotadam
You're very right; I should have pushed us to move to physical hardware
sooner. Definitely one of the things that I would have done differently, in
hindsight.

The larger EC2 instances (especially for always-on systems like primary
databases) do get quite a bit cheaper with 1-year reserved instances, so if
you are on EC2 be sure to get those as soon as you're at a somewhat stable
point.

~~~
catch23
I think the introduction of the SSD is causing site owners to "leave the
cloud". I haven't seen any cloud hosting that lets you choose SSD-based disks
yet, probably because it's low demand and high expense. The throughput of a
single SATA 3 SSD blows away any kind of RAID setup you can accomplish on EC2.

Also, I don't think you even need $10k in hardware. Sounds like you could do
just fine with a $3k 1U server. $3k can still get you 16 cores w/ 32GB memory
and SSD drives.

~~~
jacques_chester
SSDs make a _stunning_ difference in performance for large, complex joins that
won't fit in RAM. However, I imagine that in 5-10 years' time 'the cloud' will
use them too, even if only as part of a larger storage pool.

------
zobzu
I used PGSQL on a similarly performance-intensive project, as nothing else
came close while retaining reliability. I had the PGSQL devs help me fine-tune
some aspects; they're extremely helpful and knowledgeable, and that is NOT
something I have found even at commercial companies. What I mean by that: I've
never had such good help for highly technical issues, ever. And this one was
free and fast.

------
jamwt
I wouldn't be so quick to throw the baby out with the bathwater. They just
made the wrong choices--Riak is fantastic. Having said that, it's incredibly
easy to make the wrong choices right now:

1\. The "NoSQL" field is generally immature and filled with land mines, so a
random selection favors a bad outcome

2\. The mass-appeal rank ordered reputation of these projects is not in line
with their actual quality/robustness, especially when it comes to very high-
demand situations (node counts are high enough to render node failures common,
write loads are heavy, machines are within 50% of their I/O limit).

So, I wouldn't let these reports discourage you too much about the
possibilities of NoSQL, especially the scaling possibilities of fully
distributed databases. It really can be a game changer with a solid
implementation. Riak is one.

(Note: "wrong decision" is not meant to be judgmental, we (bu.mp) are
ourselves just recovering from a similar "wrong decision"... VERY similar. :-)

Agree completely re: bare metal. If you have scaling problems you should have
a very good reason not to be doing dedicated hosting on bare metal IME.

~~~
jeffdavis
"It really can be a game changer with a solid implementation. Riak is one."

Can you write up a case study? What did Riak do right where the others failed?
What are the downsides of it, and why are those downsides more tolerable than
the downsides of other systems?

~~~
pvh
Seconded!

------
dhpye
For a notification-heavy app, I'm surprised at the lack of consideration given
to Queue/Topic Messaging servers. 5B notifications/month isn't quite 2k/s,
which really isn't all that onerous. Routing within messaging apps would seem
like a natural fit.

Or perhaps I'm missing something?

~~~
robotadam
We make heavy use of queue and internal messaging systems. This discussion was
solely about the data storage layers, which always cause us more headaches
than the messaging/queuing systems.

~~~
moe
You should check out redis if you haven't yet.

It's ideal for the counter use-case, and with a bit of smart thinking you can
fit a surprising amount of data into a 32G or 64G machine (and then you can
always shard).

------
apinstein
Just a quick datapoint for you on EC2 vs SSD... we tried restoring a 30GB
pgsql database to a Large EC2 with RAID0'd EBS stores for testing purposes. It
took ~4 days.

I tried doing the same thing on a Macbook Pro i7/SSD and it took ~1 hour.

EBS disk performance is reliable, but miserable.

~~~
moe
_EBS disk performance is reliable, but miserable._

EBS performance is unreliable and miserable. There, fixed that for you.

However, EBS also has a couple things going for it. Namely: easy snapshots,
fast/seamless provisioning (need another 1T?), mobility (need that volume on
another instance?) and quite a fair price point when you take all that into
account.

~~~
whacker
_easy snapshots, fast/seamless provisioning (need another 1T?)_

All of which are doable with LVM.
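
E.g. a copy-on-write snapshot is a single command, assuming the volume group
has free extents. A sketch (the volume and snapshot names are illustrative):

    # LVM copy-on-write snapshot driven from Python; assumes a
    # volume group "vg0" with a logical volume "pgdata" and
    # enough free extents. All names are illustrative.
    import subprocess

    subprocess.run(
        ["lvcreate", "--snapshot", "--size", "10G",
         "--name", "pgdata-snap", "/dev/vg0/pgdata"],
        check=True,
    )
    # ...back up from /dev/vg0/pgdata-snap, then drop it:
    subprocess.run(["lvremove", "-f", "/dev/vg0/pgdata-snap"], check=True)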

~~~
moe
Sure, within the boundaries of your disk array. Assuming a non-trivial size,
you'll need 1-2 full-time people to run and scale that array, plus a nice
chunk of investment in hardware.

EBS is just a mouse-click.

~~~
count
If you are so unconcerned with your storage that you're just mouse-clicking
around EBS setups, you don't need 1-2 full-time people to run an array.

~~~
moe
Sounds like you've never dealt with storage at scale.

It's far from "rack and forget" once you've grown beyond 2 racks.

~~~
count
I think you read the opposite of what I was trying to say.

I have dealt with storage 'at scale' enough to require a full team of storage
people.

Some guy clicking buttons on the Amazon website is usually not a very
effective way of dealing with storage 'at scale'.

~~~
moe
Yes, sorry for reading you backwards.

However, I maintain that EBS has its place in a wide range of applications
where raw I/O performance isn't the main concern (e.g. archival).

------
zzzeek
Not to mention they're using one of those nasty "O-R-M"s we all were told were
an antipattern yesterday.

~~~
pbh
SQLAlchemy (and its excellent documentation) is the reason I switched to using
an ORM.

(Now that I am in Ruby land, though, I am a little sad that
arel/ActiveRecord/DataMapper do not seem as on top of their game.)

~~~
ssmoot
You should try Sequel. It does things AR and DM don't (composite primary and
foreign keys, for example), is generally faster, and gives you the flexibility
of working with models or raw datasets depending on what the situation calls
for.

I've used and written a number of O/RMs and Sequel is definitely the best.

~~~
pbh
This is really helpful. I will definitely check it out.

Also, I'm really amused that apparently the creator of DataMapper is (a)
selling me on using Sequel and (b) the testimonial on their front page. (Not
that there's anything wrong with that.)

~~~
bluehavana
Sequel and DataMapper have different purposes, I think. Sequel is a lot more
transparent in the translation between the ORM and SQL, and allows more
powerful control over SQL statements. DataMapper allows some very powerful
control on the ORM and Ruby-syntax side.

------
megaman821
One of the easier things to do is pull out all the counters and sessions and
put them in Redis or Riak. No use in tying up your database doing things it is
not well suited for anyway.

~~~
sehugg
Yes, Redis rarely keeps me up at night, besides making me worry about what
I'll do when I run out of RAM :)

~~~
antirez
In a "just counters" scenario you can't imagine how much data a single Redis
server can hold! It is sensible to understand when it is a good idea to use a
database that holds data in memory, as there are obvious limits (that are are
trying to break in a comfortable way with Redis Cluster instead of forcing
client-side sharding), but on the other hand it is good to do some math before
being worried.

Example: Redis can hold 1 million counters, every counter stored in a
different key, for every 200 MB of memory. This means 5 million counters per
gigabyte. Clearly you can have a lot of counters. But you can make this figure
a lot higher if you aggregate counters using hashes: with this trick, when
applicable, the above figure will be about five times bigger.
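
The trick works because Redis stores small hashes in a compact encoding (tune
hash-max-zipmap-entries in redis.conf to match the bucket size), so per-key
overhead is paid once per bucket instead of once per counter. A rough sketch
of the bucketing with redis-py (names invented):

    # Bucket many counters into one hash so Redis can use its
    # compact small-hash encoding. Names here are invented.
    import redis

    r = redis.Redis()
    BUCKET = 1000  # counters per hash; match redis.conf threshold

    def incr_counter(counter_id, amount=1):
        key = "counters:%d" % (counter_id // BUCKET)  # which hash
        field = str(counter_id % BUCKET)              # slot inside it
        r.hincrby(key, field, amount)

    def get_counter(counter_id):
        val = r.hget("counters:%d" % (counter_id // BUCKET),
                     str(counter_id % BUCKET))
        return int(val or 0)

    incr_counter(1234567)
    print(get_counter(1234567))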

~~~
mike_esspe
Can you give a hint as to how many distinct counter increments I can expect
per second with Redis?

~~~
antirez
100k increments per second on an entry-level Linux box. 50k increments on a
decent EC2 instance. 150k - 180k increments per second on a fast, modern Linux
box.
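
If you want to sanity-check that on your own hardware, a crude redis-py loop
will do; note that pipelining (batching commands per round trip) reports much
higher numbers than a one-command-per-round-trip client would see:

    # Crude INCR throughput check with redis-py. Pipelining
    # batches commands per round trip, so this overstates what
    # a one-command-at-a-time client will measure.
    import time
    import redis

    r = redis.Redis()
    N = 100000

    start = time.time()
    pipe = r.pipeline(transaction=False)
    for i in range(N):
        pipe.incr("bench:counter")
        if i % 1000 == 999:
            pipe.execute()
    pipe.execute()
    elapsed = time.time() - start
    print("%d increments in %.2fs (%.0f/s)" % (N, elapsed, N / elapsed))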

------
jerrya
This was really interesting, thank you.

If anyone reads this, note that in Opera at least, clicking the built-in PDF
reader's down button skips some very informative notes that Adam left in the
presentation. Press Page Down instead.

~~~
Fizzer
Oh, wow, I missed those comments completely since I was using the arrow keys
to navigate in Chrome. Thanks for pointing this out.

------
robotadam
Thanks for the kind words, everyone. When I was writing the talk I was worried
it'd be too tied to our/my experience, but at least people found it a little
entertaining. The video should be posted soon; the Q&A was interesting and
brought up some pieces that I glossed over during writing.

------
absconditus
I was hoping that Urban Airship actually made urban airships.

~~~
drats
Good to live in an age where this assumption is perfectly reasonable. Here's a
link to salve your disappointment (FPV remote control planes, music
soundtrack):

<http://www.youtube.com/watch?v=MUGvlMDRC98>

------
krisw
It's fascinating to read about another company that evaluated NoSQL and came
to the same conclusion we did internally.

I love the idea of NoSQL, but Cassandra was horrible (just look at their
source code or Thrift) and Mongo lost data. I guess 40 years of relational
databases isn't so easy to replace.

~~~
eonnen
To be clear, we are still heavy users of Cassandra. We try to objectively
match the tool to the problem; in some cases PostgreSQL was a better fit, but
not all. In some cases HBase was a better fit.

Also to be clear, scaling is difficult no matter what the tool. We've had
problems with Cassandra, HBase, and PostgreSQL (most recently Friday); no
storage option is as good as we would like under stress.

~~~
robotadam
I definitely could have been clearer on that. Cassandra has so many great
properties, but when we made the decision to use Postgres for the large
dataset in question, it was shortly after 0.7 was released, and that release
took a while to get more stable.

------
cmer
A little bit unrelated, but as of _today_, what would be the advantages of
choosing pgsql over mysql for someone who already has decent knowledge of
mysql? Is it still a matter of preference (and being able to deal with the
mysql license), or did pgsql progress so much over the last 3-4 years that it
became superior? What are the pros over mysql?

I'd consider switching to pgsql for a future project, but throwing my mysql
expertise out the window for nothing seems like a big waste of time.

~~~
alexg0
This has a pretty decent, if dated, list:
<http://sql-info.de/mysql/gotchas.html>

Few of the items on that list bit me personally.

~~~
cmer
I'm more interested in knowing what PgSQL does better than MySQL today, in
2011. MySQL definitely has a few caveats; my question is mostly whether PgSQL
is now significantly better than MySQL, or whether they're both more or less
the same, with different pros and cons.

~~~
leandrod
No, PostgreSQL is way better.

It has a mostly compliant SQL dialect.

It is more compatible with other DBMSs, especially DB2 but also Oracle.

It implements a host of procedural programming languages and extensions,
including types and operators.

It is way faster, and scales way more.

It is totally free (no paying Oracle to do InnoDB hot backups).

It is way more consistent.

Its handling of geographic data is second to none.

Its development is totally open and way faster. Reading the PostgreSQL TODO
wiki page is a joy.

And so on and so on.

------
mattslight
I can summarise this presentation as:

      * Postgres 8.3 on AWS not good enough
      -> let's try Mongo
      * Mongo performance not good enough
      -> let's try and fix this
      * Oh shit, we don't know what we're doing, let's go back to Postgres
      * There's a new version 9.0, this might sort us out
      * No, sadly Postgres 9.0 is no better
      * Oh wait, maybe Amazon performance is to blame
      * Let's try it on real hardware
      * Oh yes, that's better

------
asomiv
If only there were tools for sharding PostgreSQL. Easy sharding is still a
major weakness of relational databases.

~~~
moe
SQLAlchemy (the ORM that they're using) actually comes with some nice sharding
hooks and examples (sketched below).

I used to complain about the lack of "out-of-the-box" sharding middleware
myself, but came to realize that this is really something that belongs in the
application layer for all but the most trivial use-cases.

However, there is no reason why full-stack frameworks such as Django and Rails
couldn't grow meaningful sharding support.
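
For reference, the SQLAlchemy hook is ShardedSession; you supply the routing
logic yourself. A bare-bones sketch (two shards with made-up DSNs, routing on
a hypothetical user_id attribute):

    # Bare-bones horizontal sharding with SQLAlchemy's
    # ShardedSession; the DSNs and user_id routing are made up.
    from sqlalchemy import create_engine
    from sqlalchemy.ext.horizontal_shard import ShardedSession
    from sqlalchemy.orm import sessionmaker

    engines = {
        "shard_0": create_engine("postgresql://app@db0/appdb"),
        "shard_1": create_engine("postgresql://app@db1/appdb"),
    }

    def shard_chooser(mapper, instance, clause=None):
        # Route writes by a stable attribute of the object.
        return "shard_%d" % (instance.user_id % 2)

    def id_chooser(query, ident):
        # Given a primary key, list the shards it might live on.
        return ["shard_0", "shard_1"]

    def query_chooser(query):
        # Without more context, fan the query out to every shard.
        return ["shard_0", "shard_1"]

    Session = sessionmaker(class_=ShardedSession)
    session = Session(
        shards=engines,
        shard_chooser=shard_chooser,
        id_chooser=id_chooser,
        query_chooser=query_chooser,
    )

The custom logic zzzeek mentions below is exactly those chooser functions:
anything smarter than fanning out to every shard is on you.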

~~~
zzzeek
We've got a rudimentary hook, sure, but it's not what I'd call "easy"... it
still needs to be attended to with custom logic any time you need to cross
shard boundaries.

~~~
moe
I've been there, so can confirm it's not "easy".

But at least it gave me a starting point, which is already more than most
other ORMs can offer.

------
ww520
The last slide mentions that automatic failover is still needed. For that,
look into DRBD + Heartbeat on Linux. They let you build an automatic failover
cluster easily.

In a past life, I built a highly available MySQL cluster using that combo, and
it worked great. The same technique can be applied to Postgres.

------
jbellis
From the slides it's not 100% clear, but as Erik from Urban Airship commented
here [1], "[they] are still heavy users of Cassandra."

[1] <http://news.ycombinator.com/item?id=3015475>

------
moe
I was a bit disappointed to see the HBase slide blank.

It has some interesting properties, and it's rare to find real-world reports
about how it fares outside of Hadoop.

~~~
robotadam
Sorry about that -- I couldn't speak as much about our HBase experience, but I
can speak to the toll it's taken on our ops team so far, which is very high,
with lots of crashes.

~~~
eonnen
To clarify, we've had lots of region server crashes, mostly due to our own
data model and generally not as a result of any intrinsic fault of HBase. To
my knowledge, not many of these (any?) have actually resulted in a total HBase
failure. The system generally degrades as it should.

------
jforman
How many startup man-hours were spent tracking that Ubuntu bug down on
Cassandra instances? I've heard several nightmarish stories.

------
philwise
One thing to note is that dedicated server suppliers will foot the capital
cost of servers for you, so there is no need to find the cash for the server
up front. 100tb.com do this, for example (I've not used them, but I know
people who do).

------
bartz
What I really didn't understand is why they even decided to switch to NoSQL in
the first place. I mean, Urban Airship seems like the perfect example of data
that is completely relational and schema-driven.

------
whacker
Direct link to the PDF:

<http://wiki.postgresql.org/images/7/7f/Adam-lowry-postgresopen2011.pdf>

------
foobarbazetc
Real hardware was the answer?

Really guys? :)

My recommendation: switch to SoftLayer. Save thousands a month, use real
hardware.

~~~
ericd
I love SoftLayer (I use them), but I wouldn't say they're the cheapest by any
stretch of the imagination.

------
pointyhat
tl;dr: "PostgreSQL is still my weapon of choice — it doesn’t keep us up at
night."

I agree with the guy. PostgreSQL (and SQL Server, ironically) have kept me up
less than any other piece of software out there.

I'm watching a turd occurring with MongoDB at the minute. Every time something
breaks, I hear the team saying "it'll be fixed in the next version" or
"feature X works around it for now". Grr. Too much marketing, not enough
reliability study.

~~~
sheff
I believe this was one of the talks at Postgres Open - I came across the
slides from most of the others earlier at
<http://wiki.postgresql.org/wiki/Postgres_Open_Talks_2011>, and they have some
good tips on scaling with Postgres.

There was one interesting tidbit in one of the Heroku presentations: they
apparently run over 150,000 databases
( <https://wal-e-pgopen2011.herokuapp.com/#6> ), which is incredible.

~~~
robotadam
The Heroku guys are doing some seriously awesome things with Postgres. I can't
wait to see where they go with it; the whole community will benefit, I'm sure.

------
startupcto
I think I'm missing something from the presentation slides. Why would an API
call result in 5 million writes? I think he is trying to give a scenario of an
API call that creates a rich push for 5M devices. Even queuing the writes is
not going to solve the problem.

Fundamentally, I wouldn't have designed the messaging model that would require
a write for each client device for each message.

~~~
tzs
Perhaps they have to record if/when each client got the message. How would one
record that without one write per client?

~~~
drblast
It seemed odd to me too that a single API call could cause that much work.
Granted, I have no idea how the actual app works. But one way to do this is to
have clients belong to groups, then record if a group got a message (few
writes).

Then for each client pull the groups the client belongs to and look for new
messages (possibly more reads than storing the messages on a per-client
basis).

It's an extra database table and layer of indirection, but that's one way.
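
Sketched out, it might look something like this (schema and names are
invented for illustration; I have no idea what their real design looks like):

    # Group-level fan-out: one write records a send to a whole
    # group; clients find new messages via their memberships.
    # Schema and names are invented for illustration.
    import psycopg2

    conn = psycopg2.connect("dbname=pushdb")
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE group_messages (
            group_id   bigint      NOT NULL,
            message_id bigint      NOT NULL,
            sent_at    timestamptz NOT NULL DEFAULT now(),
            PRIMARY KEY (group_id, message_id)
        );
        CREATE TABLE client_groups (
            client_id  bigint NOT NULL,
            group_id   bigint NOT NULL,
            PRIMARY KEY (client_id, group_id)
        );
    """)
    conn.commit()

    def send_to_group(group_id, message_id):
        # A push to 5M devices becomes one row, not 5M rows.
        cur.execute(
            "INSERT INTO group_messages (group_id, message_id)"
            " VALUES (%s, %s)", (group_id, message_id))
        conn.commit()

    def new_messages(client_id, since):
        # The read side pays instead: one join per client sync.
        cur.execute("""
            SELECT m.message_id
            FROM group_messages m
            JOIN client_groups cg ON cg.group_id = m.group_id
            WHERE cg.client_id = %s AND m.sent_at > %s
        """, (client_id, since))
        return [row[0] for row in cur.fetchall()]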

------
jerhewet
"I laughed, I cried, I experienced every possible human emotion! It was better
than 'Cats'!"

