
Ask HN: Going from Redis to MySQL. Good idea? - gary4gar
We are building a SaaS backend for restaurants using Rails. We integrate directly with POS, so each POS keeps sending customer orders that we store for later processing. we have this POS integration going at about 1,000 locations which send us about 3 Million individual customer orders on monthly basis.<p>for this write-heavy app, we store all orders in redis which is working beautifully. We are growing at incredible pace, we keep adding new restaurants with hundreds of locations that keep sending us crazy amount of data. Except there is one problem -- redis keeps running out of memory every month! As, everything which doesn&#x27;t have to be in memory is in memory.<p>This is why we contemplating to switch to mysql. As we really don&#x27;t need to keep all data in memory. here are we numbers of current redis database:<p><pre><code>      used_memory_human:39.83G 
      dbsize: 34706870
</code></pre>
Here is what we store in redis as Hash:<p><pre><code>      id - integer
      location_id - integer
      stored_at  - timestamp
      token - string
      transaction_no - integer
      menu_items - string(comma seprated list of all menu items that customer ordered along with their price &amp; Qty)
      order_amount - decimal
      order_subtotal_amount - decimal
      order_amount_payable - decimal
      order_datetime - timestamp
      employee_id - integer
      employee_name - string
      pos_type - string
      post_version - string
      restaurant_id - integer
      
</code></pre>
So, looking for some advice on:<p>1. moving from redis to mysql is good idea? how will it effect us in long run as we will need to keep updating our indexes &amp; partition scheme to cater to huge demand.<p>2. What other databases(relational or non-relational) would be suited for this use case than redis?<p>3. Or we are all wrong, as redis is made for storing this type of data. so, we just keep using redis &amp; upgrading our machines every month?
======
justinsb
Yes, I think it's a great idea.

First of all, 3 million rows a month is not that much from a database point of
view - if evenly distributed (though I'm sure it's not), it would be about 1
per second. (It is impressive to me from a business point of view though). You
don't need Redis's throughput, and you are hitting Redis's limits with respect
to memory.

1) This is _exactly_ the use case which relational databases were built and
optimized for (fixed schema, OLTP, presumably some analytic queries). You
might want to normalize the data a little (e.g. the menu_items, employee_name,
pos_type & pos_version), but you don't have to - though this would enable
faster & richer querying.

On your schema changes: I bet you'll be much happier using a relational
database which takes care of this stuff for you, then you will be implementing
it yourself in Redis. You probably won't need partitioning either. If you do
end up needing partitioning, I would guess your DB is "trivial" to partition
by restaurant.

I suspect you'll also find that a relational database stores the data more
efficiently, particularly if you normalize a few things. Your data will
probably fit into RAM again, but a relational database can cope even when it
doesn't.

2) Any relational database would be a great match for this, and would allow
rich querying (e.g. by employee, by date, by restaurant). My personal bias is
to prefer Postgres.

3) I don't see what Redis is getting you here. It doesn't support rich
querying, is memory bound, and you don't need the throughput it promises.

------
thinkingkong
I'll try and answer your 3 questions.

1\. As with most things it depends, but I'll make the mistake of assuming you
don't need all of this data in memory and you won't be iterating over the
entire set of data on a regular (<1m) basis. In that case, it most likely
makes sense to move to a relational database. You should ask yourself what
kind of queries you're running on the data and pick something appropriate. But
for the data you're listing, then yeah - go with a relational DB.

2\. Again "It depends". Postgresql is the most popular alternative to MySQL
and I'd say at this point is more in vogue with the developer community. It's
supported on AWS RDS and Heroku has a great postgres.app for OSX. Your data is
so simple that I'd encourage you to start there. In the future when you have
specific needs then you can research more appropriate database technologies.

3\. From what you've shown us then you should most likely move to a relational
database. The size should shrink dramatically, too.

~~~
justinsb
If you're iterating over the entire dataset regularly though, you should
_definitely_ use a relational database over Redis, because a relational
database lets you push processing onto the server. With redis you'll have to
fetch every row over the network; with a SQL database you'll typically only
retrieve the final summary data.

The OP has 40GB of data, so to query that even once per minute with Redis
would effectively max out 10Gb ethernet.

Of course, the real win is that something that would be a table scan in Redis
could be a indexed query in SQL. Doing a table scan over 40GB of data is
always going to be painful.

~~~
resca79
>With redis you'll have to fetch every row over the network; >with a SQL
database you'll typically only retrieve the final >summary data.

Redis supports filter features native, so it basically works as every remote
service.

~~~
justinsb
Do you mean HSCAN with MATCH? Thanks for reminding me about that. I think it's
so limited that the point still stands though.

~~~
itamarhaber
You can actually push all the processing to Redis with Lua without having to
fetch a single "row" over the network... I'm not recommending this as a rule
of thumb - on the contrary, you could easily tie down your (any type of) DB
with application logic - but in some cases it makes sense.

------
hkarthik
Any relational DB will work but I would look at Postgres before going with
MySQL.

------
nielskrijger
Without going into any level of technical depth, I'll offer my two cents.

"Write-heavy" means your application processes thousands of records per
minute. Three million orders on a monthly basis doesn't look like too big of a
deal. Any database should be able to manage that easily. Redis would have been
my last pick to do that. Redis I only use when to keep high-volatile non-
essential data in memory, not much else.

Key in picking your database is how your data is being used. Are the most
recent data records accessed most often and the rest almost never? MongoDB and
an RDBMS will do quite nicely. Do you really expect _extremely_ heavy growth,
a NoSQL datastore might be better. Are DBA's in short supply? Use a managed
datastore.

As a big fan of NoSQL, I'd say; be cautious when using NoSQL datastores. Any
SQL database will do the majority of workloads quite nicely and offer you with
plenty of tools to do any type of query you might need to. NoSQL databases do
analytical queries usually quite poorly; separating OLTP and OLAP is painful
and costly for smaller apps, only for that reason it is best to avoid them in
most circumstances.

Above all; use what you can run cheapest. The available skillset should play a
role too, if most engineers are familiar with SQL Server, use SQL Server... I
generally use MySQL or PostgreSQL on RDS (AWS) given DBA's are hard to find. I
use DynamoDB when I have extremely high dataloads. I use MongoDB when
developing a common app for NodeJS because NodeJS simply works very well with
MongoDB… I consider other NoSQL datastores only when processing many millions
of records per day. For all the rest (and majority of use cases); pick an
RDBMS.

// In production I've used Oracle DB, MySQL, PostgreSQL, MongoDB, DynamoDB,
Cassandra, Google Cloud Datastore, Redis, RedShift and Elasticsearch.

------
sganguly
Hi Gary4gar, Read the points that you stated in your question. If you focus on
growth (clearly your wins are going to bring more data updates as your
increase the customer base) you should look into a solution that does not tie
you down with data model and also does not expect you to worry about sharding
in the future. Having done sharding and when one keeps growing the data volume
and having to do shards it becomes painful and hurts the development flow.
Been there many time and done it quite a bit myself. (Am a technical guys
myself,
[http://www.linkedin.com/in/sukantag](http://www.linkedin.com/in/sukantag))

Look for a NoSQL solution which will help in the flexi-schema model. Helps you
add different types of data models relatively easily. Check out Aerospike
([http://www.aerospike.com](http://www.aerospike.com)), the product scales
like scaling problems have disappeared from this planet. All of the
characteristics that you would need based on what you have stated is in the
product. Best of all it is free for startups. Just give it shot. Open Source
server and client libraries in multiple programming languages. (Am a big "C"
and Python guys so like them client libs a lot). Lots of startups and many
large software companies use them.

If you like it and works for your application tell others, if not tell us what
did not work.

I ain't sellin and you ain't buyin;

Ciao

------
kruk
1,2) As so many have pointed out already, this type of data is what relational
databases excel at manipulating. Unless you already have a MySQL server
running, I'd recommend you try PostgreSQL. It offers great performance and
scalability.

3) Redis is a great tool in its own right but it's more of a key-value data
store than a full fledged database. It's fantastic for caching, tasks queuing
or cross application communication.

------
ColinCera
If you don't want to worry about database/server administration and scaling
issues, I'd recommend looking at Cloudant. It's quite inexpensive, takes
virtually no administration effort, scales automatically, and is fast and
flexible.

You could continue to use Redis as a write cache, although a message queue or
something like Kafka might be better for that purpose, and of course as a read
cache, while using Cloudant or some other "real" database for permanent
storage.

(On a side note, you described your application as "write-heavy" but you also
said you get "about 3 Million individual customer orders on monthly basis" —
which works out to just slightly more than one record to insert per second, on
average, which doesn't seem "write-heavy" to me. I understand that you get
orders submitted in batches, rather than evenly distributed, but it still
doesn't seem especially write heavy. At any rate, a database like Cloudant can
take as many writes as you want to throw at it.)

(Disclaimer: I don't work for Cloudant; I am a satisfied customer; I don't use
Cloudant for everything.)

------
rmetzler
TL;DR: you don't have to use one or another, use both!

Your use case, running a restaurant is certainly a mixed use case. You'll
certainly want to query the latest orders if this is going from the waiter to
the kitchen. But yesterday's orders aren't that much important.

As others have mentioned, this data is exactly the kind of data you would
store in a relational database. Just be sure to not query for the last 10
orders or something like that. I'm not a real database expert, but I've made
this mistake before and as far as I know ordering by date usually means a full
table scan.

RDBMS shine when you just store your data and run interesting queries on that
data whenever you want to answer a question about that data. Redis is the
other way around. If you want to query your data, you'll have to think about
your question ahead of time and store it in an appropriate data structure.

What I was wondering is, do you shard your data in different databases? I
would guess, every customer should have their on

~~~
justinsb
If you have an index on the date, then selecting the last 10 orders will be
very cheap, and there's no reason not to do that in a relational database - it
should be near-instantaneous.

The trade-off here is that every index you add makes insertion a little bit
slower, so you don't normally want to just add an index for every query.

~~~
rmetzler
Ok, thank you. Now you got me thinking, what did I do wrong with this Oracle 9
DB several years ago? :/

~~~
justinsb
Perhaps you selected the last ten orders for a particular customer, or
something like that. In that case, the query planner would have to choose
whether to scan the entire table, or to scan the timestamp index and then
filter by customer. If you had a lot of customers, it is reasonable to choose
the table scan.

Of course, for that query, you could create an index on customer_id &
timestamp, and then it would be near-instant again!

------
bendmorris
Why did you choose Redis in the first place?

Redis does have its own (official but not at a stable release yet) cluster
implementation. ([http://redis.io/topics/cluster-
spec](http://redis.io/topics/cluster-spec)) I'm not sure how production ready
it is - has anyone here tried it?

~~~
Oculus
Twitter probably runs the largest Redis cluster in the world[1]. It's
certainly doable, but depends how much engineering resources you're willing to
throw at the problem.

1 - [http://youtu.be/rP9EKvWt0zo](http://youtu.be/rP9EKvWt0zo)

Edit: I should point out that this was before Redis Clustering was built into
core. I'm trying to say the problem is solvable, but difficult.

------
itamarhaber
Regarding 3 - I don't know if you (or anyone for that matter) are wrong or
right about this (or anything), but there are ways to use Redis w/o upgrading
to bigger machines. Redis cluster, as mentioned in @bendmorris comment, is one
way to go about it.

Alternatively, if you want to continue using Redis - with the current or a
slimmer version of the data model - without having to worry about machine
upgrades or clusters, then I recommend that you look at
[http://redislabs.com/redis-cloud](http://redislabs.com/redis-cloud)
(disclaimer - I work at Redis Labs). Our managed service is built using
production-proven Redis clusters that allow your database to scale well beyond
the limits of any single node, without you having do anything or change a line
of code.

------
ScottBurson
I wouldn't even consider using MySQL. Postgres (aka PostgreSQL) is an
industrial-strength open-source RDBMS.

I can't comment on the non-relational options as I've never used them, but for
your data volume I would expect Postgres to be adequate.

~~~
michaelmior
MySQL is also an industrial-strength open-source RDBMS. Postgres might a
better option, but MySQL is worth considering as well.

------
auganov
Would be helpful if you shared what your reads are like. Are they limited to a
single customer or span the whole db? How fast do you need the "read-side" [or
some portion of it] to be aware of the new writes? Also remember that there's
nothing stopping you from using multiple databases. I.e. if you run complex
analytically queries that are just impossible to execute even in soft real
time it's very reasonable to have a separate database for that does that very
well.

------
revelation
I don't have an opinion re redis to MySQL, but it seems like right now the
cheapest options is to simply buy a server with 256+ GB of RAM. These are
pretty common by now.

~~~
taf2
I agree and perhaps see if you can shard between multiple servers.

------
jrochkind1
There's no easy answer i don't think. it depends.

If you decide it makes sense to switch to an rdbms, you should probably
consider postgres rather than MySQL.

~~~
misframer
Why not MySQL?

~~~
imanaccount247
Just the small list of flaws that we hit trying to use mysql (there's dozens
more, and we're not including all the "mysql silently inserts invalid data"
gotchas):

no check constraints

views with aggregates are too slow to be used

no expression indexes

triggers don't fire on cascaded actions

no window functions

can't set default values to be the result of a function

no transactional DDL

doesn't have multiple databases, just schemas misnamed as databases

rollbacks are orders of magnitude slower than other RDBMS, and if it is
interrupted it can corrupt the database

functions can't use prepare/execute, so no dynamic SQL in functions

subqueries are broken: can't modify and select from the same table

functions can't be called recursively (seriously? is this 1962?)

triggers can't alter the table they are firing against

stored procedures can't be invoked from prepare/execute

"slow query" log has a completely useless resolution of seconds

~~~
taf2
Mysql does have a much better durability solution -galera cluster IMO... Does
Postgres have something similar?

~~~
imanaccount247
Galera is not a durability solution. Yes, postgresql has multi-master
synchronous replication. It is also not a durability solution.

------
rubiquity
Riak[0] could also potentially be a good fit for your use case. Unlike Redis,
Riak is distributed. I've heard of several companies in the POS space using
it.

0 - [http://basho.com/riak/](http://basho.com/riak/)

------
ayushghosh
Use a document store like mongoDB, use redis for frequently used data only,
Mysql is also great, maria db would be better or percona.

I think redis should have current data only which may be needed to process or
crunch daily sales or so etc.

------
sirdavidhuang
I think Apache Cassandra DB is what you are looking for.

Apache: The Apache Cassandra database is the right choice when you need
scalability and high availability without compromising performance. Linear
scalability and proven fault-tolerance on commodity hardware or cloud
infrastructure make it the perfect platform for mission-critical data.
Cassandra's support for replicating across multiple datacenters is best-in-
class, providing lower latency for your users and the peace of mind of knowing
that you can survive regional outages.

[http://cassandra.apache.org/](http://cassandra.apache.org/)

[http://planetcassandra.org/getting-started-with-apache-
cassa...](http://planetcassandra.org/getting-started-with-apache-cassandra-
and-ruby/)

------
mud_dauber
Here's another vote. Definitely move away from Redis for the reasons already
stated. Choosing MySQL or Postgres depends on your dev team's relative comfort
with each.

------
diminish
mysql, mariadb or postgresql would be ok.

>> menu_items - string(comma seprated list of all menu items that customer
ordered along with their price & Qty)

you may prefer to rework the menu_items as an independent table depending on
your uses cases and if you don't have strong reason against.

~~~
rmetzler
>> menu_items - string(comma seprated list of all menu items that customer
ordered along with their price & Qty)

> you may prefer to rework the menu_items as an independent table depending on
> your uses cases and if you don't have strong ganything against.

I wonder if storing this in an array in postgres would be a good idea?

------
resca79
I think redis is excellent choice for evil writes. Redis has many kinds of
configuration[1] Do you use the base package configuration or you add
additional config options?

[1][http://redis.io/topics/persistence](http://redis.io/topics/persistence)

------
kxo
You could try FoundationDB, with or without the SQL layer.

------
Kiro
2\. MongoDB.

------
imanaccount247
>moving from redis

Yes. It is a very poor choice for your needs.

>to mysql

No. It is a very poor choice in general.

>What other databases

Postgresql obviously. I'm honestly shocked that people are still considering
mysql in 2014.

~~~
resca79
Shocked for what? Mysql is a good choice for many cases. Please you should
explain why mysql is better than postgres?

~~~
imanaccount247
>Mysql is a good choice for many cases

Name one. Mysql is horribly broken.

~~~
taf2
Mysql is fast and easy to scale and make durable, what features of database do
you really use that is not met by, fast, durable and scalable... Sounds more
like perhaps some sort of belief rather than data?

~~~
imanaccount247
That does not answer the question. Postgresql is faster than mysql, easier to
scale than mysql, and is already durable without having to do anything to it.
Saying "it is almost as good as postgresql" does not mean it has a use case.

------
davismwfl
I don't think you are to this point yet, but a RDBMS under heavy write load
can run into a lot of headaches too, requiring a very carefully designed
schema and proper disk sub-system (lots of IOPS), plus the need to have a
solid DBA to maintain it. However, non of that is a vote against moving to the
RDBMS, so I do think you should seriously consider Postgres or MySQL though as
it seems like a good fit.

What I would consider though is keeping your Redis up front for
accepting/processing the records in real time coming in and then batching them
into your RDBMS. This provides you the ability to do reporting and other
analytics through the RDBMS, but utilize Redis as a buffer for the writes. It
also gives you the ability to massage the data into the specific schema that
best fits the usage of the data in the RDBMS, but allowing the most efficient
schema coming into Redis from the POS systems. It adds complexity though so
that would have to be weighed.

