
How much faster is Redis at storing a blob of JSON compared to Postgres? - Manozco
https://www.peterbe.com/plog/redis-vs-postgres-blob-of-json
======
skywhopper
Alas, there's not enough context or code here to even really know what's going
on (or maybe I just don't know Django well enough to guess at what isn't
said). It sounds like the author is comparing storing and retrieving a record
with a JSON field in a PostgreSQL table versus storing and retrieving a raw
string (the raw JSON) in Redis?

So it's no surprise that the Redis solution is faster. Besides parsing and
indexing the data, PostgreSQL will also prioritize reliability and won't
confirm your transaction until it's actually persisted to disk. Whereas Redis
is prioritizing speed, and working straight to RAM--the disk serialization is
an asynchronous process. If anything, I'm surprised that the Redis solution is
_only_ 16-20x faster than PostgreSQL given what appears to be happening.

So, for this use case--storing a cache of Amazon API results for mostly-static
data--the Redis solution is a no-brainer: durability of the data is not a
serious concern; there's no real risk of data corruption because there's only
one schema and no interrelations; you don't need the data store to know
anything about the data; you don't need indexing beyond straight key-value.
For other use cases, Redis might be entirely unsuitable because of durability
or data integrity reasons, in which case the speed benefits wouldn't even
matter.

Now, all that said, in the real world there are lots of other considerations:
Are you using either of these databases already for some other purpose in the
same app? Which data store is easier to set up? Which is easier to code
against? Do you even care how fast it is if the slower speed is 11ms on a
single-user system?

~~~
fizx
A reasonably configured Postgres is easily sub-millisecond for reads and
writes on a small dataset like this. I suspect that the author is doing
something quite wrong.

People generally reach for Redis too quickly, IMO. Postgres is about as fast,
has better tooling, and will slow down gracefully as the dataset increases,
rather than evicting records as the limit is hit.

That said, I guess if you're relatively junior, it's hard to grossly
misconfigure Redis, so that's a point in its favor. Somewhat like Java vs
Node.js--you can write higher performance network services in Java, but it's a
lot easier to make mistakes.

~~~
Benjamin_Dobell
> _That said, I guess if you're relatively junior, it's hard to grossly
> misconfigure Redis, so that's a point in its favor._

Redis persistence (or possibly lack-thereof) is probably quite complicated for
a junior developer to understand. I'd actually expect a better outcome with
PSQL.

~~~
threeseed
Not sure what you are getting at here.

Redis is one of the simplest data stores around to use and its persistence is
solid.

~~~
erichocean
> _Redis is one of the simplest data stores around to use and its persistence
> is solid._

I've lost a countless amount of data in production systems running Redis.
I haven't ever lost anything in Postgres. (I still run both.)

------
KaiserPro
It's faster, but then, that's the point of it.

It's like saying that memcache is faster than postgres.

They do different things for different purposes.

I still wouldn't trust redis for anything other than ephemeral storage. Most
of the use cases where we use redis assume that the data will at some point go
away. The places where we use postgres assume that data is permanent.

In fact, we use both together, and it works really well: best of both worlds.

~~~
beamatronic
Couchbase really is the best of both worlds, in one product. Store in memory
first (memcached) and disk later, on 0 or more cluster nodes. From the calling
API, you can choose whether to block or not for the disk commit.

Edit: typos

~~~
godot
This is what I found in my experience as well. I founded a startup in 2013
(and ran it for 3 years) and used Couchbase as our primary data store (a SaaS
that integrated with customers' web sites and received tens of millions of
DAU). With no dev-ops person, running it was easy, and we never had
performance or outage issues. I know you can't compare a document store/KV
store with a relational DB like PG or MySQL, but I have wondered why Couchbase
is not as popular as, say, Mongo.

~~~
magduf
I can give you one hint. I don't really work with databases, so I'm not well-
versed on them, but I can easily name all the major RDBMSes (PG, MySQL,
Oracle, etc.), and some of the NoSQL ones. I've done a little work at home
playing with Postgres, but that's about it.

I've certainly heard about MongoDB many, many times. I've heard of Redis some.
I've heard of CouchDB. But this is the very first time I've ever heard of
"Couchbase".

~~~
lstamour
It was popular briefly on HN back in 2015, so it had a late start compared to
Mongo, which was maybe 2012? And the commercialization of memcached was
perhaps noticed most by PHP devs, as memcached is common enough with Wordpress
and Drupal, though Redis might have overtaken it some. I always thought
Couchbase was a very interesting idea but never had the chance to seriously
suggest it at work. These endorsements actually help me consider it, but at
the time it seemed like they didn't do a great job explaining who was using it
and why/how it made a difference. Recently I've heard more about DocumentDB or
CockroachDB than about it. And most places I work at are still traditional
Postgres/Oracle/SQL database shops, where it's easier to introduce Redis or
Memcached than to switch where all the data is stored. Using it for a new
project would require buy-in, but might be viable.

------
zzzeek
First off:

1. If you are claiming to compare PostgreSQL to Redis, do NOT test with
Django or SQLAlchemy or whatever; use the raw DBAPI driver, most likely
psycopg2, though you might get much faster results with a more optimized
driver like asyncpg.

2. Use the Python profiler to generate profiles and actually see how much
time is spent waiting on the network to get the value.
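A minimal sketch of the raw-driver timing suggested above (the DSN, table, and key names are placeholders, not from the article):

```python
# Minimal raw-driver benchmark sketch -- DSN, table, and key names
# are placeholders, not from the article.
import statistics
import time

def median_ms(fn, n=1000):
    """Call fn() n times and return the median latency in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

def bench_raw_drivers():
    # Lazy imports: both clients are third-party and need running servers.
    import psycopg2
    import redis
    pg = psycopg2.connect(dbname="cachedb")  # placeholder DSN
    r = redis.Redis()
    with pg.cursor() as cur:
        def pg_read():
            cur.execute("SELECT value FROM cache WHERE key = %s", ("k1",))
            cur.fetchone()
        print("pg median ms:", median_ms(pg_read))
    print("redis median ms:", median_ms(lambda: r.get("k1")))
```

Running the same loop under `python -m cProfile` then shows where the time actually goes, per point 2.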

~~~
gonational
Thanks for this! I did not know about asyncpg, which is purportedly an average
of 3x faster than psycopg2.

------
ninth_ant
Something is very wrong if the median read of what should be a moderately
sized blob with an indexed lookup is 8ms.

The article doesn’t give enough detail to really understand what it is that’s
happening here, but I would recommend everyone take this 14-16x report and
not expect it to be the general case.

~~~
zbentley
I think the author suggested that the cause of some of the latency might be
the processing in Python as data goes through the ORM. I like the Django ORM
but it does not prioritize speed. It's possible that Python time plus the ORM
doing something stupid (perhaps multiple round trips to the DB, reconnects, or
weird instrumentation in the DB router or manager) could account for those
extra milliseconds.

~~~
tyingq
It's also in a json type in Postgres rather than a blob or varchar, while it's
just raw data in redis.

------
rbranson
By default, Django closes PostgreSQL connections after handling each web
request. I’m not certain, but most of the latency we’re seeing here is
probably the substantial overhead of setting up a PG connection. Redis will
still be faster, but I’d guess the gap would be significantly tighter with
persistent connections enabled.

~~~
cortesoft
Really? It doesn't use a connection pool?

~~~
jsmeaton
[https://docs.djangoproject.com/en/2.2/ref/settings/#std:sett...](https://docs.djangoproject.com/en/2.2/ref/settings/#std:setting-
CONN_MAX_AGE) pooling is available, but off by default. I suspect most new
projects turn it on, but I have no data to support that.
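For reference, persistent connections are a one-line settings change (the database name below is a placeholder):

```python
# settings.py sketch: reuse PostgreSQL connections across requests
# instead of opening a new one per request (Django's default
# CONN_MAX_AGE is 0, i.e. close after every request).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",       # placeholder database name
        "CONN_MAX_AGE": 60,   # keep connections alive for 60s; None = unlimited
    }
}
```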

------
dijit
“Memory is faster than disk, more at 11”

~~~
njharman
Yeah, and in normal operation the OS/PG will have all your data in memory (or
at least all your indexes, if you have very, very much data or an under-spec'd
DB server).

Disk is for writes.

~~~
dijit
There is one disk operation that is guaranteed to happen when pushing data
into a database like PostgreSQL. It will write data to an append-only binary
log.

In Postgres this is called “write-ahead-log” or WAL. On MySQL/MariaDB it is
called Binlog.

For a long time the only persistence redis even had was an append only binary
log-style system.

Additionally, you can tune Redis to its maximum persistence setting, which
emulates the absolute minimum persistence threshold of a system like Postgres
or MySQL.

There is no comparison between these systems. One is designed to treat your
data very safely by default and the other is designed for speed.

They both achieve those goals.

I don’t even want to go into the fact that VFS access is still slower than
direct memory access (for, I hope, many obvious reasons).

The topic of this thread is the equivalent of judging a fish by its ability to
climb a tree.

------
hardwaresofton
A few problems:

- A comparison of Redis and Postgres for in-memory storage without use of
TEMP or UNLOGGED[0] tables?

- SQL for table creation and queries not shown

- Using an ORM on one side and a more direct client on the other

While some data is better than none, the quality of this analysis is in
question.

[0]: [https://www.postgresql.org/docs/current/sql-
createtable.html](https://www.postgresql.org/docs/current/sql-
createtable.html)
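For what it's worth, the UNLOGGED variant is a small change (table name and DSN below are made up). An unlogged table skips WAL writes entirely, which is much closer to the durability trade-off Redis makes:

```python
# UNLOGGED table sketch: no WAL writes, and contents are truncated
# after a crash -- a Redis-like durability trade-off inside Postgres.
CACHE_TABLE_SQL = """
CREATE UNLOGGED TABLE IF NOT EXISTS api_cache (
    key   text PRIMARY KEY,
    value jsonb
)
"""

def create_cache_table(dsn="dbname=cachedb"):
    # Lazy import: psycopg2 is third-party and needs a running server.
    import psycopg2
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:
        cur.execute(CACHE_TABLE_SQL)
    return conn
```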

~~~
jsmeaton
The redis cache client in Django is effectively an ORM - but a much lighter
one. The cache client is also doing compression and uses msgpack for
serialization.
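Roughly the per-value work being described, sketched with stdlib stand-ins (json and zlib stand in here for msgpack and whatever compressor the backend is configured with, so the sketch stays dependency-free):

```python
# Serialize-then-compress, as the cache client does per value.
# json/zlib are stand-ins for msgpack and the configured compressor.
import json
import zlib

def pack(value):
    return zlib.compress(json.dumps(value).encode("utf-8"))

def unpack(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Either way, it's CPU work that the Postgres side of the benchmark isn't doing, so the two paths aren't doing the same job.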

I think the comparison, in the context of a regular Django developer, is apt.

~~~
hardwaresofton
> The redis cache client in Django is effectively an ORM

If you're implying that the redis cache client in Django is comparably
stunted, I don't agree. Postgres is a lot more tunable than Redis, and the
more complex the underlying machinery the leakier/less powerful the
abstraction ORMs can provide without taking on the same complexity. An
abstraction layer on top of Redis != an abstraction layer on top of Postgres.

Let me put this in context -- you could _literally_ write a custom Postgres
access method[0] (see talk on it[1][2][3]) which did nothing but take data and
write it to /dev/null, or put it in a bounded space in memory. Of course, you
_wouldn't_ want to do that, but there's no way redis is going to be able to
compete with that speed -- and from there you can imagine implementing a
subset of Redis which is faster than redis can be (while throwing away a lot
of course). This is the kind of complexity an ORM has no business replicating
most of the time (I don't know which ORMs support the USING in CREATE TABLE
yet), but it's part of what _could_ be out there.

> The cache client is also doing compression and uses msgpack for
> serialization.

Great for performance but bad for testing -- this skews the results of the
analysis, they're not doing the same work.

> I think the comparison, in the context of a regular Django developer, is
> apt.

I don't disagree, because this is what I'd expect a regular Django/Rails
developer to do and call it a day -- but for those who want more serious
analysis, this isn't it. It's a great start (good analysis is hard), but
people shouldn't be thinking this is a _good_ analysis with reasonable options
explored -- UNLOGGED & TEMP tables are pretty low hanging fruit that isn't
even mentioned. Initial settings of the Postgres cluster aren't
mentioned/tweaked.

This is a pretty naive case, and Postgres is within about an order of
magnitude while being capable of doing _a lot_ of other things. That's pretty
amazing.

[0]:
[https://www.postgresql.org/docs/12/tableam.html](https://www.postgresql.org/docs/12/tableam.html)

[1]:
[https://www.pgcon.org/2019/schedule/events/1374.en.html](https://www.pgcon.org/2019/schedule/events/1374.en.html)

[2]: [https://anarazel.de/talks/2019-05-30-pgcon-pluggable-
table-s...](https://anarazel.de/talks/2019-05-30-pgcon-pluggable-table-
storage/pluggable.pdf)

[3]:
[https://www.youtube.com/watch?v=5RCkZl5HNiQ&list=PLuJmmKtsV1...](https://www.youtube.com/watch?v=5RCkZl5HNiQ&list=PLuJmmKtsV1dOTmlImlD9U5j1P1rLxS2V8&index=13&t=0s)

------
batmansmk
No index, not the same size of datastore, the same entry queried in a loop,
no cache config on the DB side, only one side hydrates an ORM... Don't get
confused: it's not because data comes from a database that it comes from the
disk when you query it.

I'm pretty sure we can get it down to a 3x max for the DB.

------
citilife
I use PostgreSQL regularly and pull (and even search) JSON blobs. We have
something like 150 million JSON blobs (the dataset is 500 GB) that are
searched on almost every query; response time is maybe 150-200ms... That's on
an AWS t3.medium RDS instance.

I don't really know what the hell is going on with this, but something isn't
correct.

~~~
burpsnard
If it's mostly on disk, you're paying milliseconds for block I/O; get more
IOPS.

If the indexes used are sensitive to low key diversity, fix that, and maybe
shard.

You might prefer a VM type with NVMe.

If you can divide the data into more small pieces, more indexes and cache fit
in RAM.

Also, cluster data by whatever works: time, region, user class, etc.

Only decompress the blobs at point of use.

Design your app to need only a small amount of fast data which can't be
cached.

Etc.

------
ngrilly
Bad benchmark with not enough information to draw any relevant conclusion.

My bet: connection pooling is not enabled for PostgreSQL (Django's default
value for CONN_MAX_AGE is 0). A new connection is established for each
request, which probably explains most of the overhead. Moreover, the
connections are probably TLS-secured by default, which increases the handshake
time a lot.

------
Justsignedup
The problem of redis isn't speed of single object storage and retrieval.

We know for a fact that redis stores shit fast. And a blob of json is perfect.

The problem is multi-fold. Now you have a schema-on-read. You may spend a lot
more processing time querying and accessing the data rather than getting
exactly what you want from a sql query. Concurrency and locking. Constraints
dbs offer.

So it is important to evaluate on more than JUST read/write speed.

I worked on a team that used NoSQL for their DB needs. The performance was
terribly slow, not because of read/write speeds, but because of everything I
listed above. They basically had to hand-write a query optimizer versus just
using SQL. They quickly swapped to Postgres and saw something along the lines
of 10-100x performance improvements (depending on which parts) with only a
week of work.

------
ddorian43
Everyone should read
[https://redis.io/topics/persistence](https://redis.io/topics/persistence),
which explains the PostgreSQL vs Redis trade-off in terms of fsync behavior.
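For illustration, the knobs that doc describes can be flipped from redis-py (a hedged sketch; it needs a running server, and `appendfsync always` gives up most of Redis's speed advantage):

```python
# Most-durable Redis persistence settings, per the linked doc.
DURABLE_AOF_SETTINGS = {
    "appendonly": "yes",      # enable the append-only file
    "appendfsync": "always",  # fsync every write, closest to a WAL'd RDBMS
}

def make_redis_durable(host="localhost", port=6379):
    # Lazy import: redis-py is third-party and needs a running server.
    import redis
    r = redis.Redis(host=host, port=port)
    for key, value in DURABLE_AOF_SETTINGS.items():
        r.config_set(key, value)
    return r
```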

------
hesdeadjim
My team built a heavy-duty load-testing simulator in preparation for an
upcoming launch. We kept dialing the simulated clients up until we finally saw
timeouts talking to Redis (we used RedisLabs). The cause? We had saturated the
1-gig Ethernet link our AWS C4 instances had. Upgrading to the next tier of
NICs solved the problem and Redis kept up the pace.

That said, Redis is hugely expensive for any meaningfully sized data set.
Thankfully our use case allowed us to keep our data size bounded over time, or
else I would’ve traded latency for drastically reduced cost (eg use Postgres).

------
peterbe
Hey, Original blog post author here.

What a lot of people don't understand is that you want both: the resilience
of Postgres and the speed of fetching from Redis.

You have data stored. You want to retrieve it as painlessly (shorter time,
less CPU usage, less complexity, etc.) as possible. What are the options? How
much faster is Redis? How much RAM is that going to eat up? Different tools
have different properties. Run some tests to answer some of those questions.
Don't just guess. Sure Redis is going to save some time, but how much? Do you
know by heart?

~~~
jdub
"Do you know by heart?" is very much the wrong question though. :-)

Benchmarking is surprisingly hard. After some digging, you might discover that
you're actually measuring how long it takes to bring up a TCP connection, or
how long it takes PostgreSQL to serialise the JSON from its stored form. It
could be differences in the driver layer Django is using. Coming up with the
correct experiment can be the toughest part of doing the science.

------
manigandham
So reading from RAM in a single-threaded key/value store with a lightweight
text protocol, is faster than reading from disk in a multi-threaded relational
database with SQL parsing and ACID semantics. As measured by a heavy web
framework running in a slow interpreted language that overshadows the speed
differential so much that it's only 16x at the end.

This article is obvious in its conclusion, inaccurate in measurement, and
useless for any discussion.

------
__sy__
Thou shall edit thy postgresql.conf file. Better yet, use PGTune
[https://pgtune.leopard.in.ua](https://pgtune.leopard.in.ua)
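For the curious, PGTune's output is just a handful of lines like these (illustrative values for a small 4 GB box; generate your own against your actual hardware rather than copying these):

```
# postgresql.conf fragment -- illustrative only, tune for your hardware
shared_buffers = 1GB
effective_cache_size = 3GB
work_mem = 16MB
maintenance_work_mem = 256MB
wal_buffers = 16MB
```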

------
radiospiel
I saw something similar recently when reading JSON data from Postgres via
Rails, which was unbelievably slow. I found the reason to be that Rails would
automatically parse the JSON value into an in-memory Ruby object. Since this
was for an API-like backend, the solution was simple: just read the column as
a string ("SELECT payload::varchar..")

I don't know if Django is doing the same, but using an ORM might just do that
to you.
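Django can do the same trick with a raw cursor (table and column names below are hypothetical); the cast makes the driver hand back the JSON as a plain string instead of a parsed object:

```python
# Cast the json column to text in SQL so neither the driver nor the
# ORM parses it -- table and column names are hypothetical.
RAW_PAYLOAD_SQL = "SELECT payload::varchar FROM api_cache WHERE key = %s"

def fetch_payload_raw(key):
    # Lazy import: needs a configured Django project.
    from django.db import connection
    with connection.cursor() as cur:
        cur.execute(RAW_PAYLOAD_SQL, [key])
        row = cur.fetchone()
    return row[0] if row else None
```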

------
devit
Seems like a really bad study, since it doesn't mention the exact methodology
(caching, exact queries, size of the JSON data, etc.), nor does it attempt to
figure out why PostgreSQL is taking 8 ms to do something that should take much
less, why reads and writes take the same time, or why Redis is taking 0.5 ms
(i.e. millions of CPU cycles) for something that should essentially just be a
memcpy.

------
djinnandtonic
A good read, but I wouldn't count these results as anywhere near valid - if
you're using an ORM, you're just perf-testing the ORM.

~~~
bradhe
> anywhere near valid

Yeah they're laughably naïve.

------
einhverfr
These sort of benchmarks are both useful and useless, depending on what you
want. The architectures of the two are very different and so speed depends
very much on what you are trying to do.

Access patterns in Redis are very simple and it is a well optimized single
threaded event loop. It is very fast but cannot scale up to multiple cores
etc. So in this case, yes, storing and retrieving data will be fast, but uses
will be narrower _and_ if you need a high degree of concurrency, Redis will
pose some issues for you.

At least in my experience, Redis's key problem is that while it is well
optimized it is still a single threaded event loop and therefore with complex
use comes a lot of headache. For simple things however, it is really hard to
beat.

------
Operyl
Curiosity strikes: I have to wonder how much these benchmarks would change on
an actual Linux kernel? I don't know if the PostgreSQL team is necessarily
tuning their project for macOS/Darwin. I'm sure Redis would still be faster
(because memory vs disk), but I wonder by how much.

------
ArchReaper
Cached version:
[https://webcache.googleusercontent.com/search?q=cache:BBB9Vi...](https://webcache.googleusercontent.com/search?q=cache:BBB9VioZ0tkJ:https://www.peterbe.com/plog/redis-
vs-postgres-blob-of-json)

------
mfontani
For a couple similar use-cases (write rarely, read many times, JSON data) I
ended up using actual JSON files, on disk. Time to read and parse them is very
small compared to asking Redis or PG for it, and this benefits from the Linux
disk cache, which makes reading from those most-unchanged files almost the
same as reading it from memory.

------
ken
This is meaningless without knowing the schema.

------
windsurfer
Why can't Postgres implement a transparent caching layer similar to Redis?

~~~
tilolebo
How would you cache a write?

~~~
YawningAngel
You can't cache it, but if you don't care about durability, as in this case,
you can acknowledge it before it is fsync'd.
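Postgres exposes exactly this trade-off as a setting; a hedged sketch (a crash can then lose the last few hundred milliseconds of acknowledged writes, but not corrupt the database):

```python
# synchronous_commit = off: COMMIT returns before the WAL record is
# fsync'd, trading a small durability window for latency.
RELAXED_COMMIT_SQL = "SET synchronous_commit = off"

def relax_durability(conn):
    # conn: an open psycopg2 (or any DBAPI) connection -- placeholder
    with conn.cursor() as cur:
        cur.execute(RELAXED_COMMIT_SQL)
```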

~~~
dboreham
A write is really fast if you don't do it.

------
najarvg
FWIW there is an update to this article at -
[https://www.peterbe.com/plog/update-to-speed-comparison-
for-...](https://www.peterbe.com/plog/update-to-speed-comparison-for-redis-vs-
postgresql-storing-blobs-of-json)

TL;DR: this update is not about Redis vs PostgreSQL but about caching!

~~~
tyingq
If it were about caching, he wouldn't have used an ORM and a json Postgres
type. And there would have been more detail about the query cache, backing
store, and wal+fsync settings for Postgres.

------
xxxpupugo
Is this really that surprising?

Postgres is an RDBMS, with transactional support and everything. While Redis
is powerful, it is much simpler and more focused, not nearly as comprehensive
as Postgres.

TL;DR: Postgres is doing a lot more heavy lifting than Redis; it feels slow
because most of that work is not necessary for a simple KV lookup.

------
sandGorgon
postgresql does TOAST compression of large jsonb blocks. You might be
incurring the cost of compression/decompression by postgresql

~~~
bradhe
Uh... TOAST is a system for storing data off-page. It doesn't have anything to
do with compression? Reading a TOASTed value actually requires reading two
pages, so you're correct that there is read and write overhead. But it isn't
CPU-bound a la compressing the data.

~~~
sandGorgon
postgres does TOAST compression of larger jsonb.

I dont have any direct link to the docs, but here's a bug report when such
behavior was NOT happening.

[https://www.postgresql.org/message-
id/20140801182102.2696.87...](https://www.postgresql.org/message-
id/20140801182102.2696.87926%40wrigleys.postgresql.org)

if possible, this benchmark should be run again with a largish
TOAST_TUPLE_TARGET
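For reference, that rerun would be a per-table storage-parameter change (table name hypothetical; `toast_tuple_target` is available from PostgreSQL 11):

```python
# Raise the TOAST threshold so larger jsonb values stay inline and
# uncompressed longer -- table name is hypothetical.
TOAST_TUNE_SQL = "ALTER TABLE api_cache SET (toast_tuple_target = 4080)"

def raise_toast_threshold(conn):
    # conn: an open psycopg2 connection (placeholder)
    with conn.cursor() as cur:
        cur.execute(TOAST_TUNE_SQL)
    conn.commit()
```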

~~~
bradhe
[https://wiki.postgresql.org/wiki/TOAST](https://wiki.postgresql.org/wiki/TOAST)

------
bradhe
This is cute.

------
unnouinceput
Quote: "My current implementation uses PostgreSQL (via the Django ORM) to
store this stuff."

At "ORM" the author lost all credibility with me. I mean, you went from cloud
to local cache as a performance improvement, and then you chose to do it via
an ORM? Why not make your own custom solution if you went down this road
anyway? It's like replacing your house alarms, swapping the ones provided by a
security company for your own because you're an electronics engineer, and then
building them from off-the-shelf electronics kits and complaining that a
different kit is better. It's your house, dude: make them all custom for
maximum efficiency if you're going this way.

