
Ask HN: Server-side web apps – is the database still a bottleneck? - vanilla-almond
I've read many times in comments here on Hacker News that the speed of your database or disk access is the biggest bottleneck for your server-side web app. Therefore, the speed of your server-side programming language is not important (or less important).

Does this still hold true? Given that SSDs are now commonly used by many hosting providers - is database/disk access still a bottleneck for server-side web apps?
======
namelosw
In short, yes.

The database is the bottleneck because it's much harder to scale than
applications.

The path of evolution in the industry:

1. Stateful application - usually a single server, not distributed at all. It's
very hard to scale.

2. The obvious solution is to make the application stateless and keep state in
a centralized store. The application then becomes very easy to scale and
operate, and the database slowly becomes the bottleneck because there are many
more application servers than databases.

3. The less obvious solution is to split your problem domain into bounded
contexts, where each context becomes an application and each application has
its own database to talk to. The databases are still not truly distributed, but
the load is partitioned by business sub-domain. (I think this is the industry
mainstream, or is becoming mainstream now.)

4. The non-obvious solution is to have truly distributed state split across
your applications. The goal is to be able to distribute any object across a
large cluster in a more or less transparent way, so you can treat the cluster
as a whole without worrying about individual machines. (There is some
cutting-edge work here that has yet to become mainstream, like Orleans or Lasp,
clustered/sharded actors in Akka, plain distributed Erlang, etc.)

It's an interesting case where the short-term solution points in a totally
different direction from the more powerful long-term one: in the short term you
extract and centralize the state, while in the long term you truly distribute
it.

~~~
idlewords
I want to quibble with you on one thing—the database is often easy to scale,
by three orders of magnitude, by configuring it correctly, and then moving it
onto dedicated hardware.

~~~
tracker1
Similarly, _very_ few applications need more performance than you can get by
scaling vertically and having a hot spare replica database.

Although a distributed data store on a few cheaper VPSes/nodes is probably
less expensive.

------
esaym
For simple "web apps", typically its a poor understanding of SQL that is the
bottle neck. Since most web frameworks include an ORM, many developers don't
even know much SQL any more and treat the database as a simple 'object' that
you can call methods on and iterate through. Case in point: you're building a
dash board for a web based store to show details about the 5 most recent
orders. But each order references 5 other items (as a simple example). And
each item references an SKU# that can come from 5 manufacturers.

So an unoptimized query set from an ORM will end up querying the DB like this:
first it selects the 5 rows from the order table, but since you left out any
join clauses, it then runs 5 more selects, one per row returned from the order
table, to pull the related rows from the items table. So we are now up to 6
queries.

But each row of the item table references 5 manufacturers (again, as a simple
schema example). So each of the 'item' 'objects' (5 per order) then has to make
a query to fetch the SKU's manufacturer. That's 25 more select statements on
top of the original 6. Using the ORM properly and building out the right join
clauses, you would have made 1 query; instead you made 31. Now let's say the DB
is not on the same machine but on a different machine on the network. Throw in
2ms - 10ms of latency per query and you are looking at 62 - 310ms of latency
just to fetch your data. And that is just for 5 'rows' of data. Do the math for
a more realistic dataset of 500-1000 rows or so and you're looking at minutes
of latency for your data to come back from the DB.
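
To make the N+1 pattern concrete, here's a minimal sketch in Python with
Django-style models (the model and field names are hypothetical; the same idea
applies to any ORM that supports eager loading):

    from django.db import models

    class Manufacturer(models.Model):
        name = models.CharField(max_length=100)

    class Order(models.Model):
        created = models.DateTimeField(auto_now_add=True)

    class Item(models.Model):
        order = models.ForeignKey(Order, related_name="items",
                                  on_delete=models.CASCADE)
        manufacturer = models.ForeignKey(Manufacturer, on_delete=models.CASCADE)
        sku = models.CharField(max_length=50)

    # N+1 version: one query for the orders, then one per order for its items,
    # then one per item for its manufacturer -- the 31-query pattern above.
    for order in Order.objects.order_by("-created")[:5]:
        for item in order.items.all():
            print(item.sku, item.manufacturer.name)

    # Eager-loading version: the ORM fetches the related rows up front, so the
    # number of round trips no longer multiplies with the number of rows.
    orders = (Order.objects.order_by("-created")
              .prefetch_related("items__manufacturer")[:5])
    for order in orders:
        for item in order.items.all():   # already loaded, no extra queries
            print(item.sku, item.manufacturer.name)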

The worst case I've seen recently was exactly this. To generate an inventory
page showing an HTML table of about 300 rows, the DB was being queried over
300,000 times, with a page load of about 7 minutes (and getting longer every
day). I added the proper join clauses to the ORM so that one query was issued,
which returned the result set in about 700ms and put the page load time at
about 1 second. A 1-second page load IMO is still not very good, but it seemed
to impress everyone that with 1 hour of work a page load went from 7 minutes to
1 second....

For whatever reason, most people don't understand this anymore. Instead of
learning SQL, the line of thought is to upgrade the machine, move to a 'no-sql'
DB, introduce some caching layer, and any number of other ideas. But "let's
make sure we are querying the DB instead of iterating through it" is never
discussed.

~~~
tracker1
If only more people used simple object-structure-to-data mapping. I'm working
on an overly normalized database, with in-database logging on every access via
stored procedures, and we're hitting bottlenecks at a few dozen users.

A "well structured" database can lead to performance issues. Every join and
sub-query has a cost and they add up.

The funny thing is that on this project I largely stepped back from anything to
do with the database, because the target was only around 100 users or so on a
deployment that could largely be handled by faster hardware. Now we're stuck
trying to scale to more users than that, and stuck with layers of
procedures/functions that don't perform well at all. I wanted a few loosely
coupled tables with some denormalized (JSON) data in them; we wouldn't be
having half the issues today.

Also, I'm not a fan of ORMs... it's usually easier to do simple mapping
directly, or to use a lightweight data mapper rather than a typical full-on
ORM.

~~~
jrumbut
Granted, you can get a performance boost from denormalizing, and even maintain
integrity, but I still find that the denormalization I want today won't be the
denormalization I want tomorrow.

Moving from one denormalization to another can be hard, and you end up with a
sui generis application, versus a normalized database, where so much has been
written about how to deal with problems up to a reasonably large scale. That
said, the document-store style does seem to be reasonably popular, so maybe I'm
missing something.

~~~
tracker1
Depending on what I've needed in the past, I've simply duplicated all data
from the RDBMS into denormalized structures for ElasticSearch or MongoDB. It's
worked _incredibly_ well for scenarios that are mostly-read where normalized
queries are very expensive.
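
As a sketch of what that duplication can look like (Python with the official
Elasticsearch client; the index name and document shape here are made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")   # hypothetical cluster address

    def index_order(order, items):
        """Flatten one normalized order (plus its already-joined child rows)
        into a single read-optimized document so the search side never joins."""
        doc = {
            "order_id": order["id"],
            "created": order["created"],
            "items": [
                {"sku": i["sku"], "manufacturer": i["manufacturer_name"]}
                for i in items
            ],
        }
        es.index(index="orders", id=order["id"], document=doc)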

------
dorfsmay
The reason the database is the bottleneck is that in traditional architectures
everything can be duplicated and run in parallel except writing to the
database, which has to be done serially, one write at a time. How fast you can
write (and commit to "permanent" storage like disk or SSD) is constrained by
hardware.

There are ways to alleviate this issue, but they come with different
trade-offs. One of the most common patterns is "eventual consistency".

You might want to read about the CAP theorem if you have not heard about it:
[https://en.wikipedia.org/wiki/CAP_theorem](https://en.wikipedia.org/wiki/CAP_theorem)

~~~
mattmanser
I've fixed a lot of performance problems over my career.

This is not true, and not what we talk about when we say it's almost always
the db.

The usual cause of database problems is a single big query causing a slow page,
or a lot of medium-sized queries inside a poorly written loop. Whether it's bad
joins, missing indexes, select statements inside select statements, or
operations in select statements that force the db to run the query row by row
instead of as a set (the technical term suddenly escapes me), it's the queries
that almost always cause db slowdowns.

It rarely has anything to do with writes, and when it did, in my experience it
usually had to do with escalating locks, and I haven't seen that problem in
ages as hardware has gotten faster.

~~~
dorfsmay
> This is not true, and not what we talk about when we say it's almost always
> the db.

It depends on who's talking and what they are talking about.

If we're talking about fixing a slow website, then sure, big ugly queries are
one dominant issue, but not the only one. If we're talking about the
architecture of a greenfield project, then balancing concurrent writes against
consistency is going to be your ultimate problem; everything else can be
"duplicated" (read-replica DB servers, caching strategies, multiple app
servers, etc...).

~~~
dorfsmay
Replying to mattmanser in the sibling comment since I cannot reply to it
directly:

It comes up faster than you think. One of the companies I worked for was small
(at the time I think we were 4 or 5 people including the 2 founders) and sold
specific information as a service through APIs. They charge per call, so you
have to update the customer's funds in their account on each API call. Yes,
people can and will make lots of API calls very fast, and in a business that
charges per API call this is a good thing, but you have to find strategies and
make business decisions about how to handle accepting and replying to new API
calls while maintaining each account's balance.

~~~
mattmanser
I find it hard to believe this actually caused a DB problem; updating an int
column on an indexed table should take microseconds, far faster than any HTTP
calls should have been arriving. Sounds like something was very wrong: someone
fucked up the network causing some severe DB latency, or they'd written some
really shitty code.

Even if it did, as the other comment says, you could easily work around it. If
that was the behaviour, there was no good reason to have it utterly perfect.

~~~
nostrademons
Depends on whether autocommit is enabled. If it is (and on many DBs it's the
default), then you're telling the DB that by the time the increment completes
it must be written to durable storage. That's a minimum of one seek, which on a
hard disk is about 4ms, which caps you at roughly 250 TPS. On an SSD you might
get 0.1ms seek times, or about 10k TPS.

It's possible to kill your performance very quickly with databases if you
request features that you don't actually need.
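
A rough sketch of the difference in Python with psycopg2 (the table is
hypothetical; the point is the commit-per-statement cost, not this exact code):

    import psycopg2

    conn = psycopg2.connect("dbname=example")   # hypothetical connection string

    # Autocommit: every UPDATE is its own transaction, so each one must reach
    # durable storage before the next begins -- roughly one fsync per statement.
    conn.autocommit = True
    with conn.cursor() as cur:
        for account_id in range(1000):
            cur.execute("UPDATE accounts SET balance = balance - 1 WHERE id = %s",
                        (account_id,))

    # One explicit transaction: the same 1000 updates share a single commit, so
    # the client pays for roughly one durable write instead of 1000.
    conn.autocommit = False
    with conn.cursor() as cur:
        for account_id in range(1000):
            cur.execute("UPDATE accounts SET balance = balance - 1 WHERE id = %s",
                        (account_id,))
    conn.commit()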

------
rcavezza
Based on my experience, it's the most common bottleneck, but bottlenecks change
over time. You notice slowness somewhere in a website action and you dig into
it. Sometimes it's a new database table that isn't optimized with correct
indexing. Sometimes you end up doing an n^2 or n^3 operation on the backend
(sometimes it's on the frontend). Sometimes your data is growing at a speed
where your single monolithic database isn't cutting it and you need to shard or
consider a different type of storage.

Bottlenecks change all the time. You fix one, and suddenly the bottleneck lies
somewhere else. There's almost always a bottleneck somewhere. A database
bottleneck is relatively common, though.

In my experience, the bottleneck is usually some Big-O-level problem in the
code or database, so using Java vs Ruby vs Python doesn't make a huge
difference.

However, if you're in a situation where speed is absolutely crucial - like
high frequency trading (where milliseconds mean millions of dollars) - well,
that opens a completely different can of worms.

~~~
mandeepj
Just think about the architecture of the NYSE. They have been supporting
billions of trades per day for decades.

~~~
chenster
And there's Kdb+
[https://en.wikipedia.org/wiki/Kdb%2B](https://en.wikipedia.org/wiki/Kdb%2B)

------
cc81
It is probably one of the more common ones, together with loading too much
JavaScript and image content.

However, even with that said, a modern SQL database can handle A LOT of
traffic, and I've had discussions about internal CRUD applications with a
couple of thousand users at most where people say we must use a document
database because relational databases don't scale. Insane.

Of course people can use document databases and similar where they fit, but you
get a lot of nice things with a traditional RDBMS that you might miss later.

~~~
Cthulhu_
I haven't had a discussion about performance in years, weirdly enough, but
people did default to the scalable solutions - even before knowing whether
they needed it. Think new applications built using microservices. They also
spent nearly two years before releasing something with a fraction of the
features of the system it was supposed to replace.

If someone mentions a need for a technology choice based on performance, they
had better come up with real numbers. Until they do, clarity, correctness and
consistency should be the primary metrics. Don't solve a performance problem
before you have the luxury of having a performance problem.

~~~
specialist
Premature scaling is akin to premature optimization.

On one of my recent projects, the highest use data was stored in DynamoDB. The
entire dataset could trivially fit in RAM. So much wasted effort.

The org also overused Spark, event streams, and Elasticsearch.

One could argue (rationalize) that using DynamoDB, for instance, for the small
stuff builds team competency applicable to the big stuff. Or that using one
set of tools reduces dependencies. I'd have to see some case studies. Because
from where I was sitting, 90% of our maintenance costs were from mitigating
bad sacred cow design decisions.

I've worked on similar metastatic tech debt based on NoSQL. Much as I love
Redis, it's not a duck (fly, swim, walk).

~~~
cc81
I can understand the competency argument. I was really blown away by React when
it arrived and used it early, probably too early. React itself has been
remarkably stable and consistent, but everything around it was chaotic for a
couple of years. Anyway, we went all in in the department and build all our
front ends with React. I would argue it is a great fit for around half our
applications, as they are more complex, but the rest could easily have been
built with very little JavaScript enhancement. However, now people can move
between projects more easily, and building something with React when you are
comfortable with it and have a nice base is not slower than something else.

So I understand that people want to try new things, but damn, I wonder if I
sounded the same when I wanted to start using React.

\- "We can just use couchdb because then we won't have any problems with scale
and we will never have to have any downtime because schema updates".

....yeah but that comes at a cost. You need to handle those schema updates in
code. You need to handle those conflicts that arise from eventual consistency
in code.

Anyway, I like the idea where you have a limited innovation budget where you
can go for a new type of db but then you select other technologies that you
are comfortable with and are true and tested. Not "Yeah, we will build this in
a completely new way and we have no experience with docker, kubernetes,
nodejs, document databases, vue.js, Elastic stack, prometheus or kibana" but
this modern way of building things means everything will go much faster.
Sometimes you have to go that route but the learning time can be pretty rough.

------
BjoernKW
That largely depends on the usage patterns of the app in question.

Write operations can be particularly expensive; read operations much less so,
especially with application-level caching. Even for write operations, or for
notoriously expensive read operations such as full table scans, modern RDBMS
are optimised to a degree that for most applications database performance
shouldn't be an issue.

While traditional hard drives competed with the network for the title of
"system component with the lowest bandwidth" (see
[http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)),
accessing data on an SSD is way faster than accessing a resource over the
network.

So, today the technical bottleneck (as in "component with the lowest
bandwidth") in a typical web app system architecture is the network.

------
jandrewrogers
I've spent my career answering this question at myriad companies. In a few
cases, the database was actually a bottleneck. In the vast majority of cases,
the database was just being used very poorly and was no longer a bottleneck
once used correctly. The exceptions to this usually involve exotic data
models, which isn't most web apps.

Of the few cases where the database was the bottleneck in practice, I cannot
think of a single case where it should have been the bottleneck in theory. The
limitations were always a property of the specific database implementation
being used, not intrinsic to databases as a class of software. For example,
most open source databases have notoriously poor storage throughput as a
product of their architecture; faster storage definitely helps, but it often
solves a problem that may not exist in other database implementations. In these
cases the problem is an architectural mismatch.

Sometimes architectural mismatch is an engineering problem. A handful of
workloads and data models are currently only possible with custom database
engines e.g. high-throughput mixed workloads or data models that require
continuous load shifting. As a practical matter, the exotic cases that require
a custom database engine are also the cases where no company should try to
build it themselves without a deep expert -- they are considered technically
exotic for good reasons.

------
idlewords
Never underestimate the power of a properly configured server with 1TB of RAM
running MySQL off of SSD.

In my experience, people used to cloud services misjudge the power of correctly
tuned modern hardware by two to three orders of magnitude.

Giving a general answer to your question is hard in the absence of further
information.

~~~
bostik
Heh, most systems could fit their entire working dataset in 1TB and never
worry about I/O.

Except for logs, of course. Those things are incredible disk space hogs. Also,
serialising data to logs can be surprisingly expensive. (Our ES logging cluster
has tens to low hundreds of TB of storage, and it expires data after roughly 40
days.)

------
onion2k
Surely this depends _entirely_ on the app.

Is the database a bottleneck for a CRM app that deals with millions of customer
records? Probably, because everything else is incredibly lightweight. Is it the
bottleneck for an app that transcodes video on the server? Absolutely not.

And so on, for every imaginable app.

~~~
unilynx
But the database is often the hardest thing to scale. Scaling up transcoding is
child's play by comparison.

------
d0m
Nowadays, in most system architectures, app servers are stateless and can be
scaled horizontally. But something needs to hold the state, and that thing will
be the bottleneck. Yes, you can horizontally scale some databases, but
distributed state is very tricky to get right and comes with performance
trade-offs.

------
orthecreedence
What are you doing? How many writes/s? How much processing are you doing in
the app server? Do you have a queue you can offload work onto?

Your question is a bit open-ended.

Scaling your database is _generally_ the hardest part. Any jerk can put 100
app servers behind a load balancer. Scaling data is significantly harder,
especially when it comes to writes. There are options to get around this, but
they come with compromises.

So, really, it depends on what you're doing. In general, use things you're
comfortable with. Is Python slower than Rust? Sure. But do you need 2ms
response times from your app server or is 10ms fine, especially when latency
between the user and your first load balancer is likely 3-5x that number?
Build it, test it, if it's too slow, cache it. If you can't cache it, break
out the offending piece and optimize it or put it into a faster language.

------
beat
Anything can be your bottleneck. Fix one bottleneck, find the next one. That
said, databases are often the bottleneck. And with web apps, "the speed of
your server-side programming language" is almost never the bottleneck -
although the architectural limitations of your framework may be.

~~~
asark
I've seen a lot of cases where "the speed of your server-side programming
language" was, in fact, the problem, because the developer was doing work in it
that they should have told their DB to do for them. Basically, the speed of the
server-side programming language only doesn't matter as long as you're very
careful not to, you know, use it to do anything. Which is a realistic goal in
many applications, by leaning on one or more DBs more heavily.
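
A classic instance of that, sketched in Python with psycopg2 (table and column
names are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=example")   # hypothetical DSN
    user_id = 42

    with conn.cursor() as cur:
        # Doing the work in the app: every matching row crosses the network and
        # the summing happens in the (comparatively slow) language runtime.
        cur.execute("SELECT amount FROM payments WHERE user_id = %s", (user_id,))
        total = sum(amount for (amount,) in cur.fetchall())

        # Telling the DB to do it: one row comes back, and the aggregation runs
        # where the data already lives.
        cur.execute("SELECT COALESCE(SUM(amount), 0) FROM payments WHERE user_id = %s",
                    (user_id,))
        total = cur.fetchone()[0]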

------
hu3
In general, yes, the database will be your bottleneck.

In reality, by the time the database becomes a bottleneck, you're probably
making enough money that you can just throw more money at it.

~~~
talaketu
Let me know when this happens - I'll stand in front of it.

------
felixge
In my experience yes.

Example: we struggled with some read queries taking a long time when
aggregating over millions of rows. To solve this, we used triggers to
pre-compute aggregate groups in real time during writes, which allowed us to
easily achieve sub-second response times. However, this caused a lot of write
amplification, and now that is becoming a bottleneck. We could probably solve
some of these problems by looking into column stores, but that carries its own
downsides (operational complexity, real-time data loading, etc.). Or we could
buy faster disks, but that gets expensive quickly.

YMMV but I certainly still experience more database/disk related bottlenecks
than application-level ones.
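
For readers unfamiliar with the approach, here is a minimal sketch of the
trigger-maintained-aggregate pattern (PostgreSQL via psycopg2; the
orders/order_totals schema is hypothetical, not the actual one described
above):

    import psycopg2

    # Hypothetical schema: orders(created timestamptz, amount numeric).
    DDL = """
    CREATE TABLE IF NOT EXISTS order_totals (
        day   date    PRIMARY KEY,
        total numeric NOT NULL DEFAULT 0
    );

    CREATE OR REPLACE FUNCTION bump_order_totals() RETURNS trigger AS $$
    BEGIN
        -- Fold each new order into its day's running total at write time, so
        -- the read side is a single-row lookup instead of a scan over millions
        -- of rows. The cost is extra work (write amplification) on every insert.
        INSERT INTO order_totals (day, total)
        VALUES (NEW.created::date, NEW.amount)
        ON CONFLICT (day) DO UPDATE
            SET total = order_totals.total + EXCLUDED.total;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS order_totals_trg ON orders;
    CREATE TRIGGER order_totals_trg
        AFTER INSERT ON orders
        -- EXECUTE FUNCTION needs PostgreSQL 11+; older versions use EXECUTE PROCEDURE
        FOR EACH ROW EXECUTE FUNCTION bump_order_totals();
    """

    with psycopg2.connect("dbname=example") as conn:   # hypothetical DSN
        with conn.cursor() as cur:
            cur.execute(DDL)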

~~~
sparkling
SQL in my experience is not designed for workloads where you want to compute
over, or return, a ton of values from a ton of rows at once. It's good when you
need to cherry-pick just a few rows belonging to one user from a database of
millions of rows (e.g. CRUD operations).

tl;dr:

simple CRUD operations on a few rows scale fine

data-science/analytics stuff involving millions of rows in one query scales
rather badly

~~~
felixge
SQL isn't the problem IMO. The problem is that the implementation we're using
(PostgreSQL) is a row store, which is indeed better suited to the kind of
operations you mention.

However, SQL isn't limited to row stores. There are column-store
implementations that are quite amazing for aggregate queries, e.g. ClickHouse.
Using one of those would very likely work for us, but my understanding is that
loading data into them in real time is problematic.

~~~
kwillets
How close is real time? Seconds? Minutes?

Also, what type of aggregate?

------
godzilla82
In my experience with enterprise applications, we hit database bottlenecks only
when doing aggregate operations over millions of records, firing 'like' queries
over varchar(max) columns without full-text indexes, or generating reports with
joins over multiple million-record tables.

For most CRUD operations, any production-ready database will not become the
bottleneck before you have to fine-tune your application server and the
application itself. Non-thread-safe application code and libraries have caused
more performance issues than large join queries.

------
yakshaving_jgt
Yes, the database is still a bottleneck, but that depends on the size of the
data and what you're querying for. If your data fits comfortably in memory
(and so do any operations you perform over it), then performance is not really
an issue. For many businesses — probably the vast majority — all the data
could just sit in memory.

------
imtringued
From fastest to slowest: web server > database > synchronous database
connection.

The bottleneck in my code is rarely the database itself but rather the
interface used to talk to it. I'm sure if you had thousands of users running
high-latency but low-utilization queries you could reach the actual limit of
the database.

99% of the performance problems I experience involve loading data from systems
that don't let me access it directly; instead the data is simply dumped and my
application has to load the dump. When you have to delete and re-insert 300,000
rows every day and INSERT them the naive way, you will wait several minutes,
because the synchronous nature of SQL means the client has to wait for a
response to every statement. An easy way out is to use non-standard features
like COPY or INSERT INTO x SELECT * FROM unnest(...) in Postgres, except they
aren't really pretty to use. Your 1-line Domain.save() suddenly turns into a
30-line query just for inserting some data. I still prefer the former for
smaller datasets, even if it means waiting 2 minutes on an initial import that
only happens once, because it can be implemented quickly and you can simply
throw the code away if you need something better.
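
A hedged sketch of the difference in Python with psycopg2 (the dump table and
its columns are made up; execute_values and COPY are the Postgres-specific
escape hatches mentioned above):

    import psycopg2
    from psycopg2.extras import execute_values

    rows = [(i, f"name-{i}") for i in range(300_000)]   # stand-in for the daily dump

    conn = psycopg2.connect("dbname=example")           # hypothetical DSN
    cur = conn.cursor()
    cur.execute("TRUNCATE dump")        # hypothetical table dump(id int, name text)

    # Naive version: one round trip (and one response to wait for) per row.
    #   for row in rows:
    #       cur.execute("INSERT INTO dump (id, name) VALUES (%s, %s)", row)

    # Batched version: execute_values packs thousands of rows into each INSERT.
    execute_values(cur, "INSERT INTO dump (id, name) VALUES %s", rows,
                   page_size=10_000)

    # COPY is faster still: it streams the whole dataset as a single command.
    #   import io
    #   buf = io.StringIO("".join(f"{i}\t{name}\n" for i, name in rows))
    #   cur.copy_from(buf, "dump", columns=("id", "name"))

    conn.commit()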

------
guggle
Yes, any server-side language will do. But when you read "database is the most
common bottleneck", you could frequently replace it with "database is the
least well understood component in the stack". As soon as you stop using the
database as a dumb store, you start to realise that your db can handle a lot
more than you thought. One has to go beyond what the average ORM offers.

------
jerf
"Therefore, the speed of your server-side programming language is not
important (or less important)."

I view this statement as a prediction, in the scientific sense: "If you re-
implement a program in a faster language, but have the same database access,
the user will not have any visible performance improvements."

Like almost anything else, the answer is, "it depends". It is often true.

However, I have falsified it in some specific instances. It is definitely
possible to take a "scripting language", which runs 20x-50x slower than C (a
rough, but adequate, approximation of "optimal"), then layer on a couple more
levels of abstraction that may be really nifty and cool and awesome but add
another 10x to the slowdown (or more... if you're not careful, this term can be
unboundedly large), and get to the point where simply generating an HTML page
takes a full second of a 3GHz processor's time. I have
replaced things that did non-trivial database work with something in a
compiled language and gone from "noticeably sluggish" to "essentially instant"
without all that much effort.

In one of those cases, significant effort had been expended trying to figure
out why the database was so slow, presumably because of the prevalence of this
meme, when even the first glance at the output of a profiling tool said that
was not where the problem was.

Performance is rarely the first thing to worry about, but I find it's
something you do want to at least keep in mind while developing. If you do at
least keep it in mind, I suspect the saying is generally true, and the raw
speed of the underlying server language won't bother you much. But you are
definitely much closer to the point where performance becomes problematic than
you start at with a faster language, so you need to be that much more
vigilant. People who are lulled into a false sense of security by the idea
that the database is _always_ the bottleneck can get blindsided by performance
issues in their app.

I think the meme overall is damaging. It's not true _enough_ to make it a
core, unquestioned assumption that you can rely on, and if it can't be that,
then you might as well simply use the correct answer, which is to use a real
profiler and look at the real data. I pretty much always throw at least a basic
profiling pass at any code that isn't just brute scripting.
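
For anyone who has never done that basic profiling pass: in Python, for
example, something this small is usually enough to show whether the time is
actually going to the database or to your own code (render_page is a stand-in
for whatever handler you suspect):

    import cProfile
    import pstats

    def render_page():
        """Stand-in for the request handler or template render you suspect."""
        return "<p>hello</p>" * 10_000

    # Profile one call and look at where the time actually goes, instead of
    # assuming it must be the database.
    cProfile.run("render_page()", "render.prof")
    pstats.Stats("render.prof").sort_stats("cumulative").print_stats(20)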

~~~
Chyzwar
Most web application code is IO-bound. You will get better performance by using
a more productive language to iterate faster and create a better architecture.
GitHub (Ruby) is noticeably faster than Bitbucket (Java) because engineering
time was spent on what matters.

I think most slowness comes from bloated apps trying to push too much code/data
over a crappy network. Additionally, it is easy to misconfigure AWS
infrastructure.

~~~
jerf
"You will get better performance by using more productive language to iterate
and create better architecture."

And if you blindly follow that, you'll fall into exactly the trap I mentioned,
where that "better architecture" involves layering another 10x+ slowdown on
top of your already-slow scripting language and before you know it it's
100,000 cycles just to emit "<p>".

I think there's rather more compute-bound web-generation code out there in the
world than people think, precisely because this meme makes it so people never
even _look_. Of course you'll think it's all IO bound if you just _assume_ it
is and never check even a little bit whether that's true, which from what I
can see, is the norm in the web programming world. The data can be screaming
that you are in fact _not_ IO bound, but if nobody's listening because
"everybody knows" that can't be true... who's going to find out it's wrong?

------
SimonPStevens
It's hugely dependent on the app and the type of data it's processing.

Our primary apps deal with trading data at 15/30 minute intervals over
multiple years. It's not uncommon to have millions of rows involved in a
process.

Yes, nearly all of our performance problems boil down to the database. The
majority of the time spent on the server is on data access.

Basic CRUD on small data wouldn't be the same.

------
hedora
Assuming everything is designed to scale linearly to some point well above the
current workload, a better way to phrase this is to ask what percentage of
your hardware spend goes to compute, network and storage machines.

Note that a single postgres instance with one core and 1gb of ram on 1 ssd can
“scale linearly” by my definition, since you can easily double, triple, etc
all those specs.

On the other hand, a fully populated data center running scale out software at
100% power capacity can’t scale linearly anymore, because you can’t upgrade it
anymore. For that data center, power or maybe space is the bottleneck.

Short answer to the question you asked: SSDs are not going to be the bottleneck
for software that's migrating off spinning disks, and the database probably
won't be either. Hardware trends mean the storage and database just got 10-100x
faster, while the business logic maybe doubled in speed.

If the system was well balanced before, that means no one spent then-
unnecessary effort on now-necessary optimizations on the compute side.

------
GrumpyNl
The biggest bottlenecks these days are poorly written SQL and badly optimized
DBs. Hardware isn't really an issue anymore.

~~~
godzilla82
Also, poorly written application logic.

------
patsplat
Persistence is still slow, but it is no longer a major bottleneck. SSDs are
faster, but I/O and the network are still slow.

What has changed is that it is much easier for the client to be continually in
touch with the server, so that by the time the next "page" is loaded all the
necessary data has already been retrieved.

This way the database is no longer the same obvious bottleneck.

------
viraptor
It really depends on the app, but if you're asking about a typical situation
of a typical crud app, then yes, db is still likely the slowest part. SSD
doesn't change this, because a typical crud app fits completely in memory and
is 99% buffered reads anyway.

~~~
chenster
Is buffered reading a database feature or an application feature?

~~~
viraptor
Yes. Either. Depends on your application. It may be a feature of orm caching
as well.

------
tombert
Even if your disk speed is infinitely fast, or you're using an in-memory thing
like Redis, hitting the DB is almost always going to be slow-ish due to network
latency.

Redis is awesome badass rockstar tech, but I try to avoid it for any
single-process thing, instead opting for a tried-and-true global (thread-safe)
hash table. Network packets aren't free, and no matter how cool your database
is, I don't see you avoiding that. Not to mention that there _is_ a cost to
(de)serializing data, which you don't pay if you use something built into the
language itself.

I'm not saying to go drop Redis (or any database/cache), just keep this in
mind.
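
A tiny sketch of what that looks like in Python (the Redis lines are only for
contrast and assume the standard redis-py client):

    import threading

    class LocalCache:
        """Thread-safe in-process cache: no network hop, no (de)serialization."""

        def __init__(self):
            self._lock = threading.Lock()
            self._data = {}

        def get(self, key, default=None):
            with self._lock:
                return self._data.get(key, default)

        def set(self, key, value):
            with self._lock:
                self._data[key] = value

    cache = LocalCache()
    cache.set("user:42", {"name": "Ada"})   # stored as a plain Python object
    print(cache.get("user:42"))

    # The Redis equivalent pays a network round trip plus encode/decode per call:
    #   import json, redis
    #   r = redis.Redis()
    #   r.set("user:42", json.dumps({"name": "Ada"}))
    #   print(json.loads(r.get("user:42")))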

------
mcguire
The data is a bit old, but I don't think the relative order or rough magnitude
has changed.

[https://gist.github.com/jboner/2841832](https://gist.github.com/jboner/2841832)

------
tracker1
Often, yes... it will largely depend on what you are doing, how cacheable
things are, and how large your data set is.

If you are largely static and most of your content is tied to fixed keys that
can be mapped from URL patterns, then you could throw a large Redis cache in
front (or enough RAM to cache on-system) and the database wouldn't be the
bottleneck.

There are more details to how much your disk or database is your bottleneck,
but in almost every case they are still much slower than your CPU or your
network interface (which may itself be a bottleneck in some cases).

------
Cthulhu_
One important thing that has changed since I learned that "databases are slow"
is that they can now be run "as a service", with offerings like Amazon Aurora;
from the outside they're just SQL databases, but underneath they can scale up a
lot with minimal effort or knowledge. Likewise, you can take your database of
choice and put it on a server which you can scale up or down. Not sure how the
migration for those works, though.

~~~
noobiemcfoob
The serverless database options still remain the bottleneck, though. They are
still often the slowest aspect of any logic path, if not also the most
expensive.

------
fulafel
I thought this promising title was going to be about a new trend called
"server-side web apps" and how they reduce DB round trips. Oh well :)

~~~
asark
"One cool, super-advanced trick you can use to reduce latency for simple
queries by 95%!"

Then you read the article and it's just running the app and DB on the same
physical server, and communicating over local sockets rather than a TCP/IP
stack through various virtualization layers and across several networks, real
and virtual—so, partying like it's 1999.

------
RocketSyntax
It does feel weird pulling an entire database into a docker container... I
don't know what the path looks like to make that a cluster.

------
owaislone
Yes, for the most part, it still is. It's not always the disk, though; it can
be CPU or memory, but most of the time it is the database that falls short.

------
winkeltripel
Yarp, it is.

------
kid64
This is a ridiculous conversation. Disks and RAM will always occupy their
respective positions in the storage hierarchy, so databases (assuming they're
stored on disk) will always be the bottleneck.

------
noobiemcfoob
Memory access of any type is often the slowest. This might just be a general
law of computing, regardless of the level of the stack, at this point.

In the serverless world, memory accesses are the most expensive, with writes
being the most expensive of all (where expensive == slow AND literally
expensive).

~~~
mi100hael
Do you have an example of that being the case? Memory is usually pretty quick
compared to disk & network.

~~~
noobiemcfoob
I mean memory, storage in a broad sense. Not a statement about in RAM or
cache, though I see the confusion in retrospect. Having code access some sort
of storage (better word than memory) is always the most expensive part.

