
Ask HN: What are the biggest databases you've worked with? - barelyusable
Originally I was thinking about SQL databases such as PostgreSQL/MySQL, but I'd be interested in anything. Wondering about the number of queries/transactions per second, and how you handled that scale.
======
kevrone
At Timehop we currently work with a single-instance AWS Aurora (MySQL-ish)
database holding over 40TB of data (plus a read-only replica on a smaller
instance). Some stats: 1.5MB/sec receive throughput, 10MB/sec transmit
throughput, commit latency around 3-4ms (with very occasional spikes to
10-20ms), select and commit counts of about 300/s, and select latency hovering
around 35ms (though we do about a dozen unions per query).

All in all it's the easiest relational database I've ever worked with in terms
of stability, speed, and scalability. I know this sounds like an ad for
Aurora, but I just really like it.
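
If you want to reproduce the select/commit rate numbers on a MySQL-compatible endpoint like this, one option is to sample the server's cumulative status counters and diff them over a window. A minimal sketch (host and credentials below are placeholders):

```python
# Minimal sketch: derive selects/commits per second by diffing MySQL's
# cumulative status counters over a sampling window.
# Host, user, and password are placeholders.
import time
import pymysql

conn = pymysql.connect(host="aurora-cluster.example.com",
                       user="monitor", password="secret")

def counter(cur, name):
    cur.execute("SHOW GLOBAL STATUS LIKE %s", (name,))
    _, value = cur.fetchone()
    return int(value)

WINDOW = 10  # seconds
with conn.cursor() as cur:
    before = {n: counter(cur, n) for n in ("Com_select", "Com_commit")}
    time.sleep(WINDOW)
    after = {n: counter(cur, n) for n in ("Com_select", "Com_commit")}

for name in ("Com_select", "Com_commit"):
    print(f"{name}: {(after[name] - before[name]) / WINDOW:.0f}/s")
```

The latency and network throughput figures are easier to pull from Aurora's CloudWatch metrics (CommitLatency, SelectLatency, etc.) than from status counters.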

~~~
dirtyaura
I'm curious: how does replication work if the replica instance is smaller
(I assume smaller in disk space)? Is it automatically removing some of the
data from the replica based on a heuristic rule?

~~~
kevrone
Smaller instances are smaller in memory/compute power only. Storage is charged
separately and the implementation details are unknown to me.

------
tupshin
Having worked with Cassandra for many years, I have dealt with:

* 1000+ node clusters

* Petabyte scale data

* 10s of millions of reads and writes per second

Given that preface, the "how" is scaling out on top of Cassandra, of course. It's
not SQL, and it's hard to do if you have a highly relational model, but there are
many success stories at that kind of scale.
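
To make the "hard if you have a highly relational model" point concrete: the standard approach is query-first, denormalized table design, so each read hits a single partition. A rough sketch with the DataStax Python driver (keyspace, table, and contact points are made up for illustration):

```python
# Query-first Cassandra modeling: a denormalized table per read pattern,
# partitioned so each query touches a single partition.
# Keyspace, table, and contact points are illustrative only.
import uuid
from datetime import datetime, timezone
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1", "10.0.0.2"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.events_by_user (
        user_id    uuid,
        event_time timestamp,
        payload    text,
        PRIMARY KEY ((user_id), event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC)
""")

uid = uuid.uuid4()
session.execute(
    "INSERT INTO demo.events_by_user (user_id, event_time, payload) VALUES (%s, %s, %s)",
    (uid, datetime.now(timezone.utc), "signed_up"),
)

# The read the table was designed for: newest events for one user, one partition.
for row in session.execute(
        "SELECT event_time, payload FROM demo.events_by_user WHERE user_id = %s LIMIT 10",
        (uid,)):
    print(row.event_time, row.payload)
```

The trade-off is that a new access pattern usually means a new table (and duplicated writes), which is exactly what makes genuinely relational models painful.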

~~~
dserban
Hey, thanks for sharing.

As a Cassandra devops / data modeler myself, I would be fascinated to read
more details about your scaling challenges. Do you guys have an engineering
blog I could read?

------
spthorn60
MLB's Statcast collects 7TB/game, or roughly 17 petabytes of raw data annually
(about 2,430 regular-season games × 7TB ≈ 17PB).
[http://fortune.com/2015/09/04/mlb-statcast-data/](http://fortune.com/2015/09/04/mlb-statcast-data/)

------
thinkMOAR
Would be nice if people answering included:

- hot vs cold data ratio of the total size

- read vs write data ratio

- whether reads/writes are split

- how partitioning, if used, is done

- total machine(s) resources (disk/RAM)

- average (read) query response time

- how machine/node failure is handled

------
CyberFonic
My biggest installation was an accounting system for a multi-national
corporation. 30 Oracle instances running on a 256 core Sun cluster, 192 GB
RAM, 40 TB EMC SAN. Typical enterprise system overkill.One of the big 6
consulting firms designed it and deployed PeopleSoft on the completed system.
I was just the lowly engineer who configured the hardware, the SAN and the
Oracle instances. As for Rolls Royce cars, the performance was "adequate".

------
abalashov
I've handled ~1 TB DBs in Postgres with about 2000 read queries/sec.
Technically these were stored function invocations, so each one wrapped a
considerably larger number of queries inside.

This didn't seem to be a problem. It was the simultaneous write operations,
hammering the disks/disk controller, that created the real limits.
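
For anyone unfamiliar with the pattern, here is a toy sketch of a stored function wrapping several queries behind one round trip (the table and function names are invented for illustration, not the actual schema):

```python
# Toy example of the stored-function pattern: one call from the app wraps
# several queries executed server-side. Table and function names are invented.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS orders   (user_id integer)")
cur.execute("CREATE TABLE IF NOT EXISTS messages (user_id integer)")
cur.execute("""
    CREATE OR REPLACE FUNCTION user_dashboard(p_user integer)
    RETURNS TABLE (order_count bigint, message_count bigint) AS $$
    BEGIN
        RETURN QUERY SELECT
            (SELECT count(*) FROM orders   WHERE user_id = p_user),
            (SELECT count(*) FROM messages WHERE user_id = p_user);
    END;
    $$ LANGUAGE plpgsql STABLE;
""")
conn.commit()

# One round trip from the client; both counts are computed inside Postgres.
cur.execute("SELECT * FROM user_dashboard(%s)", (42,))
print(cur.fetchone())
```

So a single "query" on the wire can fan out into many statements on the server, which is why the invocation rate understates the real query volume.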

------
ohstopitu
I worked with ~2 TB of data in CouchDB (a few thousand endpoint calls/sec) for
my capstone project at university and thought I had experience, but reading
these comments, I realize how little experience I really have.

------
avitzurel
12TB of MongoDB spread across 9 shards (2 replicas per shard), and 4TB of MySQL
with some tables around 400GB in size.

MongoDB handled about 13K ops/sec at peak times, with around 5-8K of these
being writes.

MySQL was probably around 2-3K ops/sec.
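
For reference, the sharding setup for a collection like that boils down to a couple of admin commands issued through a mongos router; a rough sketch (host, database, collection, and shard key below are placeholders):

```python
# Rough sketch of sharding a MongoDB collection.
# Hostname, database, collection, and shard key are placeholders.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.com:27017")  # a mongos router, not a shard

# Allow the "app" database to be sharded, then shard one collection on a
# hashed key so writes distribute evenly across shards.
client.admin.command("enableSharding", "app")
client.admin.command("shardCollection", "app.events", key={"user_id": "hashed"})

# Application reads/writes are unchanged; mongos routes them to the right shard.
events = client.app.events
events.insert_one({"user_id": 123, "type": "page_view",
                   "ts": datetime.now(timezone.utc)})
print(events.count_documents({"user_id": 123}))
```

The shard key choice matters far more than the commands themselves: a hashed key spreads writes evenly but gives up range queries on that field.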

------
Clownshoesms
2GB, but it was a Caché database, in a hospital. Probably the worst job of my
life.

