So, let's say my system is currently backed by MySQL or PostgreSQL, and that is not fungible. How would one move that data into RAM, including writes? And, how would one maintain some level of safety in the event of a crash? i.e. I don't really care if I lose X amount of time worth of data (say, five minutes), but I do care that when I reboot the system, the database comes back from disk into RAM in a consistent state.
Is there some off-the-shelf solution to this problem? And, if so, why isn't it talked about more? Every CMS ever, for example, would be very well-served by something like this. My entire website's database, all ~100k comments and pages and issues and all 60k users, is only 1.4GB, and performance is always a problem. I don't care if I lose a couple minutes worth of comments in the event of a system reboot or crash. So, why can't I just turn that feature (in-memory with eventual on-disk consistency, or whatever you'd want to call it) on and forget about it?
If you are using PostgreSQL and your data is small enough to fit, it is already effectively operating in RAM. Writes are copied sequentially to the write-ahead log, which goes to disk immediately; the changes to the main data files are applied lazily in the background.
Given a halfway competent I/O scheduler and some cheap SSDs, you can continuously write new data to disk at network wire speed even at 10 GbE while operating on the data in RAM and saturating outbound network. There is no slowdown at all. Even for databases that do not implement a good I/O scheduler (like PostgreSQL unfortunately) your workload is sufficiently trivial that backing it with SSD should have no performance impact. If you are having a performance problem with 1.4GB CMS, it is an architecture problem, not a database problem.
Postgres's default settings are pretty bad. You'll want to increase shared_buffers from the default 128MB to somewhere between 25% and 40% of your available system RAM (if you're using 9.2 or earlier, you'll probably also need to raise SHMMAX using sysctl).
For the thing you described, you probably also want synchronous_commit=off. That means you might lose some commits in the case of a crash, but you won't get data corruption from it, and writes will be much faster.
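Concretely, the two settings above would look like this in postgresql.conf (the 2GB figure assumes an 8GB box; scale it to your own RAM):

```
# postgresql.conf -- illustrative values, not a tuned recommendation
shared_buffers = 2GB        # ~25% of an assumed 8GB of system RAM (default: 128MB)
synchronous_commit = off    # commits return before the WAL hits disk; lossy on crash, but not corrupting
```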
If performance is a problem with a 1.4GB database, your system is either extremely low-end, or you have some seriously un-optimised queries or database architecture, and/or you should take a long hard look at what caching you are not doing that you should be (including in the form of materialised views).
It simply is not your overall disk IO capacity that is the performance problem with a dataset that small. At least not for a CMS.
So, my question wasn't about whether the system in question is shitty. I know that it's shitty. I also know that the CMS has less than optimal queries. That was also not my question.
"It simply is not your overall disk IO capacity that is the performance problem with a dataset that small."
But it clearly and measurably is; there's nothing to argue about there. That which comes from RAM (reads) is fast; that which waits on disk (writes) is slow. Writes take several seconds to complete, so users wait several seconds for their comments and posts to save before being able to continue reading. That sucks, and is stupid and pointless, especially since it's not even all that important that we avoid data loss. A minute of data loss averages out to close enough to zero actual data loss, since crashes are so rare.
How many new comments are you seeing per second? How many MB/s can the hardware actually write? How many MB/s of new comments are you actually getting?
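One quick way to answer the write-speed question yourself is a synchronous dd run, which roughly approximates the per-transaction fsync pattern a database generates (the scratch path is a made-up example; oflag=dsync is GNU dd):

```shell
# oflag=dsync forces every 16KB block to be committed to disk before dd
# moves on, so the reported rate reflects synchronous-write throughput
# rather than the OS write-back cache.
dd if=/dev/zero of=/tmp/fsync-test bs=16k count=100 oflag=dsync
rm -f /tmp/fsync-test
```

On healthy hardware, even a spinning disk should finish this in a few seconds at worst.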
Given the tiny dataset and small number of users, you really need to be running on abysmally slow hardware (slow for a server even by late-80s standards), or have bizarrely high user engagement, or ludicrous levels of inefficiency, for this to make any sense with the numbers you've given.
Let's say an average comment is 8KB. Let's double that for indexes. At one comment per second you would be writing 86400 comments a day, replacing almost all old comments with roughly 1.4GB of new comments in one day. Given you mention 100k comments total, presumably your actual rate of comments is much slower.
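Spelling out that arithmetic:

```shell
# 8KB per comment, doubled for index overhead = 16KB written per comment,
# at an (already generous) rate of one comment per second.
comments_per_day=$((24 * 60 * 60))
kb_per_day=$((comments_per_day * 16))
echo "$comments_per_day comments/day, $((kb_per_day / 1024))MB/day written"
```

which works out to 86400 comments and about 1350MB (roughly 1.4GB) per day.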
But even 16KB/second is at least an order of magnitude less than my 25-year-old Amiga 500's 8-bit RLL hard drive could manage, and within reach of even its floppy drive...
Assuming your commenting rate is a tenth of that, the IO that should be required would in theory be within the reach of the floppy drive on a Commodore 64 home computer. Heck, it wouldn't take much for it to be within reach of the C64 tape drive.
Now, there will be some write amplification due to fsyncs etc. but nothing that could even remotely explain what you're describing unless there's something else wrong.
Maybe you have a degraded RAID array? Or lots of read/write errors on the drive?
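A few quick checks to rule those out (assuming Linux with software RAID and smartmontools installed; /dev/sda is a placeholder for your actual drive):

```shell
# Software-RAID status: an underscore in the member list, e.g. [U_],
# means a degraded array.
if [ -r /proc/mdstat ]; then cat /proc/mdstat; fi

# Recent I/O errors in the kernel log; no output here is good news.
dmesg 2>/dev/null | grep -iE 'i/o error|ata.*error' | tail -n 5

# Overall SMART health verdict for the drive (usually needs root).
if command -v smartctl >/dev/null; then smartctl -H /dev/sda || true; fi
```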
> So, let's say my system is currently backed by MySQL or PostgreSQL, and that is not fungible. How would one move that data into RAM,
MySQL and PostgreSQL will totally take advantage of all of the RAM you give them.
> including writes?
This is harder, and you might not want it? It's worth noting that this argument is almost certainly directed against things like Hadoop, which claim to trade off performance for low management and easy scalability.
"MySQL has generally got your back in the 'less safety for more performance' arena:"
So, this was never entirely clear to me, but now that I've read a bit more about it, this might actually be exactly what I want (which is to not have the system wait to return when posting new content, and just assume it'll end up on disk eventually). The talk of not being ACID made me nervous and maybe switched off my brain. I guess it just means I don't need or want ACID in this case, all I want is a consistent database on reboot.
So, I guess maybe this does what I want, but just to be clear: In the event of data loss, the database will still be consistent, correct? i.e. we'll lose one or more comments or written pieces of data, but the transaction it was wrapped up in won't be half finished or something in the database? (I recall MySQL had issues with this kind of thing in the very distant past, but I imagine that's just bad memories at this point.)
You're in good shape if you use InnoDB in MySQL. The issues in the distant past you're thinking of are with MyISAM, and I've also had issues with the compressed, write-only ARCHIVE engine when writing to it continuously. InnoDB will remain consistent.
For example, it is very easy to write badly performing code using ORMs. And yet an ORM is often chosen initially for good reasons, because it speeds up development (e.g. Django forms). The problem with quickly prototyped ORM-based apps is that performance is good enough at first, but as the data grows the number of queries goes through the roof. It is not the amount of data per se, but the number of queries. Fixing these performance problems afterwards for small customer projects is often too expensive for the customer, but if there were a plug'n'play in-memory SQL cache/replica with disk-based writes, it would easily handle the problem for many sites.
Configuring PostgreSQL to do something like reading data from in-memory replica is likely possible, but I see that there would be value in plug'n'play configuration script/solution.
Have you proven that your performance issues are SQL-related? If you configure MySQL correctly and give it enough RAM, a lot of those queries are happily waiting for you in RAM, so you have a de facto RAM disk. Finding your bottleneck in a LAMP-based CMS is fairly non-trivial. Think of all the PHP and such that runs for every function. It's incredible how complex WP and Drupal are. Lots and lots of code runs for even the most trivial of things.
This is why we just move up one abstraction layer and dump everything in Varnish, which also keeps its cache in RAM. Drupal and WP will never be fast, even if MySQL is. Might as well just use a transparent reverse proxy and call it a day.
Disk writes are the problem in my case, and I'm comfortable making that assertion (I'm not new to this particular game). Reads are plenty fast, it's writes that are a problem. Certainly, it is a pathological problem with the disk subsystem that makes it suck, but it does suck, and if I could flip a switch and say "stop waiting for the disk, write it whenever you can" without risking breaking consistency on crashes, I would do so.
It seems like, from another comment, that setting innodb_flush_log_at_trx_commit to a non-1 value is roughly what I want, though it still flushes every second, which is probably more often than I need, but may resolve the problem of the application waiting for the commit to disk, which is probably enough.
In-memory database on a ramdisk, replicating to an on-disk database. All reads answered from the RAM master, while the replica only has to answer writes.
But I suspect that it's all already in the disk cache. Is the bottleneck reads or writes?
I guess I didn't mention it, but obviously (or at least, obvious to me knowing my system), it is writes that are slow. Reading pages can all come from memory, as the whole database is cached in RAM (I've given MySQL plenty of space to store the whole site in memory). But, writing a new post or page can take up to 30 seconds or so, which is simply absurd.
There is something pathological about our disk subsystem on that particular system, which is another issue, but it has often struck me as annoying that I can't just tell MySQL or MariaDB, "I don't care if you lose a few minutes of data. Just be consistent after a crash."
> writing a new post or page can take up to 30 seconds or so, which is simply absurd.
If it really waits on the disk, your problems would be solved by using an SSD. Change your hosting plan; hosting a 1.4GB database on SSD isn't expensive. If it remains slow, that means you're using some very badly programmed solution.
We have dedicated servers in colo. I'm upgrading to new SSDs on a server-by-server basis, as I have the opportunity to do so. This one just hasn't been upgraded yet. That wasn't really the question, though. Whether writes take 30 seconds or .1 seconds, if I don't care, I think there should be an obvious path to make it take a millisecond by letting it just "commit" to RAM and then return.
Which, it seems like there is in the innodb_flush_log_at_trx_commit option, which is awesome, and which I'm trying out, now.
If you try the value 2, please write here whether it's better for you. It can be a good compromise: "only an operating system crash or a power outage can erase the last second of transactions." It still doesn't "flush" at every commit, so I guess it can work really well.
I feel like this is already indirectly possible with cache systems... yes there is an initial load from DB or whatever, but with Ehcache for example I think(?) all those objects just sit in the JVM, and therefore should be in RAM. If you wrote some app startup batch process to stick every object possible into the cache proactively, I think you'd have essentially what you're asking for.
Yes, there are a few systems that store data in memory but back it up to disk. Look at VoltDB, Aerospike, Couchbase (to a degree), Starcounter, OrigoDB, etc. That said, for a CMS the structure is usually the problem unless you have billions of web pages, which means a DB with a flexible data model (i.e. non-relational) is probably what you want, no matter where it stores the data. For example, a graph database should match that niche really well.
100k comments is actually minuscule. I guess you don't use any caching, if performance is always a problem? And I'm rather sure your website has even more issues compared to a better-optimized one. It can be an issue of what you pay for, of course, if you have some virtual hosting with just a little RAM. That is, maybe your data "doesn't fit in the RAM" you are providing right now.
Of course caching is enabled. Reads are all coming from cache. Performance problems come on writes, which currently wait for the disk, always, even though I don't care if they do...I'd be happy with it writing to disk five minutes later, but that's not an option I've been able to find.
Take a look at this option on MySQL innodb. I've used this in the past to radically improve speed:
"If the value of innodb_flush_log_at_trx_commit is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit. When the value is 1, the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. When the value is 2, the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues."
We are moving to SSD. It just takes a while. This is the last of our servers to not have SSD. But, nonetheless, it seems obvious that for many use cases, one should be able to say, "I don't care if it's on disk immediately. Just be fast."
The "nice" option is to tune "fsync", "synchronous_commit" and "wal_sync_method" in postgresql.conf
If your overall write load is low enough that you will catch up over reasonable amounts of time, the really dirty method is to set up replication (internally on a single server if you like) and put the master on a RAM disk. If your server crashes, just copy the slave's data directory and remove the recovery.conf file, and you'll lose only whatever hadn't been replicated.
But in terms of time investment in solving this, it's likely going to be cheaper to just stick an SSD in there.
shared_buffers is the setting you're thinking of here.
effective_cache_size should be set to a reasonable value of course, but it does not affect the allocated cache size, it's just used by the query optimizer.