Anyone like physical analogies more? If you convert them into relative masses:
L1 cache: a squirrel (1kg)
L2 cache: a mid-sized cat (~5 kg)
RAM: a tall, well-muscled man (~80kg)
Hard disk: one hundred blue whales (100 * ~130 metric tonnes)
This is what I mean when I say "It doesn't matter how fast your language is, you're just racing to get to wait on I/O faster."
P.S. Let's extend the analogy to include two other common factors:
Typical round trip to database: the combined mass of every ship, plane, and person in the USS Nimitz' air group... with room for another two fleets or so after you're done (150 ms ~ 1.5 million metric tonnes)
Time for user's computer to render a web page of medium complexity: worldwide demand for cement in 2009 (2 seconds = 20 million metric tonnes)
But please, spend time optimizing your string concatenation... because that is going to help ;)
[Edited: Revised and extended because I introduced a conversion error or two and then compounded them. Word to the wise: mental conversion to fractions of blue whales not advisable before morning coffee.]
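For anyone who wants to redo the conversion without the pre-coffee arithmetic, here is a minimal Python sketch. It assumes only the scale implied by the two P.S. figures above (150 ms ~ 1.5 million tonnes and 2 seconds ~ 20 million tonnes, which both work out to 10,000 tonnes per millisecond). The constant and function names are made up for illustration.

```python
# Scale implied by the P.S. figures: 150 ms ~ 1.5 million metric tonnes.
TONNES_PER_MS = 1_500_000 / 150  # 10,000 tonnes per millisecond


def latency_to_tonnes(latency_ms):
    """Convert a latency in milliseconds to a mass in metric tonnes."""
    return latency_ms * TONNES_PER_MS


# Example: the two P.S. entries.
for label, ms in [("database round trip", 150), ("page render", 2000)]:
    print(f"{label}: {latency_to_tonnes(ms):,.0f} tonnes")
```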
Yeah, but you can often have more ram than commonly accessed data. That is the usual case I have seen, unless you are putting things like photos in the DB: most of us have databases that are far smaller than 64GiB, and 64GiB ram servers can be had for less than the value of a week of programmer time. Once the database warms up, the read speed is no longer blocked on disk.
Disk is largely something you keep around so you can handle really large things (like pictures and video) and so that all your data doesn't go 'poof' when you hit the power.
(the other side of that, of course, is that if you want your data to be in good shape after the aforementioned power loss, then yeah, your writes will block on disk speed. But for most people, the 'sane but not correct' default of ext3 and the like is good enough.)
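To make the read/write asymmetry concrete, here is a rough Python sketch. The function names are hypothetical; the only real calls are `file.flush()` and `os.fsync()`. The durable write blocks until the kernel reports the data flushed to the device, while a repeated read can usually be served from the OS page cache without touching disk.

```python
import os


def durable_write(path, data):
    """Write bytes and block until the OS reports them flushed to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to flush its cache to the device


def cached_read(path):
    """Plain read; after the first call the kernel can usually serve it from RAM."""
    with open(path, "rb") as f:
        return f.read()
```

Skipping the `os.fsync()` call is roughly the "sane but not correct" trade: the write returns quickly, but a power loss can take recent data with it.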
key. I mean, all my experience is with squid, but there are many, many caching proxies. Using caching proxies is a ridiculously easy way for a SysAdmin to take a slow webapp and make it fast without screwing with the application code.
Unfortunately most webapp developers don't use reasonable cache control headers. Most php apps, if your proxy does a HEAD to see if it should re-pull content, render a full page and throw it away except for the headers. (this may be false now, but when I did this, they were using php3. In php3, to handle HEAD requests properly you'd have to actually write code to handle it, which most programmers did not.)
Still, my experience has been that using something like squid gives you a pretty massive performance advantage, even when your webapp is uncooperative.
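As a sketch of what "cooperative" could look like, here is a small framework-free Python function. Everything in it is hypothetical (the `respond` name, the timestamp-based ETag, the 300 second max-age); it is not how any particular app or squid itself works. The point is that a HEAD or a matching revalidation never triggers the expensive render.

```python
from email.utils import formatdate


def respond(method, request_headers, last_modified_ts, render_body, max_age=300):
    """Return (status, headers, body) without rendering unless a body is needed."""
    headers = {
        "Cache-Control": f"public, max-age={max_age}",
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
        # Assumption: the content's modification time is a good enough validator.
        "ETag": f'"{int(last_modified_ts)}"',
    }
    # The proxy already holds a current copy: answer 304 with no body.
    if request_headers.get("If-None-Match") == headers["ETag"]:
        return 304, headers, b""
    # HEAD only wants the headers, so skip the render entirely.
    if method == "HEAD":
        return 200, headers, b""
    return 200, headers, render_body()
```

A caching proxy in front of this can serve the page for `max_age` seconds and then revalidate cheaply instead of re-pulling the whole thing.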
Actually, the more important factor here is memory consumption. When you do lots of string concatenations, you end up with tons of unused strings, eating up memory. Also that translates to CPU time the garbage collector has to spend.
Using StringBuffer, although it looks a bit crappy in the code (IMHO), actually does result in massive gains in terms of performance.
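Here is a back-of-the-envelope model of why, written in Python rather than Java because the cost model is the same for any immutable string type. It assumes every `result + piece` builds a brand-new string (no in-place optimization, which some runtimes do apply) and it ignores buffer regrowth. All function names are made up for illustration.

```python
def chars_copied_by_concat(pieces):
    """Characters copied if every `result + piece` builds a brand-new string."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length  # the new string holds everything accumulated so far
    return copied


def chars_copied_by_buffer(pieces):
    """Characters copied if each piece is appended to a buffer exactly once."""
    return sum(len(piece) for piece in pieces)


def build_with_join(pieces):
    """The buffer-style idiom in Python: collect the pieces, join once."""
    return "".join(pieces)
```

Under this model the concatenation cost grows with the square of the number of pieces, and every intermediate string is garbage the collector has to clean up, while the buffer cost grows linearly.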
I think you are using different numbers from those depicted in the graphic, in which RAM is "only" 83 times faster than L1 cache, or under a kilogram if we go with the L1 hummingbirds @ 10g, and the cat would be 47 grams.
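The arithmetic behind those figures, as a tiny Python sketch. The 83x ratio and the 10 g hummingbird are from the comment above; the 4.7x ratio for the L2 cat is inferred from the 47 gram figure, not read off the graphic, and the names are illustrative.

```python
HUMMINGBIRD_G = 10  # mass assigned to one L1 cache access


def mass_in_grams(slowdown_vs_l1):
    """Mass of an access that is `slowdown_vs_l1` times slower than L1."""
    return slowdown_vs_l1 * HUMMINGBIRD_G


ram_g = mass_in_grams(83)   # RAM, 83 times slower than L1
cat_g = mass_in_grams(4.7)  # L2, ratio inferred from the 47 g cat
```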
That really is incredible isn't it. I don't mind saying I have difficulty comprehending - really comprehending - such large magnitudes.
Great to hear it in different formats though. I can stare at numbers all day and still not really get it, but the difference between a second and 5 months is as subtle as a punch in the nose. Good stuff.
I've been HDD-free for a few months now; I can't imagine going back. It's not just latency that's an issue for me, it is also reliability (I've had 6 HDD failures in the last two years in various devices around the house.) I also worry less about damaging something if I drop my laptop.
For my latest work project (I do scientific computing), I realized it's easier to do it on my laptop instead of the workstation. Since my laptop has an SSD, I can just use the filesystem as my database. This means that I can have millions of files (literally, millions) lying around and process them using the good old Unix shell. It greatly reduces the development time compared to using a database. Just for giggles I tried doing this on a machine with a hard drive, and it was more than one hundred times slower.
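A minimal Python sketch of the filesystem-as-database idea, for readers who would rather not do it in shell. It is not the workflow described above, just an illustration: the function names are hypothetical, and it assumes one flat directory of regular files. `os.scandir` walks the directory lazily, so millions of entries never have to be held in memory at once.

```python
import os


def iter_files(root):
    """Yield paths of regular files directly under root, one at a time."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.path


def total_bytes(root):
    """Example 'query': sum the sizes of all files without building a list."""
    return sum(os.path.getsize(path) for path in iter_files(root))
```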
I just had an Intel SSD die on me that was less than 2 months old. Without any warning it became unreadable. Hooking it up to an external device I used to retrieve data from faulty spinning disks didn't work to get any data off.
Right now I am back to spinning disk and time machine for hourly backups. I'm highly considering selling the replacement Intel SSD when I get it back. Some things like loading programs, starting up and shutting down are much faster. When it comes to installing stuff, extracting files, or writing data to disk, this 7200 rpm 500gb drive is faster than the SSD.
Also worth noting: the $300+ for the 80 GB Intel SSD could buy four 500GB laptop drives.
Have you tried it on a box with a HDD and an adequate amount of ram? I would think that on modern file systems with ordered metadata writes (or journaling) like ext3 or ffs with softupdates, so long as you had enough ram, disk speed wouldn't matter all that much so long as you can keep everything in cache.
SSD is great, the problem is that the good SSD costs something like $15 per gigabyte, and good registered ecc ddr2 ram costs just over $20 per gigabyte. Sure, in applications where consistency across power-loss events is a huge deal, ssd is the right answer, but for most applications, buying a whole lot of ram is often faster and not that much more expensive.
RAM was not the issue. The reason for the slowness, as I understand it, is rather that different files are spread out in different areas of disk, even if they are in the same directory. This is considered a feature, and I guess it makes sense under normal access patterns. So accessing a million files (even to load them into memory for the first time) would require the same order of disk seeks, and takes forever. I might be simplifying a little bit, but this is my understanding.
Could I have re-written the code by messing around with inodes and other low-level details so that it accessed the files in physical order? Probably. Was it worth my time, rather than using an SSD? Hell no.
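For the curious, the "messing around with inodes" option could look something like this Python sketch. It rests on an assumption: that inode order roughly tracks on-disk position, which tends to hold on some filesystems but is not guaranteed anywhere. The function names are hypothetical.

```python
import os


def files_in_inode_order(root):
    """List regular files under root, sorted by inode number."""
    with os.scandir(root) as entries:
        files = [(entry.inode(), entry.path) for entry in entries if entry.is_file()]
    files.sort()  # assumption: nearby inode numbers mean nearby disk blocks
    return [path for _, path in files]


def read_all(root):
    """Read every file in inode order, hoping to cut down on long seeks."""
    for path in files_in_inode_order(root):
        with open(path, "rb") as f:
            yield path, f.read()
```

Whether this buys anything depends on the filesystem and on how the files were created, which is part of why just using an SSD is the cheaper fix.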
I agree that SSDs are still a tad expensive for the average Joe. For most hackers, considering that we spend most of our work hours in front of a computer, I feel that the added productivity from an SSD is easily worth the investment.
the idea is that if you have enough ram, you only need to read the files from disk once. after that, the files are in ram cache. Once the files are in cache, at least for reads, it doesn't matter how spread out on disk they were.
And yeah, you do need to read the files from disk once, and that is slow; thus you often see a 'warm up' effect on servers. hitting a new page is often slower than it is for the second person who hits that same page.
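A toy Python model of that warm-up effect. It is not how the kernel page cache is implemented; it assumes the cache never evicts anything (i.e. ram is larger than the working set, as argued above), and the class and attribute names are made up.

```python
class WarmCache:
    """Toy read cache: the first read of a name hits 'disk', later reads do not."""

    def __init__(self, read_from_disk):
        self._read_from_disk = read_from_disk  # slow loader, called on a miss
        self._cache = {}
        self.disk_reads = 0

    def read(self, name):
        if name not in self._cache:
            self.disk_reads += 1  # cold: pay the seek once
            self._cache[name] = self._read_from_disk(name)
        return self._cache[name]  # warm: served from memory
```

The first visitor to a page pays for the disk read; everyone after that gets the cached copy, no matter how scattered the file was on disk.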
Ram might not hold all the information, and reading the files isn't the problem; finding them is. SSDs have virtually no seek time, so they behave like RAM. Tossing in a 64gig SSD essentially puts the entire file system in RAM, or at least that's what it feels like: they don't need warming up, and it feels like everything's in the page cache all the time.
if the file is in cache, you can 'find' it in cache, without hitting disk. seek time, (which I assume is what you mean by 'finding it') is only a problem if the file isn't cached in ram.
Yes, running on a SSD takes the entire filesystem much closer to ram speeds. However, you are doing so at almost ram prices. (I'm speaking of good SSDs, like the X-25E; which comes to something like $15 per gigabyte; the not so good SSDs have problems of their own. I have a SSD in my laptop right now that is branded by one of the gamer ram companies, i forget which one. It was pretty cheap, under $2 per gigabyte. It's pretty nice for reads, for writes, sometimes it is good, but often writes are worse than spinning disk.) The advantage of just buying the ram is that a good virtual memory management system can automatically optimize to keep the data you access most often in ram.
Like I said, i use a SSD in my laptop, a cheap brand and it's small, my laptop doesn't need a lot of storage, so the cost is reasonable, and I use a journaling file system, so writes are cached and the slow sub-cell size write speeds of the cheap SSD aren't a huge problem. I'm just explaining why in my servers, I prefer to go with a whole lot of ram, and then slow, cheap, and large SATA, rather than less ram and expensive SSD.
I don't disagree about stuffing ram in servers, that's obviously the best approach, but it isn't always an option. I'm talking about the desktop experience. Many of us are limited by time and circumstance and are still using 32bit OS's on the desktop. I can't stuff it with RAM, but my Intel X-25M SSD makes my desktop smoke like no other hardware upgrade ever has.
Many of us are also stuck with mission critical legacy 32 bit servers that we can't just take down and upgrade so easily, and licenses for enterprise versions of some db's that can handle assloads of ram don't come cheap. SSDs are much cheaper and a no brainer upgrade for that aging db server that just needs to be faster. The X-25E smokes here, letting you get that speed without needing that expensive license.
I can't find good numbers, but it looks like SSDs are still around 0.1 milliseconds, which is still 100,000 nanoseconds. That is 1,000 times the access time of RAM, even if they are 100 times faster than HDDs. Most people don't really notice that big of a jump from HDD to SSD, and I don't see it being as much of an issue for a while due to the increasing ram and cache sizes.
The major improvement coming from SSDs is that seek time no longer kills you. People will notice a difference going from random access on a rotational drive to random access on an SSD; sequential access, not so much.
My point was that while SSD seek time is 1/100th of HDD seek time, you don't get anywhere near that big a jump. While an HDD might take 100x as long to get you the first bit, HDD and SSD take about the same amount of time to read the rest of the 4 KB sector.
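The seek-versus-transfer argument can be put into a small Python model. The 0.1 ms SSD access time and the 100x HDD ratio come from this thread; the 100 MB/s transfer rate, shared by both drives, is purely an assumption, and the function names are hypothetical.

```python
def access_time_ms(seek_ms, read_bytes, throughput_mb_s):
    """Time for one read: a seek, then a sequential transfer."""
    transfer_ms = read_bytes / (throughput_mb_s * 1_000_000) * 1000
    return seek_ms + transfer_ms


def ssd_speedup(read_bytes, hdd_seek_ms=10.0, ssd_seek_ms=0.1, throughput_mb_s=100.0):
    """How many times faster the SSD finishes one read of `read_bytes`."""
    hdd = access_time_ms(hdd_seek_ms, read_bytes, throughput_mb_s)
    ssd = access_time_ms(ssd_seek_ms, read_bytes, throughput_mb_s)
    return hdd / ssd


small_random_read = ssd_speedup(4096)                # one 4 KB sector
large_sequential_read = ssd_speedup(100 * 1024 * 1024)  # one 100 MiB file
```

In this model the speedup is always below the raw 100x seek ratio and falls toward 1 as reads get larger, so how much of the jump you see depends on how seek-bound the workload is.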
Moving to SSD is a very noticeable performance bump. In my personal experience it has been one of the better upgrades I've ever done. The performance gains are across the board, whereas CPU/RAM upgrades these days only benefit you if there was a CPU/RAM bottleneck in the first place. If you happen to have a fairly good system to start with, an SSD really opens things up.
I'm sure there's an excellent reason, but the first thing I think whenever I see something like this is... why doesn't Intel (or whoever) load up the CPUs with more L1 and L2 cache? Are there really diminishing returns so quickly after 6MB, or would the size increase make the expense not worth it? And is it impossible to make it modular and expandable?
It would be interesting to see some account of the cache size / die size cost / performance trade-offs.
Nehalem is about 70% cache. Most of it is the shared L3 between
cores. There are physical limits to how large a cache can be and
still run synchronously. The L1 is still tiny (64k, split between
instructions and data), and it's really not feasible to make it larger
without affecting clock speed. But if you drop just a little bit and
pay a latency cost, you can stick a 256k unified cache on each core.
Then they all talk to the 3M shared "uncore" cache.
But to first approximation, a modern "CPU" is entirely SRAM.
It's commonly accepted that the cost of a chip is super-linear in its area but performance improves sub-linearly with cache size, so there is definitely a point of diminishing returns. Processor vendors analyze cost/performance tradeoffs in detail, but they generally don't publish anything.
What is the difference between L1 and L2 cache? I know that L2 caches are generally larger. Is it more expensive so far as silicon real estate, to make 1k of L1 cache vs 1k of L2 cache?
The last I heard, both L1 and L2 cache were sram based, which used 6 transistors per bit. I also remember that ram takes either 2 or 4 transistors per bit + a capacitor. If L1 and L2 cache take the same number of transistors per bit, what is the difference?
Yes, L1 and L2 are made from the same 6T SRAM. The access time of a cache is proportional to something like the square root of the capacity since signals may have to travel to the opposite side of the cache and back. Because of locality, it's better to have something like a 3-cycle L1 and a 15-cycle L2 than a single 12-cycle cache.
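The locality argument can be checked with a quick Python sketch. The 3, 15, and 12 cycle figures are from the comment above; the hit rate is an assumption, and the model treats an L1 miss as paying the L1 lookup plus the full L2 latency, with both designs missing to main memory equally often. Function names are made up.

```python
def two_level_cycles(l1_cycles, l2_cycles, l1_hit_rate):
    """Average cycles per access: every access pays L1, misses also pay L2."""
    return l1_cycles + (1.0 - l1_hit_rate) * l2_cycles


def break_even_hit_rate(l1_cycles, l2_cycles, single_cycles):
    """L1 hit rate above which the two-level design beats the single cache."""
    return 1.0 - (single_cycles - l1_cycles) / l2_cycles


# Assumed 90% L1 hit rate, compared against a flat 12-cycle cache.
average = two_level_cycles(3, 15, 0.90)
threshold = break_even_hit_rate(3, 15, 12)
```

Under these assumptions the split design wins whenever the small L1 catches more than about 40% of accesses, which is a low bar for most code.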