Visualization of L1, L2, Ram and Disk latencies [gif]

patio11 · on July 14, 2009

Anyone like physical analogies more? If you convert them into relative masses:

L1 cache: a squirrel (1kg)

L2 cache: a mid-sized cat (~5 kg)

RAM: a tall, well-muscled man (~80kg)

Hard disk: one hundred blue whales (100 * ~130 metric tonnes)

This is what I mean when I say "It doesn't matter how fast your language is, you're just racing to get to wait on I/O faster."

P.S. Let's extend the analogy to include two other common factors:

Typical round trip to database: the combined mass of every ship, plane, and person in the USS Nimitz' air group... with room for another two fleets or so after you're done (150 ms ~ 1.5 million metric tonnes)

Time for user's computer to render a web page of medium complexity: worldwide demand for cement in 2009 (2 seconds = 20 million metric tonnes)

But please, spend time optimizing your string concatenation... because that is going to help ;)

[Edited: Revised and extended because I introduced a conversion error or two and then compounded them. Word to the wise: mental conversion to fractions of blue whales not advisable before morning coffee.]

modeless · on July 14, 2009

I prefer the analogy posted by norkakn on reddit: http://www.reddit.com/r/programming/comments/90tge/hey_rprog...

L1 - There is a sandwich in front of you.

L2 - Walk to the kitchen and make a sandwich

RAM - Drive to the store, purchase sandwich fixings, drive home and make sandwich

HD - Drive to the store. Purchase seeds. Grow seeds..... .... ... Harvest lettuce, wheat, etc. Make sandwich.

randomwalker · on July 14, 2009

Amazingly, that still underestimates the ratio of HD/RAM latencies by a factor of 10.

(I'm using 83ns and 14ms, as in the article, and half an hour for RAM and one year for HD.)

lsc · on July 14, 2009

Yeah, but if you can have more ram than the commonly accessed data (which is the usual case I have seen, unless you are putting things like photos in the DB. Most of us have databases that are far smaller than 64GiB, and 64GiB ram servers can be had for the value of less than a week of programmer time. ) once the database warms up, the read speed is no longer blocked on disk.

Disk is largely something you keep around so you can handle really large things (like pictures and video) and so that all your data doesn't go 'poof' when you hit the power.

(the other side of that, of course, is that if you want your data to be in good shape after the aforementioned power loss, then yeah, your writes will block on disk speed. But for most people, the 'sane but not correct' default of ext3 and the like is good enough.)

mbrubeck · on July 16, 2009

"It doesn't matter how fast your language is, you're just racing to get to wait on I/O faster."

But it does matter how small your runtime (and inner loops) are, if you want to live in the L1 and L2 cache and avoid paying the RAM latency penalty.

sho · on July 14, 2009

I think you are using different numbers from those depicted in the graphic, in which RAM is "only" 83 times faster than L1 cache, or under a kilogram if we go with the L1 hummingbirds @ 10g, and the cat would be 47 grams.

Maybe you meant to start with 1 kilogram? : D

patio11 · on July 14, 2009

Gah, I really need to put my numbers on paper when doing repeated conversions or errors creep in. You're right, it is pretty borked. Give me a second...

kazuya · on July 14, 2009

> Give me a second...

L1$: a second

Hard disk: more than five months

sho · on July 14, 2009

That really is incredible isn't it. I don't mind saying I have difficulty comprehending - really comprehending - such large magnitudes.

Great to hear it in different formats though. I can stare at numbers all day and still not really get it, but the difference between a second and 5 months is as subtle as a punch in the nose. Good stuff.

lsb · on July 14, 2009

Okay, so, if your DB all gets paged into RAM, what tends to be the next mole to whack in the optimization whack-a-mole?

patio11 · on July 14, 2009

What sort of app are we talking about? If it is a web app, and you're optimizing the DB... are you sure you're not already working on the wrong mole?

Take a look at the above scale: page rendering times are going to dominate everything else. There are simple, repeatable, effective ways to reduce them. See the presentations from the YSlow guys.

lsc · on July 14, 2009

key. I mean, all my experience is with squid, but there are many, many caching proxies. Using caching proxies is a ridiculously easy way for a SysAdmin to take a slow webapp and make it fast without screwing with the application code.

Unfortunately most webapp developers don't use reasonable cache control headers. Most php apps, if your proxy does a HEAD to see if it should re-pull content, render a full page and throw it away except for the headers. (this may be false now, but when I did this, they were using php3. In php3, to handle HEAD requests properly you'd have to actually write code to handle it, which most programmers did not.)

Still, my experience has been that using something like squid gives you a pretty massive performance advantage, even when your webapp is uncooperative.

ryanwaggoner · on July 14, 2009

Business model.

I_got_fifty · on July 14, 2009

Ding!

stcredzero · on July 14, 2009

But please, spend time optimizing your string concatenation... because that is going to help

I have been a "hero" many, many times by doing exactly this, unfortunately for the assessment of "average" programming knowledge in corporate IT.

nailer · on July 14, 2009

How can you optimize concatenation? (seriously - never thought of this as a major performance consideration)

dabeeeenster · on July 14, 2009

In Java you can use StringBuffer.append() instead of "string a" + "string b" which is nanoseconds faster, allegedly...

axod · on July 14, 2009

Actually, the more important factor here is memory consumption. When you do lots of string concatenations, you end up with tons of unused strings, eating up memory. Also that translates to CPU time the garbage collector has to spend.

Using StringBuffer, although it looks a bit crappy in the code (IMHO), actually does result in massive gains in terms of performance.

dabeeeenster · on July 14, 2009

"Massive gains"? Really?

axod · on July 14, 2009

That's been my experience, yup. I can't remember if I have numbers/graphs anywhere, but it made a massive difference on a few occasions (If you do a lot of concats).

dabeeeenster · on July 14, 2009

"Massive gains"? Really?

eru · on July 14, 2009

Find a way to turn concatenating n strings (of a given length distribution) from taking O(n^2) time to O(n) time.

randomwalker · on July 14, 2009

I've been HDD-free for a few months now; I can't imagine going back. It's not just latency that's an issue for me, it is also reliability (I've had 6 HDD failures in the last two years in various devices around the house.) I also worry less about damaging something if I drop my laptop.

For my latest work project (I do scientific computing), I realized it's easier to do it on my laptop instead of the workstation. Since my laptop has an SSD, I can just use the filesystem as my database. This means that I can have have millions of files (literally, millions) lying around and process them using the good old Unix shell. It greatly reduces the development time compared to using a database. Just for giggles I tried doing this on a machine with a hard drive, and it was more than one hundred times slower.

quellhorst · on July 14, 2009

I just had an Intel SSD die on me that was less than 2 months old. Without any warning it became unreadable. Hooking it up to an external device I used to retrieve data from faulty spinning disks didn't work to get any data off.

Right now I am back to spinning disk and time machine for hourly backups. I'm highly considering selling the replacement Intel SSD when I get it back. Some things like loading programs, starting up and shutting down are much faster. When it comes to installing stuff, extracting files, or writing data to disk, this 7200 rpm 500gb drive is faster than the SSD.

Also worth taking note is the $300+ for the 80 GB Intel SSD could buy 4 500GB laptop drives.

lsc · on July 14, 2009

Have you tried it on a box with a HDD and an adequate amount of ram? I would think that on modern file systems with ordered metadata writes (or journaling) like ext3 or ffs with softupdates, so long as you had enough ram, disk speed wouldn't matter all that much so long as you can keep everything in cache.

SSD is great, the problem is that the good SSD costs something like $15 per gigabyte, and good registered ecc ddr2 ram costs just over $20 per gigabyte. Sure, in applications where consistency across power-loss events is a huge deal, ssd is the right answer, but for most applications, buying a whole lot of ram is often faster and not that much more expensive.

randomwalker · on July 14, 2009

RAM was not the issue. The reason for the slowness, as I understand it, is rather that different files are spread out in different areas of disk, even if they are in the same directory. This is considered a feature, and I guess it makes sense under normal access patterns. So accessing a million files (even to load them into memory for the first time) would require the same order of disk seeks, and takes forever. I might be simplifying a little bit, but this is my understanding.

Could I have re-written the code by messing around with inodes and other low-level details so that it accessed the files in physical order? Probably. Was it worth my time, rather than using an SSD? Hell no.

I agree that SSDs are still a tad expensive for the average Joe. For most hackers, considering that we spend most of our work hours in front of a computer, I feel that the added productivity from an SSD is easily worth the investment.

lsc · on July 14, 2009

http://en.wikipedia.org/wiki/Page_cache

the idea is that if you have enough ram, you only need to read the files from disk once. after that, the files are in ram cache. Once the files are in cache, at least for reads, it doesn't matter how spread out on disk they were.

And yeah, you do need to read the files from disk once, and that is slow; thus you often see a 'warm up' effect on servers. hitting a new page is often slower than it is for the second person who hits that same page.

gnaritas · on July 14, 2009

Ram might not hold all the information, and reading the files isn't the problem, finding them is. SSD's have virtually no seek time, it is RAM. Tossing in a 64gig SSD essentially puts the entire file system in RAM, at least that's what it feels like, they don't need warming up. It feels like everything's in the page cache all the time.

lsc · on July 14, 2009

if the file is in cache, you can 'find' it in cache, without hitting disk. seek time, (which I assume is what you mean by 'finding it') is only a problem if the file isn't cached in ram.

Yes, running on a SSD takes the entire filesystem much closer to ram speeds. However, you are doing so at almost ram prices. (I'm speaking of good SSDs, like the X-25E; which comes to something like $15 per gigabyte; the not so good SSDs have problems of their own. I have a SSD in my laptop right now that is branded by one of the gamer ram companies, i forget which one. It was pretty cheap, under $2 per gigabyte. It's pretty nice for reads, for writes, sometimes it is good, but often writes are worse than spinning disk.) The advantage of just buying the ram is that a good virtual memory management system can automatically optimize to keep the data you access most often in ram.

Like I said, i use a SSD in my laptop, a cheap brand and it's small, my laptop doesn't need a lot of storage, so the cost is reasonable, and I use a journaling file system, so writes are cached and the slow sub-cell size write speeds of the cheap SSD aren't a huge problem. I'm just explaining why in my servers, I prefer to go with a whole lot of ram, and then slow, cheap, and large SATA, rather than less ram and expensive SSD.

gnaritas · on July 14, 2009

I don't disagree about stuffing ram in servers, that's obviously the best approach, but it isn't always an option. I'm talking about the desktop experience. Many of us are limited by time and circumstance and are still using 32bit OS's on the desktop. I can't stuff it with RAM, but my Intel X-25M SSD makes my desktop smoke like no other hardware upgrade ever has.

My of us are also stuck with mission critical legacy 32 bit servers that we can't just take down and upgrade so easily and licenses for enterprise versions of some db's that can handle assloads of ram don't come cheap. SSD's are much cheaper and a no brainer upgrade for that aging db server that just needs to be faster. The X-25E smokes here letting you get that speed without needing that expensive license.

lsb · on July 14, 2009

What're you storing in files, and why is it in files versus in something more structured?

kazuya · on July 14, 2009

In case you are too lazy to enter the URL at the top:

http://duartes.org/gustavo/blog/post/what-your-computer-does...

There you can find more nice figures.

gourneau · on July 14, 2009

The post links to a 114 page paper titled 'What Every Programmer Should Know About Memory" take can be found here http://people.redhat.com/drepper/cpumemory.pdf

sho · on July 14, 2009

I'm sure there's an excellent reason, but the first thing I think whenever I see something like this is .. why doesn't intel (or whoeever) load up the CPUs with more L1 and L2 cache? Are there really diminishing returns so quickly after 6MB, or would the size increase make the expense not worth it? And is it impossible to make it modular and expandable?

It would be interesting to see some account of the cache size / die size cost / performance trade-offs.

ajross · on July 14, 2009

They do: http://images.google.com/images?hl=en&q=nehalem+die

Nehalem is about 70% cache. Most of it is the shared L3 between cores. There are physical limits to how large a cache can be and still run synchronously. The L1 is still tiny (64k, split between instructions data), and it's really not feasible to make it larger without affecting clock speed. But if you drop just a little bit and pay a latency cost, you can stick a 256k unified cache on each core. Then they all talk to the 3M shared "uncore" cache.

But to first approximation, a modern "CPU" is entirely SRAM.

wmf · on July 14, 2009

It's commonly accepted that the cost of a chip is super-linear in its area but performance improves sub-linearly with cache size, so there is definitely a point of diminishing returns. Processor vendors analyze cost/performance tradeoffs in detail, but they generally don't publish anything.

sho · on July 14, 2009

Yeah, that's what I figured. Be nice to see some numbers though.

Not like I have a hope in hell of getting anything I write even into 6MB .. sigh.

paddy_m · on July 14, 2009

What is the difference between L1 and L2 cache? I know that L2 caches are generally larger. Is it more expensive so far as silicon real estate, to make 1k of L1 cache vs 1k of L2 cache?

The last I heard, both L1 and L2 cache were sram based which used 6 transitors per bit. I also remember that ram takes either 2 or 4 transitors per bit + a capacitor. If L1 and L2 cache take the same number of number of transistors per bit, what is the difference?

wmf · on July 14, 2009

Yes, L1 and L2 are made from the same 6T SRAM. The access time of a cache is proportional to something like the square root of the capacity since signals may have to travel to the opposite side of the cache and back. Because of locality, it's better to have something like a 3-cycle L1 and a 15-cycle L2 than a single 12-cycle cache.

malkia · on July 14, 2009

All I see is a red-bar strip. Is this an animation or something (I've tried Mozilla, Safari)

yan · on July 14, 2009

Zoom in all the way and look on top

lsc · on July 14, 2009

Nice. it makes it real clear why caching disk to ram (as all modern *NIX variants do) is such a huge win, and why you should always load up your servers with as much ram as you can afford.

Sure, CPU contention can slow things down, but it's usually not the 'fall off a cliff' performance degradation that hitting disk (rather than hitting ram cache) is.

jws · on July 13, 2009

Don't try to view this on an original iPhone. It crashes safari.

Poiesis · on July 14, 2009

iPhone 3g crashes too. :(

DrJokepu · on July 14, 2009

Just for the record, it crashes the browser of Android (Cupcale) too. Seems like a limitation of embedded Webkit.

grinich · on July 14, 2009

I feel like I need a SSD just to deal with this image. How about like 3 lines of CSS?

sho · on July 14, 2009

Look, it is 54 KB. That is not large. Perhaps with some effort it could have been slightly smaller but the author is probably optimising for a different metric.

kinetik · on July 14, 2009

The image dimensions are 1120x13800, so it's 59MB uncompressed in memory.

scorxn · on July 14, 2009

Granted, the CSS wouldn't require zooming in, and the citation link would be usable.

grinich · on July 14, 2009

You're right. 53,966 bytes and not one larger. I wonder why it chokes up my browser.

But seriously, if you're going to use a GIF, at least make it animated.

functional-tree · on July 13, 2009

It'll be nice when SSDs come down in price/GB. Should help make day-to-day work seem just a tad quicker.

Retric · on July 13, 2009

I can't find good numbers, but it looks like SSD's are still around 0.1 millisecond's which is still 100,000 nanoseconds. And 1/1000th the access time of RAM if if they are 100 times faster than HDD's. Most people don't really notice that big of a jump from HDD to SSD's and, I don't see it being as much of an issue for a while due to the increasing ram and cache sizes.

functional-tree · on July 13, 2009

Some videos show that SSDs could improve experience in terms of booting, launch, etc:

http://www.youtube.com/watch?v=pJMGAdpCLVg

It's that kind of speedup I'd like to see -- disk-heavy tasks that don't peg the CPU.

leif · on July 13, 2009

The major improvement coming from SSDs is that seek time no longer kills you. People will notice a difference going from random access on a rotational drive to random access on an SSD; sequential access, not so much.

Retric · on July 14, 2009

My point was while SSD seek time is 1/100th HDD seek time you don't get anywhere near that big a jump. Because, while a HDD might take 100x as long to get you the first bit, HDD and SSD take about the same amount of time to read the rest of 4kb the sector.

herf · on July 14, 2009

SSDs vary a lot by write speed. Most today are slower than HDD for write throughput. (Intel's SSD drives are actually fast, but expensive.)

This will get figured out and make a big difference. But still no comparison to RAM.

jsz0 · on July 14, 2009

Moving to SSD is a very noticeable performance bump. In my personal experience it has been one of the better upgrades I've ever done. The performance gains are across the board whereas CPU/RAM upgrades these days only benefit you if there was a CPU/RAM bottleneck in the first place. If you happen to have a fairly good system to star with an SSD really opens things up.

vannevar · on July 14, 2009

How about adding ethernet and wireless latencies?

TheSOB88 · on July 14, 2009

Why doesn't this take into account the size of the block you're getting back? 1 pixel from the L1 gets you less data than 8 from the L2, etc.