Busting Modern Hardware Myths (2013) (highscalability.com)
110 points by nkurz 1185 days ago | 37 comments

An important overall point that the article seems to be highlighting is that, in general (independent of these myths), developers are becoming less and less familiar with what the underlying system is doing. Or perhaps more accurately, developers are becoming more 'distant' -- there are more layers of abstraction in the whole system than there have been in the past. At the top, programming languages keep getting more abstract to make the process of writing software more efficient and in some ways more automated. At the bottom, hardware uses clever design to hide the performance limitations we run up against.

The problem is that sometimes these abstractions leak. Recognizing that what you're seeing is one of those leaks is key to not going down the rabbit hole of, for example, chasing a bug that turns out to be caused by a hardware failure (painful experience!). So you have to keep an eye on the layers below the ones you're directly working with; while you don't have to live there, you shouldn't totally ignore their existence.

Are any of these a surprise to anyone here?

I am not asking this rhetorically.

I found the SSD discussion very useful. Bottom line is that deleting (or rewriting) data causes unpredictable and degraded write performance due to "write amplification". Sticking to append-only writes keeps things fast.
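A minimal sketch of that append-only pattern (names are illustrative, not from any particular storage engine; real log-structured stores add compaction and crash recovery on top of this):

```python
import os
import tempfile

# Instead of rewriting a record in place (which on an SSD can force an
# erase-before-write cycle on a whole flash block), every update is
# appended, and an in-memory index tracks the offset of the latest version.
class AppendOnlyLog:
    def __init__(self, path):
        self.index = {}                 # key -> byte offset of newest record
        self.f = open(path, "ab+")

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        self.index[key] = self.f.tell()
        self.f.write(f"{key}\t{value}\n".encode())  # always an append

    def get(self, key):
        self.f.seek(self.index[key])
        return self.f.readline().decode().rstrip("\n").split("\t", 1)[1]

# Overwriting a key just appends a new record; the old one becomes garbage
# to be reclaimed later, rather than a rewrite-in-place.
log = AppendOnlyLog(os.path.join(tempfile.mkdtemp(), "kv.log"))
log.put("a", "1")
log.put("a", "2")
print(log.get("a"))  # prints "2"
```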

I certainly wouldn't discourage anyone from watching the video. At least read the slides!


For me, the surprise was to learn that access time for L1 cache is 3 cycles. I thought it should be 1 cycle.

Actually 4 cycles for modern CPUs. The access time of a cache depends mostly on its size (wire delay and switching time for the access tree). This means that cache sizes are always a tradeoff between hit rate and access time. As CPUs get better at hiding latency, the access-latency constraint can be relaxed.

I was a little surprised that it was called a myth that CPUs are not getting faster. CPUs may still be getting a little faster, but they are way off the exponential curve that they were on at one time. They look like they're asymptotically approaching some constant now. It certainly wouldn't hurt your code at all if you make the assumption that CPUs are no longer getting faster. Your code would probably be better.

CPUs are still getting a lot faster, at a geometric rate. But they are doing so at a slower rate than in the '90s, when CPUs were getting enormously faster year over year.

Here's a chart (note that the vertical scale is logarithmic): http://preshing.com/images/integer-perf.png

In the '90s processors were doubling in performance every 20 months or so (i.e. a factor of 64x per decade). Since about 2005 processors have been doubling in performance about every 43 months. This is much slower than the insanely fast rate during the '90s but it is still very fast and still on an exponential curve.
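Those doubling periods compound to very different decade-over-decade factors; a quick back-of-the-envelope check using only the numbers quoted above:

```python
# Doubling every 20 months vs. every 43 months, compounded over a
# decade (120 months):
decade = 120
nineties_gain = 2 ** (decade / 20)   # the '90s rate
modern_gain = 2 ** (decade / 43)     # the post-2005 rate

print(round(nineties_gain))       # 64  (the "64x per decade" above)
print(round(modern_gain, 1))      # 6.9 -- much slower, but still exponential
```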

> Here's a chart

Interesting that in order to compare apples to apples it's single threaded. That means for a back-of-the-envelope estimate of ideally parallel workloads, the graph goes "through the roof", since it's exactly twice that on my laptop or four times on my desktop.

The graph is logarithmic, so it might not go entirely through the roof.

Why does it stop at 2011? The last that I've heard of Haswell was that it was a paltry improvement, and that core will be around for the next couple years.

That was just the most thorough chart I'd found. Rest assured, the trends have continued since then.

> Network cards write directly into cache

This feature, known as DDIO, is only on Intel's Xeon E5 and E7 v2 family CPUs. So, it's not quite ubiquitous yet, nor have there been any announcements on when it will become available (if ever) on the lower Xeon E3 or any of the desktop lines, despite a good amount of information already having been revealed on the next 2-3 generations of processors.

DCA (Direct Cache Access) has been around for a while - DDIO is its replacement on higher end Xeons.

The idea that mutating files in place is actually very expensive is new to me.

It was definitely interesting to learn a few little things, like one divide being much slower than an add or subtract.

In Myth 4 SSDs, Thompson claims that:

"there's a limited number of times you can READ and ERASE a block" (emphasis mine)

My understanding is that ERASE cycles are what's limited; reading and writing are not destructive to the flash.

Supposedly MLC NAND is susceptible to data deterioration due to reads (known as "read disturb").

I have never heard of the myth that "HDDs provide random access". As far back as I can remember, HDDs were always the archetypal example for non-random access storage media.

That's funny, but I realize you probably didn't mean it to be.

On IBM mainframes, the old "SAM" (Sequential Access Method) libraries talked to tape, and "DASD" (Direct Access Storage Devices) were what you and I would call hard drives.

Think 'relative seek time': on a tape, seeking to the correct block can take minutes; on disks it generally takes milliseconds. Both of them are accessed sequentially once you seek to the correct block, and that was the point of the article. Relative to seek time, disk reads are taking longer (a consequence of one physical head serving more potential bits).

I tried to get Seagate to put two independent R/W heads in a disk once. They wouldn't go for it.

Back to your comment, though: "RAM" as in Random Access Memory is often distinguished from hard drives, especially in early computer science classes, to point out the differences. And again, what this article is saying is that even dynamic RAM today looks more like an I/O device than memory.

In the early 2000's at NetApp, one of the performance engineers managed to track down the performance limiter on Pentium 4's to the frontside bus memory controller. The FSB could only issue so many memory transactions per second, and the P4 had doubled the length of a cache line (which improved streaming performance at the expense of everything else). The tricky bit was that the chip reported CPU 100% busy, but in reality it was the memory controller that was busy, not the CPU.

> As far back as I can remember, HDDs were always the archetypal example for non-random access storage media.

I'm pretty sure that's tape, not HDD.

I'd say "random access" is basically just a relative term. Tape, yes, is about the least random-access-friendly medium I can think of. Compared to that, HDDs are quite efficient at random access. DRAM then makes HDDs look like tape in comparison, but even that in turn has its own slight differences in behaviour between sequential and random access (row buffers, prefetching...).

> but even that in turn has its own slight differences in behaviour between sequential and random access

It's not slight. Modern RAM has about 20x more bandwidth doing sequential reads vs. random reads. It's almost entirely due to the addressing setup times needed to access different parts of RAM (something I suspect most programmers don't even know exists).

Programmers should actually be treating RAM the same way they treat disks, if they want the best possible performance the hardware can deliver. You can treat the L3 cache the way you used to treat RAM—they're certainly big enough these days!
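A rough way to see the direction of this effect, even from pure Python (interpreter overhead swamps most of the hardware gap, so the ratio will be far smaller than the ~20x quoted for raw DRAM bandwidth; the only assumption here is that random-order traversal comes out measurably slower):

```python
import random
import time

# Sum the same 5M-element list twice: once walking it in order, once in
# a shuffled order. Identical work, different memory access pattern.
n = 5_000_000
data = list(range(n))
seq_order = list(range(n))
rand_order = seq_order[:]
random.shuffle(rand_order)

def checksum(order):
    total = 0
    for i in order:
        total += data[i]
    return total

t0 = time.perf_counter(); seq_sum = checksum(seq_order); t_seq = time.perf_counter() - t0
t0 = time.perf_counter(); rand_sum = checksum(rand_order); t_rand = time.perf_counter() - t0

assert seq_sum == rand_sum   # same result either way; only the order differs
print(f"sequential {t_seq:.2f}s vs random {t_rand:.2f}s")
```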

One of the problems I work on is completely random-access memory bound, so to scale, we now use ridiculously cheap CPUs (AMD Kaveri chips, if anyone cares) paired with DDR3-2400 DRAM, $100 motherboards, and an InfiniBand fabric and switch to keep them talking to each other quickly. (We use Kaveris because we're also doing GPU computation at the same time, and thanks to the slow RAM, the on-chip R7 GPU is more than fast enough to keep up.)

On the plus side, I get to spend way less time optimizing the code since the CPU just sits there idling most of the time, which is a nice change. :)

> I'm pretty sure that's tape, not HDD.

Nope, I'm only 27.

It's less a matter of age than environment -- there are still plenty of places using lots of tape. (It's not like we're talking about vacuum tubes or core memory here...)

Clarification: I didn't mean to imply that tape is dead, but that during the 15 years that I've been programming, tape drives have only been used in enterprise-level backup systems, and are not used as "archetypal example" in articles intended for a more general audience.

Tape is still in use today. Widely. I bet there are more companies archiving to tape than archiving to AWS. For most of the past 27 years it has been the dominant archival method.

Archiving to tape, not using tape as active storage. I think you and the sibling comments are missing the point.

There are tapes that store 200GB+. It's definitely not a dead medium.

LTO5 can do 1.5TB on a $20 tape.

Well, I know that there were tapes doing at least 200GB a few years ago (LTO2?), hence the '+'. :)

I don't work with tapes at all, but I know that they are not dying as a medium...

Wasn't meant as a criticism or correction. Simply to underscore your point. Sorry if it seemed otherwise.

> I'm pretty sure that's tape, not HDD.

HDD is the new tape.

Random access simply means that you can hit any part of the device 'at will', without skipping all the data in between where you are and where you want to be.

Sequential access means that all the data between where you are and where you want to go has to pass underneath the read head. So a denser medium would increase your access time unless you also changed some other fundamental (such as the speed with which you can move the medium around).

For a disk drive, rotational speeds and head seek times have been more or less constant for a very long time, so the time to access a given block can be computed without the underlying storage density becoming a factor.

So historically RA indicates disks/drums and so on and SA indicates tapes and derivatives. Bubble memory and its more recent incarnations are somewhere in the middle.
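The "density doesn't enter into it" point is easy to see in numbers; a back-of-the-envelope using typical textbook figures (my assumptions, not numbers from the article):

```python
# Average access time for a spinning disk = average seek time plus
# average rotational latency (half a revolution). Density never appears.
rpm = 7200
avg_seek_ms = 9.0                      # typical average seek for a 7200 RPM drive
half_rev_ms = (60_000 / rpm) / 2       # 60,000 ms per minute, half a rev on average

access_ms = avg_seek_ms + half_rev_ms
print(f"{access_ms:.2f} ms per random access")  # ~13.17 ms
```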

> I have never heard of the myth that "HDDs provide random access".

I haven't heard it stated directly, but I've seen it implied plenty. The typical scenario is simply someone who doesn't grok the gulf between random and sequential disk performance.

Indeed. IOPS has been the Achilles' heel of magnetic disks since time immemorial.

But ultimately calling these "myths" is somewhat absurd. Everything is relative. Relative to a magnetic hard drive, main memory offers impossibly fast random access. Sure, it's slow compared to cache memory, but that is a different discussion, and a different level of optimization. Similarly, when your SSD offers you 10s to 100s of thousands of random seeks per second, it is a universe better than the 10s to low 100s of a magnetic disk.

Or: welcome to the memory mountain, enjoy your stay.
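That relativity is stark when you put rough ballpark figures side by side (order-of-magnitude assumptions of mine, not measurements from the article):

```python
# Random reads per second, order of magnitude only:
hdd_iops = 150          # ~7200 RPM magnetic disk
ssd_iops = 50_000       # mainstream SSD of the era
dram_read_ns = 100      # one uncached DRAM access

print(round(ssd_iops / hdd_iops))   # SSD is roughly 333x the disk
print(int(1e9 / dram_read_ns))      # DRAM: ~10,000,000 random reads/sec
```

Each step of the memory mountain makes the level below look like tape.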
