Shouting in the Datacenter (2008) [video] (youtube.com)
149 points by bcaa7f3a8bbc on June 24, 2018 | 29 comments



Ah, it's a classic. And wow, Brendan Gregg looks so young. And the video was uploaded by Bryan Cantrill himself. Damn it, Sun had so many superstars, yet it's Oracle that survived.


Brendan and Bryan in particular seem to be doing really well.

Brendan is senior performance architect at Netflix http://www.brendangregg.com/bio.html

Bryan is the CTO at Joyent, a cloud computing pioneer recently acquired by Samsung. https://www.joyent.com/about/management/bryan-cantrill


As someone who is contemplating what I do after I am done with a full time, “9-5” job, it’s amusing how much we as a culture equate “doing well” with a job at a company.

Not a criticism of your statement, just an opportunity to point that out.


It seems almost common to break HDDs just by testing the fire alarm. A quick search turns up several incidents. The latest big one was maybe this April in Nasdaq's Scandinavian data center.

https://m.slashdot.org/story/339921


nice paper on this:

Blue Note: How Intentional Acoustic Interference Damages Availability and Integrity in Hard Disk Drives and Operating Systems [pdf] https://spqr.eecs.umich.edu/papers/bolton-blue-note-IEEESSP-...


After this incident, our property owner installed "silencers" on the fire extinguisher system in the data center we rent.


From what I have heard, this issue had to do with the fire suppression system going off by mistake: the explosion that happened when the gas was released caused the vibrations.


The making-of video is arguably more interesting: https://youtu.be/lMPozJFC8g0


So, it's 2018 - this was filmed 2008 on Solaris. Where is this UI for Linux? How can I reproduce the latency graphs in realtime on Linux?


The UIs were part of Fishworks, which became the ZFS Storage Appliance. A fairer question would be: why hasn't your storage appliance vendor (like Pure Storage, EMC or whatever) implemented the UIs (and maybe they have?).

As others point out, dynamic tracing facilities should be there with bcc, but to be honest you shouldn't even need dynamic tracing to create those graphs, just disk I/O queues, service time and throughput. I know how to get that easily on FreeBSD from libgeom(3); I'm sure it's not too hard on Linux either, from one of the pseudo-filesystems.

[libgeom(3)] https://www.freebsd.org/cgi/man.cgi?query=libgeom&sektion=3
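
For the Linux side, here is a minimal sketch of that idea, assuming the counters in /proc/diskstats are good enough. It is not the Fishworks UI or anything from the video. The function names (`parse_diskstats`, `interval_metrics`, `watch`) are made up for illustration. Note that the latency figure is time per completed I/O including queueing (roughly what iostat reports as await), not pure device service time.

```python
import time


def parse_diskstats(text):
    """Parse the contents of /proc/diskstats into {device_name: counters}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:  # skip blank or unexpectedly short lines
            continue
        stats[f[2]] = {
            "reads": int(f[3]),            # reads completed
            "sectors_read": int(f[5]),     # in 512-byte units
            "read_ms": int(f[6]),          # total ms spent on reads
            "writes": int(f[7]),           # writes completed
            "sectors_written": int(f[9]),  # in 512-byte units
            "write_ms": int(f[10]),        # total ms spent on writes
            "in_flight": int(f[11]),       # I/Os currently in progress
        }
    return stats


def interval_metrics(prev, cur, seconds):
    """Turn two counter snapshots of one device into per-interval rates."""
    ios = (cur["reads"] - prev["reads"]) + (cur["writes"] - prev["writes"])
    io_ms = (cur["read_ms"] - prev["read_ms"]) + (cur["write_ms"] - prev["write_ms"])
    sectors = (cur["sectors_read"] - prev["sectors_read"]) + (
        cur["sectors_written"] - prev["sectors_written"]
    )
    return {
        "iops": ios / seconds,
        "avg_latency_ms": io_ms / ios if ios else 0.0,
        "throughput_kib_s": sectors * 512 / 1024 / seconds,
        "in_flight": cur["in_flight"],
    }


def watch(device, interval=1.0, samples=5, path="/proc/diskstats"):
    """Print one line of metrics per interval for a device such as 'sda'."""
    with open(path) as fh:
        prev = parse_diskstats(fh.read())[device]
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as fh:
            cur = parse_diskstats(fh.read())[device]
        print(interval_metrics(prev, cur, interval))
        prev = cur
```

Something like `watch("sda")` (the device name is an assumption, check what your system lists) prints IOPS, average latency and throughput once a second. Shouting at the disks is optional. Averages hide outliers, so for the heat-map style view from the video you would still want per-I/O latencies from a tracer.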



There's this library with rudimentary histograms:

https://github.com/iovisor/bcc
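
To give an idea of what such a histogram looks like, here is a rough pure-Python sketch of power-of-two latency bucketing. This is not bcc's code and does not use its API. The names `log2_histogram` and `render` are hypothetical, and the input is assumed to be a list of per-I/O latencies in microseconds collected by some other means.

```python
def log2_histogram(latencies_us):
    """Count values into power-of-two buckets: index i covers [2**i, 2**(i+1) - 1]."""
    buckets = {}
    for value in latencies_us:
        # floor(log2(value)); values below 1 are folded into bucket 0
        idx = max(int(value), 1).bit_length() - 1
        buckets[idx] = buckets.get(idx, 0) + 1
    return buckets


def render(buckets, width=40):
    """Render the buckets as text rows with a proportional bar of asterisks."""
    peak = max(buckets.values(), default=0)
    lines = []
    for idx in range(max(buckets, default=-1) + 1):
        count = buckets.get(idx, 0)
        low = 0 if idx == 0 else 1 << idx
        high = (1 << (idx + 1)) - 1
        bar = "*" * (count * width // peak) if peak else ""
        lines.append(f"{low:>8} -> {high:<8} : {count:<6} |{bar}")
    return "\n".join(lines)
```

Printing `render(log2_histogram(samples))` once per interval gives a crude text version of a latency distribution; a latency spike shows up as counts moving into the higher buckets.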


Did you know that your HDD's performance may drop if you have too many correctable ECC errors?

We had servers that raised an alert and sounded a buzzer when ECC errors exceeded some internal threshold. The buzzer made the chassis vibrate at the disks' resonant frequency, and performance dropped 30%.

After that I made every new hire on my team watch this video.

Now I deal almost exclusively with SSDs and this specific failure mode is no longer an issue.



"Don't shout at your JBODs". Not something I'd ever considered, but good to know!


Use harddisks as microphones? Nice!


That's funny you say that... there's a project I came across on the interwebs a few years back (maybe it was on HN?) about a guy who turned old disks into speakers by modulating the read head to generate different frequencies.


There's also https://github.com/fulldecent/system-bus-radio Sidechannels everywhere!


tl;dr: Vibration due to shouting has a measurable impact on spinning disk performance, increasing I/O latency. I guess this wouldn't apply to SSDs though?


I can personally attest that there is an extreme version of this effect: do not use hard disks in PA/DJ situations... I crashed several of my early disks by reading MP3 data close to a certain pair of 18" push-pull PA subwoofers...


Sounds like the "engineers are retarded" anecdote [1] where on some Apple computers the speaker was located too close to the hard drive, resulting in a crash if you played something too loudly.

[1] https://www.youtube.com/watch?v=C5d151lqJsA


Are datacentres transitioning to SSDs? Or have they already?


SSDs are used in datacenters, though in my experience they haven't completely replaced spinning rust. Typically I see flash as a (large) cache layer or for more important stuff, with spinning rust for bulk storage.


The transition to various flash-based storage is essentially complete for anything that is inside the actual server (with booting the thing off an SD card, and that being the only storage inside, being somewhat common). For SAN systems, though, SSDs aren't that interesting, because the bottleneck is not in the storage media itself but in the interconnect technology (a typical SAN storage shelf can saturate its FC ports on random I/O traffic regardless of whether it is based on flash or spinning rust at 15k RPM).


Most storage is, and for quite some time will be, magnetic until SSD costs drop enough. Also, I'm not sure, but I think magnetic can have advantages for long-term cold storage (backups, archives).


Yes, for primary disks and cache tiers. Archival storage and large disks are still on HDD, usually through some SAN vendor that handles replication and performance.


They are still not cost-efficient compared to traditional hard disks, so they are usually used for intent logging and de-staging to conventional disks. The pricing is nowhere near competitive in USD/GB or in total capacity for SSDs to replace conventional disk-based storage.


Those were the days....


Those were the days indeed.



