Hacker News

What goes on at the chip level is terrifying. "You have to understand," a hardware engineer once said to me, when we shipped a consumer computer that was clocking its memory system 15% faster than the chips were supposed to go, "that DRAMs are essentially analog devices." He was pushing them to the limit, but he knew what it was, and we never had a problem with the memory system.

There was a great TR from IBM describing memory system design for one of their PowerPC chips. Summary: Do your board layout, follow all these rules [a big list], then plan to spend six months in a lab twiddling transmission line parameters and re-doing layout until you're sure it works . . .

True horror story: The first board I was involved with when working at Lucent in 2001 was a monster modem card, which provided something like 300 modems, plus or minus. For a long time in the development process we had a weird reliability problem which just could not be tracked down. Until, in desperation, a hardware engineer started putting his scope probe everywhere ... and found a line (or set of them) where the signal was messed up in the middle but not at the end!!!

Analog is a black art.

There's a lovely book on high-speed signaling subtitled "a handbook of black magic". I always found it very fitting. It explains the rules and heuristics and methods of working around and exploiting the analog nature of digital signals.

I highly recommend that book as well. I learned more about signal integrity from reading it twice than I did during all of EE undergrad.

Link: http://www.amazon.com/High-Speed-Digital-Design-Handbook/dp/...

My company designs & sells high-perf memories. My lead AE has a copy of this within reach of his desk.

I currently work in the embedded field and we had a similar issue, where my software team spent days trying to track down a weird problem that looked like hardware.

Long story short, Spansion changed from gold to copper feeder wires inside their memory chips, and that was causing bit-flip problems in every so many chips.

Our end product is in the automotive space averaging 300,000 units per year. It's a non safety related component.

Black arts are just arts with more rules than we ourselves understand. I don't think we should let that scare us from trying to understand. It's all just physics.

It's also about manufacturing tolerance, making room for error, suffering environmental stuff and understanding that the universe is out to get you. Sure, in reductio ad absurdum it's all just physics, but most of shipping a product is making engineering tradeoffs, dealing with complex stuff you may not fully understand, and making plans for when vendor B's product isn't exactly the same as vendor A's chips, but vendor A was just bought by Apple and won't talk to you any more, at any price. :-)

Of course. But the actual physical models that underpin these things are vastly different than the mental models we use to reason about them - even for people who understand them. Frequently the "lower level" model only needs to be pulled out at troubleshooting-time.

Hiding complexity behind abstractions is what allows us to build complex things.

It's all well modeled by transmission line theory and electromagnetic compatibility, as far as I know; although I'm sure modern processors have some additional problematic quantum behaviors too.

Yup. As clocks increased, traces began to more or less resemble waveguide designs à la microwave. And if you've ever done microwave RF engineering, it's all black magic.

Or don't use the auto-route feature that many electronic design programs have...
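
To put rough numbers on that: a trace starts to matter as a transmission line once its length is a noticeable fraction of the signal's wavelength. A minimal sketch, assuming an effective dielectric constant of about 4 (a ballpark for FR-4, not a measured value); the function names are made up for illustration:

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_m(freq_hz, eps_eff=4.0):
    """Wavelength on a trace; eps_eff=4.0 is an assumed ballpark, not a datasheet value."""
    return C / math.sqrt(eps_eff) / freq_hz

def electrical_length(trace_m, freq_hz, eps_eff=4.0):
    """Trace length expressed as a number of wavelengths."""
    return trace_m / wavelength_m(freq_hz, eps_eff)

# Compare the same 10 cm trace at three clock frequencies.
for f in (10e6, 100e6, 1e9):
    frac = electrical_length(0.10, f)
    print(f"{f / 1e6:7.0f} MHz: a 10 cm trace is {frac:.3f} wavelengths long")
```

Under these assumptions the same 10 cm trace is a tiny fraction of a wavelength at 10 MHz and about two thirds of a wavelength at 1 GHz. Fast edges carry energy well above the clock frequency, so this probably understates the problem.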

When was the last time you heard of a signal in a PC trace degrading and then getting much better?!?!!??!!!! So much better that when previously tested that trace wasn't deemed suspicious?

Look at load line impedances. The impedance seen looking into a transmission line depends on the line's length as a fraction of a wavelength. At any multiple of a half wavelength, the input impedance equals the load impedance. If the output were matched to the input but not to the line, you would get exactly the effect described.
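
For anyone who wants to see the half-wavelength behavior, here is a small sketch using the standard lossless-line input impedance formula, Zin = Z0 (ZL + j Z0 tan(bl)) / (Z0 + j ZL tan(bl)). The 50 ohm line and 100 ohm load are made-up values, and real traces have loss, so treat it as an illustration only:

```python
import cmath
import math

def input_impedance(z0, z_load, length_in_wavelengths):
    """Impedance looking into a lossless line of impedance z0 terminated in z_load."""
    beta_l = 2 * math.pi * length_in_wavelengths  # electrical length in radians
    t = cmath.tan(beta_l)
    return z0 * (z_load + 1j * z0 * t) / (z0 + 1j * z_load * t)

def reflection_coefficient(z0, z_load):
    """Fraction of the incident voltage wave reflected at the load."""
    return (z_load - z0) / (z_load + z0)

Z0, ZL = 50.0, 100.0  # assumed example values
print(f"reflection coefficient at the load: {reflection_coefficient(Z0, ZL):.3f}")
for frac in (0.0, 0.125, 0.25, 0.375, 0.5):
    zin = input_impedance(Z0, ZL, frac)
    print(f"{frac:5.3f} wavelengths: Zin = {zin.real:7.2f} {zin.imag:+7.2f}j ohms")
```

At a quarter wavelength the mismatched load looks like Z0 squared over ZL (25 ohms here), and at a half wavelength it looks like the load itself again, which is the point about the two ends agreeing while the line in between does not.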


Not a PC, but a good story about a PDP-10 back-in-the-day:


Serial LVD buses make clock skew and interference mostly irrelevant. That's one reason, besides cost, that parallel ports, PATA, and parallel SCSI are dead and USB and SATA/SAS dominate.

This is a good example of analogue-ness causing a very subtle and intermittent bug which remained unsolved for over 30 years: http://www.linusakesson.net/scene/safevsp/ (previous discussion here: https://news.ycombinator.com/item?id=5314959 )

Most circuits work probabilistically and come up randomly because of timing and thermal noise, so it's basically impossible to get the same exact running state twice... It's just impossible.

For a small example, look at the simplest SR latch circuit... It can go metastable because it feeds back into itself.
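
A toy model of that latch, as a sketch only: two cross-coupled NOR gates with an idealized unit delay, where both gates update at the same instant from the previous outputs. Real gates are analog, so a real latch hovers for a while and then falls to one side depending on noise rather than oscillating forever like this model does. The names here are invented for the example:

```python
def nor(a, b):
    """Two-input NOR on 0/1 values."""
    return int(not (a or b))

def simulate(s, r, q, qb, steps):
    """Unit-delay model: both gates update simultaneously from the previous outputs."""
    history = [(q, qb)]
    for _ in range(steps):
        q, qb = nor(r, qb), nor(s, q)
        history.append((q, qb))
    return history

# Normal set operation: S=1, R=0 settles to Q=1, Qb=0.
print(simulate(s=1, r=0, q=0, qb=1, steps=3))

# S=R=1 forces both outputs to 0. Releasing both inputs at once starts a race:
# in this idealized model the outputs flip together and never settle.
print(simulate(s=0, r=0, q=0, qb=0, steps=6))
```

The first run settles at (1, 0). The second bounces between (0, 0) and (1, 1) indefinitely, which is the digital cartoon of the feedback loop having no preferred side to fall to.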
