
"Intel Packet of Death" not Intel's problem - Maci
http://www.h-online.com/security/news/item/Intel-Packet-of-Death-not-Intel-s-problem-1801537.html
======
noonespecial
Kielhofner's response.

<http://blog.krisk.org/2013/02/packets-of-death-update.html>

I used to use "Lanner" gear for VoIP, and these had embedded Intel Ethernet
controllers. I don't have any more of them to test, but I swear I've seen it
on them as well. We suspected power supply problems because the link lights
would just go dark every once in a blue moon and need a power cycle to set
right, but we were never able to reproduce it.

~~~
jessaustin
I am impressed by his original troubleshooting, but this followup seems
impractical. Of his three suggestions, only the third (Intel providing
improved board testing tools) even seems like it could possibly prevent this
sort of problem. Asking for hardware-enforced "sane" behavior is like asking,
"why doesn't my computer know I don't want my program to deadlock, segfault,
or loop indefinitely?" That is, if the controller could do that then it would
solve the Halting Problem. Improved drivers, his second suggestion, are always
a good thing, but drivers only get patched to handle broken hardware in
response to the discovery of broken hardware. There is no way to anticipate
each particular way a NIC could possibly be broken ahead of time.

The market demands controllers with flexible and expandable functionality.
Board manufacturers use the EEPROM to specify exactly what behavior is
required. If a particular manufacturer underestimates the importance of
correctness and doesn't perform the code review and testing necessary to
prevent a PoD, that isn't Intel's fault.

~~~
kkielhofner
Kristian Kielhofner here - While I understand your analogy I don't think it's
an accurate one. In fact, with the release of the successor to the 82574 Intel
has already implemented some of the things I suggested:

[http://communities.intel.com/community/wired/blog/2012/10/18...](http://communities.intel.com/community/wired/blog/2012/10/18/i210-launch-announcement)

Clearly they have learned from the various EEPROM issues on previous
controllers (including the 82574) and implemented (among other things) EEPROM
signing, which addresses some (most?) of my concerns about sane hardware
behavior. Software drivers already do some basic EEPROM checks on this
hardware (I know because I've had to tweak them); I'm simply suggesting these
checks go a little further to verify the various EEPROM settings that could
potentially result in a scenario like this one. When the effects are as
significant as they are here I hope we can all agree: more sanity checking is
a good thing.
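
For illustration, here is a minimal sketch of the kind of basic EEPROM check a
driver can make. It assumes the convention used by Intel's open-source
e1000-family drivers, where the first 64 16-bit words (including the checksum
word at index 0x3F) must sum to 0xBABA. The helper names and the little-endian
dump layout are assumptions made for this example, not anything taken from the
posts above.

```python
import struct

NVM_SUM = 0xBABA          # target sum (assumed e1000-family convention)
NVM_CHECKSUM_WORD = 0x3F  # index of the word that stores the checksum


def words_from_image(image: bytes) -> list:
    """Decode the first 64 words of an EEPROM dump (assumed little-endian)."""
    count = NVM_CHECKSUM_WORD + 1
    if len(image) < count * 2:
        raise ValueError("image shorter than 64 words")
    return list(struct.unpack_from(f"<{count}H", image))


def checksum_ok(image: bytes) -> bool:
    """True if words 0x00..0x3F sum to NVM_SUM modulo 2**16."""
    return sum(words_from_image(image)) & 0xFFFF == NVM_SUM


def fix_checksum(image: bytes) -> bytes:
    """Return a copy of the image with the checksum word recomputed."""
    words = words_from_image(image)
    partial = sum(words[:NVM_CHECKSUM_WORD]) & 0xFFFF
    words[NVM_CHECKSUM_WORD] = (NVM_SUM - partial) & 0xFFFF
    # Re-pack the 64 words and keep any trailing bytes untouched.
    return struct.pack(f"<{len(words)}H", *words) + image[len(words) * 2:]


if __name__ == "__main__":
    good = fix_checksum(bytes(128))   # hypothetical all-zero image
    bad = bytearray(good)
    bad[0x20] ^= 0x01                 # flip one bit in a settings word
    print(checksum_ok(good))                      # checksum matches
    print(checksum_ok(bytes(bad)))                # corruption is detected
    print(checksum_ok(fix_checksum(bytes(bad))))  # re-summed image passes
```

The last line is the relevant one: a checksum only catches corruption. An
image with a wrong setting and a freshly computed checksum still passes, which
is why checking the settings themselves is a separate step.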

~~~
jessaustin
Let me preface this by saying that your epic trouble-shooting effort was
really cool. That's what inspires me to pay so much attention to this.

If you'll excuse my ignorance, could you identify which points made on the
linked page correspond to your suggestions? I can see how signature checking,
if there is in fact such a mechanism on the controller, can help ensure that
an EEPROM image is a member of a particular favored set of such images, but
you'll admit that that's a less general approach than "in-hardware sane
behavior". I don't know anything about µC design, but it would surprise me if
the mistake here were as simple as setting a "die when you see this particular
byte sequence" bit. It seems more likely that the behavior is an emergent
property based on a combination of flags and coded behavior. I still don't
think it would be possible for the controller to prevent that result in
general. It is possible to test for bad behavior, as your customers proved.
It's also possible for drivers to correctly handle the bad behavior of their
hardware, and I'm sure appropriate patches are welcome.

Did your board vendor inform you of Intel's findings back in October? If so,
could your original article have been a bit more explicit about the fact that
Intel wasn't responsible for this? If not, are you looking for another board
vendor?

~~~
kkielhofner
Thanks!

Let me start by saying that I'm not asking for or expecting perfect hardware
or software. This does not exist. I'm looking for improvements. Sane? Let's
start with "sane-er". I linked to the i210 because it offers exactly what I'm
asking for: improvement (as you'd expect in 4+ years of development).

The link for the i210 was an overview for general consumption. The 862-page
datasheet is here:

[http://www.intel.com/content/dam/www/public/us/en/documents/...](http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/i210-ethernet-controller-datasheet.pdf)

The description of the various memory and configuration spaces starts around
page 53. When compared to what's available in the 82574L this is clearly a
substantial improvement.

However, as I say in my update, we still don't /really/ know why this issue
manifested the way it did. Without knowing the true underlying cause anything
I offer is speculation, as are your suppositions. So it is unknown
whether the improvements in the i210 would have eliminated or even
ameliorated this issue.

As far as catching this exception in driver software? Possible, but doubtful.
When I worked with Intel last fall, they seemed to dismiss this possibility. Current
drivers report a loss of communication with the PHY and the adapter seems to
essentially disappear from the PCI bus until a full power cycle.

Neither Intel nor my board vendor reported these findings to me until this
story broke last week. I reported this issue to them last fall: both of them
claimed to have never seen this issue before (or since).

Meanwhile, as I’ve said before, other people have consistently reproduced this
issue with different board manufacturers. We are pursuing a second source but
I'm not going to be any more confident with the second source if it has 82574L
controllers. I can't be certain it's going to be any different.

~~~
jessaustin
Thanks so much for the detailed response, and good luck in your hunt for
better vendors. It seems that it's going to fall to _you_ to test and correct
the EEPROM settings. You might want to keep your results to yourself in
future; you could probably get some big-money consulting work with other
companies forced to use these products. It's so shitty that neither party
bothered to respond until you went public with this.

------
GiHe
Cross-posted at h-online.com:

I have a plurality of systems with _Intel_ motherboards which demonstrate the
same kind of problems. The motherboards in question have two Intel ethernet
controllers, one of which is an 82574L.

The systems connect to two different networks. When the systems attach to one
of the networks (but not the other) using the 82574L interface (but not the
other), that interface dies after some unpredictable amount of time.

I have tried posting comments to the Intel engineer's blog post (and PM-ing
the engineer directly), but they do not appear. In fact, there seem to be no
comments at Intel's site, despite the post having nearly 6000 views (at my
time of writing).

Something is not right here.

~~~
kkielhofner
This.

As I say in my updated post, this is a complex issue with clear combinatorial
factors. More than likely it's not limited to one chip, one packet, or one
EEPROM configuration. A quick reading of the web shows various unexplained
issues with this family of Intel ethernet controllers randomly exhibiting the
exact behavior I've described. Different controllers, different mobo OEMs,
different EEPROM settings. Are all of these issues related to some kind of
"packet of death"? Certainly not. However, are at least some of them? Almost
certainly, even if they're not vulnerable to my (extremely specific) "packet
of death". We still don't know exactly why this is happening (even in my
extremely specific case).

~~~
GiHe
I have another interesting (and reproducible) manifestation on Supermicro
motherboards with two 82574L controllers. In this case, it is again true that
we only experience problems on the first (as ordered by ascending MAC address)
of the two interfaces.

That was the case, though I did not clearly state so above, on the Intel
motherboard with one 82574L and one 82579LM.
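
A small sketch of what "first, as ordered by ascending MAC address" means in
practice. The interface names and MAC addresses below are hypothetical; on
Linux the real address of an interface can be read from
`/sys/class/net/<name>/address`.

```python
def mac_key(mac: str) -> int:
    """Turn 'aa:bb:cc:dd:ee:ff' (or dash-separated) into a sortable integer."""
    return int(mac.replace(":", "").replace("-", ""), 16)


def order_by_mac(interfaces: dict) -> list:
    """Return interface names sorted by ascending MAC address."""
    return sorted(interfaces, key=lambda name: mac_key(interfaces[name]))


if __name__ == "__main__":
    # Hypothetical two-port board; the values are made up for illustration.
    ports = {"eth1": "00:25:90:aa:bb:01", "eth0": "00:25:90:aa:bb:00"}
    print(order_by_mac(ports)[0])  # the "first" interface in this ordering
```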

------
brownbat
Company from Taipei flashes some Intel equipment, then it appears to function
correctly, but can be bricked remotely with a specially crafted incoming
packet.

Company has US branch that's a government contractor:
[http://government-contractor.bizdirlib.com/ceo/Synertron_Tec...](http://government-contractor.bizdirlib.com/ceo/Synertron_Technology_Inc)

Charming.

~~~
ersii
I think you're reading way more into this than there is to it.

Taiwan (Republic of China) is, by the way, basically its own country with its
own leadership and currency. I find it somewhat hard to put China (People's
Republic of China) and Taiwan (Republic of China) together.

~~~
alanctgardner2
Just to clarify, there's a difference between a Special Administrative Region
like Hong Kong or Macau, and Taiwan. While Hong Kong is largely self-governing
internally, it's still part of the PRC. Meanwhile, Taiwan (the ROC) was
founded by people ousted during the revolution. It's like saying North and
South Korea are 'basically' their own countries. Politically they aren't even
friendly.

~~~
ersii
Indeed, I guess I was a little too fuzzy in how I phrased myself, in
hindsight. Thanks - a good addition in itself.

I guess the only really suitable way of explaining the situation is "It's
complicated." It's a colourful situation and in no way is it either black
or white.

------
fulafel
Firmware images usually have checksums. Was this an Intel blob suffering from
bitrot, or does Intel have some more or less error prone way to build your own
FW images for NICs?

~~~
noonespecial
I suspect NICs these days are tiny computers in their own right. As a
motherboard manufacturer, you can probably program them to do all sorts of
nifty, with the possible downside of strange things happening if you get it
wrong.

------
Maci
Follow-up story: <http://news.ycombinator.com/item?id=5177815>

