How both TCP and Ethernet checksums fail (evanjones.ca)
46 points by jsnell on Oct 9, 2015 | 26 comments

> If the chance is purely random, we should expect approximately 1 in 2^16 (approximately 0.001%) of corrupt packets to not be detected. This seems small, but on one Gigabit Ethernet connection, that could be as many as 15 packets per second.

Not really. You'd only get that many corrupt packets if you have a Gbps of traffic flowing; but as soon as you start detectably corrupting 99.999% of packets, TCP throughput is going to drop dramatically and so you'll have fewer packets available to corrupt.

With 1500 Byte packets at 1Gbps you're pushing 83,333 packets per second. If 1% of those (833) are corrupted and 1 in every 2^16 corrupted packets has a valid CRC then you have 1 corrupt packet with a valid CRC every 78 seconds.

Still not something to ignore, but not nearly as bad as the author indicates.
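
The arithmetic above can be checked with a short sketch. The function name and parameters are made up for illustration, and it assumes, as the comment does, that corruption is random so a 16-bit check misses 1 in 2^16 corrupt packets.

```python
def undetected_error_interval(link_bps, packet_bytes, corrupt_fraction, check_bits=16):
    """Return (packets per second, seconds between undetected corrupt packets).

    Assumes every corrupt packet slips past a check of `check_bits` bits
    with probability 1 / 2**check_bits (purely random corruption).
    """
    packets_per_second = link_bps / (packet_bytes * 8)
    corrupt_per_second = packets_per_second * corrupt_fraction
    undetected_per_second = corrupt_per_second / 2 ** check_bits
    return packets_per_second, 1 / undetected_per_second


# 1500-byte packets on a 1 Gbps link with 1% of packets corrupted.
pps, interval = undetected_error_interval(1e9, 1500, 0.01)
print(f"{pps:.0f} packets/s, one undetected error every {interval:.1f} s")
```

With these inputs the interval comes out at about 78.6 seconds, which matches the "every 78 seconds" figure.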

Right, I think 1% PER is around the worst case in terms of having lots of errors while not causing the data rate to scale back dramatically in common internet-facing applications. I was estimating around one undetected error per minute; as you say, not something to ignore, but definitely not as bad as the author suggested either.

Actually, I don't think that's the case. While it's true that the theoretical maximum speed of an individual connection would decrease rapidly as packet loss increases, the aggregate data rate of the traffic going through a single network element would not necessarily be affected all that much.

It's totally routine to see network-wide packet loss rates much higher than 1%. The most I can remember was >15% sustained for weeks, for a few Gbps of real-life traffic in a mobile network. ("Real life" as in hundreds of thousands of normal users, with the traffic coming directly from whatever servers they were actually accessing.)

Right, mobile networks are unusual. My point about "common internet-facing applications" is that most systems maintain cwnd values of at least 10 segments, or else people notice and complain about poor performance.


A study was done on this: "Performance of Checksums and CRCs over Real Data". The 16-bit checksum used for TCP (a ones' complement sum, not a CRC16) has real biases. "In one dramatic case, 0.01% of the check values appeared nearly 15% of the time"
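
To make one weakness of the TCP checksum concrete, here is a minimal sketch of the RFC 1071 style 16-bit ones' complement sum. It is not the study's measurement, just a structural illustration: because addition is commutative, reordering 16-bit words leaves the checksum unchanged, while CRC32 (via `zlib.crc32`) notices the change. The sample byte strings are arbitrary.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd\x00\x01"
swapped = b"\xab\xcd\x12\x34\x00\x01"  # first two 16-bit words exchanged

print(internet_checksum(original) == internet_checksum(swapped))  # same sum
print(zlib.crc32(original) == zlib.crc32(swapped))                # different CRC
```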

Cut-through switching in 10 GbE applications, however, does not modify ANYTHING about the packet as it gets sent along. It'll calculate that a packet was bad, but at that point it's already too late to do anything about it because it has already forwarded part or all of it onto the next wire segment.

This can be equally frustrating, as now you have to trace the entire path from switch to switch and try and figure out what cable/fibre is bad, and you see error counters increase on multiple interfaces.

It's not store-and-forward vs. cut-through. It's whether or not the switch acts as a layer 3 device, or a layer 2 device. If it acts as a plain old layer 2 device, it can pass the packet, unmodified. As a layer 3 device, it modifies the layer 2 headers, and the TTL. As a layer 3 device, it can still cut-through.

Source: Broadcom documentation

I would call a layer 3 device a router, whether it is marketed as a switch or not. Layer 2 devices are what I was referring to.

My experience is with the Nexus platform from Cisco, pure layer 2.

One thing to note about CRCs is that they are good for error detection but not good as hash functions. I ran a bunch of "almost sequential" identifiers through CRC32, and upon producing 1024 buckets out of it, found half of the buckets had the lion's share of hits.


How does that compare with the expected binomial distribution?
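
One way to answer that is to measure it. The sketch below is a rough experiment, not the original commenter's setup: the `user-NNNNNNNN` identifier format, the 100,000 sample size, and taking the CRC32 value modulo 1024 are all assumptions. It compares the observed spread of bucket sizes with the mean and standard deviation a binomial model predicts for uniform hashing.

```python
import zlib
from collections import Counter
from math import sqrt


def bucket_counts(ids, buckets=1024):
    """Count how many identifiers land in each bucket under CRC32 mod buckets."""
    hits = Counter(zlib.crc32(s.encode("utf-8")) % buckets for s in ids)
    return [hits.get(b, 0) for b in range(buckets)]


def binomial_expectation(n, buckets=1024):
    """Mean and standard deviation of one bucket's count under uniform hashing."""
    p = 1 / buckets
    return n * p, sqrt(n * p * (1 - p))


# Hypothetical "almost sequential" identifiers.
ids = [f"user-{i:08d}" for i in range(100_000)]
counts = bucket_counts(ids)
mean, std = binomial_expectation(len(ids))
observed_std = sqrt(sum((c - mean) ** 2 for c in counts) / len(counts))

print(f"expected per bucket: {mean:.1f} +/- {std:.1f}")
print(f"observed: min={min(counts)} max={max(counts)} std={observed_std:.1f}")
```

If the observed standard deviation is far above the binomial one, the buckets are skewed; if it is close, the distribution is about what chance would give.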

> This seems small, but on one Gigabit Ethernet connection, that could be as many as 15 packets per second.

Only if your network is corrupting every packet!

Data corruption is a serious problem, but it doesn't help the discussion if you wildly over-estimate its occurrence.

If you don't monitor bad TCP segment counts, you get what you deserve. It's also smart to have your own end-to-end checksums on serialized objects.
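
As a rough illustration of an end-to-end check on a serialized object, the sketch below prepends a SHA-256 digest to a JSON payload and verifies it on the receiving side. The `seal`/`unseal` names and the digest-then-payload layout are invented for this example, not taken from the article or any particular library.

```python
import hashlib
import json

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def seal(obj) -> bytes:
    """Serialize obj to JSON and prepend a SHA-256 digest of the payload."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).digest() + payload


def unseal(blob: bytes):
    """Verify the digest, then deserialize; raise ValueError on mismatch."""
    digest, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("payload failed integrity check")
    return json.loads(payload.decode("utf-8"))


blob = seal({"user": 42, "balance": 1000})
print(unseal(blob))

# Simulate a single flipped bit somewhere between sender and receiver.
damaged = bytearray(blob)
damaged[-1] ^= 0x01
try:
    unseal(bytes(damaged))
except ValueError as err:
    print("rejected:", err)
```

The check runs in the application on both ends, so it covers switches, NICs, and caches in between, which the per-hop Ethernet FCS does not.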

> The root cause appears to have been a switch that was corrupting packets. ... The hypothesis is that occasionally the corrupt packets had valid TCP and Ethernet checksums. One "lucky" packet stored corrupt data in memcache.

Did the servers in question (both source and destination) all have ECC-protected memory? Hopefully that's a foregone conclusion, but that's another big opportunity for errors.

A couple things:

1. IPv6 completely removes the header checksum you used to have in IPv4. So now there is just the Ethernet FCS and the TCP checksum. You should use IPv6. If you're not using IPv6, you're only hurting yourself and the rest of the internet.

2. Just transport everything over SSL, please. With AES-NI, the overhead for encrypting data is so tiny, that it's easier just to let someone else solve this problem.

SSL still has significant overhead, Netflix did a bunch of work [1], and they still can only push 10 Gbps out of a box that used to be able to push 40 Gbps (quad port 10G nic). 1/4th the throughput seems like a lot of overhead; and I'm a mere mortal, and can't put TLS into my kernel.

[1] https://people.freebsd.org/~rrs/asiabsd_2015_tls.pdf

I agree with the comment in the article about using cryptographic hashes (where possible, of course): there are huge peace-of-mind advantages to just not having to worry about a problem. Obviously, there can be situations in which one must make a pragmatic engineering tradeoff between reliability and performance, but in the main I think it's worth doing.

I have been lazy about this recently. A buddy of mine who works on flash storage told me to do the same with any on-disk data structures.

This post has pushed me to finally getting checksums in.

That's why file systems that do end-to-end checksumming should be the norm.

I think a reasonable argument could be made that if it's in the filesystem it's not end-to-end.

Mind explaining a little more what you mean?

ZFS checksumming has served me well..

If you mean "it's not checksumming data not represented in the filesystem, like unpartitioned sectors", well, no one cares about that data, right?

End-to-end checksumming refers to the fact that when a file is written, its checksum is generated and stored; upon reading, the file system verifies the checksum before returning the data to the application. That is, the read() will return valid data.

See this post from Jeff Bonwick: https://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data
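
The write-then-verify-on-read idea can be sketched with a toy in-memory store. This is only an illustration of the principle, not how ZFS is implemented; the class name, the use of SHA-256, and the two dictionaries standing in for disk and metadata are all assumptions.

```python
import hashlib


class ToyChecksummedStore:
    """Toy block store that verifies a checksum before returning data."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes (stand-in for the disk)
        self.checksums = {}  # block id -> digest, kept apart from the data

    def write(self, block_id, data: bytes) -> None:
        self.checksums[block_id] = hashlib.sha256(data).digest()
        self.blocks[block_id] = data

    def read(self, block_id) -> bytes:
        data = self.blocks[block_id]
        if hashlib.sha256(data).digest() != self.checksums[block_id]:
            raise IOError(f"checksum mismatch on block {block_id}")
        return data


store = ToyChecksummedStore()
store.write(0, b"hello")
print(store.read(0))

# Simulate silent corruption on the storage device.
store.blocks[0] = b"hellp"
try:
    store.read(0)
except IOError as err:
    print("read refused:", err)
```

The check only covers what happens between `write` and `read`; anything that damages the bytes before `write` is called, or after `read` returns, is outside what the store can see.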

Realistically it's not going to be that simple though. The filesystem proper (e.g. ZFS) may verify checksums when pulling in data from a storage device, but then it's going to sit in the page cache for some arbitrarily long time. Whether by faulty DRAM, cosmic rays, or writes through stray pointers, that's a potentially major source of vulnerability to data corruption. There's of course a mirror-image vulnerability on the write path: you write into the page cache, and if something borks your data between that point and when it actually gets written back to storage, you end up persisting it (quite arguably worse, if perhaps less likely due to dirty data usually getting flushed within a reasonably short time).

Basically, the filesystem isn't the end point, and thus simply isn't positioned to really provide "end-to-end" protection.

If your OS allows writes through stray pointers, you have bigger issues like code exec...

Also, ZFS's data integrity claims only hold if you use ECC memory.

BTW, anything running on top of the machine is going to be vulnerable to your example, i.e. if you have a program that checksums something, and then writes it to disk, there is opportunity for corruption while it's reading the data to checksum it.
