
Bitsquatting: DNS Hijacking without exploitation - jakub_g
http://dinaburg.org/bitsquatting.html
======
antirez
A serious number of Redis crash reports are due to memory errors (we ask users
to test their memory after crashes, since real segfaults are very rare; not
everybody tests, and tests may give false negatives, so the problem is bigger
than the one we observe).

This experiment also seems to show that there is a lot of corrupted memory out
there.

There is a simple fix, I wonder why it is not used:

1) Add a feature at the operating system level that, once every second, tests
N memory pages at random. No observable performance degradation.

2) Report the problem to the user when it is found.

3) Mark the page as bad and don't use it, ever.

So you have memory tested basically for free, users aware of their hardware
errors, and a lot less consequences for those errors.

~~~
kevingadd
Guild Wars did this in user mode in the background while playing - it did
background testing of the user's CPU, RAM, and GPU. Machines that failed the
tests were flagged so that their crash reports got bucketed separately (saving
us the time of trying to understand impossible crashes), and it popped up a
message telling the user their computer was broken.

So even if the OS should be doing this for you, for long-running processes you
could do it yourself in user mode. (I don't know if it's worth the effort,
though.)

~~~
antirez
That's exactly my plan with Redis, and it is awesome to discover that it was
used with success in the past! But I have a problem given that I can't access
memory at a lower level: when to test, and what?

Basically, I have the following possibilities:

1) Test on malloc(), with a given probability, and perhaps only up to N bytes
of the allocation, for latency concerns.

2) Do the same also at free() time.

3) From time to time, just allocate a new chunk of memory of a fixed size
with malloc(), test it, and free it again.

"3" is the one with the minimum overhead, but 1+2 probably have a better
chance of eventually hitting all the pages...
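A rough sketch of strategy 1 in Python (the probability, byte cap, and
function names are all made-up tuning knobs for illustration, not anything
from Redis):

```python
import random

TEST_PROBABILITY = 0.01  # assumed sampling rate; tune for latency budget
MAX_TEST_BYTES = 256     # cap the tested prefix to bound per-alloc latency


def quick_test(buf: bytearray, limit: int = MAX_TEST_BYTES) -> bool:
    """Write and verify two complementary patterns over at most `limit`
    bytes of the buffer."""
    n = min(limit, len(buf))
    for pattern in (0x55, 0xAA):
        for i in range(n):
            buf[i] = pattern
        if any(buf[i] != pattern for i in range(n)):
            return False
    return True


def checked_alloc(size: int) -> bytearray:
    """Strategy 1: allocate, and with some probability test the first
    bytes of the new block before handing it to the caller."""
    buf = bytearray(size)
    if random.random() < TEST_PROBABILITY and not quick_test(buf):
        raise MemoryError("memory self-test failed on allocation")
    # Zero the (possibly) tested prefix so the caller sees fresh memory.
    for i in range(min(MAX_TEST_BYTES, size)):
        buf[i] = 0
    return buf
```

Strategy 2 would be the same check run again at free() time, on memory whose
contents no longer matter.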

I don't have a broken memory module to test different strategies with; I
wonder if there is a kernel module that simulates broken memory.

Note that Redis can already test the computer memory with 'redis-server
--test-memory' but of course this requires user intervention.

------
jakub_g
TL;DR: every day, hundreds of wrong URL requests are made due to memory
failures in computers. Because of a hardware problem, a computer can connect
e.g. to microsmft.com instead of microsoft.com. The data gathered by the
researcher suggests these kinds of bugs also happen in web caches etc., thus
increasing the number of affected users.
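The bitsquat candidates are easy to enumerate. A small sketch that generates
every single-bit-flip variant of a domain that still consists of valid
hostname characters:

```python
import string

# Characters legal in a hostname (lowercase form).
VALID = set(string.ascii_lowercase + string.digits + "-.")


def bitsquats(domain: str):
    """Return every distinct variant of `domain` obtained by flipping a
    single bit of a single character, keeping only variants that are
    still made of valid hostname characters."""
    out = set()
    for i, ch in enumerate(domain):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in VALID and flipped != ch:
                out.add(domain[:i] + flipped + domain[i + 1:])
    return sorted(out)
```

For example, microsmft.com shows up among the variants of microsoft.com,
because 'o' (0x6F) and 'm' (0x6D) differ by exactly one bit.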

In practice, the privacy problems coming from this are rather limited (unless
you send private stuff in the URL), since in the majority of cases you won't
be sending the original domain's cookies if the name resolved to the bitsquat
domain early on. Still, it's probably a security angle that isn't thought
through on a daily basis.

~~~
jakub_g
There are follow-up blog articles on the author's page. This one is also
interesting: [http://blog.dinaburg.org/2012/10/a-preview-of-
bitsquatting-p...](http://blog.dinaburg.org/2012/10/a-preview-of-bitsquatting-
pcaps.html)

The original link was found at h-online.

------
spullara
I did this experiment by bitsquatting all the domains around cloudfront.net
after hearing about it from defcon. It works. You basically have the
opportunity to replace the javascript of tons of sites. I simply served 404s.
What was really interesting to me was the varied places where the corruption
occurs. Some of the requests even have the correct Host header. Now you know
why the old PC was so flaky!

~~~
jakub_g
I've started thinking about all those banks [1] and other pages serving
like/tweet buttons on the login page.

Or pages including Google Analytics. If the described behaviors really take
place, then given the massive scale of deployment of Google Analytics,
Statcounter, FB buttons, and jQuery includes from CDNs, you should be able to
inject arbitrary JS for a non-trivial number of users (though a very random
set of them).

[1] [http://my.opera.com/hallvors/blog/2012/05/11/social-media-
ba...](http://my.opera.com/hallvors/blog/2012/05/11/social-media-banking)

------
0x0
It's pretty amazing to see so many bit errors make it through DNS resolution
without the client machines crashing instantly; imagine if the bit errors were
introduced in RAM containing code or data structures with memory pointers,
instead of a domain name!

Thinking about it, I guess such crashes actually happen in much larger numbers.

~~~
csense
Everyone who's worked with computers for any length of time has seen
inexplicable, unreproducible crashes, lockups or reboots.

We're just conditioned to ignore them if they're neither frequent nor
reproducible.

------
tyoma
Hi,

Author of the article here. Was happily surprised to find it linked on the
front page of HN.

I can try to answer any questions you may have.

I'm also in the midst of writing another blog post, this time about the
bit-error distribution in the DNS query type field. Spoiler: it's not
uniform.

~~~
tyoma
Part 3 is now up: [http://blog.dinaburg.org/2012/11/bitsquatting-pcap-
analysis-...](http://blog.dinaburg.org/2012/11/bitsquatting-pcap-analysis-
part-3-bit.html)

~~~
skorgu
The part about the flips mostly being single-bit errors reminded me of [1]; I
wonder if we're seeing the same cause in two different ways?

[1] [http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-
so...](http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-
couldnt.html)

------
jackalope
_HTTP 1.1 includes a header field called Host._

Technically, this statement is correct, but don't overlook the fact that while
the Host header is optional in HTTP 1.0, most HTTP 1.0 clients will include it
out of necessity. It's nearly impossible to guarantee you'll get the correct
resource without a Host header, these days.

~~~
lmm
Are there HTTP 1.0 clients still in use? I can't imagine why a maintained
program wouldn't be using 1.1.

------
option_greek
Very fascinating... if a very narrow scenario with a very low probability
resulted in this, how is it that these errors are not apparent during other
computer activities?

~~~
0x0
They probably are, but people are used to bluescreens and unexpected app
crashes and think nothing of it.

~~~
jackalope
I dunno. Something about this analysis bothers me. The basic premise is that a
string the length of a domain name routinely, albeit rarely, gets corrupted by
one bit, causing errant DNS lookups. Then it's reasonable to assume that a
longer string is even more likely to contain corruption. But if that's so, why
do I almost never see any evidence of bit corruption in my web server logs?
Surely the same corruption would affect other parts of the URL, and the
probability should be greater due to the length. But I can't find a single
example in my logs that can't be explained by human error (typos by users or
developers). If bit corruption is so overwhelmingly prevalent in hostnames,
but not URLs or other identifiers, I suspect it's due to a software bug
somewhere.

~~~
njs12345
Presumably your web server doesn't serve quite as much traffic as fbcdn.net.
The odds of such a bitflip happening are vanishingly low, so you need an
incredibly large amount of traffic before you'll see such errors occurring.

~~~
phyalow
In my understanding of network communication and data transmission, this
should be impossible. All payload data, encapsulated header data, etc. are
subject to checksums, hashes, variable encoding schemes on the wire, parity
balancing, redundant bit insertion (Hamming), and so on, the result of which
should always signal an error's presence. Even if the bit flip occurs in
primary memory, surely the OS's memory management subsystems would detect the
corruption.

So for a bit flip not to be detected and remedied before the execution of an
errant DNS lookup seems odd. Although I could be wrong (just a final-year CS
student).

EDIT: Just watched the video, originally classified it as TLDW, seems
plausible.

~~~
tjgq
Note that no error detection code is able to detect all errors; it just lowers
the probability of an error passing undetected even further. (CRCs are pretty
robust in that they always detect sequences of errors with length <= N, with N
depending on the particular algorithm.) With a large enough sample size, you
_will_ hit errors.

In this case it is probably memory corruption. The OS won't be able to detect
such a thing unless the memory has ECC (relatively uncommon these days). It
could theoretically detect it if the memory pages were checksummed and
periodically verified against the checksum, but afaik no OS does so.
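As a concrete illustration of that "sequences of errors" guarantee: any CRC
detects all single-bit errors, which a brute-force check with Python's
zlib.crc32 confirms for a short input:

```python
import zlib


def crc_detects_all_single_bit_flips(data: bytes) -> bool:
    """Flip every individual bit of `data` in turn and verify that each
    corrupted copy produces a different CRC-32 than the original."""
    original = zlib.crc32(data)
    for i in range(len(data)):
        for bit in range(8):
            corrupted = bytearray(data)
            corrupted[i] ^= 1 << bit
            if zlib.crc32(bytes(corrupted)) == original:
                return False
    return True
```

The point stands, though: the guarantee only covers errors on the path the
CRC protects. A flip in host memory before the checksum is computed, or after
it is verified, is invisible to it.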

------
tptacek
I wonder if the distribution of hits to these domains follows the popularity
of the underlying "correct" domain, which is what you'd expect if random bit
errors were causing these hits and would help corroborate his claim.

~~~
0x0
I'm wondering if the string length or alignment could affect it, too. For
example, if memory is allocated in 16byte chunks, longer domain names might
have a larger chance for having a bit flipped in the active part of the string
(instead of the padding). Just speculating wildly here.

~~~
spc476
There's no padding in DNS requests (I've written my own DNS decoding
routines). There's also very little that can change in a DNS packet without
causing an error---basically, the only things that can change without causing
a DNS decoding error are text-related fields (say, the payload of a TXT or
SPF record type). Even then, given the restrictions on character sets in DNS
host names (and the crazy compression scheme used for domain names), it's
actually surprising to see bit-6 errors, as that bit should cause more
invalid domain names than not.

Edited because I thought bit-6 errors would flip letter case (upper to lower,
lower to upper) when it's bit-5 that does that.
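The case-flip detail falls straight out of ASCII: bit 5 (0x20) is exactly the
distance between upper- and lower-case letters, while a bit-6 flip on a
letter usually lands on a character that is illegal in hostnames. A tiny
sketch:

```python
def flip_bit(ch: str, bit: int) -> str:
    """Return ch with the given bit of its ASCII code flipped."""
    return chr(ord(ch) ^ (1 << bit))

# Bit 5: 'a' (0x61) becomes 'A' (0x41) -- a harmless case change in DNS,
# since hostname matching is case-insensitive.
# Bit 6: 'a' (0x61) becomes '!' (0x21) -- not a valid hostname character,
# which is why bit-6 errors were expected to mostly produce invalid names.
```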

~~~
tptacek
I'm a little skeptical of this whole thing. Obviously, one other thing about
bit errors in DNS packets is that they need to not break label compression.

~~~
0x0
Do you mean the whole article, or just the parent post? Because we're probably
talking about bit errors while the domain name is still just a string in
application memory, not when it's being assembled to a DNS packet or in
transit. So I don't see how label compression comes into play.

------
tjgq
This reminds me of this (slightly old) article, wherein the authors observe
that about 1 in every 30,000 TCP packets fails the TCP checksum, even though
the actual errors should have been caught by the link-level CRC. They go on to
speculate about possible causes, which include memory corruption at the hosts.

A good read, and it seems to be an instance of the same problem.

"When The CRC and TCP Checksum Disagree" -
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.7...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.7611)
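For reference, the TCP checksum the paper discusses is the 16-bit ones'
complement sum of RFC 1071. A minimal sketch, which also shows one way errors
can slip past it: swapping two aligned 16-bit words leaves the sum unchanged,
while a single-bit flip does change it.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones' complement of the ones' complement sum
    of the data taken as big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF
```

Addition is commutative, so reordering whole 16-bit words is one class of
corruption the checksum cannot see, unlike a CRC.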

------
cabirum
Are those really memory corruptions and not, say, damaged cables or bugs in
network infrastructure equipment? Or, just typos in scripts (both server and
client side)?

~~~
antirez
Other bit-flipping issues would usually be trapped by checksums. Typos in
scripts are possible, but then you would get a massive number of requests
from the same IP, which should be easy to notice.

------
zokier
Yet, ECC on consumer hardware ranges from exotic to non-existent.

