
ZMap: Internet scanner maps all of IPv4 in 45 minutes - sweis
https://zmap.io/
======
tptacek
I thought the approach they took to permuting the address space was
particularly clever, but was quickly schooled on Twitter by how bog-standard
the approach apparently is (I very much concede the point!). In the interest
of honesty about how easily impressed I am, and because I still think it's a
neat little trick:

You want to generate a permutation of the entire IPv4 address space, but you
don't simply want to shuffle every possible IP address because that would
require you to keep an insane amount of state. So instead, work in the
multiplicative group modulo _p_ for prime _p_ > 2^32, find an appropriate
generator, and iterate by multiplying with the generator mod _p_. Remember the
prime, the generator, the starting address, and your current address and you
can detect a complete traversal of the space when the starting address recurs.

There are a number of simpler ways to do this (after sheepishly conceding that
this is pretty fundamental stuff, I played with using PRFs and card shuffling
to do it; DrHoney suggested Gray codes), but I liked how immediately obvious
the multiplicative group solution was, and that I could code it from a simple
description.
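
The multiplicative-group walk described above can be sketched in a few lines. This is only an illustration, not ZMap's actual code: the function names are made up here, and a toy prime (257, covering an 8-bit space) stands in for the prime above 2^32 so the demo finishes instantly. Note that 0 is not an element of the group, so it is never produced.

```python
def prime_factors(n):
    """Return the set of prime factors of n by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_generator(g, p):
    """True if g generates the whole multiplicative group modulo prime p."""
    order = p - 1
    return all(pow(g, order // q, p) != 1 for q in prime_factors(order))


def walk(p, g, start, limit):
    """Yield every value in 1..limit exactly once, in permuted order.

    Group elements above `limit` are skipped. The only state kept is
    the prime, the generator, the start value and the current value.
    """
    current = start
    while True:
        if current <= limit:
            yield current
        current = (current * g) % p
        if current == start:  # the starting element recurred: traversal done
            return


# Toy-sized demo (assumed values): p = 257 is prime and exceeds the
# 8-bit space 1..255; 3 generates the group modulo 257.
P, G = 257, 3
assert is_generator(G, P)
order = list(walk(P, G, start=10, limit=255))
print(order[:8])
```

Picking a different generator gives a different ordering, while picking a different `start` only rotates the same ordering.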

~~~
aortega
The problem of generating a non-repeating permutation of the (2^n)-1 nonzero
n-bit numbers is equivalent to generating an m-sequence
([http://en.wikipedia.org/wiki/Maximum_length_sequence](http://en.wikipedia.org/wiki/Maximum_length_sequence))
and the simplest way is an n-bit LFSR with a primitive polynomial, called a
maximal LFSR. It's so much simpler and faster (some XORs and n bits of memory)
that I cannot see why they chose that weird prime multiplication algorithm. In
fact, right now I'm not completely sure that algorithm is correct.
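
A minimal sketch of the maximal-LFSR idea, reading the whole register state (rather than the single output bit) as the next number. The 16-bit width, the seed, and the tap mask 0xB400 (a commonly cited maximal-length mask for 16 bits) are assumptions chosen to keep the demo small; a 32-bit scanner would need a 32-bit primitive polynomial instead.

```python
def galois_lfsr16(seed=0xACE1, taps=0xB400):
    """Yield successive 16-bit states of a Galois LFSR until the seed recurs."""
    state = seed
    while True:
        yield state
        lsb = state & 1      # bit shifted out this step
        state >>= 1
        if lsb:
            state ^= taps    # apply the feedback polynomial
        if state == seed:    # back at the start: one full period done
            return


states = list(galois_lfsr16())
print(len(states), len(set(states)))
```

With a maximal tap mask the register visits every nonzero 16-bit value once per period; the all-zero state is never reached.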

~~~
pbsd
A 32-bit LFSR generates a sequence _of bits_ with period 2^32-1; what you
really want is a Lehmer-like generator using arithmetic in GF(2^32).

~~~
tptacek
You mathematicians, with your... understanding of the basic mathematics behind
things we developers already use every day, you all think you're better than
us, don't you? DON'T YOU?!

Can't I just be left to be impressed by a direct use of the math behind a
crappy textbook random number generator in peace?

-.-

~~~
pbsd
I'm not 100% sure if that was tongue-in-cheek or not, so I'll clarify: I was
disagreeing with aortega. This is not a typical PRNG, where the seed generally
marks the initial point of _the same sequence_. Choosing a random generator
matters, because we want a _different permutation_ every time, not simply a
different starting point. It is akin to running a block cipher in OFB mode
with a different key each time.

What they are doing modulo 2^32 + 15 is IMO the simplest way to achieve that
goal. Yes, you could do it without overflowing 2^32 using a binary field; you
could also cook your own mini block-cipher. But that increases the complexity
of the code and is not really faster everywhere.

~~~
aortega
>Choosing a random generator matters, because we want a different permutation
every time, not simply a different starting point.

Ah thanks, now it makes more sense, I knew I was missing something. What about
choosing different prime polys for a galois lfsr? I believe you will get the
same result.

~~~
aortega
Also you could XOR the LFSR output with any number to get a different
permutation each time.
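
A rough sketch of the XOR suggestion, assuming a 16-bit Galois LFSR with tap mask 0xB400 and an arbitrary mask value (both picked for illustration only). The mask relabels each state, so the masked sequence is still free of repeats; it now emits 0 and skips the mask value itself.

```python
def lfsr16_states(seed=0xACE1, taps=0xB400):
    """Yield the states of a 16-bit Galois LFSR over one full period."""
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
        if state == seed:
            return


def masked_order(mask):
    """Return the LFSR sequence with every state XORed with `mask`."""
    return [s ^ mask for s in lfsr16_states()]


plain = masked_order(0)
masked = masked_order(0x1234)  # hypothetical per-scan mask
print(plain[:4], masked[:4])
```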

------
mrb
When they say "scanning the whole Internet in 45 min" they mean scanning only
_one_ port of every IP address (for example sending a short GET request to
port 80/tcp) over a Gigabit link:

2^32 (IP addresses) * 1 (port per IP) * 80 (bytes per packet) * 8 (bits per
byte) / 1e9 (throughput in bit/sec) / 60 (sec per min) = 46 minutes (note:
excluding multicast space, RFC 1918 space, etc, scanning time would be reduced
down to ~35 min)

That's equivalent to "scanning all 65,535 ports of a /16 subnet in 45 min"
which does sound less impressive...
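
The arithmetic above is easy to re-run. This sketch just restates mrb's figures (80 bytes per probe, a fully used 1 Gbit/s link); the function name is made up for illustration.

```python
def scan_minutes(targets, bytes_per_packet=80, link_bps=1e9):
    """Minutes needed to send one probe per target at full line rate."""
    return targets * bytes_per_packet * 8 / link_bps / 60


full_ipv4 = scan_minutes(2**32)          # one port on every IPv4 address
slash16 = scan_minutes(2**16 * 65535)    # 65,535 ports on each host of a /16
print(f"{full_ipv4:.1f} min vs {slash16:.1f} min")
```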

~~~
qwerty_asdf

      scanning all 65,535 ports of a /16 subnet in 45 min
    

...or in other words:

    
    
      scanning all ports in the reserved Class C range, 
      from 192.168.0.0 to 192.168.255.255, in 45 min

~~~
jlgaddis
> ... "class C" ...

I realize lots of people are simply in the habit of saying "Class C" when what
they really mean is a /24, "Class B" for a /16, etc. but classless routing[0]
has been around for 20 years now and these terms need to go away.

[0]: [http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing](http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing)

~~~
cwb71
Except that qwerty_asdf wasn't referring to a /24 subnet, (s)he was talking
about one of the three address spaces defined in RFC 1918 and described
thusly:

"Note that (in pre-CIDR notation) the first block is nothing but a single
class A network number, while the second block is a set of 16 contiguous class
B network numbers, and third block is a set of 256 contiguous class C network
numbers."

So it is common for crusty old network engineers and sysadmins to refer to
192.168/16 as "the class C" private block, even when they understand that you
can subnet it however you'd like.

~~~
jlgaddis
> ... (s)he was talking about one of the three address spaces defined ...

Right, I realized that when s/he said "reserved Class C range". It was more of
a general observation. I always forget I have to be extremely specific here on
HN.

------
colonelxc
I'm somewhat surprised there is no reference to scanrand[1][2], a fast
stateless SYN scanner released by Dan Kaminsky in 2002. It wasn't directly
geared towards scanning the entire Internet, but instead towards scanning
large subnets (like for a pen test of a /16 network... takes a long time to
scan).

It is a bit obscure, but it did do tricks like encoding encrypted data in
extra mutable fields (just the sequence number for scanrand) for validation
purposes. Actually, scanrand 2.0 can apparently measure latency (without
state!) by encoding timing information in the source port field, which zmap
doesn't currently do.
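
To make the stateless validation trick concrete, here is a sketch of the general idea: derive the SYN sequence number from a keyed hash of the probe's addressing fields, then check that a reply acknowledges that number plus one. This is not scanrand's or zmap's actual code; the HMAC-SHA256 construction, the function names, and the field layout are all assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress
import os
import struct

SECRET = os.urandom(16)  # per-scan key; nothing is stored per probe


def probe_seq(dst_ip, dst_port, src_port, key=SECRET):
    """Sequence number to place in the SYN sent to dst_ip:dst_port."""
    msg = ipaddress.IPv4Address(dst_ip).packed
    msg += struct.pack("!HH", dst_port, src_port)
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def is_valid_reply(src_ip, src_port, dst_port, ack, key=SECRET):
    """True if a SYN-ACK's ack number matches a probe we could have sent.

    The reply's source is the probe's destination, and vice versa.
    """
    expected = (probe_seq(src_ip, src_port, dst_port, key) + 1) & 0xFFFFFFFF
    return ack == expected


seq = probe_seq("192.0.2.1", 80, 40000)
print(is_valid_reply("192.0.2.1", 80, 40000, (seq + 1) & 0xFFFFFFFF))
```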

I think this research is great, but I just hate to see interesting old
projects get forgotten.

[1] [http://dankaminsky.com/2002/11/18/77/](http://dankaminsky.com/2002/11/18/77/)
[2] [http://www.sans.org/security-resources/idfaq/scanrand.php](http://www.sans.org/security-resources/idfaq/scanrand.php)
[3] [http://s3.amazonaws.com/dmk/SBO_Hiver.ppt](http://s3.amazonaws.com/dmk/SBO_Hiver.ppt)

~~~
dsl
I currently scan the entire internet once a week using a re-implementation of
scanrand I did myself. (and will be switching to Zmap shortly)

There aren't as many people using it as you'd think because 1) finding a
working download link is quite an exercise and 2) compiling paketto is near
impossible except on Dan's machine. :)

~~~
jnazario
Check out dscan from dugsong, which was built around 2003 or so to address
that problem: Dan doesn't often write portable code.

[https://github.com/dugsong/dscan](https://github.com/dugsong/dscan)

------
__alexs
The anonymously published whole Internet survey scanned 100 ports on every
address as well as a few other things. It took something like 30,000 devices
and months of work though so I guess this is pretty impressive.

[http://internetcensus2012.bitbucket.org/paper.html](http://internetcensus2012.bitbucket.org/paper.html)

~~~
afreak
A friend of mine conveniently built a search around the data:

[http://exfiltrated.com/querystart.php](http://exfiltrated.com/querystart.php)

------
adamseabrook
Running this from any web host will almost certainly get your server turned
off along with a pile of abuse emails. To make it run even faster you should
scrub all the Bogon routes on this list:
[https://www.team-cymru.org/Services/Bogons/http.html](https://www.team-cymru.org/Services/Bogons/http.html)

I would also scrub all the sinkholes and captured botnet C&C IP addresses, as
hitting those will lower the reputation of your netblock. The list we use at
meanpath is:

[http://mirror1.malwaredomains.com/files/domains.txt](http://mirror1.malwaredomains.com/files/domains.txt)
[https://zeustracker.abuse.ch/blocklist.php?download=domainbl...](https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist)
[https://zeustracker.abuse.ch/blocklist.php?download=ipblockl...](https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist)
[http://malc0de.com/bl/IP_Blacklist.txt](http://malc0de.com/bl/IP_Blacklist.txt)
[http://hosts-file.net/download/hosts.txt](http://hosts-file.net/download/hosts.txt)
[http://www.joewein.net/dl/bl/dom-bl-base.txt](http://www.joewein.net/dl/bl/dom-bl-base.txt)
[http://www.dshield.org/feeds/suspiciousdomains_High.txt](http://www.dshield.org/feeds/suspiciousdomains_High.txt)
[http://www.malware.com.br/cgi/submit?action=list](http://www.malware.com.br/cgi/submit?action=list)
[https://spyeyetracker.abuse.ch/blocklist.php?download=domain...](https://spyeyetracker.abuse.ch/blocklist.php?download=domainblocklist)
[https://spyeyetracker.abuse.ch/blocklist.php?download=ipbloc...](https://spyeyetracker.abuse.ch/blocklist.php?download=ipblocklist)

------
acd
As a phun side note: reverse mapping government censorship of DNS. Since I
think governments are starting down the slippery slope of censorship, here is
how to map it out.

If you have a list of all the servers on the Internet that answer on HTTP port
80, you can reverse map a government's DNS censorship list. I.e., you can find
out what the government wants to censor by looking up those servers both in
the censored DNS and in, for example, OpenDNS, then diffing the results:
wherever the two DNS servers give different answers, you have found an entry
on the government blacklist.
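
The diffing step can be sketched as below, assuming the lookups have already been done and collected into dictionaries mapping each name to the set of addresses a resolver returned. The function name and the sample data (reserved example domains and documentation addresses) are hypothetical. A differing answer can also come from ordinary load balancing, so the output is only a list of candidates.

```python
def censorship_candidates(censored, reference):
    """Return names whose answers differ between the two resolvers.

    Both arguments map a hostname to a frozenset of IP strings; a name
    missing from `censored` counts as a difference.
    """
    return sorted(
        name for name, answer in reference.items()
        if censored.get(name) != answer
    )


# Hypothetical, hand-written answers for illustration only.
reference = {
    "a.example": frozenset({"192.0.2.10"}),
    "b.example": frozenset({"192.0.2.20"}),
    "c.example": frozenset({"192.0.2.30"}),
}
censored = {
    "a.example": frozenset({"192.0.2.10"}),
    "b.example": frozenset({"203.0.113.1"}),  # redirected to a block page
}
print(censorship_candidates(censored, reference))
```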

------
spindritf
"How to get dropped by your ISP in under an hour."

There has been a very positive trend recently in the quality of documentation,
a move away from dry, man-style listings of options to more operational
descriptions, tutorials, examples, a bit of hand-holding. Here's a Docker
tutorial[1], still on the front page.

[1]
[https://www.docker.io/gettingstarted/](https://www.docker.io/gettingstarted/)

~~~
akl
A move _away_ from having man pages isn't necessarily a thing to be celebrated
- a well-written man page is an extremely useful thing.

In the ideal case, you might have a quality man page that provides usage
information and links to more detailed documentation (that would include
tutorials, implementation info, etc.) on the web somewhere.

~~~
nakkiel
I suppose the internet generation never really took the time to read man pages
(nor to write one for that matter).

~~~
jlgaddis
That's too bad. When I was getting started, the best documentation available
to me was the man pages (this was before I had an "always-on" broadband
connection) and TLDP's "Linux HOWTO's". I printed many of them and took them
with me to high school to study in class.

It's great that we have blogs and such nowadays where anyone and everyone can
contribute their own documentation, guides, tutorials, etc., but there was
something awesome about having a single, centralized, authoritative HOWTO
covering a particular topic.

------
dylangs1030
This is very cool, but I'm curious as to why they stuck with ZMap. Nmap's
graphical interface, Zenmap, sounds very similar and has a huge following
among security researchers.

For the technically inclined, a good white paper describing the advantages of
IPv4-wide scanning for security reconnaissance and the advantages of ZMap vs
other tools like Nmap can be found here:

[https://www.usenix.org/system/files/conference/usenixsecurit...](https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_durumeric.pdf)

~~~
dadrian
Probably because the author's first name is Zakir.

------
anon90424671
Wonder why they didn't mention unicornscan in their paper?

User-land Distributed Portscanner released in 2005:
[http://unicornscan.com](http://unicornscan.com)

Defcon Presentation Introducing Unicornscan from 2005:
[http://www.youtube.com/watch?v=ZdCEo6yoEWA](http://www.youtube.com/watch?v=ZdCEo6yoEWA)

------
smutticus
1) "ZMap supports both blacklisting and whitelisting network prefixes. If ZMap
is not provided with blacklist or whitelist parameters, ZMap will scan all
IPv4 address (including local, reserved, and multicast addresses)."

What an absolutely stupid default setting. Thanks for giving a bunch of noobs
a simple IPMC DOS application. If this thing gets popular it will soon be the
bane of network admins everywhere.

2) There are at least 2 obvious omissions from their default blacklist file.
There might be more but these are the obvious ones that come to mind:

    class-E 240.0.0.0/4
    CGN     100.64.0.0/10

3) Can someone explain to me why I wouldn't just want to use nmap to do this
same thing? Why do we need a new tool for this?
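
For point 2, checking a target against blacklisted prefixes is straightforward with Python's standard `ipaddress` module. This is only a sketch of the idea, not how ZMap parses its blacklist file; the two prefixes are the ones named above and the helper name is made up.

```python
import ipaddress

# The two prefixes named in point 2 (class E and carrier-grade NAT space).
BLACKLIST = [
    ipaddress.ip_network("240.0.0.0/4"),
    ipaddress.ip_network("100.64.0.0/10"),
]


def is_blacklisted(addr, networks=BLACKLIST):
    """True if the IPv4 address falls inside any blacklisted prefix."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


skipped = sum(net.num_addresses for net in BLACKLIST)
print(is_blacklisted("100.64.0.1"), is_blacklisted("8.8.8.8"), skipped)
```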

------
muloka
In case anyone is wondering: if this worked on IPv6 and the rate were constant
(45 minutes to do 2^32), it would take roughly 2.5 octillion days to scan all
of the IPv6 addresses.
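
The estimate can be checked with exact integer arithmetic, assuming the same 45 minutes per 2^32 addresses and nothing else:

```python
MINUTES_PER_IPV4_SIZED_SCAN = 45

chunks = 2**128 // 2**32                 # IPv4-sized blocks in the IPv6 space
total_minutes = chunks * MINUTES_PER_IPV4_SIZED_SCAN
days = total_minutes // (60 * 24)
print(f"{days:.3e} days")
```

That works out to 2^91 days, on the order of 10^27 (octillions, on the short scale).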

------
iotakodali
gonna try it over university network

~~~
runarb
I did something similar when I was researching search technology at my
university[0]. Then, coming back from lunch some weeks later, I found two
gays from the IT department had let themselves into our office to
investigate what we were doing. Apparently we had hit a lot of honeypots :)

0: [http://www.boitho.com/](http://www.boitho.com/)

~~~
nickthemagicman
Why would you assume they were gay?

~~~
runarb
Lol. I meant guys :) English is not my first language ( gays/guys, "let's
have a beer/bear" etc don't get caught in most spell checkers... ).

~~~
jahitr
Just for a moment I thought I was reading a plot for a gay porno. LOL

~~~
SimHacker
Good thing TCP/IP legalized same-sex connections. NCP was not so open minded.

The old NCP networking protocol required that connect and listen sockets must
have different parity gender (one even, the other odd -- I can't remember
which was which, or if it mattered -- they just had to be different). The act
of trying to connect an even socket to another even socket, or an odd socket
to another odd socket, was called "homosocketuality", and it was strictly
forbidden by internet protocols, and was called the "Anita Bryant feature".

[http://www.saildart.org/IMPSER.DOC[SS,SYS][1]](http://www.saildart.org/IMPSER.DOC\[SS,SYS\]\[1\])

Illegal gender in RFC, host hhh/iii, link 0

The host is trying to engage us in homosocketuality. Since this is against the
laws of God and ARPA, we naturally refuse to consent to it.

[http://www.saildart.org/FTP.OLD[S,NET]1[2]](http://www.saildart.org/FTP.OLD\[S,NET\]1\[2\])

    
    
        ; Try to initiate connection
    
        loginj:
                init log,17
                sixbit /IMP/
                0
                jrst noinit
                setzm conecb
                setom conecb+lsloc
                move ac3,hostno
                movem ac3,conecb+hloc
                setom conecb+wfloc
                movei ac3,40
                movem ac3,conecb+bsloc
                move ac3,consck
                trnn ac3,1
                    jrst gayskt            ; only heterosocketuals can win!
                 movem ac3,conecb+fsloc
                 mtape log,[
                       =15
                        byte (6) 2,24,0,7,7
                             ]          ; Time out CLS, RFNM, RFC, and INPut
    
        [...]
    
        gayskt:    outstr [asciz/Homosocketuality is prohibited (the Anita Bryant feature)
    
        /]
    
            ife rsexec,<jrst rstart;>exit       1,
    

(The code above adds the connect and listen socket numbers together, which
results in bit 0 being 0 if they are the same gender. TRNN is "test bits
right, no change, skip if non zero", so it skips the next instruction (jrst
gayskt) if they are of different sex.)
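
The parity rule as described in the explanation above, restated as a tiny sketch (the function name is made up; this mirrors the description, not the full PDP-10 listing):

```python
def may_connect(connect_socket, listen_socket):
    """True if the two socket numbers have different parity.

    Adding the numbers leaves bit 0 set only when one is even and the
    other is odd, which is the case the old rule allowed.
    """
    return ((connect_socket + listen_socket) & 1) == 1


print(may_connect(2, 3), may_connect(2, 4), may_connect(5, 7))
```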

------
ihsw
It's interesting to see that this supports pushing to redis lists, which is
clarified as being an 'output module.'

[https://zmap.io/documentation.html#extending](https://zmap.io/documentation.html#extending)

Very good, very extensible.

