
As someone who works with this, I would like to know how they can be sure their results are reliable. Just starting a sender thread and a receiver thread simply won't do. At that rate congestion happens, and we'll start seeing packet loss. With a stateless approach, the only thing you can do to prevent this is to arbitrarily slow down the rate at which packets are sent. With that approach, a scan from a single location is going to take far longer than ten hours.

It works perfectly well as long as your results are not used for anything important, I guess. But if you have customers who need reliable results, this naïve approach simply doesn't cut it, in my experience.

During the scan we monitored the bandwidth, and we used control pings to check continuously that the server could still send and receive pings. Based on that monitoring we slowed things down when needed. Sure, it was not perfect, but reliability was considered during the experiment.

So, what measures were taken?

The problem is not about sending packets fast enough. It's not about bandwidth. The problem is sending them just fast enough and no faster, which is impossible if you're scanning statelessly with just ICMP echoes.

Let's say you're on 100 Mbit/s Ethernet, but your uplink is only 8 Mbit/s. If you send packets at a rate of 10 Mbit/s, packet loss will happen. And you're not the only one using the network either, so this can happen way earlier. And that's only the part of the network that you control. There might be a lot of hops between you and the host you're sending packets to. And with your approach (the way I understand it) you're not going to notice packet loss.
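
To put rough numbers on the example above, here is a minimal sketch. It assumes loss comes only from one saturated link with no buffering and no competing traffic, and that the scan sends one probe to each of the 2^32 IPv4 addresses (the thread does not state the exact target count). The function names are made up for illustration.

```python
def min_loss_fraction(send_rate_mbit: float, bottleneck_mbit: float) -> float:
    """Lower bound on the fraction of traffic dropped at a saturated link.

    Assumes a single bottleneck and no other traffic on it.
    """
    if send_rate_mbit <= bottleneck_mbit:
        return 0.0
    return 1.0 - bottleneck_mbit / send_rate_mbit


def required_pps(num_addresses: int, hours: float) -> float:
    """Average probes per second needed to cover num_addresses in the given time."""
    return num_addresses / (hours * 3600.0)


if __name__ == "__main__":
    # 10 Mbit/s pushed through an 8 Mbit/s uplink: about a fifth is dropped.
    print(min_loss_fraction(10, 8))
    # Assumed target: every IPv4 address once, in ten hours.
    print(required_pps(2**32, 10))
```

Under those assumptions, at least about 20% of the probes are lost at the uplink alone, and a ten-hour scan needs roughly 119,000 probes per second on average.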

I might be making too many assumptions here, but ten hours is just too short a time to get a reliable result for a network of that size. I'm very sceptical. But please prove me wrong, because it will definitely make my job easier.

I guess you could publish the code, so I could test it myself.

What if you did a much slower (i.e., reliable) scan of a small sample? Then you could compute the probability of false negatives in the fast search and get a much more accurate count.
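
A minimal sketch of that correction, under some loud assumptions: the slow scan of the sample is treated as ground truth, the sample is representative of the whole address space, and hosts do not go up or down between the two scans. The function and parameter names are hypothetical.

```python
def estimate_true_count(fast_hits_total, fast_hits_in_sample, slow_hits_in_sample):
    """Estimate the fast scan's false-negative rate and a corrected host count.

    fast_hits_total:      number of hosts the fast scan found overall
    fast_hits_in_sample:  set of sample addresses the fast scan found
    slow_hits_in_sample:  set of sample addresses the slow scan found
                          (assumed to be the ground truth)
    """
    if not slow_hits_in_sample:
        raise ValueError("slow scan found no hosts in the sample")
    detected = len(set(fast_hits_in_sample) & set(slow_hits_in_sample))
    detection_rate = detected / len(slow_hits_in_sample)
    if detection_rate == 0:
        raise ValueError("fast scan found none of the sample hosts")
    false_negative_rate = 1.0 - detection_rate
    corrected_total = fast_hits_total / detection_rate
    return false_negative_rate, corrected_total


if __name__ == "__main__":
    slow = {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"}
    fast = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}
    print(estimate_true_count(300, fast, slow))
```

With the toy numbers above, the fast scan misses one of four sample hosts, so a fast-scan total of 300 would be scaled up to an estimate of 400. The estimate is only as good as the sample.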

I guess it depends on what kind of results you want. It's possible to scan the public IPv4 address space reliably, but it requires a bit more effort than just sending out packets to see what you get back over a relatively short time frame.

You can split the address space up across several different scanners on different physical links. You can estimate the RTT to a network segment you're scanning and base your timings on that. Probing with TCP packets can yield better results than ICMP packets for this type of activity. There are so many variables involved.
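
A rough sketch of the first two ideas, using Python's standard `ipaddress` module. The helper names are hypothetical, the power-of-two restriction is only there to keep the split simple, and the timeout multiplier and floor are arbitrary assumptions, not recommended values.

```python
import ipaddress


def split_for_scanners(cidr: str, num_scanners: int):
    """Split a network into equal subnets, one per scanner.

    num_scanners must be a power of two so the split falls on prefix boundaries.
    """
    if num_scanners < 1 or num_scanners & (num_scanners - 1):
        raise ValueError("num_scanners must be a power of two")
    network = ipaddress.ip_network(cidr)
    if num_scanners == 1:
        return [network]
    extra_bits = num_scanners.bit_length() - 1  # log2 of a power of two
    return list(network.subnets(prefixlen_diff=extra_bits))


def probe_timeout(rtt_samples_s, multiplier=3.0, floor_s=0.2):
    """Pick a per-segment reply timeout from measured RTTs (in seconds).

    multiplier and floor_s are assumptions chosen for illustration.
    """
    return max(floor_s, multiplier * max(rtt_samples_s))


if __name__ == "__main__":
    for subnet in split_for_scanners("192.0.2.0/24", 4):
        print(subnet)
    print(probe_timeout([0.15, 0.18, 0.16]))
```

Each scanner would then work through its own subnet from its own link, waiting for replies according to the RTT it measured for that segment.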

Build a tool that lets you send ICMP packets at a fixed rate, preferably in the kernel, or even without an OS at all if you're into that, since getting precise timings in userland is hard. Or just build a tool that sleeps between packets, with the option of not sleeping at all. It's an educational experience. Scan a relatively small range of addresses belonging to hosts on the other side of the world at different speeds and compare the results. Maybe there's a good tool for that already.
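
A minimal userland sketch of the "sleep between packets, or not at all" variant. It only does the pacing: `send` is a hypothetical callable that you would replace with real probe code (sending ICMP needs raw sockets and privileges, which is not shown here), and, as noted above, timing from userland Python will not be precise.

```python
import time


def send_paced(targets, send, rate_pps=None, clock=time.monotonic, sleep=time.sleep):
    """Call send(target) for each target at no more than rate_pps per second.

    rate_pps=None means never sleep (send as fast as possible).
    Deadlines are scheduled from the start time so sleep overshoot
    does not accumulate into drift.
    """
    interval = None if rate_pps is None else 1.0 / rate_pps
    next_deadline = clock()
    sent = 0
    for target in targets:
        if interval is not None:
            delay = next_deadline - clock()
            if delay > 0:
                sleep(delay)
            next_deadline += interval
        send(target)
        sent += 1
    return sent


if __name__ == "__main__":
    # Stand-in for a real probe: just print the target.
    send_paced(["192.0.2.1", "192.0.2.2", "192.0.2.3"], print, rate_pps=2)
```

Running the same target list at several values of `rate_pps`, and once with `rate_pps=None`, gives the comparison suggested above.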

Whenever I read about "We've scanned/product X can scan the internet in N hours" I'm very sceptical. Unless the results are verifiable in some way (which is hard to guess/estimate for such a large sample) or the approach they took seems like a sane one (very subjective, I guess), I assume they don't know what they're doing. I assume this because I've been there myself.
