
Why Does It Take So Long to Connect to a WiFi Access Point? - 2dvisio
https://arxiv.org/abs/1701.02528
======
IshKebab
And why is there no proper 'invalid password' response? It always seems to me
like if you have the correct password the connection succeeds and if you have
the wrong password the connection just mysteriously fails and the computer has
to guess that maybe it was because the password was wrong? In fact I've had a
Macbook tell me the password was wrong when it was _definitely_ correct. I
imagine it's because they have to guess the failure reason.

~~~
gipsies
If you're using a pre-shared key, the password is verified during the 4-way
handshake. The thing is, if your password is wrong, then the Message
Authentication Code (MAC) of the messages you are sending is wrong. The AP
will simply drop frames with a wrong MAC, and will not respond to them. The
problem is that as a client you do not know whether the AP is not responding
because (1) the MAC was wrong, and hence the password was wrong; or (2) the
message did not arrive at the AP (or you did not receive the response of the
AP).

tl;dr: can't tell the difference between dropped messages due to wrong
authentication check (i.e. wrong password), or dropped messages due to bad
connection.
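
For the curious, here is a minimal sketch of the check the AP makes on message 2, assuming plain WPA2-PSK (PBKDF2-HMAC-SHA1 for the PMK, the HMAC-SHA1 PRF for the PTK, HMAC-SHA1 truncated to 16 bytes for the integrity code). The function names and the bare `frame` bytes are made up for illustration; a real EAPOL-Key frame has more fields than this.

```python
import hashlib
import hmac


def derive_pmk(passphrase: str, ssid: str) -> bytes:
    # WPA2-PSK: 4096 rounds of PBKDF2-HMAC-SHA1, salted with the SSID.
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)


def derive_kck(pmk: bytes, ap_mac: bytes, sta_mac: bytes,
               anonce: bytes, snonce: bytes) -> bytes:
    # PRF over the sorted addresses and nonces. The first 16 bytes of the
    # resulting PTK are the Key Confirmation Key (KCK).
    data = (min(ap_mac, sta_mac) + max(ap_mac, sta_mac)
            + min(anonce, snonce) + max(anonce, snonce))
    ptk = b""
    counter = 0
    while len(ptk) < 48:
        block = b"Pairwise key expansion\x00" + data + bytes([counter])
        ptk += hmac.new(pmk, block, hashlib.sha1).digest()
        counter += 1
    return ptk[:16]


def mic(kck: bytes, frame: bytes) -> bytes:
    # Integrity code over the (simplified) handshake frame.
    return hmac.new(kck, frame, hashlib.sha1).digest()[:16]


def client_message2(passphrase, ssid, ap_mac, sta_mac, anonce, snonce, frame):
    # The client authenticates message 2 with a key derived from its passphrase.
    pmk = derive_pmk(passphrase, ssid)
    return mic(derive_kck(pmk, ap_mac, sta_mac, anonce, snonce), frame)


def ap_handle_message2(ap_pmk, ap_mac, sta_mac, anonce, snonce, frame, received_mic):
    # Returns "message 3" if the code verifies, otherwise None (silent drop).
    kck = derive_kck(ap_pmk, ap_mac, sta_mac, anonce, snonce)
    if hmac.compare_digest(mic(kck, frame), received_mic):
        return "message 3"
    return None  # no error frame is sent; the client only sees silence
```

With a wrong passphrase the client derives a different key, the code does not match, and `ap_handle_message2` returns `None`. From the client's side that looks the same as a reply lost in the air.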

~~~
zaroth
I think this is half right. There is still an L1 ACK, so the STA doesn't have
to retry sending the packet, it knows it was received.

I believe what happens is AP sends Nonce to the STA, STA uses the PSK to send
Message 2 back to the AP. It will receive a '802.11 Ack' but then no Message 3
of the 4-way handshake will ever come from the AP.

Good drivers see this and flag an invalid password warning back to the user
within milliseconds. But bad drivers... sure, they will just keep assuming
magic dust got in the way and if they just keep retrying the handshake enough
maybe they will see a Message 3.

I'm not sure why from a security hardening perspective it's better not to
specify the AP should send '802.11 Disassoc' immediately after receiving an
invalid Message 2 with a proper error code so that the driver can message the
UI that the password is wrong instantly.

~~~
gipsies
Not really. The STA may know that its message was received, but can never be
sure whether the AP replied. The reply from the AP could have been missed due
to noise, or maybe it didn't reply at all. You cannot be sure. There are only
heuristics.

Good drivers indeed tell you whether the message arrived or not. But it's up
to the client to decide what to do with that information. And again, it's just
a heuristic. I've read and messed with the code of four different Wi-Fi
clients, and none of them attempt to detect a bad password this way. Most
simply report an error after trying to retransmit message 2 multiple times
(e.g. if wpa_supplicant got message 1 from the AP, but didn't get a reply to
message 2, it warns that _maybe_ the password was wrong).
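
A rough sketch of that kind of heuristic, to show how little the client has to go on. This is not wpa_supplicant's code; the `Diagnosis` names and the `diagnose` function are hypothetical.

```python
from enum import Enum


class Diagnosis(Enum):
    CONNECTED = "connected"
    LINK_PROBLEM = "message 2 probably never reached the AP"
    MAYBE_WRONG_PASSWORD = "possibly a wrong password"
    UNKNOWN = "unknown"


def diagnose(got_message1, message2_acks, got_message3):
    """Guess why a 4-way handshake stalled.

    message2_acks holds one bool per transmission of message 2, True if the
    link-layer ACK came back for that attempt.
    """
    if got_message3:
        return Diagnosis.CONNECTED
    if not got_message1:
        return Diagnosis.UNKNOWN
    if not any(message2_acks):
        # Nothing was acknowledged, so blame the radio link first.
        return Diagnosis.LINK_PROBLEM
    # The AP heard message 2 but message 3 never showed up. That fits a wrong
    # password, but a lost message 3 looks exactly the same from here.
    return Diagnosis.MAYBE_WRONG_PASSWORD
```

Even the last branch is only a guess, which is why the warning says _maybe_.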

------
zwieback
This is great work but I think the machine learning part is just thrown in for
trendiness reasons. More rigorous statistical analysis should lead to a
cleaner algorithm without relying on pretrained classifiers.

~~~
thearn4
In general, any opinions about where we are on the hype curve for machine
learning (as a stand-alone concept separate from traditional statistical
inference)?

[https://en.wikipedia.org/wiki/Hype_cycle](https://en.wikipedia.org/wiki/Hype_cycle)

~~~
visarga
It's not the first hype cycle, so we've already been through all phases, but
now I guess we are still rising towards peak hype. The fact is that this time
AI has real world applications and benefits, so it's not just a bunch of hot
air. It will not have a crash at this level of hype yet because we now have
lots of results that were unimaginable even 5 years ago.

We're breaking human level accuracy in vision, speech, text and behavior - on
specific tasks, not in general yet. In the last 3 years neural nets have
become creative - now they can create paintings (neural style transfer),
images (GANs), sounds (WaveNet), text (seq2seq, translation, image captioning)
and gameplay (Atari, AlphaGo).

All these are complex forms of creativity, as opposed to simple classifiers.
So we have recent progress, there is no period of lacking results behind us.
That's why we're still on the rise.

~~~
zwieback
Well said, I would agree that we're just to the left of the peak but that
applies more to the areas adjacent to the problems where deep learning has
shown some real value.

I remember the previous AI peaks in the 80s and 90s as well as the neural net and
fuzzy logic euphorias. The problem back then was that the results were not
that exciting. Now we have some really impressive applications searching vast
datasets and recognizing useful features. However, I notice everyone is trying
to apply the same algorithms to problems not well suited for that type of
approach.

------
bfirsh
This reminded me of this explanation of how MacBooks (and presumably other
devices) manage to reconnect to Wifi so quickly after waking from sleep:
[http://cafbit.com/entry/rapid_dhcp_or_how_do](http://cafbit.com/entry/rapid_dhcp_or_how_do)

~~~
digi_owl
And in the process walking all over various security best practices?

And was there not a similar security snafu involving iPhones broadcasting past
SSIDs every time they tried to connect to an access point?

~~~
kiliankoe
Don't most (mobile) devices broadcast all known SSIDs just in case a network
is available and hidden? I don't really know a lot about the subject matter,
but as I recall this does not apply to _just iPhones_.

I also remember an installation at the Datenspuren in Dresden with a monitor
showing all of these SSIDs it intercepted with people walking past and being
astounded how the device knew their home network name^^

~~~
ycmbntrthrwaway
Almost all Android devices do the same, they just send probe requests with all
known networks. That is why you can automatically connect to "hidden"
networks. Just run kismet or wireshark and see for yourself.

~~~
BoorishBears
And it can be used to uniquely identify devices and track them with fairly
standard hardware
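
A minimal sketch of why the probed SSID list works as an identifier, assuming you already have captured probe requests as (source MAC, list of SSIDs) pairs. The function names and the input format are made up for illustration.

```python
import hashlib
import json


def ssid_fingerprint(probed_ssids):
    # The set of networks a device asks for is often distinctive, so hash it
    # in a canonical (sorted, de-duplicated) order.
    canonical = json.dumps(sorted(set(probed_ssids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def group_sightings(sightings):
    """Map fingerprint -> set of source MACs that probed for that SSID set."""
    groups = {}
    for mac, ssids in sightings:
        groups.setdefault(ssid_fingerprint(ssids), set()).add(mac)
    return groups
```

Two sightings with different source MACs but the same SSID set land in the same group, so changing the MAC alone would not hide the device.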

~~~
epistasis
Doesn't a MAC address get broadcast anyway? Unless wifi devices are
randomizing their MAC addresses that seems like a fairly trackable thing.

~~~
nitrogen
IIRC newer iOS devices do randomize the MAC address used for probes.
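
For reference, generating such an address is simple. A sketch, not what iOS actually does: pick six random bytes, set the "locally administered" bit and clear the multicast bit in the first octet. The function name is made up.

```python
import secrets


def random_probe_mac():
    octets = bytearray(secrets.token_bytes(6))
    # Bit 1 set: locally administered. Bit 0 cleared: unicast.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)
```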

------
dorianm

        > we develop a machine learning based AP selection strategy that can
        > significantly improve WiFi connection set-up performance, against the
        > conventional strategy purely based on signal strength, by reducing the
        > connection set-up failures from 33% to 3.6% and reducing 80% time costs of the
        > connection set-up processes by more than 10 times.
    
        > The correlation analysis finds that though the signal strength is important,
        > knowing the AP model and mobile device model has great help to predict the
        > connection set-up time cost.
    

Neat!

~~~
saycheese
> [Abstract] "we develop a machine learning based AP selection strategy that
> can significantly improve WiFi connection set-up performance, against the
> conventional strategy purely based on signal strength, by reducing the
> connection set-up failures from 33% to 3.6% and reducing 80% time costs of
> the connection set-up processes by more than 10 times."

> [Conclusion] "The correlation analysis finds that though the signal strength
> is important, knowing the AP model and mobile device model has great help to
> predict the connection set-up time cost. To the best of our knowledge, we
> are the first to add AP model and mobile device model as features which
> greatly increases the accuracy to predict the connection set-up time cost."

_____

(NOTE: Please do not use block quote formatting, it's unreadable on mobile.)

~~~
emmelaich
@HN please add wrappable quoting markup.

~~~
mintplant
I requested this a while ago and was told that, although they like the idea,
adding standardized blockquote markup would somehow break HN's spam filter.

~~~
nitrogen
It's not uncommon to use italics for quoted paragraphs, optionally prefixed by
a single greater than, or surrounded by quotes.

 _Like this._

 _> Or this._

------
thyselius
Was hoping for an explanation to: Why doesn't it take a fraction of a second
to connect?

~~~
hannibalhorn
The paper says that the biggest delay is in the "scan" phase, just getting a
list of all the available APs in the area. This would be the same problem
addressed by Apple's "try to associate to all known SSIDs on powerup"
approach.

Maybe I'm missing something, but their actual machine learning model seems to
address a different problem:

    
    
      The final features we choose to train: the connection
      time cost includes hour of day, RSSI, mobile device
      model, AP model, Encrypted
    

Given this, I think they're basically giving lower priority to known incompatible
device & AP pairs based on their BSS IDs.

Not sure I like the approach that much, seems an AP running a recent OpenWRT
and very reliable would be penalized for having buggy factory firmware.
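
To illustrate the idea (this is not the paper's model, just a lookup-table stand-in for it): rank candidate APs by how long this device model has historically taken to connect to that AP model, and fall back to signal strength only on ties. All names and the `default_cost` value are hypothetical.

```python
from statistics import mean


def pick_ap(candidates, device_model, history, default_cost=5.0):
    """Pick the AP with the lowest expected set-up time.

    candidates: list of dicts with 'bssid', 'ap_model' and 'rssi' (dBm).
    history: dict mapping (device_model, ap_model) -> past set-up times (s).
    """
    def expected_cost(ap):
        samples = history.get((device_model, ap["ap_model"]))
        # Unknown pairs get a neutral default instead of a penalty.
        return mean(samples) if samples else default_cost

    # Lowest expected cost first; stronger RSSI breaks ties.
    return min(candidates, key=lambda ap: (expected_cost(ap), -ap["rssi"]))
```

Note that the key is the AP model, which is exactly where the OpenWRT case above would get penalized for the factory firmware's track record.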

~~~
andreyf
Seems like something simple like "look for the last N and the most common {M0,
M1, M2} in the past {week, month, year} before doing a full scan" should hit
the vast majority of use cases, no?
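
Something along these lines, as a sketch. It assumes the device keeps a log of (timestamp, SSID) for past successful connections; the function name and default values are made up.

```python
from collections import Counter
from datetime import datetime, timedelta


def scan_shortlist(history, now, last_n=3, most_common_m=3,
                   window=timedelta(days=7)):
    """Networks to try before falling back to a full scan.

    history: list of (timestamp, ssid) for past successful connections.
    """
    ordered = sorted(history, key=lambda entry: entry[0])
    shortlist = []
    # The last N distinct networks, most recent first.
    for _, ssid in reversed(ordered):
        if len(shortlist) >= last_n:
            break
        if ssid not in shortlist:
            shortlist.append(ssid)
    # Then the M most common networks inside the time window.
    recent = Counter(ssid for ts, ssid in ordered if now - ts <= window)
    for ssid, _ in recent.most_common(most_common_m):
        if ssid not in shortlist:
            shortlist.append(ssid)
    return shortlist
```

Only if nothing on the shortlist answers would the device pay for the full scan.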

------
ams6110
Apple devices seem to aggressively cache the last known IP address on each
wireless network, rather than issue a new DHCP request.

At home this results in duplicate IP addresses when the kid with the iPhone
gets home after being away and meanwhile another device has started using that
IP address. This tends to bork up the entire network on my cheap Netgear
router and I usually have to reset it at that point.

~~~
paulddraper
Naive question...why does that mess with the network? Your router has the
correct MAC <-> IP mapping, and iPhone kid is the only one losing out.

~~~
ams6110
Not sure but the correlation of Apple device walking in the door and entire
home network hanging is pretty consistent.

I presume the router periodically issues "who has 192.168.1.10" or whatever
and upon getting responses from two different MAC addresses just gives up.
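
If anyone wants to check for this on their own network, the conflict is easy to spot once you have the ARP replies. A sketch that assumes they have already been captured as (IP, MAC) pairs; the function name and input format are hypothetical, and this says nothing about what the router does internally.

```python
from collections import defaultdict


def find_ip_conflicts(arp_replies):
    """Return {ip: [macs]} for every IP claimed by more than one MAC."""
    owners = defaultdict(set)
    for ip, mac in arp_replies:
        owners[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in owners.items() if len(macs) > 1}
```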

------
azernik
I would be very interested in seeing the correlations between set-up time and
total devices in the area (or, to be more precise, total channel utilization).
This paper studied devices associated per access point, which is a separate
metric. They mentioned the issue in a paragraph at the end of a section,
probably dug up in post-study literature review.

From work in wi-fi router development, high channel utilization is often a
much bigger determinant of packet loss than either router overload or RSSI.
Hour of day is probably just a good proxy for this.

802.11 is pretty good at handling the shared medium when a single access point
can do traffic control, but multiple access points in a crowded city or office
building gets you into all kinds of problems. Usually you can get big
performance gains in large deployments just from making sure nearby access
points are on different channels.

Between the other two major factors the paper looked at:

1) The number of clients a single router can handle is mostly limited by CPU
power (and the failure mode there is typically not association request drops,
since those are usually processed pretty early in the packet pipeline, without
much queuing). So I'm not surprised that they saw very little effect of number
of associated clients with connection time.

2) RSSI is more important in low-interference environments, where the ability
to hear packets over the noise floor is a big limiter. In dense, high-
interference environments, it helps a bit in terms of being able to shout over
the noise of very distant interference sources, but for the most part a
collision is a collision even with substantial magnitude differences between
the colliding packets.
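
On the point about putting nearby access points on different channels, a minimal sketch of a greedy assignment. It assumes the three non-overlapping 2.4 GHz channels (1, 6, 11) and that you know which APs can hear each other; the function name and input format are made up, and real deployments use measured utilization rather than a plain neighbor list.

```python
def assign_channels(neighbors, channels=(1, 6, 11)):
    """Greedy channel plan.

    neighbors: dict mapping AP name -> set of AP names within radio range.
    """
    assignment = {}
    # Handle the most crowded APs first; names break ties for determinism.
    for ap in sorted(neighbors, key=lambda a: (-len(neighbors[a]), a)):
        used = [assignment[n] for n in neighbors[ap] if n in assignment]
        # Pick the channel the fewest already-assigned neighbors are using.
        assignment[ap] = min(channels, key=lambda ch: (used.count(ch), ch))
    return assignment
```

With more mutually audible APs than channels, some overlap is unavoidable and the greedy choice just spreads it out.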

~~~
ac29
Since you work in WiFi, I was wondering if you could comment on the linked chart
of SNR to Modulation rates [0]. Obviously the exact values are going to
depend on hardware to some degree.

[0] [https://dl.dropboxusercontent.com/u/8644251/Revolution%20Wi-...](https://dl.dropboxusercontent.com/u/8644251/Revolution%20Wi-Fi%20MCS%20to%20SNR.pdf)

~~~
azernik
What exactly do you mean by comment? Explanation of what it represents?

Basic overview, with the caveat that I haven't done wifi stuff in a couple of
years, and was not doing straight up RF stuff like the hardware folks. It's
basically saying that as signal strength above the noise floor (in dB) (ie
RSSI - kind of) increases, you can get to higher bit rates. Let me know if you
want more info on what exactly is being described.
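
As a back-of-the-envelope illustration of that relationship (not the chart's actual numbers): SNR in dB is signal power minus noise floor, and the Shannon limit gives the theoretical ceiling on bit rate for a given bandwidth. Real modulation rates sit below that ceiling and depend on the hardware. The function names are made up.

```python
import math


def snr_db(signal_dbm, noise_floor_dbm):
    # Both values are in dBm, so the ratio is a simple difference in dB.
    return signal_dbm - noise_floor_dbm


def shannon_capacity_bps(bandwidth_hz, snr_in_db):
    # Convert dB to a linear power ratio, then apply C = B * log2(1 + SNR).
    snr_linear = 10 ** (snr_in_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)
```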

------
petra
There's a new tech called MulteFire, which is LTE over unlicensed spectrum.
Everybody could buy an access point (although those will be more expensive
than wifi for some time, at the very least).

One of the biggest selling points for it is a simpler faster handoff than
WIFI. Is that a big enough problem that people will be willing to buy access
points?

------
tomsmeding
Interesting to see this researched, but I think this isn't really applicable
to most people's situation at home, where there's 1-3 access points, and most
often you just want to connect to a specific one. Then you don't really need
software telling you that connecting will be slow, because you know, but you
just want to connect.

Besides, in my experience, having a lot of wifi interference is a huge PITA
when connecting or communicating with an AP. Maybe they were not able to
include that factor in their dataset for some reason, but I think you'd find a
strong (negative) correlation.

~~~
icebraining
Probably not at home, but that's not when I want to connect faster. I use the
Fonera network when I'm in the street, and there's often three or four APs
near me, so this could be helpful. Another case is campus wifi, when I was in
college. In both of these cases, I want to connect to a specific SSID, but not
to a specific AP.

------
saycheese
Here's an MIT Technology Review article covering this research, "Data Mining
Solves the Mystery of Your Slow Wi-Fi Connection":

[https://www.technologyreview.com/s/603414/data-mining-solves...](https://www.technologyreview.com/s/603414/data-mining-solves-the-mystery-of-your-slow-wi-fi-connection/)

~~~
TwoBit
That article is crap because it never says why connections are slow. Lots of
words and no substance.

~~~
saycheese
I would ask if you're missing a bit, but that would be like saying something
is "crap" and materially misrepresenting its value and substance.

------
digi_owl
Seems to me like a "problem" only in the sense that one wants to leech data
connectivity while out and about.

~~~
icebraining
There are many cases where one wants to use wifi while out and about without
"leeching". In my city alone, we have areas with paid networks, tit-for-tat
networks (Fonera) and public networks (as in paid by taxes).

~~~
macintux
Interesting, I'd never heard of Fonera. Trying to find hotspots in my area has
simply emphasized to me how dusty my laptop screen is (the closest hotspot
seems to be several hours' drive away).

~~~
icebraining
Fonera is nice in my country (and a few other European ones) because one of
our largest ISPs made a deal with them, so every client of theirs is also a
Fonera member, and therefore a hotspot I can use.

------
jbg_
Could someone with permission to do so please fix the title? It either needs
the question mark removed, or to be written as a question (e.g. "Why Does It
Take So Long…")

~~~
jonsen
Only the paper authors can fix their PDF, I guess.

~~~
jbg_
Is there some policy on HN that the title of the submission always matches the
title of the thing that is linked to? I'm sure I've seen situations where it
doesn't.

~~~
Spare_account
The verbatim title is preferred:

"... _please use the original title, unless it is misleading or linkbait._ "

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

~~~
pbhjpbhj
Though it's more like the verbatim title is preferred except when it isn't and
there's a good chance an editor will change it to the verbatim title, or
something worse, or maybe better and an equally good chance people will
complain and it'll be changed [back].

I've always said we should have the verbatim title and an editorialised one
and let people choose in config which one[s] they want.

Current system mostly works though.

