tl;dr: the client can't tell the difference between messages dropped because of a failed authentication check (i.e. a wrong password) and messages dropped because of a bad connection.
I believe what happens is: the AP sends a Nonce to the STA (Message 1), and the STA uses the PSK to send Message 2 back to the AP. The STA will receive an '802.11 Ack', but then no Message 3 of the 4-way handshake ever comes from the AP.
Good drivers see this and flag an invalid-password warning back to the user within milliseconds. But bad drivers just keep assuming magic dust got in the way, and that if they retry the handshake enough times maybe a Message 3 will show up.
I'm not sure why, from a security-hardening perspective, it's better not to specify that the AP should send an '802.11 Disassoc' with a proper error code immediately after receiving an invalid Message 2, so that the driver can instantly tell the UI the password is wrong.
Good drivers indeed tell you whether the message arrived or not. But it's up to the client to decide what to do with that information. And again, it's just a heuristic. I've read and messed with the code of four different Wi-Fi clients, and none of them attempt to detect a bad password this way. Most simply report an error after trying to retransmit message 2 multiple times (e.g. if wpa_supplicant got message 1 from the AP, but didn't get a reply to message 2, it warns that maybe the password was wrong).
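For illustration, the heuristic boils down to something like this toy sketch; the hook names, retry limit, and timeout are made up, not wpa_supplicant's actual code:

    MAX_RETRIES = 4        # arbitrary; real supplicants pick their own limits
    REPLY_TIMEOUT_S = 1.0  # likewise a made-up value

    def try_handshake(send_msg2, wait_for_msg3):
        """Toy retry heuristic; send_msg2/wait_for_msg3 are hypothetical
        driver hooks."""
        for _ in range(MAX_RETRIES):
            send_msg2()                        # acked at the 802.11 layer...
            if wait_for_msg3(timeout=REPLY_TIMEOUT_S):
                return "connected"             # ...and Message 3 arrived
        # No Message 3 after several tries: a bad PSK and a bad link look
        # identical from here, so the client can only guess.
        return "handshake timed out: possibly wrong password"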
> Based on the measurement analysis, we develop a machine learning based AP selection strategy that can significantly improve WiFi connection set-up performance, against the conventional strategy purely based on signal strength, by reducing the connection set-up failures from 33% to 3.6% and reducing 80% time costs of the connection set-up processes by more than 10 times.
They don't actually demonstrate this, and they don't know that it improves WiFi connection performance, because they never benchmark it on any real-world devices. It's pure extrapolation from the algorithm's predictive performance on the dataset. (And there are some suspicious inputs to the algorithm, like time of day.) When they write on page 9 about where they're pulling these numbers from:
> To evaluate, we first divide our connection log dataset into two parts, each subset contains 50% of the overall data. ...This fresh dataset ensures that we can accurately evaluate the performance of our algorithm if we deploy our algorithm in the wild, where many of the APs will not be seen by the mobile devices before.
They're just wrong. Splitting like this only ensures good out-of-sample performance when you keep drawing from the same distribution; once you use the algorithm to make choices, the distribution changes. Correlation != causation; it's no more guaranteed to help than data mining hospital records, finding that antibiotics apparently kill patients, and concluding that hospitals should stop using them.
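To make the analogy concrete, here's a toy simulation (all numbers invented) of how a held-out split happily "validates" a correlation you would be wrong to act on:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Sicker patients are both more likely to get antibiotics and more
    # likely to die; the drug itself actually halves the risk of death.
    sick = rng.random(n) < 0.2
    antibiotic = np.where(sick, rng.random(n) < 0.9, rng.random(n) < 0.1)
    death_prob = np.where(sick, 0.4, 0.02) * np.where(antibiotic, 0.5, 1.0)
    died = rng.random(n) < death_prob

    # Any held-out split of this log reproduces the same misleading pattern:
    print("death rate with antibiotics:   ", died[antibiotic].mean())   # ~0.14
    print("death rate without antibiotics:", died[~antibiotic].mean())  # ~0.03
    # The data "confirms" antibiotics correlate with death, yet withholding
    # them would raise mortality. Acting on the model changes the distribution.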
We're breaking human-level accuracy in vision, speech, text and behavior - on specific tasks, not in general yet. In the last 3 years neural nets have become creative - now they can create paintings (neural style transfer), images (GANs), sounds (WaveNet), text (seq2seq, translation, image captioning) and gameplay (Atari, AlphaGo).
All of these are complex forms of creativity, as opposed to simple classification. So we have recent progress, with no stretch of disappointing results behind us. That's why we're still on the rise.
I remember the previous AI peaks in the '80s and '90s, as well as the neural-net and fuzzy-logic euphorias. The problem back then was that the results were not that exciting. Now we have some really impressive applications searching vast datasets and recognizing useful features. However, I notice everyone is trying to apply the same algorithms to problems not well suited to that type of approach.
(Or stock trading, if you believe in that sort of thing.)
And wasn't there a similar security snafu involving iPhones broadcasting past SSIDs every time they tried to connect to an access point?
I also remember an installation at the Datenspuren in Dresden with a monitor showing all the SSIDs it intercepted; people walking past were astounded that the device knew their home network's name^^
> we develop a machine learning based AP selection strategy that can significantly improve WiFi connection set-up performance, against the conventional strategy purely based on signal strength, by reducing the connection set-up failures from 33% to 3.6% and reducing 80% time costs of the connection set-up processes by more than 10 times.
> The correlation analysis finds that though the signal strength is important, knowing the AP model and mobile device model has great help to predict the connection set-up time cost.
> [Conclusion] "The correlation analysis finds that though the signal strength is important, knowing the AP model and mobile device model has great help to predict the connection set-up time cost. To the best of our knowledge, we are the first to add AP model and mobile device model as features which greatly increases the accuracy to predict the connection set-up time cost."
(NOTE: Please do not use block quote formatting, it's unreadable on mobile.)
> Or this.
And it is not so much a quote as a way to show code segments.
Maybe I'm missing something, but their actual machine learning model seems to address a different problem:
    The final features we choose to train: the connection time cost includes hour of day, RSSI, mobile device model, AP model, Encrypted
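For scale, training on exactly those features is a few lines with off-the-shelf tools. A sketch of what that setup might look like; the file name, column names, and choice of gradient boosting are my guesses, since the paper doesn't publish code:

    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    # Hypothetical connection log with the paper's five chosen features.
    df = pd.read_csv("connection_log.csv")
    X = pd.get_dummies(
        df[["hour_of_day", "rssi", "device_model", "ap_model", "encrypted"]],
        columns=["device_model", "ap_model"],  # categorical -> one-hot
    )
    y = df["setup_time_ms"]  # regression target: connection set-up time cost

    # The 50/50 split described in the paper's evaluation section.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    model = GradientBoostingRegressor().fit(X_tr, y_tr)
    print("R^2 on the held-out half:", model.score(X_te, y_te))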
Not sure I like the approach that much; it seems an AP that runs a recent OpenWrt build and is very reliable would still be penalized for its buggy factory firmware.
At home this results in duplicate IP addresses: the kid with the iPhone gets home after being away, and meanwhile another device has started using that IP address. This tends to bork up the entire network on my cheap Netgear router, and I usually have to reset it at that point.
I presume the router periodically issues "who has 192.168.1.10" or whatever, and upon getting responses from two different MAC addresses just gives up.
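If you want to check for the conflict yourself, an ARP probe from any machine on the LAN will do it. A sketch with scapy (needs root; the IP is just the example from above):

    from scapy.all import ARP, Ether, srp

    # Broadcast a "who has 192.168.1.10" and collect every reply; replies
    # from two different MACs mean a duplicate-IP conflict.
    answered, _ = srp(
        Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst="192.168.1.10"),
        timeout=2, retry=1, verbose=False,
    )
    macs = {reply[ARP].hwsrc for _, reply in answered}
    if len(macs) > 1:
        print("IP conflict! claimed by:", ", ".join(sorted(macs)))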
From my work in Wi-Fi router development: high channel utilization is often a much bigger determinant of packet loss than either router overload or RSSI. Hour of day is probably just a good proxy for it.
802.11 is pretty good at handling the shared medium when a single access point can do traffic control, but multiple access points in a crowded city or office building get you into all kinds of problems. Usually you can get big performance gains in large deployments just by making sure nearby access points are on different channels.
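The "different channels" part can be as crude as counting scan results. A toy sketch with made-up survey data; real planning tools also weight adjacent-channel overlap and measured airtime, not just AP counts:

    from collections import Counter

    # Hypothetical (ssid, channel) pairs from a 2.4 GHz site survey.
    scan = [("CoffeeShop", 1), ("Neighbor24", 6), ("Guest", 6),
            ("PrinterNet", 11), ("IoT-Hub", 6)]

    # On 2.4 GHz only channels 1, 6 and 11 don't overlap, so the crudest
    # heuristic is to put each new AP on the least-occupied of those three.
    occupancy = Counter(ch for _, ch in scan)
    best = min([1, 6, 11], key=lambda ch: occupancy[ch])
    print("least crowded non-overlapping channel:", best)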
As for the other two major factors the paper looked at:
1) The number of clients a single router can handle is mostly limited by CPU power (and the failure mode there is typically not dropped association requests, since those are usually processed pretty early in the packet pipeline, without much queuing). So I'm not surprised they saw very little effect of the number of associated clients on connection time.
2) RSSI is more important in low-interference environments, where the ability to hear packets over the noise floor is a big limiter. In dense, high-interference environments, it helps a bit in terms of being able to shout over the noise of very distant interference sources, but for the most part a collision is a collision even with substantial magnitude differences between the colliding packets.
Basic overview, with the caveat that I haven't done wifi stuff in a couple of years and was not doing straight-up RF stuff like the hardware folks. It's basically saying that as signal strength above the noise floor in dB (i.e. RSSI, kind of) increases, you can get to higher bit rates. Let me know if you want more info on what exactly is being described.
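The textbook version of that relationship is the Shannon limit: capacity grows with the log of the signal-to-noise ratio at a fixed channel width. A quick sketch (real 802.11 quantizes this into discrete MCS rates, well below the theoretical ceiling):

    import math

    BANDWIDTH_HZ = 20e6  # one 20 MHz Wi-Fi channel

    # Shannon-Hartley: C = B * log2(1 + SNR), with SNR as a linear ratio.
    for snr_db in (5, 10, 20, 30):
        snr_linear = 10 ** (snr_db / 10)
        capacity = BANDWIDTH_HZ * math.log2(1 + snr_linear)
        print(f"SNR {snr_db:>2} dB -> ~{capacity / 1e6:5.1f} Mbit/s ceiling")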
One of the biggest selling points for it is simpler, faster handoff than Wi-Fi. Is that a big enough problem that people will be willing to buy access points?
Besides, in my experience, having a lot of wifi interference is a huge PITA when connecting or communicating with an AP. Maybe they were not able to include that factor in their dataset for some reason, but I think you'd find a strong (negative) correlation.
"... please use the original title, unless it is misleading or linkbait."
I've always said we should have the verbatim title and an editorialised one and let people choose in config which one[s] they want.
The current system mostly works, though.