
Avery’s laws of Wi-Fi reliability - colinprince
http://apenwarr.ca/log/?m=201707
======
captainmuon
This is a good insight, especially the diagram at the top. But the question for
me is, why is a router broken say 10% of the time?

I think the problem is that WiFi is totally undebuggable. When it doesn't
work, you don't see at which stage it doesn't work.

- Do the frames in the air get corrupted or dropped? What portion of frames
makes it through?

- I see the SSID, so it can transmit _something_. Why can't I connect and get
at least a slow connection?

- What is it doing now, sending, waiting for a response, did it negotiate
encryption yet?

- Is the low-level connection OK, and we have a problem with DHCP? This you
can detect, but the OS obscures it from you.

- There is no easy way to see signal strength. You get a more or less
meaningful number reported from the card, but these nice diagrams showing
signal strength over frequency? Mostly fake, this is just the reported signal
strength of each channel, smoothed out to look like a continuous scan. If we
had a continuous scan, we could see individual sources of interference better.

- Why does the connection suddenly drop out or get slow? Is the problem on my
side of the router, or on the internet side? Are buffers full/empty? Is some
object standing in the way, or is it a weather phenomenon? You can ping your
router or modem, and some routers have nice tools to let you see graphs of
upstream and downstream. But there is no coherent debugging solution.

~~~
nisa
> why is a router broken say 10% of the time?

My guess would be that WiFi drivers are kind of quirky - if you follow
OpenWRT/LEDE development there are a lot of issues that appear only under heavy
load or in certain situations - causes range from driver bugs / chip bugs /
firmware bugs - most of the stuff is NDA or closed source. You've got the
problem on both sides - especially the situation on Linux is still dire for
lots of chipsets. Even my years-old Intel card in my notebook does not really
work well with Linux and the current kernel/firmware. I blame the vendors.

> Do the frames in the air get corrupted or dropped? What portion of frames
> makes it through?

That also depends on the rate and encoding at which the frames are sent. Then
there is stuff like frame aggregation, which sometimes works and sometimes doesn't.

> I see the SSID, so it can transmit something. Why can't I connect and get at
> least a slow connection?

The SSID is sent at the lowest rate (1Mbit) and with the most solid encoding -
usually the transmit power and antenna sensitivity are also the strongest at the
lowest rate, so the SSID can be seen quite far away - often so far away that
your client's wifi is able to receive the SSID beacons but unable to send
something back that reaches the router.

> What is it doing now, sending, waiting for a response, did it negotiate
> encryption yet?

Wrong crypto results in no association - difficult to tell apart from low
signal.

> Is the low-level connection OK, and we have a problem with DHCP? This you
> can detect, but the OS obscures it from you.

Yes. You can look at network-manager logs from Linux but sometimes it's
difficult to tell - especially on Windows.

> There is no easy way to see signal strength.

There is also noise, antenna properties and everything depends on the rate and
encoding used - it's not exactly easy.

> Why does the connection suddenly drop out or get slow?

Add to that the rate control algorithms and the number of clients; on certain
5GHz channels you have to yield to radar, and then there is interference from
Bluetooth and microwaves. Also all that funky physics stuff with reflection and
antenna positioning, where moving a little bit might be just enough to cause
problems for you.

~~~
captainmuon
> The SSID is sent at the lowest rate (1Mbit) and with the most solid encoding -
> usually the transmit power and antenna sensitivity are also the strongest at
> the lowest rate, so the SSID can be seen quite far away - often so far away
> that your client's wifi is able to receive the SSID beacons but unable to
> send something back that reaches the router.

That's what I mean. It would be great if you could click "debug", and it says
"Receiving beacons from router, but router is not responding to frames".

Maybe it would even be smart enough to do a back-of-the-envelope calculation
if details about the router are known, e.g. router transmits with X watt, the
received signal strength is Y. I can transmit with Z watt, so the router
should / should not be able to hear me.

Then it could suggest "You have to move closer to the router, use a better
antenna, or increase power to the antenna (reduces battery life)". Nowadays
you don't get anything, not even an error message. It silently stops
connecting on most platforms, and the WiFi symbol slightly changes.
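
A rough sketch of that back-of-the-envelope calculation, assuming the path loss is the same in both directions and that antenna gains are already folded into the measured numbers. The function names and the -90 dBm router sensitivity are illustrative placeholders, not values from any real router:

```python
import math


def watts_to_dbm(watts: float) -> float:
    """Convert transmit power in watts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(watts * 1000)


def uplink_margin_db(router_tx_w: float, client_rssi_dbm: float,
                     client_tx_w: float,
                     router_sensitivity_dbm: float = -90.0) -> float:
    """Estimate how far above (+) or below (-) the router's receive floor
    the client's signal arrives.

    Assumptions: symmetric path loss, antenna gains folded into the inputs,
    and router_sensitivity_dbm is a placeholder, not a datasheet value.
    """
    # What the downlink lost on the way to us (X in dBm minus Y).
    path_loss_db = watts_to_dbm(router_tx_w) - client_rssi_dbm
    # Apply the same loss to our own transmit power (Z in dBm).
    signal_at_router_dbm = watts_to_dbm(client_tx_w) - path_loss_db
    return signal_at_router_dbm - router_sensitivity_dbm


def diagnose(margin_db: float) -> str:
    """Turn the margin into the kind of message a "debug" button could show."""
    if margin_db >= 0:
        return "router should be able to hear you"
    return "router probably cannot hear you: move closer or raise TX power"


# Router sends 100 mW (X), we receive it at -88 dBm (Y), we send 25 mW (Z).
margin = uplink_margin_db(router_tx_w=0.1, client_rssi_dbm=-88.0,
                          client_tx_w=0.025)
print(f"{margin:.1f} dB -> {diagnose(margin)}")
```

With these made-up inputs the client hears the beacons but its own signal lands a few dB under the assumed floor, which is exactly the "I see the SSID but can't connect" case.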

------
StillBored
Personally I think a big problem is all these 3-in-1 type modem/router/wifi
solutions and the inability to debug which part is causing problems. These
devices are less expensive but seem to always compromise some functionality.
What that is, is rarely evident from the box/review/manual.

For a long time now, I've always recommended that people buy standalone
cable/DSL modems and figure out how to get the analog signal strength/error
correction/retry counters from them (and return it if you can't get them).
Then spend the time replacing connectors in your house and getting the modem
as close to the curb as possible.

Then pick a good wifi solution, and tweak that until it's as clean as possible.
Wifi and modem standards evolve at different paces, and it's silly to upgrade
your modem or router every time you want a newer wifi standard, or for that
matter change your wifi to get a newer cable or DSL modem.

~~~
legulere
Or just buy something of quality like AVM Fritz!Box, which has quite good
diagnostic utilities on board

~~~
voltagex_
It depends. The Fritzboxes that were sold in Australia had an ADSL chipset in
them that wasn't very good at handling noisy lines (you're very, very lucky if
yours isn't). Your best bet is/was to buy a device with a particular Broadcom
chipset in it that _was_ good at handling noise. The difference it made to my
connection speed was from syncing at 4 megabit downstream to syncing at 10
megabit downstream. This steadily degraded to 8 megabit over the next few
years, but what are you gonna do?

------
peterburkimsher
1. Replacing your router (or firmware) almost always fixes your problem.

2. Adding an additional router almost always makes things worse.

Solution: Switch off nodes when they stop working.

My house has this problem. I live on the 4th floor, next to the main VDSL
router. The kitchen & living room on the 2nd floor can't get a strong WiFi
signal. After some thought about WiFi repeaters, I decided to buy a long
Ethernet wire and run it down the stairs. I trip over it sometimes, but it's
basically reliable. The downstairs router runs dd-wrt, and works fine 99% of
the time (I've had to reboot it once in the past year).

The upstairs VDSL router is owned by the telecom company (Chunghwa), and they
don't give me administrator access. I can't change the firmware, and the box
is frankly awful. About 6 months ago we complained repeatedly, asking them to
replace the box. They sent tech guys, who tested the wire, and said that
everything is fine. Indeed, it works about 70% of the time. But very often I
come home from work, and I have to reboot that router before I can get online.

My solution is to buy an Edimax WiFi power socket and a Raspberry Pi, which
checks to see if it's online. When the network goes down for more than a
minute, it'll automatically tell the WiFi power socket to turn the router off
and on again.

I know that power-cycling is bad for the router, but I don't see any other
solution given the software constraints. I'm also surprised that there are no
companies making devices for the purpose of power-cycling routers.
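
A minimal sketch of that watchdog loop, under some loudly stated assumptions: "online" is approximated by a TCP connection to a public DNS server (1.1.1.1:53 is just a placeholder target), the power socket is driven through a hypothetical `power_cycle` callable because the socket's actual API isn't described here, and the five-minute cooldown is an addition so the router gets time to boot before being cycled again:

```python
import socket
import time
from typing import Callable, Optional


def is_online(host: str = "1.1.1.1", port: int = 53,
              timeout: float = 3.0) -> bool:
    """Treat a successful TCP connection as "the internet is reachable"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Watchdog:
    """Decide when an outage has lasted long enough to power-cycle."""

    def __init__(self, max_down_s: float = 60.0, cooldown_s: float = 300.0):
        self.max_down_s = max_down_s      # "down for more than a minute"
        self.cooldown_s = cooldown_s      # assumed boot time allowance
        self.down_since: Optional[float] = None
        self.last_cycle: Optional[float] = None

    def should_power_cycle(self, online: bool, now: float) -> bool:
        if online:
            self.down_since = None
            return False
        if self.down_since is None:
            self.down_since = now
        if now - self.down_since <= self.max_down_s:
            return False
        if self.last_cycle is not None and now - self.last_cycle < self.cooldown_s:
            return False
        self.last_cycle = now
        self.down_since = now  # restart the outage clock after cycling
        return True


def run(power_cycle: Callable[[], None], poll_s: float = 10.0) -> None:
    """Poll forever; power_cycle is whatever toggles your smart socket."""
    watchdog = Watchdog()
    while True:
        if watchdog.should_power_cycle(is_online(), time.monotonic()):
            power_cycle()
        time.sleep(poll_s)
```

Keeping the decision logic in `Watchdog` separate from the network check makes it easy to test with fake timestamps before trusting it with a real power switch.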

~~~
danieldk
_I decided to buy a long Ethernet wire and run it down the stairs. I trip over
it sometimes,_

Why not use Powerline Ethernet? In most houses, it's typically fast and
reliable and requires no extra wires. E.g. the data rate between my desk and
router is currently 784 Mbit/s.

~~~
ht85
Powerline is a crapshoot, at least as much as Wifi. I installed several
different models a few years back, for me and some friends/family, and in
every case ended up finding another solution as they would just add another
layer of unreliability.

Cheap ones suffer from the same problems as cheap routers, randomly hanging
and requiring a physical plug / unplug when the room gets hot, or whatever
other interferences degrade their performance over time.

Expensive ones are still not 100% reliable, and most of the time will cost
more than a well done ethernet installation.

~~~
danieldk
Never had any problems in three different houses. Two different brands (we use
AVM Fritz!PowerLine now). I reliably get high bandwidth and low latency. I
gave an old set of adapters to my parents-in-law, who live in an older house
(from the 1930s), and it works well for them as well.

Of course, as a sibling said, it'll depend a lot on the wiring and/or other
connected devices. So I guess it's best to try with a borrowed pair of
adapters first.

------
LobsterY
This is a great illustration of why the naive belief that a microservice
architecture will solve all the things is so ill-conceived. When developers
fail to properly assess how deeply coupled microservices are to their
dependencies, the result is cascading failure scenarios that take down the
whole ecosystem.

Perhaps we should aim for a 'services' architecture that is just right. When
two concerns are deeply coupled/dependent on one another, the ecosystem will
likely be more robust if these are left together in a single service. In
addition, this adheres to good old fashioned software design principles of
striving for good coupling/cohesion relationships.

~~~
bsder
> <insert buzzword> will solve all the things

This is the failure _irrespective_ of what <insert buzzword> is.

However, in my opinion, microservices are about decoupling rather than
reliability.

In addition, proper use of microservices means that people producing for and
consuming from a microservice now _MUST_ plan for how to detect when it is
down and what to do when that microservice is offline.

------
jv22222
After 20 years of buying (15+ routers) and searching for a wifi router that
"just works"TM in a large-house scenario, I finally found the ASUS AC5300.

[https://www.amazon.com/gp/product/B0167HG1V6/](https://www.amazon.com/gp/product/B0167HG1V6/)

It has 3 radios on board: 2x 5GHz and 1x 2.4GHz.

It has a mode that allows you to use a single SSID across all 3 radios. It
uses QoS to dynamically switch each device to the best radio at any given
moment.

I purchased the same device for a small retail business and it works amazingly
well in that setting too.

Highly recommended.

~~~
Foxhuls
I've really been enjoying my Ubiquiti setup that I switched to about 5 months
ago. The internet has still gone down but that is due to my ISP and not the
hardware.

~~~
tombrossman
How do you feel about the security of the products? 20 year old PHP versions
and the company itself was scammed out of tens of millions of dollars. I see
good reviews from bloggers on the hardware, but the company doesn't seem to
have made security a priority.

One recent example, there are plenty more out there:
[https://www.theregister.co.uk/2017/03/16/ubiquiti_networking...](https://www.theregister.co.uk/2017/03/16/ubiquiti_networking_php_hole/)

~~~
cpach
That’s worrying indeed. But is the track record of Asus/Linksys/Buffalo/etc
better? For decent security I guess the best option is to run pfSense or
similar, but that seems to require more expensive hardware.

------
capitalsigma
Only tangentially related -- I swear, nothing makes me feel more technically
incompetent than debugging WiFi issues. I do moderately low level systems
programming all day long, but I can't get the damn internet to reach from my
living room to my desktop computer?

~~~
Qub3d
I mean, the fact that WiFi works _at all_ is amazing. We're encoding
information--lots of it, too--and sending it through the air as ripples of
energy, a form of light, which we can then decode.

All of this happens at a sub-millisecond rate. Technology is amazing, and we
can sometimes forget that [0]. I wouldn't treat yourself harshly because you
don't fully understand it.

[0]:[https://youtu.be/zbCoe3vIskA?t=40s](https://youtu.be/zbCoe3vIskA?t=40s)

~~~
scrollaway
That was a really wholesome comment :) As someone who feels a lot like the GP,
thank you and you're absolutely right.

------
amiga-workbench
I run a MikroTik Routerboard and a few Ubiquiti UniFi APs. Never had a peep
out of it with over a year's uptime now.

The problem seems to be shitty consumer grade hardware (isn't it always), I'd
advise upgrading to entry-level business grade gear, it's much more reliable.

~~~
jdblair
+1 for the MikroTik routerboard. My 2011UiAS-2HnD has been running great for 2
years.

Caveat: the configuration UI is complex! It's easy to set up a basic
configuration. More complex features (vpn, port forwarding, custom firewall)
are arcane, in the same way configuring a Cisco router is arcane. But those
complex features are available!

~~~
amiga-workbench
Yep, the quick setup will handle a standard residential configuration for you,
but if you want failover or load balancing things get more complicated.

------
untangle
I don't know who this analysis is aimed at but I would advise caution in
trying to apply it.

My main criticisms of the article are that it: (1) ignores modern wi-fi mesh
technology; (2) oversimplifies "solutions" to the point of false
equivalencies; (3) ignores performance and convenience concerns; (4) uses
vague language ("it works"); and (5) treats "wi-fi" as a black box that
operates in a vacuum and is not amenable to tuning (channels, power levels,
freq's, etc.).

Ironically, the author's "buddy wi-fi" proposal embodies a subset of the
capabilities of a current consumer mesh setup such as the Amplifi product that
I use in my 3-story home.

~~~
brians
What's true about modern meshes that obviates the conditional probability of
success? This'll be news to me!

~~~
exelius
Modern meshes still often have a choke point. Kill enough nodes at the right
point in the mesh and the reliability degrades quickly as the backhaul links
get oversubscribed. At least that's what I think OP meant :)

~~~
lstamour
Intriguingly, I think my Google OnHub devices use spanning tree to find
Internet across the mesh. I actually wired up both rather than use mesh
networks on their own, though, and a bug in my Cisco wired switch's use of
spanning tree was what caused problems -- despite having the cables connected,
and ethernet running just fine between them, the spanning tree figured
transmitting over wireless from one OnHub to the other would be faster than
the ethernet switch between them. (The fix was to disable broken, manual
spanning tree settings on the Cisco switch.)

This indicates to me that if you had a dozen Google Wi-Fi devices and a few of
them had ethernet connections, the rest would quickly find the shortest path
from your device, via the AP you're connected to, to the nearest mesh device
with the fastest ethernet to the outside world. I'm not sure how the hops are
weighted though, presumably reliability and signal strength play a role.

------
rickpmg
> Distributed systems are more reliable when you can get a service from one
> node OR another.

I'm confused.. why did it take the author years to come up with this? Why is
this a revelation?

------
chiph
> 90% of customers

I'm probably being pedantic here, but shouldn't this be 90% of devices? If I
have three wifi devices at home, but then I go out and buy a fourth one -- and
I find out it can't reliably connect. I've just fallen into the 10% zone. So I
order a new router - I get it set up and now my new device works. Yayy!

But now one of my older 3 devices has stopped connecting. I'm still in the 10%
failure zone, even after spending a hundred bucks on a replacement for a
router that was "mostly" working. So I keep both routers, assigning different
SSIDs so the wireless devices will only find the routers they like. Have I now
decreased my overall reliability because I'm now running in the "And" case
with its 90x90 multiplication, not the "Or" case? Most likely.
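
To put numbers on the "And" versus "Or" cases, here is a small sketch that assumes, as the article does, that each router works 90% of the time and that failures are independent. The function names are made up for illustration:

```python
from math import prod


def all_must_work(*reliabilities: float) -> float:
    """The "And" case: every component in the set has to work."""
    return prod(reliabilities)


def any_may_work(*reliabilities: float) -> float:
    """The "Or" case: one working component is enough."""
    return 1 - prod(1 - r for r in reliabilities)


# Two routers, each pinned to its own devices via separate SSIDs ("And"),
# versus two routers that any device could fall back between ("Or").
both_needed = all_must_work(0.9, 0.9)
either_ok = any_may_work(0.9, 0.9)
print(f"And: {both_needed:.0%}, Or: {either_ok:.0%}")
```

Pinning devices to separate SSIDs means the household as a whole is only happy when both routers work, so under these assumptions it drops to 81% instead of rising to 99%.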

------
agumonkey
Interesting, I just moved the ISP router back into the living room instead of
my room, and used a tiny Range Extender. The bridging makes internet very
different, latency is random, throughput will cap and then drop from time to
time.. I suppose a network engineer would probably know how to make router and
RE talk together better.

------
Mysterix
> Replacing your router:
>
> Vendor A: 10% broken
> Vendor B: 10% broken
> P(both A and B broken):
> 10% x 10% = 1%
>
> Replacing your router (or firmware) almost always fixes your problem.

The conclusion is false:

if router A is broken, router B still has a 10% chance of being broken, the two
events being independent.

P(B broken | A broken) = 10%

To get the 1% effect, advice could be :

Always buy 2 routers instead of 1

~~~
Dylan16807
What exactly are you saying is wrong?

If you have to replace the router, there's a 10% chance that new router is
broken. But you only replace when the first router is broken, so it's 10% of
10%.

Read it as "a strategy of replacing when needed" rather than "replacing in all
cases for the hell of it".

~~~
Mysterix
The strategy "Buy 2 routers, and if the first one fails, then use the 2nd one"
is ok, and gives you the 1% result.

My (little) problem is the sentence "Replacing your router (or firmware)
almost always fixes your problem.", because if the first router is broken,
replacing it will only fix your problem in 10% of the cases, which is not
"almost always".

~~~
Dylan16807
You don't actually have to buy a second router upfront, so that's not a good
way to word it either.

I'm struggling to find a great way to put it. Maybe "a one-replacement backup
plan gives you a 99% chance of success"? Close but not very elegant.

"Replacing your router (or firmware) fixes the problem except for 1% of all
router buyers"?
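
For what it's worth, the figures being argued over can be written out exactly. This is only a sketch under the article's own assumption that each router is broken independently with probability 10%; the function and key names are invented for illustration:

```python
from fractions import Fraction


def replacement_odds(p_broken: Fraction, replacements: int = 1) -> dict:
    """Odds for a "replace only when broken" strategy, assuming each router
    is broken independently with probability p_broken."""
    still_broken = p_broken ** (replacements + 1)
    return {
        # Chance a given replacement works, given the previous one was broken.
        "fix_rate_per_replacement": 1 - p_broken,
        # Share of all buyers still broken after the allowed replacements.
        "buyers_still_broken": still_broken,
        "buyers_working": 1 - still_broken,
    }


odds = replacement_odds(Fraction(1, 10))
for name, value in odds.items():
    print(f"{name}: {float(value):.0%}")
```

The 1% is a share of all buyers, while the per-replacement fix rate is conditional on the first router already being broken, which seems to be where the two readings diverge.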

------
Yizahi
Author forgot to add that clients also work only 90% of the time (I'm looking at
you, expensive Samsung and Apple phones). So in case of Router plus AP you get
0.9 * 0.9 * 0.9 ≈ 73% reliability. Basically wifi is a mess.

------
lostboys67
With my wireless CCNA hat on: describing an extender as a router doesn't
exactly make me confident about the article.

"Adding an additional router almost always makes things worse. " Not if you
know what you're doing :-) If it said adding an extender I would have agreed.

------
_pmf_
What's the best COTS WiFi module for as-generic-as-possible WiFi sniffing?

~~~
nisa
Anything ath9k PCIe would be my guess. The driver is open-source, there's little
to no closed firmware, and it's battle-tested with good Linux support. No
802.11ac though, but for sniffing it shouldn't matter.

------
xxxdarrenxxx
Edge cases are king in networking, at least now that humans are so
"connected".

Sometimes I'm playing a multiplayer online game and the connection lags for a
second. My network then switches over, which takes up at least more "real
time" than the actual lag (perhaps re-initialize sockets or arbitrary NAT
operations or something).

Here I want it to opt for the 1 sec dropout, because it will only take a
second, instead of often needing to reload client on a network switch. The
router's logic is "not wrong" though.

Another problem is when 2 connections are close in signal strength. Every other
minute my network switches, coming across as pseudo-lag, because it's
noticeable.
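
One common way to damp that kind of flapping is hysteresis: only switch when the alternative is better by a clear margin. The sketch below is purely illustrative; the 8 dB margin and the function name are assumptions, and real clients and routers make roaming decisions in their own ways:

```python
from typing import Dict, Optional


def choose_ap(current: Optional[str], rssi_dbm: Dict[str, float],
              margin_db: float = 8.0) -> str:
    """Pick an access point, staying put unless a rival is clearly stronger.

    rssi_dbm maps AP name to measured signal strength in dBm.
    margin_db is a hypothetical threshold, not a standard value.
    """
    best = max(rssi_dbm, key=rssi_dbm.get)
    if current not in rssi_dbm:
        return best  # nothing to stay on, take the strongest
    if rssi_dbm[best] - rssi_dbm[current] >= margin_db:
        return best  # the rival wins by a clear margin, so switch
    return current   # too close to call, avoid a pointless switch


# Two APs within 2 dB of each other: stay where we are.
print(choose_ap("upstairs", {"upstairs": -60.0, "downstairs": -58.0}))
```

The trade-off is the one described above: a larger margin means fewer disruptive switches, at the cost of sometimes sitting on a slightly weaker connection.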

Many "modern" webapps are designed to compensate for this (also for general
offline usage). A "modern" web app does not actually need persistent stable
connection to function. It either caches or just needs initial data
(variables) and does all the calculating logic client-side.

I like this, because it's more on similar grounds with how humans have/are
working/communicating together in real life.

This has more to do with conceptual architecture, and not tech though. Many
of these things can be configured right as it stands, but the routers' default
setting is: "stupid". This of course has its reasons.

Wifi also often gets confused if other wifi routers are nearby, or more
relevant, if say 10 devices operate concurrently on the same local wifi. This
is handicapped from the get go, because it has to filter the right packets
out of the air. A wired connection is basically absolute about this. Like the
article has stated, LTE seems more stable, because you have hundreds of
"routers" your phone can choose from. Consumers don't want to buy 4 more routers
for around the house to ensure optimal redundancy and stability.

Best would be to not even hook nodes directly to a main consumer router, but to
a switch. If your main access point fails on a software level (or hardware for
that matter), everything down the hierarchy is irrelevant, even though your
"landline" is working just fine.

I also wish routers were less static from a consumer's perspective, i.e. if I'm
doing stuff on the net, I want my router to send me a msg saying, "hey, for the
past 20 minutes you seem to have had lag spikes, shall I switch to node x for
you? _prompt_ yes/no "

There's not much done to make routers user friendly (the ISP has its own
personal-gain-y reasons for this). These things should be more integrated with
the user. Again there are good reasons like security to not have a hard bond
to your "main machine", just saying that it irks me.

