
What Happens Inside a 100-hop IPv6 Wireless Mesh Network? - adunk
http://www.thingsquare.com/blog/articles/100-hops-ipv6-mesh/
======
hedora
It seems as though we are getting closer and closer to having feasible rooftop
mesh networks, where one house gets a fiber drop, and splits the bandwidth
with N neighbors.

I'd love to know what a feasible value is for N, assuming tens of Mbit/s
average throughput per house (so a few 4K Netflix streams at night, once
that's a common thing), and sub-100 ms RTTs (abysmal, but this is for early
adopters anyway).

[edit: put another way, what's the upper bound where this stops working as a
broadband solution?]

~~~
paulddraper
We have this, basically.

There are several wireless internet providers available in my area (e.g.
Vivint Wireless). One person has fiber and rooftop dishes relay it.

------
JosephRedfern
Would be interesting to know about things like latency, packet loss, maximum
throughput etc (and how these properties change over larger distances).

Also, how does an "in the wild" setup work with 802.15.4e, especially with
moving nodes -- how would the routing table normally be determined? By signal
strength alone, or is it possible to discover how central a neighboring node
is to the network and use that to inform the choice too? Really interesting
subject.

~~~
bipson
Moving nodes are summarized under the term mobility, and in RPL it is usually
handled by reconstruction from the top or re-discovery of parents by the node
itself. Timers play an important role, and mobility is a known weak spot of
RPL.

The choice of preferred parent is governed by the objective function (OF),
which could be anything, but is usually based on the rank of the node
(relative to the DODAG root).
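
As a rough illustration of the parent choice described above, here is a
minimal Python sketch. It assumes a simplified objective function in which
each candidate advertises a rank and the node adds a per-link cost. The names
`Candidate`, `link_cost`, and `choose_preferred_parent` are hypothetical and
are not taken from RFC 6550 or from any RPL implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: str
    rank: int       # rank advertised by the neighbor (lower is closer to the root)
    link_cost: int  # hypothetical per-link cost, e.g. derived from link quality


def choose_preferred_parent(candidates):
    """Return the candidate that yields the lowest resulting rank, or None."""
    if not candidates:
        return None
    # Ties are broken by node_id so the choice is deterministic.
    return min(candidates, key=lambda c: (c.rank + c.link_cost, c.node_id))


candidates = [
    Candidate("a", rank=256, link_cost=512),  # close to the root, poor link
    Candidate("b", rank=512, link_cost=128),  # further away, good link
    Candidate("c", rank=768, link_cost=128),
]
best = choose_preferred_parent(candidates)
print(best.node_id)
```

In this toy model a neighbor that is further from the root can still win if
its link is good enough, which is the kind of trade-off an objective function
is meant to encode.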

If you are really interested, RPL is an open IETF internet standard, so are
the underlying protocols and principles. Down the rabbit hole it is.

~~~
fredrik4943
Fredrik, Thingsquare CTO here.

We have a bit more detailed information about how we use mesh and RPL here:
[http://www.thingsquare.com/docs/mesh/](http://www.thingsquare.com/docs/mesh/)

Regarding mobility, we're using a number of tricks and custom solutions to
discover and quickly join new networks/parents - in vanilla RPL networks it
sometimes takes a long time to drop a bad or distant parent, causing long
delays for mobile devices.

------
baq
An interesting description of a few screenshots, but unfortunately it omits
all the details.

~~~
snarf21
I agree, it was interesting but lacking. This seemed to be more about how many
hops it could go through and the tools they use to test it. The hard part of
mesh is finding a good path but this removed that whole piece.

~~~
fredrik4943
Fredrik, Thingsquare CTO here. Glad to see an interest in this article, and
happy to prepare more detailed information about how the system works and
performs. There's of course tons of information that we're happy to discuss :)

Anything else in particular that you would like to know more about?

~~~
sathackr
Thanks for stopping by to comment, I have a good bit of respect for C-level
executives that take the time and risk to comment in public forums.

Let me color the remainder of my comment by saying that my experience is in
the 802.11/Wifi land and may not directly apply to the systems Thingsquare
uses.

I have been working in (802.11) wireless for ~20 years and have seen pretty
much every implementation of mesh fail spectacularly. One implementation I got
to see fail up close was Kissimmee/St-Cloud's Tropos system.

Some of the largest obstacles were outlined in the linked page -- how to
figure out what's going on. It seems Thingsquare is not forming the mesh on
the 802.11 networks, but on a separate low power network. I don't know all of
the details of this 802.15.4e setup, but I imagine it shares some of the
issues of 802.11.

One big problem on 802.11 networks is that each radio is half duplex, and once
you get 2-3 hops out, speed and latency are severely impacted. The variability
in latency caused by retransmits in a noisy environment wreaks havoc on TCP
window scaling.
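
To put a rough number on the half-duplex point, here is a back-of-the-envelope
Python sketch. It rests on a deliberately crude assumption that is not from
the comment above: all hops share one channel and only one radio in the chain
can transmit at a time, so a packet crossing `hops` links occupies the channel
`hops` times. Retransmissions and contention overhead are ignored, so real
numbers would be worse. The function name `chain_throughput` is hypothetical.

```python
def chain_throughput(link_rate_mbps: float, hops: int) -> float:
    """Best-case end-to-end rate over a single-channel, half-duplex chain.

    Assumes only one hop can transmit at a time, so the link rate is
    shared evenly between all hops on the path.
    """
    if hops < 1:
        raise ValueError("a path needs at least one hop")
    return link_rate_mbps / hops


for hops in (1, 2, 3):
    print(hops, chain_throughput(54.0, hops))
```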

I've always heard about this self-healing, route-around-problem-areas magic,
but have never seen it work well in practice. You generally don't know if a
link is bad until you try to use it, particularly if you aren't actively
testing the links in order to reduce your power consumption. But then actively
testing the links can itself generate interference with other links.

Some of these have been "solved" by using devices with dual radios, one for
the 'mesh' network and one for user access. Maybe someone finally has some
magic that works, but I've seen many municipalities and corporations get
promised the moon and spend millions on networks that then fell flat on their
faces.

------
mcbits
It feels like this should be the first part in a series.

------
rocky1138
What is the discovery process for these things like? What I mean is: if I buy
100 of them, put them in a room and turn them on, how do they work together to
organize IP addresses?

Humans normally do this by "hi, I'm John" and "hi John, I'm Bob."

Do these things broadcast a MAC address set at the factory?

I ask because I think mesh networks become really interesting when they are
entirely decentralised, as in finding a way for these things to have a
universally unique MAC the second they turn on for the first time, without
some sort of authoritative server.

~~~
adunk
Each device does have a unique MAC address. But the protocols are designed to
work even if that MAC address happens to not be unique. The IPv6 address
assignment procedure first attempts to claim an IPv6-address based on its MAC
address, but before finally assigning the address, the device will test if the
address is already taken. If another device in the same network happens to
have the same MAC address, that device will have claimed the same IPv6 address
already and that device will defend its address. The new device will then
choose a new IPv6 address to avoid collisions.
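
The claim, test, and fall back sequence described above can be sketched in
Python. This is a simplified model, not the Thingsquare implementation: it
assumes an 8-byte hardware address, derives a link-local address by flipping
the universal/local bit, and replaces the on-air duplicate check with a lookup
in a shared `taken` set. The function names and the example hardware address
are made up for illustration.

```python
import ipaddress
import secrets

LINK_LOCAL_PREFIX = 0xFE80 << 112  # fe80::/64


def address_from_eui64(eui64: bytes) -> ipaddress.IPv6Address:
    """Build a link-local IPv6 address from an 8-byte hardware address."""
    if len(eui64) != 8:
        raise ValueError("expected an 8-byte EUI-64")
    iid = bytearray(eui64)
    iid[0] ^= 0x02  # flip the universal/local bit of the interface identifier
    return ipaddress.IPv6Address(LINK_LOCAL_PREFIX | int.from_bytes(iid, "big"))


def claim_address(eui64: bytes, taken: set) -> ipaddress.IPv6Address:
    """Claim the MAC-derived address, or a random one if it is already taken."""
    candidate = address_from_eui64(eui64)
    # Stand-in for another device defending the address on the network.
    while candidate in taken:
        candidate = ipaddress.IPv6Address(LINK_LOCAL_PREFIX | secrets.randbits(64))
    taken.add(candidate)
    return candidate


taken = set()
mac = bytes.fromhex("0012740000000001")  # made-up hardware address
first = claim_address(mac, taken)
second = claim_address(mac, taken)  # same MAC again, so it must pick another address
print(first, second)
```

The second device ends up with a different address even though both started
from the same hardware address, which is the property the comment describes.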

------
shadowashe
If one wanted to go about simulating one of these in a homelab, what sort of
hardware would you recommend that is affordable? Even for a smaller mesh of,
say, 20-30 nodes.

~~~
Taniwha
10 years ago I built a streetlight system. We implemented and tested the
protocol by building a simple simulator in a Qt app and embedding the same
software stack inside it. It allowed us to create models that included x/y
locations, signal drop-off with distance, random noise, exit points, etc.

Large randomly placed simulations only started to slow down at ~15k nodes ...
In other words, it's not hard.

~~~
solotronics
awesome. it's probably not a reasonable request but if any of this was open
sourced it would be a great asset to humanity!

~~~
Taniwha
Probably not that useful, in that it was built on an old Qt and no longer
compiles.

More importantly, I have a tiny OS that doesn't have real threads (the code
had to run in 8k). I port it all over. It simply consists of a timer queue and
one stack; threadlets are a queue entry with a data pointer. Running 14k
instances of this tiny OS in one simulator is pretty easy (you still just have
one timer queue ....), so you can see it's all pretty dependent on the uOS.
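
A minimal Python sketch of that idea, assuming nothing about the original
code beyond what is described: one shared timer queue, and "threadlets" that
are just a callback plus a data pointer. The class `TimerQueue`, the `beacon`
callback, and the node dictionaries are all hypothetical.

```python
import heapq
import itertools


class TimerQueue:
    """One global queue of (time, callback, data) entries shared by all nodes."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so entries never compare callbacks
        self.now = 0

    def schedule(self, delay, callback, data):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback, data))

    def run(self, until):
        # Pop entries in time order; each callback runs to completion on one stack.
        while self._heap and self._heap[0][0] <= until:
            when, _, callback, data = heapq.heappop(self._heap)
            self.now = when
            callback(self, data)
        self.now = until


log = []


def beacon(queue, node):
    """Toy threadlet: record a beacon and re-arm the timer twice."""
    log.append((queue.now, node["id"]))
    node["sent"] += 1
    if node["sent"] < 3:
        queue.schedule(10, beacon, node)


queue = TimerQueue()
nodes = [{"id": i, "sent": 0} for i in range(1000)]
for node in nodes:
    queue.schedule(node["id"] % 10, beacon, node)
queue.run(until=100)
print(len(log))
```

Because every simulated node is only a data record plus queue entries, adding
more nodes costs memory for the records and heap operations for the events,
not a thread per node.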

The only hard part is making packet delivery not O(N^2)
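
One common way to avoid comparing every sender with every other node is a
uniform spatial grid. This is not necessarily what the simulator above did; it
is only a Python sketch of the general technique. It assumes a fixed radio
range, uses that range as the cell size, and uses made-up node positions. The
function names are hypothetical.

```python
from collections import defaultdict


def build_grid(positions, cell):
    """Bucket node ids by the grid cell their (x, y) position falls into."""
    grid = defaultdict(list)
    for node_id, (x, y) in positions.items():
        grid[(int(x // cell), int(y // cell))].append(node_id)
    return grid


def receivers(sender, positions, grid, radio_range):
    """Nodes within radio_range of sender, checking only the 3x3 nearby cells.

    The grid must have been built with cell == radio_range, so that no node
    in range can sit outside the nine cells around the sender.
    """
    sx, sy = positions[sender]
    cx, cy = int(sx // radio_range), int(sy // radio_range)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for other in grid.get((cx + dx, cy + dy), ()):
                if other == sender:
                    continue
                ox, oy = positions[other]
                if (ox - sx) ** 2 + (oy - sy) ** 2 <= radio_range ** 2:
                    found.append(other)
    return sorted(found)


RADIO_RANGE = 10.0
positions = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (0.0, 9.0), 3: (50.0, 50.0), 4: (10.5, 0.0)}
grid = build_grid(positions, RADIO_RANGE)
print(receivers(0, positions, grid, RADIO_RANGE))
```

With nodes spread out evenly, each transmission then touches only the handful
of nodes in nearby cells instead of the whole network.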

------
eadmund
I'm very excited to see mesh networks become a thing; there's a lot of
potential there to replace ISPs someday. Gonna be a long time of course — a
100-node network is nothing like The Internet writ large, but it's still good
to see progress.

Little funny that streetlights have a network, but I guess it makes finding &
fixing faults easier.

I really love that this used animated images instead of JavaScript — works
with any browser that way.

------
jeff6845
I'd be concerned if the data from the 100 wlan nodes in very close proximity
becomes too skewed from real world implementations. And whether that skewed
data is valid enough to build a simulation around.

~~~
adunk
(Adam, Thingsquare CEO here)

Yes, 100 nodes in close proximity is quite a bit different from 100 nodes
spread out across a larger geographical area. Even if we can play tricks with
routing tables to make the testbed act more like a real deployment, there is
much more congestion in the testbed because all nodes can hear each other.
Also, there are many wireless effects in play in a real-world situation, such
as the capture effect, which have several implications for how the
distribution and routing protocols operate.

By default, the simulation does not try to replicate the testbed setup, but
rather the real-world deployments. But we can tweak the simulation to behave
more like the testbed if we want to inspect behavior we see in the testbed
that we don't see in real-world deployments.

(As it happens, this was pretty much the topic of the PhD thesis of
Thingsquare CTO Fredrik a few years back:
[http://uu.diva-portal.org/smash/get/diva2:447343/FULLTEXT01....](http://uu.diva-portal.org/smash/get/diva2:447343/FULLTEXT01.pdf))

------
baybal2
Is there a hard limit to the number of nodes? What is the routing protocol?

~~~
adunk
(Adam, Thingsquare CEO here)

The routing protocol is called RPL (pronounced "ripple") and is designed to
create a directed acyclic graph to route IPv6 packets in networks where the
nodes are severely memory-constrained. It is defined by RFC 6550. There is more
information in
[http://www.thingsquare.com/docs/mesh/](http://www.thingsquare.com/docs/mesh/)
and [https://tools.ietf.org/html/rfc6550](https://tools.ietf.org/html/rfc6550)

There is no hard limit to the number of nodes in a RPL network. The protocol
is defined so that every node can reach the root of the network, but requires
additional work to reach nodes inside the network. The mode of operation that
we are using in the Thingsquare system is called storing mode and requires all
nodes on a path between two nodes to maintain information about the route.
This does not scale to a large number of nodes. But this is needed only when
setting up a TLS connection, to exchange security secrets, which is normally
only done once per node. When the TLS connection goes down, the route is torn
down, which allows for another route to take its place.
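
To make the storing-mode cost concrete, here is a small Python sketch. It is
a toy model and not the Thingsquare or RFC 6550 implementation: the topology
is a made-up chain, and `install_downward_route` simply walks from the target
up to the root, leaving one next-hop entry in every node it passes. All names
are hypothetical.

```python
# Hypothetical topology: each node maps to its preferred parent (root has none).
parents = {"root": None, "a": "root", "b": "a", "c": "b"}


def install_downward_route(parents, target):
    """Simulate a route advertisement travelling from target up to the root.

    Returns per-node routing tables of the form {node: {destination: next_hop}}.
    """
    tables = {node: {} for node in parents}
    child, node = target, parents[target]
    while node is not None:
        tables[node][target] = child  # every node on the path stores one entry
        child, node = node, parents[node]
    return tables


def route_down(tables, root, target):
    """Follow the stored next hops from the root down to the target."""
    path = [root]
    while path[-1] != target:
        path.append(tables[path[-1]][target])
    return path


tables = install_downward_route(parents, "c")
print(route_down(tables, "root", "c"))
print(sum(len(table) for table in tables.values()))
```

Routing upward needs only the parent pointer, while each downward destination
costs one table entry in every node along its path. That per-destination state
is why tearing routes down when they are no longer needed helps on
memory-constrained nodes.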

