
IPv4 route lookup on Linux - luu
https://vincent.bernat.im/en/blog/2017-ipv4-route-lookup-linux
======
erikb
We seriously need more articles like this. I would really like to understand
more of the kernel source code, but with no prior kernel knowledge (no classes
in that direction) and barely any C knowledge, the kernel is not the easiest
thing to read one's way into. Articles like this one help a lot, since they
provide an abstract model of what the code does, what it intends to do, and
why.

~~~
majke
Technical articles get downvoted on HN a lot these days. It became super hard
to promote technical stuff here.

Just look at this article - it's successful (on the front page) but it's
hovering around 20th place, without any chance of rising:
[http://hnrankings.info/14631808/](http://hnrankings.info/14631808/)

~~~
dom0
> articles get downvoted

You can't downvote articles. (You can flag them, but flagging fine submissions
is obviously abusive).

~~~
erikb
flagging is basically downvoting

------
betaby
Unfortunately, the IPv6 stack doesn't match the IPv4 one on features and
optimizations. IPv6 is still using a regular radix tree and a route cache.

~~~
bogomipz
Do you have any insight into why this is?

~~~
vbernat
Author here. The weaknesses are known (see for example
[http://www.netdevconf.org/1.1/proceedings/slides/kubecek-ipv...](http://www.netdevconf.org/1.1/proceedings/slides/kubecek-ipv6-route-lookup-performance-scaling.pdf)).
IPv6 doesn't have the same features as IPv4 (notably, IPv6 can do a route
lookup using both source and destination addresses). Therefore, it's not just
a matter of reusing what is done with IPv4.
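
A toy sketch of what a lookup keyed on both addresses could look like, in
Python. The table, the prefixes and the next-hop names are made up for
illustration, and the linear scan is only there to show an assumed matching
rule (longest destination prefix first, then longest source prefix); it is not
how the kernel stores or searches its routes.

```python
import ipaddress

# Hypothetical toy table: (destination prefix, source prefix, next hop).
ROUTES = [
    ("2001:db8::/32", "::/0", "via-A"),
    ("2001:db8::/32", "2001:db8:aaaa::/48", "via-B"),
    ("::/0", "::/0", "default"),
]

def lookup(dst, src, routes=ROUTES):
    """Return the next hop whose destination and source prefixes both match,
    preferring the longest destination prefix, then the longest source prefix."""
    dst_ip = ipaddress.ip_address(dst)
    src_ip = ipaddress.ip_address(src)
    best = None
    for dst_pfx, src_pfx, nexthop in routes:
        d = ipaddress.ip_network(dst_pfx)
        s = ipaddress.ip_network(src_pfx)
        if dst_ip in d and src_ip in s:
            key = (d.prefixlen, s.prefixlen)
            if best is None or key > best[0]:
                best = (key, nexthop)
    return best[1] if best else None
```

With this table, traffic to 2001:db8::1 is sent to a different next hop
depending on whether its source falls inside 2001:db8:aaaa::/48.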

~~~
bogomipz
I guess you are referring to scoped addresses in IPv6, then? That would make
sense. Thanks for the interesting link.

------
comice
The removal of the route cache caused a notable change in behaviour for equal
cost routes.

When there was a cache, you got a kind of session stickiness for free. Which
meant you could do some fairly good load balancing at the network layer by
advertising multiple routes to the same IP locally. The router would choose
one of the routes when it saw the first packet, cache the decision and send
the rest of the packets to it (the cache key was configurable and could
include src address, amongst other things).

Now you just get round robin between all the advertised paths - no good for
tcp load balancing.
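
For illustration, a minimal Python sketch contrasting the two behaviours. Note
that it gets the stickiness from a hash of the flow tuple rather than from a
cache, which is a different mechanism from the one described above. The next
hops, the choice of fields and the hash function are all assumptions, not what
any particular kernel does.

```python
import hashlib
from itertools import cycle

# Hypothetical equal-cost next hops.
NEXT_HOPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_by_flow(src, dst, sport, dport, next_hops=NEXT_HOPS):
    """Hash the flow tuple so every packet of one flow maps to the same next hop."""
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

def round_robin(next_hops=NEXT_HOPS):
    """Per-packet round robin: consecutive packets go to different next hops."""
    return cycle(next_hops)
```

Calling pick_by_flow() twice with the same tuple always returns the same next
hop, while successive next() calls on round_robin() walk through all of them,
which is what spreads the packets of a single TCP connection over several
paths.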

~~~
vbernat
This behaviour was fixed in 4.4. It went mostly unnoticed until then!
See
[https://www.reddit.com/r/networking/comments/4q3wmq/ipv4_flo...](https://www.reddit.com/r/networking/comments/4q3wmq/ipv4_flow_based_ecmp_broken_in_linux_kernels_36/)

~~~
comice
oh super! thanks!

------
xiconfjs
How does this 50ns compare to commercial hardware routers like Brocade's,
which have dedicated memory (TCAM) for their routing table? That should be
extra-fast memory, as Brocade, for example, claims.

~~~
vbernat
Hardware usually scales at line rate without any issue, something Linux
cannot do.

From a pure latency point of view, a medium-range router (using the latest
Broadcom Tomahawk chipset) takes around 500ns to route a packet (note that
such a chipset cannot handle a full view). It doesn't seem unlikely that Linux
can do the same job in certain configurations in about 100ns (notably, with no
netfilter rules). However, this takes a whole core. If you have 24 of them,
you get 12Gbps of transfer rate. With a hardware platform, packets are
pipelined, so with some relatively small buffers, you can still achieve
several hundred Gbps of traffic (per chip), something Linux cannot do in
software.
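
The per-core arithmetic can be sketched as follows (Python). The helper names
are made up, and the conversion to a bit rate needs a packet size, which is not
stated above, so it is left as an explicit parameter.

```python
def packets_per_second(ns_per_packet, cores=1):
    """Packets per second if each core spends ns_per_packet on every packet."""
    return cores * 1e9 / ns_per_packet

def gbps(pps, packet_bytes):
    """Bit rate in Gbps for a given packet rate and an assumed packet size."""
    return pps * packet_bytes * 8 / 1e9
```

At 100ns per packet, packets_per_second(100) gives 10 million packets per
second for one core; the bit rate then scales linearly with whatever packet
size is assumed.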

I can't speak to TCAM performance on its own. Are they super fast, or are they
able to perform parallel lookups?

~~~
yusyusyus
> I can't speak to TCAM performance on its own. Are they super fast, or are
> they able to perform parallel lookups?

I think this doc[0] helps to understand the parallelism/performance from a
TCAM perspective.

[0] [https://www.renesas.com/pt-br/doc/products/memory/r10an0013e...](https://www.renesas.com/pt-br/doc/products/memory/r10an0013eu0100-memory.pdf)
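
A rough mental model, not taken from the linked document: a TCAM stores
value/mask pairs, compares the search key against every entry at once, and a
priority encoder returns the first match. The Python below emulates that
sequentially; the table contents and function names are made up.

```python
import ipaddress

def make_entry(prefix, action):
    """Encode a prefix as a (value, mask, action) ternary entry."""
    net = ipaddress.ip_network(prefix)
    return int(net.network_address), int(net.netmask), action

def build_table(routes):
    """Order entries longest prefix first, so the first hit is the best match."""
    entries = [make_entry(p, a) for p, a in routes]
    return sorted(entries, key=lambda e: e[1], reverse=True)

def tcam_lookup(addr, table):
    key = int(ipaddress.ip_address(addr))
    # In hardware, every entry is compared in the same cycle; here it is a loop.
    hits = [(key & mask) == value for value, mask, _ in table]
    # Priority encoder: the lowest-numbered matching entry wins.
    for hit, (_, _, action) in zip(hits, table):
        if hit:
            return action
    return None

# Hypothetical IPv4 table.
TABLE = build_table([
    ("0.0.0.0/0", "default"),
    ("10.0.0.0/8", "port-A"),
    ("10.1.0.0/16", "port-B"),
])
```

The point of the model is that lookup time does not depend on the number of
entries, since the comparisons happen side by side rather than one after the
other.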

------
ladzoppelin
Why is this image
[https://d1g3mdmxf8zbo9.cloudfront.net/images/linux/lpc-trie-...](https://d1g3mdmxf8zbo9.cloudfront.net/images/linux/lpc-trie-struct.svg)
jumbled in Firefox beta?

~~~
vbernat
Author here. I am also using Firefox (on Linux) and the image is displayed
fine. I don't know exactly what you mean by jumbled, but I don't hard-code
fonts in the SVG: it uses the default sans-serif font of your system. Could
this explain what you see? I am usually careful with the anchor points and leave
some additional space to ensure it is displayed correctly with whatever font
is used.

A solution would be to convert the text to paths, but I didn't find any tool
that would do that efficiently. With Inkscape, there is a serious size bump
when doing that.

~~~
dom0
> With Inkscape, there is a serious size bump when doing that.

Yes, you can't really avoid that when converting text to path objects, there
is no way to do something similar to PDF's subset embedding AFAIK.

I've tried to get SVG diagrams with text to work, but they just always fall
apart somewhere. High-resolution PNG or inflated SVG seem the only viable
methods. svgz (svg+gz) helps with the size inflation, but is not supported by
Firefox when accessing local files.

It's kinda a shame that web browsers can't display single-page PDF documents
in-line, as images.

[http://i.imgur.com/x0TuCMC.png](http://i.imgur.com/x0TuCMC.png)

~~~
vbernat
SVG can be served gzipped transparently. Ideally, when converting text to
path, each letter (or group of letters) should be put in a group which can be
referred with <use>.

Thanks for the example. Is the text in other figures OK?
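
To make the <use> idea concrete, here is a small Python sketch. The glyph
outlines are placeholder rectangles rather than real font shapes, and the fixed
advance width and the function name are assumptions; a real tool would take
both from the font.

```python
# Hypothetical glyph outlines: placeholder path data, not real font shapes.
GLYPHS = {"a": "M0 0h4v4h-4z", "b": "M0 0h4v8h-4z"}
ADVANCE = 6  # hypothetical fixed advance width per glyph

def text_to_use(text, glyphs=GLYPHS, advance=ADVANCE):
    """Emit each distinct glyph once in <defs>, then reference it with <use>."""
    used = sorted(set(text))
    defs = "".join(f'<path id="g{ord(c)}" d="{glyphs[c]}"/>' for c in used)
    uses = "".join(
        f'<use xlink:href="#g{ord(c)}" x="{i * advance}"/>'
        for i, c in enumerate(text)
    )
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink">'
        f"<defs>{defs}</defs>{uses}</svg>"
    )
```

For the string "abba" this writes two path definitions and four short
references, instead of four full copies of the outline data.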

~~~
dom0
No, they're all affected like that, which matches my general experience. I'm
not sure what causes it (possibly differing default font sizes, zooms or font
selection). Chrome generally seems to use fewer defaults from the platform
than Firefox, as is the case here as well: Chrome renders the figures fine. It
usually does, though it can't help with missing fonts.

> Ideally, when converting text to path, each letter (or group of letters)
> should be put in a group which can be referred with <use>.

Oho, that's a very interesting idea! Might be possible as a post-processing
step, depending on how exactly Inkscape converts runs of glyphs (I never
looked inside one of those 1.5 MB SVGs...).

> SVG can be served gzipped transparently.

Yes; in my case the figures are mostly in docs, so checked-in size matters;
colleagues have to pull changes and pushing xx MB for figures (and frequently
updating them as well — something that git does not handle well) is not nice.
For open source, it ends up in the source distribution. I think in one of my
projects it was inflated by xxx % due to that :D

Some hard numbers: A not-that-complex figure showing a workflow. The .vsd is
145 kB, the PNG is (after optipng) about 1 MB and the SVG was iirc around
2.5 MB (Visio->SVG->Inkscape->Convert To Paths->SVG, all on the same machine,
otherwise the results are bad), not gzipped since that kinda breaks local
editing (with Firefox). While the high-resolution PNG looks good in print,
scaling it to displays ain't that great and text doesn't look sharp. SVG needs
a separate conversion for print (SVG[->PS]->PDF).

~~~
vbernat
> Oho, that's a very interesting idea! Might be possible as a post-processing
> step, depending on how exactly Inkscape converts runs of glyphs (I never
> looked inside one of those 1.5 MB SVGs...).

I would hope that something like svgo could do that. BTW, due to an error on
my side, the SVG images didn't get processed by SVGO. I just fixed that, maybe
this would help.

~~~
dom0
> I just fixed that, maybe this would help.

No luck

> SVGO

Interesting tool, didn't know about it. It certainly works and reduces the
size of an SVG with path-based text by ~50 %. gzip'ability improves as well. I
tested this with a 1.4 MB SVG: SVGO shrunk it to 765 kB, and gzip compresses
that further to 100 kB. The original SVG only compresses to about 260 kB, thus
gzip compression is enhanced by ~20 % by SVGO.
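
A minimal way to take this kind of measurement on your own figures, sketched in
Python with the standard gzip module. The function names are made up, and
gzip's default compression level is assumed, so the numbers can differ slightly
from what a web server or the gzip command line produces.

```python
import gzip
from pathlib import Path

def raw_and_gzipped(data: bytes):
    """Return (raw size, gzipped size) in bytes."""
    return len(data), len(gzip.compress(data))

def report(path):
    """Return raw and gzipped sizes for one file on disk."""
    return raw_and_gzipped(Path(path).read_bytes())
```

Calling report() on the SVG before and after running SVGO gives the four sizes
needed for the comparison.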

On the other hand, neither Gwenview nor Karbon display the SVGO'd figure as it
was. Firefox and Chrome seem to render it fine. rsvg-convert results in a
large and slow PDF.

At this point the "technically best" option to me seems to be

(1) create an SVG, convert text to paths on the same machine, use SVGO and
gzip (and just use python -m http.server or sphinx-livehtml or similar for
local editing) for HTML and

(2) create a PDF completely separate from the SVG processing chain for print,
since no SVG to PDF conversion is satisfactory (file size, preservation of
contents, rendering / printing time—stuff that is slowly rendered is slow to
print as well!).

... and then tell the rest of the content pipeline that figures are now in two
formats. Ugh.

Anyway, my illustration troubles are a bit off-topic to your great article :)

~~~
vbernat
This problem was bothering me. After digging more, I have found this bug:
[https://bugzilla.mozilla.org/show_bug.cgi?id=935056](https://bugzilla.mozilla.org/show_bug.cgi?id=935056).

