
Traceroute Lies: A Typical Misinterpretation Of Output - jsnell
http://movingpackets.net/2017/10/06/misinterpreting-traceroute/
======
jlgaddis
As a network engineer who gets similar reports fairly often, I recommend
watching "Tutorial: Troubleshooting with Traceroute" (a.k.a. "A Practical
Guide to (Correctly) Troubleshooting with Traceroute"), as recorded at NANOG
62 [0] (there's also an older presentation from NANOG 47, IIRC). It covers the
same stuff John's article does and more. A PDF of the slides [1] is also
available.

John alluded to this in his "side note" but often (the majority of cases, IME)
the TTL will not be decremented inside of the MPLS core so you, as an end
user, will have no visibility into it. You'll just see, for example, hop #4
(Los Angeles) then hop #5 (New York City), completely unaware that the IP
packet actually passed through (MPLS "P") routers in Phoenix, Denver, Kansas
City, Chicago, and Washington, DC, in between.
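To make that concrete, here is a toy simulation of traceroute probes crossing that path, with and without TTL propagation in the core. It is only a sketch: the city names come from the example above, while the `PATH` table, the `traceroute` function, and the choice of which routers count as MPLS "P" routers are assumptions made up for illustration.

```python
# Toy model only. Each entry is (router name, is it an MPLS core "P" router?).
# Which routers are core routers is an assumption for illustration.
PATH = [
    ("Los Angeles", False),
    ("Phoenix", True),
    ("Denver", True),
    ("Kansas City", True),
    ("Chicago", True),
    ("Washington, DC", True),
    ("New York City", False),
]


def traceroute(path, propagate_ttl):
    """Return the routers that answer probes sent with TTL 1, 2, 3, ...

    When propagate_ttl is False, core routers forward on the label and leave
    the IP TTL alone, so a probe can never expire on them.
    """
    hops = []
    for ttl in range(1, len(path) + 1):
        remaining = ttl
        for name, is_core in path:
            if is_core and not propagate_ttl:
                continue  # label-switched: IP TTL untouched, no reply possible
            remaining -= 1
            if remaining == 0:
                hops.append(name)  # this router sends the TTL-exceeded reply
                break
        else:
            break  # the probe outlived every router, so the trace is complete
    return hops


if __name__ == "__main__":
    for propagate in (False, True):
        print(f"TTL propagation {'on' if propagate else 'off'}:")
        # Numbering starts at 4 to match the "hop #4 (Los Angeles)" example.
        for number, name in enumerate(traceroute(PATH, propagate), start=4):
            print(f"  hop #{number}: {name}")
```

With propagation off, only the two edge routers ever see the IP TTL reach zero, so they are the only ones that can reply.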

You'll see the same thing -- although at a different layer -- when Q-in-Q
tunneling is in use (assuming L2 protocols are also being tunneled).

N.B.: On common Linux distributions, there are usually several traceroute
variants available. They are not all created equal. If a "regular" (UDP)
traceroute won't work, you can try using ICMP or even TCP.
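A rough sketch of what "try ICMP or TCP" looks like in practice, written as a small Python helper that builds the command lines. It assumes the widely packaged Linux `traceroute`, where `-I` selects ICMP echo probes and `-T` selects TCP SYN probes; other implementations may use different flags. The helper name `traceroute_cmd` and the choice of port 443 are my own for illustration, and the ICMP and TCP modes may need root.

```python
# Assumption: flags are those of the common Linux "traceroute" package.
# Other variants (BSD, inetutils, busybox) may not accept the same options.
METHOD_FLAGS = {
    "udp": [],                    # default probe type
    "icmp": ["-I"],               # ICMP echo probes
    "tcp": ["-T", "-p", "443"],   # TCP SYN probes; port 443 is an arbitrary pick
}


def traceroute_cmd(host, method="udp"):
    """Build the argv list for one traceroute run; -n skips DNS lookups."""
    return ["traceroute", "-n", *METHOD_FLAGS[method], host]


if __name__ == "__main__":
    # Print the commands rather than running them. To execute one, pass the
    # list to subprocess.run(cmd, capture_output=True, text=True).
    for method in ("udp", "icmp", "tcp"):
        print(" ".join(traceroute_cmd("example.com", method)))
```

If the UDP trace dies in a row of asterisks partway through, rerunning with the ICMP or TCP variant will often get past a filter that only drops high-port UDP.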

[0]: [https://youtu.be/a1IaRAVGPEE](https://youtu.be/a1IaRAVGPEE)

[1]:
[https://www.nanog.org/sites/default/files/tuesday_steenberge...](https://www.nanog.org/sites/default/files/tuesday_steenbergen_troublshootingtraceroute_62.49.pdf)

------
stephengillie
Great post. This is a very clear explanation of the routing situation. The
last image is especially good.

What's needed is some kind of L2 Traceroute. It's maddening that an
increasingly common piece of routing infrastructure is invisible to almost all
common troubleshooting tools.

Edit: Cisco has one for their routers [0]. Since each hop would have to be
traced locally, an L2 Trace would require orchestrating against all hardware
devices in the route which introduces legal complications. A new discovery
protocol might be needed.

[0]
[https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6...](https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SXF/native/configuration/guide/swcg/l2trace.html)

------
wpietri
For those wondering, this is the MPLS to which the author refers:
[https://en.wikipedia.org/wiki/Multiprotocol_Label_Switching](https://en.wikipedia.org/wiki/Multiprotocol_Label_Switching)

------
isostatic
I rarely see MPLS lines decrementing TTLs. What I do see is traceroutes which
are something like

2ms 3ms 38ms 5ms 8ms 9ms (target)

That 38ms hop shows packet loss too. I suspect it's the control plane that
doesn't prioritise responses.
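A quick way to spot this pattern: probes to later hops pass through the slow-looking router too, so if the later hops report lower RTTs, the spike is probably that router being slow to answer rather than slow to forward. A minimal sketch, using the numbers above; the function name `suspicious_hops` and the 1ms slack are arbitrary choices of mine, and asymmetric return paths can muddy the picture.

```python
def suspicious_hops(rtts_ms, slack_ms=1.0):
    """Return 1-based hop numbers whose RTT exceeds that of some later hop.

    A hop slower than a hop behind it cannot be explained by forwarding delay
    alone, so it is more likely a slow reply from that router itself.
    """
    flagged = []
    for index, rtt in enumerate(rtts_ms[:-1]):
        fastest_later = min(rtts_ms[index + 1:])
        if rtt > fastest_later + slack_ms:
            flagged.append(index + 1)
    return flagged


if __name__ == "__main__":
    sample = [2, 3, 38, 5, 8, 9]  # per-hop RTTs in ms from the trace above
    print(suspicious_hops(sample))
```

On that sample only hop 3 is flagged, since every hop after it answers faster than 38ms.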

As I have no access to intermediate hops, nor even a business relationship
with the owner, traceroutes, and things like bgp.he.net, help me work out
where problems may be occurring and sometimes let me reroute.

Even on private wires it's hard to explain to network providers what an outage
means. Sometimes a 10ms outage can be the difference between your application
working and failing. Had a MikroTik at one event this year that I think was
bouncing the queuing process from core to core. Looking at the RTP before and
after showed it often throwing 30 packets out of order - about 15ms.
Occasionally it would simply drop them, and that causes a major problem with
broadcast video as normal FEC can cope with 20 losses on the trot at most.

Had to change ISP in Kiev a couple of years ago because of a frequent 136ms
drop. I'd isolated it to the next upstream provider from our existing ISP, but
nothing more could be done as it seemed that was their only real transit
provider, and 130ms doesn't really stop Facebook from working.

------
foobarbecue
Would have been nice if MPLS was spelled out when introduced, or you could use
one of those hyperlink things I've heard so much about to link to a
reference...

~~~
jlgaddis
I suppose, although the target audience will already know what MPLS stands for
(Multi-Protocol Label Switching, FWIW).

~~~
masklinn
The post explains how MPLS leads people to misread traceroute, so its target
audience is people who get traceroute wrong. People who know what MPLS is
likely understand its impact on traceroute output already; however, there is
probably a very large overlap between people who don't know about MPLS and
people who misunderstand traceroute output.

------
inopinatus
The other big lie is re. symmetry.

Replies may not be travelling the same path as the request, but traceroute
offers no visibility of the return path.

------
rwmj
Is it common for MPLS routers to decrement the TTL inside the packet for each
hop within the MPLS network? I thought that the whole point of MPLS was the
packet doesn't need to be looked into?

~~~
jlgaddis
IME, no, the TTL usually isn't decremented. It is configurable, though, and is
sometimes done for visibility and/or troubleshooting.

------
devonkim
Last I remember, newer versions of mtr include MPLS labels in their output to
help avoid this confusion, but that version wasn’t included in CentOS 6,
perhaps not even in EPEL.

------
hw
Is there a way to identify an MPLS router aside from just analyzing the
traceroute? Would be nice if the hostnames had an identifier in them for that.

~~~
cperciva
It's not ideal, but if you ping an internal MPLS router you'll typically see a
very different RTT to the one you see from traceroute. (On the other hand,
you'll often see nothing at all, since internal MPLS nodes often aren't using
publicly routable IP addresses.)

------
jeff6845
That traceroute example doesn't lie at all; it's just creative writing. The
author conveniently does not mention VPN even once. Plus, MPLS service
providers have been around a while. Its technology is well understood.

Traceroute can misrepresent where latency is due to how it uses the TTL
field, but increasing the sample size usually averages out that issue.

