Hacker News
How to properly interpret a traceroute or mtr (lavin.me.uk)
100 points by zdw on March 23, 2022 | 20 comments



There's a pretty big difference between ICMP and UDP/TCP pings: UDP and TCP can and most likely will be load-balanced across multiple different links, as the 4-tuples change with every new packet. ICMP almost never is, so you only see a single route where there may be tens.
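To make the load-balancing point concrete, here is a minimal sketch of per-flow hashing. The hash function, the link count, the addresses, and the `pick_link` helper are all assumptions for illustration; real routers use vendor-specific hashes and may key on other fields.

```python
import hashlib


def pick_link(src, dst, proto, sport=None, dport=None, n_links=4):
    """Pick an outgoing link index by hashing the flow identifier.

    Hypothetical model: ports are part of the key only when present,
    so ICMP probes (no ports) always hash to the same link.
    """
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


# Classic UDP traceroute increments the destination port per probe,
# so every probe looks like a new flow to the hash.
udp_links = {
    pick_link("192.0.2.1", "198.51.100.7", "udp", 40000, 33434 + i)
    for i in range(30)
}

# ICMP echo probes carry no ports, so the key never changes.
icmp_links = {
    pick_link("192.0.2.1", "198.51.100.7", "icmp")
    for _ in range(30)
}
```

Under these assumptions `icmp_links` holds a single link while `udp_links` spreads over several, which is the "one route where there may be tens" effect.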

If the network you're trying to debug is a) IPv4-only and b) not too wide, you may also find some utility in ping -R, which turns on the oft-forgotten but rarely firewalled IPv4 Record Route option, and that way you may get visibility into at least one of the return routes as well. Unfortunately, the IPv4 header can only record up to 9 hops, so it's less useful over the internet.
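The 9-hop limit falls out of the header layout. A quick worked version of the arithmetic (the constant and function names are just labels chosen for this sketch):

```python
MAX_IPV4_HEADER = 15 * 4   # IHL is a 4-bit count of 32-bit words -> 60 bytes
BASE_IPV4_HEADER = 20      # header without any options
RR_OVERHEAD = 3            # option type, length, and pointer, 1 byte each
ADDR_SIZE = 4              # one recorded IPv4 address


def record_route_slots():
    """Number of addresses the Record Route option has room for."""
    option_space = MAX_IPV4_HEADER - BASE_IPV4_HEADER   # 40 bytes of options
    return (option_space - RR_OVERHEAD) // ADDR_SIZE    # 37 // 4


print(record_route_slots())  # 9
```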


> Unless packet loss or increased RTT is seen on every hop between a given hop and the end of the trace, it is not a problem.

I once spent a month trying to argue this point to my ISP when they asked for mtr tests to diagnose a packet loss issue. My home router hop showed 90% ‘loss’ so they repeatedly refused to investigate on that basis.

The problem turned out to be widespread across their wholesaler’s network, but of course, according to the ISP, I was the only complaint they had heard of. Over the course of two years the wholesaler was bought out and their network upgraded, which immediately solved the recurring packet loss. Surprise, surprise.
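The quoted rule is simple enough to write down. A minimal sketch, assuming you already have the per-hop loss percentages from an mtr report as a list; the 1% threshold and the function name are arbitrary choices, and hops that never answer are not handled:

```python
def first_real_problem(loss_by_hop, threshold=1.0):
    """Return the 1-based hop where loss starts and persists to the last hop.

    Returns None when the final hop is clean, because loss that does not
    carry through to the end of the trace is usually just a router
    deprioritising replies to probes addressed to itself.
    """
    if not loss_by_hop or loss_by_hop[-1] < threshold:
        return None
    hop = len(loss_by_hop)
    # Walk backwards while the previous hop is also lossy.
    while hop > 1 and loss_by_hop[hop - 2] >= threshold:
        hop -= 1
    return hop


# A home router showing 90% "loss" with clean hops after it: not a problem.
print(first_real_problem([0, 90, 0, 0, 0]))      # None
# Loss that begins at hop 4 and carries through to the destination.
print(first_real_problem([0, 90, 0, 5, 6, 4]))   # 4
```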


I had a similar problem with my ISP gateway giving spurious readings on a traceroute. To prevent other techs from getting distracted by it, I started using `mtr --first-ttl=3 ...` to ignore the first two hops.
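If you script this, the same flag can be wrapped up so nobody forgets it. A small sketch; `mtr_command` and `run_mtr` are hypothetical helper names, and the defaults (skip two hops, 10 cycles) are only examples:

```python
import subprocess


def mtr_command(host, first_ttl=3, cycles=10):
    """Build an mtr report command that starts at hop `first_ttl`."""
    return [
        "mtr",
        "--report",
        "--report-cycles", str(cycles),
        "--first-ttl", str(first_ttl),
        host,
    ]


def run_mtr(host, **kwargs):
    """Run mtr and return its text report (requires mtr to be installed)."""
    result = subprocess.run(
        mtr_command(host, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `run_mtr("example.com")` would then produce a report whose first line is hop 3.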


RAS' (of nLayer/GTT/PacketFabric fame) presentation[0] about this same topic at NANOG47 is a lot more detailed.

[0]: https://archive.nanog.org/meetings/nanog47/presentations/Sun...



Maybe a bit of a tangent, but does anyone have a good recommendation for monitoring a network's upness? My home internet suffers quite a bit at random times; sometimes it's very local (noise between repeaters on my telephone line) and sometimes it's 2 or 3 hops away. I've taken to running mtr during problem times so I can have a bead on the flakiness, as that informs me how much it will affect work/gaming/browsing, but it's not so great of a tool to run constantly and look back at historical data. I'd really rather have something that can log; it doesn't even have to graph it.

The problems I encounter go anywhere from blips of one way packet loss (I've been able to ping my IP from a server but not ping the same server from my apartment), latency randomly jumping 500ms, or full downtime. mtr is better than ping for this as I like to see whether the issue is happening on my line or further out, that usually informs how much noise I should make or how much time I have to kill.
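One low-tech way to get history out of mtr is to save its JSON report on a schedule. A rough sketch, assuming an mtr build that supports `--json` and emits a `report` object with a `hubs` list carrying `count`, `host`, `Loss%` and `Avg`; field names and types may differ between versions, so check your own output first. The function names are made up for this example:

```python
import json
from datetime import datetime, timezone


def summarise_report(raw_json, when=None):
    """Reduce one mtr JSON report to a timestamped per-hop summary."""
    when = when or datetime.now(timezone.utc)
    hubs = json.loads(raw_json)["report"]["hubs"]  # assumed layout
    return {
        "time": when.isoformat(timespec="seconds"),
        "hops": [
            {
                "hop": int(h["count"]),
                "host": h["host"],
                "loss_pct": float(h["Loss%"]),
                "avg_ms": float(h["Avg"]),
            }
            for h in hubs
        ],
    }


def append_log(path, raw_json):
    """Append one summary per line so the file can be grepped later."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(summarise_report(raw_json)) + "\n")
```

Run from cron every few minutes, this gives one line per report, which is enough to look back and see whether a blip started on your line or further out.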


Sibling comment looks interesting, but two other options I've used are A) https://testmy.net/auto which lets you set up scheduled speed tests, and B) some manual hacky scripts running via cron on a Raspberry Pi.

The latter is useful because you can run whatever combination of tests you want - there are CLI programs that run on a Pi for bandwidth/speed testing (I think speedtest.net has one specifically, but there are others, including at least one other that targets speedtest.net), mtr/traceroute/ping/etc. etc. can all be scripted, and if you get a bit clever with your pipes and sed scripts, you can end up with a file that can be directly loaded into a spreadsheet program for plotting (or you could go all-out and have the script generate a plot.ly plot directly). I've also plotted ambient temperature here, to maybe find any correlations.
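As an example of the hacky-script approach without the sed, here is a sketch that turns the summary lines of `ping` into a CSV row. It assumes the Linux iputils output format ("N% packet loss" and "rtt min/avg/max/mdev = ... ms"); other platforms print something different, and the function names are invented for this example:

```python
import csv
import re
from datetime import datetime, timezone

LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping(output):
    """Extract loss and latency from ping's summary (assumed iputils format)."""
    loss = LOSS_RE.search(output)
    rtt = RTT_RE.search(output)
    return {
        "loss_pct": float(loss.group(1)) if loss else None,
        "avg_ms": float(rtt.group(2)) if rtt else None,
        "max_ms": float(rtt.group(3)) if rtt else None,
    }


def append_row(path, host, output, when=None):
    """Append one timestamped row that a spreadsheet can load directly."""
    when = when or datetime.now(timezone.utc)
    row = parse_ping(output)
    with open(path, "a", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerow([
            when.isoformat(timespec="seconds"), host,
            row["loss_pct"], row["avg_ms"], row["max_ms"],
        ])
```

A cron job can capture the output of something like `ping -c 10 HOST` and hand it to `append_row`; when every probe is lost there is no rtt line, so the latency columns are left empty.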

What might be the most useful for you is to find a metric on your router/modem that provides some value - years ago I was able to find a signal quality metric on my DSL modem that I could record programmatically (simple API call with the simplest HTTP password encryption option...), which eventually led to discovering that there were line filters/attenuators that had been installed but not removed for the different seasons.


> My home internet suffers quite a bit at random times

If you are in the UK, just move your home broadband to Andrews & Arnold[1].

They are an ISP that know their stuff, have a knowledgeable helpdesk, and run 24x7 diagnostics on every line of every customer, making that data available to the customer[2].

No I don't work for them. No I'm not a shill. But I have sufficient techie colleagues in the UK who vouch for A&A being tip top.

Alternatively, if you have control of a remote server, you could try running IRTT [3] and feeding its output into whatever your favourite graphing/monitoring software is.

[1] https://www.aa.net.uk/ [2] https://support.aa.net.uk/CQM_Graphs [3] https://github.com/heistp/irtt


I built something which may be able to help: https://github.com/dmuth/grafana-playground

You'll want to spin this up in Docker Compose, and to add custom hosts upstream of you (such as ISP routers), you can edit the hosts.txt file (to add in human-friendly names such as "upstream") and add those hosts in the HOSTS variable in the docker-compose.yml file.

Feel free to open an issue on the project or reach out privately if you have any questions.


So if you have copper cable phone lines to your telecoms cabinet (typical UK BT OpenReach setup, not Virgin Cable), disconnecting any and all of the phone extension sockets in the property and having your router plugged into the master socket can see a doubling of speed! It's possible to have the router establish a speed that is faster than your broadband package offers, in which case you will be throttled back to the speed you are paying for.

Some routers will also highlight if there is interference on the line, which makes it harder to calculate how long the cable from your property to the cabinet can be. Some routers will also show who the cabinet manufacturer is and firmware version, usually Broadcom or reportedly Huawei, but I've not found a Huawei cabinet yet. I've found German routers to be best.

The new cabinets also reportedly emit an independent radio signal in case the line to the cabinet goes down which might also be used to detect when someone has opened the cabinet. Anyone could mess around with the older telecoms cabinets! So monitoring the RF with a SDR could throw up more interesting data.

And then it's off to your exchange, where companies like TalkTalk also run their own cabling, so you may not be on the BT OpenReach network, especially if you are a TalkTalk customer. I'm not aware of any ISP using both OpenReach and TalkTalk, but some arrangements might exist.

And then from there, if the problem is not between your exchange and your property, sometimes using a VPN service into another country can help in those situations where the usual route to your popular sites is congested/problematic, i.e. VPN to a different country so that you can come in to the US from the West coast instead of the East coast, or VPN to Germany in order to access a Swiss website, instead of going from France to Switzerland.

(Submarine) Cable maps can be handy for working out the countries to VPN into to get a different route into a country and thus favoured websites.


In addition to the above advice about copper/ADSL, if any of your sockets are still live after unplugging whatever you can and/or you actually want to use a phone, you should also plug filters into ALL of your live sockets (whether they have something plugged into them or not).

For the ultimate, cleanest & best connection you can remove the faceplate of the master socket to reveal the "test socket" [1] (which in so doing disconnects every other socket in the house) and then plug a filter & your ADSL modem directly into that socket, or fit a replacement faceplate e.g. [2],[3]

[1] http://www.hmmm.ip3.co.uk/bt-master-socket-nte5.shtml [2] https://www.amazon.co.uk/N247-ADSL-FacePlate-Filter/dp/B0030... [3] https://www.run-it-direct.co.uk/adsl-vdsl-faceplates/vte2015...


> and then plug a filter & your ADSL modem directly into that socket

You probably don't want to use a filter here.


I have been running the official speed test CLI from Ookla directly on my OPNsense router and dumping the JSON output into a database. The script runs every 15 minutes and gives me an idea of how my ISP is performing.

https://www.speedtest.net/apps/cli
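A rough idea of what the "dump the JSON into a database" step could look like, using SQLite. This is a sketch, not the setup described above: it assumes the CLI's JSON output has `timestamp`, `ping.latency`, and `download.bandwidth` / `upload.bandwidth` in bytes per second, so verify those field names against your own output. The table layout and `store_result` are invented for the example:

```python
import json
import sqlite3

SCHEMA = """CREATE TABLE IF NOT EXISTS results (
    ts TEXT PRIMARY KEY, ping_ms REAL, down_mbps REAL, up_mbps REAL)"""


def store_result(conn, raw_json):
    """Insert one speed test result; returns the stored row."""
    r = json.loads(raw_json)
    row = (
        r["timestamp"],
        r["ping"]["latency"],
        r["download"]["bandwidth"] * 8 / 1e6,  # assumed bytes/s -> Mbit/s
        r["upload"]["bandwidth"] * 8 / 1e6,
    )
    conn.execute(SCHEMA)
    conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)", row)
    conn.commit()
    return row
```

A scheduled job would run the CLI with JSON output, pass the text to `store_result`, and leave the querying to whatever you like to graph with.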


I've used this. It's not everything I want but it's something I can show to the cable internet techs.

https://loggger.com


There is actually an Ethernet-level protocol which performs a similar function: https://en.m.wikipedia.org/wiki/IEEE_802.1ag (delay is measured using a sibling standard, https://en.m.wikipedia.org/wiki/Y.1731). Unfortunately it's normally filtered out by telcos, so you are unlikely to be able to use it to probe your telco's Ethernet. That's what they use to probe hops between actual routers, though.


Somehow this reminded me of this video [0]. I can’t really tell if it’s a joke or just complete cluelessness. But it’s kinda fun regardless.

Edit: The article posted by OP is obviously high quality stuff, diving into MPLS, etc.

[0] https://youtu.be/SXmv8quf_xM


Seems legit. I can now see who is all browsing my gf onlyfans page.


Even though it lets you see network issues, there sadly isn't always a way to fix them. One time I reported an issue and was told that the issue was with my ISP, despite having many people test from both residential connections and from within a data center. I was also told that my 60ms of ping between San Jose and Los Angeles was due to the overhead of running it from a virtual machine. Despite having the ticket open for months it went nowhere, and I eventually just found a different company to host my servers who didn't have routing issues.
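For a sense of scale on that 60ms figure, a back-of-the-envelope lower bound is easy to compute. The numbers here are assumptions: roughly 500 km between San Jose and Los Angeles, light in fiber at about 200,000 km/s (two thirds of c), and a `route_factor` fudge for cable paths that are longer than the straight line:

```python
SPEED_IN_FIBER_KM_S = 200_000  # assumed round figure, about 2/3 of c


def min_rtt_ms(distance_km, route_factor=1.0):
    """Lower bound on round-trip time from propagation delay alone."""
    one_way_s = distance_km * route_factor / SPEED_IN_FIBER_KM_S
    return 2 * one_way_s * 1000


direct = min_rtt_ms(500)            # straight-line path
detour = min_rtt_ms(500, 2.0)       # fiber path twice as long
```

That works out to about 5 ms for the direct path and about 10 ms with a generous detour, so 60 ms between those two cities points at routing rather than distance.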


Spot on. It gets messier when a failure in a T1 carrier between your CDN and a customer is causing issues. Neither of you (nor possibly even the CDN) has the power to fix it. Well, other than bypass the CDN and hope the traffic passes through different peers.

On the up side, MTR is indeed a useful tool for finding the issue.


Love this. I used to manage an Operations team at a datacenter, and this is good reading for the front-line support guys. It provides a good overview and points the direction for more reading to troubleshoot and diagnose a potential problem. Well done.



