firebinder's comments | Hacker News

Lots of great feedback here. I know from my own experience that the leading culprit of bad VoIP (and video) quality is a poorly performing ISP connection.

I'm one of the co-founders of Firebind. We use active traffic to continuously assess the network quality of last-mile ISP links, with a focus on sites using VoIP and video. Our software agent tests back to the cloud (a combination of AWS and customer-chosen destinations) every 5 minutes and takes 11 measurements by putting traffic on the wire. Two of those measurements are upload and download packet loss based on a synthetic G.711 VoIP call that runs for 25 seconds in each direction to one of our AWS-hosted agent clusters.
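As a rough illustration of what this kind of active probe does (a minimal sketch, not Firebind's actual implementation; all names here are made up), the following Python sends sequence-numbered UDP packets at G.711's 50 pps pacing and counts how many arrive. Run against loopback it should report roughly 0% loss; pointed at a listener across a lossy ISP path, the gaps show up directly:

```python
import socket
import struct
import threading
import time

def _receiver(sock, expected, results):
    """Collect sequence numbers until the sender finishes (or we time out)."""
    seen = set()
    sock.settimeout(1.0)
    try:
        while len(seen) < expected:
            data, _ = sock.recvfrom(2048)
            (seq,) = struct.unpack("!I", data[:4])
            seen.add(seq)
    except socket.timeout:
        pass
    results["received"] = len(seen)

def measure_loss(host="127.0.0.1", packets=50, pps=50, payload=160):
    """Send sequence-numbered UDP packets at a fixed rate and return the
    fraction that never arrived. Over loopback this should be ~0.0."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind((host, 0))
    port = rx.getsockname()[1]

    results = {}
    t = threading.Thread(target=_receiver, args=(rx, packets, results))
    t.start()

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = 1.0 / pps  # 20 ms spacing at 50 pps, like a G.711 frame
    for seq in range(packets):
        tx.sendto(struct.pack("!I", seq) + b"\x00" * payload, (host, port))
        time.sleep(interval)

    t.join()
    tx.close()
    rx.close()
    return 1.0 - results["received"] / packets

if __name__ == "__main__":
    print(f"loss: {measure_loss():.1%}")  # should be ~0.0% on loopback
```

A real probe would run the receiver on a remote host and repeat the test on a schedule, graphing the loss figure over time as described above.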

By graphing the upload and download loss 24x7x365, we can see the network quality of an ISP circuit far better than you can with a TCP bandwidth test (which is effectively a destructive test, since it has to saturate the link to measure it) or anything ICMP-based. And since we use active traffic, we have visibility even at 3am when no calls are being made.

I've reviewed hundreds of agent charts personally just in the last few months, many times with the customer. By far the number one issue is a bad quality ISP circuit, followed by an occasional bad or misconfigured network device at the customer site. Every once in a while the hosted VoIP provider might have some temporary glitch, but it has been pretty rare in my experience. Similarly the WAN path beyond the local ISP is usually not the problem.

Oversubscribed links are a very prevalent issue, especially with cable-modem-based circuits where the upload bandwidth is a fraction of the download. Some of the things I've seen recently:

- A cable-based ISP circuit in Concord, CA, where the upload packet loss increased in proportion to the daily high temperature.

- A customer site that accidentally configured its backup process to kick in at 3pm instead of 3am and hence flooded the upload path during working hours.

- Bad loss in both directions that was eventually traced to a coax line that had been nicked by a lawnmower.

- A cable modem with a memory leak that would gradually drop packets at a rate increasing up to almost 20% over the course of 2 weeks, where a power cycle would "fix it" for a few days until the leak grew again and dropped more packets.

- An incident with 2 Charter sites in 1 state and another in a different state, whereby all 3 had transient 70+ percent download loss for three days back to our test site at AWS in us-east-1. (I sampled dozens of other agents around the U.S., and only the Charter sites saw this issue.)

- And of course, the Netflix effect. Dozens of times I've seen download loss spike during prime-time TV hours, and that's the only time each day that remote site has download loss.

The best test you can do with open-source tools is to run iperf out to the cloud (to a $5/mo DigitalOcean droplet, perhaps) on a regular interval, simulating the codec you care about. We built our own iperf-like solution that is highly concurrent, but you can copy our settings: aim for 87 Kbps with 218-byte payloads over UDP. That works out to the 50 pps of a G.711 VoIP call.
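To sanity-check those settings (the helper name here is just illustrative), 218-byte payloads at G.711's one-packet-per-20-ms pacing do land on the 87 Kbps target:

```python
def codec_rate_kbps(payload_bytes, pps):
    """Bandwidth of a constant-rate UDP stream in Kbps (payload only,
    not counting IP/UDP header overhead)."""
    return payload_bytes * 8 * pps / 1000

# A G.711-style stream sends one packet every 20 ms, i.e. 50 pps.
print(codec_rate_kbps(218, 50))  # 87.2 -> the ~87 Kbps target above
```

An equivalent iperf3 invocation would be along the lines of `iperf3 -u -c <droplet-ip> -b 87k -l 218 -t 25` (UDP mode, target bitrate, payload length, test duration); check the flags against your installed iperf version before relying on it.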

And now for the shameless plug: if you want to check out Firebind, you can find a free trial link at our site, https://www.firebind.com.


I would be very interested in seeing the results of the Firebind tests with our stuff (fq_codel and cake-based "sqm") in place on the link. Could you drop by the bufferbloat mailing list and chat with us?

Also: we use a tool that we consider much better than iperf alone, flent (flent.org), which gives us the ability to inject loads and measure the side effects.

