
Tcpdive – A TCP Performance Profiling Tool - EvanJS
https://github.com/fastos/tcpdive/blob/master/README.md
======
brendangregg
Looks like a good project -- new metrics, like how much time a TCP session has
spent in different retransmit states, should be useful for properly
understanding issues and estimating potential speedup.

And it's a bunch of SystemTap scripts that trace send/receive/etc! Glad it
isn't a kernel module, at least. :) Tracing send/receive does have a cost,
though, which the README covers:

"The figure above shows Per-core CPU consumption of tcpdive is less than 10%
while QPS is no significant influenced, which we believe is acceptable in most
cases."

Actually, I'd say this is acceptable _because_ it is stated so clearly in the
README, so we can judge the cost before using it. The overhead is actually
pretty high, 2-10% CPU per core, but that's what I'd expect for tracing
send/receive with SystemTap. Again, because it's made clear up front, we can
decide beforehand whether to accept it and use the tool accordingly.

eBPF should be making this type of tracing lower overhead...
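
For a sense of where the cost comes from, here's a minimal sketch in the same
vein (my own illustration using the stock tapset's tcp.sendmsg/tcp.recvmsg
probes, not tcpdive's actual scripts): every TCP send and receive in the
kernel fires a probe, so a busy server pays per call.

```
#!/usr/bin/stap
# Per-process TCP send/receive sizes via the stock tapset probes.
# Firing on every call is exactly where the 2-10% per-core overhead
# comes from on a busy server.

global sent, rcvd

probe tcp.sendmsg { sent[execname()] <<< size }
probe tcp.recvmsg { rcvd[execname()] <<< size }

probe timer.s(10)
{
    printf("\n=== TCP sends by process ===\n")
    foreach (name in sent)
        printf("%-16s calls=%d avg=%d bytes\n", name,
               @count(sent[name]), @avg(sent[name]))
    printf("=== TCP receives by process ===\n")
    foreach (name in rcvd)
        printf("%-16s calls=%d avg=%d bytes\n", name,
               @count(rcvd[name]), @avg(rcvd[name]))
    delete sent
    delete rcvd
}
```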

~~~
aduitsis
You don't mention it, but credit where it's due: DTrace, whose creation you
had a lot to do with, can be used in much the same way on kernels that have
added the TCP provider. I think we all owe you a big thanks.

~~~
brendangregg
These look like a lot of new scripts, and not quite what I was doing with
DTrace, although in the same spirit that DTrace pioneered: tracing of TCP
internals for custom metrics.

I didn't create DTrace, but I did create the DTrace TCP provider and many
networking scripts. A nice list of them is here:
[http://dtracebook.com/index.php/Network_Lower_Level_Protocols#Scripts](http://dtracebook.com/index.php/Network_Lower_Level_Protocols#Scripts).
My scripts focused on tracing events, workload characterization at different
levels, and some timing: connection lifespans and first-byte latency. (At
least in that location; I've got DTrace scripts scattered elsewhere too.) The
tcpdive scripts have focused so far on perturbation study: congestion,
retransmissions, resets. Also useful!
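
To give a flavor of the lifespan-style timing in tcpdive's own medium, here's
a minimal SystemTap sketch (not one of the scripts above; it assumes
tcp_set_state() is probeable with debuginfo, and uses the kernel's state
constants TCP_ESTABLISHED == 1 and TCP_CLOSE == 7):

```
#!/usr/bin/stap
# Histogram of TCP connection lifespans: ESTABLISHED -> CLOSE.

global born, span

probe kernel.function("tcp_set_state")
{
    if ($state == 1) {                         # entering ESTABLISHED
        born[$sk] = gettimeofday_us()
    } else if ($state == 7 && ($sk in born)) { # entering CLOSE
        span <<< gettimeofday_us() - born[$sk]
        delete born[$sk]
    }
}

probe end
{
    printf("TCP connection lifespan (us):\n")
    print(@hist_log(span))
}
```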

------
tomkinstinch
And to simulate a degraded connection, the colorfully-named _comcast_ tool
works well:

[https://github.com/tylertreat/comcast](https://github.com/tylertreat/comcast)
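
Basic usage looks roughly like this (flags as in the project README at the
moment; check `comcast --help` in case they've changed):

```
# Add ~250ms latency, 10% packet loss, and a ~1Mbps cap on eth0
comcast --device=eth0 --latency=250 --target-bw=1000 --packet-loss=10%

# Put the network back to normal
comcast --stop
```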

~~~
wtallis
That tool is a very thin wrapper over preexisting OS functionality, and it
does little more than disguise how limited those capabilities are when it
comes to making realistic simulations.

If you really want to assess how your application will perform on low-quality
connections, it behooves you to understand what makes those connections suck,
and what specific capabilities your OS of choice has for simulating or
generating those conditions.

Statically setting latency to 500ms or packet loss to 10% is not realistic; it
simultaneously exaggerates the kind of performance issues that exist in the
real world and is much easier to compensate for than real network dynamics.
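
On Linux, for instance, wrappers like this mostly bottom out in tc/netem.
netem does offer jitter and correlation knobs that get closer than a fixed
setting, but even those are a crude model of real dynamics (standard tc/netem
usage, shown for illustration):

```
# The "static" impairment such wrappers typically apply:
tc qdisc add dev eth0 root netem delay 500ms loss 10%

# A bit closer to reality: 200ms +/- 50ms jitter, 25% correlated,
# plus correlated (bursty-ish) loss
tc qdisc change dev eth0 root netem delay 200ms 50ms 25% loss 10% 25%

# Remove the impairment
tc qdisc del dev eth0 root
```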

~~~
tomkinstinch
The wrapper makes the tool. It may not be perfect, and it would be nice to
have more of a "random but stochastically representative degradation" option,
but it's the tool I know and it works well. Can you suggest a better one for
simulation? On OSX there is Apple's Network Link Consitioner[1], but on Linux?
Characterizing a bad connection is great, but that isn't a tool. It's super
hand wavy to say "what specific capabilities your OS of choice has for
simulating or generating those conditions". My OS doesn't have capabilities I
can access easily, and _importantly_ , disable easily. Building on Comcast to
add simulation improvements seems like a viable option.

------
aduitsis
Some tools like tcpdump or ss are mentioned, but those tools are not really
comparable to what's described here. What I'd like to see would be a rough
comparison with the existing web100 set of kernel patches
([https://web10g.org/](https://web10g.org/)), which has been used for many
years in conjunction with the Network Diagnostic Tool (NDT) from Internet2.
It provides userland visibility into some of the TCP kernel parameters of
each connection via a documented interface, which can be used e.g. by a
special web server that does performance measurements. Also see
[http://www.measurementlab.net/](http://www.measurementlab.net/).

Similar results could theoretically be obtained with the TCP DTrace provider,
which was added in Solaris 11 if memory serves. I am not aware whether FreeBSD
or Mac OS X has any similar providers, but my info could be outdated.

The idea behind all these approaches is basically to target a specific TCP
connection and generate an event each time a TCP packet arrives. For each of
those events, a rolling estimate of the RTT is maintained by the kernel and
used as the basis for calculating the congestion window, which limits how
many bytes can subsequently be sent. Various timeouts can probably trigger
similar events, and so on.

(edit: s/packets/bytes/)
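
A minimal SystemTap sketch of that per-event idea, to make it concrete (my
illustration, not tcpdive code; it assumes a recent kernel where struct
tcp_sock keeps srtt_us stored shifted left by 3 and snd_cwnd in segments --
older kernels name and scale these fields differently):

```
#!/usr/bin/stap
# Dump the kernel's rolling RTT estimate and congestion window on
# every processed ACK. Printing per event is intrusive; a real tool
# would aggregate instead.

probe kernel.function("tcp_ack")
{
    printf("sk=%p srtt=%dus cwnd=%d segs\n", $sk,
           @cast($sk, "tcp_sock", "kernel")->srtt_us >> 3,
           @cast($sk, "tcp_sock", "kernel")->snd_cwnd)
}
```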

------
cbsmith
So, I just read the README and haven't even fully digested it, but the tool
I'd have used in the past for this kind of problem space was tcpdump +
tcptrace ([http://tcptrace.org](http://tcptrace.org)). To help me understand
tcpdive, how would you compare the two?
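
For context, my usual workflow with that pair is roughly:

```
# Capture full packets, then analyze offline
tcpdump -i eth0 -s 0 -w trace.pcap 'tcp port 80'

# Long-form per-connection stats: RTT, retransmits, throughput
tcptrace -l -r trace.pcap

# xplot time-sequence graphs, good for eyeballing stalls
tcptrace -G trace.pcap
```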

------
ck2
I know there are 3G and 4G simulators, but it occurs to me there might be a
market for a proxy/VPN that really does use 3G or 4G on a connection and
feeds it back to you for testing in a desktop environment.

~~~
ori_b
> _but it occurs to me there might be a market for a proxy/VPN that really
> does use 3G or 4G on a connection_

Why not just use tethering?

~~~
ck2
Well I mean on demand, as a service.

[http://www.webpagetest.org/](http://www.webpagetest.org/) does this for
waterfalls; you can select real-world 2G/3G and even LTE.

~~~
ori_b
Yes, and I mean what does doing it "as a service" bring over just pushing a
button on your phone to act as a hotspot?

~~~
ck2
What if you have a first-world problem and your cell service is too good and
reliable? Even forcing the phone to 2G/3G service is not enough of a
simulation.

~~~
pyvpx
the you spend some time defining what a typical 2G and 3G connection looks
like to your end users, and then you replicate that with existing tools. There
are very, very many open source solutions to adding latency, jitter,
throttling bandwidth, and replicating packet loss. in fact, I'd argue that
coming up with a few solid definitions of what a 2G and 3G connection look
like is far more accurate than running tests over one dongle "IRL"
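
For instance, a purely illustrative "3G" profile on Linux might chain netem
(delay, jitter, loss) with tbf (bandwidth cap); the numbers here are
placeholders, not measurements:

```
tc qdisc add dev eth0 root handle 1:0 netem delay 300ms 100ms 25% loss 1.5%
tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 750kbit burst 32kbit latency 400ms

# Tear it down when done
tc qdisc del dev eth0 root
```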

