
WDT, a library to transfer files as fast as possible over multiple TCP paths - ot
https://github.com/facebook/wdt
======
mpitt
What does "multiple TCP paths" mean? TCP does not deal with paths
(routing/forwarding). Is it just a typo for "connections" or does the library
really use multiple paths where available?

~~~
ldemailly
Facebook internally uses SDN, which hashes the source host/port and destination
host/port to pick paths. By using multiple connections on independent ports, we
do get multiple paths utilized and can thus get better reliability and
throughput if some paths are less healthy than others.
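A rough sketch of why separate connections can land on separate paths: a switch hashes each flow's host/port tuple and uses the result to choose among equal-cost paths, so every packet of one connection follows one path while different ports can hash differently. Assumptions: CRC32 stands in for the fabric's real flow hash (which the comment doesn't specify), and the addresses, ports and path count are hypothetical.

```python
import zlib

NUM_PATHS = 4  # hypothetical number of equal-cost paths between two hosts


def pick_path(src_host: str, src_port: int,
              dst_host: str, dst_port: int, num_paths: int) -> int:
    """Map a flow's (source host/port, destination host/port) to a path index.

    Assumption: CRC32 is a stand-in for the switch's flow hash; real fabrics
    use their own hash functions and usually include the protocol as well.
    """
    key = f"{src_host}:{src_port}->{dst_host}:{dst_port}".encode()
    return zlib.crc32(key) % num_paths


if __name__ == "__main__":
    # A single connection: every packet of the flow hashes to the same path.
    print(pick_path("10.0.0.1", 40000, "10.0.1.1", 5000, NUM_PATHS))

    # Several connections to different (hypothetical) receiver ports
    # typically spread over several of the available paths.
    used = {pick_path("10.0.0.1", 40000, "10.0.1.1", 5000 + i, NUM_PATHS)
            for i in range(8)}
    print(f"8 connections used {len(used)} of {NUM_PATHS} paths")
```

The hash is deterministic, which is what keeps a single TCP connection's packets in order on one path; spreading load therefore requires more connections, not smarter packets.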

------
amelius
Using UDP, they could possibly have made it even a little faster. See [1].

[1]
[http://web.cecs.pdx.edu/~jsnow/wireless_performance/tcp_udp....](http://web.cecs.pdx.edu/~jsnow/wireless_performance/tcp_udp.html)

~~~
mpitt
But when you're transferring files, you usually want reliability. They could
have used UDP and implemented reliability in the application, but would that
have saved overhead? I don't think so.

EDIT: By the way, your source is about 802.11b. That's a very specific case, and
also quite old.

~~~
IshKebab
TCP delivers data in order, but you don't need that for transferring a file,
since each block can be written at its own offset as it arrives. Hence you can
do better than TCP.
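A minimal sketch of the point about ordering, assuming each received block carries its own byte offset: the receiver can write blocks in whatever order they show up and still end up with an exact copy. The block size and the shuffle that simulates out-of-order arrival are illustrative only, not how any particular tool frames its data.

```python
import os
import random
import tempfile

BLOCK_SIZE = 4  # tiny block size just for the demo


def split_into_blocks(data: bytes, block_size: int):
    """Return (offset, chunk) pairs that together cover `data`."""
    return [(off, data[off:off + block_size])
            for off in range(0, len(data), block_size)]


def assemble(path: str, blocks, total_size: int) -> None:
    """Write blocks that may arrive in any order at their own offsets."""
    with open(path, "wb") as f:
        f.truncate(total_size)  # pre-size the file to its final length
        for offset, chunk in blocks:
            f.seek(offset)
            f.write(chunk)


if __name__ == "__main__":
    original = b"the quick brown fox jumps over the lazy dog"
    blocks = split_into_blocks(original, BLOCK_SIZE)
    random.shuffle(blocks)  # simulate out-of-order arrival
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "copy.bin")
        assemble(out, blocks, len(original))
        with open(out, "rb") as f:
            print(f.read() == original)
```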

~~~
ldemailly
[http://udt.sourceforge.net/](http://udt.sourceforge.net/) is a solution using
UDP, but we decided that TCP with multiple flows was a better tradeoff for us
(so as not to reinvent most of TCP's windowing, congestion control, etc.)
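To make "TCP with multiple flows" concrete, here is a loopback sketch that splits a buffer into ranges and sends each range over its own TCP connection in parallel, leaving windowing and congestion control to the kernel's TCP. This is not WDT's protocol: the 16-byte (offset, length) header, the single listening port and the thread-per-connection design are assumptions made to keep it short.

```python
import socket
import struct
import threading

# Hypothetical per-connection header: byte offset and length of the range.
HEADER = struct.Struct("!QQ")


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a TCP socket (recv may return fewer)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before sending all bytes")
        buf += chunk
    return bytes(buf)


def split_ranges(size: int, num_streams: int):
    """Split [0, size) into num_streams contiguous (offset, length) ranges."""
    bounds = [size * i // num_streams for i in range(num_streams + 1)]
    return [(bounds[i], bounds[i + 1] - bounds[i]) for i in range(num_streams)]


def receive(server: socket.socket, num_streams: int, out: bytearray) -> None:
    """Accept one connection per stream and copy each range into `out`."""
    def handle(conn: socket.socket) -> None:
        with conn:
            offset, length = HEADER.unpack(recv_exact(conn, HEADER.size))
            out[offset:offset + length] = recv_exact(conn, length)

    workers = []
    for _ in range(num_streams):
        conn, _addr = server.accept()
        worker = threading.Thread(target=handle, args=(conn,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def send_parallel(data: bytes, addr, num_streams: int) -> None:
    """Send each range of `data` over its own TCP connection, concurrently."""
    def send_range(offset: int, length: int) -> None:
        with socket.create_connection(addr) as sock:
            sock.sendall(HEADER.pack(offset, length))
            sock.sendall(data[offset:offset + length])

    senders = [threading.Thread(target=send_range, args=r)
               for r in split_ranges(len(data), num_streams)]
    for s in senders:
        s.start()
    for s in senders:
        s.join()


def transfer(data: bytes, num_streams: int = 4) -> bytes:
    """Loopback demo: send `data` over num_streams parallel TCP connections."""
    out = bytearray(len(data))
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # let the OS pick a free port
        server.listen(num_streams)
        rx = threading.Thread(target=receive, args=(server, num_streams, out))
        rx.start()
        send_parallel(data, server.getsockname(), num_streams)
        rx.join()
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of test data
    print(transfer(payload, num_streams=8) == payload)
```

Each connection gets a distinct ephemeral source port, so in a fabric that hashes host/port tuples the flows are free to take different paths.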

------
windexh8er
Maybe I'm missing something, but why would you do this, even in your own DC,
with no authenticity or authentication as core components of the tool?

The performance they get is interesting, but running the server/receiver
indefinitely seems like a not-so-good idea.

?

~~~
acdha
I've known some people (researchers in HPC data centers) who took approaches
like this: they shared MD5[1] hashes over a conventional channel so they could
use a simple UDP system with a highly tuned file-to-network path and not have
to implement the crypto.

[1] This was a long time ago. Don't do this now.
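A small sketch of that scheme with the footnote's advice applied: SHA-256 is swapped in for MD5. The sender publishes the digest over the separate channel, and the receiver recomputes it over the copied file. This only helps if the side channel itself is trustworthy, and it gives integrity, not confidentiality. The function names are illustrative.

```python
import hashlib
import hmac
import os
import tempfile


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(path: str, expected_hex: str) -> bool:
    """Compare the received file's digest with the one sent out of band."""
    return hmac.compare_digest(file_digest(path), expected_hex)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "data.bin")
        with open(src, "wb") as f:
            f.write(b"abc")
        published = file_digest(src)  # sent over the trusted side channel
        print(published)
        print(verify_copy(src, published))
```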

------
rakoo
I'd like to think you can do the same using plain BitTorrent, because it will
consider every new eth interface as an opportunity to open another connection,
spreading the transfer across all connections.

You also get authenticity for free, although you pay an up-front cost because
you need to hash the full data and send the torrent info first.
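As a rough illustration of that up-front work: BitTorrent v1 metainfo (BEP 3) carries a `pieces` field holding the 20-byte SHA-1 digest of every fixed-size piece, so the whole file is read and hashed before the torrent info can be sent, and each received piece can then be checked independently. The sketch computes only that field; bencoding and the rest of the metainfo are left out, and the piece length is an arbitrary demo value.

```python
import hashlib

SHA1_LEN = 20  # bytes per SHA-1 digest


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the SHA-1 digest of every piece, like a v1 `pieces` field."""
    return b"".join(hashlib.sha1(data[i:i + piece_length]).digest()
                    for i in range(0, len(data), piece_length))


def verify_piece(index: int, piece: bytes, pieces: bytes) -> bool:
    """Check one received piece against its published digest."""
    expected = pieces[SHA1_LEN * index:SHA1_LEN * (index + 1)]
    return hashlib.sha1(piece).digest() == expected


if __name__ == "__main__":
    data = bytes(range(256)) * 10  # 2560 bytes of demo data
    pieces = piece_hashes(data, piece_length=1024)  # 3 pieces
    print(len(pieces) // SHA1_LEN)
    print(verify_piece(1, data[1024:2048], pieces))
    print(verify_piece(1, b"\x00" * 1024, pieces))
```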

~~~
ldemailly
BitTorrent is optimized for sending to many receivers and across peers. For the
fastest data transfer, the time it takes to read the data is the bottleneck once
the network is optimized, so hashing before sending would be more costly for a
1:1 copy, since it adds a full extra read of the data.
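A back-of-the-envelope model of that tradeoff, under stated assumptions: streaming is limited by the slower of disk reads and the network, and hashing first adds one extra full read pass (data not already in the page cache, hashing fast enough to keep up with the disk). The sizes and rates in the demo are hypothetical, not measurements.

```python
def copy_time_s(size_gb: float, disk_gb_s: float, net_gb_s: float,
                hash_first: bool) -> float:
    """Rough wall-clock estimate in seconds for a 1:1 copy.

    Streaming runs at the slower of disk and network; hashing first adds
    one extra full read pass at disk speed (no page-cache hits assumed).
    """
    stream_s = size_gb / min(disk_gb_s, net_gb_s)
    hash_s = size_gb / disk_gb_s if hash_first else 0.0
    return hash_s + stream_s


if __name__ == "__main__":
    # Hypothetical: 100 GB, disk reads at 1 GB/s, 10 Gbit/s (1.25 GB/s) link.
    print(copy_time_s(100, 1.0, 1.25, hash_first=False))
    print(copy_time_s(100, 1.0, 1.25, hash_first=True))
```

With these made-up numbers the read-bound streaming copy takes about 100 s, while hash-then-send takes about 200 s, which is the "more costly" case for a 1:1 transfer.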

We are considering the 1:many case for future development, but there are tools
(like BitTorrent) that already excel at this.

------
timeu
How does this compare to bbcp? I always had trouble getting bbcp to work,
especially when the hosts are behind a gateway.

------
Tekker
How is this better than 0MQ (or ZMQ)?

~~~
ldemailly
My understanding is that ZMQ is a messaging/RPC mechanism, while WDT is aimed at
large data transfers (though it also works well with lots of small files, as it
streams them).
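One way to picture "streaming" many small files: write them back to back on a single long-lived byte stream, each preceded by a small header, so there is no per-file connection setup or round trip. WDT's actual wire format isn't described in this thread; the 10-byte header here is a hypothetical framing for illustration.

```python
import io
import struct

# Hypothetical frame header: 2-byte name length, 8-byte payload length.
FRAME = struct.Struct("!HQ")


def write_files(stream, files) -> None:
    """Write (name, data) pairs back to back on one byte stream."""
    for name, data in files:
        encoded = name.encode("utf-8")
        stream.write(FRAME.pack(len(encoded), len(data)))
        stream.write(encoded)
        stream.write(data)


def _read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes or fail loudly on a truncated stream."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended mid-frame")
    return data


def read_files(stream):
    """Yield (name, data) pairs until the stream is exhausted."""
    while True:
        header = stream.read(FRAME.size)
        if not header:
            return
        if len(header) != FRAME.size:
            raise EOFError("stream ended mid-header")
        name_len, data_len = FRAME.unpack(header)
        name = _read_exact(stream, name_len).decode("utf-8")
        yield name, _read_exact(stream, data_len)


if __name__ == "__main__":
    small_files = [(f"dir/file{i}.txt", f"contents {i}\n".encode())
                   for i in range(1000)]
    buf = io.BytesIO()
    write_files(buf, small_files)
    buf.seek(0)
    print(list(read_files(buf)) == small_files)
```

Over a real TCP connection, `sock.makefile("wb")` and `sock.makefile("rb")` give file-like objects that these functions can use unchanged.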

Also, I'm not sure ZMQ uses multiple streams.

