Hacker News
WDT, a library to transfer files as fast as possible over multiple TCP paths (github.com/facebook)
34 points by ot on July 21, 2015 | 14 comments



What does "multiple TCP paths" mean? TCP does not deal with paths (routing/forwarding). Is it just a typo for "connections" or does the library really use multiple paths where available?


Facebook internally uses SDN, which hashes the source host/port and destination host/port to pick paths. By using multiple connections on independent ports, we do get multiple paths utilized, and can thus get better reliability and throughput if some paths are less healthy than others.
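A minimal sketch of why several connections on different ports can land on different paths, assuming a switch that picks a path by hashing the (source host/port, destination host/port) tuple. The SHA-256-based hash, the addresses, the port numbers and the path count of 4 are all hypothetical stand-ins; real switches use their own hardware hash functions.

```python
import hashlib


def pick_path(src_ip: str, src_port: int, dst_ip: str, dst_port: int, num_paths: int) -> int:
    """Map one flow's host/port tuple to one of num_paths paths.

    Hypothetical stand-in for a switch's hash: the same tuple always
    yields the same path, so a single TCP connection stays on one path.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def paths_used(src_ip, dst_ip, dst_port, src_ports, num_paths):
    """Return the set of paths chosen for connections from each source port."""
    return {pick_path(src_ip, p, dst_ip, dst_port, num_paths) for p in src_ports}


if __name__ == "__main__":
    # A single connection is pinned to exactly one path.
    print(paths_used("10.0.0.1", "10.0.0.2", 9000, [40000], 4))
    # Many connections on distinct source ports typically spread over several paths.
    print(paths_used("10.0.0.1", "10.0.0.2", 9000, range(40000, 40008), 4))
```

Because the hash is deterministic per tuple, the only way to use more paths between the same two hosts is to vary the ports, i.e. to open more connections.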


I expect it actually means multiple routing paths, using one TCP session per route to use the full bandwidth of each route.


Using UDP, they could possibly have made it even a little faster. See [1].

[1] http://web.cecs.pdx.edu/~jsnow/wireless_performance/tcp_udp....


But when you're transferring files, you usually want reliability. They could have used UDP and implemented reliability in the application, but would that have saved overhead? I don't think so.

EDIT: By the way, your source is about 802.11b. That's a very specific case, and also quite old.


TCP delivers data in order, but you don't need that for transferring a file. Hence you can do better than TCP.
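A minimal sketch of the point about ordering, assuming each block of the file is tagged with its byte offset: the receiver can then place blocks wherever they belong, in whatever order they arrive. The `split_into_blocks` and `reassemble` helpers and the 1000-byte block size are hypothetical; a real receiver would write to a file at each offset (for example with `os.pwrite` on POSIX) rather than into an in-memory `bytearray`.

```python
import random


def split_into_blocks(data: bytes, block_size: int):
    """Split data into (offset, payload) pairs."""
    return [(off, data[off:off + block_size]) for off in range(0, len(data), block_size)]


def reassemble(blocks, total_size: int) -> bytes:
    """Write each block at its own offset; arrival order does not matter."""
    out = bytearray(total_size)
    for off, payload in blocks:
        out[off:off + len(payload)] = payload
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 40        # 10240 bytes of sample data
    blocks = split_into_blocks(data, 1000)
    random.shuffle(blocks)               # simulate out-of-order arrival
    print(reassemble(blocks, len(data)) == data)  # True
```

Each individual TCP connection is still in order; the ordering freedom pays off when blocks are spread over several connections, or over a protocol that does not enforce ordering at all.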


http://udt.sourceforge.net/ is a solution using UDP, but we decided that TCP with multiple flows was a better tradeoff for us (so as not to reinvent most of TCP's windowing, congestion control, etc.).
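A minimal sketch of "TCP with multiple flows" on localhost, assuming a hypothetical framing of an 8-byte offset plus a 4-byte length in front of each block. This is not WDT's actual wire protocol; the `transfer` function, the round-robin block assignment and the 64 KiB block size are illustrative choices. Every flow is an ordinary TCP socket, so windowing and congestion control come from the kernel, which is the tradeoff described above.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!QI")  # hypothetical framing: 8-byte offset, 4-byte length


def recv_exact(sock, n):
    """Read exactly n bytes; return b"" if the peer closed before sending any."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            if not buf:
                return b""
            raise ConnectionError("stream ended mid-frame")
        buf += chunk
    return bytes(buf)


def receive_stream(listener, out):
    """Accept one flow and write each framed block at its offset in out."""
    conn, _ = listener.accept()
    with conn:
        while True:
            header = recv_exact(conn, HEADER.size)
            if not header:
                return  # sender closed this flow
            offset, length = HEADER.unpack(header)
            out[offset:offset + length] = recv_exact(conn, length)


def send_stream(port, data, offsets, block_size):
    """Send the given blocks of data over one TCP connection."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        for off in offsets:
            payload = data[off:off + block_size]
            sock.sendall(HEADER.pack(off, len(payload)) + payload)


def transfer(data: bytes, num_flows: int = 4, block_size: int = 64 * 1024) -> bytes:
    out = bytearray(len(data))
    listeners, threads = [], []
    for _ in range(num_flows):
        lst = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
        listeners.append(lst)
        threads.append(threading.Thread(target=receive_stream, args=(lst, out)))
    offsets = list(range(0, len(data), block_size))
    for i, lst in enumerate(listeners):
        port = lst.getsockname()[1]
        # Round-robin: flow i carries blocks i, i + num_flows, i + 2 * num_flows, ...
        threads.append(threading.Thread(
            target=send_stream, args=(port, data, offsets[i::num_flows], block_size)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for lst in listeners:
        lst.close()
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of sample data
    print(transfer(data, num_flows=4) == data)  # True
```

Receivers write to disjoint slices of one buffer, so no lock is needed in this sketch; a real tool would write to disk and handle retries and failed flows.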


Maybe I'm missing something, but why would you do this, even in your own DC, without authenticity or authentication as core components of the tool?

The performance they get is interesting, but running the server/receiver indefinitely seems like a not-so-good idea.

?


I've known some people (researchers in HPC data centers) who took approaches like this where they shared MD5[1] hashes on a traditional channel so they could use a simple UDP system with a really tuned file -> network path and not have to implement the crypto.

1. This was a long time ago. Don't do this now.
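A minimal sketch of the approach described, assuming the expected digest arrives over a separate trusted channel while the bulk data travels over the fast, unauthenticated path. Per the footnote, it uses SHA-256 rather than MD5; `file_digest` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex):
    """Compare the received file's digest with the one from the trusted channel."""
    return hmac.compare_digest(file_digest(path), expected_hex)


if __name__ == "__main__":
    import os
    import tempfile

    data = b"example payload" * 1000
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        digest = file_digest(tmp.name)   # in practice, sent over the trusted channel
        print(verify(tmp.name, digest))  # True
    finally:
        os.remove(tmp.name)
```

This only gives integrity of the transferred bytes, and only if the digest itself cannot be tampered with; it does not authenticate the sender of the bulk data.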


I'd like to think you can do the same using plain BitTorrent, because it will consider every new eth interface as a possibility to open another connection, spreading the data across all connections to maximize throughput.

You also get authenticity for free, although you pay an upfront time cost because you need to hash all of the data and send the torrent info first.
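A minimal sketch of the upfront hashing step, modeled on BitTorrent v1, where the torrent info lists a SHA-1 hash for each fixed-size piece. The 16 KiB piece length and the helper names are hypothetical choices for illustration. Note that `piece_hashes` must read every byte before anything can be sent, which is the cost discussed in the reply below.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def piece_ok(index: int, piece: bytes, hashes: List[bytes]) -> bool:
    """Check one received piece against the hash list shared up front."""
    return hashlib.sha1(piece).digest() == hashes[index]


if __name__ == "__main__":
    data = bytes(range(256)) * 1000          # 256000 bytes
    hashes = piece_hashes(data, 16 * 1024)
    print(len(hashes))                       # 16
    print(piece_ok(0, data[:16 * 1024], hashes))  # True
    bad = bytearray(data[:16 * 1024])
    bad[0] ^= 0xFF                           # flip bits in one byte
    print(piece_ok(0, bytes(bad), hashes))   # False
```

Per-piece hashes let a receiver reject and re-fetch a single corrupted piece instead of the whole file.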


BitTorrent is optimized for sending to many receivers and across peers. For the fastest data transfer, once the network is optimized, the time it takes to read the data is the bottleneck, so hashing before sending would be more costly (for a 1:1 copy).

We are considering the 1:many case for future development, but there are tools (like BitTorrent) that already excel at this.


How does this compare to bbcp? I always had trouble getting bbcp to work, especially when the hosts are behind a gateway.


How is this better than 0MQ (or ZMQ)?


My understanding is that ZMQ is a messaging/RPC mechanism, while WDT is aimed at large data transfers (though it also works well with lots of small data, as it's streaming).

Also, I'm not aware that ZMQ uses multiple streams.



