Hacker News
Dragonfly: Alibaba P2P file distribution system (github.com/alibaba)
269 points by eb0la on June 13, 2018 | 56 comments

Container image distribution seems to be one of the primary problems this tackles:

"DevOps ... brings a lot of challenges: the efficiency of image distribution, especially when you have a lot of applications and require image distribution at the same time. Dragonfly works extremely well with both Docker and Pouch, and actually we are compatible with any other container technologies without any modifications of container engine."

FWIW, this was similar to a problem I tackled for the Golang Gopher Gala hackathon in 2015: a custom BitTorrent-based Docker image registry POC.


Interestingly my problem statement was somewhat similar:

"Large scale deploys are going to choke your docker registry. Imagine pulling/deploying a 800mb base image (for example the official perl image) across 200 machines in one go. That's 800*200 = 160GB [EDIT: Correction thanks to kingbirdy] of data that's going to be deployed and it'll definitely choke your private docker registry (and will take a while pulling in from the public image)."

Devs should be aware that BitTorrent has, for some time now, had a DHT extension that allows "mutable" slots tied to a public-key address.

So if you own the private key, you can write the payload and just share your public-key address with the people you want to share the payload with. For instance, you can write a traditional immutable torrent manifest into the payload, which in essence gives you a public-key-crypto-based update system.

For a lot of cases I think it's a better approach than what IPFS and DAT provide, because you don't care about it being a global address. All you want is to share with a group of people, in a more p2p social/organic way.

I was playing with it once, using the libtorrent library and the main BitTorrent DHT, and it was a very nice experience. It finds the payload pretty fast considering it's a DHT, and you are working in a purely p2p fashion.

The only single point of failure here is the DHT bootstrap peer.

I'm planning to use this feature to distribute binary images to clients that have my public key.

Very interesting. Forgive my ignorance, but you mean it's kind of like mutable torrents that only the uploader can modify?

Does a client just search the DHT for the public key? I thought torrent clients searched for the hash of the files.

If it's searching for the public key then how does one person upload multiple different torrents, or do they create a new public key for each torrent? How does a client know which is the latest version if it has been updated multiple times?

Are there any example projects using this?

> Are there any example projects using this?

The only project I know is gittorrent (https://blog.printf.net/articles/2015/05/29/announcing-gitto...), but it hasn't gone anywhere.

@namibj has made a more illuminating comment with links to the libtorrent library as well as the BEP that describes in detail how the DHT is supposed to work.

> but you mean it's kind of like mutable torrents that only the uploader can modify?

Yes, your public key (hash) is the DHT key that identifies the payload, and only the owner of the private key can modify the content in that particular slot.

That's why it's cool: you can have a p2p system that relies on trust between the parties, unlike the traditional torrent system.

Also, I'm not so sure, but centralized trackers seem to be discouraged lately, and I guess magnet links must use something like a DHT to work the way they do.

> I thought torrent clients searched for the hash of the files.

You are correct, but DHT storage is a separate BEP and is more of a niche feature. Still, it's there, and at least when I tried it, it was working great.

> If it's searching for the public key then how does one person upload multiple different torrents, or do they create a new public key for each torrent? How does a client know which is the latest version if it has been updated multiple times?

The rules are:

You can create any slot you want; it's just a matter of generating the public/private key pair you want to use. (This would even allow you to use a forward-secrecy algorithm if you need one.)

The byte payload/value must be small, so you should use it to provide a manifest of something or to point to something else. But let's not forget that Git just looks at the HEAD record, with only a hash to go on from there. The idea is to point to something else that can be an immutable resource, like a traditional torrent.

So you can point to a torrent, download it, and have a small list of anything you like working as a catalog, and go from there. Anyway, if you play the indirection game right, there's no limit to what you can do.

If you want the payload to be there you need to keep writing to it from time to time (the same value if you want), or it will expire and other peers will not be able to locate it anymore.
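BEP 44 mutable items work roughly as described here, and they also answer the earlier questions. The DHT key ("target") is the SHA-1 of the owner's ed25519 public key plus an optional salt, so one key can own several slots by varying the salt, and every write carries a sequence number, so readers keep the reply with the highest `seq`. The owner signs a small bencoded buffer built from the salt, `seq`, and value. The sketch below builds those pieces with the standard library only; the ed25519 signing and verification steps are left out because they need a third-party library, and the helper names are hypothetical.

```python
import hashlib

MAX_BENCODED_VALUE = 1000  # BEP 44 limit on the bencoded "v" field, in bytes


def bencode(obj) -> bytes:
    """Minimal bencoder for ints, byte strings, str, lists, and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode()
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        keys = sorted(obj, key=lambda k: k.encode() if isinstance(k, str) else k)
        return b"d" + b"".join(bencode(k) + bencode(obj[k]) for k in keys) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")


def target_id(public_key: bytes, salt: bytes = b"") -> bytes:
    """DHT key of a mutable item: SHA-1 over the public key followed by the salt."""
    return hashlib.sha1(public_key + salt).digest()


def signable_bytes(value, seq: int, salt: bytes = b"") -> bytes:
    """Buffer the slot owner signs with the ed25519 private key (signing not shown)."""
    v = bencode(value)
    if len(v) > MAX_BENCODED_VALUE:
        raise ValueError("value too large for a BEP 44 item; store a pointer instead")
    prefix = b"4:salt" + bencode(salt) if salt else b""
    return prefix + b"3:seqi%de1:v" % seq + v


def latest(responses):
    """Pick the newest of several (seq, value) replies: the highest seq wins.

    A real client must verify each reply's signature before trusting it."""
    return max(responses, key=lambda r: r[0])


print(signable_bytes(b"Hello World!", 1, b"foobar"))
# b'4:salt6:foobar3:seqi1e1:v12:Hello World!'
```

Re-publishing the same value with the same `seq` is what keeps a slot alive; bumping `seq` is what tells readers it has been updated.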

What would I do? I would use the payload to point to something else, like some torrent in the classic BitTorrent network (you can use just a magnet link), or expose the whole torrent header. You can also point to some HTTP resource or whatever.
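Because the slot only holds a small value, the "point to a torrent" indirection can be as simple as storing a magnet URI (the `xt=urn:btih:` form from BEP 9) that names an ordinary, immutable torrent by its 40-character hex infohash. A hypothetical sketch, assuming BEP 44's 1000-byte limit on the bencoded value; the function names are illustrative only:

```python
import re
from urllib.parse import quote

BTIH = re.compile(r"[0-9a-fA-F]{40}")  # hex-encoded SHA-1 infohash


def magnet_pointer(infohash_hex: str, name: str) -> str:
    """Build a v1 magnet URI to store as the payload of a mutable DHT slot."""
    if not BTIH.fullmatch(infohash_hex):
        raise ValueError("expected a 40-character hex SHA-1 infohash")
    return f"magnet:?xt=urn:btih:{infohash_hex.lower()}&dn={quote(name)}"


def fits_in_slot(payload: str, limit: int = 1000) -> bool:
    """True if the payload, bencoded as a byte string, stays within the limit."""
    raw = payload.encode()
    return len(b"%d:" % len(raw)) + len(raw) <= limit


uri = magnet_pointer("C12FE1C06BBA254A9DC9F519B335AA7C1367A88A", "my image.tar")
print(uri)
print(fits_in_slot(uri))
```

The client resolves the slot, then fetches the referenced torrent through the normal DHT/peer machinery, so the mutable part stays tiny and everything bulky remains content-addressed.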

Need something more? How about pointing to a torrent that downloads a bootstrap program, which starts an RPC service over TCP, or over a simpler HTTP interface, and then doing something else from there?

I was thinking about how I could use this to create updates with diff and patch: given that a public-key scheme lets you create a trust relationship between parties, you could patch the binary with something coming from the 'mothership'.

> Are there any example projects using this?

I don't know of any, but in my case I was playing with libtorrent's implementation of the DHT. Also, as far as I know, it's this kind of property of the DHT that allows projects like IPFS to exist.

The cool thing about using the torrent implementation is that the main DHT already has a lot of nodes, so you can find things pretty fast.

Is there anything I can play with for mutable torrents? I thought it was still only in research paper form.

libtorrent has support for it [0], as far as I know. The "research paper" you probably mean [1] is the torrent equivalent of an RFC. AFAIK at least Python uses something very similar (PEPs).

There is apparently a Node.js implementation [2] of something that can publish a given torrent to a mutable address, and also retrieve a mutable torrent from a given address. I do not know how well this is implemented in the usual clients, but if you want it in your Docker setup, you might want to talk to libtorrent directly; implementing BEP 46 yourself should not be hard with what the library has to offer.

A benefit could be that, depending on how you handle it, you might be able to store the tarballs that Docker images seem to consist of in their unpacked form, and just keep some metadata about what the header(s) of each tarball were, along with some file offsets. This way you would be relieved of the unnecessary storage burden, and could possibly use many more of your servers to seed at least part of the images, e.g. only the parts that are not mutated while the software is running: download once, unpack, and only offer to seed the files/pieces that have not been modified in the meantime, without trying to re-download the "broken" data.

[0]: https://www.libtorrent.org/dht_store.html

[1]: http://www.bittorrent.org/beps/bep_0046.html

[2]: https://github.com/lmatteis/dmt
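The tarball bookkeeping idea can be sketched with Python's standard `tarfile` module: record, for each regular file in a layer tarball, where its header and data start, so that a byte range of the original tarball can later be mapped back to an unpacked file on disk. This is a hypothetical sketch of the idea only; `build_tar`, `tar_layout`, and `locate` are illustrative names, and it relies on CPython's `TarInfo.offset` and `TarInfo.offset_data` attributes.

```python
import io
import tarfile
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, int, int, int]  # (name, header_offset, data_offset, size)


def build_tar(files: Dict[str, bytes]) -> bytes:
    """Create an in-memory ustar archive, standing in for an image layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def tar_layout(tar_bytes: bytes) -> List[Entry]:
    """Record where each regular file's header and data sit in the archive."""
    layout = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        for member in tf:
            if member.isfile():
                layout.append((member.name, member.offset, member.offset_data, member.size))
    return layout


def locate(layout: List[Entry], pos: int) -> Optional[Tuple[str, int]]:
    """Map a byte position in the tarball to (file name, offset within that file)."""
    for name, _header, data_off, size in layout:
        if data_off <= pos < data_off + size:
            return name, pos - data_off
    return None  # header, padding, or end-of-archive bytes


layout = tar_layout(build_tar({"a.txt": b"hello", "b.txt": b"x" * 600}))
print(layout)  # [('a.txt', 0, 512, 5), ('b.txt', 1024, 1536, 600)]
print(locate(layout, 514))  # ('a.txt', 2)
```

A seeder following this approach would also keep the raw header and padding bytes (tiny next to the file data), so it could regenerate any piece of the original tarball byte-for-byte from the unpacked files.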

It would be 160GB, not 1.6TB

Thanks, fixed.

Ok, this is huge.

>At Alibaba, the system transfers 2 billion times and distributes 3.4PB of data every month, it has become one of the most important piece of infrastructure at Alibaba. The reliability is up to 99.9999%.

I took a look at their repo, and it turns out there is surprisingly a lot of good stuff in it that never gets much spotlight or attention.

It's a cool project, but I'm not sure how huge that is.

If you have 5000 nodes running this, that could be as small as distributing 23.8 gigs/day to each node over the course of the month.

3.4PB/month is only 111,000 GB/day. That's not particularly huge in this day and age, except to consumers.
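Both back-of-the-envelope figures depend on the unit convention and the assumed month length. A quick check, assuming a 30-day month and the hypothetical 5000-node fleet from the comment above:

```python
# Back-of-the-envelope check of the "3.4PB per month" figure.
PB_IN_GB = 1_000_000     # decimal units: 1 PB = 10**6 GB
PIB_IN_GIB = 1024 ** 2   # binary units: 1 PiB = 2**20 GiB

MONTHLY_PB = 3.4
DAYS = 30                # assumed month length
NODES = 5000             # hypothetical fleet size from the comment

per_day_gb = MONTHLY_PB * PB_IN_GB / DAYS
per_node_day_gb = per_day_gb / NODES
per_node_day_gib = MONTHLY_PB * PIB_IN_GIB / DAYS / NODES

print(f"{per_day_gb:,.0f} GB/day in total")                 # 113,333 GB/day in total
print(f"{per_node_day_gb:.1f} GB/day per node (decimal)")   # 22.7 GB/day per node (decimal)
print(f"{per_node_day_gib:.1f} GiB/day per node (binary)")  # 23.8 GiB/day per node (binary)
```

The 23.8 figure matches binary units; the roughly 111,000 GB/day figure corresponds to a slightly longer average month. Either way the order of magnitude, and the point being made, is the same.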

I imagine this traffic is fairly bursty as well.

Also keep in mind that this is a data distribution system. In the case of large data pushes, new builds, etc., it's important that all peers get the new data in a timely and reliable manner.

I suspect you're underestimating the problem this project solves by focusing on a mostly irrelevant data rate stat.

> The reliability is up to 99.9999%

"up to"... Certainly looks like a wrong wording.

I read it in the "we have now gotten it up to X" sense, such as "I'm now up to my 5th beer"

Except reliability is a share of a total, not a running count of integers like beers.

If I write a shell script with an scp line right now, I can tell you it has 100% reliability. "Up to X% reliability" means "in a very nice and controlled environment, with our own devs attacking every production problem, trying to get to 100% got us up to X%".

Except that is clearly not what they were trying to communicate.

That's the way to read it.

Quay has supported Docker image pulls over BitTorrent for a couple years now https://coreos.com/blog/torrent-pulls.

Docker themselves have discussed making the official registry extensible enough to support BitTorrent pulls, but I don't know if anything ever happened there.

Facebook has been using BitTorrent for deploys for something like 9 years now. They configured the tracker to prefer sharing peers with longer matching subnet prefixes, to keep bandwidth off the backbone as much as possible.
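The comment doesn't describe Facebook's tracker internals, so this is only a hypothetical sketch of the idea: rank candidate peers by how many leading address bits they share with the requesting peer, so that same-subnet peers are handed out first and traffic stays off the backbone. The function names and the 50-peer limit are assumptions.

```python
import ipaddress
from typing import List


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IP addresses share (0 if the versions differ)."""
    x, y = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if x.version != y.version:
        return 0
    return x.max_prefixlen - (int(x) ^ int(y)).bit_length()


def rank_peers(requester: str, peers: List[str], limit: int = 50) -> List[str]:
    """Return up to `limit` peers, longest shared prefix with the requester first."""
    ranked = sorted(peers, key=lambda p: common_prefix_len(requester, p), reverse=True)
    return ranked[:limit]


print(rank_peers("10.1.2.3", ["10.9.0.1", "10.1.2.200", "192.168.0.1", "10.1.7.7"]))
# ['10.1.2.200', '10.1.7.7', '10.9.0.1', '192.168.0.1']
```

A real tracker would probably mix in some randomness so that every peer doesn't pick the same few neighbors, but the prefix length is the core of the locality preference.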

Is there a paper or talk you could link to that describes their system?

Justin linked the blog post[0] which is probably the best written description. The short of it is that when you upload layers to Quay, it stream calculates the BitTorrent pieces. Private layers are given unique swarms isolated by namespace and peer discovery is protected by a tracker[1] that contains middleware validating JWTs passed in the announce URL of the torrent. A custom client[2] can be used to simplify downloading and importing of images into the local docker CAS.
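In BitTorrent v1 metainfo, the `pieces` field is the concatenation of 20-byte SHA-1 digests, one per fixed-size piece, so the piece hashes can be computed incrementally while an upload streams in, without buffering the whole layer. The sketch below illustrates that idea; it is not Quay's code, and the 256 KiB piece length and 64 KiB read size are assumptions.

```python
import hashlib
import io
from typing import BinaryIO, Iterator

PIECE_LENGTH = 256 * 1024  # assumed piece size; real torrents choose this per upload
READ_SIZE = 64 * 1024      # assumed read size from the incoming stream


def stream_piece_hashes(stream: BinaryIO, piece_length: int = PIECE_LENGTH) -> Iterator[bytes]:
    """Yield the SHA-1 digest of each piece as soon as enough bytes have arrived."""
    buf = bytearray()
    while True:
        chunk = stream.read(READ_SIZE)
        if not chunk:
            break
        buf.extend(chunk)
        while len(buf) >= piece_length:
            yield hashlib.sha1(buf[:piece_length]).digest()
            del buf[:piece_length]
    if buf:
        # The final piece may be shorter than piece_length.
        yield hashlib.sha1(buf).digest()


data = b"x" * (2 * PIECE_LENGTH + 10)
pieces = b"".join(stream_piece_hashes(io.BytesIO(data)))
print(len(pieces) // 20)  # 3: two full pieces and one short one
```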

Honestly, most organizations don't have networks sophisticated enough for the benefits to outweigh the complexity of p2p orchestration. This is why it's popular at Alibaba, Facebook, and Twitter, while most people are still just using the OCI distribution protocol[3].

Feel free to contact me (Keybase is in my profile) if interested. I'd love to get more people on the path to p2p, but it's often a solution looking for a problem.

[0]: https://coreos.com/blog/torrent-pulls

[1]: http://chihaya.io

[2]: https://github.com/coreos/quayctl

[3]: https://github.com/opencontainers/distribution-spec

From: https://github.com/alibaba/Dragonfly/blob/master/src/README....

- supernode(Java)

- dfdaemon(GoLang)

- getter(Python)

Interesting distribution of languages in what seems to be a somewhat self-contained project.

Oddly, https://github.com/alibaba/Dragonfly/tree/master/src only contains the getter and the supernode at the moment

Looks like dfdaemon is at the root: https://github.com/alibaba/Dragonfly/tree/master/dfdaemon

There is a plan to refactor this project into one language (GoLang). The project's directory structure has already been reorganized to match GoLang project conventions, and the entire CI process has been built. The next step is to migrate the 'getter' and 'supernode' from '/src' to the root and rewrite them in GoLang.

This is very similar to what BitTorrent does or am I missing something?

It isn't much more P2P than any DFS of the previous decade.

The P2P buzzword, though, is freaking everywhere here. There are P2P banks and P2P-brand sausages.

Not quite, this is not a "distributed file system" this is a "file distribution system".

This is more along the lines of https://github.com/lg/murder

How does this compare to IPFS or BitTorrent?

or Syncthing

Sounds like Twitter's Murder tool from 2010:


Looks like it's not maintained anymore.

Does anyone know whether it does NAT hole punching? I'm interested in such a tool for deploying to remote machines.

If you only need such a tool: other comments on this submission linked ways to use BitTorrent with Docker, and µTP [0] seems to be reasonably good at punching through the good old style of NAT, where ports are sequential and on the same IP, provided something reachable by both sides can coordinate. It also uses your bandwidth gently, in the sense that it plays reasonably well even if you don't have fq-codel or similar in use on the router. On somewhat nice networks it can be pretty gentle on other users of the network without wasting any of its capacity. Do consider QoS, though: it is preferable to send other, important traffic first, as µTP is good at backing off in those cases. The latency in backing off is just a little too high for it to be stealthy towards concurrent TCP connections. Packet loss is rare, but lag spikes are still a nuisance.

[0]: https://en.wikipedia.org/wiki/Micro_Transport_Protocol
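µTP's polite backing off comes from delay-based congestion control in the LEDBAT family (standardized as RFC 6817): the sender grows its window while measured queuing delay is below a target and shrinks it once the queue builds up. Below is a simplified, hypothetical sketch of the RFC 6817 window update; the segment size is an assumption, and the RFC's cap on window growth and its delay-filtering details are omitted, so this is not libutp's code.

```python
TARGET_MS = 100.0   # RFC 6817: TARGET must be 100 ms or less
GAIN = 1.0          # RFC 6817: GAIN must be 1 or less
MSS = 1452          # assumed segment size in bytes
MIN_CWND = 2 * MSS  # never shrink below two segments


def ledbat_update(cwnd: float, base_delay_ms: float, current_delay_ms: float, bytes_acked: int) -> float:
    """One simplified LEDBAT congestion-window update after an ACK."""
    queuing_delay = current_delay_ms - base_delay_ms  # delay above the path's minimum
    off_target = (TARGET_MS - queuing_delay) / TARGET_MS
    cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
    return max(cwnd, MIN_CWND)


cwnd = 10 * MSS
print(ledbat_update(cwnd, 20, 20, MSS) > cwnd)   # True: empty queue, window grows
print(ledbat_update(cwnd, 20, 270, MSS) < cwnd)  # True: 250 ms of queuing, window shrinks
```

Because the reaction is driven by measured delay, it only kicks in after the queue has already grown somewhat, which fits the observation that backing off is slightly too slow to be invisible to competing TCP flows.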

Is Dragonfly a server or a client? Because it is compared with wget, which is a client. Am I missing something?

It's P2P, so I would wager it is both.

It duplicates most features of libtorrent.org, but requires a server. Therefore, a comparison to wget misses the 17-year-old BitTorrent protocol.

Exactly. They should be comparing their P2P protocol to whatever is currently the best of the open-source P2P libraries.

Also a bit confusing, as Dragonfly BSD is known for its custom filesystem, HAMMER:


If you get confused between a filesystem called HAMMER and a file distribution system called Firefly, I don't know what to tell you.

They are different things with different names.

Well, this one is called Dragonfly, not Firefly, so the naming overlap is relevant.

Do you also get confused by Dragon Naturally Speaking and Firefox and their similarity to those 2 names?

These are literally two things with the name Dragonfly. I don’t particularly care about whether the shared name is confusing here but it is disingenuous to pretend that identical names are merely similar.

Dragonfly and Firefly are not the same word.

One of us is clearly missing something here. The two things I’m talking about are Dragonfly BSD and Alibaba’s Dragonfly file distribution system. Neither of these are named Firefly.

Do you also realize that calling the browser Firefox was a rename from Firebird due to confusion with an unrelated RDBMS system?

And Firebird was a rename to avoid infringing on the name Phoenix because the owner of Phoenix BIOS was threatening a lawsuit.

And https://firefox.org/ is still completely unrelated to the browser.

Jesus, is it so hard to google at least once before naming a project? Just recently there was that Sonar thing, and now this. What's next? A Windows editor called Linux?

To be fair, I have to admit that Dragonfly BSD doesn't show up on the first results page when you search for "dragonfly". What does show up is the malware of the same name.

Side note: this also shows how simplistic Google search really is. There is no way to search for "Dragonfly /computers/" as opposed to "Dragonfly /nature/", with the terms in slashes denoting a concept or domain instead of a syntactic element.


dragonfly +nature

dragonfly +computer

Though I don't blame you for not knowing: out of 4-5 operator cheat sheets and guides, I only see mention of "-" to exclude terms.

"+" is for synonyms. That might lead in the right direction but it's not the same.

FWIW, they can just call it Alibaba File Distribution System (AliFDS), which is:

- Easier to recognize

- More branding influence (see the g... stuff from Google)

- Easier to search

BTW: A Windows editor called Linux: not a bad idea!

I used to think that too, but at the rate JS developers are creating packages, it won't take very long before all the decent, easy-to-remember names are used up. Surely you don't want Alibaba to name a product in Chinese-sounding English that no one can spell or remember correctly?

(I still can't spell many Chinese companies' English names.)


The developer has since changed the name from Samba to Battlecry....

Google manipulates its results in a variety of ways. The end result is that if you and your networked peers aren't already searching for things related to Dragonfly BSD, then you're less likely to find it, because Google will be 'helpfully' biasing your search towards other stuff.
