
Kraken, an Open Source Peer-to-Peer Docker Registry - kungfudoi
https://eng.uber.com/introducing-kraken/
======
leetrout
I’ve said this at other companies I’ve worked at — datacenter / enterprise
software distribution is underserved by P2P.

I wish more companies used BitTorrent internally for distribution of tools and
software. I feel like homogeneous environments, like those that benefit from
centralized config management such as Puppet, would probably see performance
gains from introducing P2P technologies.

This looks like a great contribution from Uber engineering and I look forward
to playing with it!

~~~
jzelinskie
This is partially why I built chihaya[0]. The idea is that with a powerful
middleware model for traditional BitTorrent software, devs can easily extend
the protocol for whatever they need internally. As a sibling comment mentions,
an older fork of Chihaya is one of the tools used at FB for distribution in
their orchestration system, Tupperware.

In practice, internal networks are such a mess that leveraging p2p is often
more trouble than it's worth.

[0]: [https://chihaya.io](https://chihaya.io)

~~~
dman
Could you elaborate on the issues with local networks?

------
stuff4ben
Having been on the hurting end of both a 500+TB globally distributed
Artifactory cluster and a relatively smaller 30TB Quay cluster, I for one
welcome this new contender! Scaling binary object stores globally is no small
feat. FWIW, JFrog, the company behind Artifactory, is way better than
CoreOS/RedHat from a vendor support perspective.

~~~
regnerba
O_O 500+TB Artifactory cluster! Hahahaha ours is like 30GB and just RPMs. Was
it much of a challenge to scale Artifactory to that level?

------
webmonkeyuk
I'm chuckling inside thinking about how many people will go and install/use
this versus how many people actually work at a scale where they need it.

~~~
farisjarrah
It makes sense to plan ahead for increased scale. If you are working for a
VC-backed company whose mission is to grow grow grow, scale scale scale, then
you can't exactly build for just the infrastructure you currently use. It's
perfectly acceptable to build out overbuilt infra, as long as your costs
aren't shooting you in the foot. You know what's worse than paying too much
for infrastructure? Losing money and clients because your infrastructure
breaks any time you get a real workload on it.

~~~
franciscop
But even worse is not being able to release because the system complexity has
shot through the roof. Plan (and test!) for 10x scale at a time, then optimize
to squeeze another 5-10x while you build the 1000x system.

------
hardwaresofton
Naively I would have thought a docker registry with on-disk storage managed by
ipfs would have been a low-effort way to meet this requirement.

Unfortunately you'd need _something_ to manage the pinning settings, but it
feels like a relatively small addition to some other registry that could be
hacked together in a small amount of time.

~~~
zcw100
You mean something like ipfs cluster?

[https://cluster.ipfs.io](https://cluster.ipfs.io)

~~~
hardwaresofton
I didn't know ipfs cluster existed -- thanks for the reference.

I was more thinking of a harbor[0] like cobbling of these technologies
together -- harbor combines a bunch of F/OSS tools into one powerful registry
solution. Here are a few:

- Distribution[1] for storing images

- Notary[2] for signing

- Clair[3] for static analysis

What I was thinking of was basically Distribution + ipfs cluster + a small
management daemon. The daemon is only there to tie the other pieces together
and present a unified interface; the bulk of the work can be done by the
other pieces.

[0]: [https://github.com/goharbor/harbor/wiki/Architecture-
Overvie...](https://github.com/goharbor/harbor/wiki/Architecture-Overview-of-
Harbor)

[1]:
[https://github.com/docker/distribution](https://github.com/docker/distribution)

[2]:
[https://github.com/theupdateframework/notary](https://github.com/theupdateframework/notary)

[3]: [https://github.com/coreos/clair](https://github.com/coreos/clair)
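A minimal sketch of what that management daemon's glue could look like, assuming (hypothetically) that blobs stored by Distribution are also added to IPFS and that we already know each blob's CID; the `/pins/{cid}` path is modeled loosely on ipfs-cluster's REST API and should be treated as an assumption, not a verified endpoint:

```python
import urllib.request

# Hypothetical glue for the "Distribution + ipfs cluster + daemon" idea:
# after the registry finishes writing a blob, ask an ipfs-cluster sidecar
# to pin the corresponding CID so cluster peers replicate it.

CLUSTER_API = "http://localhost:9094"  # assumed ipfs-cluster REST endpoint

def pin_url(cluster_api: str, cid: str) -> str:
    """Build the pin endpoint URL for a CID (path is an assumption)."""
    return f"{cluster_api}/pins/{cid}"

def pin_blob(cid: str) -> None:
    """Request that the cluster pin a freshly pushed blob."""
    req = urllib.request.Request(pin_url(CLUSTER_API, cid), method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()  # response body describes the pin status
```

The point is how thin the daemon stays: everything stateful lives in the registry and the cluster.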

------
zerotolerance
A P2P software supply chain had better have a content signing and key
distribution framework in place.

------
discordianfish
I'm intrigued by p2p distribution like this, but it makes me wonder whether
it's really more cost-effective to switch/route all these small torrent
packets instead of just using fileservers with 10G interfaces and maybe
tiered caching.

~~~
justinsaccount
fileservers don't send packets?

"... a test where a 3G Docker image with 2 layers is downloaded by 2600 hosts
concurrently (5200 blob downloads)"

they show this finishing in 20s, so that's 7,800 GB transferred in 20 seconds.
That works out to 390 GB/sec. A 10Gbps interface can transfer about 1GB/sec,
so you'd need 195 dual attached fileservers, or about 100 if they were running
at 40gbps.
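A quick sanity check of that arithmetic, using only the numbers quoted above:

```python
# Back-of-the-envelope check of the fileserver math.
image_gb = 3       # 3 GB Docker image
hosts = 2600       # concurrent downloaders
seconds = 20       # observed completion time

total_gb = image_gb * hosts        # 7800 GB moved in total
rate_gb_s = total_gb / seconds     # 390 GB/s aggregate throughput

# A 10 Gbps NIC moves roughly 1 GB/s, so a dual-attached server does ~2 GB/s.
dual_10g_servers = rate_gb_s / 2   # ~195 servers
forty_gbps_servers = rate_gb_s / 4 # ~100 servers at 40 Gbps (~4 GB/s each)
```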

~~~
jsight
> fileservers don't send packets?

This line is far funnier than it should be. It reminds me of the time that I
was told about how awful some architecture was because it would overload the
network.

Their improved architecture sent lots of smaller files over NFS instead.

That was novel.

~~~
thekhatribharat
Well, they probably meant network congestion. Network congestion depends on
both the time distribution and the link distribution of data, which differ
widely with the choice of network protocols and topologies (even if the total
data flowing through the network remains the same).

~~~
discordianfish
Yep. I'll admit I didn't crunch the numbers, but routing torrent traffic isn't
cheap. I haven't found a definitive resource, but AFAIK/IIRC torrent uses at
least an order of magnitude more packets per second for the same throughput as
HTTP.

~~~
justinsaccount
> torrent uses at least an order of magnitude more packets per second for the
> same throughput as HTTP.

This is total nonsense. This would mean that instead of using 1200+ byte
packets, bittorrent uses 120 byte packets to transfer data. This is easily
disproven by looking at the implementation of any client or just by looking at
the traffic and seeing that it uses 1200+ byte packets for sending chunks
around.
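To make the rebuttal concrete: at fixed throughput, packets per second scales inversely with payload size, so "10x the packets" would imply a tenth of the payload per packet:

```python
# If BitTorrent really needed 10x the packets of HTTP at the same
# throughput, its average data packet would have to carry 1/10th the
# payload of a typical ~1200-byte HTTP data packet.
http_payload = 1200                              # bytes per data packet
claimed_ratio = 10                               # claimed pps multiplier

implied_torrent_payload = http_payload / claimed_ratio   # 120 bytes

def packets_per_second(throughput_bps, payload_bytes):
    """Packets needed per second at a given throughput and payload size."""
    return throughput_bps / (payload_bytes * 8)
```

Captures of real BitTorrent traffic show full-size data packets, so the claimed ratio fails this check.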

------
instaheat
Kraken, not to be confused with the Bitcoin and Cryptocurrency exchange.

~~~
oregontechninja
Also not to be confused with the GUI git client "gitkraken"

~~~
barbecue_sauce
Or the php framework. Or kraken.js. Or the responsive css boilerplate. Or the
API gateway (KrakenD). Or the Joomla theme. Or the ransomware-as-a-service
(RaaS) affiliate program.

~~~
franciscop
I just had a talk last week with my coworker about how many tech projects seem
to be called Kraken and BAM, here's another.

~~~
WrtCdEvrydy
Same as projects called Phoenix.

------
warp_factor
Cool project.

I'm always wondering, though, whether those optimizations are really needed or
whether they were created by engineers who want to have fun and do premature
optimizations. Ever wondered why a "WebApp" like Uber needs so many engineers
and new projects?

~~~
barnabee
I equally wonder where we'd be if no-one ever indulged in a premature
optimisation…

~~~
warp_factor
Premature optimization is super fun. I did it a lot, and I would advise
engineers to do it if they can.

But when I look at companies like Airbnb and Uber, for example, I cannot
really understand why they require that type of tool. Yes they are big, even
very big, but nowhere near the size of a Facebook or a Google. Most of those
projects seem to be engineers having fun at work and marketing themselves by
publishing cool blog posts.

------
jvassev
Wouldn't IPFS be a more natural fit - and also trivial to implement? Not HN
material, but here it is anyway:
[https://github.com/jvassev/image2ipfs](https://github.com/jvassev/image2ipfs).

I guess what IPFS is missing and Kraken has is "pluggable storage options,
and instead of managing data blobs, Kraken plugs into reliable blob storage
options like S3, HDFS"

------
jnsaff2
Reminds me of twitter murder:
[https://github.com/lg/murder](https://github.com/lg/murder)

------
marenkay
Why would anyone want to use this over Dragonfly, which seems to have way more
battle testing? Is this another case of NIH?

~~~
ShakataGaNai
The short answer seems to be "timing".

#1 - Dragonfly started in November 2017, whereas "Kraken was first deployed at
Uber in early 2018", which suggests Uber was probably working on this well
before Dragonfly even started.

#2 - Dragonfly is still listed as a sandbox CNCF project, whereas Uber has
been running this in production for roughly a year, it seems.

#3 - Dragonfly was _just_ refactored completely from Java to Go. So it's kind
of hard to say it's "battle tested" at all at this point in time. It's
basically an entirely new project as of two days ago.

~~~
marenkay
[https://www.alibabacloud.com/blog/behind-alibabas-
double-11-...](https://www.alibabacloud.com/blog/behind-alibabas-
double-11-mysterious-dragonfly-technology-%C2%AEc-pb-grade-large-file-
distribution-system_594074) indicates Dragonfly has existed much longer and
was used in scenarios much larger than what Uber presented.

What got into the CNCF may be a newer iteration based on what Alibaba has been
running, but that means it should at least hold a candle to what is described
in the post. With that said, one can surmise the Java version is the one they
used in production.

Apart from that, P2P image distribution is probably limited to internal uses
that are not critical. The post gives no indication of how much of a
reliability difference this would make over, say, an internal CDN.

~~~
monoian
Kraken's readme includes a comparison with Dragonfly:
[https://github.com/uber/kraken#comparison-with-other-
project...](https://github.com/uber/kraken#comparison-with-other-projects)

------
justboxing
Gonna be really hard for devs to find tutorials[1] or other related things on
this, given that the name ('Kraken') already belongs to a well-established
crypto exchange.

Same is the case with 'Discourse'. It's almost impossible to find relevant
content cos any search for 'discourse + keyword' invariably shows content
related to "online discourse" (conversation).

[1]
[https://www.google.com/search?q=kraken+docs](https://www.google.com/search?q=kraken+docs)

~~~
flurdy
My initial thought was that Kraken had released a Docker registry which was
odd.

But now I realise it is a separate project by Uber that is called Kraken.

But as it is a project I am okay with the name clash. It kind of suits the
distributed nature of its arms.

I know there will be some confusion but hopefully people's google-fu and
search terms will be able to distinguish between Kraken the cryptocurrency
company, Kraken the docker registry tool, Kraken the mythical mega octopus,
and more.

------
ianstallings
This has to be the most insane shit I've seen in quite a while.

