How we moved to Google Cloud using Consul and ZeroTier with zero downtime (channable.com)
105 points by rkrzr on Oct 25, 2017 | 21 comments

This is very cool. I'm wondering if decentralized overlay 'mesh' networks will become more prevalent in the future. Overlays have obvious benefits for multi-cloud setups if latency isn't an issue for the services you run through the network. I can imagine this becoming a more popular technique in the future as cloud instances become cheaper and more people are willing to make the performance trade-off for convenience. Additionally it can be a great pattern to fight vendor lock-in.

I run a similar concoction for my 'home network', which consists of various mobile devices like my Android phone and some cloud servers, but instead of ZeroTier I use CJDNS in combination with Consul. The code for it is on GitHub in the vdloo/raptiformica repo. I ran into various issues with the difference in latency between nodes. I think most people run Consul in a very homogeneous environment (like in one datacenter), but perhaps the differences when using it cross-cloud are not enough to cause problems. I'm wondering if there were some Consul settings that the author had to tweak (and how) for stability and if there were any unexpected issues.

One thing that caused me problems with Consul on overlay networks was an ARP cache overflow. DigitalOcean also ran into that running Consul at scale in their DC, if I recall correctly (http://youtu.be/LUgE-sM5L4A). I noticed that if I put enough Docker containers on one host (like 50 - 100) in an overlay network and tried to run Consul on top of that, things would start to overflow, presumably because of all the network abstraction and redundancy. I'm wondering how many machines the author had in one Consul cluster and if they tested how many nodes this setup could scale to.

This is a fantastic write-up. Also I never knew this about S3:

"While it does provide read-after-write consistency for new files, it only provides eventual consistency for overwrite PUTs and for DELETEs."

Glad you enjoyed it! I'd be happy to answer any questions about our experience.

> A technical problem was the lack of true strong consistency for S3

What use case made this cause an issue for you guys? My go-to resolution here is just to use immutable files (sha256'd filenames typically), though that does entail storing separate keys for every related object
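For anyone unfamiliar with the immutable-files approach mentioned above, here is a minimal sketch in Python. The helper name `content_key` and the optional suffix are hypothetical, chosen only for illustration; the idea is just that the object key is derived from the content's SHA-256, so a key is never overwritten and the eventual-consistency window for overwrite PUTs does not come into play.

```python
import hashlib


def content_key(data: bytes, suffix: str = "") -> str:
    """Derive an object key from the SHA-256 digest of the content.

    `content_key` and `suffix` are illustrative names, not part of any
    S3 client library.
    """
    return hashlib.sha256(data).hexdigest() + suffix


# Two versions of the "same" file get two different keys, so an update
# is always a brand-new object rather than an overwrite.
key_v1 = content_key(b"report, first draft", ".txt")
key_v2 = content_key(b"report, second draft", ".txt")
print(key_v1)
print(key_v2)
```

The cost is the one noted above: something else (a database row, a manifest) has to record which key is the current version of each logical file.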

We host a lot of files and we do need permalinks to them. We also need updates to be visible immediately for everybody.

That's fair. Anything you don't need to resolve as redirects/file serving through an app is a big win

Have you ever tested S3's eventual consistency? Just curious.

There's no explicit SLA, but I've heard rumor that they start getting paged at ~4 hours staleness, though that might not be actually-true. I would think it's pretty quick in the average case.

Random question, but did y’all purchase a different license from ZeroTier the company, or is the GPL on ZT somehow not relevant for you given how it was deployed or used? (And if so, can you elaborate?)

It also only provides that read-after-write consistency for new files if you didn't already ask for that new file before writing it. Otherwise you could get a stale "no object" response.

Sorry, "S3" as in: the whole AWS (Amazon cloud) system? o.0

It's awesome... Just saying. Despite: https://jacquesmattheij.com/the-web-in-2050

S3 is just the simple storage service.

> "With GCP, the pricing is very straightforward: you pay a certain base price per instance per month, and you get a discount on that price of up to 80% for sustained use. In practice: If you run an instance 24/7, you end up paying less than half of the “regular” price."

I wish either of those two statements were true! I hope you didn't actually base your pricing decisions on what you wrote in the article. The sustained use discount is a total of 30% off (not "more than 50%"), if you use an instance 24/7 for a calendar month. Also, the 80% discount off the full price is only for pre-emptible instances, which are the ones that may not be available and are always killed within 24 hours.
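To make the 30% figure concrete, here is a rough sketch of the arithmetic in Python. The tier schedule is an assumption based on my recollection of the published sustained use schedule at the time (each successive quarter of the month billed at 100%, 80%, 60% and 40% of the base rate); it is not taken from the article, and `TIERS` and `effective_rate` are made-up names.

```python
# Assumed schedule: (share of the month, multiplier on the base rate).
# These numbers are an assumption, not something stated in the article.
TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]


def effective_rate(usage_fraction: float) -> float:
    """Return the fraction of the base price paid for the hours used.

    usage_fraction is the share of the calendar month the instance ran,
    e.g. 1.0 for 24/7 and 0.5 for half of the month.
    """
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    remaining = usage_fraction
    billed = 0.0
    for width, multiplier in TIERS:
        used = min(remaining, width)  # hours that fall into this tier
        billed += used * multiplier
        remaining -= used
        if remaining <= 0:
            break
    return billed / usage_fraction


full_month = effective_rate(1.0)
print(f"24/7 for a month: pay about {full_month:.0%} of the base price")
```

Under that assumed schedule, running 24/7 averages out to 0.25 * (1.0 + 0.8 + 0.6 + 0.4) = 0.7 of the base price, which is the 30% total discount, well short of "less than half".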

I think you are right. There is nothing "transparent" about pricing in the various clouds. I am of the mind that one should test for min. 30 days to get a good feel for what it will cost. There are just too many variables and little minefields (ingress, egress, network activity, etc.)

Founder of ZeroTier here -- nice writeup!

Huge fan. I've always been curious, but never reached out: is there any way for people to host their own controller for free for non-commercial endeavours? I'd love to get more hands-on experience with how the software I love so much actually works.

Yes, you just have to control it via its JSON API. The only real non-free thing we have is the control panel GUI.

A very satisfied user of Zerotier myself. Big thanks!

We are a distributed team of three people, with a bunch of machines in various inhospitable environments. We needed to connect to each other without a full-on VPN, due to the performance issues. We also needed to access these random machines we have (some of them are just ARM boards in remote locations with unreliable wifi).

Zerotier came as a life saver. The other folks are not very tech savvy (in the sense of being able to configure VPNs, manage bastion hosts, etc). When I enabled ZT for them, one of them commented: "this is science fiction. zero to connected in under a minute, without any mumbo-jumbo".

You (and your team) have done a great job at building this!

Never heard of you before this

Going to give it a spin. Seems a lot better than OpenVPN for client-to-office communication. Is this a reasonable use case for it, or am I misusing ZeroTier?

That's a perfect use for it, actually.

I personally would try WireGuard instead of ZeroTier for those hybrid cloud topologies.

Of course it's a kernel module, so you have to build for target host and load.

Anyone tried this?

Been using it for internal apps (to mesh kubernetes nodes in virtual deployments) successfully for some months.
