
Why is Multi-Cloud a Hard Problem? - atopia
https://www.planetscale.com/blog/behind-the-scenes-at-planetscale-engineering-why-is-multi-cloud-a-hard-problem
======
hn_throwaway_99
> Why use true multi-cloud clusters?

> Two reasons: Disaster recovery and freedom from vendor lock in.

In my experience, those two reasons are almost never sufficient to warrant a
multi-cloud solution. The costs for multi-cloud are enormous. Another
commenter mentioned egress costs, but there are numerous other costs:

1. You've added a _lot_ of complexity on top of existing cloud solutions.
That complexity can make things fail in unique ways that may make some of your
cherished reliability benefits moot.

2. You are always coding to the "lowest common denominator" of any cloud
service, meaning you're missing out on a ton of productivity by forgoing
useful services.

I'm just curious whether anyone with experience actually using multi-cloud can
comment: was it worth it?

~~~
dkhenry
I was one of the individuals who helped put this together, and hopefully I can
answer your question, specifically about the complexity and productivity of
going multi-cloud.

Here is what we found. Previously, when people talked about going to the
cloud, the state of the art was to target a specific cloud provider: you put
your software in immutable AMIs and use ASGs and ELBs, along with S3 and EBS,
to build really robust systems. You instrument everything with CloudWatch and
make sure everything is locked down with IAM and security groups.

What we have seen lately is that Kubernetes has changed all of that. Most
systems being designed today are very much provider agnostic, and the only
time you want to be locked into a specific technology is when the vendor-provided
solution doesn't really have an alternative in a truly vendor-agnostic
stack. Part of what this service does is take the last true bit of gravity for
a cloud provider and remove it: you can now run in both clouds just as easily
as if you were all in on one of them. There are some additional costs if you
are transferring all your data across the wire, but that is where the power of
Vitess's sharding comes in. You can run your service across two clouds while
minimizing the amount of cross-talk, until you want to migrate off.

Also, while this post makes a big deal about being multi-cloud, it also gives
you true multi-region databases. That's something that was previously only
available with Spanner or CosmosDB, both of which require you to target them
explicitly. PlanetScaleDB lets you use your existing MySQL-compatible
software.

~~~
rubyn00bie
What does this mean: "PlanetScaleDB lets you use your existing
MySQL-compatible software"?

I thought you all were offering Vitess, not a "custom" solution. Or is that
marketing speak?

~~~
derekperkins
They are providing a managed Vitess service, which allows you to use almost
any software that is compatible with MySQL.

------
9nGQluzmnq3M
One key missing factor: live replication across three clouds is not just a
technical problem, but a cost problem, because the egress costs will be
murderous.

~~~
elabajaba
Not everyone rips you off on egress fees the way Amazon/MS/Google do. Quite a
few of the 2nd-tier providers (e.g. Vultr, Linode, Digital Ocean, Upcloud)
offer $0.01/GB for public outbound bandwidth with a free allowance each month
(usually anywhere from 1-10+ TB/month/instance depending on what you
deployed). Some companies even waive fees for companies they're partnered with
(e.g. if you use Wasabi or Backblaze B2 instead of S3 you won't be charged
transfer fees to a number of cloud providers or Cloudflare, thanks to the
Bandwidth Alliance).
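To make the difference concrete, here is a small sketch of the monthly-bill math. The ~$0.09/GB big-three rate and the 1 TB free allowance are illustrative assumptions for the example; only the $0.01/GB second-tier rate comes from the comment above.

```python
# Illustrative comparison of monthly egress bills at different per-GB rates.
# Rates and allowances below are assumptions for the sketch, not quotes.

def egress_cost(gb_transferred, rate_per_gb, free_allowance_gb=0):
    """Monthly egress bill: only traffic beyond the free allowance is billed."""
    billable = max(0, gb_transferred - free_allowance_gb)
    return billable * rate_per_gb

monthly_gb = 10_000  # 10 TB of cross-cloud replication traffic

# Assumed ~$0.09/GB with no free tier for a big-three provider.
big_three = egress_cost(monthly_gb, rate_per_gb=0.09)
# $0.01/GB after an assumed 1 TB free allowance for a second-tier provider.
second_tier = egress_cost(monthly_gb, rate_per_gb=0.01, free_allowance_gb=1_000)

print(f"big three:   ${big_three:,.2f}")    # $900.00
print(f"second tier: ${second_tier:,.2f}")  # $90.00
```

At these assumed rates, the same replication traffic is an order of magnitude cheaper, which is why the egress line item dominates multi-cloud cost discussions.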

~~~
9nGQluzmnq3M
The article is specifically talking about their DB solution "across the three
major cloud providers" = Amazon/MS/Google.

~~~
dkhenry
It's not mentioned in the blog, but we are actively working with other
providers to bring them on board. Hopefully we will have Digital Ocean soon,
and I am hoping for Packet after them.

------
different_sort
The article was a good read, but I wanted to try to answer this from the
perspective of a fortune 100 enterprise (I work for one, in their cloud team).

We're starting a journey on Azure and AWS at once with limited financial
resources and limited talent (it's tough to hire cloud skills to work for us,
and our stack is so old it's not an easy transition for people who only know
that). Operating AWS and Azure requires different skill sets and different
approaches, and they're far from transferable. All the tools and techniques we
develop or acquire for managing AWS are not applicable to Azure and vice
versa, and because we're splitting our effort between the two, everything
takes twice as long.

I think the right way for a company like ours to approach this would be to go
"all in" on one, build expertise and deliver a lot of value back to the
business, then build out the second cloud to meet our BCP and cost-savings
goals.

------
k__
Funny thing is, if those multi-cloud proponents went 100% in on one provider,
things would go much more smoothly and they would have fewer reasons to go
multi-cloud in the end.

But yeah, when I look at the rate that some companies sunset their products, I
understand the fear a bit.

------
fooker
Multi-anything is a hard problem.

~~~
tekno45
Multi-person arguing is pretty easy.

~~~
vikramkr
But winning an argument against multiple people is a lot harder than winning
an argument against yourself

------
boris-ning-usds
I'm dealing with similar problems: trying to set up a direct connection
between AWS and Azure.

How does PlanetScale handle the complexity of DIY Classic VPN, ensuring high
availability on those VPN links and that a certain amount of throughput can be
sustained?

Is there a requirement for PlanetScale to create a full network mesh between
all cloud providers and all regions? I'm assuming it's more selective, because
a full mesh becomes untenable as more cloud regions appear, requiring (n *
(n-1))/2 VPN links, where n is the number of cloud regions.

Happy to learn anything I can here. Thanks for the blog post.

~~~
dkhenry
Yes, there is a requirement for a full mesh, and generally it has to span all
regions. GCP will route for you at the network level, so we could get away
with not peering all GCP regions to all AWS and Azure regions, but all AWS and
Azure regions need to be peered to each other.

As for the HA of the VPN links, for most providers it's handled automatically:
AWS <-> Azure and AWS <-> GCP are both HA links offered by the provider. Azure
<-> GCP is a Classic VPN, so we need two of them, and we manage the routes to
make sure they fail over in the event of a loss of one link.

Throughput is another story; we are very much limited by the throughput of the
various VPNs. We haven't pushed GCP or Azure to the max to see what they can
do, but according to the documentation we should expect around 300 Mbps across
each link in the mesh before we start to see throttling. At that point it
makes sense for us to move to a co-located exchange and peer with dedicated
connections.
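As a back-of-the-envelope check on what that per-link ceiling means in practice (the 300 Mbps figure is from the comment above; the 1 TB example volume is an arbitrary assumption):

```python
# How long does it take to push a given data volume through one VPN link
# at the quoted ~300 Mbps, ignoring protocol overhead and contention?

def transfer_hours(data_gb: float, link_mbps: float = 300) -> float:
    """Hours to move data_gb gigabytes over a link of link_mbps megabits/s."""
    data_megabits = data_gb * 8_000          # 1 GB = 8,000 megabits (decimal units)
    return data_megabits / link_mbps / 3600  # seconds -> hours

print(f"{transfer_hours(1_000):.1f} h")  # ~7.4 h to move 1 TB over one link
```

That is slow enough that a bulk migration or a full re-shard across clouds quickly justifies the dedicated-connection route mentioned above.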

Finally, for other providers: as they come on board, we will look at using the
various providers' transit gateways to reduce the total number of links
needed, or at standing up virtual routers to act as exchanges.

Hopefully we will be doing another post with more technical details and some
benchmarks!

------
2ion
> Abhi: Hi team! On the level of our Kubernetes operator, what do you think
> was the hardest challenge in making multi-cloud databases work?

Multi-cloud is a network problem. Ask anybody who knows what they're doing: is
it the best idea to have dependencies over a WAN? No. Can it be a solution to
a problem? Yes, but what's your problem? PlanetScale might have a case if
their product _sells_.

Only then come the platform problems.

------
alex_young
This is an advertisement.

------
jspaetzel
Square peg, round hole.

