Why we are not leaving the cloud (gitlab.com)
67 points by happy-go-lucky 2 hours ago | 20 comments

It's interesting that the first half of the explanation is largely quotes from a prior HN post's responses.

Does anyone else feel a bit weird seeing offhand comments getting quoted in the explanation for a business decision? I guess we should all get more accustomed to our public input carrying weight in the zeitgeist.

Gitlab's "develop in the open" nature really shows through here. I am not saying that's bad, it's just so different from most startups and established businesses.

I don't think they were using this as justification or explanation in their eventual reversal. I think they provided these quotes as interesting tidbits. Kind of like pull-out quotes on an article. You can ignore them entirely and still get the gist of the post.

On a more general note: It's got to be incredibly hard to do what GitLab does with their extreme transparency. I feel like we have to be careful about reading too deeply into things and nitpicking their culture or process. HN is full of "expert" advice, much of it being terrible. They weighed their options, invited feedback, then made a decision.

I appreciate what GitLab does in being so transparent. None of us are owed explanations or insight into how they operate, yet they go out of their way to provide it. Kudos to sytse and his team!

This echochamber has a profound and mostly unwarranted impact on people, myself included.

A bit disappointed in the article, as the bulk of it is simply quoting various viewpoints, some of which disagree with each other. I'd be much more interested in the thought process that occurred for them to reach the conclusion they did. What internal doubts did they raise? What possible solutions to those doubts did they consider? Things like that.

The reasoning behind their decision is linked near the top of the article, under "Sid and the team decided": https://gitlab.com/gitlab-com/infrastructure/issues/727#note...

Gitlab is hosted on Azure and, like many startups, received six figures' worth of free credits to be in the cloud.

As the credits run out, the architecture reverts to its natural state: architectures tend to be a derivative of the organizational structure, code base, and processes of a company.

Through this lens these decisions start making more sense. It is a reactionary process. Sometimes when you start small you bounce around like that and are lucky enough to grow for years and years until you hit a wall.

Where is the original thought? There were few hard numbers. There was way too much quoted material and no Gitlab thoughts on it. It looks like they made the post based on feelings instead of a metric like time or dollars.

HN is a great way to get opinions about a decision, although sometimes the criticism far outweighs the positivity and support.

Posts about "We use X and it's great" will usually attract lots of "X is bad, use Y" comments.

I really hope that GitLab made a well-informed decision based on arguments and counter-arguments.

Why was hybrid not considered? With providers like Megaport offering really inexpensive direct connects, this is almost a no-brainer.

This may seem harsh, but relying on random commenters shows a huge flaw in how you guys went about this.

Physical environments DO work well, but they do require experience to run. I have used AWS for as long as it's been around, but nothing beats physical environments for "known workloads", as long as you run them EXACTLY like you would run a cloud environment. This is why the hybrid approach is such a great thing. Run what you know well on physical hardware and reap great savings; run what you don't know well in the cloud (as well as taking advantage of the analytics products) and reap the speed ;)

- This means thinking of physical servers as individual units.
- This means redundancy at every level.
- This means architecting for failure (servers and switches do die).
- This means no shared storage for performance-critical parts (shared storage meaning below the OS level).
- This means object stores/sharding/etc. as your storage layer.
- This means real engineering.
- And this means exactly the same whether you're physical or not.

I've managed environments of 50 VMs and environments of 50,000 physical servers. The methodology is always the same.

And yes, this means you can save some monthly cost and apply it to staff who can do a lot more than just maintain this infrastructure.

PS: For those who think showing up at a datacenter is required: you're doing it wrong. Pretty much any datacenter has a remote-hands service, and with the right redundancy, hardware replacement is something you can do at a slower pace.

PPS: I'm sure someone will nitpick some of my points. The reality is, there are real money savings here. For example, Snapchat spends MORE per year on cloud infrastructure in 2016/2017 than ALL OF GOOGLE did in 2012... think about that for a second... Even Netflix runs Open Connect to push bits.
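To make the "no shared storage", "sharding as your storage layer" and "redundancy at every level" points a bit more concrete, here is a minimal Python sketch (not anything GitLab or this commenter describes) of failure-domain-aware replica placement using rendezvous (highest-random-weight) hashing. The server and rack names are hypothetical; it assumes each object needs `replicas` copies on servers in distinct racks, so losing a server, or a whole rack behind one switch, never takes out every copy.

```python
import hashlib


def _score(key: str, server: str) -> int:
    # Deterministic pseudo-random weight for (key, server); sha256 keeps it stable across runs.
    digest = hashlib.sha256(f"{key}:{server}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(key: str, servers: dict[str, str], replicas: int = 3) -> list[str]:
    """Pick `replicas` servers for `key`, at most one per rack (failure domain).

    `servers` maps a (hypothetical) server name to its rack name.
    Uses rendezvous hashing: rank servers by score, take the best one per rack.
    """
    ranked = sorted(servers, key=lambda s: _score(key, s), reverse=True)
    chosen: list[str] = []
    used_racks: set[str] = set()
    for server in ranked:
        rack = servers[server]
        if rack in used_racks:
            continue  # never put two copies in the same failure domain
        chosen.append(server)
        used_racks.add(rack)
        if len(chosen) == replicas:
            break
    if len(chosen) < replicas:
        raise ValueError("not enough racks for the requested replica count")
    return chosen


servers = {
    "s1": "rack-a", "s2": "rack-a",
    "s3": "rack-b", "s4": "rack-b",
    "s5": "rack-c", "s6": "rack-c",
}
print(place_replicas("repo/123", servers))
```

One property worth noting: if a server that was not in a key's replica set is removed, that key's placement does not change, so only data that actually lived on the lost server has to be re-replicated.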

Buried towards the bottom is the interesting decision to dump CephFS and go with NFS instead.

I find that interesting as well. Furthermore, while looking into deploying containers with cheap cloud hosts, I was curious how to share data across nodes, so I looked at how Google and Amazon do it. Turns out it's still NFS. I'd never looked into networked filesystems before, but I thought they'd have moved past NFS to something more advanced... and newer, I suppose.

Fundamentally not much different, though. N shards hold the data, you have to know which shard is which and maintain connections to it, and split brains can happen if that sharding is dynamic, etc.
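A rough illustration of the "you have to know which shard is which" point, assuming the simplest static scheme: hash the key and take it modulo the shard count. `shard_for` and `moved_fraction` are hypothetical helpers, not taken from NFS, CephFS or any other system. It doesn't model split brain itself; it only shows why changing the shard map is the delicate moment, since with modulo placement going from 8 to 9 shards moves roughly 8 out of every 9 keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    # Stable hash; Python's built-in hash() is salted per process for str, so avoid it here.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys: list[str], old_n: int, new_n: int) -> float:
    # Fraction of keys whose shard changes when the shard count changes.
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


keys = [f"project-{i}" for i in range(10_000)]
print(moved_fraction(keys, 8, 9))  # close to 8/9 for modulo placement
```

Schemes like consistent or rendezvous hashing exist mainly to shrink that number to roughly one shard's worth of keys, which makes resharding far less disruptive.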

Nitpick for the author, you put twaleson for my quote, should be jtwaleson.

Whoops, sorry. Opened a merge request to fix that.

https://gitlab.com/gitlab-com/www-gitlab-com/merge_requests/...

What about managed dedicated servers? You don't need to go all in on bare metal and be responsible for networking, swapping disks, and all that.

I know it's more than a handful of deployment recipes, but it's not like they don't manage their cloud instances either...

This article is a false dichotomy with a seemingly rushed decision.

Couldn't they have easily just drawn the opposite conclusion? I don't see a strong connection between the article's contents and its conclusion.

I'm really glad you guys decided to listen to the experienced commenters' advice and change your mind, rather than obstinately plowing on with what seems to be a bad decision.

Hmm. Nearly all the commenters seem to be "extreme", i.e. either full-cloud or full-bare-metal.

Why not go hybrid? Do your baseline load with bare-metal and your spike load with AWS, GCE or Azure Cloud.
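A back-of-the-envelope sketch of that baseline/spike split, in Python. It assumes load is interchangeable and can be measured in abstract units per interval, which may not hold for stateful parts like a database or git storage; `split_load`, `Split` and the demand numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Split:
    bare_metal: float  # load units served by owned hardware
    cloud: float       # load units that overflow to the cloud ("cloudbursting")


def split_load(demand: list[float], baseline_capacity: float) -> Split:
    """Serve each interval's demand on bare metal up to capacity; the excess bursts to the cloud."""
    bare = sum(min(d, baseline_capacity) for d in demand)
    cloud = sum(max(d - baseline_capacity, 0.0) for d in demand)
    return Split(bare, cloud)


hourly_demand = [40, 55, 70, 120, 65]
result = split_load(hourly_demand, baseline_capacity=70)
```

For hourly demand of 40, 55, 70, 120 and 65 units against 70 units of owned capacity, bare metal serves 300 units and only the 50-unit spike goes to the cloud. Sizing `baseline_capacity` is the real decision: too high and you pay for idle hardware, too low and the cloud bill grows back.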

Cloudbursting either a database server or their git file storage doesn't sound all that viable, and I'd expect those to be the things that matter, i.e. the bottleneck, for Gitlab?

For most use cases that will complicate the system. Keeping everything the same and simple is preferable.

