However, considering they make money from private installs of GitLab, it makes sense to keep gitlab.com as an eat-your-own-dog-food-at-scale environment; they need it to maintain experience with large GitLab installs. If one of their on-prem customers has performance issues, they can't just say: gitlab.com uses a totally different architecture, so you're on your own. They need gitlab.com to be as close as possible to the standard product.
Pivotal does the same thing with Pivotal Web Services, their public Cloud Foundry solution. All of their money is made in Pivotal Cloud Foundry (private installs).
From a business perspective, private installs are a form of distributed computing. Pretty clever, and a good way of minimizing risk.
Dedicated servers and colocation are going to be far cheaper than the cloud, and worse (for the cloud), the savings are directly proportional to the size of the infrastructure you need.
That, combined with the fact that even the very best virtualization on shared resources still costs you 20-30% of performance.
So there are 3 things you can use the cloud for:
1) your company is totally fucked for IT management, and the cloud beats local because of incompetence (this is very common), and you're unwilling or unable to fix this. Or, in MBA speak, "your company doesn't focus on IT and never will."
2) wildly swinging resource loads, where the peak capacity is only needed for 1-5% of time at most.
3) you expect to have vastly increasing resource requirements in a short time, and you're willing to pay a 5x-20x premium over dedi/colo to achieve that
The thing I don't understand is that cloud is squeezed from both ends: below a certain size (an extremely common case) it is (far) more expensive than web hosting or a VPS, and above a certain capacity it is far more expensive again (doesn't matter which resource: CPU, network, disk, ... all cost far more in the cloud). Even if you have wildly varying loads, there's an upper limit to the resource needs beyond which the cloud becomes more expensive.
What I don't understand is why so many people are doing this anyway. I ran a mysql-in-the-cloud instance on Amazon for years: a 300 MB database serving 10-50 qps as the backend to a website, plus a reporting server that ran on-premise. Cost? 105-150 euros per month. We could have easily done that locally, or on a VPS or dedicated server, for a tenth of that cost.
Cloud moves a capital cost into an operational cost. This can be a boon or a disaster depending on your situation. Want to run an experiment that may or may not pan out? Off the cloud you'll have spare capacity you can use without really paying extra for it; on the cloud, cost controls mean you can't just use extra resources. Can't borrow money from the bank? The cloud (but also dedi providers) can still get you capacity, essentially letting you use their bank credit at a huge premium.
Another use case would be where infrastructure costs are minor compared to dev and ops staff costs. If hosting on AWS makes your ops team 2x as productive at a 30% infrastructure markup that can be a steal.
It's best to design your stuff so it can easily move from hosted solutions (this "cloud" bullshit term people keep using) to something you manage yourself. Docker containers are a great fit for this.
If you set up some Ansible or Puppet scripts to create a Docker cluster (using Mesos/Marathon, Kubernetes, Docker Swarm, etc.) and built it in a hosted data center, it's not going to take a whole lot of effort to provision real machines and run that same stack on dedicated hardware.
2 people doing ops work cost $7-8k per month. Let's assume they're managing at least 5x their own cost in infrastructure spend, i.e. $35k+/month. That easily buys you 20-30 extremely high spec dedicated machines, if necessary spread all around the world, with unlimited bandwidth. On the cloud it wouldn't buy you 5 high spec machines, with zero bandwidth and zero disk.
Let's compare. Amazon EC2 m3.2xlarge (not a spectacularly high end config I might add, 8vCPU, 30Gig ram, 160G disk, ZERO network) costs $23k per month. So this budget would buy you 2 of those. Using reserved instances you can halve that cost, so up to about 4, maybe 5 machines.
Now compare SoftLayer dedicated (far from the cheapest provider), most expensive machine they've got: $1439/month. Quad CPU (32 cores), 120 GB RAM, 1 TB SSD, 500 GB of network traffic included (and extra traffic is about 10% of what Amazon charges for the same). That budget gets you about 24 of these beasts (in any of 20+ datacenters around the globe). At a low cost provider like Hetzner, Leaseweb or OVH you can quadruple that. That's how big the difference is.
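As a sanity check, the budget arithmetic in the last few comments works out roughly like this (prices are the thread's ballpark quotes, not current rates):

```python
# Budget arithmetic from the thread (ballpark quotes from the comments
# above, not current provider prices).

ops_budget = 35_000        # $/month: ~5x the cost of a 2-person ops team

softlayer_top_end = 1439   # $/month, the quoted top-end dedicated box
dedi_machines = ops_budget // softlayer_top_end        # ~24 machines

# Low-cost providers (Hetzner, Leaseweb, OVH) at roughly a quarter
# of that price per comparable machine:
low_cost_price = softlayer_top_end / 4
low_cost_machines = int(ops_budget // low_cost_price)  # ~97 machines

print(dedi_machines, low_cost_machines)
```

So the same ops budget covers roughly 24 top-end dedicated boxes, or close to a hundred at a budget provider.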
It used to be the case that Amazon would have more geographic reach than dedicated servers, but that has long since ceased to be true.
There is a spot in the middle, say from $100 to maybe $10k per month, where the cloud does work. And you are right that it lets a smaller team do more. But there are 2 things to keep in mind: a higher base cost, and costs that rise far faster when you expand compared to dedicated or colo. That is not a good trade to make.
An m3.2xlarge is $0.532/hr or $388/month, not $23k/month. A similar instance on GCE (n1-standard-8) is $204/month with our sustained use discount, and then you need to add 160 GB of PD-SSD at another $27/month ($231 total).
Disclosure: I work on Google Cloud, but this is just a factual response.
EC2 reserved instances offer a substantial discount over on-demand pricing. The all-up-front price for a 3-year reservation for m3.2xlarge would be an amortized monthly rate of $153/month, which is a 61% saving vs. the on-demand price of $388/month, according to the EC2 reserved instances pricing page.
Granted, using this capacity type requires some confidence in one's need for it over that period of time, since RIs are purchased up front for 1 or 3 year terms. But RIs can also be changed using RI modification, or by using Convertible RIs, and can be resold on a secondary marketplace. As a tradeoff in comparison to GCE's automatic sustained use discount, the EC2 RI discount requires deliberate and up-front action.
Edit: I do see in another comment you concede the value of the cloud for ppl spending under $10k a month.
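The RI saving quoted above checks out; here's the arithmetic using the thread's figures (2015-era prices, not current AWS rates):

```python
# Sanity check of the reserved-instance discount quoted above
# (prices from this thread, not current AWS rates).

on_demand_hourly = 0.532          # m3.2xlarge on-demand, $/hr
hours_per_month = 730             # average hours in a month
on_demand_monthly = on_demand_hourly * hours_per_month   # ~$388

amortized_ri_monthly = 153        # 3-yr all-upfront RI, amortized $/month
saving = 1 - amortized_ri_monthly / on_demand_monthly

print(f"on-demand ${on_demand_monthly:.0f}/mo, RI saving {saving:.0%}")
```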
1) Your needs are very static, or
2) Your IT department can competently replicate the PaaS experience on its in-house metal (common big tech company strategy)
The cloud is likely to do wonders for velocity, as when you have a new use case, you can "just" spin up a new VM and run the company Puppet on it within an hour or so, vs. wait weeks to months for a purchase order, shipping, installation at the colo facility, etc.
If your IT department is doing Mesos or Kubernetes or something with a decent self-service control panel for developers, then you get the best of both worlds, but you also have to build and maintain that.
If so, they could (in theory) split gitlab.com into a bunch of shards which, as a whole, match the properties of N% of enterprise users. That'd be a pretty cool way to avoid the different-in-scale problem (although you might still run into novel problems as you're now the GitLab instance with the most shards..).
In general, over time, load on some shards will increase while others decrease. Migrating a customer from one shard to another will likely cause a short outage for them, and bugs down the line once they've bookmarked all kinds of things.
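A minimal sketch of deterministic customer-to-shard routing; the shard count and function name here are hypothetical, purely for illustration (GitLab has not published such a design):

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count, chosen for illustration


def shard_for(customer_id: str) -> int:
    """Map a customer to a shard via a stable hash, so routing is
    deterministic and needs no lookup table."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Note that plain modulo sharding reshuffles most customers whenever NUM_SHARDS changes, which is exactly the migration pain described above; consistent hashing would bound how many customers move per resize.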
Depends on how they bill.
If they bill on a few private installs while giving unlimited storage & projects for free on the cloud, they're setting themselves up for bankruptcy.
Gotta take care of the accounting. Unsustainable pricing is a classic mistake of web companies.
At the moment you can just set up multiple mount points, but I guess it would be superior to actually use object storage. I'm pretty sure you can put git into something like S3.
A) I think it would be kind of difficult to store git on S3 (the magic would be in the caching / consistency layer to keep it performant)
B) it would make sense for their customers that run on AWS. However, many of them just run on a single VM / physical server.
btw here is the alibaba approach: http://de.slideshare.net/MinqiPan/how-we-scaled-git-lab-for-...
Edit: and I heard github uses gitrpc.