
So Gitlab is in a bit of a strange position here. Sticking to a traditional filesystem interface (distributed under the hood) seems stupid at first. Surely there are better technical solutions.

However, considering they make money out of private installs of gitlab, it makes sense to keep gitlab.com as an eat-your-own-dog-food-at-scale environment. Necessary for them to keep experience with large installs of gitlab. If one of their customers that run on-prem has performance issues they can't just say: gitlab.com uses a totally different architecture so you're on your own. They need gitlab.com to be as close as possible to the standard product.

Pivotal does the same thing with Pivotal Web Services, their public Cloud Foundry solution. All of their money is made in Pivotal Cloud Foundry (private installs).

From a business perspective, private installs are a way of distributed computing. Pretty clever, and a good way of minimizing risk.

It's not just that. The cloud is far, far more expensive than bare metal, and that is under completely optimal financial conditions for the cloud providers (extremely low, even 0%, interest rates for the companies owning them, hence required return is only a few percent)

Dedicated servers and colocation are going to be far cheaper than the cloud, and worse, the savings are directly related to the size of the infrastructure you need.

Combine that with the fact that even the very best of virtualization on shared resources still kills 20-30% of performance.

So there's 3 things you can use the cloud for:

1) your company is totally fucked for IT management, and the cloud beats local because of incompetence (this is very common). And you're unwilling or unable to fix this. Or "your company doesn't focus on IT and never will" in MBA speak.

2) wildly swinging resource loads, where the peak capacity is only needed for 1-5% of time at most.

3) you expect to have vastly increasing resource requirements in a short time, and you're willing to pay a 5x-20x premium over dedi/colo to achieve that
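To put rough numbers on case 2, here is a minimal sketch of the break-even. It assumes a deliberately simple model: dedicated hardware is sized for peak and paid for all month, while cloud capacity is paid only when running but at a per-unit premium. The workload figures and function names are made up for illustration; only the 5x-20x premium range comes from the list above.

```python
def monthly_cost_dedicated(peak_units: float, unit_price: float) -> float:
    """Dedicated/colo: peak capacity is paid for all month, used or not."""
    return peak_units * unit_price


def monthly_cost_cloud(base_units: float, peak_units: float,
                       peak_fraction: float, unit_price: float,
                       premium: float) -> float:
    """Cloud: pay only for running capacity, at `premium` times the dedicated price."""
    average_units = base_units * (1 - peak_fraction) + peak_units * peak_fraction
    return average_units * unit_price * premium


# Hypothetical workload: 10 units baseline, 100 units at peak, peak needed 5% of the time.
dedicated = monthly_cost_dedicated(peak_units=100, unit_price=100.0)
cloud_5x = monthly_cost_cloud(10, 100, 0.05, 100.0, premium=5)
cloud_20x = monthly_cost_cloud(10, 100, 0.05, 100.0, premium=20)

print(f"dedicated: {dedicated:.0f}, cloud at 5x: {cloud_5x:.0f}, cloud at 20x: {cloud_20x:.0f}")
```

Under these assumptions the spiky workload is cheaper in the cloud at a 5x premium and more expensive at 20x, so the answer depends heavily on which end of the premium range you actually pay.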

The thing I don't understand is that cloud has both a lower limit (cloud is (far) more expensive than web hosting or a VPS, which is an extremely common case) and an upper limit: it is far more expensive once you go over a certain capacity (doesn't matter which one, CPU, network, disk, ... all are far more expensive in the cloud). Even if you have wildly varying loads there's an upper limit to the resource needs where cloud becomes more expensive.

The thing I don't understand is why so many people are doing this. I ran a mysql-in-the-cloud instance on Amazon for years, with a 300 MB database, serving 10-50 qps as a backend to a website and to a reporting server that ran on-premise. Cost? 105-150 euros per month. We could have easily done that either locally or on a VPS or dedicated server for a tenth of that cost.

Cloud moves a capital cost into an operational cost. This can be a boon or a disaster depending on your situation. You want to run an experiment that may or may not pan out? Off the cloud you'll have spare capacity that you can use but don't really have to pay for. On the cloud, cost controls will mean you can't use extra resources. You can't borrow money from the bank? The cloud (but also dedi providers) can still get you capacity, essentially allowing you to use their bank credit for a huge premium.

I think you're missing some use cases here. Cloud can be really helpful for prototyping systems that you may not even want running in a few months.

Another use case would be where infrastructure costs are minor compared to dev and ops staff costs. If hosting on AWS makes your ops team 2x as productive at a 30% infrastructure markup that can be a steal.
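A quick sketch of that trade-off, assuming the simplest possible model: total cost is infrastructure plus ops staff, the cloud adds a 30% markup on infrastructure, and a 2x productivity gain halves the staff cost needed for the same work. The dollar amounts and function names are hypothetical.

```python
def self_hosted_total(infra: float, staff: float) -> float:
    """Monthly cost when running your own hardware."""
    return infra + staff


def cloud_total(infra: float, staff: float,
                markup: float = 0.30, productivity: float = 2.0) -> float:
    """Monthly cost in the cloud: pricier infrastructure, less staff time needed."""
    return infra * (1 + markup) + staff / productivity


def break_even_infra(staff: float, markup: float = 0.30,
                     productivity: float = 2.0) -> float:
    """Infrastructure spend above which the markup outweighs the staff savings."""
    return staff * (1 - 1 / productivity) / markup


# Hypothetical small shop: 2k infrastructure, 8k staff per month.
small = (self_hosted_total(2_000, 8_000), cloud_total(2_000, 8_000))
# Hypothetical larger shop: 35k infrastructure, same 8k staff.
large = (self_hosted_total(35_000, 8_000), cloud_total(35_000, 8_000))

print(small, large, break_even_infra(8_000))
```

With these made-up figures the cloud wins for the small shop and loses for the large one; the crossover sits at an infrastructure spend of roughly 1.7x the staff cost.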

I was thinking this too while reading the above comment. For starting up and prototypes, the convenience of a managed host far outweighs any initial performance losses. As companies scale, we see them move to their own hardware (Stack Overflow, Gitlab, etc.)

It's best to design your stuff so it can easily go from hosted solutions (this "cloud" bullshit term people keep using) to something you manage yourself. Docker containers are a great solution to this.

If you set up some ansible or puppet scripts to create a docker cluster (using mesos/marathon, kubernetes, docker swarm, etc) and build it in a hosted data center, it's not going to take a whole lot of effort to provision real machines and run that same stack on dedicated hardware.

I don't think stack overflow ever used cloud providers for their main compute and storage needs, just periphery stuff like cloudflare.

Look at those cost factors. If you have an ops team (ie. 2+ people) it is a near certainty that cloud is more expensive. And if you absolutely do not need an ops team, VPS is going to beat the cloud by a huge margin.

2 people doing ops work cost 7-8k $ per month. Let's assume each of them is managing at least 5x their own cost in infrastructure spend, ie 35k+ $/month. That easily buys you 20-30 extremely high spec dedicated machines, if necessary all around the world, with unlimited bandwidth. On the cloud it wouldn't buy you 5 high spec machines with zero bandwidth, and zero disk.

Let's compare. Amazon EC2 m3.2xlarge (not a spectacularly high end config I might add, 8vCPU, 30Gig ram, 160G disk, ZERO network) costs $23k per month. So this budget would buy you 2 of those. Using reserved instances you can halve that cost, so up to about 4, maybe 5 machines.

Now compare softlayer dedicated (far from the cheapest provider), most expensive machine they got: $1439/month. Quad cpu (32 cores), 120G ram, 1Tb SSD, 500G network included (and more network is about 10% of the price amazon charges for the same). For that budget it gets you 25 of these beasts (in any of 20+ datacenters around the globe). On a low cost provider, like Hetzner, Leaseweb or OVH you can quadruple that. That's how big the difference is.

It used to be the case that Amazon would have more geographic reach than dedicated servers, but that has long since ceased to be true.

There is a spot in the middle where it makes sense, let's say from $100+ to maybe $10k where cloud does work. And you are right that it lets a smaller team do more. But there's 2 things to keep in mind : higher base cost that rises far faster when you expand compared to dedicated or colo. This is not a good trade to make.

Your math is off by 60x for AWS and 100x for GCE.

An m3.2xlarge is $.532/hr or $388/month, not $23k/month [1]. A similar instance on GCE (n1-standard-8) is $204/month with our sustained use discount, and then you need to add 160 GB of PD-SSD at another $27/month (so $231 total) [2].
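For anyone who wants to check the arithmetic, a small sketch follows. The only assumption not in the comment above is the 730 hours per month used to convert the hourly rate (365 * 24 / 12); the prices are the ones quoted.

```python
HOURS_PER_MONTH = 730  # assumption: 365 * 24 / 12

m3_2xlarge_hourly = 0.532          # USD/hr, on-demand, as quoted above
aws_monthly = m3_2xlarge_hourly * HOURS_PER_MONTH

gce_monthly = 204 + 27             # n1-standard-8 plus 160 GB PD-SSD, as quoted above

claimed_monthly = 23_000           # the figure from the parent comment

aws_factor = claimed_monthly / aws_monthly
gce_factor = claimed_monthly / gce_monthly

print(f"AWS: ${aws_monthly:.0f}/month, parent is off by ~{aws_factor:.0f}x")
print(f"GCE: ${gce_monthly}/month, parent is off by ~{gce_factor:.0f}x")
```

The ratios come out at about 59x and just under 100x, which matches the rounded 60x and 100x.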

Disclosure: I work on Google Cloud, but this is just a factual response.

[1] https://aws.amazon.com/ec2/pricing/on-demand/ [2] https://cloud.google.com/compute/pricing

EC2 m3.2xlarge can be had for $153/month as well when purchased as a reserved instance.

EC2 reserved instances offer a substantial discount over on-demand pricing. The all-up-front price for a 3-year reservation for m3.2xlarge would be an amortized monthly rate of $153/month, which is a 61% saving vs. the on-demand price of $388/month, according to the EC2 reserved instances pricing page.

Granted, using this capacity type requires some confidence in one's need for it over that period of time, since RIs are purchased up front for 1 or 3 year terms. But RIs can also be changed using RI modification, or by using Convertible RIs, and can be resold on a secondary marketplace. As a tradeoff in comparison to GCE's automatic sustained use discount, the EC2 RI discount requires deliberate and up-front action.
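A rough sketch of what that commitment means in practice, using only the two monthly figures above. The up-front amount is derived here from the amortized rate (153 * 36), not taken from the pricing page, and resale or conversion of the RI is ignored.

```python
on_demand_monthly = 388   # USD, from the on-demand price above
reserved_monthly = 153    # USD, amortized 3-year all-up-front rate above
term_months = 36

saving = 1 - reserved_monthly / on_demand_monthly

# Derived, not quoted: total paid up front if the amortized rate is exact.
implied_upfront = reserved_monthly * term_months

# Months of on-demand usage that would cost the same as the whole reservation.
break_even_months = implied_upfront / on_demand_monthly

print(f"saving: {saving:.0%}")
print(f"implied up-front payment: ${implied_upfront}")
print(f"break-even after ~{break_even_months:.1f} months of use")
```

So under these assumptions the reservation only pays off if you keep the instance busy for a bit over 14 of the 36 months.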

For the SaaS business I run, Cronitor, AWS costs have consistently stayed around 10% of total MRR. I think there are a lot of small and medium sized businesses who realize a similar level of economic utility.

Edit: I do see in another comment you concede the value of the cloud for ppl spending under $10k a month.

One use case where cloud shines is managed databases. Having a continuously backed up and regularly patched DB with the promise of flexible scalability is definitely worth it.


1) Your needs are very static, or

2) Your IT department can competently replicate the PaaS experience on its in-house metal (common big tech company strategy)

The cloud is likely to do wonders for velocity, as when you have a new use case, you can "just" spin up a new VM and run the company Puppet on it within an hour or so, vs. wait weeks to months for a purchase order, shipping, installation at the colo facility, etc.

If your IT department is doing Mesos or Kubernetes or something with a decent self-service control panel for developers, then you get the best of both worlds, but you also have to build and maintain that.

A risk is of course that your public environment is so much bigger than your biggest private install, that it doesn't make sense anymore to keep the same technology for both tiers. I think both PWS and gitlab.com suffer from this.

Does Gitlab support some kind of federation between instances?

If so, they could (in theory) split Gitlab.com into a bunch of shards which as a whole match the properties of N% of enterprise users. That'd be a pretty cool way to avoid the difference-in-scale problem (although you might still run into novel problems as you're now the Gitlab instance with the most shards..).

this is what Salesforce does - they have publicly visible shards in the url (na11, na14, eu23, etc) and login.salesforce.com redirects you to your shard when it figures out who you are.

For anyone considering this, reconsider. It bit me many times.

In general, over time, load on some shards will increase while others decrease. Migrating a customer from one shard to another will likely cause a short outage for them, and many bugs down the line when they've bookmarked all kinds of things.

You need to preserve the name<->customer association and maintain another key that you can use to split traffic at the LB in case a customer outgrows their shard. But personally I think it looks hokey and should not be something a customer sees anywhere but perhaps a developer tool or sniffer.

Could that not be resolved with a reverse proxy or load balancer trickery? I.e., hide the shard name externally.
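Something like the following is what hiding the shard would amount to, as a minimal sketch. Everything here is hypothetical: the `ShardRouter` class, the hostnames, and the shard names are invented for illustration and say nothing about how Salesforce or Gitlab actually route traffic. The idea is that customers only ever see a stable name, and the proxy looks up which internal shard currently serves it.

```python
from typing import Dict


class ShardRouter:
    """Maps a stable customer name to whichever internal shard currently hosts it."""

    def __init__(self) -> None:
        self._shard_of: Dict[str, str] = {}

    def assign(self, customer: str, shard: str) -> None:
        """Place (or move) a customer on a shard; the public URL does not change."""
        self._shard_of[customer] = shard

    def backend_for(self, host: str) -> str:
        """Resolve a public hostname like 'acme.example.com' to an internal shard."""
        customer = host.split(".", 1)[0]
        return self._shard_of[customer]


router = ShardRouter()
router.assign("acme", "shard-na11")
before = router.backend_for("acme.example.com")

# The customer outgrows its shard: only the mapping changes, bookmarks keep working.
router.assign("acme", "shard-eu23")
after = router.backend_for("acme.example.com")

print(before, "->", after)
```

This keeps the name<->customer association in one place, though the data migration itself (and the short outage that goes with it) is still there.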

> However, considering they make money out of private installs of gitlab, it makes sense to keep gitlab.com as an eat-your-own-dog-food-at-scale environment.

Depends on how they bill.

If they bill on a few private installs while giving unlimited storage & projects for free on the cloud, they're setting themselves up for bankruptcy.

Gotta take care of the accounting. Having unsustainable pricing is a classic mistake of web companies.

well they can probably include a better storage layer.

at the moment you can just set up multiple mount points, but I guess it would be superior to actually set up a mount point or object storage. I'm pretty sure that you can put git into something like s3.

s3fs? ;)

A) I think it would be kind of difficult to store git on S3 (the magic would be in the caching / consistency layer to keep it performant)

B) it would make sense for their customers that run on AWS. However, many of them just run on a single VM / physical server.

well I just said s3 since it's the best known, but I guess it would be feasible to run on object storage. I heard that libgit2 can actually change its storage layer..

btw here is the alibaba approach: http://de.slideshare.net/MinqiPan/how-we-scaled-git-lab-for-...

Edit: and I heard github uses gitrpc.
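To illustrate why object storage is at least plausible: git objects are content-addressed, so each one is naturally a key/value pair. A minimal sketch, with a plain dict standing in for a bucket. The `ObjectStore` class is hypothetical and is not how libgit2, Gitlab, or anyone else does it; only the blob hashing scheme (SHA-1 over a `blob <size>\0` header plus the content) is git's own.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy key/value store for git blobs; the dict stands in for an S3-like bucket."""

    def __init__(self) -> None:
        self._bucket = {}

    def put_blob(self, data: bytes) -> str:
        """Store a blob the way git names it: SHA-1 of header + content."""
        payload = b"blob " + str(len(data)).encode() + b"\0" + data
        key = hashlib.sha1(payload).hexdigest()
        self._bucket[key] = zlib.compress(payload)  # git also zlib-compresses loose objects
        return key

    def get_blob(self, key: str) -> bytes:
        """Fetch a blob by its hash and strip the header."""
        payload = zlib.decompress(self._bucket[key])
        _header, _sep, data = payload.partition(b"\0")
        return data


store = ObjectStore()
key = store.put_blob(b"hello world\n")
print(key, store.get_blob(key))
```

Immutable objects are the easy part. The refs (branches, tags) are mutable pointers, which is where the caching / consistency layer mentioned above would have to do the real work.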

Indeed this is a big factor. We want our customers and users to be able to use open source to host their repositories. That is one of the reasons to use Ceph instead of a proprietary storage solution such as Nutanix or NetApp.
