
> rolling your own infrastructure just means you have to build systems even better than the providers

As a cloud provider, though, you're trying to provide shared resources to a group of clients. A company rolling their own system doesn't have to share, and they can optimise specifically for their own requirements. You don't get these luxuries, and it's reasonable to expect a customised system to perform better than a general one.




As a counter-argument: very few teams at Google run on dedicated machines. Those that do are enormous, both in the scale of their infrastructure and in their team sizes. I'm not saying always go with a cloud provider; I'm reiterating that you'd better be certain you need to.


Interesting; presumably they're very well informed, and they obviously feel that Google's cloud offerings are the best way to go.

If I may ask a few questions:

- Are they charged the same as external customers or do they get a 'wholesale' rate?

- As internal clients, do they run under the same conditions as external clients? Or is there a shared internal server pool that they use?

- Do they get any say in the hardware or low-level configuration of the systems they use? (i.e. if someone needs ultra-low latency or more storage, can they just ask Joe down the hall for a machine on a more lightly loaded network, or with a bunch more RAM, for the week?)

- Do they have the same type of performance constraints as the ones encountered by gitlab?

I feel like most of the reason to use cloud services is when you have little idea what your actual requirements are, and need the ability to scale fast in both directions. Once you're established and have a fairly predictable workload, it makes more sense to move your hosting in-house.


The Google teams that he's referring to probably don't run on Google Cloud Platform, but rather on Google's internal infrastructure that GCP is built upon. So most of your questions may not apply. However, his points about cloud infrastructure are still valid.


If you're right, then Google teams using internal Google server infrastructure is literally Google rolling their own.


He technically only said they don't run on dedicated machines, not that they run on GCP. My guess would be Google has some sort of internal system that probably uses a bunch of the same software but is technically not GCP.


This is mostly speculation based on having read this: http://shop.oreilly.com/product/0636920041528.do

Hopefully someone who actually knows what they're talking about will be along shortly!

> Are they charged the same as external customers or do they get a 'wholesale' rate?

I'd be quite surprised if internal customers are charged a markup. Presumably the whole point of operating an internal service is to lower the cost as much as possible for your internal customers.

> As internal clients, do they run under the same conditions as external clients? Or is there a shared internal server pool that they use?

From the above book, it seems that the hardware is largely abstracted away, so most services aren't really aware of servers. I assume there's some separation between internal and external customers, but at a guess that's largely because the external-facing services are forks of existing internal tools that have been untangled from other internal services.

> Do they get any say in the hardware or low-level configuration of the systems they use? (i.e. if someone needs ultra-low latency or more storage, can they just ask Joe down the hall for a machine on a more lightly loaded network, or with a bunch more RAM, for the week?)

As above, the hardware is largely abstracted away. From memory, teams usually say "we think we need ~x hrs of cpu/day, y Gbps of network, ..." and then there's some very clever scheduling that goes on to fit all the services onto the available hardware. There's a really good chapter on this in the above book.
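
To make that concrete, here's a toy sketch of the bin-packing idea (purely illustrative, and nothing like Google's real scheduler, which the book covers under Borg): services declare rough resource requests, and the scheduler fits them onto whatever machines have room.

    # Toy first-fit bin packing of service resource requests onto machines.
    # Illustrative only; machine shapes and requests are made up.
    machines = [{"cpu": 32.0, "ram_gb": 128.0} for _ in range(10)]

    requests = [
        {"name": "web-frontend", "cpu": 4.0,  "ram_gb": 8.0},
        {"name": "indexer",      "cpu": 12.0, "ram_gb": 64.0},
        {"name": "cache",        "cpu": 2.0,  "ram_gb": 48.0},
    ]

    placements = {}
    for req in requests:
        for i, machine in enumerate(machines):
            if machine["cpu"] >= req["cpu"] and machine["ram_gb"] >= req["ram_gb"]:
                machine["cpu"] -= req["cpu"]
                machine["ram_gb"] -= req["ram_gb"]
                placements[req["name"]] = i
                break
        else:
            raise RuntimeError("no machine can fit " + req["name"])

    print(placements)  # {'web-frontend': 0, 'indexer': 0, 'cache': 0}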

> Do they have the same type of performance constraints as the ones encountered by gitlab?

Presumably it depends entirely on the software being written.


But some workloads get all the priority while others get zero/idle priority. Not true in public cloud.


Multitenancy is a large part of what makes public cloud providers profitable, but they all understand the need to isolate customer resources as much as possible.


Using a resource abstraction layer such as Mesos can alleviate this downside by consolidating many of your workloads onto a pool of large dedicated machines.


In the end it doesn't really say that they rolled their own on-prem solution. For the kind of money they were forking out in the cloud you could just buy a NetApp or Isilon and get something that provides enough consistent storage performance. You don't need a distributed FS for the kinds of numbers they're looking at; using one is just a complicated way of working around the underlying limitations of cloud storage. In your own datacentre, getting storage that works is pretty easy.


Most appliances are not geared towards many small random reads. And if you scale, they start to be very expensive. And we would love to use an open-source solution that all our users can reuse.


I'm not an expert in Ceph but I've built many other storage solutions and typically where distributed filesystems fall down in performance is with small files. Even something like an Isilon can get into trouble with those kinds of workloads. The files are too small to be striped across multiple nodes and there's a lot of metadata overhead. Monolithic systems tend to do better with small files but even then you can run into trouble at the protocol (NFS) level with the metadata.
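
A back-of-the-envelope calculation (entirely made-up but plausible numbers) shows why the per-file metadata cost dominates long before the bytes do:

    # Why small files are metadata-bound, not bandwidth-bound (illustrative numbers).
    n_files = 10_000_000
    avg_size_kb = 8                  # tiny files
    metadata_rtt_ms = 1.0            # one lookup/open round trip per file

    total_data_gb = n_files * avg_size_kb / (1024 * 1024)       # ~76 GB of data
    metadata_hours = n_files * metadata_rtt_ms / 1000 / 3600    # serial round trips

    print(f"data: ~{total_data_gb:.0f} GB, metadata alone: ~{metadata_hours:.1f} h serial")
    # ~76 GB of data, yet ~2.8 hours of serial metadata round trips: the transfer
    # itself is trivial, the per-file overhead is what hurts.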

Disk systems do get expensive at scale but the scale that they're usually sold at these days is pretty huge. You talk about going up to a petabyte but that's a fraction of a single rack's worth of disk these days. Not everyone wants to be a filesystem expert and distributed filesystems are jumping in at the deep end.
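
To put "a fraction of a single rack" in numbers, with rough assumptions (commodity ~8 TB drives in dense 4U chassis; not a vendor quote):

    # Rough raw-capacity-per-rack arithmetic; all figures are assumptions.
    drive_tb = 8            # commodity drive size
    drives_per_4u = 60      # dense 4U storage chassis
    chassis_per_rack = 9    # ~36U of a 42U rack, leaving room for switches etc.

    raw_pb = drive_tb * drives_per_4u * chassis_per_rack / 1000
    print(f"raw capacity per rack: ~{raw_pb:.1f} PB")   # ~4.3 PB raw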


You are right regarding many small files. Interestingly, reading from many small files didn't seem to be as much of a problem with CephFS as keeping a large file open while reading and writing to it from thousands of processes (the legacy authorized_keys file).

Clearly CephFS has weak spots, but from what I've seen those are spots that we can work out, rough edges here and there. The good thing is that we are now much more aware of these edges.

We are already working on what the next step will be to smooth out these weaknesses so we are not impacted again. And of course to ship this to all our customers, whether they run on CephFS, NFS appliances, local disks, or whatever makes sense for them.


FYI, the kernel Ceph client has local fscache support. I added it to the kernel :) https://lwn.net/Articles/563146/

We started using Ceph because we wanted to be able to grow our storage and compute independently. While it worked well for us, we ended up having much larger latencies as a result. So we developed FSCache support.

Even better, if your data is inherently shardable (or has some kind of locational affinity) you can end up always serving data with a Ceph backend from the local cache, with the exception of a server going down or the occasional request. I'm guessing yours is (repo / account).

On your API machines serving the git content out of the DFS you can set up a local SSD drive for read-only caching. Depending on your workload you can end up significantly reducing the IOPS on the OSDs and also lowering network bandwidth.
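
Conceptually it's just a read-through cache in front of the OSDs; here's a toy sketch of the idea (the real mechanism is the kernel's fscache layer doing this transparently under the mount, and the paths below are hypothetical):

    import os
    import shutil

    # Toy read-through cache: serve from local SSD when possible, fall back to
    # CephFS on a miss. Illustrative only; fscache does this in the kernel.
    CACHE_DIR = "/mnt/local-ssd/cache"   # hypothetical local SSD path
    CEPHFS_DIR = "/mnt/cephfs"           # hypothetical CephFS mount

    def read_object(rel_path):
        cached = os.path.join(CACHE_DIR, rel_path)
        if not os.path.exists(cached):                   # miss: one trip to the OSDs
            os.makedirs(os.path.dirname(cached), exist_ok=True)
            shutil.copyfile(os.path.join(CEPHFS_DIR, rel_path), cached)
        with open(cached, "rb") as f:                    # hit: local SSD only
            return f.read()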

With the network / IOPS savings we've decided to run our CephFS backed by an erasure-coded pool. Now we have a lower cost of storage (1.7x vs 3x replication) and better reliability, because with our EC profile we can lose 5 chunks before data loss instead of 2 like before. That works because more than 90% of requests are handled with local data and there's a long tail of old data that is rarely accessed.
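
The exact EC profile isn't stated above, but the numbers are consistent with something like k=7 data chunks plus m=5 coding chunks:

    # Erasure coding vs 3x replication. The k=7 / m=5 profile is a guess that
    # happens to match the 1.7x figure above, not a confirmed configuration.
    k, m = 7, 5                      # data chunks, coding chunks
    ec_overhead = (k + m) / k        # 12/7 ~= 1.71x raw storage per byte stored
    ec_failures = m                  # survives loss of up to 5 chunks

    repl_overhead = 3.0              # 3x replication
    repl_failures = 2                # survives loss of 2 copies

    print(f"EC {k}+{m}: {ec_overhead:.2f}x overhead, tolerates {ec_failures} lost chunks")
    print(f"3x replication: {repl_overhead:.1f}x overhead, tolerates {repl_failures} lost copies")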

If you're going to give it a try, make sure you're using a recentish kernel, such as a late 3.x series or 4+ (as in Ubuntu 16.04). That has all the CephFS FSCache and upstream FSCache kinks worked out.


Thanks mtanski, this is great data.

We are running a recent kernel, as in Ubuntu 16.04.

The reason I'm not framing the caching so much at the CephFS level is that we are shipping a product, and I don't think all our customers will be running CephFS on their infra. Therefore we will need to optimize for that use case as well, not only focus on what we do at GitLab.com.

Thanks for sharing! Will surely take a look at this.




