Hacker News new | comments | show | ask | jobs | submit login

It sounds like they didn't design for the cloud and are now experiencing the consequences. The cloud has different tradeoffs and performance characteristics from a datacenter. If you plan for that, it's great. Your software will be antifragile as a result. If you assume the characteristics of a datacenter, you're likely to run into problems.

This got me curious again about the pluggable storage backends in Git (I assume AWS CodeCommit is using something like this). I've looked at Azure's blob storage API in the past and found it incredibly flexible.

Here is an article from a few years ago: http://blog.deveo.com/your-git-repository-in-a-database-plug...

In any case, GitLab is amazing and I can see how it's tempting to believe that GitLab the omnibus package is the core product. However, HOSTED GitLab's core product is GitLab as a SERVICE. That might require designs tailored a bit more for the cloud than simply operating a yoooge fs and calling it a day.

GitLab is limited by the fact that they need their hosted product to run the same code as their on-prem product in order to avoid forking the codebase.

If on-prem customers can't get AWS/GCE/Azure SuperFastCloudStorage™, then it can't be part of their codebase.

Exactly, we want our users to be able to scale with us using open source technologies. Some are at 20k+ users so they are close to needing something like Ceph themselves.

From looking at the comments here, thinking about what I would do/want, and what others have done to scale git, GitLab is in the minority in wanting to solve this issue with a Ceph cluster.

> Some are at 20k+ users so they are close to needing something like Ceph

Or they will scale it themselves, à la Alibaba: http://www.slideshare.net/MinqiPan/how-we-scaled-git-lab-for... . They appear to have written a libgit2 backend for their object store (among other things).
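To make the "libgit2 backend for their object store" idea concrete: git content-addresses every object by the SHA-1 of a typed header plus content, so any key-value store (S3, Swift, a database) can serve as the backing store as long as it honors read/write/exists by object id. The sketch below is a toy illustration of that contract in Python; the class and method names are made up and are not libgit2's actual C API.

```python
import hashlib

class KeyValueObjectStore:
    """Toy git-style object store over any dict-like key-value backend.

    Real pluggable backends (e.g. libgit2 ODB plugins) implement the same
    basic contract: read/write keyed by the object's content hash.
    """
    def __init__(self, backend=None):
        # Swap in any mapping-like client (S3, Swift, Redis...) here.
        self.backend = backend if backend is not None else {}

    def write(self, obj_type, content):
        # git hashes "<type> <size>\0<content>" with SHA-1
        data = b"%s %d\x00" % (obj_type.encode(), len(content)) + content
        oid = hashlib.sha1(data).hexdigest()
        self.backend[oid] = data
        return oid

    def read(self, oid):
        data = self.backend[oid]
        header, content = data.split(b"\x00", 1)
        obj_type, _size = header.decode().split(" ")
        return obj_type, content

store = KeyValueObjectStore()
oid = store.write("blob", b"")
print(oid)  # the well-known empty-blob id: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The point is that the hashing scheme, not the filesystem, defines a git object database, which is what makes backends swappable in the first place.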

I don't see a good reason why solutions using different storage backends could not make it into the OSS project. Many companies run their own Swift cluster, which is OSS.

If you're using CephFS and everyone else wants to be using other cloud storage solutions, that would actually put you at a disconnect with your users and leave room for a competitor with the tools and experience to scale out on cloud storage to come in offering support. I would at least consider all the opinions in this thread and maybe reach out to that Minqi Pan fellow from Alibaba with questions.

I actually really like GitLab and wish we could be using it at my company; this is why I'm spending so much effort on this topic (and scaling git is interesting). Hopefully my opinions are not out of place.

Thanks, we're in touch with Minqi Pan. I think we all agree the Ceph solution is great if we can make it work.

Can you go into more detail on the differences in tradeoffs and how one should design differently?

A blunt summary would be that everything is unreliable and disposable. You have to design for failure, because most things will almost certainly fail (or just slow down) at some point.

I learned a lot from Adrian Cockcroft when I was first moving to the cloud. He has a ton of material out there from Netflix's move to the cloud. I recommend googling around for his conference talks. (I haven't watched these, but they're probably relevant: http://perfcap.blogspot.com/2012/03/cloud-architecture-tutor...)

I'm trying to understand 'antifragile'. Are you trying to say: 'robust'? If not, what is the difference?

There's a whole book[0] about it. But to summarise (poorly):

'robust' - resilient against shocks, volatility and stressors

'antifragile' - thrives on shocks, volatility and stressors (i.e. gets better in response)

Antifragile is a step beyond robust. Examples of antifragility are evolutionary systems and true free market economies (as opposed to our too-big-to-fail version of propped-up, overly interconnected capitalism).

0: https://en.wikipedia.org/wiki/Antifragile

Can you provide any examples of working, real-world antifragile systems which were designed and built by humans and accomplish their purposes? Preferably in terms of software and hardware?

So far you've named "evolutionary systems", which are quite fragile in the real world, and an imaginary thing called a "true free market economy".

Taleb's book often came back to the example of the relatively free market that is a city's restaurants. Any one restaurant is fragile, but the entirety of the restaurant business in a city is antifragile - failure of the worst makes the marketplace better.

Just trying to define it. I'm not advocating for anything.

That said, if you are calling the entire multibillion year phenomenon of life on this planet "fragile" then we are not going to get on well.

Netflix with their chaos monkeys is a great and relevant example.

But their system doesn't thrive on chaos monkey. It's just resilient to it.

In this case, the anti-fragile system is the entire system, including Netflix engineers and the cloud over time. The cloud is stressed, maybe even goes down, but in response, becomes stronger and more reliable because engineers make changes.

Point-in-time human-engineered systems still can't really be anti-fragile, except perhaps in some weird corner cases, but the system as a whole with the humans included, over time, can be.

It should also be pointed out that "anti-fragile" was always intended to be a name for things that already exist, and to provide a word that we can use as a cognitive handle for thinking about these matters, not a "discovery" of a "new system" or something. There are many anti-fragile systems; in fact it's pretty hard to have a long-term successful production project of any kind in software without some anti-fragility in the system. (But I've seen some fragile projects klunk along for quite a while before someone finally came in and did whatever was managerially necessary to get someone to address root causes.)

Ah, true when you include the engineers in the loop I suppose. But then that becomes a vague term for any system where engineers fix problems after some load/failure testing.

When I think of antifragile systems, truly adaptive algorithms come to mind that learn from failure. For example, an algorithm that changes the leader in a global leader-election system based on the time of day, because one geographic region of the network is always busier depending on the time of day and latency to the leader impacts performance.
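The time-of-day idea above could be sketched roughly like this. Everything here is hypothetical: the region names, the peak-hour windows, and the policy of placing the leader in whichever region is currently nearest its traffic peak are invented for illustration.

```python
from datetime import datetime, timezone

# Hypothetical peak-traffic hours (UTC) per region -- illustrative numbers only.
REGION_PEAK_HOURS = {
    "us-east": range(13, 22),   # roughly US daytime
    "eu-west": range(7, 16),    # roughly EU daytime
    "ap-south": range(3, 12),   # roughly APAC daytime
}

def pick_leader(now=None):
    """Prefer a leader in the region currently in (or nearest) its traffic peak,
    so follower latency to the leader is lowest where load is highest.
    First matching region wins when peak windows overlap."""
    now = now or datetime.now(timezone.utc)
    for region, peak in REGION_PEAK_HOURS.items():
        if now.hour in peak:
            return region
    return "us-east"  # fallback outside every peak window

print(pick_leader(datetime(2017, 2, 1, 14, tzinfo=timezone.utc)))  # us-east
```

A real system would of course feed measured latency and load into this decision rather than a static table; the static table just shows the shape of the policy.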

Yes. It is stronger because it is attacked by it.

Isn't that bad because now you're depending on having stressors and shocks to have the best performance?

If you can't control the amount of stressors and shocks, you want a system that is neither antifragile or fragile, but strictly indifferent to the level of shocks.

Good questions. It is a very interesting subject. If you haven't read the book, I recommend it. I think you will find it interesting if you read it with an open mind.

I don't think antifragile works in the parent comment, robust would be better.

As for a definition, antifragile things get stronger from abuse--like harming muscles during a workout so they grow back stronger. If they were just robust, it would be like machinery (no healing / strengthening).

I suppose I had Netflix in mind... as part of their move to the cloud, they developed chaos monkeys to actively attack their own software to make sure it is resilient to the failures that will inevitably, but less frequently, happen as a result of operating in the cloud.

Stop using the term, you don't understand what it means. Chaos monkey is a form of testing. Testing is not antifragility.

Chaos monkey attacks their production environment. They must make their software stronger/more resilient/less fragile in response. It would be more helpful if you clarified how you think that's different from antifragility.

Antifragility describes a system that becomes more resilient automatically, because it is built such that it must, in response to attack or damage. Such a system doesn't require management. It doesn't require people to actively test, to think about how to make the system better, to implement the fixes.

I think that's an overly narrow definition of the term antifragility and interpretation of the system in this case.

While I can imagine software that is in itself antifragile, I think it's entirely reasonable to include the totality of the people and process that make up an operational system, in which case even your narrow definition applies here.

If you are including the developers in the system you're calling antifragile, then Chaos Monkey is also part of the system. Antifragility refers to a system that benefits from attacks or disorder from outside itself.

You are watering down the word so much that it wouldn't have reason to exist. Is every chair-making process antifragile since chairs get stress tested before being sold?

Chaos Monkey is just a tool to accelerate/simulate the disorder inherent to the cloud. The original point was that the cloud is unstable and hostile, so software designed for it benefits from that disorder. Granted, it's not doing so all on its own, but exhibits the effect nonetheless, and I think Taleb's book is full of similarly impure examples.

There's a world of difference between stress testing before something is sold or released and welcoming ongoing hostility throughout its lifecycle, and this difference is absolutely in line with the concept of antifragility.

They designed for a small operation with limited resources. That might be fair given their budget/funding, which we don't know.

At the moment, the discussion in the GitHub issues looks like people who are buying servers to put and run in their garage ^^

Planning for the cloud doesn't make your software antifragile. Antifragile != robust.
