I expected this to be some micro-optimization of moving a thing from taking 10 s...

nemothekid · on Oct 26, 2020

I'm not sure this is complacency - this just seems like regular old tech debt. The build takes 40 minutes but everyone has other things to do and there is no time to tend to the debt. Then one day someone has some cycles and discovers a one line change fixes the underlying issue.

I'm sure many engineering projects have similar improvements that just get a ticket/issue opened and never revisited due to the mountain of other seemingly pressing issues. From IPO to the start of the year Pinterest stock price had been trending downwards - I'm sure there was more external pressure to increase profitability than to fix CI build times. The stock has completely turned around since COVID, so I'm sure that changes things

dataflow · on Oct 26, 2020

IMHO (from having addressed such CI issues personally on teams that otherwise wouldn't bother) it's more likely due to other factors (a lack of interest, being scared of breaking the build, not being terribly comfortable touching build scripts, or the inability to run scripts locally) than to a genuine lack of time. The returns you can get can be ridiculously huge across the entire team compared to the hours you might spend, but I've found many people just aren't terribly interested in sitting down and digging into ugly scripts and pushing dozens of commits to figure out what might be slowing things down. And honestly, it's not exactly trivial to structure things in a way that's simultaneously both efficient and maintainable, especially if you're refactoring an existing system instead of starting from scratch, so that can be another turn-off.

MaulingMonkey · on Oct 26, 2020

For me the biggest issue is that CI is often siloed to hell and back.

Even when most of the rest of the engineering environment is fine, the build scripts and configuration often aren't under version control themselves, or are manually deployed - meaning any changes require access to carefully guarded server credentials. This may even be by design as a "security measure" - as if I didn't already have the ability to run arbitrary code on the build servers in question through unit tests etc. The gatekeepers in question are often an underfunded IT department that has too much on their plate already, and are underwhelmed by the idea of reviewing a bunch of changes to "legacy" code that they've somehow convinced themselves they'll rewrite "soon" that they don't directly benefit from anyways.

And I find I can rarely run the scripts locally. They're also often hideously locked in to a specific CI solution that I can't locally install without a ton of work on my part to figure out the mess of undocumented dependencies, and rife with edge cases that I can't easily imitate on my dev machines.

My preferred CI setups involve a single configuration file, checked into the same repository it's configuring CI for, that simply forwards to a low-dependencies script that works on dev machines. Getting there from an existing CI setup, however, can be quite the challenge.
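A minimal sketch of that kind of low-dependency entry point, assuming a hypothetical `ci.py` at the repository root that the CI configuration file simply invokes. The step names and commands are placeholders, not anything from a real project:

```python
import subprocess
import sys

# Hypothetical build steps; a real project would list its own commands here.
STEPS = [
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ("test", [sys.executable, "-c", "print('tests ok')"]),
]


def run_steps(steps):
    """Run each step in order and return the exit code of the first failure (0 if none)."""
    for name, cmd in steps:
        print(f"== {name} ==", flush=True)
        code = subprocess.run(cmd).returncode
        if code != 0:
            return code
    return 0


if __name__ == "__main__":
    exit_code = run_steps(STEPS)
    if exit_code != 0:
        sys.exit(exit_code)
```

Because the script needs only the Python standard library, the same command can run on a dev machine and on the build server, and the CI config file shrinks to a single line that calls it.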

Aloha · on Oct 26, 2020

Or just creeping build time over years: "it's always taken a while, I guess it just takes longer now". You don't bother optimizing things until they cause you sufficient pain to optimize them.

scsilver · on Oct 26, 2020

I can totally see a situation where the engineers who made the script are long gone and the new engineers are justifying their hiring by churning out features and trying not to break things, especially things they don't own and that affect everyone, like CI/CD. That annoying but manageable 40-minute wait just gets put on the backlog, waiting for half a year until someone with just enough experience and frustration pushes management to dedicate a bit of time to diving into the issue.

rhizome · on Oct 26, 2020

My assumption is some or all of those more than people thinking it's "fine," that it's deficiencies more than complacencies.

zorked · on Oct 26, 2020

Yup, it's all about incentives alignment. If you get promoted for shipping a feature but you don't get promoted for saving 40 minutes of everybody's time every day you will get a lot of features, delivered slowly.

pojzon · on Oct 26, 2020

This is the kind of thinking I tried to sell in my corpo.. where cloning monorepo takes 30m and building this monstrosity takes 1.5h (first time). Got scolded by management for saying - speed of changes should be more important than “looking busy” delivering stuff.

fn1 · on Oct 26, 2020

> I wonder what 'institutional complacencies' we have. Problems we assume are unsolvable but are actually very trivial to solve.

I spend a lot of time optimizing builds, because the effect is a multiplier for everything else in development.

But it is not an easy task. One issue with performance-monitoring is that you have to carefully plan your work, or you will sit around and wait for results a lot:

Try the build: 40 minutes. Maybe add profiling statements, because you forgot them: another 40 minutes. Change something and try it out: no change, 40 minutes. Find another optimization which decreases time locally and try it out: 39.5 minutes, because on the build-server that optimization does not work that well. etc.

You just spent 160 minutes and shaved 0.5 minutes off the build.

I'm not saying it's not worth it, but that line of work is not often rewarding.
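One way to avoid burning a full run on "forgot the profiling statements" is to leave per-step timing in the build permanently. A minimal sketch, assuming the build is driven from Python; `timed` and `report` are hypothetical helper names:

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per named step.
timings = {}


@contextmanager
def timed(step):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + time.perf_counter() - start


def report(results):
    """Format the timings slowest-first, so the biggest cost is on the first line."""
    ordered = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [f"{name}: {secs:.1f}s" for name, secs in ordered]
```

Wrapping each phase in `with timed("clone"):` and printing `report(timings)` at the end of every build means the data is already there the next time someone has cycles to look.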

On the flip-side I once took two hours to write a java-agent which caches File.exists for class-loading and managed to decrease local startup time by 500% because the corporate virus-scanner got active less often.
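This is not the Java agent itself, just a rough Python analogue of the same idea: memoize a file-existence check so repeated lookups of the same path don't hit the filesystem (and whatever scanner hooks into it) again. `cached_exists` is a hypothetical name:

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_exists(path):
    """Return os.path.exists(path), remembering the answer for later calls."""
    return os.path.exists(path)
```

The trade-off is staleness: a file created or deleted after the first check is not noticed until `cached_exists.cache_clear()` is called, so this only suits lookups where the set of files is effectively fixed for the life of the process.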

innagadadavida · on Oct 26, 2020

Considering the build host does this hundreds of times every day, a better solution would be to simply keep a git repo cache locally. That should be secure and reliable given git’s object store design?

Any simple wrappers for git that can do this transparently?
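Git can already do something like this with `git clone --reference`, which borrows objects from a local repository instead of downloading them. A minimal sketch of a wrapper, assuming a mirror was created once with `git clone --mirror <url> <cache_dir>`; the function names are hypothetical:

```python
import subprocess


def refresh_cache_cmd(cache_dir):
    """Command that updates the local mirror from its remote."""
    return ["git", "-C", cache_dir, "remote", "update", "--prune"]


def clone_with_cache_cmd(url, dest, cache_dir, dissociate=True):
    """Command that clones url into dest, borrowing objects from cache_dir."""
    cmd = ["git", "clone", "--reference", cache_dir]
    if dissociate:
        # Copy the borrowed objects so the clone keeps working
        # even if the cache is later pruned or deleted.
        cmd.append("--dissociate")
    cmd += [url, dest]
    return cmd


def cached_clone(url, dest, cache_dir):
    """Refresh the mirror, then clone using it as an object cache."""
    subprocess.run(refresh_cache_cmd(cache_dir), check=True)
    subprocess.run(clone_with_cache_cmd(url, dest, cache_dir), check=True)
```

Without `--dissociate` the clone depends on the cache staying intact, which is the main thing to watch for on a shared build host.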

manojlds · on Oct 26, 2020

Build servers don't git clone every time though. They do a git clean if needed, followed by a git fetch / git pull equivalent.

GoCD for example maintains a single copy of the repo on the server for every pipeline that refers to it and the agents have the repos that they work on checked out. Any local changes or untracked files are by default cleaned. There are settings to force reclone etc, but it's not the default.
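Roughly, the fetch-and-clean pattern for a persistent workspace looks like the sketch below. This is an illustration of the general approach, not GoCD's actual implementation; `refresh_workspace_cmds` and its defaults are assumptions:

```python
def refresh_workspace_cmds(workdir, remote="origin", branch="master"):
    """Commands that bring an existing checkout up to date and wipe local state."""
    git = ["git", "-C", workdir]
    return [
        # Download only what changed since the last build.
        git + ["fetch", "--prune", remote],
        # Discard local modifications and move to the fetched branch tip.
        git + ["reset", "--hard", f"{remote}/{branch}"],
        # Remove untracked and ignored files left over from earlier builds.
        git + ["clean", "-fdx"],
    ]
```

Note that `-x` also deletes ignored files such as dependency or compiler caches, so a build that relies on those would drop that flag.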

robjan · on Oct 26, 2020

In many cases the build agent is a stateless container which is destroyed as soon as the build is finished. In cases like this the repo needs to be (shallow) cloned each time.
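For the stateless-container case, a shallow clone of a single branch might be built like this. A sketch only; `shallow_clone_cmd` and its defaults are hypothetical:

```python
def shallow_clone_cmd(url, dest, branch="master", depth=1):
    """Command that clones only the most recent commits of one branch."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return [
        "git", "clone",
        "--depth", str(depth),    # truncate history to this many commits
        "--branch", branch,       # check out this branch
        "--single-branch",        # do not fetch any other branches
        url, dest,
    ]
```

Builds that need history (for example to compute a version from tags or to diff against a base branch) may need a larger depth or a full clone.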

dagmx · on Oct 26, 2020

That depends very heavily on the build infrastructure being used, however.

NikhilVerma · on Oct 26, 2020

I doubt that they started off with a 40 mins delay. It probably crept slowly as the repo got bigger and no one noticed it because of the gentle gradient. And they didn't have the time/resources to look into it.

edoceo · on Oct 26, 2020

You're confusing a full clone, which is OK to take a long time for a huge repo, with the fix, which was to specify a single refspec so they don't fetch the whole repo in CI.
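To make the refspec point concrete: a refspec of the form `+refs/heads/<branch>:refs/remotes/<remote>/<branch>` tells `git fetch` to update only that one branch. The sketch below shows the general technique, not the exact change from the article; the helper names are hypothetical:

```python
def branch_refspec(branch, remote="origin"):
    """Refspec mapping one remote branch to its remote-tracking ref."""
    # The leading '+' allows the tracking ref to be updated
    # even when the update is not a fast-forward.
    return f"+refs/heads/{branch}:refs/remotes/{remote}/{branch}"


def narrow_fetch_cmd(branch, remote="origin"):
    """Command that fetches only the given branch instead of every ref."""
    return ["git", "fetch", remote, branch_refspec(branch, remote)]
```

For example, `narrow_fetch_cmd("master")` produces `git fetch origin +refs/heads/master:refs/remotes/origin/master`.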

bluedino · on Oct 26, 2020

People probably did complain, but they were met with, "We're cloning a 20GB repo! It's not going to happen in an instant!"

raverbashing · on Oct 26, 2020

This is the real complacency

Did someone really think "well it takes 40min, what can you do about it?" and just left it as such?

I knew people who would have that mentality in companies that are not around anymore. Take it as you want.

Yes, git is hard, but you know, maybe someone else has a better idea, or you can check SO, etc. (I don't even know why they were adding the refspecs there)