Staying on the path to high performing teams (2018) (lethain.com)
103 points by bsilvereagle on May 9, 2021 | 60 comments



I have to agree with the other commenters about #1. Adding people adds team overhead, and that is a killer. It’s actually fairly surprising how people still think of that as a solution, when there is so much prior art showing it’s a lot less effective than you’d think (see Brooks, Fred, The Mythical Man-Month).

I encounter people all the time who are inordinately proud of their infrastructure and process, and of how that reduces team overhead while allowing them to treat engineers like LEGO blocks. I worked for a Japanese company, and they actually make this work (sort of, and at a big price). It requires a fairly massive and inflexible structure (despite being labeled “agile,” it tends to be remarkably rigid).

There’s absolutely no substitute for a seasoned, experienced team that has worked together for years, and is secure in its cohesiveness. “Tribal Knowledge” is a dirty word in today’s business culture, but it is, hands down, the most efficient “team glue” in the world. It is also thousands of years old. No high-tech equipment required. Mammoth hunters worked as a team, and achieved big results. Military teams are as old as history, and that is all about commoditizing “tribal knowledge,” and investing in people, not just weapons and tactics.

When I read “add slack,” I was like “How does adding a gab tool improve innovation?”. Then I read that it meant adding downtime/flex goal stuff, and totally agree.


Having been a Navy officer myself, I cannot agree more about the role of experience and training in team formation. So many software companies just throw together a few people and expect them to be productive right off the bat but that is simply not how people work. Reorgs every six months actively destroy any sense of team formation too.

That said, the military enjoys a level of personnel lock-in that few software orgs can dream of. I don't know many software engineers who would agree to a five-year contract with no option for leaving early.


> I don't know many software engineers who would agree to a five-year contract with no option for leaving early.

Which is why it's important to practice good, humane management. Good engineers can punch their own tickets. Need lots of carrot, and very little stick.

I kept high-functioning, smart, experienced, C++ image processing engineers together for decades on my team, despite mediocre pay, and a pretty challenging environment (no, I won't go into detail).

This required that I connect with each and every member of my team as a human, often on an equal basis, as opposed to as a "superior/inferior" basis. It also meant that I needed to learn about the drivers for each person, give them as much family time as they needed, back off from micromanaging, and find ways to help them to resolve differences in ways that strengthened the team.

As I have stated before, when they finally rolled up my team, after 27 years, the person with the least tenure had a decade.

In about a month, we'll be getting together for a cookout at one of my former employees' home. We haven't worked together in over 3 years, but we still share a bond.


The Mythical Man-Month, as I remember it, only argues that adding more people to a PROJECT is a killer, and that adding more people to a team gives you non-linear productivity gains. A project here is a self-contained piece of work with a scope and a deadline. In the long term, more people with a decent structure will give you more overall output, although less efficiency. Whether that is worth it is a business decision.


Good point, but the OP was using the term "team," while diagramming what is clearly a project lifecycle.

I think that the article is a bit of a "mashup." You can't really have a team "falling behind," but a team can certainly be "innovating."


I saw his diagrams as referring to a stream of projects rather than a single project. Or a backlog of tickets if you look at it from an agile perspective. In that environment you can definitely have a team be falling behind in the sense of items going into the backlog faster than they are worked on. Each item is a project with a deadline that is being missed. However the team metric is not merely those individual existing projects but the aggregate of all current and future projects. A larger team will usually help with that last metric in the long term.


"When falling behind, the system fix is to hire more people until the team moves into treading water."

I actually disagree with this statement. Teams can be falling behind for a variety of reasons, and throwing more people at the problem can often cause further issues, especially if the domain / technology / code base / system is complex.

Onboarding people into a team is an incredibly costly exercise, and doing it while a team is already struggling will often create a worse problem. You need to be sure that the reason the team is struggling is purely a matter of hours in the day and not due to other factors.


It's like every generation is determined to re-learn the Mythical Man-Month - the hard way. You crack it open and boom, there is your anti-pattern right there, called "Brooks's Law".


MMM was making a different claim. “Adding people to a project that is late will make the project more late.”

This is talking about the best way to minimize growing pains, assuming you want to grow your org.

> To spread hiring equally across the teams in need, or to focus hiring on just one or two teams until their needs were fully staffed? That was the question.

It’s not claiming that adding people will make the current project ship faster. It’s claiming that focusing hiring on underperforming teams will eventually buy them the slack they need to start fixing their issues and increase their velocity in the future, and that you get less total disruption by hiring many people into one team than by hiring one person onto many teams.


Completely agree - augmenting teams with extra people can have the opposite of the desired effect depending on the situation.

There's only so much work that can be parallelized efficiently before people start stepping on each other's toes.

Even if you are able to split things into independently workable chunks, the lack of domain knowledge in a new team member will still often result in a net negative for the first few weeks/months.
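
A rough way to put numbers on the "stepping on toes" part, assuming the simple pairwise model where every two teammates may need to coordinate (n people give n(n-1)/2 pairs). The function name is made up for illustration, and real teams obviously don't use every channel equally, so treat this as a sketch rather than a measurement:

```python
def communication_channels(team_size: int) -> int:
    """Count the distinct pairs of people in a team: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Doubling the team roughly quadruples the number of possible pairings.
for size in (4, 8, 16):
    print(size, "people ->", communication_channels(size), "pairs")
```

Going from 4 to 8 people takes you from 6 possible pairings to 28, which is one way to see why a doubled team rarely delivers doubled output.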


This is the only point in the article I disagreed with as well. My suggestion for the "falling behind" phase would be to buy a few days to allow the team to decompress from whatever shit coaster they are currently riding. Letting people actually unwind can help bring strategic focus and calm back into the fold.

Otherwise, this read like a documentary about my current company and product.


Agreed that this was my biggest disagreement with the article as well. You describe the solution to "falling behind" as giving the team a few days to decompress. I would create that decompression in a different way - by setting and sticking to priorities.

Ultimately, if you're falling behind, there's not enough capacity in the system, and so some things won't get done. Better to make that judgment explicitly by setting priorities, rather than having it happen ad hoc.


> hours in the day and not due to other factors.

I think the author’s framework is that most “other factors” would be fixed if the team had more slack. So increase the team’s head count, wait for them to gel, and then with the new slack they should be able to resolve the issues.

I agree that there could be other factors (eg a toxic employee, say a TL or manager that is holding everybody back, or an ill-defined mission/mandate). But this article is really just answering the question of where and in what order to add employees if you’re doing hiring, rather than giving you a reason to hire more people.


I think the unspoken part is that the new manager does not have the clout to push back on users/stakeholders, both because they are new and because the team has already lost the trust of its users/stakeholders.

Adding more people may not be the only option, but pushing back on demands or changing the wider organizational dysfunction is probably not even on the table.


I really enjoyed An Elegant Puzzle (the author's book, which includes this article). The greatest thing about it is that it is opinionated. For example, it gives you the optimal number of developers in a team. You can disagree with it, but at least you have something concrete to disagree with. So many career advice sources I read discuss things in such a general manner that it's hard to take away anything concrete from them.


It's interesting that all of these points talk about the speed of taking items out of the backlog but not about the speed at which they are added. Any team can be made to fall behind by unrealistic management action.


There's a technique I like, I think it's a Lean technique, anyway it's called 5 Whys. Whenever I get to act as a filter on requirements from the business, I bust it out to tease out the root-cause issues driving the solution. It's really good.

Essentially it boils down to asking Why as much as possible until you get to the original problem. From there the problem can be put in the backlog to be solved by the devs rather than the precooked solution.


The author has a complementary article that addresses that: https://lethain.com/limiting-wip/.


Exactly. When you are adding items to backlog faster than the team can process them, it is not obvious to me why "add more people to the team" should be the right solution.

Maybe there was a wrong decision made at the beginning of the project that needs to be fixed. Maybe version 1.0 has too many features. Maybe the project could be split into two independent parts (which will still require hiring more people, but with less complex internal communication). Maybe the basic infrastructure was not set up correctly (logging, unit tests, continuous integration) and there is no time to make it right because the visible parts of the application always get the top priority. Maybe the team members are working with a technology they never used before, and giving them a one-week training could increase their speed dramatically. Maybe there is some interpersonal conflict in the team, where things could be improved by removing a team member?

More importantly, why are you adding new items to the backlog so fast? Is your company doing "agile rituals, but with predetermined scope and deadlines"?

Let me guess: the answer "hire more people" is so popular, because it is a universal answer that does not require the manager to actually understand anything; and as a nice side effect, having more underlings increases the manager's importance.


As I understand it, the assumption here is that the team is already demonstrating high performance in terms of effort, and what causes the team to "fall behind" is the sheer amount of necessary work exceeding the team's capacity. In that case, hiring more people is the only solution that leads to progress.


What a pile of rubbish. Adding people to a team already falling behind? Specifically new hires? By the time the new people are up to speed, the experienced ones will have finished burning out and left, and the cycle continues.

Oh, and a system fix is to add process? That's about as counterintuitive as it gets.

Especially disingenuous: "A friend is six months into supporting a sixty person engineering group"...you mean managing. Say what you mean, and don't wrap it in nonsense like that.


What works, in your experience?


If the team is falling behind, stop incoming work until they catch up.

If the team is treading water, stop incoming work until they start making progress.

If the team is repaying debt but not at the desired capacity, then add more people. In that case, the effort of training and integrating a new person is just another form of debt.

In all cases, if the team is overloaded in some way (debt rising, not delivering quality, missing deadlines) also add slack.
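
To make the rules above concrete, here is a minimal sketch of them as a lookup. Everything in it (the `TeamState` enum, the `recommended_actions` function, the flag names) is hypothetical naming for illustration, and it only encodes the four rules as written, nothing more:

```python
from enum import Enum


class TeamState(Enum):
    FALLING_BEHIND = "falling behind"
    TREADING_WATER = "treading water"
    REPAYING_DEBT = "repaying debt"


def recommended_actions(state, below_desired_capacity=False, overloaded=False):
    """Return the actions the rules above suggest for a team in the given state."""
    actions = []
    if state is TeamState.FALLING_BEHIND:
        actions.append("stop incoming work until the team catches up")
    elif state is TeamState.TREADING_WATER:
        actions.append("stop incoming work until the team starts making progress")
    elif state is TeamState.REPAYING_DEBT and below_desired_capacity:
        # Only here is hiring suggested; onboarding is treated as another form of debt.
        actions.append("add more people")
    if overloaded:
        # Applies in all cases: debt rising, quality slipping, or deadlines missed.
        actions.append("add slack")
    return actions


print(recommended_actions(TeamState.FALLING_BEHIND, overloaded=True))
```

Note that hiring only shows up in the one branch where the team already has room to absorb new people.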


Have you seen that work anywhere, at any scale? What you’re describing is pretty much every single chronic underperforming team I’ve ever watched. It leads to an organization where everyone is essentially “blocked”, and since work tends to expand to take all available time, it’ll remain “blocked” forever (as you remove the main lever for having to unblock yourself - inbound work)


> Have you seen that work anywhere, at any scale?

Of course. It's the only logical and effective way to deal with the problem.

> It leads to an organization where everyone is essentially “blocked”

There's a big difference between being "blocked" and what the article called "falling behind":

>> "A team is falling behind if each week their backlog is longer than the week before. Typically folks are working extremely hard but not making much progress, morale is low, and your users are vocally dissatisfied."

So, you deliver one or two features, or consume a backlog item, but more work comes in, so there's net negative progress (as measured by the list of pending work items). So, you stop the incoming work so that the existing work can proceed to completion; the team can then figure out how to add capacity without falling behind again.

If the team is incapable of doing the work for some reason, that's a completely different problem.
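
A toy model of the quoted definition, for what it's worth. It assumes a constant number of items arriving and being completed each week, which no real team has, and the function names and numbers are invented for the example:

```python
def simulate_backlog(start, arrivals_per_week, completed_per_week, weeks):
    """Return the backlog size at the start and after each simulated week."""
    sizes = [start]
    for _ in range(weeks):
        # The backlog cannot drop below zero items.
        sizes.append(max(0, sizes[-1] + arrivals_per_week - completed_per_week))
    return sizes


def is_falling_behind(weekly_backlog_sizes):
    """True if each week's backlog is longer than the week before."""
    pairs = list(zip(weekly_backlog_sizes, weekly_backlog_sizes[1:]))
    return bool(pairs) and all(later > earlier for earlier, later in pairs)


with_intake = simulate_backlog(start=20, arrivals_per_week=8, completed_per_week=5, weeks=4)
intake_stopped = simulate_backlog(start=20, arrivals_per_week=0, completed_per_week=5, weeks=4)

print(with_intake, is_falling_behind(with_intake))
print(intake_stopped, is_falling_behind(intake_stopped))
```

With intake running, the backlog grows every week even though the team completes five items a week; with intake stopped, the same team drains it.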


> Of course. It's the only logical and effective way to deal with the problem.

...where, if you can share?


At the last two companies I worked at, before a big management shakeup at my now previous employer. That's when the Agile cultists came in, and started spouting drivel like what was in the article.

If you want to add capacity, fine, but you can't do it to a team that's already overloaded. When would they have the time to select, train, and integrate new people into their flow, if they don't have any bandwidth? So you put "add capacity" to the top of the backlog, and clear work until it can be done.


You know what the path of high performance teams is? Not growing that fast. The children of Google, Facebook, and other Giants dispersed across the lands, begetting many an offspring, who worshipped their lord, Complexity, solving problems they don't have.

Don't solve problems you don't have. Most companies think they need microservices, for example. It is an extremely expensive proposition. Debugging anything becomes a nightmare and developer ergonomics are shot to hell. There is a reason why we stayed away from distributed services in the past. What, you think I could not have stitched together 20 different Python services on my machine in 2009 and put my productivity into paralysis? Don't fool yourself about things like Docker making things easier. It doesn't help people reason about distributed systems.

Instagram? Monolith. 12 people when sold to Facebook.

WhatsApp? Monolith. 32 people when sold to Facebook.

StackOverflow. Monolith, with a lite SQL ORM for highly optimized queries.

Look at that - all those things we did in 2004, still work like a charm. But heyo, go ahead and spend a few months learning Kubernetes while the lean companies laugh at you in the rear-view mirror.


So much this. Same goes for all the Kafka usage, making development literally Kafkaesque. The data eventually flows through your microservice landscape, being eventually consistent, and performance issues become a whole different nightmare. Yea, maybe I just didn't read enough of the documentation, but maybe your 1500 users per month web-app doesn't need 7 microservices and Kafka as single source of truth, all held together by 8 developers.

It's astonishing how many teams want to use cool tech instead of creating a great tool for users, or how many teams think that technology will solve bad management.


So much this! As someone that regularly deals with data platforms and data engineering issues this nails it.

Today’s desire to make everything event driven and “real-time” boggles the mind. Solutions are contorted to fit the Kafka operating model.

The inevitable result is blog posts about how “we solved our self-inflicted problems that would have never existed if we chose a sane design for our real problem.”

Oh and never forget that no senior stakeholder ever really checks their info/dashboards in real time. Never mind actually making decisions in real time based on it.


Same! My data engineering team wants everything to be moved from batch-oriented to 'stream-oriented'. We are replacing our batch workflows with Flink/Spark streaming. But no one has been able to tell me what business use cases need streaming or what problem, if any, is being solved by this added level of complexity.


Sometimes, streaming makes sense. When I was at a FAANG, we had a sampled streaming database with nice plots (kinda, ugly but functional).

This was incredibly useful as it allowed teams to figure out what had happened in prod without needing to wait for the next day's batch jobs.

OTOH, most companies don't need something this complicated, and I'd imagine you could do the same with MySQL/Postgres if you're not at FAANG scale.


I agree. When it is the right solution to the problem at hand there is nothing better than a good streaming solution.

It goes for pretty much all technology, that you need to choose the right tool for the job. For some reason the marketing just gets most people all worked up and ready to fit problems into technology that was never designed for their problems.


I migrated an unmaintainable, over-engineered Java Spring application consisting of around 20 microservices, using Kafka all over the place, to one single Rust application. A relief for everyone in the company.


One reason this can be doable is that working in the 20 microservice environment is so unproductive that the original app probably did very little.


Exactly. Making sure those services work in production, and the constant troubleshooting in local environments as well would easily be most of the developers' days.


Just you alone? Do others in the company know Rust?

I really like the idea of Rust, I just worry about how things shake out In The Real World(tm).

I have no doubt Rust is here to stay for a while, so I think it’s a pretty solid play when appropriate.


What happened to the people who worked on those 20 microservices? What do they do now?


A few walked away. They did not agree with the move. But the company was really in need of a radical change, since development was stuck in complexity and maintenance.


Are you saying you rewrote not just the Spring application but all of the microservices as well?


I understood it as a distributed Spring application.


Note the relationship between the size of the team and the ability to manage a monolith.

A team that grows larger quickly needs some way to manage the complexity within the codebase and the difficulty of knowing whom to talk to. In theory, it is possible to build a modular monolith with clear ownership boundaries. In practice, this requires strong technical leadership.


Kelsey Hightower, a core Kubernetes engineer, once said that microservices were created because teams didn't want to talk to each other. One problem - they still have to talk to each other.


It's a question of how much they need to talk to each other.

My current company has a code base where, if we want to avoid falling into technical bankruptcy, teams need to spend 35 hours per week talking to other teams. The goal of well-defined interfaces is to reduce that to 2-5 hours per week.

Saying “microservices” is a concise yet oversimplified way to describe what is actually needed: well-defined modules with clearly-visible boundaries.


> Kelsey Hightower, a core Kubernetes engineer,

Not sure what you mean by 'core' but my impression was that he was more of an evangelist and a community manager.


My understanding of his role is vague, but my point is more that, even as an "evangelist" of K8S, he is not going to suggest that you use it because Google does.


Yup, papito is absolutely correct; microservices are just Conway's law. They're not for scaling the software, they're for scaling the organisation.


Which is why I wish there was a meaningful term like “pizza-sized services” as a way to say “these don’t need to be tiny, but they do need to be small enough that a 4-8 person team can handle the mental load of managing them.”


It always surprises me that startups want to grow in headcount.

I see this picture quite often. There is a small team that managed to create a great product. Then they get investments, grow in size, hire a few stars who previously worked at FAANG-like companies, and are now trying to adopt microservices. The initial team spirit is gone, and the people who worked there from the start are leaving.


Sometimes it's done for smoke and mirrors more than anything else. There's suddenly a ton of money pushing in your door, so people decide to start adding random titles to the payroll, even though it's probably still those 5-10 guys that are making it all tick.


Because they need to grow users and build a moat against competitors. If your product was made in 6 months by 5 engineers then a competitor can (and will) do the same. Users will generally want new features or capabilities or other such things. That is boring but time-consuming work, no matter how well engineered your product is. This type of work is very different from a greenfield product, so it'd be odd if many people didn't leave during the shift.


I'm generally in your camp (having been through 3 monolith-to-microservices migrations), but to be fair, at least two of your examples are pretty massive outliers:

* Instagram barely had any features when it was acquired - the entire API was like 3-4 endpoints, and all the web version did was literally _show a single photo_

* WhatsApp was extremely brilliant in its implementation/choice of language (Erlang) - they beat the competition's headcount 10:1 for the same scale due to that, but hiring Erlang engineers is no easy feat

Facebook is also famous for its monolith approach, so I wouldn't necessarily put Google and Facebook on the same page when it comes to the CS Gods they worship (-:


One shouldn't use outliers and one off examples to guide one's behaviors. Most startups will fail miserably. Most of the ones that don't and do eventually succeed will grow over time to a decently large size before exiting. So you should expect that trajectory and not a lucky exit before you hit 50 employees.


True, but existence proofs are useful if you don’t extrapolate too much from them. I.e., you _can_ be a unicorn with a monolith.

I think it’s not so useful to look at 30-person monoliths though, as you say. I’m sure there are 30-person startups that are jumping the gun and building microservice architectures? Don’t do that. First build the monolith.

But I think I see what you are getting at in the latter part, and I agree: the question is whether there are monoliths at 100- or 1,000-employee companies. I don’t know of any. I’d guess there are some examples, but I’d be comfortable wagering that they get much rarer the bigger you get.


Instagram is still a monolith. They started breaking out some high-load things for sure, but only once it makes data-driven sense.

https://instagram-engineering.com/static-analysis-at-scale-a...


> you _can_ be a unicorn with a monolith

Even if another company can do it...that does not mean that you can. Know thyself.

Jocko Willink can lead a team effectively on 5 hours of sleep a night. If I try to do that, I lose my grip on the fabric of reality and the meanings of words.

Know thyself.


I agree, I think what is unsaid, but assumed, in the article is that the owners of the business want to maximize profit. Other types of organizations (for example hackers organizing around Linux kernel development) do not share that goal and don't have this type of problem.


> Instagram? Monolith. 12 people when sold to Facebook.

So your metric for success is acquisition size, not good software design?


Companies sell working product and large, active user bases - not "good software design", a metric subjective at best. What you may call good design, I may call "unnecessary and wasteful over-engineering".


Yes, I was planning to edit my comment to add a better description of what I meant (and move from writing on mobile, to typing on my computer), but then I completely forgot about it.

So yes, agreed: 'good software' is absolutely subjective.

My point about 'good software' was about creating non-bloated, modular software - software that is well documented and easy for others to collaborate on, contribute to, or take over.



