Don’t Do This in Production (stephenmann.io)
696 points by pattrn 8 months ago | 284 comments

I want to underline this bit:

> they hired a team of developers without having a technical person on staff to vet them

For any non-technical entrepreneurs reading this: please don't do this. If you aren't competent to hire developers, please borrow or rent a few trusted technical experts and use them as a hiring committee. Otherwise you are less likely to hire the best technologist, and more likely to get the most glib and appealing technologist.

Over the years I have met far too many smooth-talking consultants and would-be employees. And I've seen them cause enormous garbage fires when managers hire beyond their ability to evaluate. The size of these garbage fires is a direct result of what got them hired: if someone is good at telling managers what they want to hear, they can go a very long time saying, "Yup, we're building it and it will be great! Just you wait and see!"

And then, no matter who you hire, demand a "ship early and often" schedule. Ship to internal users. Ship to alpha users. Ship to external testers. Heck, ship to just one person for one narrow use case. The earlier you can start seeing real-world success or failure, the earlier you can course-correct.

I was going to call out this quote:

> If you read this far hoping for an answer, then I’m sorry: I don’t have a simple one. This is a difficult problem to solve.

And yet there is an answer: hire an experienced engineer. The author glosses over the problem by not realizing that they "solved" it by hiring him.

It is a bad calculus: one "experienced" engineer is going to cost you maybe $150K-$200K, and for that you could get anywhere from 3 to 10 junior engineers, depending on how far out you go.

It would be awesome if there were some sort of certification for "experience," sort of equivalent to guild master status. It isn't an education thing; it is an "oh yeah, I've seen that before, let's not do that, here's why" kind of thing.

Projects fail for all kinds of reasons. Usually many reasons at the same time.

While I am sure that _some_ fail because of the abject inexperience of the staff running them, this is hardly ever the complete story. Moreover, projects do fail even with a fully experienced, fully vetted staff running the show.

The article just seems like a consultant marketing his acumen. There's nothing wrong with that. But keep in mind, he dropped in over a weekend and spent a few hours detailing problems that needed fixing and the staff subsequently made corrections and released on time.

It could easily go the other way too. The "inexperienced" team could have instead brought in another seasoned consultant who, after a few hours, got the wrong idea and led them down the wrong path precipitating an even larger slippage of ship date. THAT HAPPENS TOO and I suspect it is even more common.

I don't really agree. Sure, many projects fail for reasons other than developer experience, but over many years and many more projects, developer skill was the most frequent reason I saw.

Of course, that's probably more true of my environment (third world country, enterprise software) than others.

> Moreover, projects do fail even with a fully experienced, fully vetted staff running the show.

Sure - but in that case they usually don't fail because of technical decisions (which is what OP was talking about). There are however many other ways a project can fail (bad management mostly).

If you aren't technical yourself, how do you separate an experienced engineer from one who, well, isn't? This was the fundamental problem that the company failed to solve, and frankly I don't think there's any easy solution around it.

Use referrals.

Ultimately, if you cannot identify critical staff and don’t know anyone who can, you’re screwed anyway.

Exactly. Somebody running a company can't be an expert in everything. But they can be a good judge of people and have a wide enough network that they can get recommendations from people they trust.

A few of those trusted technologists will make a fine hiring/oversight committee. You won't need a ton of their time, so you can pay very generously for the insurance they provide.

Agree with this. Same applies to any expert: how do you identify an experienced marketer? Or accountant?

To hire someone with expertise to solve a problem in a domain where you aren't an expert: You find people who claim to understand the domain, and then discuss the problem with them in depth. Your goal is not to evaluate them in general but to understand how they analyze the problem and how they'd deal with it.

You may have to hire them as consultants. You have to dedicate a day or more to discussion with each of them -- maybe considerably more if your problem is complex. You have to reveal information about your plans and/or the condition of your systems that may make you anxious or embarrassed.

After you've gone over the problem separately with five or ten putative experts, you will have a pretty good idea which of them understand it (if any), and you will also know how to ask much better questions going forward.

If necessary, repeat this process, using your better understanding to find better experts and gain a better map of the territory.

Once you can rank expertise in your domain reliably, you might hire one (or more) of the known experts, but the best ones will be hardest to hire. Or you may use your new meta-expertise in the domain to look for more junior hires who have an unusually good grasp of your issues. Also, you may be able to get the best of your experts to act as advisors.

This takes a lot of time and may cost some money depending on the domain. But not doing it will take more time and cost more money in the medium to long term (and even in the short term sometimes).

AKA become expert.

I wonder if the people who become fantastically successful simply have the ability to evaluate expertise, without having any of it. i.e. by "reading" people, in some sense.

This relies on the "expert"'s own evaluation of their expertise, which mightn't be accurate, because of Dunning–Kruger, or conversely, self-doubt. So, one would need to "read" those too.

You definitely can't trust people's self-evaluation either way. Lots of really skilled people are self-critical. "Reading people" of course helps but it is nowhere near enough.

But you don't have to become an expert. If I hire a plumber or carpenter, I cannot do what they can. But with enough work I can understand which plumbers or carpenters will do a good job.

... or climate change scientist? Or economist? Or president?

The problem of separating experts vs non-experts might be the single biggest problem society faces right now.

Assuming you have a generally good education and brain to start with, a lot of hard work can usually accomplish it.

Think Elon Musk learning about rockets, back before he lost his mind.

I did it once for climate change, took a couple months, some math. Got there though.

Before he lost his mind? You mean the recent interview or before that?

This is the problem that credentials supposedly solve - degrees, certifications, professional body membership etc.

You don't realize the difficulty of the problem until you must hire someone outside your own areas of expertise.

Depends on the magnitude of the problem to be solved.

Small problem: Ask friends for someone they have worked with on a similar problem.

Large problem (building an org): Ask 5 VCs/CEOs/VIPs for intros to the best person they know in that role, with the explicit framing of “learning what good looks like.” Interview all 5 with open-ended questions. Ask each one for one more contact. Eventually you will learn to pattern match, one of the folks you like will volunteer themselves or a friend, and you will know how to interview.

I remember somebody on this forum saying the best way to judge someone's technical prowess is to ask them for a simple explanation of a complicated part of their field. If you can't summarize your knowledge, you don't know what you are talking about.

Unfortunately, a non-expert can't tell the difference between a good simple explanation and glib, confident ignorance. So I think that's a good test for expertise, but it's not one that a non-expert can fully judge.

References and portfolio, at the end of the day.

The orange juice test is one person's answer:


Hire a recruiting company. There are recruiting companies that have a staff of full time consultants that will bring in a whole development team and manage the product.

At some point, when you want to bring that knowledge in-house, the recruiting company will help you hire people or let you convert some of their contractors (not their full-time people) to full-time.

We have a lot of recruiters sending us "senior full-stack engineers" who don't know the language they are most confident in and end up being DevOps engineers who want to get into coding. A lot of recruiting firms don't know how to get candidates either.

I am not referring to recruiting companies that just send developers to a company. I am referring to recruiting companies that have full time staffed developers and project managers that manage the project for you in the beginning and then slowly help you build up your own staff.

Some will have a combination of onsite developers, “rural source” developers who are in a cheaper part of the US, contractors and outsourced developers but they manage the entire project.

I was offered such a position - the pay is above market but it required too much travel for me now.

Now, how do you separate a recruitment company that has solid, experienced engineers on board from one that doesn't but says it does? I've seen many of the latter, and many founders who thought they were getting the former in the deal.

If you are starting a company, I would hope that you at least know what your requirements are and you could sit down with the project managers to hash out acceptance criteria and deliverables.

And then you find that the acceptance criteria aren't met, the deliverables don't work, and your ship date is hosed. What then?

How is that a different risk than every other project that ever existed - even those led by internal teams?

Preferably for an internal project you have a trusted delegation chain that would catch the issue early on.

Two scenarios:

Scenario #1: You are a founder with a great idea. You hire a good software dev lead, who then fills out the department. You work with the dev lead to hash out your plan. They organize it, and after certain checkpoints, the dev team demos progress. As the founder, you are at the top of the chain, making sure there is product-market fit and that things are going as planned at a high level.

Scenario #2: You contract with a consulting company. Your project is led by a dev lead/project manager paid by the consulting company, and they fill out the team using their own staff.

They organize the team and after certain checkpoints, the dev team demos progress. As the founder, you are at the top of the chain making sure that there is a product market fit, and ensuring things are going as planned from a high level.

There is no difference in the delegation chain or the risks involved.

The main difference is that you are writing one check to the consulting company and they are paying the contributors and in the other case, you are paying employees individually.

The usual retort is that employees have more loyalty and ownership in the company. No, they don’t. The employees at your company, if they are smart, have one eye toward their next job, just like the consulting company has one eye toward its next contract.

The goal of the recruiting company is to get candidates hired. That may or may not conflict with the ultimate goal of the client doing the hiring, which is to get the job done.

I have this problem internally at my company. Recruiting has crazy goals (hire 30 developers per month), so they push us to accept whatever they can get.

Sure, we can keep refusing the candidates but at some point you have to accept someone to get the job done. Even if you need to double check everything they do.

The reasons for that are too complex for a small post but believe me, there are many.

The post you are replying to said:

> Hire a recruiting company. There are recruiting companies that have a staff of full time consultants that will bring in a whole development team and manage the product.

You are replying to a scenario that is just the opposite of my suggestion. The client company is not hiring developers at all. They are contracting out with the consulting company to manage the entire project. The founder is working with a project manager who is employed with the consulting company to develop a product. The recruiting company is doing all of the project management, development, QA, etc. The founder is hopefully coming up with acceptance criteria for the product since it was his idea.

After the project gets off the ground, the recruiting/consulting company works with the client to develop in-house expertise, starting with finding a dev manager who hopefully understands both the business and the technical side, and the dev manager works with recruiters to staff up.

You’re not hiring anyone at first. You’re basically outsourcing the entire development project to an outside agency. If you are a non technical founder, this may be the best way to go.

There is a local recruiting/consulting company that has been trying to get me to come work for them directly full time in the role of team lead, I just can’t do the travel right now.

I guess the assumption is that anyone or at least an inexperienced engineer would be able to point out an experienced engineer, while only experienced engineers are any good at picking less experienced engineers.

Deloitte has pretty good technical interviews. I'm sure they can provide you interviews as a service. Probably not going to be cheap, though.

Surely there is a big clue in how long they have been working for, how many projects they have done, what their roles were.

Based on my experience interviewing over the past several years, the companies that have a 'production' system seem to only value experience with their tech stack.

That’s because they have set something up and ironed out all the bugs so it will take some convincing to move them to a different stack.

There’s no easy way around it. In fact I’ve seen some teams too eager to rip up existing production workflows every time some new technology becomes popular and that’s not productive either.

I'm not saying that they should change their stack, I'm saying that they should consider that someone having extensive experience with related technologies probably knows the domain well and can come up to speed quickly on the team's shit.

That is sort of a tautological solution: hire someone who can solve the problem and you won't have the problem.

But how do you know when you have done that? Experience alone doesn't necessarily mean they can solve your problem.

Right, I think that's being missed in this thread.

Experience alone doesn't necessarily mean they can solve the problem and ship/write good code. I've worked with plenty of "senior" engineers that have 20+ years experience yet have never dealt with any serious issues. These usually work at companies where a small team of "ninjas" handles those sorts of things, and the run of the mill Dilbert coder shows up and does his thing. This is not a knock on those people. They are important in their own way, truly. But I would not want them to be the responsible party for steering the ship in a new app.

Define experienced engineer. I've seen many on-paper experienced engineers who would be worse than a junior engineer on a project.

> oh yeah, I've seen that before, let's not do that, here's why

Which can actually be a pretty bad way of going about it, IMHO, if you only rely on your own direct experiences. As in science, basing your decisions on a limited number of anecdotes is problematic, because small, biased samples are not representative of the whole.

You have to make a decision based on something. Experience isn’t a bad heuristic if evaluated correctly: this is why in most serious technical interviews today, you will often get asked about your past experiences.

> You have to make a decision based on something.

And, as in science, you can base it on collective experience rather than only your personal experience: preferably on rigorous studies, but we mostly lack those in CS. So hopefully you're reading books, blog posts, studies, and conference presentations, discussing with colleagues, etc.

> this is why in most serious technical interviews today, you will often get asked about your past experiences.

Interestingly the top companies seem to be focusing a lot less on experience and a lot more on problem solving/system design/etc.

You bring up a good point. I don't think developer interviewing is a field that has been studied using the scientific method, which is why we seem to have so many rules of thumb, cargo-cult traditions, etc. rather than experiment-backed results. I think a few companies do study this as a science (e.g. Google), but most others are either unwilling or don't have enough resources (or some mix of the two) to do the same.

> Interestingly the top companies seem to be focusing a lot less on experience and a lot more on problem solving/system design/etc.

As long as we're making conjectures: this is likely because a lot of them have tools and processes that are very different from what you see "in the wild". e.g. working at Google means you can leverage all their deployment infrastructure, something you likely didn't have access to outside (this is somewhat changing with k8s and cloud services).

Very few developers even trust themselves to vet the sorts of things this post is talking about. So our current level of in-industry interview training is going to make that a crapshoot.

The general reason given for non-representative algorithmic whiteboard coding problems is "you can't trust anybody's experience." But those questions are themselves not resulting in production-ready code, so you're flying blind on "can figure out a quick way to solve a problem" vs "knows what to do to turn a quick solution into something that will last."

And then there's a time-based calculation you have to make: hire fast and hope they'll learn on the fly, or slow down and delay until you find people you hope won't have to. You're hoping either way, after all. :)

> hire an experienced engineer.

But "experienced" and "cultural fit" are mutually exclusive for most startups.

An experienced engineer will tell you they refuse to release without X, Y, and Z, because if B happens your server will crash.

A junior won't even know about the problem and will hack it together.

But hey, that's what reddit /r/askprogramming is for... right?

As an inexperienced developer I'd just like to add that if you're going to have inexperienced developers.... JUST hiring an experienced developer is not an automatic solution.

You need someone who can bring other developers along, teach with purpose, etc. Experience developing alone does NOT automatically provide that.

There are experienced developers who are OK, some who are amazing, and some who, for all their skills, straight up can't lead or help other developers for any number of reasons. That doesn't make those who can't or don't want to bad or anything, just not suited for helping others.

> You need someone who can bring other developers along, teach with purpose, etc. Experience developing alone does NOT automatically provide that.

And the bloke needs time to train. If the developer is getting chased from dumpster to forest fire and back he won't be able to pass on knowledge and best practices.

I'm a good example of that. I've been in software for about 12 years now and am only just getting started in the mentoring/training type roles. And that's only because I super had to...

How do you deal with someone who gets to a problem, can't figure it out, and then just drops it without telling anyone or saying anything, and you don't figure it out for days or weeks?

You usually set up a system where you have constant feedback, like daily stand-ups and weekly (or biweekly) sprints. Before closing out an issue, have the devs demo the feature so it’s not based purely on word of mouth.

Personally, I would advise getting lunch with them and getting to know them better. Set up a 1:1 if lunch isn’t possible. Everyone likes honest feedback, or at least a chance to talk about their difficulties in getting tasks done.

> you don't figure it out for days or weeks?

If it was important and estimates on how long it will take are overrun, then don't wait to follow up on it and find out how it's going.

Unless the problem is more nuanced like a small component of a larger body of work? Like knowingly leaving flaws in an implementation, or something.

This sounds too simple, but you just don't let people go days without checking in.

I wrote about one simple process here: http://williampietri.com/writing/2014/simple-planning-for-st...

The key points to keep things on the rails: 1) it's a work queue for the whole team, and the whole team is responsible for every work item; 2) the units of work are small, so they should generally be finished in a day or two; and 3) you do quick daily check-ins to make sure work is moving along.

If you add pair programming with frequent pair rotation (e.g., 2-3x/day) then you make it impossible for a person to get stuck for long periods. Even if two people are stuck for a couple of hours, when pair rotation happens then the pair can ask somebody with specific experience to tag in.

You'll want to find out what they are stuck on and help or show them how to remove the initial impediment (they might be waiting on someone else or having trouble breaking down the problem into smaller workable chunks.)

Exactly. Not all technical mentoring is about software engineering; many of the problems junior employees face are regarding soft skills that they need to get their job done. While you might understand what to do if you're blocked internally, or if you're waiting on third party software, or if you get feedback that you don't understand on a code review, these are - new - problems to a lot of people getting started in the workplace, that many try and fail to solve on their own.

That sounds like a manager type situation if you have devs just not doing things and not telling anyone, rather than mentoring.... that's pretty terrible.

> You need someone who can bring other developers along, teach with purpose, etc. Experience developing alone does NOT automatically provide that.

Not to solve this problem they don't.

And how do you find the trusted technical experts? By using your trusted technical expert expert?

In my experience, it's quite common to have longer standing connections or friendships with people who won't be willing to join a particular startup but who might help with specific issues.

By not going into a business that requires development unless you are a developer or know one.

I wouldn't assume I could start a restaurant or run a medical practice, why do so many people with no technical background assume that they can start a business centering around software?

Because lots of people that haven't run a restaurant have successfully started a restaurant, and people who were in the restaurant business started a restaurant, but failed to keep it going.

History is littered with people who ignored your rule here and have been successful.

I guess we can flip that around: why do so many people with no business background assume that they can start a business?

First of all, if you have never worked in a restaurant in any capacity (chef, waiter, manager, host, whatever), I would say the chances of starting a successful restaurant are much, much lower than someone with some experience. And it’s easy to understand why, as even a short one year stint in a well run restaurant is probably enough to pick up industry best practices (that are centuries in the making) and make a handful of connections.

If you want to start a company which makes software, actually knowing about software is going to do wonders for your chances of success.

As for your flipped question, “business” is not a background. Other than perhaps Math, English, and common sense, there is no knowledge that is so abstract that it can be usefully applied to any kind of business.

Restaurants have an infamously high failure rate and low margins, so I wouldn't use that example if I was arguing for your position.

Unlike other people responding, I agree that people without a business background struggle when starting a business. There are a lot of developers who start a business and have no clue about pricing or marketing or sales.[0] They have a high failure rate.

If you know one side, you may succeed if you're a quick learner and/or your core idea is good enough, but you're at a disadvantage compared to an individual or team with knowledge of tech and business.

[0] Note that these are all things a developer could learn without a business degree, just like I learned to program without a CS degree.

> Because lots of people that haven't run a restaurant have successfully started a restaurant

[citation needed]

Do you have some examples? And do you have data supporting "lots"? Because I find that highly implausible.

I believe you've made a false equivalence when comparing software development to the management of a restaurant. Most people have basic knowledge of food preparation and distribution. They do not, however, have an understanding of the software development cycle.

There is a big difference between having limited knowledge of food preparation and actually running a restaurant without giving your customers food poisoning. The comparison is sound, maybe even perfect, as your comment proves. Do you know enough to be humble, or just enough to be dangerous? How long can I keep that rice pudding on the buffet table before I have to throw it away? How do you make sure the food actually stays fresh in the cooler overnight? The inspection protocols make for interesting reading, especially for new restaurants.

Prior to the FSMA (Food Safety Modernization Act) [1] there used to be very little regulation in terms of food safety in the US. Since the comment was suggesting that throughout history people have successfully opened restaurants without any related experience, I find it relevant to note that selling food was not always so difficult: if someone had the wealth, they could hire some chefs and wait for customers.

Nowadays, though, there are dozens of different dynamics involved in the management of restaurants. As such, it is very difficult for someone who has both a culinary and business background to open a successful food shop. To believe that someone who does not possess such an expertise could do well in this arena is foolish.

Managing restaurants has always been difficult, but it is becoming harder with each passing year. Nobody in their right mind would recommend a complete amateur to open one of these shops given the current economy. They would need some sort of experience in the food industry before even attempting the feat.

[1] - https://en.wikipedia.org/wiki/FDA_Food_Safety_Modernization_...

> why do so many people with no business background assume that they can start a business?

Because starting a business, in and of itself, is trivial. Children do it with lemonade stands. It's "the thing that the business does" that is hard.

This is not a chicken-and-egg problem, at least not for large organizations. Read this line:

> The department that built the product had recently come into existence, and they hired a team of developers without having a technical person on staff to vet them.

This is clearly a management failure. If you start a new department you should at least try to seed it with experienced employees from other parts of your company, even if they're on a loan.

Talk to the most technical person you know. Ask them for an intro to people they would most want to work with.

Repeat this process several times until you have someone who will interview for you.

I get the joke, but possibly for others wondering: everyone knows someone more technical, and they can keep asking. Also, a lot of entrepreneurs know other entrepreneurs who might help with this. Either way, they can find technical websites to help, such as founderdating.com.

I know people who are considered technical by shared non-technical friends - and I think they even are, to some extent. Still, some of those are still very inexperienced and I certainly wouldn't trust them to be able to properly vet new developers yet on their own.

I think to some extent, it's a matter of being lucky picking the right person to vet, since you can't vet that person yourself.

By trial and error. Seriously.

From a higher level: maybe don’t start a software company. Just because I have an idea for, say, a better oven doesn’t mean I should open an oven business before I have at least some industry experience and knowledge. In other words, if your whole business model depends on a specific set of skills, then you should at least have some knowledge, or an expert on your team, as a starting point. If you start a plumbing business, you don’t open it with zero plumbers on staff and no experience in plumbing. But for some reason this is fairly common in software...

I once worked at a company with no product management experience. Because of this they hired a string of toxic Steve Jobs wannabees because they simply didn't know what they didn't know.

They lacked experience in many areas. Ironically they ended up firing their only experienced executive (CTO). It's too bad, they had passionate engineers who could have built some amazing things (as evidenced imo by the amazing things they later built for Netflix/Google/Facebook/Amazon/etc) but they were stuck at a company that sort of wanted to be a tech company but didn't want any of those annoying "IT people" involved in decisions.

Your last phrase is key: They really didn't want the experts, they just wanted the results. No one can hire good people if they think they already know better. Good people got good by picking good learning situations, and avoiding tarpits.

I couldn’t agree more. I just wanted to also add that this is something that you have to always keep in mind for every growing org, not just new ones. I’ve seen situations where Management has decided to pursue new product directions by hiring a whole new team, but without looping in their existing tech teams. It unfortunately had the same expected result.

I doubt that strategy would have been of interest to this organisation, due to this part:

> they had never released a production application before.

It's entirely understandable to end up with a poor-performing team if you start out without any in-house technical knowledge. But how did they end up with a team comprised entirely of people without any experience at all? That to me indicates that, rather than being unable to identify good candidates, they intentionally tried to cut corners. Telling an organisation like this how to identify better candidates isn't going to make a blind bit of difference because they aren't interested in better, they are interested in cheaper.

"they aren't interested in better, they are interested in cheaper"

Or maybe they suck at sourcing candidates, but don't know it. So they hired the top 1% of their candidate pool, without knowing what good looks like.

Even people with zero technical knowledge aren't likely to hire a team of people with zero experience accidentally.

Yeah, there is some "law" here that I can't quite formulate about how developers can't recognize skills above their own level.

This makes it impossible for someone with skill N to hire people with skill N+1, other than by dumb luck.

I think people can recognize those at level N+1, but not (reliably) level N+2 and beyond. In fact they are likely to think that people much above level N+1 are "crazy".

Current political discourse contains lots of examples of people at level N+1 (or emulating that) convincing lots of people at level N that all the people at level N+k, k>1 are dumb, crazy or evil.

Political discourse has very little to do with IQ and very much to do with persuasion and hallucination bubbles.

Yes but... the level you are at is not just or mainly a function of IQ. I cannot evaluate the quality of a structural engineer who is much more skilled than me. Their cool new structural ideas could be brilliant or just terrible; I can't tell.

A key issue here is how one responds to not knowing how good those ideas are. Dangerous responses: "I can't tell so they must be crazy" or "I can't tell so they must be brilliant." Accepting that one doesn't know mitigates the risk.

I like this quite a lot!

At every level of development since I started getting serious about my career 10 years ago, I knew what I needed to know to get to the next level. That’s how I chose companies to work for. It was really easy to tell if someone had more experience in something I had a passing knowledge of.

10 years later, I still know what I need to learn to get to the next level.

> there is some "law" here that I can't quite formulate about how developers can't recognize skills above their own level.

Perhaps you're trying to reformulate/restate the "Blub paradox" [1]?

[1] http://www.paulgraham.com/avg.html

I don’t know. Is that really a law? I feel like I’m relatively confidently able to pick out people above and below my level by now.

In short, don't let your ego overrule your logic. Too often I also see people who consciously or perhaps subconsciously prefer to hire folks who are either less knowledgeable than them or who agree with them on everything. It makes them feel more at ease but the project will suffer for it.

I do like your idea of shipping early and often so you can see the progress and actual results. If I had a dollar for every "we completed 80%" but not being able to demo the product excuse, I'd be so rich that I'd be buying Jeff and Bill dinners. :)

It’s super easy to get experts to vet candidates. Then again it’s also super easy to take a used car to get it checked out, but nobody does that either.

> Then again it’s also super easy to take a used car to get it checked out, but nobody does that either.

Neither of these assertions is true. The second one is easily falsified (even not being pedantic about "nobody"), as evidenced by the existence of "used car inspection" on the price lists of at least some mechanics, if anecdata is not compelling [1].

The first assertion, in its falsity, makes for an excellent analogy to the topic at hand, which is evaluating developers.

Even a legitimately trustworthy, expert evaluator will not be able to provide an evaluation that reliably predicts real-world performance over the next several years. This is true for interviewing candidates and for used cars. What one hopes for from the evaluation is to significantly reduce risk by filtering out obvious (by one's own definition) dealbreakers.

[1] Personally, I only buy used cars (typically 8+ years and 75k+ miles). When it's a non-dealer seller, the logistics of getting the car to my mechanic for an inspection are the least easy part of the process.

I am surprised. Is it hard to get a car inspected in the US?

In France, every used car sale must be accompanied by a recent inspection certificate. Go to any inspection center and they will do it easily for a fixed fee.

Note that inspection centers can only offer inspections, no repair of any sort, to avoid the obvious conflict of interest.

> Is it hard to get a car inspected in the US?

That depends on the definition of "inspected".

> recent inspection certificate

What information is provided? Are there any details, or just that it passed inspection successfully?

Details that are important to me are generally proxies for abuse or poor maintenance habits (e.g. cheap aftermarket parts), though sometimes they're merely an indication of what maintenance has or hasn't been done, in the absence of records (e.g. signs of age/crack on rubber parts).

> Go to any inspection center

This, already, makes it useless for me. I prefer a certain subset of car manufacturers, and I need an expert in the repair of that make of vehicle to adequately reduce my risk to tolerable levels. At the very least, as the buyer, I need to be able to choose the inspector, to avoid the obvious conflict of interest.

> easily for a fixed fee

This sounds remarkably like California's requirement for a recent "smog" certificate prior to a used car sale. It's easy enough, but some sellers still don't do it. Other than avoiding a minor hassle and cost, it offers almost no benefit to the buyer.

> only offer inspections, no repair of any sort, to avoid the obvious conflict of interest.

This conflict would seem to exist only because it's chosen and paid for by the seller. As a buyer, I very much want the inspection to be done by a mechanic who is experienced in actually maintaining and repairing that particular vehicle.

There are hundreds of details that are checked. If anything is not in good condition, it's written down. If it's at all important, you have to fix it and come back for a follow-up visit.

Cracks would most certainly be part of it.

You can choose where you do the inspection. There are a lot of choices; it's like repair shops, but specialized in inspection. It's done by real mechanics and it's certified.

> If anything is not in good condition, it's written down.

Having thought about my own process, I realized that, even if all the relevant details are exhaustively documented in writing (which would be a huge waste of time/money), what's valuable to me is being able to talk to my mechanic about them.

The written report is just a summary/highlights for me (and the seller) of topics that are either directly of concern, or, as I mentioned, proxy indicators for issues that are impossible to spot with an inspection.

> If it's at all important, you have to fix it and come back for a follow-up visit.

That is something I most definitely do not want. If there's any major maintenance or repair to be done, I trust my choice of mechanic more than the seller's.

I also very much don't want a forced repair to cover up how bad the situation had gotten previously. If the certification process doesn't include the full history, including failure/remediation, then it's a borderline scam.

> Cracks would most certainly be part of it.

And yet they aren't certain to be signal instead of noise. Context matters. Reading "surface cracks in timing belt and CV boots" is useless.

Knowing the difference between the timing belt merely showing signs of age appropriate for the age/odometer reading of the car (and that it can be replaced on schedule) versus being shockingly old and could snap at any moment, possibly bending the valves (which varies by engine design, so even that context is important), is a hugely valuable indicator.

> You can choose where you do the inspection. There are a lot of choices; it's like repair shops, but specialized in inspection.

I, the buyer, can? You've made it sound like it's the seller doing the choosing. Do sellers end up with a stack of certificates from different buyers' choices?

> It's done by real mechanics and it's certified.

At the risk of sounding like a No True Scotsman argument, I'm skeptical. If these shops are employing their mechanics full time, and they only perform inspections, they're not real mechanics, for my purposes.

Certification is meaningless, unless it provides the buyer with some kind of practical, monetary protection.

So, yes, useful inspection, that allows me to buy used cars that have a predictable long-term maintenance/repair cost of USD0.12/mile is hard (in the sense of tedious and time consuming). An inspection that merely prevents me from being cheated and buying a total piece of junk (in some large number of cases) may be easy, and, though not totally useless, may be close, if it doesn't improve my chances beyond random luck.

I think you have trouble understanding the concept because it's too remote from what you are accustomed to.

The seller has to get the car inspected. It's a standard inspection to detect many, many problems. The buyer can certainly get it re-inspected if he likes. The inspection is a legal document that's important in case of litigation. It provides protection to the buyer and the seller.

A car cannot be sold without an inspection certificate. I think the car is also forbidden from being on the road if it didn't pass an inspection in the last 2 or 3 years.

A timing belt that's shockingly old and on the verge of breaking is a serious issue. The vehicle should not be allowed to be sold or driven in these conditions.

> I think you have trouble understanding the concept because it's too remote from what you are accustomed to.

Oh, I understand the concept. It's government regulation of behavior with the intent of protecting the public. I'm quite accustomed to a form of it, with California's emissions checks, as I mentioned.

I just don't believe it produces the desired (to me, the used car buyer) outcome and therefore don't want it [1]. I'd be interested in seeing data showing otherwise. Are long-term maintenance costs reduced by this system?

> The buyer can certainly get it re-inspected if he likes.

I posit this is the inspection whose ease (and frequency) you need to be comparing, not the initial, seller-initiated one. How many buyers in France actually bother? If it's close to zero, then it's just as hard (if not more so) to get a car inspected in France as in the US.

> The inspection is a legal document that's important in case of litigation.

Now that's something I do profess being entirely unaccustomed to: litigation over a used car. I don't think I even know someone who knows someone who has been involved in something like this.

I'm sure it's because US laws don't, in general, support it (and our courts, even "small claims" ones, wouldn't have the bandwidth for it).

I don't want it, anyway, as it could serve to drive up the average price of used cars or drive down their availability. I have a high risk-tolerance [2], so I favor a freer market.

> A car cannot be sold without an inspection certificate. I think the car is also forbidden from being on the road if it didn't pass an inspection in the last 2 or 3 years.

This is exactly like California's "smog", and like other states' "safety" inspections (which may now mostly be in the past).

> A timing belt that's shockingly old and on the verge of breaking is a serious issue. The vehicle should not be allowed to be sold or driven in these conditions.

I strongly disagree. Although even I might personally choose to tow such a vehicle to the mechanic after purchase instead of driving it, I most certainly don't want the government forcing me to do so [3].

I most certainly don't want the seller being forced to cover up previously poor maintenance habits by replacing only the items that are visibly bad. That actually increases information asymmetry, in a way that cannot be alleviated with additional inspection by the buyer, which is, presumably, contrary to the point of any required and documented inspection. (You never answered whether France's inspection certificate includes a full history of such failures.)

Although it doesn't guarantee the behavior, it certainly also encourages the pre-sale repair of only the required part and to use as cheap a replacement part as possible. To the extent that, as a buyer, I'm reimbursing the seller for the expense of that repair as part of the purchase price, it's usually money flushed down the toilet.

The example of timing belts on Hondas exemplifies my predicament. The water pump is driven by the timing belt. A water pump never lasts two timing belts. The vast majority of the cost of replacing either is labor, and the incremental labor cost to replace both at the same time (plus ancillary drive belts) is close to zero. Total cost is around $700 (and reliably lasts 75-120kmi depending on model). A timing belt (only) replacement might be $400, possibly even less with a cheap aftermarket part. The problem is, the water pump could fail immediately (likely enough if the timing belt was way overdue before replacement), and that repair is going to be around $600, plus towing costs and the cost of my time and inconvenience of a breakdown [4]. If there's an aftermarket timing belt, I'm just having the whole $700 replacement done immediately and not risking it, just as I would with an obviously-too-old belt. If neither of those signals is there, I'm left with the risk of not knowing. That risk exists regardless, but it's increased if the seller is routinely forced into this behavior.
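To make the trade-off in the Honda example concrete, here is a rough expected-cost sketch in Python. The dollar figures are the approximate ones from the comment; the extra breakdown cost (towing, time, inconvenience) and whatever failure probability you compare against are hypothetical inputs, not data.

```python
def breakeven_failure_probability(belt_only: float,
                                  belt_and_pump: float,
                                  pump_repair: float,
                                  breakdown_extra: float = 0.0) -> float:
    """Probability of the old water pump failing soon after a belt-only job
    at which both choices have the same expected cost.

    Solves: belt_only + p * (pump_repair + breakdown_extra) == belt_and_pump
    """
    return (belt_and_pump - belt_only) / (pump_repair + breakdown_extra)


# Rough figures from the comment: $400 belt only, $700 belt + pump, $600 pump repair.
p_no_tow = breakeven_failure_probability(400, 700, 600)
# Hypothetical: add $150 for towing and lost time.
p_with_tow = breakeven_failure_probability(400, 700, 600, breakdown_extra=150)
print(f"{p_no_tow:.2f} {p_with_tow:.2f}")  # 0.50 0.40
```

If you believe the old pump is more likely than that threshold to fail before the next belt interval, the combined job is the cheaper bet in expectation; any breakdown cost only lowers the threshold.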

[1] I was reading about Lisp's "social" problems on c2.com recently, and there was a mention of an attitude of "you only don't like it because you don't understand it." Such an attitude could be uncharitably condescending, but I don't intend to suggest such an attitude here, as it would be uncharitable on my part.

[2] There already exist options for a relatively wide spectrum of risk tolerances, including new cars (which are covered by "lemon laws" and have available extended warranties, a.k.a. mechanical breakdown insurance (MBI)), manufacturer-"certified" used cars (which, with some manufacturers, makes them eligible for the same extended warranty/MBI as a new car), and merely not-as-old used cars with third-party MBI. I actually purchased and used MBI on my one American car (a full-sized van, so no Japanese, or even, at the time, European options), originally purchased new, and it worked out very well.

[3] You may detect a ("small L") libertarian political bias here, which I admit to, but I actually would agree with regulations prohibiting driving a vehicle with maintenance problems that are immediate safety concerns. Tires with inadequate tread depth, for example, continuously affect the safety of the vehicle during its operation. The same can't be said for a timing belt that may fail.

[4] Oh yes, I forgot to mention, reliability is another goal besides my 12c/mi long-term maintenance. In my entire time of implementing this goal (i.e. after my first couple cars of my youth, where my explicit strategy was to save money with minimal maintenance and by not repairing until a breakdown occurred), excluding the aforementioned American car, I've experienced only two mechanical failures that rendered the car undriveable. Interestingly, both were burst coolant hoses, and both were close enough to be emergency-repaired and driven to the mechanic without a tow. This is 20+ years and over a million miles.

You'll excuse me, but your comment comes off as if you don't trust the government or any regulation to achieve its goal. It's almost a satirical US view.

The regulation works fine here. It ensures that problems can be detected and acted upon. It's simple, and it's much better than having no inspections at all or having inspections run by repair shops.

As for documentation, the seller should provide the car's history, although there is no legal requirement for it. The few expensive tasks, like the belt, are definitely things for both parties to watch out for. It's important when negotiating the price up or down, or when deciding not to buy at all.

I've had the case once of buying a car second hand, where the seller meticulously kept every visit and repair bill since he bought the car many years ago and it was exclusively done at the manufacturer repair shop.

> You'll excuse me, but your comment comes off as if you don't trust the government or any regulation to achieve its goal. It's almost a satirical US view.

Although it's true that I'm generally skeptical of this form of regulation, primarily because there's plenty of evidence of its ineffectiveness, in addition to plenty of undesirable unintended consequences, I'm not categorically against [1] regulation.

More importantly, I fear you've missed some nuance, in your haste to apply that uncharitable characterization.

Even if the regulation is successful in its goal, that goal still differs from my own.

In this situation, I want neither what (I think) the regulation purports to achieve, nor what it actually achieves.

> The regulation works fine here.

I'm sure you think that, because that's what you're accustomed to, if I may paraphrase your earlier comment.

> It's simple, and it's much better than having no inspections at all or having inspections run by repair shops.

You say this, but you don't back it up with credible evidence or even reasoning.

I'm not, of course, suggesting that prohibiting any inspection would be better, merely not requiring the "simple", mandated process you've described. I've explained why it's worse than buyer-chosen inspection.

You certainly haven't explained why inspections run by repair shops would be worse, other than alluding to a conflict of interest. As I mentioned, I only saw such a conflict with the seller, not me, the buyer.

> It ensures that problems can be detected and acted upon.

This is an example of a goal that I don't have. An arbitrary definition of "problem" on someone else's car I, at best, don't care about.

> The seller should provide the car's history, although there is no legal requirement for it.

If it's not included/mandated, then it's safe to assume it's not going to happen (often enough that it cannot be relied upon). "Should" doesn't enter into it. It means that the forced repair prior to certification is tantamount to a cover-up. This is an example of, at best, an unintended consequence, and certainly a goal I don't want.

> The few expensive tasks, like the belt, are definitely things for both parties to watch out for.

That was neither a particularly expensive example, nor is it particularly rare. Clutches and suspensions can easily be more expensive and have the same OOM of wear-out time, for example. Their remaining life is also generally unknowable from inspection, so proxies that are visible, like the timing belt, are critical to me.

As I said, the repair-before-certification part of the inspection process can make it impossible for me, the buyer, to "watch out".

[1] Trust is an entirely different matter. It has to be earned. Better yet, don't resort to trust in the first place or "trust but verify". We can measure outcomes of regulation. There's no excuse for me (considering myself intelligent, informed, and scientifically-minded) blindly accepting "trust us! we know what's good for you!" from any group of humans, be it politicians, civil servants, corporations, or individuals.

> Then again it’s also super easy to take a used car to get it checked out, but nobody does that either.

I did that...

Wow. Awesome. Thanks for showing that you are smarter than the majority of people who don’t. Have some empathy for the normies out there.

How do I get experts to vet candidates? Where do I find experts, how can I assess them, hire them to do this?

I don't fully understand why there isn't a standard test or the like for vetting programmers. Most companies ask similar types of interview questions, so it seems one place could do it for a job seeker and give them a grade.

I once read, and have found to be true, that one can only train someone else to be 80%, or whatever, as good as you. If you let that person train someone else, then they will be worse than that.

Teaching someone only to 80% might be true on average, but they have to figure out the remaining parts on their own. If you have 20 years of experience, you surely won't train a newbie to that level in a short time and expect him to train others to your level. For certain skills it works, but mostly you just need experience, and that comes with time and a willingness to learn.

The thing is, you also need a technical person on staff who is good at vetting. I consult and am starting at a new place in a week. I expect it will be fine, but the interview was an HR person and two techies: the HR person asked some questions and then asked if the techies had questions. "No, he seems fine." Which of course I am, but still, having trusted techies is not the same as having trusted techies who are good at determining whom not to trust.

The CEO of one of the companies I worked for hired without any consultation with technical folks. The company barely exists, and he is no longer the CEO.

“Move fast and break things,” they said. It turns out that’s a pretty bad idea when your business relies on a small number of large customers. Broken products tend to scare them off, which in turn tanks your business. There’s a lot to be said for building things that work, but “move slowly and steadily towards a goal” just doesn’t have the same ring.

In reality, there’s a balance between moving fast and moving slow. It’s difficult to communicate that balance because every type of product demands a different balance. I suppose that intuition comes from experience, which is a terrible answer for someone trying to learn.

I'm guessing this continuum also depends on the nature of the business one is in. In a fast-moving, consumer-oriented world like social networking, "move fast and break things" is probably very good advice.

In a slower-moving, enterprise-oriented world, "move slowly and steadily without breaking things" is probably the right orientation. Those kinds of companies probably die in fast-moving consumer land.

A while ago I read an article about the programming teams that work on the space station, on satellites, and that sort of thing. Alas, I cannot find the link right now, but the gist was that those teams move very slowly and work very hard towards correctness. Any bug found is generalized and their entire code base scoured for similar bugs. That's a different environment than social networking.

Most enterprise customers I've dealt with don't even care that much about things breaking; they care about control. That's the much bigger thing. Things break. Let's just make sure we don't break things while our customer is dumping silly amounts of money into an EU-wide advertising campaign. Or we tend to freeze updates during Christmas, due to our customers' Christmas business.

With those guys, it's fine to execute risky changes on production systems, as long as those changes are tested, communicated, scheduled for a non-critical time for the customer, and include backout plans.

But yes, this naturally slows things down, or makes systems more complex and expensive due to redundancy and staged rollouts.
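The "tested, communicated, scheduled outside critical periods, with a backout plan" rule can be written down as a simple pre-deployment gate. This is a minimal Python sketch; the `ChangeRequest` fields and the freeze dates are hypothetical, not any particular team's tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChangeRequest:
    description: str
    tested: bool
    communicated: bool      # customer informed in advance
    scheduled_for: date
    backout_plan: str = ""  # empty means no plan written down


# Hypothetical freeze windows agreed with customers (inclusive start, end),
# e.g. the Christmas period or a big campaign launch.
FREEZE_WINDOWS = [
    (date(2024, 12, 1), date(2024, 12, 31)),
]


def deployment_blockers(change: ChangeRequest,
                        freezes=FREEZE_WINDOWS) -> list[str]:
    """Return the reasons this change may not go out; an empty list means go."""
    blockers = []
    if not change.tested:
        blockers.append("not tested")
    if not change.communicated:
        blockers.append("not communicated to the customer")
    if not change.backout_plan.strip():
        blockers.append("no backout plan")
    for start, end in freezes:
        if start <= change.scheduled_for <= end:
            blockers.append(f"scheduled inside freeze window {start}..{end}")
    return blockers


risky = ChangeRequest("schema migration", tested=True, communicated=True,
                      scheduled_for=date(2024, 12, 20),
                      backout_plan="restore from snapshot")
print(deployment_blockers(risky))
# ['scheduled inside freeze window 2024-12-01..2024-12-31']
```

The gate does not forbid risky changes; it only insists they carry the safeguards the customer cares about, which is the "control" described in the comment.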

Enterprise-wise, they are very forgiving in my experience unless it affects their customers. Then they are very unhappy, understandably so.

It really comes down to risk management. Engineering is the discipline of making trade-offs, one of which is deciding when a risk is acceptable.

If the risk to your end users is low, or the hazards are minimal, then yeah you can push crap code to production daily because nobody is actually harmed by it. If the crap code carries a risk of say bringing down your payment or accounts system, causing you to lose revenue, then you might spend more time mitigating the risk by testing, reviewing, and improving the code.

On the extreme end of the spectrum, where bugs can crash your spacecraft into the atmosphere or break your $300 million probe, you tend to spend a lot of time validating and verifying your solution because a single mistake is very costly.
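The trade-off in this comment is the usual expected-loss calculation: the probability of a failure times its cost, compared with what it costs to reduce that probability. A minimal Python sketch; every number is an illustrative assumption except the $300 million figure taken from the comment, and it ignores harms that don't reduce neatly to dollars.

```python
def expected_loss(p_failure: float, cost_of_failure: float) -> float:
    """Expected cost of a hazard: probability times consequence."""
    return p_failure * cost_of_failure


def worth_mitigating(p_before: float, p_after: float,
                     cost_of_failure: float, mitigation_cost: float) -> bool:
    """Spend on testing/review only if it removes more expected loss than it costs."""
    saved = (expected_loss(p_before, cost_of_failure)
             - expected_loss(p_after, cost_of_failure))
    return saved > mitigation_cost


# Hypothetical: extra verification cuts the failure chance from 5% to 1%
# and costs $5,000,000 (probe) or $5,000 (low-harm web page).
print(worth_mitigating(0.05, 0.01, 300_000_000, 5_000_000))  # True
print(worth_mitigating(0.05, 0.01, 1_000, 5_000))            # False
```

The same mitigation effort is clearly worth it in one case and clearly wasteful in the other, which is why a single "right" pace doesn't exist.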

> cannot find the link right now, but the gist was that those teams move very slowly and work very hard towards correctness.

Re: "Move fast and break things" - That strategy may indeed work for startups who depend on "bulk" customers. If you muck up say 5% of their accounts by taking high risks, you still have 95%. Growth is more valuable than retention at that stage. However, if a bank fouled up 5% of their customer accounts, they'd get sued to oblivion and/or make the front page. Know the domain, know the customer, one size does NOT fit all.

Somebody at our org converted everything to microservices because some fast-talkin' marketer got into their head. It's now a pasta mess. We didn't NEED microservices; wrong tool for the job.

> In a fast-moving, consumer-oriented world like social networking, "move fast and break things" is probably very good advice.

I'm not so sure about that particular example, given the privacy implications. (And beyond...)

If your mantra specifically encourages breaking things in production, you've chosen to be flippant instead of thoughtful; or you actually want your product to be broken. It's likely that a more thoughtful mantra could result in better outcomes without discouraging urgency and innovation.

Move fast and break things, I think, is good advice. Why on Earth you would ship said broken thing, I don't know. But the whole point of moving fast and breaking things is to see where things get broken, so you have time to fix it before you ship it.

Bugs will always happen in production. What matters is not getting everything right up front. "Move fast and break stuff" is a decent philosophy, as long as you can learn from mistakes and fix recurring problems.

When a bug in your application presents itself in production, something very specific should happen: 1) identify the symptoms, 2) isolate the causes, 3) address the issue in production, 4) perform a postmortem to identify the next steps, 5) fix the problem's causes throughout the entire supply chain so it cannot happen again.

A lot of organizations get stuck only doing steps 1-3. That's not good. Operational processes are just as important, if not more so, than development processes. They're what ensure that when you find inefficiency or bugs, that they not only get fixed, but that they never happen again. This means actually making policy changes, training changes, etc in addition to just fixing code.
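One way to keep steps 4 and 5 from being silently skipped is to have the incident tracker refuse to treat an incident as done until all five steps are recorded. A minimal Python sketch; the step names mirror the list in the parent comment, but the tracker itself is hypothetical.

```python
from enum import IntEnum


class Step(IntEnum):
    IDENTIFY_SYMPTOMS = 1
    ISOLATE_CAUSES = 2
    ADDRESS_IN_PRODUCTION = 3
    POSTMORTEM = 4
    FIX_ROOT_CAUSES = 5  # policy, training, and process changes, not just code


def missing_steps(completed: set[Step]) -> list[Step]:
    """Steps not yet recorded, in order."""
    return [step for step in Step if step not in completed]


def stuck_firefighting(completed: set[Step]) -> bool:
    """True when production was patched (step 3) but steps 4-5 never happened."""
    follow_up = {Step.POSTMORTEM, Step.FIX_ROOT_CAUSES}
    return Step.ADDRESS_IN_PRODUCTION in completed and not follow_up <= completed


incident = {Step.IDENTIFY_SYMPTOMS, Step.ISOLATE_CAUSES, Step.ADDRESS_IN_PRODUCTION}
print(stuck_firefighting(incident))               # True
print([s.name for s in missing_steps(incident)])  # ['POSTMORTEM', 'FIX_ROOT_CAUSES']
```

A report of incidents for which `stuck_firefighting` is true is a cheap way to make the "only doing steps 1-3" pattern visible to management.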

This is the heart of lean manufacturing that supposedly influenced a lot of modern software development methodology. But some people may have forgotten that the lean manufacturing chain doesn't end at development.

Most orgs don't do 4-5 because they don't receive enough executive support to make meaningful organizational change. Sometimes the executives are clueless, sometimes they're absent, sometimes they're fighting budget battles against other executives and plotting high-level political plots with other executives and board members.

Surprisingly few organizations genuinely care about product or process. The market opportunities for companies which do, are staggering.

Can you talk more about the opportunities, how dev process relates to product, and where the opportunity lies? Sounds like a provocative yet vacuous statement to me.

Basic formula: 0) All companies are now software companies. See https://blogs.thomsonreuters.com/answerson/all-companies-are... etc. 1) Pick a well-established, non-high-tech industry, with poor technological adoption, which is not being dominated by a FAANG. Many options exist: banking, industry, food service, construction, transport, hotels, etc. 2) Start a small, local competitor. Differentiate through better technical service, using better dev process to respond quicker to market conditions. 3) Use that technical differentiation to ease scaling and enable growth. Ad infinitum.

Many enterprises are still stuck in effectively yearly release cycles, not to mention the number of enterprises still encumbered by legacy systems e.g. mainframes etc. Pick a "target" by making some phone calls, doing some networking, find a business which is encumbered this way, and start a competitor. It should not be difficult to steal business from a competitor which a) will take a year to release a response b) whose response will be handicapped by needing to support internal legacy systems with its response.

If you think easy targets like that don't exist, you need to get out of SV and spend some time with the rest of the world.

> 0) All companies are now software companies. See https://blogs.thomsonreuters.com/answerson/all-companies-are...

Technology != software.

This is a distinction that I've found the vast majority of software people implicitly assume doesn't exist.

A sibling comment points out that physical logistics sometimes can't be solved (competitively) by technology alone, but I posit that this is true even within technology. That is, not all technology problems can be (best/competitively) solved by software alone.

None of FAANMG depends exclusively on software, running on someone else's hardware and network.

G famously customizes their servers, and they had hardware frugality as part of their core strategy from the very start.

> encumbered by legacy systems e.g. mainframes

That may not be as great an encumbrance as you imagine, considering the kind of performance modern mainframes can deliver.

Are they over-paying for that performance, compared to best-case commodity hardware? Of course, since mainframes (for the latest models) are a monopoly. They may also be over-paying for the software and/or developers.

A startup with AWS-based infrastructure will be over-paying, too. Cloud may not be a monopoly, but it's at least arguable if the competition is robust.

That distributed computing system, with all the inefficiency added by the distribution, may be remarkably competitive for someone who's paying below commodity prices for infrastructure. For someone paying 10x commodity to a cloud provider, not so much.

Sometimes there's side considerations which are technically simple but logistically hard to work around. Technical differentiation is only one way to gain a competitive advantage, but it's also not necessarily the most important advantage. For example, it would be incredibly difficult to compete with Walmart unless you had a network large enough to compete with its effects on the entire production chain. Only someone else like Amazon can compete in that space, and it's not because they're technically superior, it's because they're friggin' huge.

> 4) perform a postmortem to identify the next steps

It takes an effective management team to make this step work. If the postmortem isn't blameless, this step becomes completely ineffective, as employees (not unreasonably) prioritize their own careers over the health of the company.

Sometimes there is blame, though. Sometimes a root cause is simply that a person or group was not competent in their execution of a task. I think there's a middle ground between never holding anyone accountable and the desperate situation you describe.

In my experience, it's rare that any blame is sufficiently concentrated as to make assigning it sensible. A failure by a human is more commonly a failure in the process they were following, or in the supervision they should have received.

And there's a big difference between "blame-free" and "unaccountable". It's not the fault of a team of developers if they aren't experienced enough to take responsibility for a critical production service, but the solution is not to berate them for being inexperienced: it's to give them the support they should (in hindsight) have had in the first place.

Sometimes that is translated as “a lack of training” which is often a euphemism for someone did something stupid and we have to put in place reactionary consequences (training on what not to do) if they or someone else does it again.

I typically push back on conclusions that only involve trying to change people and don't also put technical safeguards in place.

There are definitely things you can't fully automate, but the goal should be that doing the right thing is really easy and making mistakes is hard.

Lack of experience is a nice word for incompetent.

It reminds me of an outage a few months ago. We lost a pair of systems; we couldn't log in and run anything anymore.

Turns out, an operator ran a "rm -rf /home/user".

He was following a procedure that included cleaning a subdirectory. The subdirectory didn't exist, so he figured, let's clean whatever parent path does exist!
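The operator's fallback ("the subdirectory is missing, so clean its parent") is exactly the kind of mistake a technical safeguard can make hard instead of relying on people never to slip. A minimal Python sketch, assuming the procedure is "empty a named subdirectory under a known base directory"; the helper name `clean_subdirectory` is hypothetical:

```python
import shutil
from pathlib import Path


def clean_subdirectory(base: Path, name: str) -> bool:
    """Empty base/name, refusing to touch anything outside it.

    Hypothetical helper. Returns True if the subdirectory was cleaned,
    False if it did not exist.
    """
    base = base.resolve()
    target = (base / name).resolve()

    # Guard 1: the target must be strictly inside base, never base itself or
    # one of its parents (e.g. name="", name="..", name="/").
    if target == base or base not in target.parents:
        raise ValueError(f"refusing to clean {target}: not inside {base}")

    # Guard 2: a missing subdirectory means there is nothing to clean.
    # There is deliberately no "fall back to a parent path" branch.
    if not target.is_dir():
        return False

    for child in target.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()  # removes files and symlinks, not symlink targets
    return True
```

The design choice is that the "nothing to clean" case is an explicit, harmless return value, so the procedure never has to improvise.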

> Sometimes there is blame, though. Sometimes a root cause is simply that a person or group was not competent in their execution of a task.

It is true that some people are not good at their jobs. It's also true that the only way to discover that people are bad at their jobs is their performance.

However, I think the process approach is applicable to a much larger domain than many people give it credit for.

One of the greatest software development quotes I've read comes from DJB [0]:

> For many years I have been systematically identifying error-prone programming habits—by reviewing the literature, analyzing other people’s mistakes, and analyzing my own mistakes—and redesigning my programming environment to eliminate those habits.

The difference in security bugs found in qmail and sendmail speaks for itself.

This is an example of the superiority of process: it shows that what some people consider a "root cause" is actually only a proximate cause - that the root cause is the programming environment that allows the bugs to occur.

[0] https://cr.yp.to/qmail/qmailsec-20071101.pdf

Yep. In many ways the "competency" of a person or group comes down to whether there was a designated person whose job it was to ensure they were following good processes. DJB does it himself, but most people don't do it for themselves.

There's "blame" and there's "fault". Faults are correctable; blame tries to place responsibility on actors. Few faults can be corrected without acknowledging where responsibility lies. There are multiple kinds of responsibility, however: responsibility for the cause and responsibility for fixing it. The same party who caused the issue isn't necessarily qualified (or authorized) to address it. The idea behind blameless postmortems is to be descriptive without calling out bad actions, to evaluate the fault holistically, and to acknowledge that most faults are a failure of process, not necessarily a failure of a person. Fixing processes is much more scalable.

Sure, sometimes someone made a mistake. But they're not to blame; the process was set up so that someone could make a mistake. Change the process so that if/when a mistake happens, it doesn't cause a problem.

> Sometimes a root cause is simply that a person or group was not competent in their execution of a task.

If an incompetent person is able to take down a competent person's system, the "competent" person is to blame.

In my experience that's quite rare, what is likely to happen is any postmortem will be used as a political stick to beat someone with.

> They're what ensure that when you find inefficiency or bugs, that they not only get fixed, but that they never happen again.

This doesn't work. You shouldn't focus too much on making sure the causes of bugs never happen again; they will happen, and they don't matter for operations, as new code will bring back old causes of bugs and introduce new ones. What you should focus on is making sure that when bugs do happen, they don't cause any trouble. Be realistic about bugs; expect bugs.

Of course it helps to make your application resilient to failure, but it does not help to ignore the causes of bugs.

I would say most bugs are entropy, or a human doing something wrong, or processes that haven't accounted for some edge case. A lot of these can be prevented, and a common way to do that is to watch them happen, document them, and then implement a mechanism to fix the cause when it starts to happen.

It's not a panacea, but there's a lot you can do that does work. Maybe not 100% of the time, maybe not even 90% of the time for some bugs, but stuff that works nonetheless: better unit test coverage, regression tests, new linters and static analysis passes, fuzzers, explosive runtime checks that catch problems earlier and in more obvious ways, ...
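Of the techniques listed, a basic fuzzer is easy to sketch without extra tooling. A minimal Python example using only the standard library's `random`; `parse_port` is a hypothetical function under test, and the property checked is "either return a valid port or raise `ValueError`, never anything else". Real fuzzers (coverage-guided ones in particular) are far more thorough than this:

```python
import random
import string


def parse_port(text: str) -> int:
    """Hypothetical function under test: parse a TCP port number."""
    text = text.strip()
    if not text.isdigit():
        raise ValueError(f"not a number: {text!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"out of range: {port}")
    return port


def fuzz(func, iterations=10_000, seed=0,
         corpus=("", "0", "1", "65535", "65536", " 80 ", "-1")):
    """Feed func a seed corpus plus random strings; return property violations."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    alphabet = string.digits + string.ascii_letters + " +-."
    inputs = list(corpus)  # known edge cases first
    for _ in range(iterations):
        length = rng.randint(0, 7)
        inputs.append("".join(rng.choice(alphabet) for _ in range(length)))

    failures = []
    for candidate in inputs:
        try:
            result = func(candidate)
        except ValueError:
            continue  # the documented way to reject bad input
        except Exception as exc:  # any other exception is a bug
            failures.append((candidate, repr(exc)))
            continue
        if not (isinstance(result, int) and 1 <= result <= 65535):
            failures.append((candidate, f"bad result {result!r}"))
    return failures
```

Running `fuzz(int)` against plain `int` shows the point: it happily returns `0` or `-1`, which the property flags.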


The important part is not developing everything correctly. "Move fast and break stuff" is a decent philosophy, as long as you can learn from mistakes and fix recurring problems.

This strikes me as an extremely arrogant and naive way of thinking. "Customers, stakeholders and investors: your interests have been damaged, but rest assured lessons have been learned. We will continue with this approach as it makes sense to us."

Not developing everything correctly is about caring about not damaging people's interests.

Having data loss while migrating a database to a new release version is never acceptable. DB migrations should always work correctly, no discussion.
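One way to make a migration fail safe rather than lose data is to run it inside a single transaction and check an invariant (here, row counts) before committing. A minimal sketch using Python's built-in `sqlite3`; the `users`/`users_v2` tables and the name-splitting migration are hypothetical, and other databases differ (not all of them support transactional DDL the way SQLite does):

```python
import sqlite3


def migrate_split_name(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: copy users(name) into users_v2(first, last).

    Expects a connection opened with isolation_level=None, so the explicit
    BEGIN/COMMIT/ROLLBACK below fully control the transaction.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(
            "CREATE TABLE users_v2 (id INTEGER PRIMARY KEY, first TEXT, last TEXT)"
        )
        for user_id, name in conn.execute("SELECT id, name FROM users").fetchall():
            first, _, last = name.partition(" ")
            conn.execute(
                "INSERT INTO users_v2 (id, first, last) VALUES (?, ?, ?)",
                (user_id, first, last),
            )
        # Invariant: the migration must not lose rows.
        (old,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
        (new,) = conn.execute("SELECT COUNT(*) FROM users_v2").fetchone()
        if old != new:
            raise RuntimeError(f"migration would lose rows: {old} -> {new}")
        conn.execute("COMMIT")
    except Exception:
        # SQLite DDL is transactional, so this also undoes CREATE TABLE.
        conn.execute("ROLLBACK")
        raise
```

If any row fails to convert, the database is left exactly as it was before the migration started.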

Having some css issues where some menu is displayed in a crooked way is perfectly acceptable.

The attraction of "Move fast and break stuff" is in its simplicity and focus.

Once you start qualifying it with "it's only for cosmetic changes not for serious stuff" it stops being worthy of a discussion.

I bet this blog post was about javascript :p

About your steps: the problem I often see at work is that a bug in prod doesn't get patched in the mainline and then pushed all the way up ASAP. Because the bug still exists somewhere in test or preprod as well, it can accidentally get promoted up again.

I like the CI/CD approach of just having everything go down the same pipe, as it seems to be a decent way to address point 5.

"I bet this blog post was about javascript :p"

You should bet more often ;). That's quite correct. Mostly Javascript with a bit of python.

I would add: where possible, write automated tests that verify the issue was fixed.
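A minimal sketch of that idea with Python's built-in `unittest`. The bug (an average of an empty list raising `ZeroDivisionError` in production) and the `average` function are hypothetical; the point is that the test reproduces the input that originally failed, so the fix cannot silently regress:

```python
import unittest


def average(values):
    """Return the arithmetic mean, or 0.0 for an empty sequence.

    The empty-sequence branch is the (hypothetical) production fix.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_does_not_crash(self):
        # Reproduces the input that crashed in production before the fix.
        self.assertEqual(average([]), 0.0)

    def test_normal_input_unchanged(self):
        # Guards against the fix breaking the ordinary case.
        self.assertEqual(average([1, 2, 3]), 2.0)
```

Running it with `python -m unittest` as part of the normal suite means it executes on every build, not just once after the fix.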

> Bugs will always happen in production.

This is not some universal law. It’s true only when the project/team is not willing to make the trade-offs (like development speed) needed to deploy bug-free code. For most software it’s better to release with bugs, but not so for all software. For some software, like safety critical projects, things launched into space, etc, it’s better to take your time and release without bugs.

Besides, good development practices, good requirements, good change control, and good testing can prevent the vast majority of bugs from reaching production. Sadly, this idea that bugs in production are inevitable is pervasive in industry.

In order for that to happen, you would need perfect practices, perfect requirements, perfect testing and perfect ‘change control’, and an inhuman ability to never make a human mistake in the understanding of any one of those things within your team of equally perfect people, so add perfect communication on to the list.

Anything less than perfection at any stage will be a vector for bugs, whether that bug is a syntax error, unexpected behavior, or a fully tested and functional feature that doesn’t meet expectation.

Mistakes are inevitable and not necessarily bad when you handle them well (move fast and break things says nothing about what you do with all the breakage, so feels immature). Calling them bugs doesn’t change that.

Can you point to a violation of this observation? Operating systems, CPUs, compilers, and software at large have all had bugs. Can you point to anything meaningful that has not been susceptible to bugs in production?

> good development practices, good requirements, good change control, and good testing can prevent the vast majority of bugs

prevention of the vast majority of bugs does not negate the claim that bugs will always happen in production.

I am skeptical that deploying a non-trivial project to production without bugs is a viable strategy, even for safety-critical projects. I imagine the bugs are just much less severe.

It is not even easy to tell whether a piece of software has no bugs or just none you have seen.

Arguably some of the most conservative and careful devs in the world managed to crash a spaceship into Mars and you know they fully subscribed to "take your time and release without bugs."

"Move fast and break stuff" is a decent philosophy if and only if the consequences of breaking stuff are survivable.

If breaking stuff means that your website looks weird, that's survivable.

If breaking stuff means that performance sucks for a while, that's survivable.

If breaking stuff causes unavailability during a critical period of end-user demand, a few incidents might be survivable.

If breaking stuff causes your company to have a terrible reputation for privacy, security, or competency, that might not be survivable.

If breaking stuff causes your company to divulge financial information, that might not be survivable.

If breaking stuff ends up costing your customers any significant amount of money, that probably will not be survivable.

If breaking stuff causes a pedestrian to be killed because "they came from the shadows"[1], that appears to be perfectly survivable... /s

[1] https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg

Cars killed people... they didn't stop creating cars. Nor would that be a good idea.

Killing one person per 1 million miles driven like Uber's self-driving cars have done is drastically different from killing one person per 80 million miles like human drivers do.

It would be a great idea if there was a good alternative in low population density areas.

No, we got jaywalking laws and our rights reduced.

We got safer cars at the same time. It is a double-edged sword.

>If breaking stuff causes your company to have a terrible reputation for privacy, security, or competency, that might not be survivable.

I reckon this stuff is particularly survivable. You can be like Linode, fuck up everything over and over again and still do just fine.

That's the kind of thing that pisses me off. I'm more than willing to do business with companies who have had security breaches as long as they've learned from the mistake and corrected it. Target had a massive security breach and they overhauled their entire IT security infrastructure and replaced all their POS systems. Good. I still shop at Target occasionally. Home Depot had a breach and their executives said "this was expected, we aim for a C level of security not an A level" and didn't want to spend additional money on security. That's unacceptable to me. Linode had a security breach and said "it's fine, it only impacted a few people and they only lost tens of thousands of dollars" and then they got breached again and said "yeah but it's not our fault that we use an unpatched version of ColdFusion for our admin portal". They never took ownership of their security issues. That's unacceptable to me. Equifax got breached again during their other damn breach investigation and who here remembers that Equifax even made the news? No one. A year later it's a non-event.

Unfortunately, most people don't share my beliefs so Linode and Equifax get away with it, and Home Depot's "it's not worth it to invest in security because there's no ROI" is completely true. Unless companies pay a heavy (I'd say existential) penalty for lack of security, nothing will change.

Linode is a very cheap provider of a mass market product. The bulk of their customers should be cheap customers who either don't know or don't care. Possibly both and with a high turnover.

>You can be like Linode, fuck up everything over and over again and still do just fine.

Source? Because every time I've read about Linode they had glowing user reviews, including on HN...

It blows my mind that people still give glowing reviews to Linode when in 2013 hackers breached their systems and stole credit card numbers and customers reported their credit cards were being used maliciously and Linode got away with saying "nuh uh": https://www.theregister.co.uk/2013/04/16/linode_breach/

Or again in 2015, when Linode's poor security led to PagerDuty being hacked: https://www.securityweek.com/how-attackers-likely-bypassed-l...

You can find some more discussion in previous HN threads about it: https://news.ycombinator.com/item?id=11136399

Yeah, others provided good links.

I think your comment somewhat proves my point :) Nobody remembers this stuff!

a starting point (surprisingly high result on Google while looking for a list) https://news.ycombinator.com/item?id=10845278

Or Facebook...

Or alternatively, if you're dead anyway.

Many of the worst - and most strategic - examples of moving fast and breaking things come from situations where the feature needs to be written or the product launched, or else the company goes out of business. You need to hit the milestone to close the next round of funding, or you're dead. You need to add a feature to sign a customer, or you're dead. You need to launch - at all - or you're dead.

It's rational for the milestone/feature/launch to be buggy if not doing it at all means there is no company. Remember that all startups start out "default dead", and reaching the point where they're "default alive", where killing the company would be worse than the status quo, requires a fair amount of effort and a lot of early decisions that may or may not be reversible.

e.g. Knight Capital.

Let's not also forget the consequences of "breaking stuff" when applied to industrial process control, avionics, car "autopilot" (cough) etc. "Survivable" is sometimes a literal term.

This something I don't understand about certain startup founders. If you're starting a company, you should have at least one absolute monster of a developer. Someone who is capable, communicative, and with a deep pool of knowledge. Someone who can attract, hire, and lead other good developers. Not your buddy Steve who made an iPhone app or two. Not that developer who has some PHP experience and built a React app once. A real, terrifyingly good developer. The aphorism "first class people hire first class people; second class people hire third class people" comes to mind.

Granted, I'm a pretty inexperienced developer, but I see companies being founded by people who are even less competent and less knowledgeable than me.

And sure, finding a competent person is hard. But it's like the movie Seven Samurai. Can't afford to pay a samurai? Find a hungry one. There are plenty of developers who are young, but still good. And if you can't even do that, then take any and all money you have, and pool it to find one or two (or seven) developers who can find other competent people.

> There are plenty of developers who are young, but still good.

I was young, and I was good. (Now I am moderately young, and I'm very good.) But I am a better talker than I was a developer. Only recently have I really become as good a developer as I am a talker. How are those startup founders supposed to evaluate me?

So a team of inexperienced developers made a system that mostly worked with some hiccups, could be debugged by an expert in 1 day, and was fixed on a tight launch timeline. Looks like a success story!

It was definitely a success story. The company, their customers, and my company all benefited. It wasn't all rainbows (the story is a bit longer and more complicated), but it worked out well for everyone in the end.

If a consultant could jet in, diagnose and fix up some issues in a couple of days, that means the team did a pretty awesome job to begin with.

A truly inexperienced team would have created a spaghetti monstrosity that was practically unsalvageable.

Plus they apparently also managed to validate existence & viability of a market.

Many developers are commenting in here trying to fix a problem. I am a developer but I now run a business. From a business point of view, there is no problem here. Code has been delivered and it is running in production successfully. It might not have arrived at that point in the most efficient manner, but the point is that the business has executed successfully here. There is no issue. This is a lot further than most businesses get. If you see a problem to be fixed here then you are looking at a small picture of the business.

Are you discounting that technical debt is a thing/matters?

What gets delivered isn't just a set of features, it's a piece of software, which implements those features, but might have a whole bunch of baggage attached which can make it really hard (or easy) to change those features or add new features in the future, or to allow the software to adapt to changing external conditions.

I don't think this is specific to software. E.g., I'd imagine it'd be possible to build a building a lot faster if you didn't need to worry about it not collapsing in five years.


Though to be fair, the mere presence of technical debt on its own is not a problem; there is some amount of technical debt that's appropriate, and it's hard to know exactly where that is. You certainly can't get by just ignoring it completely.

To some extent that was just luck, they happened to hire a consultant who happened to be able to resolve the visible problems.

Hiring junior developers because they are cheap and keeping your fingers crossed isn't going to work most of the time. Not shipping will sink any business if it goes on long enough.

This is an excellent point, and I agree wholeheartedly. They did extremely well given the constraints they had, and at the end of the day everyone made out: their company, their customers, and my company.

I hope the article didn't paint the company's decisions in a negative light. They're a great company to work with, and I think they've done a fantastic job.

What about this?

>I’m almost certain the cost of fixing the code exceeded the margin on revenue due to writing it in the first place.

Does business really not care that you have to build the same thing twice?

A rotting foundation eventually catches up with you. Things don't have to be perfect, or even particularly great... but when things are working by accident rather than by design...

There's a great book Apprenticeship Patterns which gives real answers and advice to developers. It addresses the situation described in a few different ways and offers different strategies. Most of them boil down to how to pick a team where you will learn the most quickly while not drowning.


Of all the books I recommend for junior/intermediate developers, this is the one I demand they read. I wish the free online version was still available, but it doesn't appear to have survived O'Reilly's switch to the subscription model.

Thanks for the recommendation. I'll check it out.

One thing I see among many developers is that they write code as if it will always work in production. OTOH, I tend to write code that will crash nicely: good error reporting, the ability to continue after a crash, code that handles only the data it needs, queries that address the smallest possible set of data, and lots of logging. Interestingly, that's a lot of code I have to write anyway, because it helps me write my programs (I mostly don't use debuggers).
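A minimal Python sketch of several of those habits together: per-item error handling so one bad record doesn't stop the batch, error reports that include the offending input, and logging. The record format and `process_record` are hypothetical; note that only expected exception types are caught, and the failed indices are returned so they can be flagged rather than silently swallowed:

```python
import logging

logger = logging.getLogger("importer")


def process_record(record: dict) -> int:
    """Hypothetical per-record work: parse the 'amount' field."""
    return int(record["amount"])


def process_batch(records: list) -> tuple:
    """Process every record; log and collect failures instead of aborting."""
    total = 0
    failed = []
    for index, record in enumerate(records):
        try:
            total += process_record(record)
        except (KeyError, ValueError, TypeError):
            # Log the full traceback plus enough context to reproduce it.
            logger.exception("record %d failed: %r", index, record)
            failed.append(index)
    logger.info("processed %d records, %d failed", len(records), len(failed))
    return total, failed
```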

I agree with you, but I want to say I think it's important to write code that is "brittle" enough to just satisfy the requirements. I've seen a lot of "edge case that probably never happens" code that ends up causing problems down the line because it does get hit, but in an unexpected situation, and ends up mangling data.

Especially in the paradigm of black boxes/microservices, if something comes in that doesn't line up with a known scenario, there should only be enough code to log the data and raise an alert/error. Having something bad happen, and handling it so that it becomes 'less bad' and doesn't get flagged, escalated, and tested properly, is going to cause issues.
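A minimal Python sketch of that approach: known message types are dispatched to handlers, and anything else is logged with its raw payload and raised, rather than coerced into a "best guess". The message types, handlers, and exception name are hypothetical:

```python
import logging

logger = logging.getLogger("orders")


class UnknownMessageError(Exception):
    """Raised for input that matches no known scenario."""


def handle_created(msg: dict) -> str:
    return f"created {msg['order_id']}"


def handle_cancelled(msg: dict) -> str:
    return f"cancelled {msg['order_id']}"


HANDLERS = {
    "order.created": handle_created,
    "order.cancelled": handle_cancelled,
}


def dispatch(msg: dict) -> str:
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        # No guessing: record the raw payload and fail loudly, so the case
        # gets flagged, escalated, and eventually a proper handler and test.
        logger.error("unknown message type: %r", msg)
        raise UnknownMessageError(msg.get("type"))
    return handler(msg)
```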

>>> I think it's important to write code that is "brittle" enough to just satisfy the requirements

100% agree and the idea behind my post was that writing code in such a way requires a lot of experience and an appropriate mindset.

Completely agree. One of the first problems I found was that the server wasn't running via a process manager, so when it crashed it never rebooted. Simple enough to solve, but non-obvious if you've never run into it before. It's a checkmark for anyone who has hosted any application before, but it's one of many important ones.
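In production you would normally rely on an existing process manager (systemd, supervisord, pm2 and the like) rather than write one. Purely to show what such a tool does, here is a minimal Python sketch: it restarts a child process after a non-zero exit, backs off between restarts, gives up after a limit, and calls a hook on every restart so someone can be alerted. `supervise` and its parameters are hypothetical:

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, backoff=1.0, on_restart=None):
    """Run cmd; restart it after a non-zero exit, up to max_restarts times.

    Returns the number of restarts performed once the child exits cleanly
    (status 0). on_restart(attempt, returncode) is an alerting hook.
    """
    restarts = 0
    while True:
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"{cmd!r} keeps crashing (last status {returncode}); giving up"
            )
        restarts += 1
        if on_restart is not None:
            on_restart(restarts, returncode)
        # Exponential backoff so a crash loop doesn't hammer dependencies.
        time.sleep(backoff * 2 ** (restarts - 1))
```

The alert hook matters as much as the restart: the service comes back on its own, but the crash is still reported so it can be investigated.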

I can’t think of any well-known professional projects that operate like this. IMO, automatically restarting after a crash is a terrible crutch that ensures crashing issues never get fixed at the priority level they deserve. Can you imagine if Apache, postgres, postfix, haproxy, etc. operated this way? If you have so many crashing issues that you need something like this, you have serious problems.

> Can you imagine if Apache, postgres, postfix, haproxy, etc. operated this way?

They do operate that way; when an Apache or Postgres worker crashes, a new one is started.

Postgres uses some heuristics to determine if it will restart. But that's a problem mostly limited to databases.

A process manager is one of many failsafes used in case of a crash, and it also has many uses other than recovering from a crash. A robust production system should have many layers of failsafes to prevent it from crashing, to recover quickly from crashes (increased load, outages, etc...), and to review crashes after-the-fact. In fact, most setup articles run every piece of software you mentioned in a process manager. They don't do this because they think the software will crash; they do it because when it does crash, they don't want to have to wait for a developer to manually reboot it while they diagnose the issue. Perhaps this is a good blog post topic.

Stateless servers can and should recover automatically after a crash. Erlang has built a whole philosophy of reliable software construction based on the idea of letting errors crash a process and be restarted by a supervisor - systems with higher reliability requirements than most database servers, e.g. upgrades with zero downtime.
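A rough Python analogue of an OTP supervisor's restart intensity, to illustrate the idea rather than how Erlang/OTP implements it: the worker is restarted whenever it raises, but if it crashes more than `max_restarts` times within `period` seconds, the supervisor gives up and escalates instead of looping forever. The names are hypothetical, and unlike the process-level example there is no backoff here:

```python
import collections
import time


class RestartIntensityExceeded(Exception):
    """The worker crashed too often; escalate to whoever supervises us."""


def run_supervised(worker, max_restarts=3, period=5.0, clock=time.monotonic):
    """Call worker() until it returns; restart it whenever it raises.

    More than max_restarts crashes within a sliding window of `period`
    seconds escalates by raising RestartIntensityExceeded.
    """
    crash_times = collections.deque()
    while True:
        try:
            return worker()
        except Exception as exc:
            now = clock()
            crash_times.append(now)
            # Forget crashes that have fallen out of the window.
            while crash_times and now - crash_times[0] > period:
                crash_times.popleft()
            if len(crash_times) > max_restarts:
                raise RestartIntensityExceeded(
                    f"{len(crash_times)} crashes within {period}s"
                ) from exc
```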

I encourage you to read Joe Armstrong's dissertation on how the simple idea of restarting on crash can make a system impossibly reliable.

Resilience is one of the main characteristics of a stable codebase. Many failures, probably most failures after your developers hit mid-level, occur because an assumption about something stopped holding true.

Just this last week there was a major outage on an internal tool because the connection to the LDAP server got flaky, and the internal tool didn't know how to handle that -- instead of recovering gracefully, it freaked out until the whole server crashed.

That could've been avoided with just a small amount of recovery logic, but instead, the original author must've thought "If we can't keep our LDAP server connected, we have serious problems". But in the real world, things go sideways sometimes. In the real world, there are bugs in Microsoft's July 2018 patchset that ruin the TCP stack (cf. https://support.microsoft.com/en-us/help/4345421/windows-10-... among many others).
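The "small amount of recovery logic" can be as simple as retrying a flaky connection with capped exponential backoff and jitter, and surfacing an error only after several attempts. A minimal Python sketch; `connect` stands in for whatever LDAP client call the tool actually makes (hypothetical here), and the retry limits are illustrative:

```python
import logging
import random
import time

logger = logging.getLogger("ldap")


def with_retries(connect, attempts=5, base_delay=0.5, max_delay=10.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call connect(), retrying transient failures with capped backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of retries: let the caller degrade gracefully
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)  # jitter avoids thundering herds
            logger.warning("connect failed (%s); retry %d in %.2fs",
                           exc, attempt, delay)
            sleep(delay)
```

The final `raise` is deliberate: after the last attempt the caller still has to decide how to degrade (serve cached data, return an error page), rather than take the whole server down.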

It's simple naivete to assume that you won't have problems as long as you keep your nose clean. There are externalities that impact even the most meticulous of us. If you want reliability, your systems must anticipate this.

The problem is that you often can't anticipate in what ways other system may f..k you over. So automated restarts make sense. But you should then try to analyze the problem and handle it properly the next time.

"If you have so many crashing issues that you need something like this, you have serious problems."

You don't have to have many crash issues for a process manager to make sense. If you just have a crash from time to time it's nice if the whole system recovers without somebody having to restart it. However, every crash should be taken very seriously and analyzed.

They do by default. The Linux packages install them as system services, and the service manager takes care of restarting them. Your mileage may vary with the Linux distribution.

Apache goes further than that. When running applications like PHP, it has default settings to restart the worker after every few requests. Historically, PHP applications suffered from memory leaks.

> Can you imagine if Apache, postgres, postfix, haproxy, etc. operated this way?

They all do. In the real world a crash or a forced restart is a pretty normal part of operating.

I can't think of any well-known professional project that doesn't work like that.

Your process manager can alert you when a restart happens. You can then fix the problem.

Just one thing to check. Have you at any point used Erlang?

Because I believe http://wiki.c2.com/?LetItCrash is primarily associated with Erlang/OTP.

Being the sole person responsible for a system will quickly teach you that lesson, and it’s an incredibly valuable one.

> all of these developers lacked experience.

This is nothing new, unfortunately.

Working for large corporations, I've often noticed this:

Either the devs on the project don't have the required experience and will deliver poor-quality code, creating immediate "legacy", or it'll be outsourced to another company with no governance plan to review the code quality, creating a "black box".

Because there are very few standards or regulations about competence in the software industry, you can have two devs claiming to be "Software Developers", with one able to deliver a full project by himself and the other not capable of creating a simple web page in HTML.

Hence, I've noticed that very few companies understand the importance of "tech", "IS Governance" and investing in their staff in general. They often see "tech" as a constraint, like "We have to do a website for our customers, otherwise we'll lose market share", and not "We have to seize the opportunity to create a platform for our customers; it will drive growth massively".

This is sad, but this is pretty much the standard in corporations these days.

There's no reasonable way to write code suitable for "production" in general, because there are so many conflicting requirements. Just among internet services, components need different kinds of robustness for content websites, communication, e-commerce, email, banking, social networks, internal corporate services, and more.

There do exist components suitable for most environments, like Postgres, but usually at the cost of 10x complexity over what'd be needed for the pure functionality. That engineering is worth it for things like databases that get used everywhere, but not for any custom component.

Can someone give an example of a blog post that gives example code yet warns "not to do it in production"? I've programmed and read about programming for decades and can't recall ever seeing anything like this.

It's hard for me to take this article seriously without even a single example -- I can't tell if this is a strawman or not.

Is it about code that isn't thread-safe? That relies on undefined behavior? That doesn't scale? That has race conditions?

"In dev" and "in prod" can mean so many different things (is it about scale? or running on different OS's? or running unattended? or running on different hardware? or running compiled?), out of any particular context they're essentially meaningless, and in any particular case, it would seem to be the particulars that matter...

First result from DDG: https://blogs.msdn.microsoft.com/canberrapfe/2013/02/16/powe... The thing in that example is that something is not properly configured, but it's good enough to get started with in development.

Second result: https://stackoverflow.com/questions/1475297/phps-white-scree... describes how to put debugging statements in the code, which is good for local development but shouldn't be kept for production.
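The same concern applies outside PHP. In Python, one minimal way to keep debugging output out of production is to gate it on configuration instead of adding and deleting `print` calls by hand. The environment variable name `APP_DEBUG` and the logger name `app` are assumptions for illustration:

```python
import logging
import os


def configure_logging(env=None) -> logging.Logger:
    """Return the app logger: DEBUG level only when APP_DEBUG=1, else INFO.

    APP_DEBUG is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    debug_enabled = env.get("APP_DEBUG") == "1"
    # No-op if the root logger already has handlers configured.
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger("app")
    logger.setLevel(logging.DEBUG if debug_enabled else logging.INFO)
    return logger


log = configure_logging()
# The debug detail stays in the code but is only emitted when APP_DEBUG=1.
log.debug("raw request: %r", {"user": "alice"})
```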

I recall seeing it many times (and ... mostly sometimes heeding the admonition).

A lot of times you’ll see a solution snippet with inadequate exception handling or resource cleanup (file handles, DB handles, etc.).
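For contrast, a minimal Python sketch of the cleanup such snippets tend to leave out: `with` blocks close the file and the database connection, and commit or roll back the transaction, even when an exception is raised partway through. The input format (one name per line) and the `names` table are hypothetical:

```python
import contextlib
import sqlite3


def load_names(path: str, db_path: str) -> int:
    """Insert one name per non-blank line of `path` into the names table."""
    # closing() guarantees conn.close(); sqlite3 connections used as context
    # managers only commit or roll back, they do not close themselves.
    with contextlib.closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commit on success, roll back on exception
            conn.execute("CREATE TABLE IF NOT EXISTS names (name TEXT NOT NULL)")
            with open(path, encoding="utf-8") as fh:  # closed even on error
                rows = [(line.strip(),) for line in fh if line.strip()]
            conn.executemany("INSERT INTO names (name) VALUES (?)", rows)
            return len(rows)
```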

I’ve also seen a lot of regex examples that show a principle but don’t handle edge cases. See the canonical example... email validation.
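A small Python illustration of the problem: a typical tutorial-style pattern (this particular regex is a made-up but representative example) rejects perfectly valid addresses. A common, more forgiving alternative is a loose structural check, with real validation done by sending a confirmation email; that is a design choice, not a standard:

```python
import re

# Tutorial-style pattern: word characters, "@", word characters, ".", word characters.
NAIVE_EMAIL = re.compile(r"^\w+@\w+\.\w+$")


def naive_is_email(address: str) -> bool:
    return NAIVE_EMAIL.match(address) is not None


def loose_is_email(address: str) -> bool:
    """Accept anything shaped like local@domain with a dot inside the domain."""
    local, sep, domain = address.rpartition("@")
    return bool(
        sep
        and local
        and "." in domain.strip(".")
        and not any(ch.isspace() for ch in address)
    )
```

For example, the naive pattern accepts `alice@example.com` but rejects `user+tag@example.com` and `first.last@example.co.uk`, both of which the loose check accepts.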

> It's hard for me to take this article seriously without even a single example -- I can't tell if this is a strawman or not.

I felt similarly: it just rang slightly false to me.

For one thing, a code review wouldn't be the first solution I'd reach for when diagnosing the kind of problems described (performance issues, memory leaks, random crashes). There's plenty of tooling available for most platforms that will generally get you to a solution more quickly and reliably (granted, the post mentions only spending half a day), and often highlight problems you might easily miss in a code review. I'm not saying code reviews aren't useful, but that measurement of real-life behaviour can be more useful in these circumstances.
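For memory leaks in particular, measuring is often quicker than reading. A minimal sketch using Python's standard `tracemalloc` module to compare snapshots before and after exercising a suspect code path; the ever-growing cache is a hypothetical stand-in for real application code:

```python
import tracemalloc

_cache = []  # hypothetical leak: grows forever, never evicted


def handle_request(payload: str) -> int:
    _cache.append(payload * 100)
    return len(payload)


def top_growth(func, calls=1000, limit=3):
    """Run func repeatedly; return the largest allocation growth per source line."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for i in range(calls):
            func(f"req-{i}")
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() returns StatisticDiff objects, biggest size change first.
    return after.compare_to(before, "lineno")[:limit]
```

Printing each entry of `top_growth(handle_request)` points at the file and line responsible for the growth, here the `_cache.append` line.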

Still, the point about not copy-pasting code directly from the internet is well made: been bitten enough times over the years that I've become extremely wary of it, even - and perhaps especially - when in a hurry.

I've seen it plenty; what I don't ever recall seeing is a book or article that says "this is how you should do it in production". Generally there is just some hand-waving over other important stuff, like error handling, that you should take care of in a real system. I've also seen lots of production code that doesn't look that different from the code examples you shouldn't use in production.

From my experience of real production codebases, the difference is mostly in accumulated bug fixes (which may or may not be documented, but often make the code less clear) and in lots of logging and debug code in the parts of the codebase that are particularly bug-prone.

If I was a cynic, I'd be tempted to conclude that we don't actually know as an industry how to write "production code", all we know how to do is write non production code and then try to patch it up until it kind of works in production.

>Is it about code that isn't thread-safe? That relies on undefined behavior? That doesn't scale? That has race conditions?

It can be any of those. Not a blog post, but one example that immediately comes to mind as something you shouldn't use in production is the [Redis KEYS command](https://redis.io/commands/keys).
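`KEYS` walks the entire keyspace in one blocking call, which is why the Redis documentation says to use it in production only with extreme care and suggests `SCAN` instead. A minimal sketch of the usual replacement, assuming the `redis-py` client, whose `scan_iter` wraps `SCAN`'s cursor loop; the function name and batch size are illustrative:

```python
def delete_matching(client, pattern: str, batch_size: int = 500) -> int:
    """Delete keys matching a glob-style pattern without calling KEYS.

    `client` is assumed to be a redis-py client (e.g. redis.Redis()).
    scan_iter() walks the keyspace incrementally via SCAN, so the server can
    keep serving other clients between cursor steps.
    """
    deleted = 0
    batch = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            # SCAN may return a key more than once; DEL only counts keys that
            # actually existed, so the running total stays accurate.
            deleted += client.delete(*batch)
            batch.clear()
    if batch:
        deleted += client.delete(*batch)
    return deleted
```

For example, `delete_matching(redis.Redis(), "session:*")`. Unlike `KEYS` followed by `DEL`, this is not a point-in-time snapshot: keys created while the scan is running may or may not be deleted.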

I searched for "don't do this in production". This article was the first hit, and several blog posts with example code followed.

Just recently, been looking for code samples on how to send APNS (old-style) push notifications in PHP. Every single blog post or SO answer had such a low quality that was barely usable at all, let alone in production. Ended up writing my own implementation that has been pretty reliable so far, but the learning curve and the distance I travelled from those code samples on the Net seems pretty big now. On warnings though, I vaguely remember some warnings in a few places, but others seemed confident that they had the production ready solution (they didn't).

I’ve seen a few tutorial-style posts with disclaimers like that. Usually it's when you want to illustrate a specific point using code, and the stuff you would naturally do in production (such as cleaning up memory or closing IO handles) is excluded since it’s a distraction from the point the author is trying to make.

I know I’ve included a disclaimer like that in an internal presentation I had done in college and I’m pretty sure I did not come up with the idea myself.

I've seen it plenty, e.g. Django: "This method is grossly inefficient and probably insecure, so it is unsuitable for production." https://django.readthedocs.io/en/latest/howto/static-files/

"Do not do this in production" is always implied. Blog post code is always chopped down for readability. There is no proper error handling, architecture, performance considerations, etc. Blog post code is for presenting concepts. It is up to you to write production quality code.

I've seen it dozens of times, and read dozens more articles where they should have included it.

The entire Supercharged playlist on Chrome's YouTube channel does this, for instance.

You see it a lot in Kubernetes tutorials.

I've seen plenty of C code examples that made disclaimers about forgoing error handling. That said, I can't think of any off the top of my head.

This blog post describes my motivation for writing a blog. It's mostly a written memoir for myself, to remember in detail what motivated me to start writing, but I thought HN readers might have some good feedback, so here it is. Would love to hear any stories you have of production fiascos.

> If you read this far hoping for an answer, then I’m sorry: I don’t have a simple one. This is a difficult problem to solve. The solution is too large for a single blog post, changes every day, and differs subtly for every project.

Crazy idea, what if we found one of these people who writes these blog posts, an especially well vetted and knowledgeable one. Maybe even one who writes the frameworks and software these people are using. cough https://pragprog.com/book/phoenix14/programming-phoenix-1-4 cough

What if we took this person and paid them money to write a bunch of blog posts about one topic, in a way that reads in a linear fashion? We could even hire someone to help them, to proofread and make helpful suggestions.

We could give the knowledgeable developer a year or so to write the blog posts, then print out the blog posts, bind them together with glue, and SELL IT!

People could pay money for the well-researched, well-edited series of blog posts about a specific topic written by the expert. I think I'll name this blog-printed-out-onto-paper... a honkycat. After its inventor.


Sarcasm aside, I find people's insistence on googling and sifting through half-thought-out blog posts infuriating. I am a HUGE proponent of a well written book about a software product. Compare "Programming Elixir" to their free online book and documentation. It's night and day. And I can always look in the same place for reference, the book. Written by Dave Thomas.

A book has (at least) one author, one editor, and a publisher. All of these people work hundreds of hours to produce a useful manuscript. A blog post is something a script kiddie shits out because they're trying to get their first programming job.

Yet people say the same thing: "A book?!? I can just find it on the internet" and do the same thing: "I'm sure I'm smart enough to just start writing this software without any prior experience in the language, or forethought" and make the same mistakes over and over.

Skip the blog. Read a book.

I totally agree. Far too often, I come across people who are not willing to commit the time to learn something properly in the first place, even if that area forms the bulk of their work. I see the same reluctance to read any substantive resource, whether that be freely available online documentation, man pages, or RFCs.

Blog posts and Stack Exchange questions are great and have their place but shouldn't be the foundation of one's knowledge when the work requires an in-depth understanding of that system, technology, language or framework.

Finding a good book is not particularly easier than finding a good set of blog posts. While the quality of books is on average significantly better, finding books where the material was written in the past year is much harder.

Particularly in the trendy JS world, where there might not be a good book for the stack you are using, and even when there is, you ask questions online and get responses that are roughly "It hasn't worked like that for 6 months, why are you trying to do it this way?"

You give the example of the Phoenix book; I don't know if that's what this team was using. If it was not, the fact that there is a good book for that stack is not particularly relevant. If they were using Phoenix, then the fact that there is not a good book for most stacks would explain why they wouldn't consider looking for one for the stack they are using.

> Particularly in the trendy JS world, where there might not be a good book for the stack you are using, and even when there is, you ask questions online and get responses that are roughly "It hasn't worked like that for 6 months, why are you trying to do it this way?"


One comment about books: I feel the quality of books has gone down a lot over the last few decades. When you read K&R or Stroustrup's C++ books, they are well written and give you a very deep understanding of how things work and why they are that way. Most modern books I have seen have 1,000 pages and are just a bunch of recipes and screenshots without deeper understanding. Even user manuals from 30 years ago are excellent compared to what we have today.

I am sure there are exceptions but they are hard to find.

There were plenty of crap books about 30 years ago as well, it's just they are long forgotten now.

Well, it's really both. It's definitely a great example of survivorship bias (we only remember the good books), but it's also so much easier to publish a book these days than 30 years ago: publishing is cheaper, and print-on-demand means tail-end publishers want quantity over quality. And publishers can do mass outreach to potential tech authors all over the world (and they do: if you have even a small blog with a "how to" tutorial on it, you've likely been contacted by PacktPub, etc.).

I find this is a result of pervasive command-and-control structures, which inevitably lead to a low-trust environment.

Something breaks, so the manager naturally thinks he must add a control that will prevent the person, or anyone else, from ever being able to do it again.

Very soon nobody can do anything without approval from managers so far removed from the actual decision that they don't even know the names of the components being changed. The change must then travel from the only people who know what it is, but who are absolutely barred from ever seeing the production environment, to people who are so focused on operations that they don't have time to actually understand it.

Once you are being treated as untrustworthy it will show in the rest of your work. Why code correctly from the start when you are not deemed trustworthy to do it correctly and there is lengthy testing and approval process to make sure you are not going to bomb your company?

The correct step is to recognize that introducing more controls can very quickly lead to low trust between employees, which becomes a self-fulfilling prophecy. When you are not allowed to do something because you are not deemed trustworthy, you soon have no experience doing it, and that then serves as proof that the initial decision was correct.

On the flip side, it seems that many young devs think that they should be allowed to do anything they want, without oversight whatsoever. I don't know of any other industry where junior people have this sort of arrogance. Usually people recognize they are junior and work to both learn from and gain the trust of their seniors.

In development land, the attitude is "seniors are old, useless, and outdated" and any form of oversight at all "removes trust." E.g. A young dev suggests building their own database. After being told politely that this is an enormous task full of seriously hard problems and that it's not feasible to try, they yell about their "ideas being shot down" and that the work environment "lacks psychological safety."

That's generally true.

What is important, though, is how the very young devs are molded the first time they come to an organization.

I have long experience working for various enterprises. One outlier company I worked for is Intel. Intel, or at least the organization I worked for, seems to aim to hire almost exclusively fresh out of university, just to be the first to "mold" the young engineer. Young engineers are immediately treated as adults trusted with their own projects. Managers (not all, but I would say the majority) see themselves as shepherds rather than commanders. They ensure projects are assigned to engineers and that engineers create value, and they will try to debug any cause for concern, but they do not assume the typical "single point of contact, single point of decision and authority on everything" role seen in most corporations.

This is not true of all managers at Intel, but it was very striking to me when I joined the company with some 15 years of dev experience under my belt.

Junior engineers seem to adjust to this immediately and genuinely try to do their best and to seek out wisdom, of which there is typically a lot close by.

I’ve been in both situations.

I really liked the high trust environment. We knew what we needed to do and we did it. It worked great.

But I had really good people on my team. Some of the other teams weren’t like that, or had a very different definition of when something was ‘good enough’, and the results were predictable.

I’ve also been in the ‘high control’ environment, and it’s hard to get anything done at more than a snail’s pace. It helps tamp down issues from people who need supervising (due to inexperience or quality) but it puts a very low cap on everyone. There is no way to prove yourself enough to change things much.

End result is everything moves slow, including necessary fixes that would speed up development.

Of the two I think the first one is far better, but you really have to be careful to get good people with the right attitude.

I do think there are people who prefer tight controls because it absolves them of some responsibility.

Now, the question is, were they good people because they were hired well and this enabled high-trust environment or were they good people because high-trust placed on them caused them to rise up to the responsibility?

I personally think, with no experience yet to prove it, that it is a mix of both. Meaning, you need people capable enough to react to the trust placed on them (meaning, really, normal people) but then the trust placed on them causes them to further improve.

I think most people react positively when given genuine trust, feedback, reward/punishment, encouragement.

It is also my opinion that most dev managers bring NEGATIVE value and their teams would function better without them, except when there is an extreme disproportion in experience and the command-and-control manager can carry a team that is not really allowed to do any high-leverage tasks. These are the most miserable teams I have observed in the past.

This happens because most people get corrupted very quickly. Once they are promoted, they suddenly start to think they must be better than their team, that their decisions must be better than everybody else's, and so they should be making all the decisions. The team members feel like powerless peons, stop feeling responsible for the project, morale sinks, and new hires are immediately indoctrinated into "this is how things work around here". This happens extremely quickly, in a matter of days even, and takes much, much more experience to fix. Managers who are capable of understanding and fixing this are typically promoted further, ensuring the development team stays miserable.

I’d agree with the ‘combination’ option.

I was the second person in the team, and the one who hired me was very good. I’d learned discipline at my first job where I worked on (and later solely owned/ran) two systems that handled money. If something went wrong you could trivially figure out how much your screw up just cost the company. So you had to be very careful, which was a great lesson.

The people hired after me were generally quite good (to excellent), but even without much experience they came into an environment where production was very stable and we were careful to keep it that way.

If the people weren’t good enough they wouldn’t have been capable of keeping up that standard. Some didn’t cut it.

I really lucked into a lot of opportunity and fantastic lessons at my first jobs and had great bosses who did good mentoring. I know that’s hard/impossible to replicate on demand.

I agree about negative value. Having seen so much through my jobs I can clearly see bad decisions that make the product less reliable being made, or at least horrible problems being ignored because customers aren’t complaining so let’s make tiny feature X instead.

It seems like most people in the job just have no idea how to manage software development for the long term and can’t see past short term goals/ideas.

That’s probably common across all types of projects/industries, I just wonder if it’s even worse in our industry.

Trust is important, but if you trust people who aren't capable of doing a job, that trust is misplaced.

Copypasta from other codebases happens to work better in non-production environments because the input space is smaller in virtually any non-prod runtime than in prod.

This means that a larger share of the inputs the copypasta actually receives at non-prod runtime fall within the set of inputs for which it produces the outputs the blog-post-reading, googling developer expects.

When you don't quickly see what's semantically wrong with the code (the problem mismatch) because you pasted it and it just worked (maybe with a few small tweaks), then you quickly forget about it and black-box the snippet in your head. Then, when input space gets real-world and things break, because the copypasta never had much cognitive gravity (because you didn't think much about it), it doesn't attract log dump elements in your log-scanning brain as much as stuff you had to think through yourself does.

I guess you could automate some static checks for this with a massive trawl of snippets from a few websites (at least popular SO posts in the same language?) and look for verbatim matches in your codebase. It sounds massively impractical to do at scale, though. Does anyone know of tools that do this?
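The matching half of that idea is small; the trawling half is the impractical part. Here is a rough Python sketch under loudly stated assumptions: you already have a corpus of snippets keyed by some ID (collecting it is out of scope), and an exact match of k consecutive lines after whitespace normalization counts as a hit. All names are hypothetical, and renamed variables or reordered lines will slip past it.

```python
import hashlib


def normalize(source):
    # Keep (original line number, text) pairs, collapsing runs of whitespace
    # and dropping blank lines so re-indentation doesn't hide a match.
    return [
        (lineno, " ".join(line.split()))
        for lineno, line in enumerate(source.splitlines(), start=1)
        if line.strip()
    ]


def window_hashes(lines, k):
    # Map the hash of each run of k consecutive normalized lines to the
    # original line number where that run starts (first occurrence wins).
    hashes = {}
    for i in range(len(lines) - k + 1):
        text = "\n".join(t for _, t in lines[i:i + k])
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        hashes.setdefault(digest, lines[i][0])
    return hashes


def find_copied_snippets(snippets, file_text, k=3):
    # snippets: hypothetical {snippet_id: snippet_source} mapping.
    # Returns (snippet_id, line_number) for each snippet that shares at least
    # one k-line window with file_text, using the snippet's first such window.
    file_hashes = window_hashes(normalize(file_text), k)
    hits = []
    for snippet_id, snippet in snippets.items():
        for digest in window_hashes(normalize(snippet), k):
            if digest in file_hashes:
                hits.append((snippet_id, file_hashes[digest]))
                break
    return hits
```

A larger k means fewer false positives from boilerplate lines but misses short pastes; snippets shorter than k lines are never reported.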

"Move fast and break things" only works if your company has a stranglehold on a monopoly and your user base is used to bugs in production. Terrible advice for just about everyone else. You want your users to have the best experience possible, rather than do your QA process for you.

EDIT: I should say my point is: "moving fast" and "testing things" are goals that aren't contradictory. You can and should do both.

This post was completely not what I expected it to be.

The "do not use in production" is the programming equivalent of taking a strong stance on an issue but then appending "but hey, I don't know" or "don't take my word for it".

They completely defeat the purpose of the entire statement and, in the production comment's case, of the code samples.

Let's ask a different question: what am I supposed to use in production? I had an email discussion with a blogger who wrote about a very narrow problem I needed to solve. Their code samples worked, but poorly. The entire thing was a dumpster fire, but it solved the problem and it was the only solution I could find after days of Googling and Stack Overflowing.

I did not care about their "do not use in production" comment. I needed their solution in my production system. What was I supposed to do?

Sorry for ranting, but I believe it is irresponsible to put out code that shouldn't be used, except when that is the whole point.

Some time ago I was writing some software in Go in my spare time (just for fun, so I could spend my time however I liked). As I wasn't completely sure what it should look like in the end, I thought it would be reasonable to skip writing tests at the beginning, just write the code, and add tests afterward where they were necessary.

Fast forward: I had to write the whole thing twice. The second time I wasn't completely sure what it should look like either, but I put much greater emphasis on interfaces and modularity. That way I could make sure that parts of the program were actually doing what they were supposed to do, which made bug hunting much more effective.
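Go interfaces are satisfied implicitly, and the closest Python analogue is `typing.Protocol`, so here is a minimal Python sketch of the pattern described above. The rate limiter is a hypothetical example, not the commenter's program: because it depends only on a small `Clock` interface, a test can substitute a fake clock and check the logic deterministically.

```python
import time
from typing import List, Protocol


class Clock(Protocol):
    # Anything with a now() -> float method satisfies this structurally,
    # much like a Go interface; no inheritance is required.
    def now(self) -> float: ...


class SystemClock:
    def now(self) -> float:
        return time.monotonic()


class FakeClock:
    # Test double: time only moves when the test says so.
    def __init__(self, start: float = 0.0) -> None:
        self.t = start

    def now(self) -> float:
        return self.t

    def advance(self, seconds: float) -> None:
        self.t += seconds


class RateLimiter:
    """Allows at most `limit` calls per sliding `window` seconds."""

    def __init__(self, clock: Clock, limit: int, window: float) -> None:
        self.clock = clock
        self.limit = limit
        self.window = window
        self.calls: List[float] = []

    def allow(self) -> bool:
        now = self.clock.now()
        # Forget calls that have slid out of the window.
        self.calls = [t for t in self.calls if now - t < self.window]
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False
```

In production you would pass `SystemClock()`; in tests, `FakeClock()` lets you step time forward without sleeping.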

So sometimes the longer way is actually the faster one and experience matters when it comes to recognizing such situations.

Building a good team is hard. Yet companies think they can hire an engineer who held a senior position somewhere and be fine. "Senior" or "10 years of experience" can mean literally nothing; it doesn't qualify someone for a central role in a team, nor does it say anything about quality. Expecting something grand from a bunch of rookies is simply dumb. Writing better blog posts or smart essays won't change anything. A rookie needs to follow the wrong path and fail at it. Good mentoring can speed things up, but you still need to let them fail. Team building is vastly underdeveloped in software engineering; it isn't even treated as a serious thing. Yes, if you start out with a good guy who carries the newcomers, then you're golden. But that is mostly just luck.

Even for an experienced programmer this stuff happens, and it's the most frustrating experience I've ever had. It's the stuff that happens when management and the C-level just "want stuff" to be buzzword-compliant. And every day/week/month the requirements change. This week's feature X has "top priority", and then amnesia kicks in and now feature Y is the next best thing. As a programmer-now-manager I try to shield people from this, trying to find the gems among the rubble of business meetings. But it's tricky, so this really resonated with me:

"I have an overwhelming empathy for developers in this position. They have more information than they will ever need, but it’s completely disorganized. It’s like trying to build a ten piece puzzle, except you have to find the ten pieces within a pile of 10,000,000,000 pieces, all of which are square, and you don’t know what it’s supposed to look like at the end. Good luck."

Even with a team of highly experienced developers things quickly turn to a mess if it's not perfectly clear what to work on. I don't mean things like "button should go 1px north to align with header". That's trivial. In fact, I would go so far as to say: most production-ready programming is trivial. Trivial in the sense that someone, somewhere, solved it in a way that's correct. But, when it is not /crystal/ clear what the requirements are, what the business case is, or what the edge-cases look like: oh my. I've been in the incredibly unlucky position of only being in settings like this. Start-ups, academia, R&D positions. And, as a result even with the best intentions and the best teams you'll still produce production-quality garbage. Mostly because incentives were misaligned and there was no coherent vision. Agile story-driven development only works when there is leadership that has the guts to say "no", and when there is something well defined to work on. Sometimes I yearn for some CRUD webstore ... at least then the things are knowable. Anyway, just my rant. I'll probably go the meme-way of becoming a plumber/welder/etc at some point ;-)

> helped mentor their developers

This is typically missing from startup culture.

Anecdotally, mentorship improves everyone involved, but I don't know if there's data out there showing if engineering mentorship within a startup improves its chances of success (or is otherwise beneficial [1]).

To some extent, groups like YC participants and alumni end up being mentors to each other, but I doubt that extends beyond the founders. Might the OP have an opinion on something like a rent-a-mentor program/startup?

[1] Not that certain benefits, like productivity, would be enough to motivate a change, as evidenced by the open-plan office situation.

> If you read this far hoping for an answer, then I’m sorry: I don’t have a simple one. This is a difficult problem to solve. The solution is too large for a single blog post, changes every day, and differs subtly for every project.

[Cough] - Hire a few more (expensive) experienced developers? It might not be a silver bullet, but it goes a long way towards avoiding the situation described in the blog post. Also, get help from a trusted technical friend to hire the initial one.

FWIW I see something similar with almost every client I get in my own neck of the woods. Growth is flat. Turns out nobody is worrying about whether they're building a product anyone wants - they're doing it on paper but nobody is actually talking to end-users. Marketing is under-delivering. Turns out it's all juniors worrying about bringing traffic without wondering if it's the right type of traffic; or they're not measuring anything, plain and simple. Sales aren't selling enough. Turns out it's all juniors who aren't selective or systematic enough; or sales are doubling as pseudo-project managers and/or pseudo support-engineers because nobody is managing operations. Projects take forever to deliver. Turns out the project "manager" holds nobody accountable to deliver any specific task, let alone by a certain deadline. The list could go on, and on, and on.

Naw, you can totally write garbage code and still get acquired for $100 million. Happens all the time.

MSDN used to make their code samples as simple as possible for educational reasons, until they realized everyone was just pasting them directly into production applications. So they started adding error handling, etc.

Doesn't this highlight an ongoing issue: that folks in our trade do not have any meaningful, up-to-date professional certification?

Is there a clear indication that certification helps in other fields?

(Admittedly this seems like it'd be very hard to verify, since most fields either legally require certification or don't have it, and there's probably tons of other variables that would get in the way too.)

Most professions don't, e.g. once you have passed your bar exams, taken silk, or what have you. You do have CPD (continuing professional development), but that is self-certified.

Another angle to look at this through is that they hired a team in spite of only having business, non-technical minded people. It all worked out in the end when they added consulting to clear up production issues near launch. From a business perspective this is surely a win in all ways, even if not as efficient as it could be. Not every business is a Google.

> They hired the first developer, and he vetted the second developer, and so on until they had a development team

This is hard for a startup in their early stage. Not sure how mature the company was, but let's assume it was young enough to have this very problem.

Direct team hiring is a double-edged sword. It works well when your team is very competent and good at weeding out false positives: basically, when you have a bunch of senior people with maybe a couple of junior and mid-level teammates. A normal distribution is actually perfect for a team. Having direct contact with your future teammate lets you get to know the person ahead of time, especially for a very team. Furthermore, direct hiring means you are not just hiring a generalist. Your SWE may not know what to ask a network engineer, even from a coding point of view. The downside is what happened in the story.

The way I'd like to go about it is to make sure we have two interviewers per round; even two from the same team would be far better than one person doing the vetting.

> Before I go any further, I’ll continue my story.

It seems to me that that sentence has no significance at all and can be removed safely ;)

Point taken, and sentence removed. Thank you!

If you’re an amateur or a hobbyist, then these excuses are all just fine. Stuff is hard. But a professional should not be doing this sort of thing - we’re literally talking about copying something labelled "do not do this". The moment you ignore that warning is the moment you’ve forsaken your professional development. This problem only compounds.

That said, a professional can still cut corners as and when appropriate (and clearly label and explain such things) but that’s not what’s happening here, is it?

The advice I would give in this situation is simple: stop and think about what you’re doing. If you can’t do that, you’re not a professional, you certainly shouldn’t call yourself an engineer and without that skill you stand no chance of improving.

This is interesting - many of the comments here are correctly asking "what if you don't know an expert to help vet the engineers and the system?" I feel the same way when trying to find dentists or accountants. I simply don't know how to tell who will perform competently!

So, here is my pitch: if you don't have technical expertise, and you realize that you need someone to help understand just how the heck to get started with technology, what to build, and determine if you even need engineers, then hire me! I'll help!


I usually leave a URL to Stack Overflow answers or other advice as a comment in whatever I'm working on. I know some areas of my code are more sensitive than others. It is good to understand what you are doing, of course. Sometimes I remove a Stack Overflow link simply because that way of thinking has become widespread in the application and I no longer need the note. It might be worth searching for my links in the code when it is time to release, in case there is any funny business. Stack Overflow usually tells you if you "shouldn't do it in production".
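A pre-release search for those links could be as simple as the following Python sketch. It assumes the links point at stackoverflow.com (other Stack Exchange sites are ignored) and that the list of file suffixes fits your project; the function name and defaults are hypothetical.

```python
import re
from pathlib import Path

# Matches http(s) links to any page on stackoverflow.com, stopping at
# whitespace, closing parentheses, quotes, or angle brackets.
SO_LINK = re.compile(r"https?://(?:www\.)?stackoverflow\.com/[^\s)>'\"]+")


def find_so_links(root, suffixes=(".py", ".js", ".ts", ".go", ".php")):
    # Returns (path, line_number, url) for every link in matching files.
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in SO_LINK.finditer(line):
                # Trim sentence punctuation that often trails a pasted URL.
                hits.append((str(path), lineno, match.group(0).rstrip(".,;:")))
    return hits
```

Printing each hit as `path:line: url` makes the list easy to review before a release; whether to fail the build on any hit is a separate policy decision.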
