Specifically, you need to slow down for corners in order to maintain control of the vehicle. Try to take them too fast and you end up wrecked. Even if you don't wreck, correcting after entering a corner too fast loses far more time than just slowing down properly would have. Much like technical debt, starting a straightaway at a lower speed because you botched the prior corner damages your lap time for the entire stretch.
In the overall scheme of delivering software you're confident in, it is faster to build stability into your development process than to blast out features and bolt on stability later, once you're already committed to the initial brittle implementation.
We recently did a major refactor of our Angular code (~1.5 months), and we've been pretty fast since, releasing every two weeks. We've begun to outpace a lot of the backend development teams, so we now have more time to squash our backlog tickets. We have the prettiest Jira issue reports of any team.
Company values are only meaningful when they differentiate that company from others. Everyone wants to "Move Fast and Not Break Things".
"Move Fast and Break Things" was a value judgement. For Facebook, moving quickly was worth the penalty of sometimes breaking things.
This is why most companies have lame values like "Do your best work" or "Be honest". Great values, but nobody disagrees with them—so they're meaningless.
That being said, I'd also add that most values have those implicitly baked in. For instance, if you say you value integrity, you're saying you'll trade away growth or money for the benefit of integrity.
FWIW I think integrity is a pretty okay company value. Depending on your industry, valuing integrity may be a trade-off that other companies are unwilling to make.
I don't believe not thinking is going to help you move faster. You're just going to spend most of your time fixing the problems you hadn't thought of before you started shipping code. And yet this is what we're encouraged to do. We're asked to fix the bugs, do the least amount of work to get it to go, and fix the problems later. We're asked to not think and just code. There are probably a handful of people in the world who can write complex systems in code with consistently correct results. The rest of us need to be more mindful.
We have wonderful tools available to us that can exhaustively check our designs. A system like TLA+ lets you model your system in a form of pseudo-code and comes with a model checker that will exhaustively explore that model for deadlocks, invariant violations, failure to terminate, etc. I've seen it find errors in graduate students' binary search algorithms 9 times out of 10. Imagine what it can do for the non-trivial components of your application.
Part of moving fast is knowing how to avoid making errors in the first place. The only way I know how to do that is to think. You can't do this properly in code alone. You need higher-level tools to check your assumptions and find holes in your thinking.
You don't see structural engineers slapping together the first thing that works and patching the building later when parts of it fall down. You don't sign waivers before crossing a bridge that disclaim the designers' responsibility if it collapses while you're on it. Nor do you see engineers charging extra to build private bridges that are less likely to fall down. So why do software engineers carry no such liability for their creations?
Why not take a few hours to write a proper specification for your distributed locking mechanism and save yourself days' or weeks' worth of debugging -- or worse, liability suits when your mistakes harm the interests of your customers or the public at large? Why not use tools that check your thinking, so that you can chase more interesting optimization and performance problems?
That's because if they mess up the first time people can die.
I'm guessing the likelihood of death due to a bug in most web applications is probably much less than that of a bridge :)
Software is great because it's possible to iterate quickly. But ultimately, you have a point. Developers should be willing to fix bugs they've created and/or come across in their projects. We should also try to limit the potential of bugs by releasing small chunks of functional work as often as possible. "Releasing" in this sense can mean to production or just to your qa/test/dev/stage environment. It's just important to have many eyes on the product before it's live.
Besides, it's actually not too hard to write simple specifications for the critical components of your system. It has practical benefits, like helping you write good, rock-solid software. It helps you communicate your ideas clearly and precisely. You can shake out errors in your design long before you write your code. And there are some errors you will only find by modelling and checking your system.
Web applications have come a long way since simple CGI scripts. They often require sophisticated orchestration mechanisms for managing distributed state and processes. If you get that wrong you end up with corrupted data, deadlocks and race conditions, etc. If you had a tool which helped you sift out those potential problems that exist in your design, would you not use it? Or would you rather risk introducing those errors in your code and hope that your customers never encounter them?
The approach of throwing code into production and "iterating" is precisely the problem I was addressing in my OP. In this approach we're asked not to think, and to fix our inevitable mistakes later, after users find them for us. We have software that can find a reasonable majority of them for us instead, so why not use it? It's already fairly common to use unit and integration tests to make sure our code operates as expected. Why not have a system for testing our designs?
As I mentioned, it's shockingly easy to write an incorrect implementation of binary search that may appear to work under your unit tests. Writing a specification of the algorithm and checking it with a model checker will show you problems you hadn't thought to consider. This is an algorithm that is taught in the first algorithms class you take and yet even graduate students, who scoff at writing specifications because it's so boring, often get it wrong. That's because programming is hard and it is hard precisely because you have to think. And thinking is hard. We should be using tools to help us think.
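To make that concrete, here's a sketch in Python rather than TLA+ (the function names and the specific bug are my own illustration, not from any real codebase): a binary search with a classic off-by-one, plus a brute-force checker that, like a model checker, exhaustively walks every small input instead of relying on a few hand-picked test cases.

```python
from itertools import combinations_with_replacement

def bsearch(a, x):
    """Binary search with a classic student bug: 'lo < hi' should be
    'lo <= hi', so the final remaining candidate is never examined."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        elif a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def exhaustive_check(max_len=4, max_val=4):
    """Mimic a model checker: enumerate every sorted list up to max_len
    and every target, and collect the cases where bsearch answers wrong."""
    failures = []
    for n in range(max_len + 1):
        for a in combinations_with_replacement(range(max_val), n):
            a = list(a)  # combinations_with_replacement yields sorted tuples
            for x in range(max_val):
                got = bsearch(a, x)
                ok = (got == -1 and x not in a) or \
                     (0 <= got < len(a) and a[got] == x)
                if not ok:
                    failures.append((a, x, got))
    return failures
```

A handful of spot tests pass happily (`bsearch([1, 2, 3], 2)` finds the element), while the exhaustive check immediately turns up failures such as `bsearch([5], 5)` returning -1. TLC does the same kind of sweep over a specification's state space rather than over concrete inputs.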
I think the reality is a virtuous cycle of thinking, coding, rethinking, recoding that gets the best results.
I don't think the GP is suggesting 100% of spec must be completed before the first line of program code is written. I think the implication is that there should still be a dividing line between the design portion (made up of think, code, rethink, recode) and the production development portion (also made up of think, code, rethink, recode). The dividing line does not need to be months, but design as a phase of a project should be considered differently (since it has different priorities) than delivery.
If you don't have firm requirements, then each new fact you learn about your market potentially means you may have to invalidate large swaths of work you've already done. The more planning and testing you've done, and the more thoroughly you've covered your edge cases, the more there is to invalidate.
Most of the big business ideas of the last decade were in areas of extreme market uncertainty. Hence, companies that "move fast and break things" have been at a big advantage. This may change in the future - VCs today are enamored with Carlota Perez, and IMHO one of the signals that we've crossed from the installation phase to the deployment phase of her model is when performance, stability, and security become valued more than features and speed of execution. But I'd guess we have a few more years of moving fast and breaking things first.
When I first started my current startup idea, the product design was changing literally multiple times a day. I didn't even bother writing any code, it was all in pencil & paper notes & diagrams. Things have slowed down to about a requirements change every 2-3 days now, and I write code but no tests. If you spend 20% of your time fixing bugs and 80% of your time writing or rewriting features, tests are not the bottleneck; it's not an improvement to spend 50% of your time writing & rewriting tests so you can avoid the 20% spent tracking down bugs.
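The budget claim above, as plain arithmetic (the 100-hour block is my simplification, and it generously assumes the tests eliminate the debugging entirely):

```python
# without tests: 20% of time goes to tracking down bugs
features_without_tests = 100 - 20   # 80 hours left for feature work
# with tests: 50% of time goes to writing & rewriting tests
features_with_tests = 100 - 50      # 50 hours left for feature work
assert features_without_tests > features_with_tests
```

The trade only flips once requirements stabilize enough that the tests stop being rewritten along with the features.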
When I was at Google Search, working on a mature product with billions of users, every change got 100% test coverage, and you were supposed to break a test with every change because that's how you know your test suite is comprehensive. And then it went out through a QA department and full release process. But adding a link to the page then took 2 weeks and 600 lines of code, while adding one to my startup takes 5 minutes and about 3 lines of code.
That is a good line of reasoning about when auto-tests are needed.
However, you did not take into account the time required to discover that something is broken. Without tests you don't know whether a code change breaks important features, so you should also consider how expensive it is to test your product after every change. You could skip testing it yourself and delegate the testing to end users, but turning users into testers can be pretty expensive too (losing potential customers).
So auto-tests should be introduced much earlier than when you hit the "50% maintenance" threshold.
There must be a non-zero number of requirements that are not changing every 2-3 days. Every time you don't have a test to capture a desired behavior that will probably stick around for a while, that's a risk. Granted, everything is risky in an early-stage startup, but I've found all my coding is so much better when I have tests that define the requirements - even vague ones - on which to hang my code.
Otherwise, why am I writing that function I just made? Is it really the most minimum thing required to meet the requirement change that just cropped up? If I shouldn't be writing the most minimum change possible, what other requirement am I silently signing up for?
That is interesting. Without giving away any proprietary info, what sort of changes did you have to make to add a link?
The real problem comes when the same mentality is applied to software running locally, perhaps with patches never applied after initial install.
This is likely because developers are now so used to being always connected (thanks to things like Git that let them work from just about anywhere with a power outlet and a net connection) that they can't grasp that not everything else is.
Even if you don't have QA, when you write a bug, at least ask yourself, "what could I have done differently to avoid that bug in the future?" In some cases, it's as simple as, "put repeated code into a function so it doesn't get written twice." That will cut your chances of getting a typo in half.
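A minimal, hypothetical illustration of that advice: once the comparison lives in one function, a typo in the status string can only be made in one place.

```python
ACTIVE = "active"

def is_active(user):
    # the literal appears exactly once; a typo here breaks every caller
    # at once, loudly, instead of silently in one of a dozen scattered copies
    return user["status"] == ACTIVE

users = [{"status": "active"}, {"status": "inactive"}]
active_users = [u for u in users if is_active(u)]
```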
I do see lots of cases where people didn't test it or used the wrong mechanism.
I have to go slowly because somebody already went fast and broke it.
And a friend once told me "LZW can compress code faster than you can." :) Which does not speak to the value of orthogonality of purpose in source code, but I enjoyed the irony of the comment very much.
As such it's incredibly common to end up with bugs from assigning to "x, y, y" and not "x, y, z" etc. Sometimes a clever compiler will warn you that you have done something odd, other times you'll be left to discover the runtime error.
You're right though, with care and skill, typos really shouldn't be an issue
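The "x, y, y" pattern looks something like this in Python (a contrived example of mine, and a case where no compiler warning saves you):

```python
class Point3D:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.y = z   # copy-paste bug: should have been self.z = z

p = Point3D(1, 2, 3)
# p.y is silently 3, and p.z does not exist at all; the failure only
# surfaces at runtime, whenever something finally tries to read p.z
```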
The vast majority of bugs I've run across stem from an incorrect mental model of how something works: your own system, some library, or an external system you're interfacing with.
Aside from building a more understandable system (whatever that really means/entails), from my experience, the best way to counteract this is to, while writing code, always ask: "What if my understanding is wrong? What if someone else's understanding is wrong? What can I do to make the system fail in the loudest way possible if someone's understanding is wrong?"
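One cheap way to do that is to write the mental model down as an assertion at the boundary, so a wrong assumption detonates at the point of contact instead of corrupting state three layers deeper. Everything here (the fake API, the 'email' field) is a made-up example, not any real interface:

```python
class FakeApi:
    """Stand-in for an external service whose behavior I *think* I know."""
    def get(self, path):
        return {"id": 7}   # surprise: no 'email' field after all

def fetch_user(api, user_id):
    resp = api.get(f"/users/{user_id}")
    # my mental model: user records always carry an email.
    # if that model is wrong, fail here, loudly, with context attached.
    assert "email" in resp, f"model violated: /users/{user_id} -> {resp!r}"
    return resp
```

The assertion costs one line and turns "mysterious corruption next week" into "stack trace pointing at the wrong assumption today".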
As an example, I have a system in which almost all state is represented as XML/JSON/CSV text when it is outside the system. Inside the system it's tables of tuples, with a master cross-reference table of names.
Each "object" has an instantiator, a "process" callback, and... that's it. It's all driven by a timer-based polling cycle (after a select()/epoll() loop). Each "object" owns a timer of its own that specifies its minimum poll interval. There are basically no parameters after object creation, so you can't get parameters wrong: you have to send XML/JSON/CSV to it to change state. The XML/JSON/CSV for initial configuration is controlled and shipped with the executable. For all settable state in the tables, there is a formal validation method and a controlled script to test it.
For the client-writers, I provide correct scripts for each use case that they can then steal or modify.
This way, I just don't have problems.
I believe that churn isn't progress, things with lasting value generally take time, and some problems can't be solved quickly. I enjoy working on problems that benefit from careful, considered, time-consuming thought.
You might desire something different. That's OK, but nobody is obligated to accept as a fait accompli the universal necessity or value of moving "fast" (and hopefully not breaking things).
For everyone else, a certain amount of breakage is expected. When things break, there are generally manual processes that can be put into place to keep business moving.
'The action led to comments where life was actually being put at risk by the unilateral action: "I needed to set up my department's bronchoscopy cart quickly for someone with some sick lungs. I shit you not, when I turned on the computer it had to do a Windows update."'
Question 1: Labor-intensive business or capital-intensive business?
Question 2: Whose elaborate safeguard procedure failed?
 - http://www.theinquirer.net/inquirer/news/2450852/updategate-...
Why the hell wasn't someone checking to make sure such a critical piece of equipment was always at the ready?
Since Windows was never intended to serve a life-support/life-saving function, it was the fault of whatever OEM chose to use it in that capacity. They took on the onus of having it work correctly when they chose it as a platform.
Does anyone know when they took it out? And why?
> “The Microsoft software was designed for systems that do not require fail-safe performance. You may not use the Microsoft software in any device or system in which a malfunction of the software would result in foreseeable risk of injury or death to any person.”
Or if your business is safety-critical, or financial, or storing other peoples' data, or...
Customers do notice dependability and lasting value; nobody trying to solve their day-to-day problems actually wants a treadmill.
Middle managers and ICs, not necessarily, but a company generally can't afford to have its executives be out of step.
Part of it is survivorship: if they survived, they thrived. But I think the other part is that the code starts to become sharded across people's brains. The technical debt piles up so high in some places that it becomes impossible to maintain productivity with the same number of people.
In other words, try bold things, bold refactorings, etc. within bounds.
If a system is designed with good modularity, this can be done with very low risk.
When someone deploys new code at Facebook, I recall reading that it initially goes out to internal users, then to a small percentage of actual users, then eventually to the entire system.
This is simply a systems-based approach. The same applies when thinking about code in a testing environment, staging server, etc. At each level there are additional failure modes or adverse interactions.
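A staged rollout like that usually rests on deterministic bucketing, so a given user stays on the same side of the flag across requests. A hypothetical sketch (the hashing scheme and names are my assumptions, not Facebook's actual mechanism):

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically place user_id into one of 10,000 buckets and
    enable the feature for the first `percent` of them."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return bucket < percent * 100   # percent=1.0 -> 1% of users

# ramp schedule: internal users -> 1% -> 10% -> 100%,
# watching error rates at each step before widening
```

Hashing on feature name plus user id means each feature gets an independent slice of users, so one unlucky cohort doesn't absorb every experiment.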
When done stupidly, a cowboy approach can lead to downtime, poorly-reasoned quick fixes, and blame within an institution.
Often the same cowboy/girl who acted rashly and caused the bug is lauded as a hero when they get things working again a week later, after the bug is discovered... when it was their bad judgment (or "move fast" mentality) that led to the offending code being shipped in the first place.
We must be honest with ourselves and our teams about bad decisions (even in hindsight) and build processes that let us be bold when we need to.
Mass transportation is an example of an industry whose half-life is decades to a century. "Fast" for this sector is slower than a coder's entire working life.
Games made to support a product ad are ephemeral: a month, shorter than the life of a single Firefox release.
Some dangerous radioactive isotopes have half-lives of 10,000 years, so nuclear-plant automation arguably shouldn't rely on any CPU at all.
A citizen's data should last the government as long as that citizen lives. Governments that aim to remain stable should treat the half-life of their automation as long, very long. Hence slow changes. Very slow.
So should banks. But banks also have to adapt at the speed of trades.
Fast and slow can be set by doing something agile hates: careful business analysis.
The problem is that we have coupled all businesses to an insanely costly rhythm, forced onto every economic activity by means of anticompetitive business practices: useless obsolescence driven by hardware monopolies, cheap energy, and cheap, regulated, poor education.
Do we really need 1 PB hard drives when, with this much data, we cannot find the relevant information without costly increases in the means to search it?
We don't need fast-changing technologies; we need boring, slow-changing technologies that are reliable.
This may be true for a web site (or app as some like to call them these days). But the mentality has filtered down to actual software running locally, or even whole OSes (see Android for instance, where more than once Google has pushed a feature into the world before all the proverbial edges have been filed down).
> Be Aware of What You Break
I work at Rollbar (www.rollbar.com), and that point is probably the key to our existence. You can't afford not to know what's going wrong. And the best way to know what's going on is to get a full stack trace, with all the information you need to reproduce the issue, into a platform that will notify the appropriate people of the breakage.
Try this search: https://hn.algolia.com/?q=fast+break
It sounds like the author could use a dose of his own medicine. The first sentence contains a (common) grammatical error. The third sentence is a fragment. The second is nearly unparseable.