Nobody has pointed out the fault in this reasoning yet, so I will.
The linear relationship between lines of code and total bug count rests on the independence of bug introduction in different parts of the code. That is, introducing a bug at line 10000 is more or less independent of adding line 20000. For product code that is arguably the case, but once you add test code into the mix, this basic assumption doesn't just fail to hold; it gets turned upside down.
Test code is not customer-visible, but only dev-visible, with the sole purpose of finding bugs. Thus, adding test code to a code base decreases the average probability of a bug in a line of product code. More formally, if you have N lines of product code and thus c * N bugs for some fraction c, then adding M lines of test code does not increase the number of customer-visible bugs to c * (N + M). Instead, it reduces that number to c' * N with c' < c, the difference being caused by your test coverage. (100% test coverage, i.e., an exhaustive test without bugs or, equivalently, a formal verification, would bring c' to 0.) Sure, the M lines of test code may well have bugs of their own, but those only increase c' slightly while keeping it below c, and more importantly, test bugs are not customer-visible. They only annoy developers.
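A toy model of this argument, with made-up numbers (N, c, and the catch rate below are illustrative assumptions, not measurements):

```python
# Toy model: N lines of product code with bug rate c gives c * N bugs.
# Adding test code does not add customer-visible bugs; instead the
# tests catch some fraction of the product bugs, lowering the
# effective rate from c to c'.

def visible_bugs(n_product_lines, bug_rate, catch_rate):
    """Customer-visible bugs left after tests remove the caught ones."""
    total = bug_rate * n_product_lines
    return total * (1 - catch_rate)

N, c = 10_000, 0.01                       # ~100 bugs before any testing
without_tests = visible_bugs(N, c, 0.0)   # no tests: catch nothing
with_tests = visible_bugs(N, c, 0.7)      # assume tests catch 70%

print(without_tests)   # ~100 expected visible bugs
print(with_tests)      # ~30, i.e. an effective c' = 0.003 < c
```

The 70% catch rate is the hand-wavy part, of course; the point is only that the test lines move c' down rather than adding to c * (N + M).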
I agree with the rest of the post though.
If you have one person entering data into a computer, then the odds of them introducing an error and failing to spot it are fairly high.
If you have twice as many people entering twice as much data, then the odds of an error getting introduced are roughly doubled.
However, if you have those two people entering the same data, then their mistakes cancel each other out. If person A and person B both entered the same thing, it's extremely unlikely that it's incorrect. If they differ, though, a problem has been identified, and can now be fixed.
The odds of both of those people entering the same piece of data incorrectly is tiny. Likewise, accidentally introducing a bug into both the production code and the test is pretty unlikely.
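The same point as rough arithmetic (both p and q below are assumed, purely illustrative values):

```python
# Double-entry sketch: a single typist ships an error with probability
# p, while two independent typists only ship the *same* wrong value
# when both err AND their errors happen to coincide.

p = 0.01   # assumed per-entry error rate
q = 0.1    # assumed chance that two independent errors match exactly

single = p              # one typist: error goes through as-is
double = p * p * q      # two typists: both err, and errors agree

print(single, double)   # single ~0.01, double ~1e-05
```

The exact value of q doesn't matter much; any q < 1 makes the double-entry failure mode orders of magnitude rarer than the single-entry one.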
That said though, if those two theoretical data entry people above are given the wrong data to enter, then the system cannot protect them. They will both correctly enter incorrect data. "Garbage in, garbage out".
Likewise, if the requirements of a piece of software are poorly understood, then it is quite likely that both the test and the production code will implement the same "bug". Writing tests won't fix a failure to understand the problem you're trying to solve. And they're not supposed to.
This is just not correct. There may be a systematic reason why they are making a mistake (e.g. a mispronounced word), in which case increasing the confidence intervals does not increase the accuracy. Check out the concepts of accuracy, precision etc. from the physical sciences.
What you're actually saying is that reaching full exhaustive testing is near impossible.
If I understood func(5, 6) should return 9, but the client actually wants 10, no amount of tests I write will reveal the error.
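A minimal sketch of that situation, with a hypothetical `func` standing in for the real one:

```python
# The misunderstanding made concrete: if I believe func(a, b) should
# return a + b - 2 while the client actually wants a + b, my test
# encodes the same wrong belief and happily passes.

def func(a, b):
    return a + b - 2      # my (wrong) understanding: func(5, 6) -> 9

def test_func():
    assert func(5, 6) == 9    # passes, yet the client wanted 10

test_func()
print("test passed despite the requirements bug")
```

No number of such tests can surface the error, because the code and the tests share the same misunderstanding of the requirement.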
Of course, you will never get full coverage, but tests are a programmer's tool. It allows you to reduce the amount of time you spend inspecting parts of the code you aren't working on. For example, when I'm writing code with good test coverage, I can find out what happens if I change an attribute in a structure simply by doing it and watching the tests fail. Each failure tells me where in the code I'm likely to have to make a change.
If you make a small viewpoint change that tests are not to find bugs, but rather are a tool to help you understand the code base, they make a lot more sense. (IMHO)
A test introduces the same constraint. Instead of implementing the function, it runs the function. No matter what you do, the test will run certain code and get a certain result. Tests pass when the constraints introduced by the production code match the constraints introduced by the tests.
Because of this, if there is a software error in the production code, usually you need to have a corresponding error in the tests in order for the test to pass. This definitely happens from time to time, but the odds of it happening are much lower. Similarly, if you have a software error in the tests, you need to have a corresponding software error in the production code in order for the test to pass -- assuming that the test is actually exercising the code (something you can't always assume, unfortunately).
Then you just get blank stares when you try to explain why what they've just done is completely pointless.
I'm currently writing a secure handshake generator. https://github.com/LoupVaillant/Monokex It generates specifications and source code from a Noise pattern. A bit like Noise Explorer, only with fewer features. Previous versions of the code were hand written, and the generated code does its best to look like it was hand written. Here's the result: https://github.com/LoupVaillant/Monocypher-Handshake/blob/ma...
Now how do I ensure the code matches the specs? (The correctness of the specs themselves is currently checked by hand.) I could write another implementation, but I'm only me, and I can't go erase my own memory to make a clean room implementation and compare the two.
Instead, I took the specs, and wrote code that takes all the inputs, and spits out all the intermediate buffers and outputs, paying no heed to stuff like order of execution, or who does what. I only concentrated on generating the test vectors: https://github.com/LoupVaillant/Monocypher-Handshake/blob/ma...
The structure of the vector generating code and the actual production code are very different. This is how I decorrelate mistakes, and make sure that if the two "implementations" agree, I'm very likely to have something that works.
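A tiny sketch of the same idea with hypothetical stand-in functions (the real project compares a handshake implementation against generated test vectors; the routines below are invented purely for illustration):

```python
# Decorrelation sketch: a "production" routine and a structurally
# different "reference" routine both implement the same spec. If two
# differently-structured implementations agree on shared test vectors,
# a common mistake is much less likely.

def production_mix(a, b):
    # imagined production code: direct formula
    return (a * 31 + b) % 257

def reference_mix(a, b):
    # independently written reference: different structure, same spec
    total = b
    for _ in range(31):
        total += a
    return total % 257

# shared test vectors, checked against both implementations
vectors = [(1, 2), (200, 55), (0, 0), (256, 1)]
for a, b in vectors:
    assert production_mix(a, b) == reference_mix(a, b)
print("all vectors agree")
```

The value is in the structural difference: a bug would have to be made twice, in two unrelated shapes of code, to slip through.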
Of course, this is all a lot of effort, so I wouldn't do this for non-critical code.
That is unproven conjecture. It feels right, but just because it is written as a formula doesn't automatically mean it is correct.
If your testing is such that this isn't true then you are doing testing wrong, in a self-evident sort of way.
Hmm. Sounds like you are doing some sort of testing.
If adding test code makes your code under test more bug-prone, something has gone horribly, terribly wrong.
How do you define the meaning of it, and could you illustrate how it could, on average over a code base, increase the number of bugs in the product code under test?
In bad cases you get no return on the time spent writing tests and simply end up with less product code (similar bugs/LOC, so fewer bugs and reduced features). In terrible cases you end up with rushed product code and more bugs/LOC.
That's really superficial analysis. Yes, there are some fundamental tradeoffs. But it is possible to change the properties of a system, or combine them in intelligent ways, and move the curve.
Or it's like saying that speedometers make your GPS less accurate due to the Heisenberg principle.
I'm not advocating for TDD (the programmer methodology in the IDE) but the author's explanation about "test code" isn't correct. Code written for explicit purposes of a test to exercise other code has been shown to increase correctness. E.g. SQLite database has 711x more test code than the core engine code. (I made a previous comment why this is possible: https://news.ycombinator.com/item?id=15593121)
Low-level infrastructure code like database engines, string manipulation libraries, crypto libraries, math libraries, network protocol routines, etc can benefit from a suite of regression tests.
It's the high-level stuff like GUIs in webpages being tested with Selenium or LoadRunner that has conflicting business value because altering one pixel can have a cascading effect of breaking a bunch of fragile UI test scripts.
I distinctly remember once posing that question in a meeting about testing, and a manager replying --- seriously --- with "then perhaps the test code should itself have tests." Someone else must've come up with that before too, because (at a different job) I've also worked on a codebase where a surprising number of tests were basically testing the function of another test.
Case 0: No bugs in the test code. All is well.
Case 1: Bug in the test code that causes some bugs in the real code not to get caught. That's bad, but you're no worse off than if you didn't have the test at all.
Case 2: Bug in the test code that causes correct real code to look buggy. Result: the test fails, you look for problems, most likely you find that the problem is in the test code and fix it. Going forward, you have a working test.
Case 3: Bug in the test code that makes something else break. This can happen and is genuinely bad, but (1) it only affects testing, not your actual product, and (2) most bugs don't behave that way.
The test code is a net win if the bugs it catches in your real code are worth the effort of writing and debugging the test code. That's no less true on account of the possibility of bugs in the test code. It just means that when you estimate the benefit you have to be aware that sometimes the tests might be less effective because of bugs, and when you estimate the cost you have to be aware that you have to debug the code, not just write it the first time. And then you just ... decide what's the best tradeoff, just like everything else in engineering.
(And no, you don't need tests for your test code. The test code is a test for itself as well as for the code it's testing.)
Code  | Test  | Result
fine  | fine  | a) We have a regression test, yay!
fine  | buggy | b) Someone breaks the code to make the test work. Oops.
      |       | c) Someone fixes the test; we have a regression test, yay!
buggy | fine  | d) We'll fix the code, and we have a regression test, yay!
buggy | buggy | e) Bug remains in code. Oops.
Code | Test | Probabilities | p = .01 (one in a hundred)
fine | fine | a) (1-p) * (1-p) | .9801
fine | buggy | b) (1-p) * p * .5 | .00495
c) (1-p) * p * .5 | .00495
buggy | fine | d) p * (1-p) | .0099
buggy | buggy | e) p * p | .0001
no harm done .9801 (a)
bugs found + fixed .01485 (c,d)
bugs introduced / not found .00505 (b,e)
Also, the probabilities of introducing an error in the production code and in the test code might not actually be statistically independent, which I assumed here. So take with a grain of salt.
[Edit] Actually d) could also end negatively. Guess a working model would have to take into account that on failing test cases, a sensible developer should take a step back and reason about why this happened. So the negative outcomes would be (hopefully) less likely than the positive ones here.
To get some actual numbers, I chose an error rate of 1 in 100 lines of code (0.01). This is totally subjective and probably larger than in reality, but it does not hurt to be pessimistic here.
a) => neither the production code nor the test code contains any bugs. So if the probability of an error is `p` (.01), the probability of error-free code is `1 - p` (.99). If both events are independent, we can multiply them to get the probability of both happening at once: `(1 - p) * (1 - p)`.
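For what it's worth, the table's numbers can be recomputed directly (assuming, as above, independent errors and a 50/50 split between cases b and c):

```python
# Recomputing the outcome probabilities for p = 0.01, assuming
# independent errors and an even split between outcomes b) and c)
# when the test is the buggy half.

p = 0.01
a = (1 - p) * (1 - p)     # both fine
b = (1 - p) * p * 0.5     # test buggy, code gets "fixed" to match it
c_ = (1 - p) * p * 0.5    # test buggy, test itself gets fixed
d = p * (1 - p)           # code buggy, test catches it
e = p * p                 # both buggy, bug slips through

print(round(a, 4))        # 0.9801  - no harm done
print(round(c_ + d, 5))   # 0.01485 - bugs found + fixed
print(round(b + e, 5))    # 0.00505 - bugs introduced / not found
```

Note the five cases sum to 1, and the "found + fixed" mass is about three times the "introduced / not found" mass under these assumptions.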
Good unit and integration tests are a rarity. Instead, tests like the above ones, which are effectively developed as a side project to the real project, are the norm, and they bog down the whole project. But you cannot deliver any code that is not "covered", because that would be against the current 100%-TDD bible/best practice/call it whatever you want.
So the next dev/maintainer has to work out a badly written test (probably bearing untrue assumptions about the program's intended business logic), and after some fun hair pulling he does the reasonable thing, which is working around the test or massaging it to get it to pass. And that is how you end up with test code that is itself buggy and problematic, and does not really test much, but does increase the holy test-coverage counter.
Where instead the amount of test code can become a problem, in my experience, is with maintenance. Efficiency is very important.
If the UI tests catch bugs during development or help the team during a data migration, they're probably still worth having.
Your UI being off by a pixel won't break your application, so if a test hangs on that, then it is not a good test.
However, your business logic, or network protocol routine, those should not break even if you heavily refactor or add new features (especially business logic where a broken behaviour might seem correct), so those need to be heavily tested.
If it is hard to test the juicy parts like business logic without also dragging in the UI, different OS/platform/db parts, etc, then you should look at how your application is structured and if it is really optimized for writing good tests.
You should also write tests for things that are already broken before you work on the fix so that you can be sure it's actually fixed. Basically the red/green/refactor cycle from TDD.
For brevity in the previous comment, I didn't fully flesh out the background on why fragile UI tests get created. It happens accidentally.
What sometimes happens is that the UI tester uses a "macro recorder" to record mouse movements and clicks. But then a programmer shifts the position of a zipcode field by one pixel, which throws the script off because it expected UI elements in a different spot. Fixing the broken UI tests is time consuming and can leave a bad impression that tests create a lot of effort for very little payback.
The return-on-investment of UI tests depends on the business circumstances. I'm guessing Boeing and Airbus have automated UI tests that sometimes break when programmers change things which causes rework. However, the pain of fixing the UI tests and keeping it sync'd with the UI code is worth it for avionics software.
I think in an abstract sense control theory is a reasonably good bet.
I can't say I know it deeply, but a lot of the ideas resonate when I think about software engineering. If you think of everything as basically an n-dimensional vehicle with an interface to control it and the control mechanism is used to set all the parameters relevant to the system then a few things follow:
every system has a safe operating envelope
parameters are usually linked to each other such that turning one up turns another down
With every decision we make about systems we build and run we are essentially trying to steer them, albeit clumsily, in this manner.
While I find it hard to find much I'd recommend about the author's article, as most of the reasoning seems a little off, I can understand its sentiment.
Agile and TDD are really a recognition of control theory's idea that short feedback loops keep things in better control and adapt faster, whereas long feedback loops go out of control far more easily. This is more targeted at the human side of creating software. That's not to say there aren't better strategies than TDD and Agile techniques, but I think that principle of feedback loops giving confidence will survive in some form. I think there is a LOT more to be said about engineering / designing correct, robust, and secure software.
This in my experience is the most important factor.
We all think our pet subjects are the right lens to view the world with.
> So here’s the punchline: if you want to be a good programmer then learn a technology and language agnostic formalism. Logic, statistics, and game theory are probably good bets. As things stand that kind of skill is probably the only thing that is going to survive the coming automation apocalypse because so far no one has figured out a way around rigorous and heuristic thinking.
I think there's a lot of support there.
I don't think using Kubernetes as an example of "sequestering the complexity behind a distributed control system" was a good follow-up to TDD generating more lines of code. Containers are a step in the right direction, and Kubernetes _is not_ the best option for using containers in production, but it _is_ the most popular one; so if you want community mindshare and support, it probably is the best choice, if you can manage it or use a managed service.
"Serverless" is real, it's here, and containers / k8s are just a step along the way.
Also, it may not really matter: as you said, Kubernetes is the most popular and is only getting more so at this point, and the network effect is so strong in tech like this that the "technically best" option will most likely become a moot point.
Kubernetes has for sure won the popularity contest but the overhead involved in running it The Right Way™ on your own is a lot. Given what I've seen I would advocate for OpenShift if you like RedHat products / projects or sticking with Kubernetes from one of the well-known cloud providers.
Some of them had already started doing React for frontend and just need the backend work.
For example you can use the same React components for (pre-) rendering pages on the backend so the site feels quicker to load.
As an aside, I've never been able to get anyone to explain to me why k8s and not something like Mesosphere's DC/OS, other than "google".
Anything you can share there?
However I agree with the article's general idea. In the aviation industry there already are languages abstracting computers' internals and allowing programmers to reason about safety-critical programs using more high level constructs.
Due to its nature, I think there won't be such a technology for general purpose languages -- in order to be general enough, you can't have too many things abstracted away. Maybe we can't go much farther than what languages like Basic allow us.
On the other hand, I wish we had such languages for more specific tasks like ERP-like software, business web applications and so on. It's worth noticing that many of the biggest ERP companies in the world have their proprietary domain specific languages.
So here’s the punchline: if you want to be
a good programmer then learn a technology
and language agnostic formalism.
If you want to be "a good programmer", then learn how to define the problem you are tasked to solve. The technology is irrelevant. The "language agnostic formalism" is irrelevant.
Unless a person/team knows what must be done, then the rest really doesn't matter. Techniques which help to elicit repeatable delivery certainly are worthy to learn, even to advocate for. But without understanding what is needed, what use are they?
PS: I don’t disagree that in some cases, academics can use the formal methods to find a (near) global optimal solution. But I don’t think it’s practical in a daily context, nor necessary. Our evolution is the best proof that local optima can lead over time to fantastic solutions.
> It takes a particular kind of masochist to enjoy reverse engineering a black box from just poking at it with simple linear impulses.
These are great observations and brilliantly put. In particular the second one I think rightly explains why some very smart people definitely do not take to programming as a profession.
I've worked with teams who have had to suffer through a 45-minute standup every morning that was immensely painful. They were just following the process as best as they understood it from a couple of days with an "agile coach" and didn't really understand what being agile really meant.
> Managers who don't themselves write any code, but micromanage their workers with their version of Scrum, are the worst.
I don't think anyone would disagree on the evils of micromanaging, but as a manager that doesn't code, I think that managers who do code are depriving their teams of the most valuable thing they have to offer--their support. If you take on a coding task and put yourself on the critical path of that sprint's work, you have to commit to putting the hours in to get it done. This is not a good use of your time.
This is the difference between a manager and a Tech Lead.
XP is better simply because it's based on engineering practices first.
The trouble with any approach is that it's a template and some teams / people rest on those templates in place of actual thought.
Like the others, I think the thoughts on TDD are somewhat fallacious but I certainly can't fault the conclusion.