Part of the problem is caused by all sides using the same terms but with a different meaning.
> You just don’t know if your system works as a whole, even though each line is tested.
... even though each line has been executed.
One test per line is strongly supported by tools calculating coverage and calling that "tested".
A test for one specific line is rarely possible. It may be missing some required behavior that hasn't been challanged by any test, or it may be inconsistent with other parts of the code.
A good start would be to stop calling something just executed "tested".
I like the term "exercised" for coverage of questionable assertative value. It's rather pointless in environments with a strict compiler and even some linters get there, but there are still others where you barely know more than that the brackets are balanced before trying to execute. That form of coverage is depressingly valuable there. Makes me wonder if there is a school of testing that deliberately tries to restrict meaningful assertions to higher level tests, so that their "exercise" part is cheaper to maintain?
About the terms thing:
Semantics drifting away over time from whatever a sequence of letters were originally used for isn't an exception, it's standard practice in human communication. The winning strategy is adjusting expectations while receiving and being aware of possible ambiguities while sending, not sweeping together a proud little hill of pedantry to die on.
The good news is that neither the article nor your little jab at the term "tested" really do that, the pedantry is really just a front to make the text a more interesting read. But it also invites the kind of sallow attack that is made very visible by discussions on the internet, but will also play out in the heads of readers elsewhere.
> Makes me wonder if there is a school of testing that deliberately tries to restrict meaningful assertions to higher level tests, so that their "exercise" part is cheaper to maintain?
That's a nice summary of the effect I observe in the bubble I work in, but I'm sure it is not deliberate but Hanlon's razor applies.
With sufficient freedom for interpretation it is what the maturity model of choice requires.
> You just don’t know if your system works as a whole, even though each line is tested.
... even though each line has been executed.
One test per line is strongly supported by tools calculating coverage and calling that "tested".
A test for one specific line is rarely possible. It may be missing some required behavior that hasn't been challanged by any test, or it may be inconsistent with other parts of the code.
A good start would be to stop calling something just executed "tested".