I've seen bugs that users don't notice.
I've seen bugs that users have just gotten used to.
I've seen bugs whose failures were dismissed, for many years, as the system being a bit temperamental.
I've seen bugs that financially affected thousands of customers and went unnoticed for several years.
I've even seen bugs that only survived because, by luck, the preconditions to trigger them had never been met.
I can assure you, you do find bugs in legacy code.
Any bug in a large codebase that's older than a certain point becomes a feature. ;)
That's an empirical claim and I doubt it's true. Sure, sometimes you'll break something else, but 50% of the time plus? I'm gonna need to see some evidence.
I've seen a lot of similar stories. General advice: if you want to find out whether anyone really uses something, the relevant time span is a year and a day. It's theoretically possible longer would be needed, but a year and a day will cover almost all cases.
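If you actually want data for that year-and-a-day question, one low-effort approach is to instrument the suspect feature and just wait. A minimal sketch of what I mean; the function and logger names here are invented for illustration:

    import logging

    log = logging.getLogger("deprecation-watch")

    def legacy_export(rows):
        # Keep the feature working, but record every real use.
        # If this never fires in a year and a day, the path is a
        # strong candidate for removal.
        log.warning("legacy_export still in use (%d rows)", len(rows))
        return "\n".join(",".join(map(str, r)) for r in rows)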
In the past I considered that a bug in human behaviour, but actually it is admirable. You give people a broken tool, but instead of failing, they find a way to succeed using that broken tool. And of course, after that effort, they are not as willing to change to anything else. Don't change a running system, and all that.
There is an old engineering truism: "if it ain't broke, don't fix it"
When Lindbergh flew across the Atlantic in the Spirit of St. Louis, Wright Aeronautical, which made the engine for his plane, wanted to do its best to make sure he made it. They carefully built and checked the engine and hand-delivered it with a note: "don't monkey with it" (or words to that effect). Lindbergh was a careful guy, and he didn't.
Businesses and developers need to distinguish between warranted or needed changes to working systems (software or otherwise) and ego trips.
Real systems are built in limited time, with limited resources, even if they have a multi-billion dollar budget. Invariably there are things someone can make an issue about, even in a system with a perfect performance record. Why did you hard-code 3.14159265 instead of having a general constant PI initialized to 3.14159265, and by the way, why do you have only 8 decimal places -- wouldn't 16 be better? It is an ego trip, and it is the ancient territorial imperative: marking your territory by deprecating the previous person or team -- or a colleague.
Joel Spolsky wrote a famous and widely ignored blog post on this sort of thing:
Take a deep breath and ask yourself: is this something we really need to do or am I (are we) showing off/putting someone down/behaving like a spoiled adolescent?
Consider Chrome's recent change to mark sites as insecure if they have a password field without TLS. Will my grandma's bank page be reported as insecure? There's no way to know a priori, or through isolated unit tests. The Chrome team has to go out into the Internet jungle, reaching out to the site owners and pushing until the major sites are ready, and any holdouts can be considered acceptable casualties.
With that understanding, whether code is "legacy" isn't about age or budget or level of support; it's about how much influence you have over your clients. Some code is never used and so never becomes legacy, no matter how old it is. But other code is born legacy: it gets clients immediately and pays the price for being popular.
I have a piece of old code like that myself kicking around that tends to overestimate a certain value. I also have some new code that uses a better model and gives a more accurate value. The problem is I cannot really roll out the new code since it will make all the new things look worse than all the old things, even if they're actually the same or perhaps even better. So I use the old code, because even if it's wrong, at least it's wrong in that same way as all the other results and people can make relevant comparisons.
That's his point. If you have a test-case for the behaviour, you know that this is how it should actually behave.
Anything not covered by tests, however, is naturally shrouded in uncertainty: every weirdness, quirk, or odd behaviour raises the question of whether it should be there or not. Hard to tell when it's not formalized in a unit test.
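One way to resolve that, quirk by quirk, is to pin the behaviour down in a test so the question at least becomes visible. A hedged sketch, with invented names and an invented rounding quirk standing in for whatever oddity you actually find:

    from decimal import Decimal, ROUND_HALF_UP

    def compute_total(prices):
        # Hypothetical legacy behaviour: totals use half-up rounding.
        total = sum(Decimal(str(p)) for p in prices)
        return float(total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

    def test_half_up_rounding_is_pinned_down():
        # Formalizes the quirk: 0.125 rounds to 0.13 here, where
        # Python's default rounding would give 0.12. If that's a bug,
        # the test makes it visible; if it's a feature, it protects it.
        assert compute_total([0.125]) == 0.13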
Unit tests, which don't operate on that abstract high level, give you relatively little safety. Sometimes they can ensure that an algorithm is not broken, but algorithms are usually not where legacy code has issues. Often unit tests are at the level of "assert that function X calls stub Y Z times".
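For concreteness, this is the kind of interaction-counting test I mean; a minimal sketch, every name invented for illustration:

    from unittest.mock import MagicMock

    class ReportService:
        def __init__(self, mailer):
            self.mailer = mailer

        def send_reports(self, reports):
            for report in reports:
                self.mailer.send(report)

    def test_send_reports_calls_mailer_three_times():
        mailer = MagicMock()
        ReportService(mailer).send_reports(["a", "b", "c"])
        # Asserts only that the stub was hit 3 times; it says nothing
        # about what was sent, in what order, or whether it worked.
        assert mailer.send.call_count == 3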
That's a terrible test that should never pass code review. It's worse than no test at all - it's brittle and guarantees nothing.
Writing good tests is _hard_.
But in that specific case, no amount of tests would have helped us...
Later, once you have some coverage from those coarser tests, you can add unit tests. In some cases, you can even try to refactor without tests first, and then add tests once you have a testable API. That's the whole point of the "BabySteps Timer" kata I have linked in the post...
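To make that "refactor first, then test" move concrete, here is a minimal sketch with invented names (not the actual kata code):

    # Before: formatting logic buried in a GUI event handler, e.g.
    #   self.label.text = "%02d:%02d" % (remaining // 60, remaining % 60)

    # Step 1: extract the logic behind a testable API.
    def format_remaining(seconds):
        """Format a countdown as MM:SS."""
        return "%02d:%02d" % (seconds // 60, seconds % 60)

    # Step 2: only now, pin the current behaviour down with tests.
    def test_format_remaining():
        assert format_remaining(0) == "00:00"
        assert format_remaining(61) == "01:01"
        assert format_remaining(600) == "10:00"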
Still, in the case I described in the post, a test harness would not have helped us. The only thing that would have helped us would have been to ask one of the few users who needed the features. And we didn't know them...