I've had to change an implementation that was tested with the moral equivalent to log statements, and it was pretty miserable. The tests were strongly tied to implementation details. When I preserved the real semantics of the function as far as the outside system cared, the tests broke and it was hard to understand why. Obviously when you break a test you really need to be sure that the test was kind of wrong and this was pretty burdensome.
"..trace tests should verify domain-specific knowledge rather than implementation details.."
More generally, I would argue that there's always a tension in designing tests, you have to make them brittle to something. When we write lots of unit tests they're brittle to the precise function boundaries we happen to decompose the program into. As a result we tend to not move the boundaries around too much once our programs are written, rationalizing that they're not implementation details. My goal was explicitly to make it easy to reorganize the code, because in my experience no large codebase has ever gotten the boundaries right on the first try.