But a 50x win that, as you note, goes from "let's have lunch" to "let's change this one thing ten times and re-do the analysis to see which gets us the best results" is a huge game changer; it's not just that it saves time, it's that it makes possible new ways to work with data.
A little while after I had made the change, a user filed a bug against the library about slow behavior when passing it a large amount of data. They were using the public release rather than git master, so thankfully my code wasn't at fault. Thanks to what I had learned while making my change, I could quickly confirm that precisely this slow part was the cause of the slowdown, and I was able to file a PR reducing the library's complexity from quadratic to linear.
It wasn't very complicated algorithmically: I only had to add a lookup structure. But the lookup structure had to be built in a particular way so that the library still behaved exactly as the test suite expected. It also had to support duplicates, since the original implementation did, and that is in fact an important feature (toml arrays).
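To give a sense of the shape of that kind of fix, here's a minimal sketch in Python. It is not the library's actual code and all names are hypothetical; it just shows the general pattern of keeping an ordered list as the source of truth (so duplicates and ordering stay intact) while maintaining a side index so lookups stop being linear scans.

```python
from collections import defaultdict

# Before: each insert scanned the whole list for entries with the same
# key, so N inserts cost O(N^2) overall.
def positions_quadratic(entries, key):
    # entries is a list of (key, value) pairs; duplicates are allowed,
    # so a plain dict can't replace the list outright.
    return [i for i, (k, _) in enumerate(entries) if k == key]  # O(N) scan

# After: keep an auxiliary index alongside the list. The list still
# holds every entry in insertion order (duplicates preserved, observable
# behavior unchanged), while the dict maps each key to the positions of
# all entries with that key.
class IndexedEntries:
    def __init__(self):
        self.entries = []                 # authoritative, ordered storage
        self.by_key = defaultdict(list)   # key -> indices (duplicates kept)

    def insert(self, key, value):
        self.by_key[key].append(len(self.entries))  # O(1) bookkeeping
        self.entries.append((key, value))

    def positions(self, key):
        return self.by_key[key]           # O(1) lookup instead of an O(N) scan
```

With N inserts, the scanning version does N linear passes, while the indexed version does constant extra work per insert, which is where the quadratic-to-linear improvement comes from.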