Power of small optimizations (maksimkita.com)
167 points by stacyz on Feb 10, 2024 | 73 comments



10 years ago, SQLite made a release dedicated to lots of micro-optimisations that resulted in massive speed-ups: https://sqlite-users.sqlite.narkive.com/CVRvSKBs/50-faster-t...

Don't underestimate accumulation of small improvements.


Hmm, but HN assured me that "premature optimisation is le root of all evil" (nevermind that that's an incomplete quote) :)


Sadly, HN was not able to help you understand what premature optimization is.

What the SQLite team did was systematically measure the performance of their system, identify potential bottlenecks/inefficiencies, create and measure improvements to these bottlenecks and choose the better option.

This is what we call optimization. Premature optimization is when you have no idea how your system really performs, but spend time trying to make it "better" without having a way to determine whether your change made any difference.
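
A minimal sketch of that measure-first loop in Python (generate_report is a made-up stand-in for whatever hot path you're investigating, not anything from SQLite):

  import cProfile
  import pstats
  import timeit

  def generate_report(rows):
      # Hypothetical stand-in for the code under investigation.
      return sorted(rows)

  rows = list(range(100_000, 0, -1))

  # 1. Profile to find out where the time actually goes.
  cProfile.run("generate_report(rows)", "profile.out")
  pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)

  # 2. Time the suspected bottleneck before and after a change, and only
  #    keep the change if the numbers actually improve.
  baseline = timeit.timeit(lambda: generate_report(rows), number=10)
  print(f"baseline: {baseline:.3f}s for 10 runs")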


I thought premature optimization was when you decide to optimize a bottleneck but later end up rearchitecting and tossing it all anyway.

I'd refer to what you're describing as messing around.


I'm not going to pretend to be some sort of authority on the subject, but personally I think both fit the bill. It is premature to optimize when you don't know how your system performs, and it can be premature to optimize if you might throw the solution away. However that last part isn't always the case. Maybe the reason you throw it away is that another solution is more performant - you had to optimize both to find that out.

To me, the core of the issue is knowing what you're doing. If you know what you're doing then it's generally not premature. You have a reason for what you're doing. You're sure that you're working on the right part of the code and you are properly measuring the difference in a meaningful way.


What part of SQLite 3.8.7 are you saying is premature?


I don't think they were saying that SQLite's optimizations were premature (since clearly they benefited the program as a whole), but that people conflate micro-optimization with premature optimization.


Yeah, premature as in when you're designing the first prototype and haven't yet had real life usage data on what's the bottleneck.


Premature slogans are the root of all evil.


That optimization can be effective doesn't change the fact that spending time on it too early is often problematic.


This guy took one look at some scientists' code and instantly got a 14000x speedup:

http://james.hiebert.name/blog/work/2015/09/14/CS-FTW.html


Yeah. We had an in-house report generator that was epically slow (about 7 minutes to generate a single report).

We frequently have strange issues come in because fucking vendors can’t stop making changes, then saying “nobody changed anything that would cause that”!

Anyway. I told my boss I’m going to spend a day optimizing this, and got back “we had people try, it’s not really a valuable use of time”.

Long story short, I know that processing 10,000 lines of data should absolutely not take 7 minutes and now it doesn’t. Now we can run our entire day in 30 seconds. No threads. No async. Half a day of optimization.

I think part of the problem with optimization today is that people are completely clueless about how fast computers are, and thus stay happy with being slow just as a matter of ignorance. The FP community pushing that all optimization is premature optimization doesn’t help either.


Urgh, reminds me of a production error that was caused by a job running on a timer, assuming that some other job would have finished by then. That led to two obvious questions:

- Why is it on a timer, instead of being triggered when the first job was done?

- Why was that first job taking over an hour to process XML data?

Turns out the second one was accidentally quadratic, since it was parsing the whole document to process its first record, then re-parsing the whole document to process its second record, and so on, which made it very slow once the XML reached several hundred MB.
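
Roughly the shape of the problem (a guess at the pattern, not the actual code), using Python's ElementTree with a made-up "record" tag:

  import xml.etree.ElementTree as ET

  def process(record):
      pass  # stand-in for the real per-record work

  def accidentally_quadratic(path, n_records):
      # Re-parses the whole document for every record: O(n^2) in document size.
      for i in range(n_records):
          root = ET.parse(path).getroot()
          process(root.findall("record")[i])

  def parse_once(path):
      # Parses the document a single time and iterates: O(n).
      root = ET.parse(path).getroot()
      for record in root.findall("record"):
          process(record)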

Unfortunately, the solution we went with was to set the timer to run 20 minutes later...


I remember when I was in COMPSCI-101 or something like that and was introduced to sorting, and after coming up with a basic bubble sort algorithm, thinking that there was no way that could be improved because it looked self-evident that you needed to do all the work the algorithm was doing or you couldn't be sure the list was really sorted. Then, of course, I was introduced to multiple better algorithms like merge sort and quick sort which showed just how wrong I was...

Perhaps, for the "scientist" writing that initial code, their algorithm also seemed like the best that could be done, because they just didn't know much about algorithms and techniques to make them faster?! But once you learn about things like divide-and-conquer, work avoidance etc., it becomes pretty much self-evident when your algorithm sucks... it's really not that hard - definitely not for people whose job is stuff like climate science, who have a very good understanding of maths.

Maybe by teaching scientists the basic "smart" algorithms and just a little about algorithm analysis (Big O notation), massive computation improvements could be had.


For you, me and James Hiebert, this sort of optimisation is exhilarating. Getting a half-hour job down to milliseconds? Awesome!

But for the scientists that wrote the original code, maybe not. Maybe they think of this sort of thing as drudge work, something that doesn't really hold their interest. Their fun is in designing the mathematical concept, and turning it into code is just a chore.

So yeah, we could teach scientists. But even better would be if we could provide scientists with tools that are just naturally fast when expressing problems on their own terms.


> Their fun is in designing the mathematical concept, and turning it into code is just a chore.

It's not about fun. Sometimes it can be the difference between something being possible at all or not. In this case, the author said they ran this algorithm hundreds of times, so changing it from 30 mins to 0.1 seconds makes things that were impossible before, possible. I don't find it fun at all to optimise, but I can see where I may need to in order to make things better, or possible at all... What I am suggesting is that anyone writing code, scientist or not, needs to be aware of this - and know when they just MUST optimise.


As an ex scientist, I think basic algorithms theory should be incorporated into scientific computing classes. I took a few of these but none of the concepts from this area was covered. I remember well discovering some code of mine was slowing down with system size and finally realizing it was because “append” was creating a new array each step… had no clue that would happen. Was enthralled by the online algorithms course when I finally discovered it - hash tables!!!
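
For anyone curious, the trap probably looked something like this (assuming NumPy-style arrays; the original code isn't shown):

  import numpy as np

  def quadratic(n):
      data = np.array([])
      for i in range(n):
          data = np.append(data, i)  # copies the whole array every step: O(n^2) overall
      return data

  def linear(n):
      buffer = []                    # Python list appends are amortized O(1)
      for i in range(n):
          buffer.append(i)
      return np.array(buffer)        # one conversion at the end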


Also, spending three hours optimizing away thirty minutes you only run four times is a net negative of one hour.

You have to optimize the system, as a whole.


The irony is that his solution has an unnecessary log(n) term, because it's finding each point independently instead of taking advantage of the fact that the input points are also a dense grid. ;) So it could probably run 2-10x faster than the optimized version, assuming that the call to binary search is a lot more expensive than filling in the grid, which R can do internally.

(You don't have to binary search each time, you have a known starting point from the previous point. So you binary search once to find the top-left point and the rest is just checking if you moved into the next global grid cell.)
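
A rough sketch of that "search once, then walk" idea (my reading of the comment, not the article's code), assuming the query points are sorted and lie inside the coarse grid:

  import bisect

  def assign_cells(grid_edges, queries):
      # grid_edges: sorted boundaries of the coarse grid cells
      # queries: dense, sorted query coordinates within the grid
      cells = []
      i = bisect.bisect_right(grid_edges, queries[0]) - 1  # one binary search
      for q in queries:
          # then just step forward whenever we cross into the next cell
          while i + 1 < len(grid_edges) - 1 and q >= grid_edges[i + 1]:
              i += 1
          cells.append(i)
      return cells

  print(assign_cells([0, 10, 20, 30], [1, 9, 12, 25, 29]))  # [0, 0, 1, 2, 2]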


"Someone improved my code by 40,832,277,770%"

https://www.youtube.com/watch?v=c33AZBnRHks


Thanks for the link, this was very entertaining.


With a bit more digging, but very similar https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...


Interesting article!

> Furthermore, there is literally no way to tell whether your program will ever actually terminate without actually executing it. And there are many, many, many, ways to write a program which make it “slow” to execute. “Slow” being… like really slow.

Nitpick: there are some languages where you can tell whether your program will terminate without running it. There are even languages where you can tell whether your program will be slow.

And there are even more specialised languages where any program will be 'fast'.

Obviously, these languages aren't Turing complete. It's an interesting exercise to explore the design space to find languages that are both useful and avoid being Turing complete.

Agda is an example of a language that allows you to run approximately any practical computation, but it's not Turing complete. Oversimplified: in Agda a program is only considered valid by the compiler, if it comes with a computer-checkable proof of halting on all inputs.


> Obviously, these languages aren't Turing complete.

Interestingly, a Turing complete language without non-halting programs could exist conceptually (rather, many do); it's just impossible to know it has that property (hard to use in practice if you don't know it has the guarantees you'd like), and it can't be finitely describable (impossible to implement a compiler for it which doesn't sometimes "succeed" at compiling invalid programs).


You can also construct programs in Turing complete languages to always halt (e.g. by bounding the number of iterations of all loops, usually with the tradeoff of limited input size).
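
A toy illustration of that style (the bound and the names are arbitrary):

  MAX_ITERS = 1_000_000  # chosen up front; larger inputs are simply not handled

  def bounded_find(items, target):
      # Guaranteed to halt: the loop runs at most MAX_ITERS times,
      # at the cost of only handling inputs up to that size.
      for i in range(min(len(items), MAX_ITERS)):
          if items[i] == target:
              return i
      return -1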


Yes, you can program in a subset of your language that is not Turing complete.


You'd also need to forbid (even indirect) recursion, i.e. one function cannot appear in a call stack twice.


Isn't that overly restrictive? Some programs are guaranteed to terminate even with mutual recursion.

For example, these (obviously inefficient) functions always terminate for all values of x, since each call strictly decreases x, both terminate given an input of 0, and 0 is the minimum possible value of an unsigned input.

  bool odd(unsigned int x);  // forward declaration so the mutual recursion compiles

  bool even(unsigned int x)
  {
    return (x == 0) ? true : odd(x - 1);
  }

  bool odd(unsigned int x)
  {
    return (x == 0) ? false : even(x - 1);
  }


The point of these restrictions is that you can formally verify that a given program will terminate in a way which can be checked by e.g. a compiler or some other kind of checker.

While statically verifying your snippet is possible, it is also significantly more difficult than enforcing rules like "all loops must have an upper limit". I also think that you'd need dependent types to make this approach viable. Your simple example is IMHO rather an exception; just having "unsigned int" parameters / return types would mostly not suffice to statically verify whether a recursive method will finish (plus there are things like global state etc.).


A language with your suggested restriction (and a few more) might be possible.

But it's not the only way to get a total language (that always halts), contrary to what your original comment claims.


Agda doesn't need that restriction, as far as I know. And neither does Dhall. See https://dhall-lang.org/

Neither language is Turing complete.


I think you can allow finite recursions, eg, a maximum stack depth.


But then reaching it will mean a crash. I could use a similar approach to say that any program will terminate if I set a fixed time (instruction count) limit.


Yes — that’s correct.

Programs with a fixed amount of execution time always terminate. That’s why we add timeouts to executions, because they’re a solution to the halting problem, which guarantees execution flow of our machines always returns at some point.


I once took some matlab code and got a 200x+ improvement for a couple of days' work; the code was atrocious. So it really matters to understand your tool. I think the person I took it from was afraid to change anything that worked after it was prototyped.


Awaiting “something, something premature optimization” person to show up in the comments.


People always seem to think "premature optimization" means "premature optimization for speed". But there are lots of otherwise-good things that you can prematurely optimize for:

    - modularity
    - readability
    - abstraction
    - maintainability
    - scalability
    - reusability
    - testability
    - cost
We've all seen codebases that are over-abstracted in the name of code reuse, or to adhere to this or that software design philosophy, and are nightmares as a result. "You Ain't Gonna Need It" is a restatement of the premature-optimization adage, just for a different metric.


People deal with the cognitive dissonance by calling these optimizations other things, like “architecture”. But at the end of the day, if you’re changing your architecture for any reason other than correctness, then you’re optimizing. And if you’re having a hard time doing it, then you probably waited too long.

“The Last Responsible Moment” is subtle. You have to include user benefit (they may not care yet) versus snowballing consequences of delay, versus having more and better information later on.

One of the things that drew me to CI/CD was the notion that you’re not going to get better at handling your problems by avoiding them. You can’t learn to pick your optimization battles if you never fight them. And you won’t learn the thing I’ve been saying for almost 25 years now, which I still haven’t heard come out of anyone else’s mouth: there are code changes that improve both readability and performance. Making those changes as refactors is hardly ever premature.


And the "premature optimization" excuse is used almost every time someone argues that over-abstraction can cause performance issues.


Or other forms of laziness.


"Premature optimisation" refers to speed or size. Using the phrase to talk about those other things is a stretch. (Except maybe cost.)

You can design for modularity, readability etc., but optimising for them doesn't sound right. Something to do with them not having well-defined, agreed-upon definitions of where the optimum is.


I think it's fair to talk about optimisation (premature or otherwise) for anything fitting Blum's axioms. The classic ones are number of instructions executed, wall-clock time, memory usage, network requests, user interactions, calls to some slow function (e.g. the comparison function, for sorting algorithms), etc. but I think it's also applicable to fuzzier concepts.

For example, modularity could be modelled as the time/edit-distance/etc. it would take to swap out N of C components, where C >> N. If we're defining things centrally and passing them around via dependency injection then it's O(N) (each component is defined once, so change N of them); if we're hard-coding references everywhere then it's O(N*C), since we'll have to check for references in all of the components, and each one needs to be checked for the N possible swaps.

On the other hand, dependency injection is less optimal than hard-coding for a metric like API size: for N components, the number of references grows as O(N^2) (since new components need to reference existing ones, and existing ones will need to reference them); so APIs using DI will have O(N^2) arguments.
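
As a toy contrast of the two styles (all class names made up for illustration):

  class SqlStore:
      def save(self, item): print("sql:", item)

  class MemoryStore:
      def save(self, item): print("mem:", item)

  class HardcodedService:
      def handle(self, item):
          # hard-coded reference: swapping the store means editing
          # every class that names SqlStore directly
          SqlStore().save(item)

  class InjectedService:
      def __init__(self, store):
          # injected: the concrete store is chosen once, where services are wired up
          self.store = store

      def handle(self, item):
          self.store.save(item)

  InjectedService(MemoryStore()).handle("x")  # swapping storage touches one line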


Fair enough. You can optimise for lots of things.

But when people warn against "premature optimisation", specifically, they are advising you to give higher priority to things like modularity, readability, abstraction and maintainability, and not sacrifice them all for a singular goal, typically speed.


I work with a Principle of Maximum Power guy. He’s insufferable. Too smart to browbeat into behaving, and too old to chalk it up to youthful exuberance or lack of experience. He’s made big chunks of the project comfortable for him and miserable for everyone else, so of course he’ll never leave, and people who know they deserve better do.


Lol, yeah I get this quoted sometimes merely for suggesting the right data structure to use.

"Use a set instead of a list, it's faster". "that's premature optimization!"


Usually this criticism doesn't even hold water if you put performance aside, since a set will often have nicer ergonomics if you find yourself reaching for it. I see far too much code like items.remove((item) => item == value), and if you're using the proper data structure you don't have to hand-roll your implementations each time, because they are provided by default.
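
In Python terms, roughly (illustrative, not a quote of anyone's code):

  items_list = ["a", "b", "c"]
  items_set = {"a", "b", "c"}

  # hand-rolled with a list
  if "b" in items_list:                            # O(n) scan
      items_list = [x for x in items_list if x != "b"]

  # built in with a set
  items_set.discard("b")                           # average O(1), duplicates impossible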


The real premature optimization is using List instead of Set for guaranteed small sizes, since List is going to outperform Set in those cases.
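
If you want to check that claim for your own runtime and sizes, something like this works; the numbers depend heavily on the implementation, element type, and collection size:

  import timeit

  small_list = list(range(5))
  small_set = set(small_list)

  print("list:", timeit.timeit(lambda: 3 in small_list, number=1_000_000))
  print("set: ", timeit.timeit(lambda: 3 in small_set, number=1_000_000))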


That depends on how list and set are implemented.


Premature pessimization.


Except, in your example, I’ve had numerous cases where bugs were introduced because the developer’s primary reasons for picking a set were “fast” and “no duplicates”, only to realize later that the order of items is important. And looking back at their choice, dealing with 5-20 items, using a list and checking contains() not only had no performance impact but better reflected the intent of “no duplicates”. Instead the set is left in place, and the dev fixing it usually copies the results into a list to be sorted into the same order they had before going into the set...


I've never seen this example. There are set implementations which maintain order, but in my experience it's fairly rare to both need "no duplicates" and "items inserted maintain order".

It's fairly easy to remove the set and replace it with something else if this does come up.


Maybe I’m misunderstanding, but isn’t it pretty common to e.g. have a list of non-duplicate items on screen which have a specific order they’re supposed to be displayed in which can be changed as a result of user actions (like adding new items or editing existing ones)?


Maybe? Perhaps that's just a part of my daily dev that is lacking. I generally work on the backend and not the frontend. I could envision how keeping stuff in the same order on a UX would be important.

Doesn't seem like it'd be the general case though. And it seems like it'd be fairly obvious when something like this applies.


Hash tables contain a list internally, and sets are hash tables with just the keys in them, so they are already lists. With a level of indirection, it's easy to make them insertion-ordered just like normal lists: just make the hash table a map of keys to list indexes. That way the table can be rehashed freely while the internal list remains ordered.
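
A minimal sketch of that key-to-index idea in Python (real dict/set machinery is more involved; this is just the concept):

  class OrderedSet:
      def __init__(self):
          self._index = {}   # key -> position in self._items
          self._items = []   # keys in insertion order

      def add(self, key):
          if key not in self._index:
              self._index[key] = len(self._items)
              self._items.append(key)

      def __contains__(self, key):
          return key in self._index

      def __iter__(self):
          return iter(self._items)

  s = OrderedSet()
  for k in ["b", "a", "b", "c"]:
      s.add(k)
  print(list(s))  # ['b', 'a', 'c'] -- duplicates dropped, insertion order kept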


"In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering." - Also Knuth


I agree. But it is critical to ensure that what one is improving is something worthwhile. 12% improvement in compute costs for a project with lots of compute spend = massive win. 12% improvement in compute costs for a hobby project running on a single VM = probably negligible. 12% improvement in function runtime in some arbitrary place in the stack = possibly net negative, taking into account developer opportunity costs. It is almost always useful to talk in terms of outcomes, i.e. benefits to customers/users/team/company.


A 12% overall improvement might be worthwhile. But you have to look at the bigger picture: if part A of your project takes 99.9% of the time, and part B takes 0.1% of the time, even a 100% improvement in part B (ie making it take 0 time) might not be worthwhile.
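
The arithmetic behind that, spelled out:

  part_a, part_b = 99.9, 0.1          # shares of total runtime, in percent
  speedup_of_b = float("inf")         # part B made instantaneous
  overall = 100.0 / (part_a + part_b / speedup_of_b)
  print(overall)                      # ~1.001x overall: barely measurable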



The original quote is not by Knuth, though he made it well known; it is by Tony Hoare.


Any improvement is significant!! I spent a week implementing an optimization for my language (pypy's storage strategies) and ended up significantly slowing it down instead. It's not guaranteed.

I have massive respect for anyone who can pull off steady improvements, no matter how small.


You’d think they would catch on. But then a foolish consistency is the hobgoblin of little minds.


I love this stuff. My time is split 50/50 between my IDE and Compiler Explorer.

Unfortunately not a lot of employers consider this a good use of time.


That's the sad side. The flipside is that for the employers that _do_ care, being able to use a profiler and repeatedly identify optimizations is considered some sort of superpower that very few of your coworkers possess. :-)


That's pretty cool, what kind of stuff do you work on?


Am I misunderstanding this, or is this an article about how Chars[].resize works in a library they wrote themselves? Or is it a misunderstanding of how resize works in general?

I am not trying to take anything away from the author, but digging into assembly instead of simply looking up the docs seems like a long road to go for calling it an optimization.

Another thing: why does this even exist? The tests show some SQL executions, but is that real production executions or simply a test of a possible optimization in a different context?

I am not calling it a premature optimization because I am not sure whether I am bright enough to tell what is going on.

Please clarify.


From the article: "SerializationString improve performance" (10.12.2023) https://github.com/ClickHouse/ClickHouse/pull/57717.


The deeper truth is more that impact derives from leverage.

Leverage might reside in architectural changes, but one can also imagine other dimensions (not just altitude in the application stack) that also confer leverage.


Maybe it's because I am experimenting with improving my posture through small and gradual optimizations, but I expected the article to be about small optimizations in your life, rather than code.


> improving my posture through small and gradual optimizations

Would appreciate it if you could share more.

(I have been suffering from back pain for several years)


Small but consistent improvements in general seem like a great idea! Wishing you the best.


Thank you! Yeah, I seem to have come to the same conclusion.


Look at video codecs if you want examples of small optimizations adding together.


> Ratio of speedup(-) or slowdown(+): -1.662x

Wait, what? negative 1.6x?



