
How PostCSS became 1.5x faster by changing 2 lines of code - iskin
https://evilmartians.com/chronicles/postcss-1_5x-faster
======
struppi
This is _so important_:

    
    
      Do not even try to write what you think is effective code before 
      benchmarking. The VM has many clever optimizations. And even if 
      you will somehow learn of all of them, in the next release they 
      still can be changed. Instead, write a simple and clean piece of 
      code, make a benchmark, find the real bottleneck and rewrite code 
      in small parts.
    

Every now and then I see some strange code at a client I work with, and the
justification is "Performance". Because somebody _thought_ it might be faster
- but they didn't check it back then. And they don't have an automated
performance test that continuously validates whether their "dirty but fast"
code is still faster than the clean version...

Don't even ask how many hours of developer time I've seen wasted because of
"dirty but fast" code. Often the code was not much faster than the clean
version. Or it was faster, but speed was not crucial in the area where the
code operated.

Also, I guess a lot of the optimizations from old blog posts or the first
"Effective Java" will not yield amazing results anymore, as compilers and
runtimes are getting better.

I also like this one:

    
    
      Do not think that programming in C++ or any other lower-level language
      is a must for having good performance. Good architecture, benchmarking 
      and profiling are far more important.
    

But I don't think it is as clear-cut as the author writes it. For some
problems, having total control over your memory layout and what gets executed
when (i.e. C or C++ or ...) can lead to huge benefits. And sometimes, the
runtime is just more clever at optimizing the code than you are.

~~~
pjc50
There's one thing you can do without benchmarking: approximate complexity
(O(n)) analysis. You need to be sensible about what N is though: even O(n^3)
isn't too bad if it's an operation on a user-displayed list of half a dozen
items.

With a bit of thought this even leads to cleaner code: do you really need to
pass a large chunk of data around when you only need a particular
precalculated quantity?
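A hedged sketch of that last point (all names invented): half a dozen items
cubed is only ~216 steps, so the O(n^3) part is fine, but there is still no
reason to ship the whole list around when one precomputed number will do.

```javascript
// Heavyweight version: every consumer receives the full order list.
function renderBadgeFromOrders(orders) {
  return orders.length + " open orders";
}

// Cleaner version: compute the one quantity once, near the data,
// and pass only that number around.
function renderBadge(openOrderCount) {
  return openOrderCount + " open orders";
}

const orders = [{ id: 1 }, { id: 2 }, { id: 3 }];
const openOrderCount = orders.length; // the precalculated quantity

console.log(renderBadgeFromOrders(orders)); // 3 open orders
console.log(renderBadge(openOrderCount));   // 3 open orders
```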

But generally benchmarking and applying Amdahl's law is the way to go. First
identify which part of the system is slow!

------
adrianN
Things like this make compiler or jit optimizations a little scary to me. It
might not matter to a CSS preprocessor, but if you write performance critical
code, optimizations become a leaky abstraction. Suddenly you have to
understand exactly which circumstances allow your compiler to, for example,
use autovectorization and be wary of updating your compiler, lest these
criteria change.

I think there is some opportunity here to improve compiler output. I know that
GCC can already explain why it doesn't vectorize loops
(`-ftree-vectorizer-verbose=2`), but many other optimizations without such
output can still make or break the performance of your program.

~~~
hvidgaard
I'm pretty sure the halting problem reduces to figuring out code reordering
and optimizations like that. We cannot solve it exhaustively, so instead we do
the next best thing: approximate and handle the common case. I agree that it's
not optimal, but it's the best we can do.

~~~
adrianN
Obviously it's impossible to optimize perfectly every time. I would just like
some more diagnostics from the compiler that help me debug issues with wrong
or missing optimizations.

------
terinjokes
I'm not a v8 expert, but if instead of

    
    
        delete this.indexes;
    

wouldn't the author avoid the performance regression with

    
    
        this.indexes = 0;
    

and still avoid external state?

~~~
arcatek
Probably (or maybe by setting `null` .. I'm not sure how engines deal with
changing data types). `delete` implies that the underlying object schema has
to change. It's a heavy operation.

------
thameera
Isn't saying "1.5 times faster" misleading? From what I see it's gotten 33%
times faster.

Although, you might say that the previous version is 1.5 times slower than the
current version.

~~~
J_Darnley
No. The original wording is accurate. It can do 1.5 times as much work per
unit time exactly correlating with real world speed.

"33% times faster" doesn't make sense. You need to drop the "times" or use
"0.33 times" for that to make sense.
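The arithmetic, with made-up numbers: if the old version takes 6 s and the new
one 4 s, throughput went up 1.5x (so "1.5 times as fast"), while elapsed time
dropped by only about 33%.

```javascript
const oldSeconds = 6.0; // hypothetical old runtime
const newSeconds = 4.0; // hypothetical new runtime

// Work per unit time: 6/4 = 1.5 -> "1.5 times as fast".
const throughputRatio = oldSeconds / newSeconds;

// Elapsed time saved: 1 - 4/6 = 0.333... -> "takes 33% less time".
const timeReduction = 1 - newSeconds / oldSeconds;

console.log(throughputRatio);                        // 1.5
console.log((timeReduction * 100).toFixed(0) + "%"); // 33%
```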

------
mjevans
More interesting would be actually knowing /why/ this made things faster.

~~~
fenomas
The short answer is that v8 has several different ways of storing objects and
resolving property lookups, and deleting a property causes that object to be
handled in the slowest mode (i.e. "dictionary mode", with property data stored
in a hash table).

If you want gory details I recommend Vyacheslav Egorov's blog (mrale.ph).
Here's an old slide of his that lists all the ways you can force an object
into dictionary mode (at least ca. 2011):

[http://s3.mrale.ph/nodecamp.eu/#54](http://s3.mrale.ph/nodecamp.eu/#54)

~~~
mraleph
Some of this slide is a bit outdated by now.

Using Object.seal, Object.freeze, or Object.defineProperty with writable,
enumerable, and configurable not set to true does not convert an object's
named properties to dictionary mode. However, they still convert the elements
storage (the one containing properties with names "0", "1", etc.) to
dictionary mode.

Similarly, accessors don't convert the object's properties storage to
dictionary mode unless there is a transition clash.

    
    
        var obj = {};
        // First evaluation: installs an accessor transition for "foo"
        // on the empty object's hidden class.
        obj.__defineGetter__("foo", function () { return 0; })
        print(%HasFastProperties(obj));  // => true
        
        var obj = {};
        // Second evaluation: the getter is a *different* closure, so it
        // clashes with the recorded transition.
        obj.__defineGetter__("foo", function () { return 0; })  // Transition clash
        print(%HasFastProperties(obj));  // => false
    

Nothing changed with respect to `delete obj.foo`: this always converts the
named properties storage to dictionary mode. However `delete arr[index]`
doesn't (at least for arrays that were not in slow mode already); for plain
objects it depends on the number of holes in the elements part of the object.

------
nickcw
Doesn't V8 have a profiler? I don't use it so forgive the ignorance. Having to
bisect your code to find a performance regression seems like a slow way of
doing it compared to using a profiler. Could a V8 expert explain?

~~~
fenomas
V8 has a very nice profiler. Functions are the smallest granules it
understands, but it's very good for finding slow functions, and (perhaps even
more importantly) for finding out which functions v8 has given up on
optimizing.

But I'm guessing it didn't help for this particular problem. Deleting
properties from an object can hurt performance in v8, but it's not the
deletion that's slow, it just makes the later accesses to that object slower.
So the line of code that needed changing may not have been anywhere near the
code that profiled poorly.

------
fibo
Nice article, pray this mantra every day

> never optimize, always profile

~~~
yoklov
While there's a small bit of truth in that, unfortunately reality is much more
nuanced than this -- at least, it is when performance is non-negotiable.

Optimization, when done correctly, is a feedback loop between your assumptions
and the profiler. Neither one of these, on their own, is enough -- your
assumptions can be wrong, but the profiler has an extremely poor
signal-to-noise ratio. (Note: profiling wouldn't have caught this issue, since
the problem spot was not actually slow; it just made everything else slower.)

I work in game development (not primarily HTML5, although I have done
HTML5/WebGL projects), and performance is a hard requirement for me. Not
making 60fps reliably for the hardware we need to support is a show-stopping
bug. The only way to avoid this reliably without needing to rewrite sections
is to design with the optimizations you might need to perform in mind. Putting
optimization off until the end would be irresponsible.

------
kentor
should have used git bisect?

------
qwerty0000
who cares? it's already fast enough

~~~
thedz
C'mon, basically the first paragraph:

> PostCSS, libsass and Less are already fast enough for any real-world task.
> Running benchmarks like these is like listening to audiophiles comparing
> gold-plated cables for their hobby systems. Don’t use these benchmarks as
> the main criteria for decision making when you are choosing a tool.

~~~
15155
If libsass is the benchmark, I can state wholeheartedly that it is not fast
enough (to not do incremental compiles).

