JavaScript- Lodash vs Js function vs. for vs. for each (github.com/dg92)
63 points by denomer on May 11, 2018 | 62 comments



There's no description of what's actually being tested but you can find it in the formulas.js file: https://github.com/dg92/Performance-analysis-es6/blob/master...

Even on one of the first lines, there was already a mistake with how reduce is called:

    let avg = 0;
    console.time('js reduce');
    avg = posts.reduce(p => avg+= (+p.downvotes+ +p.upvotes+ +p.commentCount)/3,0);
    avg = avg/posts.length;
    console.timeEnd('js reduce')
The way he's using it makes it no different from a forEach. The accumulator is the first parameter, so it should be `posts.reduce((acc, p) => acc + blah, 0)` instead.
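
A corrected version of that snippet (keeping the repo's formula) would look something like:

    console.time('js reduce');
    let avg = posts.reduce((acc, p) => acc + (+p.downvotes + +p.upvotes + +p.commentCount) / 3, 0);
    avg = avg / posts.length;
    console.timeEnd('js reduce');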

Also the timing isn't very sophisticated. Doing microbenchmarking right is basically black magic but the mistakes he's making are really basic (only running each version of the code once, not preventing dead code elimination, etc). Someone mentioned elsewhere that the timing for the large case is faster than the small and my guess is that the entire calculation got JIT'd out since it's unused.
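
A minimal improvement, sketched here against the repo's `posts` array, is to repeat each variant and consume the result so the work can't be thrown away:

    const RUNS = 100;
    let sink = 0; // consuming the result keeps the computation from being eliminated as dead code
    console.time('js reduce x100');
    for (let run = 0; run < RUNS; run++) {
      sink += posts.reduce((acc, p) => acc + (+p.downvotes + +p.upvotes + +p.commentCount) / 3, 0) / posts.length;
    }
    console.timeEnd('js reduce x100');
    console.log(sink);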


Thanks for pointing out a mistake on reduce, I missed that.

The data collected is not from running it once; the results are the average of running the same code at least 15 times.


Maybe, but OTOH if you need to know about microbenchmarking intricacies just to use some native methods, maybe there is something wrong with the native methods.

The differences from a simple for() are quite significant.


Not mentioned here, but worth considering: Lodash's forEach works on "array-likes" not just proper Arrays.

It works on `arguments`, it works on strings (iterating through characters), it works on objects (passing the key as the iteratee's second parameter in place of an index value), it works on HTMLCollection pseudo-arrays. It also doesn't throw on null or undefined.
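
For example, roughly (assuming lodash is loaded as `_`):

    function f() { _.forEach(arguments, a => console.log(a)); }     // works on `arguments`
    _.forEach('abc', ch => console.log(ch));                        // iterates characters
    _.forEach({ a: 1, b: 2 }, (val, key) => console.log(key, val)); // key passed in place of an index
    _.forEach(document.getElementsByTagName('p'), el => console.log(el.textContent)); // HTMLCollection
    _.forEach(null, x => console.log(x));                           // no throw, just returns null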


As of ES2015 you can convert an iterable like HTMLCollection into a regular array by using the spread syntax:

    [...iterable]


You have always been able to convert them with:

  Array.prototype.slice.call(htmlCollection);
That has been the canonical way to cast them, but it will still throw on null or undefined.


That, or `Array.from`. You can even create an array from a string:

    Array.from('string')
    [...'string']


A simple Object.keys(str|obj|etc...) call on your iterating object makes the other functions work on those data types too.

Lodash may be fast, but recently I've been avoiding the basic "lodash function with native js equivalent" for one particular reason: stepping into js native functions when debugging (node inspect) is a breeze, and a complete nightmare when using lodash.


No. Object.keys() does not allow iteration on strings or `arguments` or HTMLCollections. It only returns an array of indices, not values. You can call Array#forEach() on the resulting array, but you'd need a different iteratee signature to process them than you would for a proper forEach or for Lodash's method, or you'd have to dereference each element manually. At that point, you might as well run a for loop, since the code will be nearly identical.
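
For example, a rough sketch of what that ends up looking like:

    const str = 'abc';

    // Object.keys gives you keys/indices, so every element still has to be dereferenced by hand
    Object.keys(str).forEach(k => console.log(str[k]));

    // at which point a plain for loop is nearly identical
    for (let i = 0; i < str.length; i++) console.log(str[i]);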


In the context of a large client application, I often advise engineers that if we're optimizing things like the types of `for` loops we use, we've won the performance lottery. That is, I've never found a critical performance issue that is the result of using `forEach` instead of `for`.


Agreed. I know that in some small use cases these differences are crucial, but in 95% of situations arguing these differences just feels like a waste of time.


I had the opposite experience. We had a large client application that was too slow, with no obvious bottleneck on the flame graph. I replaced all the functional iterators by for loops among other similar optimisations, and improved the performance by a factor of 50. If you use programming constructs that are 50 times slower on average, your program will be 50 times slower on average.
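
i.e. roughly this kind of transformation, shown here as an illustrative sketch (the `items` array and the formula are made up, not the actual application code):

    // before: three passes over the data and two intermediate arrays
    const total = items
      .map(x => x.value * 2)
      .filter(v => v > 10)
      .reduce((acc, v) => acc + v, 0);

    // after: one pass, no intermediate allocations
    let total2 = 0;
    for (let i = 0; i < items.length; i++) {
      const v = items[i].value * 2;
      if (v > 10) total2 += v;
    }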


> If you use programming constructs that are 50 times slower on average, your program will be 50 times slower on average

Well, yes, but the relative difference between a for and a forEach loop is minuscule - not a 50x difference. In absolute terms, the overhead of both is barely measurable, and is extremely unlikely to be a bottleneck in any client-side application.


I've had it happen once, sort of, on a relatively small collection: a delay of several seconds due to the use of foreach that was actually annoying production users.

The issue was that the collection being iterated over was a non-generic .NET 1.0 DataTable. Using a foreach loop would implicitly box and then re-cast each object, while the for loop directly accessed the correctly typed .Item() and did not need to do that.

Ironically, the body of the loop was a fairly tricky logistics algorithm I had just written, so I had every reason to assume the problem was on the inside. Imagine my surprise when I changed it to a for loop - strictly to access the index and print out some timings - and watched the procedure suddenly become instant...


I haven't had a chance to dig through the code yet, but some of these results seem a bit off, especially surrounding the for loop.

For example, here are the results for the for loop 'reduce' on the small data set:

100 items - 0.030

500 items - 0.574

1000 items - 0.074

That doesn't make sense to me. How can a reduce on 1000 items take drastically less time than on 500 items? Unless I'm misunderstanding something, I can only conclude that it's either A) a typo or B) they only ran this test once and the random data for the 500 test was exceptionally tough to deal with.

Either way, I would love a little more detail with the data before I trust it.


I've seen that too, but I think that measurement is probably just garbage. Either the garbage collector worked during that period, or the whole PC had something else to do. It's just way too far off to make sense.


Haven't tested it (or even read the article :) ), but maybe some kind of JIT optimization kicks in for the 1000-element case (i.e. after the "loop" has run enough times) but not in the others?


The best case would then be that at the 501st element, the JIT has suddenly made the code so fast that each subsequent item is practically instant. In that case, the 1000 run would be as fast as the 500 run, not significantly faster.

EDIT: Maybe what happened is that the 500 case made the engine conclude that the function is called frequently enough to be optimized, and the 500 run both spent a while with the slow version and spent some time optimizing it, while the 1000 case got to exclusively use the optimized version.

Just goes to show that benchmarking complex multi-stage JITs can be hard.


True! That was just off the top of my head. I suppose you could imagine some kind of really smart dynamic analyzer which looks at the list length before starting the reduce and goes "oh, I'm going to repeat this over 500 times, better JIT it now". In that case the 1000 element case could be faster. But I don't know much about JIT compilers/analyzers and this is just theoretical, so you're likely correct :)


I've only browsed the source, but I think it's more likely that most of the results are random noise, and there's no particular reason why one case is faster than the other.

There are a ton of things one needs to do to make reliable microbenchmarks (make sure the functions are getting optimized, make sure they're not eliminated as dead code, etc.) and I don't think this repo does any of them.


Yes, so I will be taking into consideration the JIT, GC, and the optimize function implemented in formulas in the next push.


It does not specify a VM or version. I assume it's V8 on Node, but there's no way to infer which version was used.

If using node, use process.hrtime() rather than console.time()/timeEnd(): https://nodejs.org/api/process.html#process_process_hrtime_t...
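
For example, a rough sketch:

    const start = process.hrtime();
    // ... code under test ...
    const [sec, nsec] = process.hrtime(start);
    console.log(`took ${(sec * 1e3 + nsec / 1e6).toFixed(3)} ms`);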

Then, computing the length multiple times is not a good idea. You should save the length in a variable:

    // no
    for(let i=0; i<posts.length; i++) {

    // yes
    for(let i=0, n=posts.length; i<n; i++) {
Finally, it is not recommended to analyze performance in this manner. A slight change elsewhere in your program can affect the performance very abruptly.

This is because the gatekeepers of performance are: inline caching, hidden classes, deoptimizations, garbage collection, pretenuring, etc.


So, after working on this for some time and reading a lot, I realized that this example is more of a practical analysis of the day-to-day JS code we write, so the results relate more to that and to what the choice among those three should be.

However, you are right about the performance benchmarking factors. Good news: I have done analysis of the inline cache and warm cache, and I am working on accounting for GC and hidden classes to get better results.


While caching array.length used to be important, it probably does little with modern engines. I remember tests from a few years ago which actually favored the first variant. (Probably because engines could more easily identify the local context and optimize on this. Also, in terms of runtime optimization, there's much to win here, and you'd want to tackle this issue as one of the very first things.)

That said, it's still a good idea, even if it's just for pointing out that the constraint on the loop won't change.


Caching the length property appears to have only a small impact on performance across Chrome, Firefox and Safari (caching is faster in Firefox, slower in Chrome, and about the same in Safari). Perhaps it's better to recommend the non-cached loop iteration instead?

The quick microbenchmark I checked this on: https://jsperf.com/for-to-length/1


    // yes
    for(let i=0, n=posts.length; i<n; i++) {
There's usually no need to cache the length of the array this way. Modern JS VMs are plenty smart enough to do it automatically (unless there's code in the loop that looks like it might change the array length).


Thanks :)

Working on it.


Performance aside, also consider Ramda.js

Although Ramda has forEach, I augment it with a version of each(func, data) where data can be an array or map, and func(val, key) where key is the key of the map item, or the index of the item in the array.
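
A minimal sketch of such a helper, assuming Ramda is loaded as `R`, treating "map" as a plain object, and with `each` being just my own name for it:

    const each = (func, data) =>
      Array.isArray(data)
        ? R.addIndex(R.forEach)((val, idx) => func(val, idx), data)
        : R.forEachObjIndexed((val, key) => func(val, key), data);

    each((v, k) => console.log(k, v), ['a', 'b']);     // 0 'a', 1 'b'
    each((v, k) => console.log(k, v), { x: 1, y: 2 }); // 'x' 1, 'y' 2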

I feel this abuse of notation makes for more readable / smaller / uniform code [having no explicit for loops]. It also takes up less conceptual space.


+1 for Ramda -- I've been using it for around 2 years now, and once you get comfortable with how to compose its various functions (and some of your own) it's super powerful. Might not be the best choice if your team is allergic to FP, though; some people have a difficult time wrapping their heads around it (or just getting used to the syntax). I've gone out of my way to document everything thoroughly, knowing people who are mostly unfamiliar with FP will be looking at it, and that's kept everyone happy.


Lodash is fairly ubiquitous, so you can sneak some lodash fp into projects. Dalton even has it aliased for Ramda. The arity of certain functions might be different; I didn't check.

https://github.com/lodash/lodash/commit/6b2645b3106b0ed9ebec...


I looked at https://github.com/dg92/Performance-analysis-es6/blob/master... briefly and saw that the different algorithms always run one after another.

Does the first algorithm get an unfair cold cache disadvantage?


Um, I am not sure. I will debug this over the weekend and update :)


In the results shown there, why is the "For loop" row highlighted in each of the tables?


I was wondering why it was red when it was the fastest. Red commonly implies slow, right?


It's not the fastest in all of the tests, but it is the one highlighted in all of them.

Take a look at the array size 500 results for example. It is slowest for Reduce. Or take a look at 5000. There it is second fastest for Reduce, and slowest for Map, Filter, and Find.


Even more confusing then.


It's just for reference. I will update soon, thanks for pointing out the issue. :)


It's really confusing that it's in red.


It seems to be the most performant out of the methods tested.


Why not highlight the actual most performant for that specific method?


What is going on with the highlighting here? In the first few tables, the for loop is highlighted in red and red scores the lowest time (best score). After that, it's all over the place. But it's always the for loop that is highlighted red. This makes it look like the for loop always wins, but that is not the case. What the hell?


This confused the hell out of me too - I've no idea what it's meant to show.


It's just for reference. I will update soon, thanks for pointing out the issue. :)


For completeness, I feel like for...of should be included. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
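
e.g., something along these lines, reusing the repo's formula:

    let sum = 0;
    for (const p of posts) {
      sum += (+p.downvotes + +p.upvotes + +p.commentCount) / 3;
    }
    const avg = sum / posts.length;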


JIT compilation cannot perform expensive optimizations by design; it has to be low-overhead. Functional iteration can only be fast when it's optimized into a simpler form, with JIT and dynamic variables everywhere. A for loop is already the simpler form and is easier for the JIT compiler to reason about. Sure, the imperative code is often longer and less cool than a functional one-liner that chains everything to be done, but it has no hidden performance costs and is obvious to read and debug.


The real lesson is to avoid iterating over large result sets in JS by filtering them in your DB.


This often isn’t possible.


Often?

There are cases where large data sets are generated in the browser, but that is not even close to often.


Don't forget that JS != browser necessarily... If you're writing a Node app you could very well end up doing this kind of thing a lot (think any kind of ETL process).


True, but if you need to handle large data sets in the server Node is definitely the wrong choice.


Games spend a lot of their time iterating over lists of entities, and there are a lot of JavaScript games.


Games are not a very common use case of JavaScript.

Even in the world of game dev, JavaScript is not common at all.


JS isn't used mostly for games, and most games aren't written in JS, but JS and Canvas have largely taken over the niche of browser games.


Isn't there a jsperf of this?


Not yet, but yes. I will add it soon and share :)


Missing the Exec Summary.


I am still analyzing the results. Also, as mentioned in the comments above, I need to consider more things before making a conclusion. I am working on it and will update soon.


Ah, the good old-fashioned early release.


His for loop modifies the data in-place, compared to map, which returns a new array. This is just one of many things wrong with this test.


However, this is a thing you can do with a for loop. Why cripple it artificially? Isn't this also introducing a bias?


fixed :)



