Nice to see this hitting the front page. None of my submissions of it ever did. Happy to answer any questions about it here should they come up.
Certainly the biggest thing I took away from this was that the GC in Go is a far larger overhead than you would think, even for something that runs in 30ms.
I was a C++ developer and am now working on a Go project. I've always been suspicious of garbage collection, but people say the concern is exaggerated. Glad to hear someone confirm this.
This is a case of "speed" meaning more than one thing. Go optimizes for latency, not throughput, so even though each collection pause is very short, you end up with a lot of them.
Steel Bank Common Lisp would be one open-source tool that has a GC optimized for throughput at the expense of latency. Full GC pauses (which are rare, but do happen) are large fractions of a second even with moderately sized heaps. However the throughput is great to the point where many workloads are just as fast with heap allocation as stack allocation, and the gc overhead is actually less than malloc/free (or new/delete).
Obviously the SBCL garbage collector is totally unsuitable for video games (there are games designed to be built with SBCL, but the allocation is done during non-interactive parts of the game).
Depending on what you are working on you can pre-allocate everything you need and then turn the GC off. You can even do it in code which is nice. Similar approach to Java.
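For anyone who wants to try this, here is a minimal sketch in Go. It assumes the hot path can reuse a buffer allocated up front; withGCDisabled and countNewlines are names made up for the example. debug.SetGCPercent(-1) is the standard library call that switches the collector off, and it returns the previous setting so it can be restored afterwards.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withGCDisabled runs work with the collector switched off, then restores
// whatever GC percentage was in effect before the call.
func withGCDisabled(work func()) {
	previous := debug.SetGCPercent(-1) // a negative value disables collection
	defer debug.SetGCPercent(previous)
	work()
}

// countNewlines copies each chunk into buf and counts '\n' bytes.
// As long as buf has enough capacity, the loop does not allocate.
func countNewlines(chunks []string, buf []byte) int {
	n := 0
	for _, c := range chunks {
		buf = append(buf[:0], c...)
		for _, b := range buf {
			if b == '\n' {
				n++
			}
		}
	}
	return n
}

func main() {
	chunks := []string{"a\nb\n", "c\n"}
	buf := make([]byte, 0, 64<<10) // allocated once, before the hot path

	total := 0
	withGCDisabled(func() {
		total = countNewlines(chunks, buf)
	})
	fmt.Println("newlines:", total)
}
```

With the collector off, anything allocated inside the callback just stays on the heap, so this only makes sense for short-lived tools or for code that really does not allocate in the hot path.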
The problem with garbage collection in most cases isn't really performance but correctness.
The only thing the garbage collector does is allow you to not think about memory. And you cannot write good code if you do not know how your memory is being used.
It is a massive disservice with barely any tangible benefits.
I just tried switching from mmaping everything in loc to always just reading all bytes in the file and it went from about a second to ~530ms on the linux kernel, with tokei at around 750ms.
Tokei is definitely more accurate though, by probably a pretty wide margin. I'm hoping to get around to handling nested comments correctly soon and maybe strings.
I'll definitely have to take a look at this in detail when it's not 6am. Or on a day where I wake up at 6am instead of stay up until 6am.
Not that I didn’t believe BurntSushi but I wanted my own validation. Not surprised you got the same result. I think with whitelisting there is probably no need to mmap for these tools.
The nested comments and strings will probably slow you down a lot. I know it did for me. I’m looking forward to the new GC settings in Go so I can tweak it for faster performance.
Did you ever work out why loc was performing so badly on multi core systems? Sounds like you did but curious what the bottleneck was. I don’t understand rust well enough to be able to guess sorry.
I thought I saw similar utilization in tokei on my machine. I copied the concurrency pattern straight from ripgrep. I'll have to take a look tomorrow. I assumed I was still bottlenecked on reads.
Interesting. Actually being blocked on reads sounds about right. I’ll try running the benchmarks again one of these days. Probably after I get access to the Go GC controls that are apparently on the way.
I was reading this and wondering whether it would be better to define a line as "80 characters of code", measure by characters, then divide by 80 to get lines of code.
The whole point of measuring lines of code is to get some sense of the complexity of the code base, but what if one code base has lots of short lines, and another code base has lots of long lines? How would this be resolved?
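To make the idea concrete, a rough sketch in Go. It assumes whitespace should not count towards the total, which is my own choice and not something stated above; normalizedLines is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode"
)

// normalizedLines counts the non-whitespace characters in src and converts
// them into "lines" of the given width, rounding up.
func normalizedLines(src string, width int) int {
	chars := 0
	for _, r := range src {
		if !unicode.IsSpace(r) {
			chars++
		}
	}
	return (chars + width - 1) / width
}

func main() {
	short := "a := 1\nb := 2\nc := 3\n"
	long := "a := 1; b := 2; c := 3\n"
	// Both layouts hold roughly the same amount of code, so they should
	// come out close to each other despite the different line counts.
	fmt.Println(normalizedLines(short, 80), normalizedLines(long, 80))
}
```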
Number of characters is a meaningless metric because it depends on identifier lengths, which can be very different between coding styles. A much fairer metric would therefore be the number of lexer tokens in my opinion.
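For Go source at least, the standard library already ships a lexer, so a token count is only a few lines. This is a sketch and only handles Go input; countTokens is a made-up name. It skips comments and the semicolons the scanner inserts automatically at line ends, which seemed like the fair thing to do but is my assumption.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// countTokens returns the number of lexer tokens in a piece of Go source,
// ignoring comments and automatically inserted semicolons.
func countTokens(src string) int {
	data := []byte(src)
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(data))

	var s scanner.Scanner
	s.Init(file, data, nil, 0) // mode 0: comments are skipped

	n := 0
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON && lit == "\n" {
			continue // inserted by the scanner, not written by the author
		}
		n++
	}
	return n
}

func main() {
	fmt.Println(countTokens("x := 1"))
	fmt.Println(countTokens("someVeryLongIdentifier := 1 // same thing"))
}
```

Both calls give the same count, which is the point: identifier length and comments no longer affect the metric.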
Could you try and reorder the state checks to occur with decreasing state-transition probability? I.e. if I am currently in code, it is most likely that I will be in code next state as well, a bit less likely that I'll be in a single line comment, and even less likely that I'll be in a multiline comment. If I'm in a multiline comment, I will most likely be in a multiline comment next, or code, but unlikely to be in a single line comment.
Amongst a few equiprobable state transitions, ordering the checks by increasing expense might also help.
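Roughly what I have in mind, as a Go sketch. It assumes C-style comments only and ignores strings and nested comments entirely; the state names and countByState are invented for the example and are not taken from any of the tools discussed here. Inside each state the check for "stay where I am" comes first, and the outer cases are listed with code first and single line comments last.

```go
package main

import "fmt"

type state int

const (
	code state = iota
	lineComment
	blockComment
)

// countByState walks src once and returns how many bytes fell into each
// state, indexed by the state constants above.
func countByState(src []byte) [3]int {
	var counts [3]int
	st := code
	for i := 0; i < len(src); i++ {
		b := src[i]
		switch st {
		case code:
			// Most likely outcome first: an ordinary byte, still in code.
			if b != '/' || i+1 >= len(src) {
				counts[code]++
				continue
			}
			switch src[i+1] {
			case '/':
				st = lineComment
				counts[lineComment] += 2
				i++
			case '*':
				st = blockComment
				counts[blockComment] += 2
				i++
			default:
				counts[code]++
			}
		case blockComment:
			counts[blockComment]++
			if b == '*' && i+1 < len(src) && src[i+1] == '/' {
				counts[blockComment]++ // account for the closing '/'
				i++
				st = code
			}
		case lineComment:
			if b == '\n' {
				st = code
				counts[code]++ // the newline itself is counted as code
				continue
			}
			counts[lineComment]++
		}
	}
	return counts
}

func main() {
	fmt.Println(countByState([]byte("x := 1 // note\n/* block */ y := 2\n")))
}
```

Whether the ordering changes anything depends on what the compiler does with the switch, so it would need benchmarking on real input.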
I did toy with that a bit but don’t remember it making any difference. I might try it again though. From memory though it might work better to move the switch to a hash which should allow it to compile down to a jump table which would be faster.
One thing the linked ripgrep post doesn't tell you is that line counting is also done with SIMD since it started using bytecount (https://github.com/llogiq/bytecount) some months ago, which sped up some workloads with line numbers considerably.
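On the Go side the closest equivalent is probably bytes.Count from the standard library, which as far as I know is backed by optimized assembly on some architectures (I have not checked which ones, so treat that as an assumption). A small sketch, with countLines being a hypothetical helper:

```go
package main

import (
	"bytes"
	"fmt"
)

// countLines counts lines by counting '\n' bytes, treating a final line
// without a trailing newline as a line too.
func countLines(data []byte) int {
	if len(data) == 0 {
		return 0
	}
	n := bytes.Count(data, []byte{'\n'})
	if data[len(data)-1] != '\n' {
		n++
	}
	return n
}

func main() {
	fmt.Println(countLines([]byte("one\ntwo\nthree")))
}
```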