
Sloc Cloc and Code – What Happened on the Way to Faster Cloc - msangi
https://boyter.org/posts/sloc-cloc-code/
======
boyter
Nice to see this hitting the front page. None of my submissions of it ever
did. Happy to answer any questions about it here should they come up.

Certainly the biggest thing I took away from this was that the GC in Go is a
far larger overhead than you would think, even for something that runs in
30ms.

~~~
isaachier
I was a C++ developer and am now working on a Go project. I'm always suspicious
of garbage collection, but people say the concern is exaggerated. Glad to hear
someone confirm it.

~~~
boyter
Depending on what you are working on, you can pre-allocate everything you need
and then turn the GC off. You can even do it in code, which is nice. It's a
similar approach to Java.

~~~
donatj
How?

~~~
boyter
You can control it through an environment variable (GOGC) or in code:
[https://golang.org/pkg/runtime/debug/#SetGCPercent](https://golang.org/pkg/runtime/debug/#SetGCPercent)

------
fouc
While reading this, I was wondering whether it would be better to define a
line as "80 characters of code": measure by characters, then divide by 80 to
get lines of code.

The whole point of measuring lines of code is to get some sense of the
complexity of the code base, but what if one code base has lots of short lines
and another has lots of long lines? How would this be resolved?

~~~
gmueckl
Number of characters is a meaningless metric because it depends on identifier
lengths, which can vary a lot between coding styles. A much fairer metric
would therefore be, in my opinion, the number of lexer tokens.
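As a rough sketch of a token count: Go's standard library ships a lexer for Go source (go/scanner), so counting tokens takes only a few lines. This only works for Go; other languages would need their own lexer. Skipping the semicolons the Go scanner inserts automatically at line ends is a choice made here so that only tokens actually written in the source are counted. A short and a long identifier give the same count.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// countTokens returns the number of lexer tokens in a Go source fragment.
// Comments are skipped (mode 0), and automatically inserted semicolons,
// which the scanner reports with the literal "\n", are not counted.
func countTokens(src []byte) int {
	fset := token.NewFileSet()
	file := fset.AddFile("fragment.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, src, nil, 0) // nil error handler: lexical errors are ignored
	n := 0
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			return n
		}
		if tok == token.SEMICOLON && lit == "\n" {
			continue // inserted by the scanner, not written in the source
		}
		n++
	}
}

func main() {
	fmt.Println(countTokens([]byte("x := 1\n")))                   // 3
	fmt.Println(countTokens([]byte("numberOfLinesOfCode := 1\n"))) // 3
}
```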

~~~
e12e
Number of "words of code"?

~~~
gmueckl
I guess you could call it that.

------
superdimwit
Could you try reordering the state checks by decreasing state-transition
probability? I.e. if I am currently in code, it is most likely that I will
still be in code in the next state, a bit less likely that I'll be in a
single-line comment, and even less likely that I'll be in a multiline comment.
If I'm in a multiline comment, I will most likely stay in it or return to
code, but am unlikely to be in a single-line comment next.

Amongst a few equiprobable state transitions, ordering the checks by
increasing expense might also help.

~~~
boyter
I did toy with that a bit but don’t remember it making any difference. I might
try it again, though. From memory, it might work better to move the switch to
a hash, which should allow it to compile down to a jump table, which would be
faster.

------
llogiq
One thing the linked ripgrep post doesn't tell you is that line counting is
also done with SIMD since it started using bytecount
([https://github.com/llogiq/bytecount](https://github.com/llogiq/bytecount))
some months ago, which sped up some workloads with line numbers considerably.

------
merb
Well, the cost estimate is a little bit funny. Probably inaccurate :D

> Estimated Schedule Effort 25.879043 months

> Estimated People Required 18.118657

Well, I've been working alone on the project for 4 years...

~~~
boyter
Certainly could do with some tweaking. It's based on the simplified COCOMO
model. It's entirely possible I made a mistake in there.

