
"Also, the whole "write code faster" has always been perplexing to me. The speed at which you write code is never significantly different between any two mainstream languages to really matter in the end, especially since the lifetime of a piece of software is dominated by maintenance."

One of the very few bits of relatively solid software engineering that we have is that line count for a given task does matter. Fewer lines written by the programmer to do the same thing strongly tends to yield higher productivity. (Note the "by the programmer" clause; lines autogenerated... well... correctly autogenerated tend not to count against the programmer, which is an important part of doing Java, or so I hear.)

And remember, if this were not true, we'd be programming entirely differently; why do anything but assembler if line count doesn't matter? You might be tempted to think that's some sort of rhetorical exaggeration, since it sort of sounds like one, but it's not; it's a very serious point. If line counts were actually irrelevant, we'd never bother with high-level languages, whose primary purpose, up until fairly recently, has been to do a whole bunch of things which, in the end, reduce line count.

(Slowly but surely we're starting to see the mainstream introduction and use of languages that also focus on helping you maintain invariants, but that has still historically speaking been a sideline task and niche products.)



> Note the "by the programmer" clause; lines autogenerated... well... correctly autogenerated tend not to count against the programmer, which is an important part of doing Java, or so I hear.

Sorry, but unless you have an architect dissect the problem to exhaustion and freeze the architecture afterwards, no piece of software creates the correct autogenerated code, and those lines still have to be changed by the programmer. Several times.

And if you do have an architect dissect the problem to exhaustion and freeze the architecture afterwards, that's already a bigger problem than dealing with all that autogenerated code. No win.


If your autogenerated files have to be changed by the programmer, they're not autogenerated files, they're templates.


Do you have a citation for that?

Because I'd totally buy that the number of expressions matters.

But I really doubt actual lines matters much.


In theory, I have a citation. There have been actual studies done that show roughly equal productivity in several languages as measured by lines of those languages. However, I can't google them up through all the noise of people complaining about line counts being used for various things. And I phrased it as "one of the very few bits of relatively solid software engineering" on purpose... that phrase isn't really high praise. You can quibble all day about the precise details, not least of which is the age of the studies in question.

Still, I do stick by my original point... if you think lines of code are irrelevant, it becomes very difficult to understand the current landscape of language popularity. A language in which simply reading a line from a file is a multi-dozen-line travesty is harder to use than a language in which it's two or three, and that extends on through the rest of the language. I know when I go from a language where certain patterns are easy into a higher B&D language where the right thing is a lot more work, I have to fight the urge to skip the lot-more-work, and this higher-level "how expensive is it to use the correct pattern?" question is a far more important, if harder to describe, consideration across non-trivial code bases.


I wasn't attacking your comment. Just curious about the citation since it doesn't intuitively sit right with me I guess.


That's a distinction without a difference.

1) How do you count "expressions"? Is (b+sqrt(b * b - 4 * a * c))/(2 * a) one expression or 14? (See the sketch after this list.)

2) Assuming reasonable coding style and reasonable definition for what an "expression" is, the variance of the measurement "expressions per line" will be very small - thus, "number of expressions" and "number of lines" are statistically equivalent as far as descriptive power goes.
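
To make point 1 concrete, here's a toy sketch (Scala; the AST and names are invented purely for illustration) showing that the answer depends entirely on what you decide counts as an expression:

  // Just enough AST to encode (b + sqrt(b*b - 4*a*c)) / (2*a).
  sealed trait Expr
  case class Num(v: Double) extends Expr
  case class Var(name: String) extends Expr
  case class Call(fn: String, arg: Expr) extends Expr
  case class BinOp(op: String, l: Expr, r: Expr) extends Expr

  // Count every node as an "expression".
  def countExprs(e: Expr): Int = e match {
    case Num(_) | Var(_) => 1
    case Call(_, arg)    => 1 + countExprs(arg)
    case BinOp(_, l, r)  => 1 + countExprs(l) + countExprs(r)
  }

  val quadratic = BinOp("/",
    BinOp("+", Var("b"),
      Call("sqrt", BinOp("-",
        BinOp("*", Var("b"), Var("b")),
        BinOp("*", BinOp("*", Num(4), Var("a")), Var("c"))))),
    BinOp("*", Num(2), Var("a")))

  countExprs(quadratic)  // 16 if every sub-expression counts; 1 if only the whole thing does

Count whole statements instead and you get 1; count tokens and you get yet another number.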

I don't have a citation, although I do remember this conclusion mentioned in PeopleWare - specifically, that "number of bugs per line" tends to be a low variance statistic per person, with the programming language playing a minor role. I might be wrong though.

But I can offer my personal related experience ("anecdata"?) - when you ask multiple people to estimate project complexity using a "time to complete" measure, you get widely varying results that are hard to reason about. However, when you ask them to estimate "lines of code", you get much more consistent results, and meaningful arguments when two people try to reach an agreement. YMMV.


I feel like you probably haven't coded IO in a language like C# (in earlier versions anyway) or Java if you think I'm playing semantics.

Expressions are distinct from compositions, and both influence LOC. I wouldn't suspect that Java software is of generally lower quality than Ruby code on average, for example, even though in Java you might see a Reader around a Buffer around a Stream instead of Ruby's `open`.
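
For what it's worth, here's a rough sketch of the two styles side by side on the JVM (Scala syntax; the file name is just a placeholder):

  import java.io.{BufferedReader, FileInputStream, InputStreamReader}

  // The classic Reader-around-a-Buffer-around-a-Stream composition...
  val reader = new BufferedReader(new InputStreamReader(new FileInputStream("data.txt")))
  val firstLine = reader.readLine()
  reader.close()

  // ...versus the terser library call that does the same wrapping for you.
  val lines = scala.io.Source.fromFile("data.txt").getLines().toList

Both do the same work; one just spends more lines (and more named intermediate objects) doing it.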

I guess what I'm getting at is what you might loosely call boilerplate. Java has a lot more boilerplate, which could easily result in 2x higher LOC. Having worked with more Ruby than the average bear, I feel very confident being skeptical of the assertion that Ruby libraries are generally of higher quality/have fewer bugs.

I think your last anecdote is more getting into Five Whys territory, and it's probably reasonable to expect a greater degree of consensus then.

Final note: Scala is typically less verbose than Ruby by a fair margin (at least if you leave off imports). Idiomatic usage is also functional to a significant degree in a way that no Ruby library I've ever seen comes close to. So does that automatically mean that Scala is the superior language? (Well of course it is ;-D, but is that the reason?)


The question is simple, and it's about math and statistics.

How do you count lines? On unix, "wc -l"; if you insist, sloccount, but "wc -l" is a good approximation.

How do you count expressions? The fact it will take you a few paragraphs to answer (you haven't, btw) indicates that it's a poor thing to measure and try to reason about.

I've done some IO code in C# (mostly WCF, but not just), and I still think you are playing with semantics as far as statistics is concerned.

Figure out an objective, automatable way to count your "expressions" or "compositions" or "code points" or "functional points" or whatever you want to call it. Run it on a code base, and compute the Pearson r coefficient of correlation. It's likely to be >95%, which means one is an excellent approximation of the other.
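
A rough sketch of what that check could look like (Scala; the per-file counts are placeholders for whatever your tooling extracts):

  // Pearson r between two per-file measures, e.g. line counts and "expression" counts.
  def pearsonR(xs: Seq[Double], ys: Seq[Double]): Double = {
    val mx  = xs.sum / xs.size
    val my  = ys.sum / ys.size
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sx  = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
    val sy  = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
    cov / (sx * sy)
  }

  // linesPerFile and exprsPerFile are hypothetical -- however you choose to count,
  // the claim is that r lands close to 1, so either measure will do.
  // pearsonR(linesPerFile, exprsPerFile)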

And I have no idea what you were trying to say about Scala. I wasn't saying "terser is automatically better". I was saying (and I'm quoting myself here) that "'number of bugs per line' tends to be a low variance statistic per person, with the programming language playing a minor role". Note "per person"?


So backwards first I guess. "per person". Ok. But given the range of programmers I guess that's not an incredible surprise. Yes the person is more important than the language. I'd buy that.

I guess "expression" seems semi-obvious to me since it's a standard rule in SBT. Variable assignments, return values and function bodies might get close.

  val a = 1 + 1
That would be an expression. Instantiating a DTO with a dozen fields, using keyword arguments and newlines between for clarity would be a single expression to me.

An if/else with a simple switch for the return value would be an expression for example. A more complex else case might have nested expressions though.
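
So, roughly, in code (the DTO and field names are invented for illustration):

  // Imagine a dozen fields (four shown here): keyword arguments, one per line for
  // clarity -- several lines of code, but (to me) a single expression.
  case class CustomerDto(id: Long, name: String, email: String, city: String)

  val customer = CustomerDto(
    id    = 42L,
    name  = "Ada",
    email = "ada@example.com",
    city  = "London"
  )

  // Likewise an if/else used for its value: a few lines, still one expression.
  val shipping = if (customer.city == "London") 0 else 5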

It takes some charity I suppose; one of those "I know it when I see it" things. I don't do a lot of Math based programming though. It's all business rules, DTOs, serialization, etc. So maybe not something that could be formalized too easily.

I guess where I'd intuitively disagree (and would be interested in further reading) is that LOC as a measure just doesn't feel like it works for me.

Considering only LOC to implement a task it's likely: Java, Ruby and Scala in that order (from most to fewest). But in my personal experience bugs are probably more like: Ruby, Java, Scala from most to fewest.

Hopefully that helps clarify and not just muddy what I'm trying to express further.

What confuses me is that you appear to be claiming that fewer LOC should correlate strongly with fewer bugs, but then go on to say that terser is not automatically better (in this context (sic?)). Maybe I'm reading more into it than you intend, but I'm left a bit confused.


> one of those "I know it when I see it" things.

Which is a confusing use of the term "expression", since it is very well defined when talking about languages - in fact, most formal grammars have a nonterminal called "expr" or "expression" when describing the language.

Your description, though, more closely correlates with what most languages consider a statement.
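
In Scala terms, a trivial sketch of that distinction would be:

  // Each whole `val ...` definition is what most grammars would file under
  // statements/definitions; the right-hand sides are the expressions.
  val x = 1 + 1                     // one definition, containing the expression 1 + 1
  val y = { println("hi"); x * 2 }  // still one definition; the block is an expression
                                    // that itself contains two more expressions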

Regardless, it's just pure statistics - if you calculate it, you'll notice that you have e.g. 1.3 expressions per line, with a standard deviation of 1 expression per line - which means that over 1000 lines you'll have about 1300 expressions, give or take roughly sqrt(1000) ≈ 32 (standard deviations of independent lines add in quadrature), so with 95% confidence somewhere in 1200-1400 -- it wouldn't matter if you measure LOC or "expressions".
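
Back-of-the-envelope version of that arithmetic, assuming per-line counts are roughly independent:

  val lines       = 1000.0
  val meanPerLine = 1.3
  val sdPerLine   = 1.0
  val total = lines * meanPerLine            // ~1300 expressions expected
  val sd    = math.sqrt(lines) * sdPerLine   // ~32: SDs of independent lines add in quadrature
  val low   = total - 1.96 * sd              // ~1238
  val high  = total + 1.96 * sd              // ~1362, i.e. roughly 1200-1400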

> What confuses me is that you appear to be claiming that fewer LOC should correlate strongly with fewer bugs, but then go on to say that terser is not automatically better (in this context (sic?)). Maybe I'm reading more into it than you intend, but I'm left a bit confused.

What I'm claiming is that, when people actually measured this, they found out that a given programmer tends to have a nearly constant number of bugs per line, regardless of language - that is, person X tends to have (on average) one bug per 100 lines, whether those lines are C, Fortran or Basic - the variance between programmers is way larger than the variance of any one programmer across languages.

Now, PeopleWare, which references those studies (and is where I read about this), was written 20 years ago or so - so the Java or C++ considered wasn't today's Java/C++, and things like Scala and Ruby were not considered. However, I'd be surprised if they significantly changed the results - because those studies DID include Lisp, which -- even 20 years ago -- had everything to offer that you can get from Scala today.

So, in a sense - yes, you should write terse programs, regardless of which language you do that in. If you wrote assembly code using Scala syntax, and compiled with a Scala compiler - Scala is not helping you one bit.



