The idea of attempting to refactor a 100k+ loc codebase, without static typing a...

rbanffy · on Feb 14, 2012

First, let's assume your 100k+ loc codebase would be about 10k+ loc if coded in most of the "overhyped" languages. That's the primary reason there are not that many huge codebases in them - huge codebases aren't needed. That's also a reason no multi-year development is done with them - it just doesn't last that long.

Next, remember not having to upcast and downcast objects considerably simplifies the code, simplifying the tests you need to write. Less code means less places a bug can hide.

And, finally, understand that, while not passing compilation proves your program is wrong, passing it doesn't make it any closer to correct.

Trust me. I've seen enough "enterprise" codebases in my life. They aren't that special.

hello_moto · on Feb 14, 2012

Hm...

Assumption #1: 100k+ loc codebase in X == 10k+ loc in Y (somehow... magic)

Assumption #2: Assumption #1 is the primary reason not many huge codebases in Y

Assumption #3: Static type implies upcast/downcast objects (despite the presence of generics...) == simplifies code == simplifies tests.

Fact: Less code == less places a bug can hide

Fact: BTW! we all are using tried-and-tested libraries to cut down code to be written

Fact: FindBugs (or static code analysis) does actually find bugs to eliminate most technical bugs and leave you with business logic (and quite possibly race condition bugs).

Argument: Compiler isn't special, see, I just showed you it doesn't prove your code to be correct, there's a logic bug in your Business Logic. (Except that happens as well in dynamic language, AND on top of that you gotta work a little bit extra since you ain't got no compiler).

I think this has become your typical nerd holy war between static vs dynamic with no end.

Here's my biggest problem with Ruby or Python community: your libraries tend to have short-lived (i.e.: orphaned).

When I looked at Java community, they're quite stable and rock solid since 2004-2005 (Apache Commons, Maven, Ant, FindBugs, Eclipse, EclEmma, Spring Framework family, JUnit, Hibernate). Newer libraries tend to have good internal codebases (GWT, Google Guava, Mockito). Most APIs are stable and solid.

The stupid ones tend to get kicked out of the crowd (i.e.: Struts, JSF)

This is what I'm looking for. I can focus on writing code that matters, instead of refactoring my code once every 4 months because things changed and I have to keep up.

rbanffy · on Feb 14, 2012

> Assumption #1: 100k+ loc codebase in X == 10k+ loc in Y (somehow... magic)

Have you ever worked with any significantly complex (as in "does a lot of stuff") codebase in Ruby, Python or Smalltalk? Do you sincerely think you could express that same level of functionality in a comparable amount of code with C, C++, C# or Java?

> Assumption #3: Static type implies upcast/downcast objects (despite the presence of generics...) == simplifies code == simplifies tests.

Generics are nice until you have to implement something that accepts them. I gave up once.

> Fact: BTW! we all are using tried-and-tested libraries to cut down code to be written

And language features that do just that (see Assumption #1)

> ... static code analysis) does actually find bugs

Actually, they are one step above syntax errors.

> Except that happens as well in dynamic language

No. Not really. Of course, I may pass a list to something that expects a file but, provided I embraced the idea of duck typing, that would probably work just fine. And yield correct results.

And avoid implementing a similar function that receives a string instead of a file.
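To make the duck typing point concrete, here is a minimal Python sketch. The function `count_words` is a made-up example, not something from the article; it only assumes that whatever it receives yields strings when iterated, so a file-like object and a plain list are both accepted.

```python
import io


def count_words(lines):
    """Count whitespace-separated words in any iterable of strings.

    Hypothetical helper: it never checks the type of `lines`, it only
    iterates over it and calls str.split() on each element.
    """
    return sum(len(line.split()) for line in lines)


# A file-like object yields its lines when iterated.
file_like = io.StringIO("one two\nthree\n")
print(count_words(file_like))  # 3

# A plain list of strings works with the same function, unchanged.
print(count_words(["one two", "three"]))  # 3

# A single string can be handed over as lines instead of writing a
# second, string-specific function.
print(count_words("one two\nthree".splitlines()))  # 3
```

Whether this counts as "working just fine" depends on the function really needing nothing beyond iteration; code that calls `.read()` or `.seek()` would still fail on a list.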

> AND on top of that you gotta work a little bit extra since you ain't got no compiler

Most errors a compiler catches are traditionally caught in tests with dynamic languages. But having compilers doesn't allow one to skip writing proper tests with statically typed languages. In fact, considering you'll have to deal with more situations, you'll have to write more tests to go with your less concise code.
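As a rough illustration of the first sentence, here is a small Python sketch using the standard `unittest` module. `total_price` and the test names are hypothetical; the second test exercises the kind of type mistake a static compiler would reject before the program ever ran.

```python
import unittest


def total_price(items):
    """Sum the 'price' field of each item (hypothetical example function)."""
    return sum(item["price"] for item in items)


class TotalPriceTest(unittest.TestCase):
    def test_sums_prices(self):
        # Ordinary behavioral test, needed in any language.
        self.assertEqual(total_price([{"price": 2}, {"price": 3}]), 5)

    def test_rejects_string_price(self):
        # A price stored as a string surfaces as a TypeError at run time,
        # so it is the test run, not a compiler, that exposes it.
        with self.assertRaises(TypeError):
            total_price([{"price": "2"}])
```

Saved in a file, this can be run with `python -m unittest`. It says nothing about how many tests a given project needs either way; it only shows where such an error shows up.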

> I think this has become your typical nerd holy war between static vs dynamic with no end.

I think statically typed languages have their place. I have written tons of C and Java (and a bit of C#). I just admit my ST code was not significantly better than my DT code and was considerably more complicated, with more functions, classes, interfaces, configurations, indirections, and much more verbose. The 10x loc difference is very real. In the end they were all, ST and DT, correct.

> Here's my biggest problem with Ruby or Python community: your libraries tend to have short-lived (i.e.: orphaned).

I'm not as familiar with Ruby (the language mentioned in the article) but I can tell you Rails (the framework mentioned) is very solid and very well maintained. Compatibility can be broken from time to time, mostly for good reasons. And you don't need to rush to update your code - you can be perfectly happy using older libraries. I can tell you, however, Python libraries are remarkably stable. The impending move to Python 3 is the only significant example of code breakage, but, again, it's for great reasons.

> they're quite stable and rock solid since 2004-2005 (Apache Commons, Maven, Ant, FindBugs, Eclipse, EclEmma, Spring Framework family, JUnit, Hibernate).

A couple of them gained significant functionality in the past couple years. If you want to use the new functionality, you'll have to refactor your code, often extensively.

> I can focus on writing code that matters,

Ditto here. I just write less code (in less time) for the same amount of "matters"

> instead of refactoring my code once every 4 months because things changed and I have to keep up.

With the extra time you get by writing less code in less time, you certainly have the time to refactor. With good tests, you have the certainty the refactoring was successful. Besides, nothing is compelling you to switch to incompatible versions every 4 months (I don't even think that's possible unless you are doing so deliberately). What is the problem here?

hello_moto · on Feb 14, 2012

I use Guava extensively and it has helped me quite significantly to reduce some of the code Java developers normally write.

I use Rails for CRUD app and I use Spring MVC too.

I used Python a lot a few years ago from writing tools (testing, scripting, automation), to small web-apps.

Saying that 100k can be cut to 10k (or 1k) is like pulling a number out of thin air, that's what I find... "magical". It's like some CEO of a startup giving a developer a deadline to write Hadoop in 4 hours; pulling the number out of thin air.

Compiler => helps with syntax errors and types (95%; the other 5% consists of reflection hackery that might bite).

Static code analysis => One level up from syntax errors, focused on common pitfalls and bug patterns

One step, two step, three step, doesn't matter, it helps, that's the bottom line.

What I meant by "except that happens as well in dynamic language" is the business logic, the app logic, not syntax and whatnot; you _got it wrong_. We all get business logic bugs that can't be caught by anything other than testing the app itself (QA, automation, whatever).

So we agree that dynamic languages requires you to write more testing code because there's no compiler? Gotcha.

I never said anywhere that using a static language requires NO testing.

What I found is that a bunch of old-timers who wrote Java back in 98-2003, never used more modern Java frameworks and tools, and have already jumped on the bandwagon to Rails or Node.js keep singing the same tune.

When I use Rails, then Spring MVC + Spring Data, I notice that the amount of business logic that I have to write is more or less the same. 10x is again out of thin air. I admit writing code in Java requires more typing, although Eclipse helps a lot (and please don't bring up the old argument of "But You need an IDE!"; who cares what I need as long as it helps me to do my job).

Picking one example to refute the epidemic that occurs in the ecosystem is probably not a strong argument (Rails is well maintained, I give you that, although it took Rails until version 3.x to realize that they need to cool down and stabilize, but that doesn't mean the other plugins and libraries are held to the same level of quality and commitment as Rails).

rbanffy · on Feb 14, 2012

> So we agree that dynamic languages requires you to write more testing code because there's no compiler? Gotcha.

No, we don't. I never said that. Since there is less code structured in simpler ways, there can be fewer tests (there are fewer code paths). Since the tests themselves are written more concisely, they are smaller. The syntax errors will be caught in the tests because most dynamic languages actually compile the code. They just do it as needed.
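For Python specifically, the "compile as needed" point can be sketched with the built-in `compile()` function, which turns source text into bytecode and raises `SyntaxError` without executing anything. The `compiles` helper and the two source strings are made up for illustration.

```python
# Two tiny source snippets; the second is missing the colon after the
# parameter list.
SOURCE_OK = "def add(a, b):\n    return a + b\n"
SOURCE_BAD = "def add(a, b)\n    return a + b\n"


def compiles(source):
    """Return True if Python can compile `source` to bytecode.

    Hypothetical helper: nothing in `source` is executed; compile() only
    parses it and produces a code object, or raises SyntaxError.
    """
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(SOURCE_OK))   # True
print(compiles(SOURCE_BAD))  # False
```

The same step happens implicitly when a module is imported, which is why a test suite that imports the code will trip over syntax errors even in paths no test exercises. Type errors and misspelled names are a different matter and only show up when the line actually runs.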

Again, the 10x number is not out of thin air - we keep a lot of Java applications here and, more than once, we rewrote them in dynamic languages. We saw better than 10x ratios, although with the more recent, nicer Java code it is more like 5x.

> Picking one example to refute the epidemic that occurs in the ecosystem is probably not a strong argument

I am not very familiar with the Rails ecosystem, but if most of the libraries someone picks tend to be abandoned, maybe the problem isn't in the libraries, but in the selection process.

fckin · on Feb 14, 2012

True, Rails is such a terrible and unsupported framework.

radicalbyte · on Feb 14, 2012

If 90k of your 100k codebase is generated code - and there's a good chance that it is - then the size of the codebase is no reason to switch to an interpreted language.

rbanffy · on Feb 14, 2012

Are you sure generated code is a good thing? Isn't it just support for limited expressive power, something your language of choice shouldn't be able to get away with, papered over by tools that write the lots of code required to express what you intended?

Wouldn't it just be nice if you could get away with just the code you wrote?