
Ask HN: How do you manage your codebase once it gets large? - cmelbye
Hi all! I'm working on a project, and lately I've noticed that it's getting harder to navigate and develop as it gets larger and more complex. Is this a sign of it maturing, or am I doing something wrong? What do you guys do to help manage your codebase when it gets somewhat large?

(This is my first "real" project using Rails, so I have no previous experience with this question :P)
======
patio11
This is one of my periodic pain points with Rails, as compared to Java. At the
day job I work with multiple code bases with more than X00,000 lines of code
each. My Rails site has 5k lines as reported by rake stats, and due to a
combination of poor IDE support compared to Java and the toss-it-all-in-one-
folder Rails conventions, that sometimes results in headaches.

My suggestions:

1) Refactor like it is going out of style. Rails classes (particularly
controllers and models in my experience) have a tendency to accumulate cruft
over time. Periodically evaluate whether something should be broken into two
classes, or at least two files. Mixins and require are your friends here.
(Breaking a controller into two changes URLs by default. I think letting Rails
assign publicly visible URLs is a mistake, but most people do it. Consider
carefully whether you can accept URLs changing as a result of code changes.)

2) Encourage internal code reuse, and better separation of responsibilities,
by breaking things into plugins when appropriate. I extracted code from my own
classes to make A/Bingo for example. It keeps my own classes readable and
makes it much easier to publish and reuse A/Bingo.

3) Coding conventions are your friend. Since the IDE lacks a reliable jump-to-
definition-of-this-function feature, I tend to try to put predictable things
in predictable places. For example, model classes always have a consistent
ordering of filters, validations, and associations at the top of the file
prior to business logic. (Those could be externalized if they got realllly
messy, via the magic of require.)

4) Put partials into thematically coherent subdirectories. This saves you from
having to find the right form partial from a directory with 20 views and 50
partials.
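Here is a minimal plain-Ruby sketch of the mixin split from point 1. It needs no Rails. The names `Order` and `OrderReporting` are hypothetical. Reporting methods that were cluttering the class move into a module. In a real app that module would live in its own file and be loaded with `require`. The class then mixes the module back in with `include`, so callers don't change:

```ruby
# Hypothetical names throughout. In a real app this module would sit in its
# own file (say, order_reporting.rb) and be loaded with require.
module OrderReporting
  # Sum of price * quantity over all line items, in cents.
  def total_cents
    line_items.sum { |item| item[:price_cents] * item[:quantity] }
  end

  # Human-readable one-liner built on top of total_cents.
  def summary
    format("%d line items, %.2f total", line_items.size, total_cents / 100.0)
  end
end

# The class itself keeps only its state and core behaviour.
class Order
  include OrderReporting

  attr_reader :line_items

  def initialize(line_items)
    @line_items = line_items
  end
end

order = Order.new([
  { price_cents: 250, quantity: 2 },
  { price_cents: 1000, quantity: 1 }
])
puts order.summary
```

Running it prints `2 line items, 15.00 total`. `Order` exposes the same methods as before; only the file layout changes.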

~~~
j_baker
I _slightly_ disagree with point 2. The problem is that sometimes making
something be adaptable to multiple use-cases makes it more complex than simply
having two very similar pieces of code.

~~~
bguthrie
It's true that you lose some time by doing this. But my experience has been
that it inevitably improves the quality of your code if you and your team are
forced to think about interfaces and common pain points by moving towards
reuse. Your team members are code consumers too.

Plus, you end up in a position where you can give back to the community, which
is nice.

------
kvs
I maintain a codebase that is about 300KLOC C++ and about 900KLOC Java.
Abstractions become important at this scale.

We follow a simple MVC principle, C++ side is model, JNI-glue is control, and
Java is view. Then, individual pieces are broken into functional modules and
there is documentation on slides (flowcharts) on how these functional
components interact.

You should be able to let go and trust fellow developers. I don't know all of
the code; there are four others who have their own functional modules that
they are responsible for. I trust their judgement/decisions on their modules
just like they trust mine; we still question each other's decisions, which is
important.

The rest of it sort of falls into place as we progress. Hope this helps.

------
Kliment
Segmentation, interfaces, documentation.

You want to make changes that do as little damage as possible. By structuring
your project so that independent functions are really independent, you can
think about them independently. Providing interfaces to your own code also
helps there: you can ignore one part of the code and just use its
functionality, and only have to keep the interface in your head. Also, notes-
to-self often help provide context and memory cues to reload something you've
forgotten. Either way, you should be using source control. That way you can
look at the changelog for a file and reconstruct what you did with it, again
helping you recall structure.
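Here is a small sketch of "only keep the interface in your head" in Ruby. It uses a hypothetical `Slugger` component. Callers use one public method. The helper is made private with `private_class_method`, so other code can't come to depend on it:

```ruby
# Hypothetical component. Callers only need to remember Slugger.slugify;
# everything else is an implementation detail that can change freely.
module Slugger
  # Public interface: turn an arbitrary title into a URL-friendly slug.
  def self.slugify(title)
    trim_dashes(title.downcase.gsub(/[^a-z0-9]+/, "-"))
  end

  # Internal helper: strip a leading and trailing dash, if present.
  def self.trim_dashes(text)
    text.delete_prefix("-").delete_suffix("-")
  end
  private_class_method :trim_dashes
end

puts Slugger.slugify("  Ask HN: Large codebases?  ")
```

This prints `ask-hn-large-codebases`. Calling `Slugger.trim_dashes` from outside the module raises `NoMethodError`, which keeps the boundary honest.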

------
peterhi
Well, I've had a look at some of the projects that are sitting on my hard disk
(all are current, by the way), and here are the Code LOC values for them: 50,
225, 261, 346, 402, 572, 857, 980, 1099, 1213, 1475, 1677, 1841, 2002, 2123,
2242, 2887, 3138, 3338, 3421, 3421, 7812, 7812, 46757, 57495

The most used projects, used daily by our clients are in the 2000-4000 range,
the smaller ones are supporting tools and the two massive ones are that way
because the business rules are completely insane. Also they are quite old and
could probably be trimmed down quite a bit.

Some of the more actively developed code has been trimmed down through
constant development, while the two monsters do not get touched too often.

Actually, I'm quite surprised as to how small our core applications are; I
would have guessed more in the 8000-12000 LOC range. But then Ruby tends to be
rather concise.

What sort of numbers are you seeing?

~~~
cmelbye
Actually less than I would've thought: 831 Code LOC.

It's not impossible to manage yet, but I'm beginning to notice that it's
getting harder. I think doing some refactoring and general cleanup will help
with that.

------
jsankey
First: you need good test coverage, so that you can refactor and restructure
with confidence.
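Here is a minimal sketch of a test that pins down current behaviour before a refactor. It assumes Minitest, which ships with Ruby as a bundled gem and underlies modern Rails' default test classes. `shipping_cost_cents` is a hypothetical method standing in for whatever you're about to restructure:

```ruby
require "minitest/autorun"

# Hypothetical code about to be refactored: 500 cents for the first
# kilogram, plus 200 cents for each additional started kilogram.
def shipping_cost_cents(weight_grams)
  return 0 if weight_grams <= 0

  500 + ((weight_grams - 1) / 1000) * 200
end

# These tests describe what the code does today; if they still pass after
# the refactor, the behaviour they cover hasn't changed.
class ShippingCostTest < Minitest::Test
  def test_nothing_to_ship_is_free
    assert_equal 0, shipping_cost_cents(0)
  end

  def test_first_kilogram_uses_base_rate
    assert_equal 500, shipping_cost_cents(1)
    assert_equal 500, shipping_cost_cents(1000)
  end

  def test_each_started_extra_kilogram_adds_200
    assert_equal 700, shipping_cost_cents(1001)
    assert_equal 900, shipping_cost_cents(2500)
  end
end
```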

Then, I would look into dividing up the code base a bit. Usually once you get
to a certain size you'll be able to identify useful bits and pieces that could
be extracted as standalone libraries. This will also make these more generic
parts of your code ready for reuse.

You might also find it useful to divide the application-specific parts into
layers and ensure lower layers don't depend on higher ones. These types of
divisions allow you to consider lower layers independently. And when you're
hunting for something, you should have a feel for which layer it is in first,
which narrows your search scope.
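The "lower layers don't depend on higher ones" rule can be checked mechanically. Here is a minimal sketch that assumes a hypothetical three-layer split (`storage`, `domain`, `web`) and a hand-written map of which component depends on which. A real project would more likely derive that map from the code than maintain it by hand:

```ruby
# Hypothetical layer ordering, lowest first. A component may depend on its
# own layer or any lower layer, never on a higher one.
LAYERS = %i[storage domain web].freeze

# Hypothetical components: name => [layer, names it depends on].
COMPONENTS = {
  "Db"                 => [:storage, []],
  "Invoice"            => [:domain, ["Db"]],
  "InvoicesController" => [:web, ["Invoice"]],
  "Mailer"             => [:domain, ["InvoicesController"]] # breaks the rule
}.freeze

# Returns one message per dependency that points to a higher layer.
# Raises KeyError if a dependency names an unknown component.
def layering_violations(layers, components)
  rank = layers.each_with_index.to_h
  components.flat_map do |name, (layer, deps)|
    deps.filter_map do |dep|
      dep_layer = components.fetch(dep).first
      "#{name} (#{layer}) -> #{dep} (#{dep_layer})" if rank[dep_layer] > rank[layer]
    end
  end
end

puts layering_violations(LAYERS, COMPONENTS)
```

This prints `Mailer (domain) -> InvoicesController (web)`. Run as part of the test suite, a check like this catches a layering violation when it is introduced rather than months later.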

------
rufius
If you can, refactoring into sub-folders is a small change that helps a lot.
When I can, I break down components of my application as much as possible.
In Java that's natural with the packages idiom. In C++ or C I just force
myself to do it.

For example with the language/VM I've got built for doing some DSL work, I've
got folders for the garbage collector (src/gc), the compiler (src/compiler),
the runtime (src/runtime) and so on. I'm not entirely familiar with doing any
sort of large-scale Ruby development, so I don't know how much that helps you
(most of my Ruby code is small, all-inclusive scripts of 100-200 lines).

Hope that helps.
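In a Rails app, sub-folders usually go hand in hand with module namespaces. Rails' autoloader expects a constant such as `Admin::ReportsController` to live at `admin/reports_controller.rb` under one of its autoload roots (for example `app/controllers`). Here is a simplified sketch of that mapping, borrowing the component names from the folders above as hypothetical constants. It ignores acronyms and custom inflections, which Rails handles and this does not:

```ruby
# Simplified version of Rails' file-naming convention: each namespace
# becomes a sub-folder and CamelCase becomes snake_case. Acronyms such as
# "HTMLParser" are NOT handled the way Rails' inflector handles them.
def expected_path(constant_name)
  constant_name.split("::").map do |part|
    part.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end.join("/") + ".rb"
end

puts expected_path("Admin::ReportsController")   # prints admin/reports_controller.rb
puts expected_path("Compiler::CodeGen")          # prints compiler/code_gen.rb
puts expected_path("Runtime::GarbageCollector")  # prints runtime/garbage_collector.rb
```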

------
intellectronica
One important thing to notice as a codebase gets large is that it's going to
slow you down. Accept that, and pay more attention to maintaining a high
quality codebase. Invest in reviews of new merges, refactoring of stale code
and documentation. Automate as much as you can - make sure you can create good
API docs, and that there are tools for working with the existing codebase.
Finally, try to remove code as much as you can (refactoring, or simply
deleting code you no longer use).
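Finding code you no longer use can be partly automated. Here is a crude sketch. It assumes the sources are handed in as a hash of filename to file contents, which is a hypothetical input format; in practice you'd read files with `Dir.glob`. It lists methods whose names appear nowhere except their own `def` line:

```ruby
# Crude, purely textual heuristic. It will miss dynamic calls (send,
# method_missing, names built from strings) and will flag methods that a
# framework calls by convention (e.g. Rails controller actions), so treat
# the result as candidates to review, not as code that is safe to delete.
def unused_method_candidates(sources)
  text = sources.values.join("\n")
  # Names from `def name` and `def self.name` lines.
  defined = text.scan(/^\s*def\s+(?:self\.)?([a-z_]\w*[?!]?)/).flatten.uniq
  defined.select do |name|
    # Whole-word occurrences; exactly one means only the definition itself.
    pattern = /(?<![\w@])#{Regexp.escape(name)}(?![\w?!])/
    text.scan(pattern).size == 1
  end
end

sources = {
  "order.rb" => <<~RUBY,
    class Order
      def total
        subtotal + tax
      end

      def subtotal
        100
      end

      def tax
        10
      end

      def legacy_discount
        5
      end
    end
  RUBY
  "report.rb" => "puts Order.new.total\n"
}

p unused_method_candidates(sources)
```

This prints `["legacy_discount"]`. `total` is not flagged: it is called from `report.rb`, and the word-boundary check keeps `subtotal` from counting as a use of it.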

------
TallGuyShort
As your codebase gets large, modularity becomes more important, so it might
be time to refactor your code with greater size in mind.

Another crucial aspect is unit tests: if you know the project well, you can
go in and make a change relatively easily, as long as you know that a system
of unit tests will catch any unintended side-effects.

------
audionerd
This video was pretty good -- "Living with legacy software": David from
37signals talks about how they refactored the Basecamp code as it grew:

<http://railsconfeurope.blip.tv/file/1555560/>

Breaking pieces out into modules (and even further into plugins) can help keep
it manageable.

~~~
audionerd
Erhmm.. this was the one I actually watched:

<http://www.vimeo.com/1752667>

Same presentation.

------
jimfl
This tends to be the job of the IDE. I am working on a largish .NET codebase,
and Visual Studio, combined with a plugin called ReSharper, is what keeps it
sane, giving us context menus that allow us to quickly find usages of a
method, or implementations of an interface.

For Java code, Eclipse performs similar functions.

------
known
I think your project's automated testing scripts are _not_ keeping up with the
development.

------
kylebragger
What SCM system are you using?

~~~
cmelbye
Git, hosted by GitHub.

