Strategies to quickly become productive in an unfamiliar codebase (nestoria.com)
126 points by sabarasaba on Sept 3, 2014 | 31 comments

I'm surprised I didn't see something similar to what I do: the deep-dive.

I start with a relatively high level interface point, such as an important function in a public API. Such functions and methods tend to accomplish easily understandable things. And by "important" I mean something that is fundamental to what the system accomplishes.

Then you dive.

Your goal is to have a decent understanding of how this fundamental thing is accomplished. You start at the public facing function, then find the actual implementation of that function, and start reading code. If things make sense, you keep going. If you can't make sense of it, then you will probably need to start diving into related APIs and - most importantly - data structures.

This process will tend to reach a point where you have dozens of files open, with non-trivial relationships to each other, covering a variety of interfaces and data structures. That's okay. You're just trying to get a feel for all of it; you're not necessarily going for total, complete understanding.

What you're going for is that Aha! moment where you can feel confident in saying, "Oh, that's how it's done." This will tend to happen once you find those fundamental data structures, and have finally pieced together some understanding of how they all fit together. Once you've had the Aha! moment, you can start to trace the results back out, to make sure that is how the thing is accomplished, or what is returned. I do this with all large codebases I encounter that I want to understand. It's quite fun to do this with the Linux source code.

My philosophy is that "It's all just code", which means that with enough patience, it's all understandable. Sometimes a good strategy is to just start diving into it.

Your strategy is similar to mine, except that I only focus on the types.

Which types are required to define the function, both as parameters and through any global/inherited scope/other state? Which types are directly related, eg, containers to string, containers themselves, indexing, etc?

By the time you have some sense of what things go where in the program, and what they turn in to, you usually have significantly narrowed down what the program can be doing.

Note: This works considerably better with some languages than others, but can work reasonably well on dynamically typed things like Python.
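
As a rough sketch of this types-first reading in Python, assuming the function you start from carries type hints: the `Order` class and `settle` function below are hypothetical stand-ins, and `describe_types` is just an illustrative helper built on the standard `inspect` and `typing` modules.

```python
import inspect
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Order:
    # Hypothetical domain type, used only for illustration.
    symbol: str
    quantity: int


def settle(order: Order, prices: dict[str, float]) -> float:
    # Hypothetical function whose signature we want to inspect.
    return order.quantity * prices[order.symbol]


def describe_types(func):
    """Return a mapping of parameter names (plus 'return') to annotated types."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    described = {name: hints.get(name) for name in params}
    described["return"] = hints.get("return")
    return described
```

Calling `describe_types(settle)` tells you that `Order` is the type to read next, which is the point of the exercise. Unannotated parameters simply come back as `None`.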

You inevitably spend much of your time understanding the types and the concepts they represent, but I find using a specific, high-level function to drive the understanding helpful. One, it provides a "theme" for the deep-dive, and two, the logic of that function gives you particular paths through those types to focus on.

This, more than anything. It's easy to get bogged down in a large code base and lose sight of what you're trying to accomplish. Cue the anxiety at that point, and watch your tasks become even more difficult. Your note that "it's all just code" is absolutely perfect: you just have to trust in your experience and knowledge. Figuring it all out from there is just a matter of tracing things out a point at a time.

The really cool thing about that is that it's really no different than how we handle other new experiences in our lives. Nobody walks into something new and just knows what to do. Without even consciously realizing it, we make observations and draw inferences as we relate those observations to previous experience that allow us to make better-informed judgments. And thousands of years of evolution have made us rather good at this.

You already have the skills to jump into the deep end without being immediately overwhelmed; it's just a matter of applying them successfully.

Am I the only one reading this thinking "as written by someone who seldom dives into foreign code bases"?

After a few months you'll be familiar with it?

Wow, I'm lucky to get 8 hours on a new code base before I have to ship a bugfix. Months?? O_o luxury. Sip your pina colada as you write your immediately outdated documentation. Great advice.

No, that code base was probably written by several people some of whom knew what they were doing, some of whom were just 'getting the job done' and some of whom were idiots.

Probably, the only useful suggestion in that list is code reviews. Hack a fix in, get someone who seems clued up to review and suggest a better way.

Look for logging, that's the first thing I do; you're probably not the first person to be given this code to work with, and if you're lucky the last folk(s) made some debugging tools. If not, be a pro and leave some good ones behind when you go.

...not documentation.
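
A minimal sketch of the kind of debugging aid worth leaving behind, in Python with the standard `logging` module. The `trading.orders` logger name and the `execute_order` function are hypothetical; the decorator is one possible shape, not a prescription.

```python
import functools
import logging

logger = logging.getLogger("trading.orders")  # hypothetical logger name


def log_calls(func):
    """Log the arguments and result of each call at DEBUG level."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("%s returned %r", func.__name__, result)
        return result
    return wrapper


@log_calls
def execute_order(symbol, quantity):
    # Hypothetical stand-in for real business logic.
    return {"symbol": symbol, "filled": quantity}
```

Because the messages go out at DEBUG level, they stay silent until the next person turns the logger up.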

"Wow, I'm lucky to get 8 hours on a new code base before I have to ship a bugfix." << spotted your problem. If you're constantly working in completely unfamiliar code with at best an 8 hour lead time to get a fix in, you're being set up for failure. Yeah, sometimes you have to just get a fix fast, but spending your whole life coding like that is going to burn you out. I only have my own anecdotal evidence to go by, but I'm inclined to think your experience is not the only way.

Most teams I have worked with deliberately introduced new members to their code base by:

A) Making sure they can compile it, and showing them where the various locations for documentation are.

B) Giving them simple bugs to work on while they get familiar with the codebase. 8 hours is more than enough time for a competent developer to fix many simple bugs.

You also have situations where the nature of the problem dictates that you fairly often need to make changes in unfamiliar code-bases. For example, at my previous job I was developing an Android fork. Android has a fair amount of its own code, but also includes a large number of external projects. It was fairly common that I had to fix a bug in one of those projects that I had never seen before (and would likely never see again).

The point is, if you need months to become familiar with a code base, something is seriously wrong.

These 'here is some generic pointless advice; got some useful advice? Leave it in the comments!' posts are social media posturing and lazy writing.

Certainly not everyone dives into different code bases every day / every week; but everyone does it sooner or later; and being immediately productive is hard. This article could have been an interesting collection of tips on how to do that... but the author couldn't be bothered researching the subject.

From my experience, the "be humble" strategy is by far the most important one. Many developers tend to dislike any piece of code that was not written by them before even looking into the actual code, or they get discouraged because they have a hard time learning how it works. For whoever is in this situation, be humble... pretend your way of writing software is not the only way and see if you can learn something. You may be surprised.

I once had to get my head around over 120k lines of complex, concurrent, buggy spaghetti Java code (for a real-time trading system).

My first attempt was to reverse engineer the code into a UML diagram. For some reason I keep making this mistake. A messy code base will result in an extremely messy diagram. It can give a few insights, but between finding a tool which will work and trying to make sense of a tangled mess of lines, visual diagrams usually aren't worth the time.

I found that a tool called Chronon was somewhat useful (google "DVR for Java"). This tool just records a single run of a program. It is great for going forwards AND BACKWARDS as you step through the code, take a look at different threads, state of various objects, etc.

My strategy was to run the server and have it execute a small and simple bit of functionality (execute a single order). Follow it all the way from input to output. Make the scenario a bit more complex and follow that through to completion. This way you get to understand the core functionality, edge case code and start to get a sense of performance enhancements, etc.

I found myself making steady progress and fixing a number of bugs, until I hit heisenbugs, caused by overly clever concurrency/object pooling. It is enough to drain your soul :)
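
The follow-one-scenario idea can be sketched outside Java too. This is not Chronon, just a small Python stand-in using the standard `sys.settrace` hook to record which functions a single run passes through. The `validate`, `price` and `execute` functions are a hypothetical order pipeline.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions called, in order)."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no line-by-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, calls


# Hypothetical order pipeline, standing in for the real server code.
def validate(order):
    return order["quantity"] > 0


def price(order):
    return order["quantity"] * 10


def execute(order):
    if not validate(order):
        raise ValueError("invalid order")
    return price(order)
```

Running `trace_calls(execute, {"quantity": 3})` gives you the result plus the path taken; making the scenario a bit more complex and diffing the paths is a cheap way to find the edge-case code.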

Another strategy that I've often used is to just fix a few bugs that seem to be in the vicinity of the area you want to familiarize with. Heck, I even do this on my own codebases that I haven't touched in a while.

I find it helps focus the mind, provides a clear definition of success and forces you to think about a specific area of the code without requiring too much context.

I've done this a lot in my career. My strategy is simple: start by finding a trivial bug, and fix it. Then find a slightly less trivial bug. Do this four or five times. I don't ask anybody about anything unless I'm really lost; I just chase the thread and let the bugs lead me through the active code paths.

After doing a few of these I've got a fair understanding of the structure of the code and can figure out where to go next.

I find it helps to focus on a 'successful' path through the code, starting with an important entry-point and ignoring error conditions and validation. Once I've covered a fair chunk of what the software does, I can go back and delve into the parts I didn't understand.

I'm also wary of comments as I find they can often have 'drifted' in old code and become misleading. I make detailed notes of things that look dodgy or could be improved separate from the code.

I'd add that if the project has poor-to-no build scripts that require a lot of manual steps it's worth at least bandaging it up with a shell script early on.

I recently worked on a project that had many separate modules with independent build scripts, requiring copy-and-pasting built artifacts that others depended on, and several manual configuration tweaks post-packaging. That stuff is very tedious and sucks your energy. It's not generally a priority to rewrite all the build scripts, but if you're editing several modules it's worth being able to do a one-shot build from early on.
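
A one-shot build wrapper along these lines could look something like the sketch below. I said shell script above; this uses Python instead so the dependency ordering can lean on the standard `graphlib` module. The module names, and the assumption that each module directory has its own `./build.sh`, are hypothetical.

```python
import subprocess
from graphlib import TopologicalSorter

# Hypothetical modules: each maps to the modules it depends on.
DEPENDENCIES = {
    "common": [],
    "storage": ["common"],
    "api": ["common", "storage"],
}


def build_order(dependencies):
    """Return module names ordered so that dependencies are built first."""
    return list(TopologicalSorter(dependencies).static_order())


def build_all(dependencies, run=subprocess.check_call):
    """Build every module in dependency order, stopping on the first failure."""
    for module in build_order(dependencies):
        # Assumes each module directory has its own ./build.sh script.
        run(["./build.sh"], cwd=module)
```

The `run` parameter is only there so the ordering can be checked without actually building anything; the copy-and-paste of artifacts and the post-packaging tweaks would become extra steps inside the loop.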

I think that the two most important strategies are to actually pair with someone and ask questions after doing some research.

Being able to do some pair programming helped me to understand a new code base, in a language I had never used before, bit by bit. I could ask questions and actually helped on the issue. Asking questions is important, but as stated in the post, you should try to find the answer by yourself first. I'll actually time myself for 10 minutes, and if I can't find an answer, I'll just poke the most prominent person based on a git blame of the file that is related to my question.

This is my second week working on a Ruby codebase, without any prior experience with Ruby, or Rails (I've mostly been doing Python with Django and Pyramid for the past 3 years). I managed to get quickly up to speed this way.
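
For the git blame step, a rough sketch of how "most prominent person" could be computed, assuming "prominent" just means "authored the most lines". It parses the output of `git blame --line-porcelain`, which repeats an `author` header for every line; the helper names are my own.

```python
import subprocess
from collections import Counter


def top_author(porcelain_text):
    """Return the author with the most lines in `git blame --line-porcelain` output."""
    counts = Counter(
        line[len("author "):]
        for line in porcelain_text.splitlines()
        # Source lines are tab-prefixed in porcelain output, so they never match.
        if line.startswith("author ")
    )
    return counts.most_common(1)[0][0] if counts else None


def blame_file(path):
    """Run git blame on path and return the most frequent author name."""
    output = subprocess.check_output(
        ["git", "blame", "--line-porcelain", path], text=True
    )
    return top_author(output)
```

Line count is a crude proxy; whoever touched the file most recently may be the better person to ask.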

What types of tests—or testing software—could be run on a WordPress site? For example, after we update a plugin on a WordPress site we manually check that the site hasn't changed in appearance. What software – or methodology – would you recommend for automating this game of spot-the-difference?

The nearest I've found (but not tried) are:

Screenster: http://www.creamtec.com/products/screenster/index.html

Wraith: https://github.com/BBC-News/wraith

There's also Huxley: https://github.com/facebook/huxley

There's a good episode of Software Engineering Radio on this topic, as well:

Episode 148: Software Archaeology with Dave Thomas http://www.se-radio.net/2009/11/episode-148-software-archaeo...

I recently started a new job with a new language and did a few of these steps. The only one that has really helped me learn this codebase is "make something".

Making unit tests can be great if you already understand the language (and the IDE), but when you're new to Xcode, don't expect a unit test to make any more sense than the code.

Pairing can be a great tool, if you are pairing with the right person. You can't always find the right programmer (with the time or care) to sit down and work with you like that.

I was certainly humbled by this process, whether by choice or not, and asked my share of dumb questions, but I don't think you can really understand a codebase until you work on it.

Working Effectively with Legacy Code, by Michael Feathers, is still my go-to reference, even though it's now ten years old. https://cmdev.com/isbn/0131177052

Nice 10,000-foot overview, but here are some more hands-on tips:

1. Run the program, look at stacktraces/profiler. Sometimes it's easier to analyze runtime and figure out what the program is doing and what are the main/common patterns.

E.g. Java - jstack, C++ - gdb, gprof

2. Use some static tools to get call/caller graph and be able to browse program quickly. Jump between definitions, see what's used, what's not.

E.g. C++ - docgen, Java - most modern IDE will do just fine
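
To give a feel for what such a static tool does under the hood, here is a toy call-graph extractor in Python using the standard `ast` module. It is a deliberately simplified sketch: it only looks at top-level functions and plain-name calls, and ignores method calls, which real tools handle.

```python
import ast
from collections import defaultdict


def call_graph(source):
    """Map each top-level function name to the sorted names it calls directly."""
    graph = defaultdict(set)
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # make sure functions with no calls still appear
            for child in ast.walk(node):
                # Only plain-name calls such as foo(); obj.method() is skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return {name: sorted(callees) for name, callees in graph.items()}
```

Feed it the text of a module and you get a caller-to-callee map you can browse or invert to answer "who uses this?".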

Tools like gdb and docgen are useful for smaller open source projects, but I think they quickly lose effectiveness on a large code base. For example I think the original post's high level view would work great when starting a new job as well as when diving into an open source project.

That said one step down the chain from stack traces and profiler output might be to look at log files, or run the code in verbose/debug/trace mode if it exists.

How can you write tests if you're unfamiliar with what the code is supposed to do?

The tests are one way to express your understanding of the code. If the tests you have written fail, then you can focus and improve your understanding in that part of the code. And if they pass, you can be reasonably assured of your understanding.
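
A small sketch of what that can look like with Python's standard `unittest`. The `legacy_discount` function is a hypothetical stand-in for existing code; each test writes down one belief about how it behaves.

```python
import unittest


def legacy_discount(total):
    # Hypothetical stand-in for existing code whose behavior is unclear.
    if total > 100:
        return round(total * 0.9, 2)
    return total


class LegacyDiscountTest(unittest.TestCase):
    def test_small_totals_are_unchanged(self):
        # Encodes a current belief about the code; a failure means the belief is wrong.
        self.assertEqual(legacy_discount(50), 50)

    def test_large_totals_get_ten_percent_off(self):
        self.assertEqual(legacy_discount(200), 180.0)
```

If the second test failed, that would be the cue to go read the discount logic again rather than to "fix" the code.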

For my current employer I landed on the job during their regression test sprint.

Just running through the tests and communicating a lot with testers and developers helped me understand how the app (ought to) behave.

"Ask questions": I found that if you ask too many questions, management starts to get annoyed and actively avoids you and/or responds with attitude.

Somewhat adjacent to the conversation, but Bowery - http://bowery.io/ - is building a toolset for getting a full dev environment in a few seconds.

Saw them on the "Made in NY" Product Hunt collection yesterday. Has anyone played with Bowery?

Can't overemphasise the importance of a good IDE for this kind of thing, even if you're a hardened vim or emacs zealot. A tool like IntelliJ IDEA makes navigating new/large codebases a million times easier.

With Vim and Emacs you can get this if you have a plugin that autogenerates a ctags/etags file. Then (in Vim) just <C-]> to jump to an identifier's definition and <C-o> to jump back.

For many languages ctags doesn't come close, unfortunately. For code exploration a good IDE is essential, even if you don't use it for actual editing.

This is very useful, and I think I will post or link to something like that on my own open source project's page.

You should also try to make sure your project has tests, is clean, commented, and easy to read. :)
