How to join a team and learn a codebase (2020) (samueltaylor.org)
252 points by minicaionut on Jan 16, 2021 | 56 comments



3 things that helped me become productive on a new codebase faster

1. Start with a goal to fix a tiny issue. It will help you not go too deep, too early and yet give you an overview of the codebase.

2. Document the steps to set up the dev environment in your own words and highlight the issues that you run into

3. Take time to learn the new library or tooling you encounter. Learn with the goal of familiarising yourself with the keywords/concepts of that library or tool. It's okay even if you don't understand exactly how they work. When you really need to understand something to solve a problem or read a piece of code, you can use the keywords to quickly go to the exact documentation references to learn more. A rule of thumb for me is to not invest more than a day at a stretch in learning a new concept (I can always come back to it if I see the need).


I just joined a new team as a lead and #2 is killing me, to the point where I have felt like giving up. The documentation my team was given was a hastily thrown-together hack that was full of missing and incorrect steps. It should not take more than a few days to set up a local development environment, but I’ve had to fight for admin rights, ask a million questions, and have other leads work with me for days on end (who themselves struggled with their own dev env). These are not very hard problems to solve, but they do take time and dedication. I’ve been on many other teams where I was either given a pre-configured VM image or I had comprehensive, clear set up instructions, allowing me to be up and running in hours instead of weeks.

This is also a project management issue. Too many times, PMs or tech leads are not at all technical and have little to no comprehension of the complex environments they oversee nor “technical empathy” for the engineers. It matters not to them what can empower a developer or make them more productive, and too often devs are told “you’re a developer, you should be able to figure it out” or “just use the tools we gave you”. Some of this does come from heightened and constantly changing security requirements that everyone is expected to blindly implement, but there is also an inherent laziness where leadership doesn’t consider what those kinds of changes mean for everyone involved.

Many, many projects needlessly waste ridiculous amounts of time and money on these issues.


Yeah, it's crazy to me how common it is to just let basic dev environment polish languish.

In a small enough place, I usually will just fix whatever is slowest and most annoying about it myself.

At bigger places... good luck. Sometimes you can get away with hacking together some scripts that do things for your personal setup, but if there isn't buy in to fix the problem you're often out of luck.


But I think this one point is the one item that ultimately decides on the lifetime of a project and whether it will turn legacy. If the README doesn't even describe how to start the project or successfully invoke a single unit test, how is anyone going to deploy this or change anything of the core code? Also not having this usually means unreliable deployments, broken CIs and constant firefighting. This might be considered acceptable for 1-person-projects until that one person might change projects or jobs.


My team has a story that is dedicated to setting up a `dev` environment for when a new repo is started.


2. Document the steps to setup dev environment in your own words and highlight the issues that you run into

This is the most significant thing for me getting up to speed on a new or old project. I absolutely hate reading some whacky custom dependency, jboss, hell like project setups. They just kill morale. And I feel really bad when I see new starters wade into the tall grass on this stuff. I can almost see the moment they go to lunch and start thinking about quitting tech.

My personal reaction to being given really bad 'quick start' doco so many times, is to try and leave every project I work on in a state where "mvn clean install" will do everything needed to get things running.

You may join a team, but your team is also getting a new member. They should be working HARD to make you productive asap, and crazy dev setup is like a code smell.


I recently joined a new team and codebase.

There was no 'getting started', no docs. I was left on my own. So I muddled through. And documented everything in Makefiles. Not readme, but working code. Now `make clean`, `make install`, etc. run, deploy, install, and clean the project.

A great way to learn. But unfortunately worthless to others in the team, as they all were entrenched in their own ways and setups. I just added a sixth way of 'working with the codebase'. I'm not too sure if it is the team, me, or Make, but I certainly won't spend that much time when the next new codebase arrives.

(I do add makefiles on each of my private and opensource projects though, and will keep doing that)


> they all were entrenched in their own ways and setups.

I consider this a bad behavior. If people espouse the benefits of CI/CD and cattle over pets, then I firmly believe that thinking needs to make its way to the dev environment.

And what you did isn't worthless: the next person gets to go quicker and spend effort on features that make money.


> I consider this a bad behavior.

Part of it is probably that I fixed something that wasn't broken for them personally.

I understand that when a new person joins and starts telling you that you should now start using this "new", "Makefile" thing, you'd probably be annoyed and get back to work instead.


Your team sounds a little broken. If a new starter came in and did that for our project, I’d be really pleased.

Once you have that stuff set up, and you’re doing “day to day” coding, it’s hard to find time to go back and sort that stuff out. It’s something a new starter can take on that ultimately gives good long term outcomes.


Exactly. Sounds like rot has settled in and there’s a serious lack of empathy for coworkers (who have to wade through that tall grass like those before did). Cut the grass. Better yet, pave it. Docker compose up your dev environment. Make it so going from clone->pr is as quick and painless as possible.


What I've done in the past, and what worked, is as follows:

- learn from others what they are doing and why. It may look inefficient but there may be good reasons.

- document everything / make sure your understanding is correct and others can follow your documentation and use the old flow.

- start automating and improving while keeping existing things that worked still there.

Doing things this way I was able to improve processes for very bureaucratic companies, and I have had people appreciate it and put in their own time to help once they started using it.


What is the benefit of makefiles over bash scripts?


- Make is a standard utility, so anyone that has used make before will know what 'make', 'make clean', etc will do without having to read any documentation.

- Make has the ability built-in to only run certain commands based on the files that have changed, which can be handy.

- Makefiles usually have a lot of handy variables for configuring the project like $CC or $DESTDIR. While you could have these in a bash script as well, there's a standard convention for variables in Make[1].

In my experience, Makefiles are usually better documented and follow more conventions than scripts. There are of course many exceptions to every rule.

Makefiles are very similar to bash scripts, and while I would prefer a Makefile any day, it's really a matter of preference.

[1]: https://www.gnu.org/prep/standards/html_node/Makefile-Conven...


I think we also need to be careful. The OP on this thread did not specify what kind of project. Make may be completely appropriate. It might also not be and just what they are used to. The proverbial hammer.

If someone came in and tried to build my Java project with Make, I would definitely call them crazy and not use it either. If someone came to my Java project and tried to put Gradle on it to build it, I might also not use it, but at least it's appropriate for the language, and if everyone on the team/project was OK with it, I would probably be OK with it and use it.

If someone tried to build my FE Javascript project with Make, I would not use it. If they used yarn instead of npm that would be a Gradle vs. Maven situation as above, so either's fine if there's nothing yet.

If there's a set of bash scripts that are perfectly adequate to build the project you have, fine, why use Make?

And let me tell you from experience, conventions are great but don't always work, i.e. $CC and $DESTDIR are definitely not something every Makefile author handles properly. I've had to add that to a few back in the day. You install like you're used to with DESTDIR only to find that it just installed itself into your system. You swear, fix it and learn to _always_ check a newly downloaded project for whether they actually have the 'standard' features implemented before you build something.
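
A minimal sketch of that pre-install check, in shell. The helper names `mentions_destdir` and `staged_install` are made up for illustration, and a grep hit is only a hint that the install rule honors DESTDIR, not proof:

```shell
#!/usr/bin/env bash
# Hypothetical helper: does the Makefile in a directory mention DESTDIR at all?
# A match is a hint only; it does not prove every install rule uses it.
mentions_destdir() {
  local dir="${1:-.}"
  grep -q 'DESTDIR' "$dir/Makefile" 2>/dev/null
}

# Hypothetical helper: install into a throwaway staging directory instead of
# the live system. Assumes the project has an `install` target.
staged_install() {
  local dir="${1:-.}"
  local stage
  stage="$(mktemp -d)"
  if mentions_destdir "$dir"; then
    make -C "$dir" DESTDIR="$stage" install && echo "staged under $stage"
  else
    echo "Makefile never mentions DESTDIR; read the install rule first" >&2
    return 1
  fi
}
```

Inspecting the staging directory afterwards shows where the files would have landed, which is a cheap way to catch an install rule that ignores the convention.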

Contrast that with many of the newer build and packaging tools, that basically take this choice away from the user, which is a great thing if you ask me. Sure you have to get over the "but I want to do it my way" defiance reaction but in the end it is awesome when each new project you come to you do _not_ have to check and learn how exactly that project does things because every Maven project has its stuff in the same place (unless someone re-configured it, which unfortunately does happen). But maven is bad nowadays, coz it uses XML :(


This was a Node, Rails and-then-some project.

Rails comes with Rake, the Ruby make. Node comes with 'scripts' and then gulp, grunt, or whatever the node-js-buildsystem of january 2021 is. 'and then some' is a bash script or four, custom rake tasks, bash-scripts+follow-some-wiki and so on.

Make's other great advantage is that it makes a great 'overarching' general setup. Whether I have a rails, node, ansible, cargo/rust, jekyll-project, `make install` installs it, `make [build]` builds it, `make test` tests it and `make deploy` deploys it. It solves the "yea, but here you need npx", "this will only work with bundle exec prefixed" and "you have to run this with `python2` as `python` is python 3 and..." problems. We all know these cases.

Edit: or, in your examples: `JAVA_HOME="." maven build` is fine. `PDF_BUILD=true maven run test {integration,unit,model,jobs}` is already less so (I don't know much about maven). All require the dev to know how to operate maven, to learn that this project uses maven and to know the intricacies of maven. But they could all be abstracted away behind one predictable `make build` and `make test`, and would therefore work for your maven project, node project, that ancient ant-setup and whatnot: all the same (well, if they all have a Makefile, that is).
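
To make the "one predictable verb" idea concrete, here is a rough sketch written as a shell wrapper rather than a Makefile. The function names and the mapping from marker files to commands are assumptions (common defaults, not what this particular project used):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: map the single verb "test" onto whatever tool the
# project in a given directory appears to use, judged by its marker files.
test_command_for() {
  local dir="${1:-.}"
  if [ -f "$dir/Makefile" ]; then
    echo "make test"
  elif [ -f "$dir/Gemfile" ]; then
    echo "bundle exec rake test"
  elif [ -f "$dir/package.json" ]; then
    echo "npm test"
  elif [ -f "$dir/pom.xml" ]; then
    echo "mvn test"
  else
    echo "no known build file in $dir" >&2
    return 1
  fi
}

# Run the tests without the caller needing to know which tool is in play.
run_tests() {
  local dir="${1:-.}"
  local cmd
  cmd="$(test_command_for "$dir")" || return 1
  # $cmd is left unquoted on purpose so it splits into command and arguments.
  (cd "$dir" && $cmd)
}
```

A Makefile with `test`, `build` and `install` targets plays the same role; the wrapper only shows that the value lies in the stable entry point, not in the tool behind it.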


You seem to be assuming that people know make. People don't any longer. People that grew up with C, built their Linux system from scratch etc. know Make. For someone that doesn't know make, none of what you say comes naturally or 'makes sense'.

Also, make is a very general purpose tool. You don't have to have a `test` target. I could easily call the target `analyze`. Do you store your compiled files in a `dist` folder? Or do they just end up right next to the source files? Where are your test files stored vs. your main project files?

In the Java world the equivalent of Make would have been ant at some point. People did exactly what you said you would do with make. Most people had the default ant run build the project, test would test it etc. but everything else was a complete free for all.

What Maven did was standardize a lot. The test goal (goal being maven speak for make targets) is always called test. All build output always goes to the `target` folder. maven always expects your test files nicely out of the way of regular source files and they never end up in the resulting package etc. The package(s) always end up in the target folder too. Most if not all of these things can actually be changed but it's work to do it and you can easily see it being reconfigured vs. build tools like Make or Ant, which are much more general purpose like shell scripts.

I could make the same assertions about maven btw. as you do about Make. Once you know maven, you can build anything you like with it and it takes care of all of the same things. In a previous life we actually regularly packaged non-java projects with maven, since that was the hammer we had and understood.


A Bash script is a linearization of the dependency graph, a Makefile is a description of the graph itself.
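
One way to see the difference, sketched in shell. `needs_rebuild` is a hypothetical helper: in a script you have to spell out every "is this target older than its inputs?" check yourself, whereas a Makefile rule such as `app.o: app.c app.h` states the same edge declaratively and lets Make do the comparison:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the target is missing or when
# any listed dependency is newer than it. This is one graph edge, by hand.
needs_rebuild() {
  local target="$1"
  shift
  if [ ! -e "$target" ]; then
    return 0    # no target yet: it must be built
  fi
  local dep
  for dep in "$@"; do
    if [ "$dep" -nt "$target" ]; then
      return 0  # a dependency changed after the target was built
    fi
  done
  return 1      # target is up to date
}
```

A script would then wrap each step, for example `if needs_rebuild app.o app.c app.h; then cc -c app.c -o app.o; fi`, and the order of those steps in the file is the linearization.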


Indeed, crazy dev setup is definitely like a code smell. I remember at one place, it took me 3 weeks to get to the point where I could merge my first PR. Another place I was at, they handed me a laptop that was already set up to run the code, and all I had to do was set up my own editor. Guess which place I was more productive.


I find this approach very nice: the last person to arrive on a project (in a similar role) sets up a working environment for the next one to start with. It's fresh enough that you remember it, and you might even find it interesting to clean things up a little.


That's not a half bad idea if the setup is any more complicated than "clone these repos and run the setup scripts."


Came here to say the first point: fix a bug!

Usually a team with a good lead will have a bug that’s not too hard, and will help the newcomer learn the codebase.

It’s easier to learn something when there’s a purpose than in the abstract. And a quick win is motivating.

For more senior developers, assigning a simple feature addition is another option. Eg, having a front end developer add a filter option to some search panel in an application. It may require some UI work, which has the added benefit of getting one involved with people on the UX side — helping to onboard in a team sense. It also may involve some server communication components. Either way, like Muir said — tug on one tiny leaf and you’ll find it is connected to all of nature. (Or something like that.) Similarly, investigate a bug or new feature and you’ll find it brings you through a large portion of the app’s infrastructure.


Document the steps to setup dev environment in your own words and highlight the issues that you run into

Lots of people are commenting saying they have problems with this.

The first thing I do when working on an unfamiliar project is to write a SHELL SCRIPT that records everything I did. I keep that at the root of the git repo, usually as "run.sh".

For example here is what I did when hacking on Kernighan's awk 5 years ago:

https://github.com/andychu/bwk/blob/master/run.sh

So now 5 years later I can see exactly where I downloaded the source from. I count how much source code there is in a repo to get a feel for it, and I have that exact command recorded.

And I was trying to figure out how much test coverage there is, so I ran a bunch of gcov stuff, which involves Python.

The shell script may have some problems now, but the point is that I can tell within 10 seconds what I did 5 years ago. And I can fix it in a few minutes.

It costs so little to write down these commands that it's worth it. If you try to make it really rigorous then you're not going to do it.

In summary, I suggest becoming SHELL LITERATE and checking in shell script with comments. The point is that shell is ALREADY what you're typing, so you can save it exactly like it is in a file, and run it later. (You can also use a Makefile, but then you lose that property, and that matters. Make has all the gotchas of shell plus some more.)
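
For anyone who hasn't seen the pattern, here is a minimal sketch of such a `run.sh`. It is not the linked script; the URL, function names, and file patterns are placeholders chosen for illustration:

```shell
#!/usr/bin/env bash
# run.sh: a record of what was typed, kept as runnable functions.
# Usage: ./run.sh <function name> [args...]
set -o errexit
set -o pipefail

# Placeholder download location, not a real project.
SRC_URL='https://example.com/project.tar.gz'

download() {
  # Record exactly where the source came from.
  curl --location --output project.tar.gz "$SRC_URL"
}

count_lines() {
  # Rough size of the codebase: total lines across C sources and headers.
  local dir="${1:-.}"
  find "$dir" -type f \( -name '*.c' -o -name '*.h' \) -print0 \
    | xargs -0 cat \
    | wc -l
}

# Dispatch: treat the first argument as the function to run.
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

Running `./run.sh count_lines src` executes that one function, and the file doubles as a dated log once it is checked in.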

https://news.ycombinator.com/item?id=25400278

(I have a couple upcoming blog posts that mention this. Another view on this, regarding releases: http://www.oilshell.org/blog/2020/02/good-parts-sketch.html#...)

----

Another meme I use is "never remember a port number".

I have had the experience of pair programming with people and they are trying to remember port numbers. Sometimes the server doesn't print it out to the console.

Sometimes they go digging through their notes, or they go digging through Python source code to find the port number.

I always know the port number because I put it in a shell script at the root of the repo.

If you automate stuff like this you can get to the meat of the problem a lot faster.
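
A sketch of the port trick, assuming a server that takes its port as an argument. The port 8080, the function names, and the use of `python3 -m http.server` as a stand-in server are all illustrative assumptions:

```shell
#!/usr/bin/env bash
# The port lives in exactly one place; override it with PORT=... if needed.
# 8080 is an arbitrary example value.
PORT="${PORT:-8080}"

server_url() {
  echo "http://localhost:${PORT}/"
}

serve() {
  # Stand-in for the project's real server command.
  echo "Serving on $(server_url)"
  python3 -m http.server "$PORT"
}

smoke_test() {
  # Hit the running server without having to recall the port.
  curl --silent --fail "$(server_url)" > /dev/null
}

# Dispatch: treat the first argument as the function to run.
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

With this at the root of the repo, `./run.sh serve` and `./run.sh smoke_test` replace digging through notes or source for the number.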


4. Teach the next hire after you how to do 1-3.


I would recommend that when someone is new in a large project with an older code base, to pay attention and learn from others.

A mistake I often see is too many “suggestions for improvement” too early. This behavior, when excessive as it often is, is perceived to be invalidating tough choices that were made before your time and chances are that the people who made the choices had good reasons for it and are more experienced than you are. Learn from your seniors - it may be hard when you’ve just read about this new “paradigm X which solves all problems” and you believe you are smarter than others - but please try.


A good senior would explain the "good reasons". A bad senior would be annoyed that the choices are questioned. As a leader I tell my "apprentices" to always ask me what the reason is for doing something. Being in a leadership position is not just about your pupils' progress, it's just as much about your own learning. You learn a lot when you have to explain stuff. It can be very humbling when you try to explain something in front of your team, get questioned, and then discover that you were wrong.


Often the reasons are not clear because they are not documented; often only the dev who did it really knows.

Also, it's probably worth taking note of 'fresh eyes' because they have the advantage of hindsight, which is often good. That said, there are usually complex elements of incumbency, so I do think it should be some time before people lean in too much.


This is a pretty one-sided take. I think it’s quite possible to lose perspective when working on a large codebase. It’s also possible to have small bits of technical debt accumulate without anyone feeling the total weight of it all. Old code can use whatever fad was most popular when it was written or it can carry the cruft of old dependencies or api changes. Just because something is old it doesn’t mean that it’s good


Of course. But as a newcomer, you cannot know which one it is, and as the OP said this behaviour has a lot of chances to be perceived as invalidating. I've been on both sides of this situation, and in both cases I felt that it was almost disrespectful, it's sending the message that you don't think the team is competent enough to have considered them and carefully picked trade-offs.

When I was on the receiving side (there was a newcomer to the team doing a lot of "suggestions for improvement"), I made a lot of effort to explain the context that led to a different decision, but it was very draining and rarely the right time for it (it distracts from the conversation), and it was really hurting collaboration. Eventually we had a one-to-one in which I explained exactly that, and we found that it would be much better to raise these as "can you tell me why this one is like this" (rather than an "improvement suggestion" that can be invalidating), and to do it in a one-to-one so that it's not distracting. Collaboration was much better after that.

For sure, it's great to have a new pair of eyes and you certainly don't want to tell somebody to keep their challenges and suggestions for improvement to themselves. But it doesn't mean that there's no good way or bad way to do it.


I typically advise new hires to keep a “dirt doc” of things they think could be improved, and have them come back to it after they are more up to speed (say a month or two in).

The insights that you get from coming at the problem with fresh eyes are invaluable and you really don’t want to waste those. But many things on that list will evaporate once you have a bit more understanding of the project.

(I still like to make “check in a fix / improvement to the dev environment documentation” the first task for a new starter though, as there’s usually a typo or update required somewhere).


Yeah, I find this a huge problem especially with junior devs (by which I mean those within the first five or so years of their career. When you have people calling themselves senior devs after six months the job title has become meaningless). Until you've been around long enough to understand how large code bases (in general, not the one you're working on!) evolve over time and why common trade offs are made you're not going to be able to grok the weaknesses of an existing code base quickly. Start off by being humble and asking why certain things have been implemented in the way they have, there is usually a reason. Sometimes it's even a good one!


All of this is true. To me this article mostly reads like 'water is wet'. What one often sees, though, is that younger developers often have the idea that they need to read through portions of the code as a step of getting into it. I am not really sure that it is a helpful step for all but the smallest code bases. It is often more helpful to start with a user story and then try to find out what portions of the code apply to that.


it is true but as we all know checklists can help to make sure stuff isn't missed, even for very experienced people, and it is an unmitigatedly good thing for newer developers.

i gave a go at making a cheatsheet of the steps: https://twitter.com/Coding_Career/status/1350445944395821056...

think it's a good guide to follow even if I instinctively do most of it already.


What helped me in a rather huge Java/Spring codebase with little documentation and little knowledge was running the code locally and using async-profiler to generate flamegraphs for requests to our REST endpoints, then setting breakpoints in the IDE and spending a few hours just following the requests through the code. It's not a good idea to rely on this for a complete understanding, but I remember attempting to read the code class by class before and being unable to get a mental model of it. After doing the flamegraph/stepping-in-the-debugger dance for a few days I started to feel right at home, and it also helped me quite a lot to pinpoint further issues.


I think I found the pony: do lots of little experiments to test your understanding of the code. The value of software is not in the bytes of the library/executable or even the source code. It is in having people working for your company who have a mental model of it in their head.


Interesting reading the comments how everyone has their own best way of joining a new team.

For me, the best way to join a codebase without a doubt is to actually just use the product first. What good is looking at the code if you have no idea what the product is even supposed to do?

This doesn't have to be in-depth knowledge, but just go through the setup of your product, do a few happy path use cases, feel what it's like to actually use the thing you're about to develop.


Go and talk with the people who wrote it. Often times there are just a handful of people that wrote a lot of the core functionality, and they often aren't the most social. Talk to them when you are in the planning phase of implementing something.

Lots of great advice here that I agree with. IMHO though I see a lot of engineers miss out on the team and social aspects of coding. Just as important as your tech stack is your team. Have an idea for improving something? That's great, but remember to listen first and learn to love what is great about the way it is. For you it could be X% better; for the original authors it is a miracle that they made it work at all and that it is important enough to need a bigger team.


That, and try to understand all the things that are not documented. Sure, some projects may have written extensive documentation covering all the options they considered and why they did what they did, but most of the decisions usually are not documented. Try to get into the mindset of those people and get into the conversation.


> The rule of thumb I use is to understand something just enough to express what it does without necessarily knowing exactly how it does that. This process is called "chunking," and it relies on the fact that once you have a basic understanding of a unit of code, "you don't need to remember all the little underlying details" (Oakley).

Isn't this how a lot of non-math majors learn math at uni? You learn how to use it, but you don't learn the proofs behind it? When I dove into a first codebase, I took the above said approach because that's how I learned (most) math.

A younger me would've found the Tools section valuable.


Oakley emphasises that understanding is an important part of chunking. Chunking is not the same as rote learning without understanding. It's more that when you are familiar enough with a concept you don’t have to think of all the details of it, but can treat it as a unit.


> You learn how to use it, but you don't learn the proofs behind it?

Which makes sense, at least firstly, considering that understanding a proof is a lot easier when its conclusion is already familiar.


I wonder why no one has mentioned tests. For me, the tests are more valuable for understanding a system or service than, for example, documentation. But the best scenario is when you combine tests and documentation. I have joined a new team recently. They have a lot of e2e BDD tests implemented using the Cucumber framework. It helps me to get a complete view of the system in the shortest possible time.


Completely agree. Tests/specs are the best way to learn a system. The article barely even mentions tests, and the scenario it describes revolves around using them to verify correctness when making your first change to the codebase. Not a thing about using them to understand the behavior of the SUT.


Somewhat related; I've always been very curious why a lot of developers I've worked with seem to think testing isn't worth the effort, and I've been keeping a mental checklist of all the reasons/excuses I hear so that I can reflect on them as well against my own experiences - sort of a way to challenge myself and ensure I'm not just cargo-culting methodologies.

Recently I decided to go through `Growing Object-Oriented Software - Freeman, Pryce` (and actually finish working through it this time) with the goal of understanding what "proper TDD" is supposed to look like. Something interesting I noticed is almost all the complaints I've seen as reasons against TDD/certain testing methods all seem to be examples the authors use as how not to use TDD/testing. It seems to me that a lot of developers have just learned _of_ these techniques by name, and haven't really put a lot of time/effort into practicing their application and instead just seem to write them off on face value or by the literal interpretation of their names.

One example, a lot of people I speak with seem to think TDD is very literal "write a _unit-test_* for everything before you write the implementation OR ELSE...", but of course there's a lot more nuance than that depending on the situation. The book actually puts a huge emphasis on having your initial test(s) be end-to-end tests that slice through the system as a whole, with the idea being to create nested feedback loops of varying granularity/abstraction to allow you to iterate without fear. This was something I never heard emphasized at all when I started learning about TDD, or hell even in a paid course my employer put us through by a "TDD expert".

I should also clarify, I only consider these claims from developers that have a proven track record of working on large/complex systems since, well, those who don't probably haven't cultivated their ideal workflows/approaches yet (or just haven't been given the opportunity to showcase them, as is common in large companies).

* I emphasize unit-test because a lot of code bases I've worked in rarely have anything other than just unit-tests...


I agree. Everyone should consider the costs and benefits of testing to arrive at a system that works for them.

Early on in a system design, I prefer going overboard with integration tests.

The architecture is hard to predict early on, so I don't want to get paralyzed on figuring it out. Just get the tests passing without overthinking.

Once I find a better way to do things, I can rip apart and restructure the internals with the safety net of end to end tests.

I rarely write unit tests at this stage because the effort of writing and rewriting them adds friction to getting the software working and restructuring the code.

If every time I redraw a boundary, I need to rewrite a bunch of unit tests, I'll be less likely to improve the code quality.


I'm skeptical that you get a view of the system through BDD tests better than with somebody walking you through the product. The person can explain the domain, the context, the subtleties, skip the not-actually-interesting-parts, draw things for visualisation, etc while adapting the explanations to the fact that they're talking to a newcomer: the tests do none of that (but maybe I've never done BDD properly)


It's >50% communication.

- Setup docs have to be clean and well maintained, hopefully scripted.

- Architectural overview has to be clear.

- Other devs must make time for new devs, and that has to be communicated.


i really liked this post and made it into a cheatsheet guide: https://twitter.com/Coding_Career/status/1350445944395821056...

thanks to OP for sharing it, it's not the kind of thing that gets written down enough but funnily enough it is really important to join a team well and nobody teaches how to do it.


This is pretty good advice and I'm glad I recently did some of the steps at a new company.

On the setup step I found it very helpful that my team has used scripts to rule them all, so setup was a breeze! I didn't know about that before.

Short explanation: https://github.blog/2015-06-30-scripts-to-rule-them-all


Things missing from the list:

- get a demo of the system. Not from a developer but from a user/sales.

- Understand the team objectives and goals on the high level.

- Ask your team lead what small project you can do to get your feet wet in the system without getting overwhelmed.

- get that first thing all the way out to prod so you learn what that's really like and what it takes.


For someone who is new in the team I would recommend :

1. Try to see what existing functionalities are there and how exactly the code is being executed line by line by putting debug points on the codebase.

2. Try to change the local codebase and see how the changes are affecting the application.

3. Whenever stuck, try your best to track down the errors shown on the console and search for solutions to them. If you are still having difficulty, ask seniors for help.

4. Think of the simplest approach you can use to develop a piece or make a change without affecting much of the codebase.

5. Look at other projects that use the same technology but have smaller codebases with the same kind of implementation, and try to learn the best practices they follow. Also visit the documentation regularly, because sometimes all you need is already explained there in the simplest way.


If you really want to understand something, you have to change it. Or at least try hard.


This seems to be the best method by far. Though not terribly scalable. I think next place I go I will leverage this to greater effect early on.


The text on this blog needs more line height. I find myself tracing lines in order to read.


I’ve always said the easiest way to learn a code base is to write tests for it.

There’s always a flaky test someone just doesn’t want to fix, or a feature that doesn’t have any tests, or a couple components that don’t have an integration test, etc...


Look at some low hanging fruit in the bug tracking system.

See if there is a knowledge base - if there isn’t perhaps create one as you learn and ask others if you have the right idea. Could be useful for the next newbie in the door.



