
Ask HN: Understanding a large python codebase - bonobo3000
I have recently started working at a company with a massive Python codebase. I'm finding it really hard to understand what is going on because of the lack of types. I don't know what the arguments to functions are supposed to be or what they return, and because it's a fairly sophisticated, large microservice architecture, it is even harder to trace things across services.

My experience before this was working on a much smaller Java codebase - it was written well and really easy to understand. The fact that we didn't really have much tooling was a huge help to me too - so when I deployed a service, for example, I could see the shell commands, and it was basically a "mvn package; scp ...; start ...". When I wanted to know where the logs went, I looked at a log4j config in the project. When I wanted to know what was running, I would ssh in and "ps aux | grep x" - there were maybe 10 servers tops for a project.

This is a much bigger company with hundreds of services, lots of machines, complex deploy procedures and in general just much more abstraction in both the codebase and ops. My normal approach is to just dig deeper and peel back the abstractions until I see what's happening. I fear that's impossible here. Instead, we ask someone who built it or look at an outdated wiki, which is great but just so much more painful than actually *knowing* how everything fits together.

I am kind of lost as to where to even start understanding this codebase. The 3 main challenges are: 1. no type information, 2. massive codebase, 3. ops complexity. What are your tips for getting to grips with this whole system? Any advice is welcome.
======
codeonfire
Well, I would get PyCharm for this particular issue. In these cases you are
usually going to have to search all files for variable names or classes. Learn
all the keystrokes for finding references and viewing class and call trees. If
you are going to be owning the code, add some type hinting comments for the
classes and functions that you deal with frequently.
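
As a rough sketch of what adding type hints to frequently used code might look
like, assuming the codebase runs on Python 3 (on older code the same
information could go in comments or docstrings instead). `Order` and
`find_order` are hypothetical names made up for illustration, not anything
from a real project:

```python
from typing import Dict, Optional


class Order:
    """Hypothetical class standing in for something you touch often."""

    def __init__(self, order_id: int, customer_email: str) -> None:
        self.order_id = order_id
        self.customer_email = customer_email


def find_order(orders: Dict[int, Order], order_id: int) -> Optional[Order]:
    """Return the matching order, or None if the id is unknown."""
    return orders.get(order_id)
```

With annotations like these in place, an IDE such as PyCharm can usually
autocomplete attributes on the result and flag calls that pass the wrong kind
of argument.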

~~~
mrfusion
Agreed, being able to click function calls and see their definition is huge.
Also, find the back button so you can get back to where you were.

Other than that it just takes time and practice to learn a large code base.
It's basically a new kind of reading you need to train yourself to do.

------
enkiv2
Personally, I'd suggest finding a starting point and tracing through a couple
common operations. That will give you a better idea of why the code that's
there exists and why it works the way it does.

Usually, this habit works better on languages like Python than languages like
Java that encourage repetition and the proliferation of abstraction layers.
You'll do less code-reading in a large Python codebase, because you won't be
dealing with large sets of wrapper classes that exist basically to circumvent
inheritance restrictions. But, I typically do this for any large codebase I'm
expected to understand, even if it's in Java.

Within a particular use case, the lack of explicit types shouldn't matter too
much, if you are reading in execution order. Types are useful if you're
starting from some random function and trying to work backwards, but that's
hard in large projects regardless of the language. While Python code may well
take heavy advantage of duck typing, reading code in execution order should
make the set of possible types to be passed in from a particular point clear.
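
If reading alone doesn't make the types obvious, one possible aid while
tracing an operation is a small decorator that prints the runtime types
flowing through a function. This is only a sketch: `log_types` and
`total_price` are hypothetical names invented here, and you would attach the
decorator temporarily to whatever function you are studying:

```python
import functools


def log_types(func):
    """Print the runtime types of a call's arguments and return value."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        arg_types = [type(a).__name__ for a in args]
        kwarg_types = {k: type(v).__name__ for k, v in kwargs.items()}
        result = func(*args, **kwargs)
        print(
            f"{func.__name__}(args={arg_types}, kwargs={kwarg_types})"
            f" -> {type(result).__name__}"
        )
        return result

    return wrapper


@log_types
def total_price(items, tax_rate=0.0):
    # Hypothetical function under study; stands in for real code.
    return sum(items) * (1 + tax_rate)


total_price([1, 2], tax_rate=0.5)
```

Running a common operation once with a few of these in place gives a quick
picture of which types actually show up at each step.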

Large projects, no matter the language, take a while to fully internalize.
Your difficulty probably has nothing to do with Java vs Python and everything
to do with small codebases versus large codebases. (I say this as someone who
works with a very large Java codebase and a very large mixed shell/Perl
codebase at work and works with several large Python codebases on the side.)

------
twunde
1) What you really want is a mental model of what the application is doing as
a whole and what the subsection you're working on now is doing. Look for
existing diagrams if possible. If there aren't any, start making them as you go
along and then have other devs verify them. Then try to own a subsection of the
code and really understand what it's doing. Once you feel comfortable with
that, move on to another, preferably related, piece.

2) While you normally don't need or want an IDE for dynamic languages, it
sounds like this codebase has reached the point where you should be using one
and in particular be using the jump to definition shortcut. If you make a type
error, this should help you catch it quickly. PyCharm is a good one but there
are alternatives.

3) Operational complexity should be owned by someone already. If not, this may
be the place where you can make the biggest impact. See if there is
centralized logging set up. Make sure the setup/installation and deployment
procedures are up to date. These can typically be automated pretty easily.

4) An alternative approach is to look at the data first. Once you understand
the core data models, you typically understand the application.

------
quintes
Having worked on a large Python codebase as well, I can tell you that
sometimes it's hard. Look at the types, determine what they're instances of,
and try to build small test runs around existing code before you modify it.
Python Tools for Visual Studio is sweet and may give some help with getting
around, as will PyCharm. If log locations or other configuration details are
hard to find, get them into configuration files. I guess you can draw the
high level of how it fits together, but start small: pick a class or method
and follow it through. Oh, and enjoy it. Python was lots of fun for me.
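
A small test run around existing code could look something like the sketch
below: before modifying a function, write assertions that pin down what it
does today. `normalize_username` is a hypothetical stand-in for real code you
inherited, and the expected values are whatever the current code happens to
return, not a statement of what it should do:

```python
def normalize_username(raw):
    # Hypothetical stand-in for an existing function you did not write.
    return raw.strip().lower().replace(" ", "_")


def test_normalize_username_current_behaviour():
    # Pin down what the code does today, before changing anything.
    assert normalize_username("  Alice Smith ") == "alice_smith"
    assert normalize_username("BOB") == "bob"
    assert normalize_username("") == ""


test_normalize_username_current_behaviour()
```

If a later change breaks one of these assertions, you know you altered
behaviour that something else may depend on.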

------
mrfusion
The debugger is your friend. Put a breakpoint and step through the code and
inspect variables in places where you're really confused.

------
aprdm
Use print(type(obj)) and print(dir(obj)) to debug and see what the objects are
and what they can do.

Try to think more about the functionality than about the types of the objects
the methods take as input/output.
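
A minimal sketch of the type() / dir() approach, wrapped in a helper so the
output is less noisy. `describe` and `Payment` are hypothetical names made up
for this example; in practice you would call it on whatever mystery object
you have in hand:

```python
def describe(obj):
    """Return the type name and public attribute names of an object."""
    public = [name for name in dir(obj) if not name.startswith("_")]
    return type(obj).__name__, public


class Payment:
    # Hypothetical class standing in for an object from the codebase.
    def __init__(self, amount):
        self.amount = amount

    def refund(self):
        return -self.amount


name, attrs = describe(Payment(10))
print(name, attrs)  # prints: Payment ['amount', 'refund']
```

Filtering out names that start with an underscore hides the built-in dunder
methods, which is usually what you want when asking what an object can do.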

