
Ask HN: Using Python for large, distributed system - tostitos1979
I've been a Python fanboy for about a decade now. I've written many small Python programs by myself. They were usually one-offs and did a particular task well. Last year, I convinced my team to use Python for a somewhat large, distributed system. Development was a dream, especially once we started using the PyCharm IDE. That said, as the system grew in complexity and features were added, many parts of the system began to feel brittle. Make a few changes and splat ... there's a runtime traceback. We definitely made one big mistake: we didn't do test-driven development. I am ashamed to say we have only a tiny number of regression tests, which I think is the root of the problem. After writing over 10K lines of code for this system, I've learned that regression testing is absolutely essential for a dynamically typed language. Not news, but a hard-learned lesson for me. I'm curious whether people have other tips to share for dealing with a large Python code base over time.

Thanks!
======
inerte
Tests, tests, tests, and the larger the scope the better. Some people call it
functional, some call it acceptance, some call it end-to-end. Test what your
end users are actually going to do, whether that's the final API call or the
browser via Selenium.

It actually doesn't matter that your language is dynamically typed. Unit and
functional tests are a big deal in the Java world too, which gave birth to the
JUnit framework family, Selenium, and many others. It's a matter of never
letting something that has worked for a long time break with a new commit.
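
To make that concrete, a minimal regression test with pytest might look like this. The `slugify` function here is a made-up stand-in for real application code; the point is pinning down behavior so any commit that changes it fails loudly:

```python
# A minimal pytest-style regression test. slugify() is a hypothetical
# stand-in for real application code in your own code base.

def slugify(title):
    """Toy implementation standing in for real application code."""
    return "-".join(title.lower().split())

def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

def test_slugify_is_stable_across_refactors():
    # Lock in current behavior so a refactor that changes it fails loudly.
    assert slugify("Ask HN Using Python") == "ask-hn-using-python"
```

Drop files like this under a tests/ directory and run `pytest`; it discovers `test_*` functions automatically.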

------
aq91
Pylint, pychecker, and yes: lots of unit tests.

------
lazysequence
As someone who uses Python as a primary day job language on a very large code
base, I can't in good faith recommend Python to anyone for large systems.

You also mentioned using it for a distributed system. I think this is a bad
technology choice. True, Google famously uses Python, but I don't think it's
actually good for most of the tasks distributed computing implies. It's a big
discussion, but I'll leave it at this: use Python for the simple-logic parts
if you must, but for the actual distributed parts you are better off with
something that has better-suited tools, such as Clojure, Lisp, or Erlang.
I've found Python simply creates problems not worth solving and necessitates
lots of ugly tradeoffs, like tons of task-queue usage for no good reason.

If you must stay with Python, then besides the testing comments, I would say:

1. For the love of God, please use keyword arguments. Why? If some
enterprising developer decides it's time to refactor a function, god bless
them if they think IDE tools will work perfectly. Positional args are your
enemy in a large code base; don't use them if you don't have to. Obviously,
sometimes you must for dynamic reasons.
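
One way to enforce this is keyword-only arguments (the bare `*` in a signature, PEP 3102). A sketch, with illustrative names:

```python
# Everything after the bare * must be passed by keyword, so reordering or
# inserting parameters during a refactor can't silently change what a call
# site means. fetch/timeout/retries are illustrative names, not a real API.

def fetch(url, *, timeout=30, retries=3, verify_ssl=True):
    return (url, timeout, retries, verify_ssl)

# Explicit and refactor-safe:
result = fetch("http://example.com", timeout=10, retries=5)

# This would raise TypeError, because positional use past `url` is rejected:
# fetch("http://example.com", 10, 5)
```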

2. Don't be afraid to use classes. A lot of newer Python developers seem to
go wild with module-level functions in their projects. Those can be useful
and great, especially for functional constructs, but classes can help cut
down on a lot of repetitive code.
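
The typical win: instead of threading the same state through every module-level function, hold it once on an object. A small sketch with made-up names:

```python
# Instead of passing the same rows/currency pair into every module-level
# helper, a small class holds the shared state once. Names are illustrative.

class ReportBuilder:
    def __init__(self, rows, currency="USD"):
        self.rows = rows
        self.currency = currency

    def total(self):
        return sum(r["amount"] for r in self.rows)

    def summary(self):
        return "%d %s" % (self.total(), self.currency)

rows = [{"amount": 10}, {"amount": 32}]
print(ReportBuilder(rows).summary())  # 42 USD
```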

3. Proper naming. It seems obvious, but in Python people still name things
badly. Use expressive names, and that goes for modules too.

4. Build for generators and list comprehensions. If you write code that is
simple, atomic, and reusable, it will blend really well into both generators
and list comprehensions, which cuts down on the code you need by a lot. A
corollary: don't implement all your logic inside a comprehension just because
it fits on one line. If you do something more than once, obviously make it a
proper atomic unit of code instead of repeating yourself. A second corollary:
you will find in Python that if you focus on these two things, you can solve
a large number of simple problems, including doing things elegantly with lots
of dynamic behavior if that's what you need from the language.
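
What I mean by atomic pieces composing into generators, roughly (all names illustrative):

```python
# Small, atomic helpers compose cleanly inside a generator expression, and
# the generator keeps memory flat on large inputs. Names are illustrative.

def is_valid(record):
    return record.get("id") is not None

def normalize(record):
    return {"id": record["id"], "name": record.get("name", "").strip()}

def clean(records):
    # Lazy pipeline: nothing runs until the caller iterates over it.
    return (normalize(r) for r in records if is_valid(r))

raw = [{"id": 1, "name": " Ada "}, {"name": "no id"}, {"id": 2, "name": "Bob"}]
print(list(clean(raw)))  # the record without an id is dropped
```

Because `is_valid` and `normalize` are separate units, they're individually testable and reusable in other comprehensions, instead of being buried in one clever one-liner.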

5. Use virtualenv. People will hate working with you if it's hard to
replicate your environment or it screws with their machine. In large code
bases you will inevitably end up with different versions of dependencies in
different branches and such.
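
The basic workflow, sketched for modern Pythons where the stdlib `venv` module does the same job as the third-party virtualenv package:

```shell
# Create an isolated environment per checkout (Python 3.3+ ships venv;
# on older Pythons, `virtualenv .venv` works the same way).
python3 -m venv .venv
. .venv/bin/activate

# Record exact dependency versions so teammates can replicate the env.
pip freeze > requirements.txt

# On another machine or branch, recreate it from the pinned list:
pip install -r requirements.txt
```

Checking requirements.txt into the repo is what makes per-branch dependency drift manageable.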

6. Don't abuse __init__.py. It can cause some really weird behaviors if you
have a lot of code importing the same packages again and again. I've seen
bizarre race-like conditions occur because of it. Honestly, I think there's
not much reason to ever put code in it, except maybe to stick in some meta
information about the package if you need/want to query it later.

7. Central settings management. This applies to any language, but have one
place to handle all the settings from the ridiculous number of packages you
use.

8. Namespace/group your settings values and don't rely on the module to do
this for you. I.e., don't create stuff like WIDTH = 50. Instead, group it
with other settings and use a variable to namespace it properly.
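
For example, in a settings module (names and values here are just illustrative):

```python
# Instead of flat module globals like WIDTH = 50, group related settings
# under one namespace so call sites say what the value belongs to.

DISPLAY = {
    "WIDTH": 50,
    "HEIGHT": 20,
}

QUEUE = {
    "MAX_RETRIES": 3,
    "TIMEOUT_SECONDS": 30,
}

print(DISPLAY["WIDTH"])  # self-describing at the call site, unlike bare WIDTH
```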

9. Don't import * from every package. I think the Google code guidelines at
one point disagreed with this, but it really is obnoxious in a large code
base.

10. If you find yourself using 10 parameters or something, you probably need
an object. Just because you can do kwargs.get and kwargs.pop doesn't mean you
shouldn't create proper objects. Callers in a large code base need to
understand what they are passing in without constantly reading all your
source.
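
A cheap way to get that object without much ceremony is a namedtuple (every name here is illustrative):

```python
from collections import namedtuple

# One named bundle instead of ten positional/kwargs parameters; callers see
# a typed-ish structure with real field names instead of a kwargs grab bag.
RenderOptions = namedtuple(
    "RenderOptions",
    ["width", "height", "dpi", "color", "margin"],
)

def render(doc, options):
    return "%s@%dx%d" % (doc, options.width, options.height)

opts = RenderOptions(width=800, height=600, dpi=96, color=True, margin=10)
print(render("report", opts))  # report@800x600
```

Unlike a kwargs dict, a typo in a field name fails immediately at construction time instead of silently surviving a `kwargs.get`.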

I have many more tips, but I think this is enough of a starting point.

