I wrote a free book about TDD and clean architecture in Python
192 points by thedigicat on Dec 28, 2018 | 62 comments
Hey HN,

I just published on Leanpub a free book, "Clean Architectures in Python". It's a humble attempt to organise and expand some posts I published on my blog over the last few years.

You can find it here: https://leanpub.com/clean-architectures-in-python

The main content is divided into two parts; here is a brief overview of the table of contents:

* Part 1 - Tools - Chapter 1 - Introduction to TDD - Chapter 2 - On unit testing - Chapter 3 - Mocks

* Part 2 - The clean architecture - Chapter 1 - Components of a clean architecture - Chapter 2 - A basic example - Chapter 3 - Error management - Chapter 4 - Database repositories

Some highlights:

- The book is written with beginners in mind

- It contains 3 full projects: two small ones to introduce TDD and mocks, and a bigger one to describe the clean architecture approach

- Each project is explained step-by-step, and each step is linked to a tag in a companion repository on GitHub

The book is free, but if you want to contribute I will definitely appreciate the help. My target is to encourage the discussion about software architectures, both in the Python community and outside it.

I hope you will enjoy the book! Please spread the news on your favourite social network.

I really don't get the mantra of "implement only the shortest solution that makes the test pass"

I would add a very important second part here: "that you think is correct".

An introduction to TDD that starts by implementing a function called `add` with `pass` or `return 9` is somehow deeply irritating to me.

The example would be so much closer to real world usage if you started with `a + b` and then realized the tests also require you to support 3 arguments.

Also: you introduced the test for three arguments after you wrote the first code for `add`.

I agree.

In Philosophy of Software Design, J. Ousterhout says that this TDD practice encourages "tactical programming", i.e. trying to solve the current open ticket with the minimum amount of work, instead of solving the problem correctly. Personally I'd go further. To me, this approach encourages replacing your brain with a dumb gradient descent process. Instead of thinking about the problem and solution, you just keep writing an ever larger monster whose sole purpose is to interpolate between test cases.

Tests themselves are useful. Following a process that makes meaning implicit instead of explicit? Not so much.

There is a reason for the "return 9" trick. The point is to force you to write a test that breaks the current implementation. In this style of TDD you need to always be working from "red" (in the red/green/refactor cycle).

If your test is "add(4, 5)" and you implement add with "a + b", there is no failing test you can add. However, there are many classes of potential failures. For example, what if "a" or "b" is zero? What if one of them is negative? Even potentially, what about real numbers, etc. If you write your first failing test and then implement it with the entire solution, then you often stop with a single test that doesn't cover the edge cases you care about.

However, the real problem is explaining that style of TDD with such a simple function. I would never use a test-first style for "add" unless there were some reason to do so (for example, there was some reason to assume that 0 wouldn't work). The "return the answer" trick is really useful when you have a function that has a very complex set of return values. You pick some arbitrary value to test against so that when you are working on the solution, you can see if you've broken the "happy path" test.

The "return the answer" bit is not even necessary, but it's handy to ensure that your test harness is working properly. I've often seen people do a "green-green-refactor" style of testing and write hundreds of lines of code only to realise that their tests weren't even running and the code doesn't work at all. It's 30 seconds of, "Can I write a test that makes the tests go red?", "Can I make the tests go green?" OK, let's begin.
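In pytest-style code, that 30-second sanity check might look like this (a sketch using the deliberately wrong constant from the discussion above):

```python
# Step 1 (red): write the test first; with no add() defined, it fails.
# Step 2 (green): the dumbest implementation that passes the one test.
# It also proves the harness can actually go red and then green.

def add(a, b):
    return 9  # deliberately non-general: just enough for the one test

def test_add_two_numbers():
    assert add(4, 5) == 9

test_add_two_numbers()  # green; now write a test that forces a change
```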

> "you think is correct".

> `add` with `pass` or `return 9` is somehow deeply irritating

Similar to others, I understand this feeling. And I would urge you to get over it.

First, technically returning 9 for a test of add(4,5) is correct. Your tests define what the function does, not the name. If you are thinking "a correct add adds all numbers", then that isn't wrong, but it is bringing your tacit world knowledge into the code without that knowledge being expressed in the code. And that's Not a Good Thing™.

It also seems to me that what you meant here was not "correct" (because it is correct), but "general". And that's one of the points of XP/TDD: as programmers we have a terrible tendency to over-generalise. Really horrible. "Write only the code to make the test pass" works against that tendency (and also has the nice gamification effect mentioned elsewhere).

The other point is when to generalise. The first answer is "not", the second is "not yet". Generalise only when you have several instances, otherwise your generalisation will, what's the technical term, suck.

Last but not least is how to generalise. TDD/XP essentially say that you generalise only as a refactoring, that is, a behavior-preserving transformation.

So when you add the add(10,2) test case, the really pedantic way of doing it would be to handle the extra case with an if-check, then once the additional test passes refactor the add method to handle all those cases with less code, by removing the case-handling and doing arithmetic.

What that means is that you separate the coding that gets the tests to pass from the more challenging bits of coding where you need to generalise, with the latter always under full test coverage.

It's a really nice way to make progress.
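A sketch of that pedantic sequence, using the toy example from the thread:

```python
# Green, the pedantic way: the new add(10, 2) case is handled with an
# explicit check, keeping both tests passing.
def add(a, b):
    if (a, b) == (10, 2):
        return 12
    return 9

def test_first():
    assert add(4, 5) == 9

def test_second():
    assert add(10, 2) == 12

test_first()
test_second()

# Refactor, with the tests as a safety net: remove the case handling
# and do arithmetic. Both tests stay green throughout.
def add(a, b):
    return a + b

test_first()
test_second()
```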

And of course: use your own judgement. If things get too tedious, it's OK to leave out steps. At some point you will probably find that you are running into problems; at that point you want to slow down and re-introduce more of the intermediate steps.

> First, technically returning 9 for a test of add(4,5) is correct. Your tests define what the function does, not the name. If you are thinking "a correct add adds all numbers", then that isn't wrong, but it is bringing your tacit world knowledge into the code without that knowledge being expressed in the code. And that's Not a Good Thing™.

How do you feel about just starting with tests that are actually more of a specification? For example, I'm starting to feel that unit tests aren't something we should be writing, but are something we should be generating. We should be writing things like QuickCheck specifications. So the first test I'd probably write is:

given x. y. add(x, y) == x + y

This is a specification that is essentially tested against a "model" (native addition, in this case).
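A library like Hypothesis automates exactly this kind of model-based testing; as a stdlib-only approximation, the property can be checked against random samples (the function names here are illustrative):

```python
import random

def add(a, b):
    return a + b

def check_against_model(trials=1000, seed=0):
    """Check the spec `given x, y: add(x, y) == x + y` on random
    inputs, using native addition as the model."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-10**9, 10**9)
        y = rng.randint(-10**9, 10**9)
        assert add(x, y) == x + y, f"counterexample: {(x, y)}"
    return trials

print(check_against_model())  # 1000
```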

> How do you feel about just starting with tests that are actually more of a specification?

Tempted and conflicted. Yes, it's tempting, because "yay, more general and comprehensive", the mathematician inside me says.

On the other hand, the beauty of these sorts of tests is often precisely their concreteness/specificity. Tests should be so simple that their correctness is obvious by inspection. (One of my problems with specifications was always that to me it was no clearer that the specification captured what I wanted than that the code did).

The other side of the equation is that although theoretically such concrete tests are seriously flawed, because they only test one set of values, in practice that is actually usually quite sufficient.

> we should be generating

Again, sounds very reasonable, but now you have to trust (a) your generator and (b) your specification of those tests. I'd rather have a concrete test.

Now I do think there is some use for this sort of thing, for example in ferreting out edge cases that you might otherwise miss. I wouldn't want to rely on it.

> otherwise your generalisation will, what's the technical term, suck.

Linus Torvalds didn't write the Linux kernel or Git in this manner, and they turned out pretty well.

I get the idea behind TDD, and have used it successfully in areas where indeed the business case was very unclear.

But drinking the Uncle Bob Koolaid and pretending this is now the only way to do things lacks creativity and wisdom, IMO.

> Linus Torvalds

I am not Linus. Maybe you are.

The reason I am such a fan of this approach is that I wrote software without it for a long time (Objective-C pre-processor/runtime, Postscript interpreter + various RIPs + other pre-press stuff, content mgt. system when that was still a thing...), and the difference is...hmm..let's say "significant".

Without this sort of approach, you get that rush of making super progress quickly, of holding the entire state in your head and being just the coolest cat on earth. It just doesn't last. Things start deteriorating, you fix a bug only to have another one pop up, after a while of this you get a sense of deja-vu and notice you've been going around in circles. Pretty soon you are stuck.

When I was put in charge of the CMS, management wanted to know when we would ship the next version. I couldn't honestly tell.

With a TDD approach, you get less of that rush, less of the "I am the greatest". What you do get is steady progress without backtracking.

> drink the Uncle Bob Koolaid

You shouldn't drink anyone's Koolaid, not mine and certainly not Uncle Bob's.

For your reference, I was doing this long before Uncle Bob. At least I remember sparring with him on comp.object, when he was still claiming that "OO is just function tables" and not a peep about TDD from him.

Generalizing and naming is the purpose of functions. An add that only handles 4 and 5 need not exist; add(4,5) gets replaced with 9 and that's that. If I'm looking at a complex expression, it's easier to convince myself that it's right if it contains 9 than if it contains add(4,5). Why would I write an add(4,5) that fails and has to be repaired, if I can write a 9 that doesn't fail? It could only be because I'm anticipating a future generalization.

Oh, and I can write a 9 that doesn't fail? That goes against TDD; shouldn't I first break the lexical analyzer of my language so that the 9 constant goes red? Then repair things, so the 9 test goes green?

How can you call it TDD if you're relying on non-TDD pieces that work without being covered by tests that previously failed?

> First, technically returning 9 for a test of add(4,5) is correct. Your tests define what the function does, not the name.

You are technically correct that testing "add(4,5) == 9" is valid. However, everyone already knows that an "add(a,b)" function which always returns 9 is broken.

> So when you add the add(10,2) test case, the really pedantic way of doing it would be to handle the extra case with an if-check

Saying an if-check is the correct way to refactor does a disservice to TDD practitioners everywhere.

Implementing a trivial add function via TDD is just bad. It would have been far better to choose something slightly more complicated such as sorting or fizzbuzz.

At what point do you generalize to doing arithmetic, and what makes that point better than any point beforehand? What is it that you learned from (4,5) and (10,2) that you didn't know before?

There starts to be duplication in the implementation that you can eliminate by refactoring.

Refactoring is not (necessarily) generalization. I am interested in how you decide you have seen enough examples to generalize (to actually doing arithmetic in this case) and what makes you trust your judgment at that point while rejecting your judgment initially.

Of course, arithmetic is a somewhat silly example, because we generally tend to have arithmetic built in, and we all know how arithmetic works.

"Refactor to remove duplication" is absolutely a standard step of TDD. How you do it is up to you. This is not an automated process, it requires (design) skill and good judgement just like any other programming.

I recall once seeing Beck look at two loops that were quite dissimilar: they had different for structures and different contents, which meant pretty much nothing was duplicated except the word "for" and the fact that they were looping (differently) over the same collection. He changed the second loop to loop the same way the first one did. This required changing the body of the loop to skip over the items toward the end of the collection, since the previous version only did the front of the collection.

Now the for statements were the same. "Well, gotta eliminate that duplication," he said, and moved the second body into the first loop and deleted the second loop entirely. Now he had two kinds of similar processing going on in the one loop. He found some kind of duplication in there, extracted a method, did a couple of other things, and voila! The code was much better.
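A toy reconstruction of that move (the collection and the two processings are invented for illustration):

```python
items = [3, 1, 4, 1, 5, 9]

# Before: two quite dissimilar loops over the same collection.
def sum_before(items):
    half = len(items) // 2
    total = 0
    for x in items[:half]:                 # front of the collection
        total += x
    for i in range(half, len(items)):      # back, with a different shape
        total += items[i] * 2
    return total

# Step 1: reshape the second loop to match the first, skipping the
# front items in its body, so the two `for` statements become duplicates.
# Step 2: merge the bodies into a single loop and delete the second one.
def sum_after(items):
    half = len(items) // 2
    total = 0
    for i, x in enumerate(items):
        total += x if i < half else x * 2
    return total

assert sum_before(items) == sum_after(items) == 38
```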


What this approach does is move most of this type of tricky design work to a phase of development when you are not also trying to figure out what the program is trying to do or how to do it. Instead, while you are doing this design work, the behaviour of the program remains fixed and you have tests to make sure that's actually the case.

Having the code remain functionally identical (in terms of the tests) is a lovely safety net to have while you are doing fancy coding acrobatics. If you find you're in a dead-end, revert, check that the tests are still green and try again, with the knowledge you've just gained.

I guess this is something that older developers are more attuned to, because they have been humbled by code enough to appreciate the limitations of their cognitive faculties.

> rejecting your judgment

I trust my judgement, always, and am wary of my judgement, always. But I also listen to the code, and I just don't "remove duplication" until there actually is duplication.




Thanks for the time you took to reply. Just to get something quickly out of the way: yes, it is a silly example, but this is the one you chose, or at least chose to defend. I will be interested to see an equivalent of "return 9" in an example that is not silly.

While you make good points about the value of refactoring and having well-tested code, they are not specific to TDD. Nor do they address my question. It seems to be a fundamental property of TDD that a developer is invited to reduce her mental capacity up to a point: "return 9", or as you suggested, "if ((a == 10) && (b == 2)) return 12". Then something unexplained happens, e.g.: "refactor the add method to handle all those cases with less code, by removing the case-handling and doing arithmetic."

What exactly happened here? Why can the developer now use that big brain of hers 100%, realize that special-case handling is not the way to go, and write "a + b"?

There is no switch between "brain on" and "brain off"; what you describe as "a fundamental property of TDD" to "reduce mental capacity" is just a misunderstanding on your part.

> Then something unexplained happens

I am not sure why you think this is "unexplained", as it is spelled out in detail in every TDD introductory text.

What there is is a switch between (a) trying to get the tests to pass (tests are red) and (b) trying to refactor the code (tests are green). In both cases, you use your full brain, in case (a) you use it to find the minimal code to pass the tests. If you think finding minimal code is a "brain off" activity, well that's a whole different discussion, but I refer you to Saint-Exupéry[1] and Blaise Pascal[2], for starters.

And in each case you use code as forcing functions for other code. The tests force you to write the production code. Writing only the least possible production code that passes the tests forces the tests to actually test what you want them to test. If you write code that does significantly more than what your test requires, you won't have test coverage for that excess capability. And so if the rest of your program exercises that excess functionality and you break it, your program will break without your tests giving you a warning.

Also, the silliness of the example is of course what makes it a good illustrative example. If we had to first become domain experts with a shared understanding of some random domain, it would make talking about TDD itself much harder. It does require the ability and willingness to use it as an illustrative example.

[1] https://www.goodreads.com/quotes/19905-perfection-is-achieve...

[2] https://quoteinvestigator.com/2012/04/28/shorter-letter/

As before, I'll ignore appeal to authorities :-).

I'll try to rephrase my doubt about TDD. It seems to be the case that TDD proponents claim that

A. Developers not following TDD tend to produce poorly testable code that may be unnecessarily general i.e. attempting to solve problems that need not be solved given the problem.

B. TDD solves the problems by setting a prescriptive template (red -> green -> refactor)

The "refactor" part of B includes syntactic manipulations (e.g. renaming variables, merging if statements, ...), which require little creativity or insight, and generalization, which usually does require creativity and insight. My question is this: if you accept A, how can you trust the developer to correctly "generalize" in B?

I am really interested in a non-trivial example that benefits from "return 9" testing. How about the following problem, which does not require one to be a domain expert: implement the function areCoPrime(a: Int, b: Int): Boolean that returns true IFF a and b are co-prime. What examples will you test for, and how and at what point would you generalize?

> appeal to authorities :-)

There are no appeals to authority here, just references to more information. You know, this "web" thing, with links and stuff. I don't think it'll ever take off...

> rephrase my doubt about TDD

Hmmm...so you no longer hold on to your previous claim of TDD being about selectively turning the brain off?

> TDD proponents claim that

> A. Developers not following TDD tend to produce poorly testable code that may be unnecessarily general

No. Pretty much all developers have a tendency to do that.

> B. TDD solves the problems by setting a prescriptive template (red -> green -> refactor)

No. First, no mechanical prescription can solve this problem. Second, if anything the "solution" is to follow YAGNI[1] and DTSTTCPW[2]. As I have explained before and you continue to ignore, where TDD helps is by splitting the task into two, or better, three phases:

1. Write a test

You are asked to produce a specific test case, not really that much of a chance of overgeneralising (though not zero chance, see parallel thread).

2. Write minimal code to make the test pass

Since you are only asked to make the test pass, and the goal is to do this as "stupidly" as possible, chances of overgeneralising are again minimised.

3. After the test passes, refactor to remove duplication

At this point, you are only trying to remove duplication, you are no longer trying to solve the original problem, so your scope is much more constrained.

Is this some sort of mechanical or mathematical guarantee that you cannot overgeneralise? Hell no! As I have consistently maintained, TDD is not a "brain off" technique, though you seem to insist on introducing that straw man, only to then object to it. However, it helps tremendously, by drastically reducing the opportunities for overgeneralising (one phase out of three), and even there drastically reducing the incentive and scope for overgeneralising, because you are focused on something very concrete.

[1] https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it

[2] http://c2.com/xp/DoTheSimplestThingThatCouldPossiblyWork.htm...

I'm with you on this. I know it's just an example to get people in the right mindset, but suggesting that an actual programmer would start there is ridiculous.

I disagree (although I think I understand that people might find it strange).

TDD is in a way a gamification of programming. You go back and forth between being rewarded with a green test suite, to challenging your implementation with another test, to "pinhole surgery" for improving the implementation. Finally you get the relief of being able to refactor your implementation without breaking the functionality.

So starting with stupid implementations delivers that first reward from the start and helps you find the rhythm of red-green-refactor.

Now of course TDD is also about making a correct implementation eventually, and usually you will get there after a handful of test cases.

I get where this is coming from but it gets boring fast. That's not how flow works.

You don't get into a flow state because of the reward / regular feedback alone. The challenge must match, or slightly exceed, your skill. Otherwise, you get bored. Boredom kills productivity and, ultimately, the point.

A competent programmer may as well just write out the addition implementation. I'd argue even that's too low a bar. In fact, I don't think it deserves a test (adding two numbers specifically).

That leads me to the second gripe. A majority of TDD examples are trivial problems. A great many are so trivial that they wouldn't even warrant the time and expense to test in the real world, which is a genuine factor. If you're spending half your time testing trivial shit that can be verified by sight, you're not spending half your time testing what is crucial and may be difficult to verify without trying out the code.

> Boredom kills productivity

You know that graph[1] with performance on the Y-axis and "arousal level" from bored -> eustress -> overwhelmed on the X-axis? I think that mental model is key to understanding why TDD tutorials start out with really dead-simple examples.

One primary cognitive demand when programming is holding many ideas in your head at once, many of which you're only really dealing with for the first time this week. In your day-to-day work, you move from subproject to subproject and have to quickly learn new parts of a codebase. As you are doing this though, you are also holding in your head some ideas which stay there: how the standard library is shaped, what the overall architecture of the project is, who on your team to ask for clarity, and how to use your testing framework. The fact that you already are familiar with these things means that they are "chunked" in the same way that the area code of a phone number is. That means it imposes a lower cognitive cost to hold them in your head.

But what if you are currently learning both the idea of TDD and the specific interface of a particular testing library? Then neither will be chunked and you're going to need to devote a lot more brainspace to remembering that recently-learned knowledge. If you are writing complex tests while you do that, you are a lot more likely to be overwhelmed and perhaps give up on TDD. TDD tutorials are written for an audience for whom how to write a basic test is not well-chunked enough to be boring.

[1] https://courses.lumenlearning.com/suny-monroecc-hed110/chapt...

In some ways at least, you can look at data science / ML work as an extreme form of TDD and closer to it in spirit than most proselytizing I've come across. In another great example, Peter Norvig does way more TDD for his Sudoku solver [1] than a "textbook" attempt at it [2] ..like many orders of magnitude more testing when counting cases. The spirit is better if we try to answer the questions of "what do you know about your program" and "how do you know it" through code.

[1] http://norvig.com/sudoku.html

[2] https://xprogramming.com/articles/oksudoku/

I completely disagree with you. In practice you probably don't need to be that strict about TDD, but if you're being introduced to it with no TDD experience at all then you should absolutely be doing this.

Dogmatic TDD is the best way to practice and learn TDD. If you don't do this then you _will_ inevitably learn bad habits. Once you learn how to TDD like this you will begin to understand when you don't need to be dogmatic, but that only comes with experience.

I just published version 1.0.2 of the book. Some readers spotted typos and bad grammar and submitted pull requests (kudos are in the changelog). You can download it from Leanpub. Happy reading!

BTW, I think it might be worth including "commands" in the use cases and repository (looking quickly at the code on GitHub). At the moment, the only use case is fetching data.

Yes you are right. I planned to add that part in a future release. Thanks a lot for the suggestion!

Unrelated: I'm a very experienced developer, and I want to learn Python, and I'm yet to find a book that's sufficiently challenging and doesn't start too slowly to hold my interest.

I guess what I'm looking for is something like: "here's a list of array operations. Now do Towers of Hanoi". Any recommendations?

What you need, in my opinion, is the Python docs and not a book. The official Python docs have a tutorial[0] and a library reference[1] section. The tutorial gives you a clear introduction to the language syntax and general coding style, and ends with a brief overview of the standard library. After that you can keep the library reference "under your pillow" and refer to it as you need.

I found this the best way to start with python. YMMV.

[0] https://docs.python.org/3/tutorial/index.html [1] https://docs.python.org/3/library/index.html

Well, you can try my "Full Speed Python" book [1].

Each chapter shows the minimal syntax needed for the chapter content, and then it has some exercises to get you going. I use it to teach Python to my Distributed Computing course students (2nd year students, who come mostly from a Java and C background).

The first chapters are mostly for beginners, so I guess you can skip them if you want. I also talk about more advanced concepts in the later chapters (such as iterators, generators, coroutines and asyncio). You can get the epubs/pdfs from [2].

[1] https://github.com/joaoventura/full-speed-python/

[2] https://github.com/joaoventura/full-speed-python/releases/

This looks absolutely perfect!

This is not really the kind of book you're looking for, I'm afraid, but still it's the best one I've read on Python in general: http://shop.oreilly.com/product/0636920032519.do

As a language reference you could use something like this: https://gto76.github.io/python-cheatsheet/ or another good book: https://doughellmann.com/blog/the-python-3-standard-library-...

I agree that Fluent Python is the best choice, even if it's not exactly what was asked for. You can skip bits to use it as more of a list of features, but it contains the information required to understand what Pythonic means, and what's in the extensive standard library.

Definitely would recommend to anyone experienced wanting to learn Python.

Since python is relatively easy, your best bet is probably to just dive in and see how far you get before you're stuck. Alternatively, join a Discord server or similar for programming and ask the various questions there, like "what are the double-underscore methods even?" or "I've seen this weird syntax, what is this? [x for x in {1, 3, 6}]"
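For what it's worth, quick sketches of answers to those two example questions:

```python
# Double-underscore ("dunder") methods are hooks that let your objects
# plug into Python's built-in syntax:
class Point:
    def __init__(self, x, y):        # runs on Point(1, 2)
        self.x, self.y = x, y

    def __repr__(self):              # runs on repr(p) and in the REPL
        return f"Point({self.x}, {self.y})"

    def __add__(self, other):        # runs on p1 + p2
        return Point(self.x + other.x, self.y + other.y)

print(Point(1, 2) + Point(3, 4))     # Point(4, 6)

# [x for x in {1, 3, 6}] is a list comprehension over a set literal:
# it iterates the set and collects the elements into a list.
print(sorted(x * x for x in {1, 3, 6}))  # [1, 9, 36]
```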

Oh, and if you find a book series like that, please let me know - I would love to move on to another language like C++ or similar, but I cannot be bothered reading about while-loops again.

Get yourself Fluent Python, it is a really good book on Python for developers from other languages.

I'm the author of a book also available on Leanpub.

I love this service as a publishing solution. You can write Markdown and get your book synced to Dropbox (while you write it / push it to VCS) and then publish directly through them to epub/PDF. The other nice thing is they can export Amazon-compatible files so you can publish to Kindle.

They also offer a very fair and transparent royalty structure. I found Amazon could be a bit greedy: when a book is priced above $9.99, they take a significantly larger cut of the royalty.

The only other tidbit I have to say is:

I also considered using sphinx-doc as a route to build the book. I haven't found a way to offer an appealing LaTeX theme/template that would work with it. If you know LaTeX well, and have time on your side, maybe sphinx-doc could be up your alley. It can build to HTML, PDF, and epub.

As far as distribution goes: My experience with Apple's book store was very cumbersome. Difficult to get support, and in one case there was a glitch that unpublished the book.

Thank you for your comment. I considered using LaTeX or other tools, but in the end Leanpub is very easy and straightforward and, at least for the time being, I'm OK with it. Thanks for sharing your experience with Amazon and Apple

Awesome. I've always been curious over examples of clean architecture in Python. I had a cursory glance. As a suggestion, I think the `Room` domain model could be improved with dataclasses (https://docs.python.org/3/library/dataclasses.html).
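For instance, a `Room` entity could be reduced to something like this (the field names are illustrative, not necessarily the book's actual model):

```python
import uuid
from dataclasses import dataclass, asdict

# Hypothetical domain entity; @dataclass generates __init__, __repr__
# and __eq__ for free, which is most of what a plain model needs.
@dataclass
class Room:
    code: str
    size: int        # square meters
    price: int
    longitude: float
    latitude: float

room = Room(code=str(uuid.uuid4()), size=200, price=10,
            longitude=-0.09998975, latitude=51.75436293)
print(asdict(room)["size"])   # 200
```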

Ah, you already know what you will find in the next releases of the book =) Thanks for the suggestion; actually I was struck by dataclasses, as they are exactly what I needed for a project I have, a sort of library for clean architectures. We'll see. Thanks!

Or even more so by attrs: https://www.attrs.org/en/stable/

Try @attr.dataclass. It uses the cleaner 3.6+ dataclass syntax, but is more flexible (supports kw-only classes).

Thank you both for your suggestions, I will consider using one of these in the next versions of the book. Python is a moving target, so I will have to update the examples.

I've recently also come across https://pydantic-docs.helpmanual.io/ which appears to be useful for enforcing simple constraints on domain entities.

Hey, thanks for doing this! I wonder, is there a way to get the book without leanpub? I’m not too keen on making an account to check out.

That's an interesting suggestion, thanks. I will check if I can distribute it through the blog

One nice thing about Leanpub is that it doesn't have to be free: you can pay as much as you want instead, and that way we can support the author.

Yes indeed this is the reason why I prefer to leave it on that platform. I haven't published the book to make money, but support is appreciated and covers the publication cost and other things like the blog. Thanks!

Just enter any email address. After that you can just download it.

I haven't looked at it yet but I wanted to say thanks anyway. Your blog has been really useful for picking up some more core software development practices in Python. Looking forward to going through this!

Thank you for reading the blog and the book! Let me know what you think about the book

Thanks, I will read. And hopefully I will review. Hope I can hold myself accountable to that.

Thank you for reading the book. Let me know if you have comments

Just downloaded and started reading the first chapter. Thank you!

You are welcome. Thanks for reading my book


Thanks for writing the book.

What software did you use to write it?

Leanpub produces the book starting from Markdown/Markua files. Markua is a dialect/fork of Markdown that implements book-related features like footnotes. So I just wrote it with an editor. You can find the source code of the book here https://github.com/pycabook/pycabook. Thank you for reading the book

Is it possible to compile the book source on my computer with some tool?

I don't know. I believe the Markua specification is still in progress and, while it is open, Leanpub is obviously the company that pushes it the most. I haven't found any Markua compiler, but for this book a Markdown compiler should cover 90% of the formatting.

s/^/Show HN: / ?

You know what? I wasn't aware AT ALL of the "Show HN:" thing. Sorry. Next time I will. I believe it's too late now, isn't it?

I was hoping a mod would notice and update the title.
