
Tips for Building High-Quality Django Apps at Scale - stanleytang
https://blog.doordash.com/tips-for-building-high-quality-django-apps-at-scale-a5a25917b2b5
======
randomsearch
I'm so glad to see the mention of "app" directories here. I've only dabbled in
Django development, but I've always thought the desire to divide things into
apps didn't really make any sense. It felt like the developers of Django had
said "well, intuitively, there _must_ be some unit of reuse at this level" and
then stuck the notion of apps in there in an attempt to provide that reuse.

This seems to me to be somewhat unpythonic, unnecessary complexity introduced
right from the start - why not make the reuse optional? The file system
layout becomes confusing as a result. More generally, I would expect that the
most minimal Django site would be a single file, that would seem most
Pythonic, but it isn't designed like that. Django doesn't "feel" like most
Python projects do.

I've always assumed this must be because I'm a total noob/idiot when it comes
to real-world development, and that projects like Django must be doing things
right and I just don't get it (although it remains entirely possible that this
is the case).

It's heartening, however, to see I'm at least not _entirely_ wrong for having
a gut unease about this, and it makes me wonder how on earth I can judge
whether it's me or a given framework/language in the general case.

Do I have reverse Dunning–Kruger? Or did I just guess right in this case? I
have no idea.

~~~
mi100hael
Part of it is that Django has been around for quite a while now. It's maybe
_the_ most successful Python framework out there, so there are some paradigms
that aren't common anymore but are hold-overs from years past. In particular,
it pre-dates the current "microservice" trend and assumes a fairly monolithic
environment.

And I think "apps" _do_ make sense in certain contexts. Consider this
situation:

\- I start a "polls" app like the Django docs example.

\- I get a lot of users and they like my functionality, so I want to engage
the community.

\- I decide to start publishing blog posts, but I'm too lazy to spin up a
whole new Wordpress install

\- I find a "blog" app and drop that into my project with the '/blog' URL
prefix.

Now I can start writing blog posts right away and anyone can go to my site
'/blog' and see it. There are no cross-app database dependencies to worry
about.

The same would be true for adding a basic "social" app that shows a
user-profile page for Django Users at '/users/<id>' or something like that.

~~~
vosper
Is it actually possible to find and drop in apps like you suggest? And have it
just work? Is a repository of various blog apps out there? What if one of them
does something nasty to your database?

It seems like it'd be easier to spin up a Wordpress instance...

~~~
arctangent
Yes, there's a whole universe of them.

Here's a good place to start looking for things to help make development
easier:

[https://djangopackages.org/](https://djangopackages.org/)

~~~
vosper
Thanks!

------
cwisecarver
I've got more than 5 years of experience with Django on a number of teams and
at a couple of companies and in my experience almost everything in this
article is completely incorrect.

The only things I would agree with are the points about project layout and
about avoiding Django's squashmigrations in favor of truncating the migrations
table, deleting the migrations, and creating a new initial migration.

Practically everything else in this article is wrong, in my opinion.

~~~
tclancy
I've got 10 years' experience on projects small and large and I have to agree.
The title talks about building at scale but the article doesn't stress that
which makes some of the advice downright weird.

> If you don't really understand the point of apps, ignore them and stick with
> a single app for your backend. You can still organize a growing codebase
> without using separate apps.

This is where the article lost me. If this is for building at scale, maybe, I
don't know. I never hit a point where designing the project in apps became a
problem. Regardless, if you don't know why Django wants to use apps, that
suggests you are new to Django and probably not building at scale, so this
feels like poor advice. Much of the article is telling readers to do things
exactly contrary to Django's philosophy; the problem with that is there are
lots of articles and StackOverflow answers out there based on Django's
philosophy. There isn't a similar body of reference based on the authors'
approach.

I don't know why explicitly naming your database tables is imperative for
running at scale. Now we're breaking from Django's convention because some day
we might want to stop using Django and we will be annoyed by its table-naming
convention? Avoiding "fat models" is another place where it feels more like
opinion than anything to do with performance or good design.

It would be good to know what database engine the authors are running into
such serious migration issues with-- MySQL?

~~~
Alex3917
> Avoiding "fat models" is another place where it feels more like opinion than
> anything to do with performance or good design

So in the Java world, the general pattern is that:

 _Views:_

    
    
      - Accept and sanitize query parameters
    
      - Call one or more service methods.
    
      - Catch errors and return an appropriate error response
    
      - Render a JSON response based on the results of the service methods if nothing goes wrong.
    

_Service methods:_

    
    
      - Perform business logic
    
      - Manage persistence
    
      - Bubble up errors
    

The nice thing about this architecture is that each piece of the codebase
tells a complete story about what it's doing. That is from looking at the view
you can see what parameters it accepts, how they are sanitized, what service
method it calls, each of the errors that can be returned, and what the 200
response looks like.

And looking at the service method we can see what business logic it performs,
and what the database queries look like.

In each case there isn't any reason to look at other methods to understand the
'story' of what's happening in your app. This makes it very easy to read the
codebase and audit it for correctness.

The problem with fat models is that they're not telling a story about what's
actually happening in the app, e.g. looking at them doesn't tell you anything
about the business logic the endpoints are performing. And what's worse, you
also can't look at the views or services and know what they're doing either.

As someone who strongly prefers Python and Django over the Java ecosystem,
I'll say hands down that in terms of how web apps are architected they got it
right and the Django people got it wrong. As far as I can tell the whole
Domain Model Architecture thing seems like a bunch of bullshit that was
invented to sell consulting. If the advocates of this approach can't even
write a coherent Wikipedia article, it should give you a clue as to what the
code ends up looking like. [1]

[1] [https://en.wikipedia.org/wiki/Domain-driven_design](https://en.wikipedia.org/wiki/Domain-driven_design)

~~~
mabcat
Is there a good place or pattern for service methods in Django? I've got some
very fat models right now, and it's a DRY improvement over having the fat in
the views, but like you say it takes a lot of effort to trace what's going on.

~~~
Alex3917
Let's say you're following the approach of breaking down your project into
separate apps, so you have an app called user_accounts. This app would be a
folder containing files like:

    
    
      views.py, services.py, models.py, test_views.py
    

So in views.py you'd have a User class, with:

    
    
      A POST method that calls services.create_user(username, email_address, password)
    
      A GET method that calls services.get_user_profile(request.user)
    
      A PUT method that calls services.update_user_profile(x, y, z)
    
      A DELETE method that calls services.inactivate_user(request.user)
    

The return value of each of these views can just be whatever
services.get_user_profile(request.user) returns, rendered into JSON.

Then each of the services performs whatever business logic it needs to,
preferably directly in the method. But if it would be more readable split into
multiple methods, then you can create some private helper methods in the
services file prefixed with an underscore. You can also have a separate folder
somewhere for utility functions meant to be reused across the app, e.g.
get_user_emails(request.user, is_active=True, is_verified=True)

Basically though each view sanitizes the data, e.g. strips XSS out of strings,
makes sure booleans are actually booleans, etc.

Then each service first does field-level validation with serializers, e.g.
ensuring that usernames meet the appropriate requirements for usernames. Next
if there is other business logic validation that needs to happen, it happens,
e.g. making sure that only users with verified email addresses can perform
certain actions.

After that you perform the actual business logic, e.g. transforming any data.
Then you perform your CRUD operation, e.g. creating a user model. And lastly
you return something, e.g. returning the user model.

Each endpoint and service method can be written pretty much following this
pattern, which makes the codebase super readable because once you understand
one endpoint you understand all of them. And the service methods are the
reusable component of the architecture, so e.g. if you want the ability for
admins to create users, then they are created with the exact same service
method. (But called from your admin endpoints/services.)
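
A minimal, framework-free sketch of the view/service split described above. It assumes an in-memory dict in place of Django models, and the names `post_user`, `ValidationError`, `_validate_username`, and the username rule are hypothetical; the password argument from the comment is left out to keep it short.

```python
import re
from dataclasses import dataclass

# Stand-in for the database; a real project would use ORM models here.
_USERS = {}


@dataclass
class User:
    username: str
    email_address: str
    is_active: bool = True


class ValidationError(Exception):
    """Raised by services when input breaks a field or business rule."""


def _validate_username(username):
    # Private helper, underscore-prefixed as suggested in the comment.
    if not re.fullmatch(r"[a-z0-9_]{3,20}", username):
        raise ValidationError("invalid username")


# --- services.py: validation, business logic, persistence ---
def create_user(username, email_address):
    _validate_username(username)                   # field-level validation
    if username in _USERS:                         # business-rule validation
        raise ValidationError("username taken")
    user = User(username, email_address.lower())   # transform the data
    _USERS[username] = user                        # the CRUD operation
    return user                                    # return the model


def inactivate_user(username):
    user = _USERS[username]
    user.is_active = False
    return user


# --- views.py: sanitize input, call a service, map errors to responses ---
def post_user(params):
    username = str(params.get("username", "")).strip()
    email = str(params.get("email_address", "")).strip()
    try:
        user = create_user(username, email)
    except ValidationError as exc:
        return 400, {"error": str(exc)}
    return 200, {"username": user.username, "email_address": user.email_address}
```

An admin endpoint could call the same `create_user` service, which is the reuse the comment is pointing at.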

~~~
ralmeida
That's almost exactly what I do, except for using a big monolithic app for the
entire backend (called "core"), and making "services" a package with several
modules inside.

It also resembles very much what I see in Java projects which use DDD (Domain
Driven Design).

What's your take on the article's point that you should have fewer rather than
more Django apps (citing the problem of inter-app-FKs)?

~~~
Alex3917
So my startup is built the way you describe, in terms of just having one main
app, and I personally prefer this style.

The basic argument in favor of breaking down the Django project into multiple
apps is that it makes the components more decoupled and reusable. But
personally I think this is bullshit. If you want your apps to be reusable and
decoupled then you need to put a ton of time into architecting them this way,
the idea that you're going to get these benefits just from putting stuff into
different folders is magical thinking. It seems like pretty much the textbook
example of cargo cult programming.

That said for the client I'm currently working for, the decision was made to
do it the 'standard Django way' in terms of breaking it into multiple apps. So
far I haven't run into any issues here. I like it slightly less because I
think having all the views in the views folder, and all the services in the
services folder makes folks more likely to reuse code just by making it easier
to find. But yeah, so far no real problems, but I'm also not expecting to see
any magical benefits either.

------
goblin89
I find that the safest way to migrate DB schema is gradually, spreading out
intermediate steps across a few deploys.

Migrate the schema creating the (initially unused) fields in advance. Step by
step change your logic where it talks to the storage—start querying new field
values with a fallback, use new fields for incoming data. Migrate the existing
data. Remove the fallbacks. Ensure the old fields are not used at this point.
Migrate the schema, finally deleting the old fields.

Don’t rush, let every step reach production. In deployment sequence, apply
migrations after the new code is already running. Goes without saying, monitor
error rates for spikes.

Yes, at some points in time your production DB schema won’t be fully
normalized. In return for prolonged messiness you get smoother flow, no single
fateful and stressful deployment that attempts to get it right all in one go.
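
A rough sketch of those steps using the standard-library `sqlite3` module rather than Django migrations. The `customer` table, the `name` and `display_name` columns, and all function names are hypothetical, and writing to both fields during the transition is an assumption rather than something stated above. The sketch does not model deployment ordering or the final column drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Ada Lovelace')")

# Step 1: add the new, initially unused column.
conn.execute("ALTER TABLE customer ADD COLUMN display_name TEXT")


def get_display_name(conn, customer_id):
    # Step 2: read the new field, falling back to the old one when it is NULL.
    row = conn.execute(
        "SELECT COALESCE(display_name, name) FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


def create_customer(conn, name):
    # Step 2: incoming data populates the new field (and the old one, for now).
    cur = conn.execute(
        "INSERT INTO customer (name, display_name) VALUES (?, ?)", (name, name)
    )
    return cur.lastrowid


def backfill(conn):
    # Step 3: migrate existing rows that still lack the new value.
    cur = conn.execute(
        "UPDATE customer SET display_name = name WHERE display_name IS NULL"
    )
    return cur.rowcount


def old_field_still_needed(conn):
    # Step 4: check this before removing the fallback and dropping the old column.
    (missing,) = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE display_name IS NULL"
    ).fetchone()
    return missing > 0
```

Each function corresponds to a change that would ship in its own deploy, so a failure at any step leaves the previous step's behavior intact.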

Migration handling is the only part of the article I find a bit foggy and
debatable. The rest roughly resonates with what I’ve arrived at through my
years of experience with Django web apps.

------
Osmose
I appreciate this article for talking about scale in terms of project size and
cognitive overhead, instead of in terms of traffic/computation/storage.
Writing maintainable services isn't easy, and tips like these are painful to
learn on your own.

------
johnthealy3
I was feeling okay about this article until seeing this colossal punt:

> That said, the real intention behind this pattern is to keep the
> API/view/controller lightweight and free of excessive logic, which is
> something we would strongly advocate. Having logic inside model methods is a
> lesser evil, but you may want to consider keeping models lightweight and
> focused on the data layer. To make this work, you will need to figure out a
> new pattern and put your business logic in some layer that is in between the
> data layer and the API/presentational layer.

Having fat models is definitely a problem I'm having, and it's nice to see
it's a problem for the author too, but the advice "figure it out" is presented
without any explicit suggestions.

~~~
sbov
Procedural/functional code. Splitting your models up more by features than
high level things (e.g. having a separate CustomerBilling rather than putting
it in Customer). Component based architectures. Service layer to coordinate
models. Etc. Also, this is/was a common problem (at least, the god class
aspect) in games that the industry has slowly solved over the past 20 years,
so that's another place you can look for inspiration.

Further, unless you know you're going to scale in the beginning, I would
recommend refactoring/evolving over time. It doesn't do anyone any favors by
having 10 models to represent your Customer if your Customer is already
relatively thin.
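
As a small illustration of splitting by feature, here is a sketch using plain dataclasses instead of Django models. `CustomerBilling` is the name from the comment; the fields and the `charge_customer` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Identity only; no billing state lives here.
    id: int
    name: str


@dataclass
class CustomerBilling:
    # Billing state is kept in its own model, keyed by customer.
    customer_id: int
    balance_cents: int = 0


def charge_customer(billing, amount_cents):
    # Procedural service function: the logic sits outside both models.
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    billing.balance_cents += amount_cents
    return billing.balance_cents
```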

------
St-Clock
"Don’t cache Django models"

This is too broad to be good advice. A typical solution to stale models is to
increment a cache version key, effectively invalidating the old cache.

Version keys can be coarse- or fine-grained. For example, you may have one
version key for the entire application (supported by Django by default) and
one version key per model or application (you have to roll your own solution).
If your app's models change, you can increment the app version key and the
next time you try to fetch a model instance from the cache, you will miss and
instead fetch the instance from the DB.
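
A minimal sketch of the per-model version key idea, using a dict-backed cache rather than Django's cache API. The `VersionedCache` class, its method names, and `load_user` are hypothetical and only illustrate the mechanism.

```python
class VersionedCache:
    """Dict-backed cache whose keys embed a per-namespace version number."""

    def __init__(self):
        self._store = {}
        self._versions = {}

    def _key(self, namespace, key):
        version = self._versions.get(namespace, 1)
        return f"{namespace}:v{version}:{key}"

    def get(self, namespace, key, default=None):
        return self._store.get(self._key(namespace, key), default)

    def set(self, namespace, key, value):
        self._store[self._key(namespace, key)] = value

    def bump(self, namespace):
        # Old entries become unreachable; a real cache would expire them
        # through its TTL or eviction policy.
        self._versions[namespace] = self._versions.get(namespace, 1) + 1


def load_user(cache, db, user_id):
    # Read-through: a miss falls back to the "database" and repopulates.
    user = cache.get("user", user_id)
    if user is None:
        user = db[user_id]
        cache.set("user", user_id, user)
    return user
```

Calling `cache.bump("user")` when the model's shape changes makes every previously cached user a miss on the next read, without touching other namespaces.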

------
bradmwalker
[https://mobile.twitter.com/HackerNewsOnion/status/4768359808...](https://mobile.twitter.com/HackerNewsOnion/status/476835980863733761)

------
msmm
It seems like the author has problems with correctly architecting his apps and
blames Django for his inability to do that. Only thing that makes sense is a
tip for migrations.

------
nezo
About ContentType : "If you ever move models between apps or rename your apps,
you’re going to have to do some manual patching on this table"

Thanks for this, just had a kind of struggle recently

------
joepour
It's interesting that they're still using the default ORM. Stranger still
they're letting the ORM scaffold tables (?!?) now they're at this size?

------
tinix
This reads way more like a list of Django caveats and anti-patterns, than a
guide to running any kind of Python application at "scale". Maybe a sprinkling
of good hygiene, but that has nothing to do with scalability (unless you're
talking about developer scalability and cognitive overhead).

Further, to suggest NOT using the ORM and to build out a middle layer on top
of the ORM just for CRUD, is, well, insanity. On one hand the author is saying
to cut down on bloat, and then on the other hand they are saying to add
needless abstraction on an already bloated ORM.

I don't use Django, but I do use Python for application development nearly
every day, and I have to say... this article reads more like a list of
(predictable and basic) problems their team had than a guide to scaling an
application. Maybe they mean, how to scale their team, workflow, and their
efficiency, NOT the application itself.

The only thing I agree with here is the avoidance of some Django features...
However, they should take it a step further, and just avoid Django in the
first place.

Also, am I the only one that has never needed to use "migrations" or any
similar sorts of features? You're doing something very wrong and
overcomplicating things if you seriously need to lean on migrations that
heavily.

~~~
orf
I agree with all of your points about the article, but...

> However, they should take it a step further, and just avoid Django in the
> first place.

Django is a tool, and like most tools it has a use. I find Django
indispensable for writing specific kinds of applications, and its admin
interface is by far the best thing since sliced bread for internal/backoffice
applications. It's not perfect by any means but it's amazing for quickly
getting a CRUD app with an awesome administration interface up and running
with minimal effort. Django rest framework, migrations and the amazing
ecosystem (reversion, django-mptt, django-polymorphic, debug toolbar,
django-currencies to name very few) are also what makes Django appealing and
rather awesome.

Making a blanket statement like 'Avoid Django' is rather silly, given all
that.

> Also, am I the only one that has never needed to use "migrations" or any
> similar sorts of features?

I'm not sure why you put quotes around migrations as if it's some alien,
obscure or weird feature. If you've never written a web application that needs
migrations then you're not writing the kinds of applications (or indeed any
'serious' application) that would benefit from Django IMO.

~~~
tinix
> I'm not sure why you put quotes around migrations as if it's some alien,
> obscure or weird feature. If you've never written a web application that
> needs migrations then you're not writing the kinds of applications (or
> indeed any 'serious' application) that would benefit from Django IMO.

I have been programming for almost 15 years, and have never once needed to use
migrations. I have worked on projects for multiple years with multiple
developers and both small and large teams, and grew and scaled them, massively
overhauled schemas, etc... and still, have never needed migrations.

Have I ever had to add a column to a database? Absolutely, have I ever needed
a massively overcomplicated "migration" tool? Hell no, because I put more than
5 minutes of thought into my application logic and data structures before I
even started writing code or designing the database schema.

I've single-handedly built, and maintained with a team, applications that did
millions of dollars of revenue per day, with 10s of thousands of users per
minute normally, with 100s of thousands per minute during peak load and used
various styles of SQL data stores with between 20 and 50 tables on some
projects (sounds pretty serious to me)... and still never once had a need for
migrations.

My point is, if you need to lean on migrations even remotely often, you're
doing something very very wrong.

I think, though, the type of developer using Django for its large pool of
third-party extensions isn't really the type of developer who puts a lot of
thought into what they are doing. Maybe I'll catch flak for this, but
it's pretty true in my experience. It's the same as some front-end JS devs who
sling various frameworks and libraries together, and end up with a pile of
unmaintainable mess in the end...

Django is used to sling backend web apps together, fast, and needing
migrations is evidence of that.

However, if one is doing proper clean development and following some simple
best practices, an automated and complex migration should never be needed in
the first place.

~~~
tobltobs
You get your DB model 100% correct the first time, always? Even years later
your DB model is able to accommodate all those always changing business
requirements?

~~~
tinix
Yeah, pretty much.

Haven't had a problem yet where I needed automated migrations, and I have been
doing this for years upon years upon years.

Like I said, if you're leaning heavily on migrations, you're doing something
wrong.

~~~
rhizome
_Haven't had a problem yet where I needed automated migrations, and I have
been doing this for years upon years upon years._

Just out of curiosity, have you had to maintain/modify any of these apps over
these years and years? If so, without any schema changes?

~~~
tinix
> Just out of curiosity, have you had to maintain/modify any of these apps
> over these years and years?

Absolutely.

> If so, without any schema changes?

Usually, to be honest, yeah... Core, basal units of any given business process
usually don't change. Only how they are abstracted, and that abstraction lives
in code, not the database.

With a proper storage architecture it is very very very rare to need to change
old data structures. It most definitely doesn't happen often enough to
necessitate the use, overhead, and complexity of a baked-in migrations
library.

If you think "adding a table" or "adding a field" is a "schema change" then
that is where the problem lies... I think. Extending a schema should not
require a migration library. CHANGING a schema, as in, destroying old data,
and making new data, could surely use the help of a migration library, but,
generally, if you're doing that often, you're doing something very wrong, and
if you're not doing that often, then you definitely don't need a baked in
migration library.

Ultimately, if things are so bad that you need a migrations library, you're
better off fixing that shit at a low level, and starting over from scratch if
necessary. Sometimes, instead of creating years and years of tech debt and
burning resources on working with a clusterfuck, you just need to take what
you can from that clusterfuck, and do it right. Then you only have a single
migration, to go from old clusterfuck data to new extensible data structure.

Chances are, if you have old fucked up horrific data, it stays that way, and
then some person or team comes along and writes a nice clean API on top of it.
And then people have to maintain that horrific thing, and it's only horrific
because it's interfacing with shit data layers in the first place.

Don't just abstract bad data architecture away into a service layer, is my
point... Do it right.

Extensible is the key word here, but with an emphasis on decoupling.

If you have some dynamical data structure that is truly changing,
structurally, often, and isn't simply being appended to or extended, then it
probably should not be stored in a traditional SQL database in any way that
necessitates schema changes. If you can't figure out how to represent data in
an effective, decoupled, and extensible manner, then you're screwed from the
get-go. Migrations library won't help you.

------
linkmotif
Django is not meant to be run at scale. At least not horizontal scale. If
you're not going to use the ORM, which prevents you from scaling horizontally,
why are you using Django? Django can be scaled by getting large instances with
lots of RAM, which is quite affordable these days. It can also be scaled with
caching. But if you want horizontal scaling, don't use Django. That's not what
it's for.

~~~
benwilber0
There's nothing really inherent to Django that prevents it from scaling
horizontally, not even the ORM. For the typical SQL read-heavy workload it
will scale as well as anything else depending on the scalability of the
backing RDBMS. Should always use proper HTTP response caching with something
like Varnish or via 3rd-party CDN where possible.

~~~
linkmotif
> depending on the scalability of the backing RDBMS

As far as I know, none of the backing RDBMSs that the ORM supports are
horizontally distributable. At least not officially, right? CockroachDB with
its Postgres interface sounds good, or Citus? But as of today, are any of the
supported DBs distributed?

~~~
benwilber0
MySQL Cluster [1] has been around a long time for applications that need
linear scalability with ACID guarantees

[1]
[https://www.mysql.com/products/cluster/](https://www.mysql.com/products/cluster/)

