How to design software architecture for startups (appventuretime.blog)
193 points by cbuu on May 8, 2023 | 158 comments



I hate to be the guy who says "you aren't doing services right," but... you aren't doing services right.

> Each service has to be maintained and deployed and increases the boilerplate needed for shared data structures, complicated communication protocols and infrastructure. By the time we released our app, the common library used by most services has been updated over a hundred times. And every service has to follow suit eventually.

Shared data structures? Common library? Services have an API. If you are sharing data structures and a common "core" library across services, yes, you are officially doing it wrong. From the architecture diagram it also looks like there's only one MongoDB instance, which, if you have multiple services talking to the same database[0], is also a major sign that you're doing it wrong.

IMO the author came to the correct conclusion, that monoliths are better for a new project, but for the wrong reasons. Or maybe the right reasons, just not clearly explained: at the beginning you don't have a good enough grasp of the domain to properly separate it into independent services.

[0] yes, there are exceptions where you can have multiple services use the same physical database to reduce maintenance overhead while enforcing separate ownership through different logical databases or even table-level permissions, but I'm assuming that wasn't done here.


> Shared data structures? Common library? Services have an API. If you are sharing data structures and a common "core" library across services, yes, you are officially doing it wrong.

Maybe I've been doing this wrong all my life then.

Imagine I want to fetch user data by id from Galactus, the user data service. I know what shape of data I want from it, and Galactus knows what shape of data it provides. Should both my service and Galactus have full, individual copies of this data structure? Alternatively, if Galactus has a published schema that I refer to, isn't that a shared data structure?

By extracting the user data shape into a common repository (library, header, schema, whatever), not only do we deduplicate code, but we also centralize documentation. Maybe the `address` field is cached and frequently outdated, and someone from Galactus noted this down along with where to find the golden source.

Once you factor in other sources of redundancy (username regex, recommended logging, business-specific snippets), that naturally becomes a common library.

I get that it sacrifices some service independence, but is this piece of independence always worth so much that common libraries are "officially doing it wrong"?
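
To make it concrete, the kind of shared module I have in mind is tiny (a sketch, names made up):

    # shared_models/user.py - hypothetical common library module
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class UserSummary:
        """Shape of the user record returned by Galactus' GET /users/{id}."""
        id: str
        username: str
        # NOTE: cached nightly and frequently stale; the golden source is the
        # billing profile endpoint (see Galactus docs).
        address: str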


> Should both my service and Galactus have full, individual copies of this data structure?

Yes!

Maintaining that structure should be painful. Why? Because it's an exposed public API and there are ramifications to updating it that need to be thought out.

Further, it can be super easy to pollute that structure with information only relevant to the server or client.

Using something like JSON schema, openapi, or grpc/protobufs can make generating that second structure easy, but sharing it is something that has caused my org tremendous headaches.


IMO the question to ask is “how do we reduce the risk of second- and third-order effects of changing the API, regardless of whether the change is intentional or accidental?”

In your model, any clients consuming the API will hit runtime errors when there are breaking changes, because the types/data structures are maintained independently. Those runtime errors might be tough to debug. Hopefully you know breaking changes are coming so you can prepare for when the API change is deployed; if it's a regression, then maybe you get an incident.

In the shared type/data structure model the error happens at compile time, because the rest of your app doesn't know how to consume the breaking type change. You know what broke because your client won't compile. There may still be runtime errors if the API was deployed after the latest client deployment, but generally it's easier to surface where the regression/change happened because the types are shared.

I think many people prefer the compile-time error surfacing.

There’s definitely trade offs to sharing types though, especially if the API and clients are in different languages. If your team is small enough and has enough context on API changes it’s probably easier to maintain different types. But if it’s large or spanning multiple teams it might be better to rely on tooling to share or generate types. It’s always about trade offs.
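
A rough sketch of what I mean, using Python with a type checker like mypy standing in for "compile time" (names hypothetical):

    # shared_types.py - shared between Galactus and its consumers
    from dataclasses import dataclass

    @dataclass
    class UserResponse:
        id: str
        email: str  # renaming this field is a breaking API change

    # consumer_service.py
    def send_welcome_mail(user: UserResponse) -> None:
        # If Galactus renames `email` to `contact_email` in the shared type,
        # the type checker flags this line before anything is deployed.
        # With independently maintained structures, the same change only
        # shows up as an AttributeError/KeyError at runtime.
        print(f"mailing {user.email}")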


Ok, so this answer, and the other answers in this comment thread thoroughly convince me you're all batshit insane.

How can you get any work done like this?

Are you all working at massive corps that develop at a glacial pace?

Sounds like you all spend your time shuffling papers rather than doing anything meaningful.


> How can you get any work done like this?

People get work done by knowing what they're doing, which I'm not sure you are able to tell.

There is plenty of literature that explains quite thoroughly the process of software architecture. Basically all major software architecture styles from the past four decades reflect the need to encapsulate and insulate implementation details, including the need to specify a domain model and how it should be specific and exclusive to each project.

Somehow, you are oblivious to basic principles but still feel entitled to insult others based on domain knowledge you clearly lack.


Do you want a serious answer, or are you just here to insult people? If you edit your comment to remove the insults, I'd be glad to share my thoughts.


Look, I've never seen microservices done well. I'm negative about it because people are implementing this architecture in applications that, as far as I can tell, should never even consider it. Maybe they're just doing it badly. But as far as I can tell it's a pretty awful architecture pattern.

To implement microservices, one takes what would be a service in a normal application: a handful of code files. Maybe some model files, a service, validators and a repo file. A slice of an application.

One creates a new project file, build files, etc. Maybe a new repo, maybe not. You then wrap that simple service in a bunch of boilerplate plumbing code so it can actually work on its own.

So, basically, a ton of extra code, right off the bat. Each time.

Then to do it right according to this thread, you duplicate the definition files between your services, multiple times, add JSON schema files that you didn't have to maintain before, and, as someone else has mentioned, create an extra library on top of all this so your colleagues can implement it as if it were just a normal method call.

Even more code!

And that's your microservice. A lot of extra work. Busy work as far as I can see, no benefits. Just to do exactly what it used to do.

But, worse still, it has huge drawbacks, including:

    1. Very slow "method" calls. Normal method calls are obviously orders of magnitude faster than whatever you're doing. L1 Cache is always going to be massively faster.
    2. Poor debugging
    3. Complicated devops requirement
    4. Hidden complexity in the interaction between services that is impossible to see

I just don't get it. Never have. I tried to play along, but I personally think the emperor has no clothes. If a client is going to insist on microservices, so be it, but it's a massive waste of time and money in my opinion.


You're the first person in this thread to mention microservices. The discussion has been around broader service-oriented architecture. Sometimes those services can be quite large, in which case the boilerplate overhead is not nearly as onerous as you describe. I've worked on services that had 200+ engineers on them.


That's what the article is about.

That's what this thread is about when the original poster says "you're doing services wrong".

So, no, I'm not the first person to mention them. You just need to read the context of the discussion.


Friend, do you not understand that not all services are Microservices?

The article even states microservices are not suitable for startups - the conversation in this thread has been about service-oriented architectures, which is a much broader topic.


None of what's described above is materially difficult or slows down a team used to this method of operation. Perhaps stop applying your narrow lens to all development.


A team, yes. This was two people doing everything. Why would they fully use a design practice designed to scale people on a codebase?


You are oblivious to the point of this approach. Scaling has nothing to do with it. It has everything to do with not imposing useless and detrimental constraints that buy you nothing but problems. You specify interfaces, and keep implementation details from leaking by encapsulating and insulating them. This is terribly basic stuff.


You can do all that without separate services.

OOP languages have interfaces.

You can do this already without adding any sort of microservice, schemas, duplicate definition files, externally maintained libraries, etc..

It's a basic feature of most languages.

It is NOT an exclusive benefit of a microservice pattern. Stop claiming that, it's one of the most frustrating claims/lies microservice advocates make.

The actual benefit is that you're forcing developers to use interfaces. At a massive cost.

There are much cheaper alternatives. You enforce a Dependency Injection pattern on your services. Code reviews. Linting tools.

So no, this is not basic stuff.

And worse still, if your team can't properly use interfaces in your languages, how do you expect them to suddenly learn to use them properly in your services?
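
To spell out the kind of in-process boundary I mean, a minimal sketch (names made up):

    from typing import Protocol

    class UserRepository(Protocol):
        """The only thing callers get to know about user storage."""
        def get_email(self, user_id: str) -> str: ...

    class PostgresUserRepository:
        def get_email(self, user_id: str) -> str:
            return "someone@example.com"  # implementation detail, hidden here

    class WelcomeMailer:
        # dependency injection: any UserRepository implementation will do
        def __init__(self, users: UserRepository) -> None:
            self.users = users

        def send(self, user_id: str) -> None:
            print(f"mailing {self.users.get_email(user_id)}")

    WelcomeMailer(PostgresUserRepository()).send("42")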


It'd be great if you minded your tone; this is HN.

I don't know where you're getting implementation details leaking when it's just API definitions being shared - they don't leak implementation details unless they're badly designed, which would affect them either way.


I wonder if there are two uncontrolled parameters here.

Firstly, the space and time scales. If your two-pizza team has twenty services, and they communicate like this, and interfaces change a few times a week, then there will be quite a lot of pointless paperwork. If your two-pizza team has one service, used by other teams, and the interfaces change once a month, then this might be an appropriate amount of speed bump.

Secondly, tooling. If your APIs are all done by hand, then making an update is a modest amount of boilerplate. If you are generating everything from schemas, and you have your build down tight, then it can be a matter of changing the schema file, pushing, waiting for that to propagate, then adding the necessary data to the message you changed.


the whole point of this architecture is to transform conway's law from a liability to an asset -- it's a solution to problems that only exist at org sizes large enough where product velocity is bottlenecked by inter-team friction

services map to teams, not units of functionality

imo minimum org size to use microservices is something like 50 engineers


Whether you're right or wrong, the negativity isn't needed and accomplishes nothing.


Technology should never be painful. If it is, it means you're doing it wrong. The fact that you have to put in work and tell the technology you have this shape in two different places is a code smell. The system is telling you the approach is wrong when you are working for the technology instead of the technology working for you.


I'd suggest reading "Falling into the pit of success" [1]

> a well-designed system makes it easy to do the right things and annoying (but not impossible) to do the wrong things.

When designing an API consumed by many people, changing that contract is often the wrong thing to do. You'll have unintended breaks with your consumers. So, adding an extra layer of annoyance is something that tells a dev that wants to change one of these models "Hey, what you are doing might impact more than just the place you are changing. Maybe you should rethink this".

That is, it's a bit painful.

Good software is using pain and annoyance to discourage devs from hurting themselves. When making a bad change isn't painful, it happens frequently.

Have you ever wondered why we write tests? Tests encourage pain. They make future changes annoying (because you might need to update the test, a pain). Yet most people see writing tests as a virtue, not a vice. That's because the pain of the failed test prevents unintended consequences of code changes.

That's what having 2 models does.

[1] https://blog.codinghorror.com/falling-into-the-pit-of-succes...


> Using something like JSON schema, openapi, or grpc/protobufs can make generating that second structure easy, but sharing it is something that has caused my org tremendous headaches.

That's just implicit sharing, or versioning with extra steps...


Another answer:

> Imagine I want to fetch user data by id from Galactus, the user data service. I know what shape of data I want from it, and Galactus knows what shape of data it provides. Should both my service and Galactus have full, individual copies of this data structure?

No. Galactus should have the full shape of the "user object" but should never put that on the wire or expose it. The wire API should expose sensible things like "what is the billing and shipping address(es) of this user id". Galactus is responsible for maintaining the mapping between the "user object" and whatever the relevant return format is for those data.

Edit: This allows you to run API migrations and data migrations independently. Which is crucial for any service that will last longer than 1 year.


Agreed with all of this. I consider the API structure (which might be defined by protobuf/Avro/Thrift/etc) to be separate from anything that is used by either the server or client services. The API structure is the only thing that should be shared.

When you start out, it will feel very repetitive because your client object structure matches your API structure matches your server object structure. You might even think "DRY!" and want to combine these, but IMO, resist the temptation. You will eventually have a case where these need to evolve independently. For the Galactus case, imagine that for compliance (or whatever) reason, you need to start soft-deleting users, and you do that via the simple case of an `IsDeleted` flag on the User table. You'll need that flag on the User object on the Galactus side, but you don't want to add this to the API and expose it to clients.

> Edit: This allows you to run API migrations and data migrations independently. Which is crucial for any service that will last longer than 1 year.

Totally agreed here too.
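
A minimal sketch of that separation (the field names follow the example above, everything else is made up):

    from dataclasses import dataclass

    @dataclass
    class User:                  # Galactus-internal model, never leaves the service
        id: str
        billing_address: str
        shipping_address: str
        is_deleted: bool         # compliance flag; consumers never see it

    @dataclass
    class AddressResponse:       # wire structure, the only shared contract
        billing_address: str
        shipping_address: str

    def to_address_response(user: User) -> AddressResponse:
        # Galactus owns this mapping, so the internal model and the API
        # can evolve and be migrated independently.
        return AddressResponse(user.billing_address, user.shipping_address)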


Is it normal for people to just splat their database across the wire? Because that seems pretty terrible. We wrap every API response in a DTO and are deliberate about picking and choosing which fields we want to expose. Anything else sounds like a recipe for bad times.


I think it is both an awful idea and very common. In the early stages your database looks very similar (if not identical) to what you want your API to look like, so it's very tempting to reduce boilerplate by having your API and database objects use the same data structures internally.


Yep, it’s a terrible idea but that’s exactly what frameworks like Rails push you to do.


Yes, but that's not his point. Obviously Galactus has some internal user data structure that is private to it.

But there's also something going over the wire, as a response from Galactus' API, and consumed by one or more other services.

His question was, shouldn't the knowledge of the structure of that be in some type of shared code?


IMO if that structure is autogenerated from the API specs (like protobuf, OpenAPI, ...), it's not needed to share it. Plus you can make it evolve at the pace of each service, if your API keeps backwards compatibility.


> This allows you to run API migrations and data migrations independently. Which is crucial for any service that will last longer than 1 year.

I've felt nervous for years about going all-in with Django Rest Framework, and you've articulated exactly why.


We use it at work. We always define the exact fields we want, and can swap them for SerializerMethodFields if we need to do something different/funky. It works very well, imo.
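
Roughly like this, trimmed down (model and field names are made up):

    from rest_framework import serializers
    from myapp.models import User  # hypothetical model

    class UserSerializer(serializers.ModelSerializer):
        # computed field instead of exposing the raw columns
        display_address = serializers.SerializerMethodField()

        class Meta:
            model = User
            # explicit allow-list of exposed fields; never "__all__"
            fields = ["id", "email", "display_address"]

        def get_display_address(self, obj):
            return f"{obj.street}, {obj.city}"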


Yeah, I wish every language had something as nice as DRF. I’ve never had issues with it for a few different multi-year codebases.


Well yeah, but Omega Star still doesn’t support fucking ISO timestamps like they said they would over a month ago. And so Galactus won’t be able to find the Birthday Boy provider, which means WingMan won’t know how to talk to anybody…


I'm fucking blocked and I'll never find love.


You’ll never know my pain - or Galactus’s pain - you stupid, pathetic little product manager!


> Imagine I want to fetch user data by id from Galactus, the user data service.

Reference: https://www.youtube.com/watch?v=y8OnoxKotPQ


The ideal way this works is the Galactus team publishes a library that consumes their API. They build their service, you then use their library to access it.

But the key here is that their own library still goes through their API, not directly to their database. So that if you don't want to use their library you don't have to.


The thing to watch out for with the client is that the dependencies of the client aren't shared with the server dependencies. Client libraries should be minimal and optional when working with an API.

The failing pattern we've seen at my company is a client library, a server library, and a shared "model" library.

The issue that has happened time and time again is that the model library ends up including classes and dependencies that only the server or the client needs. This in turn leads to a frustrating experience of dependency wrangling when someone decides to throw lombok, or kafka, or JavaFX into the model library for that one sweet `Pair` class the server is using internally. Or worse, they depend on a different model library because they need that one sweet enum. (Which in turn depends on other model classes, or server classes, or whatever).

If you do a client library, it should depend on practically nothing. The transport library (http client), authentication, and the data marshalling library. Nothing more. It should also be viewed as an example of how to work with your API and not the only way to work with the API. (Don't reinvent graphql... please!)
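
Concretely, something like this is about as heavy as a client library should get (a sketch using `requests`; names hypothetical):

    # galactus_client.py - transport + auth + data marshalling, nothing else
    from dataclasses import dataclass
    import requests

    @dataclass
    class AddressResponse:
        billing_address: str
        shipping_address: str

    class GalactusClient:
        def __init__(self, base_url: str, token: str) -> None:
            self.base_url = base_url
            self.headers = {"Authorization": f"Bearer {token}"}

        def get_addresses(self, user_id: str) -> AddressResponse:
            # goes through the public HTTP API, never to the database
            resp = requests.get(
                f"{self.base_url}/users/{user_id}/addresses",
                headers=self.headers,
                timeout=5,
            )
            resp.raise_for_status()
            return AddressResponse(**resp.json())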


The way to do this wrong (by forcing the use of the "blessed" client or common platform dependencies, instead of leaving it open to the consumer as you suggest) is discussed in this talk: https://www.microservices.com/talks/dont-build-a-distributed...


Ah, that's exactly what I've been doing, and unconsciously assumed here. I can attest it works very well, hence my confusion.


the whole point of an API is to define an interface in terms of a runtime protocol, invariant to any specific language, code, library, SDK, etc.

in fact the galactus team should actively _not_ publish a library for their API

doing so introduces a source-level coupling between consumers and producer, which the API was (presumably) created specifically to avoid


It's basically a reference implementation. Their library should not do any work other than marshal data. Kind of like how Amazon produces a command line tool to interact with their API, but you don't have to use it. It's just a lot easier.


the problem is that when a producer publishes a library like this, it's very tempting for them to assume that all access will go thru that library, and maybe even that all access will go thru the most recent version of that library

it takes an enormous amount of discipline to ensure that this kind of software is treated as "a reference implementation" rather than the canonical path

if an internal service API is too difficult to use without an API library, then this is a pretty good signal that something is wrong and needs to be fixed, imo


> Should both my service and Galactus have full, individual copies of this data structure?

Yes. I recommend you read "Domain-Driven Design" by Eric Evans, especially the part explaining the concept of a bounded context.

> Alternatively, if Galactus has a published schema that I refer to, isn't that a shared data structure?

No, that's the interface.

> not only do we deduplicate code

You don't. You just needlessly add a constraint for no reason at all, and in the process make your life a mess.


> Shared data structures? Common library? Services have an API. If you are sharing data structures and a common "core" library across services, yes, you are officially doing it wrong

Quite correct. Each service should have its own duplicated copy of `someUtilityFunction()`.

Then instead of applying 100 fixes to it you can have 100 different behaviours for some piece of business logic.

/s


Business logic belongs in services, not libraries. If you're copying nontrivial business logic into different services, that's a big red flag; you need to pick a service to be the single source of truth for that business logic and have other services call into it.

edit: Libraries don't solve this, to be clear. Let's say you have 100 services that reference `someUtilityFunction()` in SharedLib==1.0. Oops, there's a bug! OK, let's publish SharedLib==1.1. Great... now you still need to go and update, test, and deploy 100 different services, because they're not going to pick up the corrected business logic otherwise. And of course, you might have the case where someone got lazy and didn't upgrade and they're still using SharedLib==0.0.1-alpha, and there's a bunch of breaking library changes that happened between that version and 1.0 that were never implemented...


> Libraries don't solve this, to be clear. Let's say you have 100 services that reference `someUtilityFunction()` in SharedLib==1.0. Oops, there's a bug! OK, let's publish SharedLib==1.1. Great... now you still need to go and update, test, and deploy 100 different services, because they're not going to pick up the corrected business logic otherwise. And of course, you might have the case where someone got lazy and didn't upgrade and they're still using SharedLib==0.0.1-alpha, and there's a bunch of breaking library changes that happened between that version and 1.0 that were never implemented...

Yeah, but that relies on a lot of "ifs" when a bugfix is done.

If the services don't have tests before deployment, you'll have problems.

If a rarely-updated service doesn't get the fix, you have problems.

If someone got lazy and didn't upgrade (how is that even possible? they don't have to do anything to get the new thing other than rebuild, redeploy) you'll have problems.

OTOH, there's no "ifs" about getting a bugfix when they're all using the same copy of `someFunction()`.

When they're all using the same copy of `someFunction()`, the bugfix will go in with the next build of their service, without them even knowing.

After all, you don't have each team keep their own copy of the kernel, do you?


If you require all services to constantly be updated with the latest versions of libraries, you've built a distributed monolith, which is generally acknowledged as being far worse than both independent services and an actual monolith.


So what's the plan when some services think that's a bug and others think it's a feature? You fix some and break others.

So you can make a new function. Maybe one calls the other. They do nearly the same thing but not quite. Some dependencies move to the new one, some don't.

Or you version the library. Shortly after, people want mixes of behaviour from different library versions, and someone else wants two dependencies that each want different versions of the library.

I don't have a good recommendation here but silently changing the behaviour of libraries is not necessarily a better play than not doing that.


> Shared data structures? Common library? Services have an API. If you are sharing data structures and a common "core" library across services, yes, you are officially doing it wrong. From the architecture diagram it also looks like there's only one MongoDB instance, which, if you have multiple services talking to the same database[0], is also a major sign that you're doing it wrong.

Yes, and that API has types, e.g. protocol buffers. And also, sometimes you just want to change the API, and you want to make that as easy as possible.


APIs defined with IDLs like protobufs make sense sometimes, but not always

using an IDL means that consumers of your API need to depend on an IDL definition at the source level, which introduces coupling that is not always useful

for example, talking to the stripe API doesn't require you to import a stripe.proto


> If you are sharing data structures and a common "core" library across services, yes, you are officially doing it wrong.

Let me reply in the spirit of your comment:

No, you are officially wrong. [0]

[0] Says me.


What is wrong with sharing a single database between services?

Seems like a pretty commonly accepted pattern:

https://docs.aws.amazon.com/prescriptive-guidance/latest/mod...


In addition to what other people said, the arguments under "You should consider using this pattern if" are all basically "you currently have a shared database and don't feel like splitting it apart."

The only bullet on the list which isn't related to "it currently works this way and you can't change it" is the one that says

> You want to maintain and operate only one database.

Which is frankly a terrible bullet point because the next question is "Why would someone want to do that?"

The whole list is basically just "if you have constraints that make a better design impossible then I guess you can do this"


> Why would someone want to do that?

Because for example you have BI reporting queries that need to join across schemas, and it's the most cost effective way to do that. I've seen that kind of thing both through ETL and going through normal service APIs, and both are way more expensive (in developer time and compute resources) than e.g. doing a normal db query against the slave.

A shared database will also make more effective use of resources than if you try to manually assign them to split databases.

If you need cross service transactions, it's easier to have the db do that than to implement 2pc.

The real question is why split it? If you can fit on one machine, there's not much reason not to. Like the previous poster said, you can use separate schemas or table permissions with service accounts to keep logical separation.
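
For example, with one Postgres instance and a schema per service (assuming Postgres and `psycopg2`; schema names made up), a BI join is a single query against the replica rather than an ETL pipeline or a chain of service API calls:

    import psycopg2  # assumption: relational DB with one schema per service

    conn = psycopg2.connect("dbname=app user=bi_readonly host=replica")
    with conn, conn.cursor() as cur:
        # report joining data owned by two different services, no ETL needed
        cur.execute("""
            SELECT u.id, count(w.id)
            FROM users.users AS u
            JOIN widgets.widgets AS w ON w.owner_id = u.id
            GROUP BY u.id
        """)
        print(cur.fetchall())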


The reason to split it is that as soon as you give your BI dept. free rein to query the tables for users, user_widgets, widgets, widget_metrics, widget_metrics_metadata, and half a dozen other things with one query, they will do it and lock up your production database by running some atrocious 12-hour query composed completely of select * from window_functions, and you’ll be back at square one in no time.

The other (more serious) problem is that if multiple services can edit the same data independently, eventually they will, and it will cause data loss or degradation that will be catastrophic. And, although you CAN isolate things with permissions, the chances of this reliably getting done without a lot of time put into DBA (a role which simply doesn’t exist at startups) are very low in my experience.


That link seems to be making a strong case against using the same database for multiple services, without coming right out and saying it.


It might be because the target customer of AWS is a large, sprawling IT department, where having separate databases could be a good idea.


In the first paragraph, that doc echoes exactly what I explain.

> You need to carefully assess the application architecture before adopting this pattern, and make sure that you avoid hot tables (single tables that are shared among multiple microservices).

Using the same database table with multiple services (micro or not) is an absolute disaster.


Having service A write to a table and service B read from that table in the same database is a perfectly suitable architecture for many applications.


At that point it's just an argument about "what is a service"? But people typically find that using the database as an API ends up with the services so coupled that they might as well be one even if they technically run as different binaries.


Exactly. We’re talking about software architecture for startups here. Best to keep it simple.


Isn't the database just another service?

Are other services also restricted in the same way?


It's nuanced, but it's not, because there is no single owner.

If microservices share a database, it's not clear who migrates the schema and coordinating it is nearly impossible because another piece of code in another service may be reading the old schema and break.

Sharing the database means you have a distributed monolith - you lose all independent deployability. It is the worst of both worlds.

The exception may be if you share a database instance, but each service has its own schema and only has authorization to read and write to its own schema.

That doesn't make the single database a service - it would make it shared infrastructure, however. This may bring its own maintenance challenges but it doesn't violate any principles.
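
With MongoDB (which is what the article uses), the same idea is one user per service, each scoped to its own database - a sketch with made-up names:

    from pymongo import MongoClient

    # run once with an admin connection: one database and one user per service
    admin = MongoClient("mongodb://admin:secret@localhost:27017")
    admin["billing"].command(
        "createUser",
        "billing_svc",
        pwd="billing-secret",
        roles=[{"role": "readWrite", "db": "billing"}],  # no access elsewhere
    )

    # the billing service connects with credentials that only see "billing"
    billing_db = MongoClient(
        "mongodb://billing_svc:billing-secret@localhost:27017/billing"
    )["billing"]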


Depends on the use case, but all physical resources are finite at some limit, so you may want to distribute that load so the different domain processes can scale appropriately. When/where/how depends on the application you’re building.


About the MongoDB instance: It's indeed one physical instance but each service owns its own database. Thank you, I will make that more clear.


It's a replicaset at least, correct? Hopefully not standalone


> Shared data structures? Common library? Services have an API. If you are sharing data structures and a common "core" library across services, yes, you are officially doing it wrong.

I agree w/ your other take re: the database but don't understand this one. Sharing data structures between client and server, particularly request/response structures, is hugely useful, certainly not an antipattern.

And similarly the idea that if you share a common lib across service boundaries then that's "wrong" falls apart at even the most cursory glance.

Service A and B both depend on Service C. So there's a shared client library/SDK that Service A and B use. As just a trivial example.


if you share data structures in this way you're introducing source-level coupling between services

but one of the major reasons to break up logic into separate services is to eliminate source-level coupling

a consumer should not need to import service X's SDK in order to use service X, if that is the case then it introduces all kinds of bad situations like "well if i change my service X's behavior in some backwards-incompatible way, it's fine, as long as i update the SDK appropriately" -- wrong

etc. etc.


> but I'm assuming that wasn't done here

I assumed it was done here. Why wouldn't it have been? There's no evidence of that.

> Shared data structures? Common library? Services have an API

I assumed that was what they meant: common data structures being whatever the various APIs emitted and received. You don't need more than that if you're two people shipping a full product. People aren't buying your internal architecture, and it sounded as though they built it so the code could be separated out in future if necessary.


if you're two people then there is absolutely no reason to introduce a network API boundary between functional components

if you do have an API boundary then it's important to understand that interactions against that boundary are resolved at runtime -- in contrast to data structures which are resolved at compile time

if you break something out to a service that's only accessible over a network boundary then there's not much sense in requiring access to go thru some SDK library, right?


Do you even gRPC/avro/thrift/IDL bro?

I know those aren’t the data structures you probably meant, but those are data structures that are commonly shared.

That’s different than a model though, which I think is really what OP is talking about.

There’s no problem in sharing a library but it is a model which works best with a monolith or a monorepo (and obviously one language). You definitely DO NOT want multiple versions of your library kicking around.


The request/response objects for the API are a shared data structure, whether you decide to share them via a library or not, no?


Or even through SQL views, if applicable.


I was expecting something like this article https://alexkrupp.typepad.com/sensemaking/2021/06/django-for... (discussed at the time https://news.ycombinator.com/item?id=27605052) but was greeted with something far more abstract and less helpful. Look at the difference in quality of their table of contents!

    - What does our app do?
    - Microservices usually don't work well for startups
    - Move fast and outsource things
    - Consider building reusable things
    - Be pragmatic
    - Boundaries along sync/async communication patterns
    - How we did it and how we would do it next time
    - About flexibility

    - Predictability
        - Rule #1: Every endpoint should tell a story
        - Rule #2: Keep business logic in services
        - Rule #3: Make services the locus of reusability
        - Rule #4: Always sanitize user input, sometimes save raw input, always escape output
        - Rule #5: Don't split files by default & never split your URLs file
    - Readability
        - Rule #6: Each variable's type or kind should be obvious from its name
        - Rule #7: Assign unique names to files, classes, and functions
        - Rule #8: Avoid *args and **kwargs in user code
        - Rule #9: Use functions, not classes
        - Rule #10: There are exactly 4 types of errors
    - Simplicity
        - Rule #11: URL parameters are a scam
        - Rule #12: Write tests. Not too many. Mostly integration.
        - Rule #13: Treat unit tests as a specialist tool
        - Rule #14: Use serializers responsibly, or not at all
        - Rule #15: Write admin functionality as API endpoints
    - Upgradability
        - Rule #16: Your app lives until your dependencies die
        - Rule #17: Keep logic out of the front end
        - Rule #18: Don't break core dependencies
    - Why make coding easier?
        - Velocity
        - Optionality
        - Security
        - Diversity


This is a really cool and very opinionated post that sums things up to a coherent recipe that will work very well. There are different ways to achieve this (also with Django/DRF), but the purpose of this one is to make it specific to Django's implementation. I think the OP meant more general advice on architecture altogether.


I would add:

- Don't build your CI/CD pipelines; use Vercel or something similar

- Don't use message queues, crons and anything that can by design cause race conditions until absolutely unavoidable

- Use a framework if you and your team know one well (e.g. Django Rest Framework), otherwise use whatever you know inside out even if it doesn't have all the features you need - or you'll end up slowing down painfully at every other complex bug

- Using TypeScript for backend, frontend and the domain simplifies a bunch of things (configuration, types)

- Unless you're building ML pipelines in parallel to a web app (or some similar clear technological separations exist, e.g. a web interface and a UE5 game), you should use one programming language. All your developers should know the app A-Z in the early stages.

- Your product is unlikely to succeed overnight but likely to die because of overengineering. You should worry about scaling, but only a little bit.

- Forget using JWTs if you're in some sort of regulated field. Any sensible white/greybox certification for insurance (and that's often a requirement for running a business) will require you to have "log out all devices" option.


Agree with most of this, but not sure about a few points here:

> Don't build your CI/CD pipelines; use Vercel or something similar

Why wouldn't you build your own CI/CD pipelines early? It's kind of trivially easy to do early on, and in my experience won't really change until/unless you want it to.

> Don't use message queues, crons and anything that can by design cause race conditions until absolutely unavoidable

Message queues are not universally the cause of race conditions, you just have to accept the eventually-correct nature of the resulting system. If you know how to handle that, you're fine.

> Forget using JWTs if you're in some sort of regulated field. Any sensible white/greybox certification for insurance (and that's often a requirement for running a business) will require you to have "log out all devices" option.

JWTs didn't stop us from getting SOC Type I and II certified, or prevent us from getting insurance, not sure what you mean by this. I guess we're not in a "regulated field" but if you aren't, JWTs are great.


Re: CI/CD - if you want to have:

- Multiple environments
- History of deployments accessible at hand
- Management of environment variables/other configs, etc.

You're going to spend time on something that's available off the shelf. I don't believe it's either trivial or less error-prone than using any existing service.

Re: message queues: Yes, but you may also just not use them and then not have to consider that complexity. Again, if necessary, use, if not, don't. Write boring, synchronous code if you can. Everything in the world is "fine if you know how to use it", but you can dedicate your energy elsewhere.


> JWTs didn't stop us from getting SOC Type I and II certified, or prevent us from getting insurance, not sure what you mean by this.

What he probably means is that a JWT can't be invalidated on its own the way a session can. But of course it can be done in a similar way, and some products support it. You just lose some of the benefits of JWT. (Let alone the ability to negotiate the logout time span.)


You can start storing JWTs in the database to regain control over their validity, but then you fundamentally have sessions, just done in an unnecessarily complex way. It goes back to some Xbox vs. Playstation/Android vs. iOS type of argument here, but JWTs _as a stateless token_ (as most people would use them) are just not suitable for the job if you're in a business that requires good security practices.

I wrote that point specifically because rewrite of authentication in the middle of product building just to get certified/approved by pentesters is a horrible experience. Using auth0 or Okta or whatever software that handles it for you is probably advised as the person above wrote.


Yeah, I just think folks overestimate the value of a "log out of all devices" button. Short TTLs on session tokens seem fine enough for the early months, at least, and then yeah once you can get that first round of funding (or revenue to pay for the lights), getting something like auth0 might make more sense.


I agree with you, but that's one of the things that will have to be rewritten if you're in a regulated business - unless you go for some hybrid "JWT but stored in the database" which is, basically, not a JWT.


A JWT with a sid claim that RPs use to check session validity is 100% still a JWT with bearer authz claims.


On a project I worked on, we implemented a "log out all devices" option and similar security features by building a simple deadline system on top of using JWTs.

Every user's data contained, among other things, a deadline timestamp, which was (re)set every time a security-relevant data point was changed (password, 2FA, log out all devices, etc...)

When authenticating a user, services would check both the JWT's validity (iat, nbf, exp) and the cached deadline (iat >= deadline) using a shared auth library. I'd be happy to learn more about the downsides, but for as long as this was in place, it has worked really well.
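
The check itself was tiny - roughly this (a PyJWT-flavoured sketch, names hypothetical):

    import time
    import jwt  # PyJWT

    SECRET = "shared-hmac-secret"   # assumption: HS256 for brevity
    security_deadline = {}          # user_id -> unix timestamp, cached per service

    def bump_deadline(user_id: str) -> None:
        # called on password change, 2FA change, "log out all devices", ...
        security_deadline[user_id] = int(time.time())

    def authenticate(token: str) -> dict:
        # verifies the signature and the standard exp/nbf claims
        claims = jwt.decode(token, SECRET, algorithms=["HS256"])
        # ...then rejects anything issued before the user's deadline
        if claims["iat"] < security_deadline.get(claims["sub"], 0):
            raise PermissionError("token issued before security deadline")
        return claims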


The whole point of JWTs is to not check some other value in the backend. Once you're doing that you've lost the major value of JWTs.

So yeah, once you start doing DB calls for every. single. request. you're going to have a lot more options, but JWTs are meant to avoid that.

My understanding is the "right" way to do this is to maintain a CRL.


Yes, it is good if you can avoid checks, and in most cases you can. But in practice it doesn't matter very much:

- Scaling is rarely the problem. I never experienced it myself. The same goes for the session.
- The complexity can be hidden behind a library. Iirc there is even a specification for this.
- You have other advantages because you usually use JWT with OAuth2.x/OIDC in the SSO context.
- You benefit from standardization.

A CRL is probably not going to help you. At least I have no idea how. But a jti blacklist will. Don't forget - you still have to maintain and distribute it, which probably won't lead to a better implementation.


scaling auth infra is basically the entire motivation for things like JWTs

if you don't care about every request making an auth check, that's great! then just use basic sessions or whatever. there's no reason to use anything more complex


No, you have lost ONE of the major values of JWTs. You still enjoy a lot of the others (standardized data structure decodable by both sides, library support, easy custom payload).

Also, DB calls are not necessary, when it comes to scaling, in-memory caching is a very valid option, so a DB read is only necessary occasionally.

Yes, you lose some of the benefits of JWT but that's inherently expected when using them for something that goes beyond their original capability.


the only unique advantage of JWT-type auth is reducing load on auth backends

everything else you mention is equally available in other, simpler, auth schemes


Yes, I didn't disagree with that. All I'm saying is that if you're already using JWTs, you're not locked out completely from implementing these features. It just comes with a drawback that might be worth it, depending on your customer base and team size.


And what the parent post described is functionally equivalent to a CRL: it's a list of (user id, earliestValidTokenTimestamp) pairs, which for the case of invalidating all sessions on demand serves the same purpose. The only major problem is timely sync of those lists to all clients. Same problem as with CRLs, but probably lighter. And of course with a different set of limitations.


What is CRL supposed to stand for? Google only gave me "Certificate revocation list" which doesn't sound right.


Yep that's it; JWTs are signed, that's how they work.


Don't forget that a single certificate is used to sign multiple JWTs. Revoking a certificate (let alone communicating it) will revoke all signed JWTs, not just the specific user's.


> Don't build your CI/CD pipelines; use Vercel or something similar

I'd give exactly the opposite advice. Being able to properly debug your CI/CD code, or even run it locally, has been a godsend when you're starting up.


I haven't met vercel. Hudson, buildbot, in house thing built from scratch for some reason - all bad associations here.

A CI / deployment pipeline that you can't run locally is absolutely a disaster. You end up building different code than what ships to prod, in a different fashion, and generally hoping that bugfixes to the local one will be meaningful in the CI one.

Buildbot calling a top level makefile that does everything, fine. Some java thing that is configured through a web gui and thus can never be restarted if it ever disappears, not fine.

I love the idea of CI and loathe every instance I have ever used.


I agree with you, however modern services (like Vercel) solve those problems pretty well. They provide you with the CI you'd otherwise write by yourself.


There should be no CI/CD code - build the product instead. Fundamentally, outsource devops to some service. Use Firebase, Vercel, etc. Building proper multi-environment CI/CD is where you're doing devops with the belief you're going to be "better at it" with "more control", eventually writing code nobody asked you for.


CI/CD ended up being probably the easiest and most beneficial part of the process for me. Really doesn't take long to get a good system going.


There are a few principles that hold, I've found, when it comes to designing software at a startup.

Most companies of this size might not be around in a few years. For those first few years it's a good practice to refrain from thinking about the company as a software company (the notable exception being a startup that is literally building software products like libraries, databases, compilers, etc... then disregard this advice). It's a business trying to find product-market fit that happens to use software.

Under this lens it doesn't make sense to start writing software if you can avoid doing so: the company might not be around in the next year or two if it can't land those first customers and start growing. The less you write, the less you have to maintain, the more you can focus on what matters: finding that product-market fit.

This means optimize for doing the least amount of work and writing the least amount of software possible. Use frameworks (code you didn't write), libraries (code you didn't write), and services (code + infrastructure you don't maintain) as much as possible. TFA points out several of these: use an auth service, use a PaaS, etc.

The majority of your "software" often looks and feels like glorified configuration. You're pulling together frameworks and libraries on top of services that do most of the work for you. Good. You've kept your capital costs down and didn't spend time building something you're going to throw out at a moment's notice!

It's when you gain traction, and your service bills start ballooning, and your customers start to demand transparency and reliability, that you need to start thinking about hosting your own software and building an architecture that suits your applications' usage patterns. This generally happens after the business has found a niche and is now focusing on adding customers rather than finding them.


I agree with most of what you wrote, but I strongly disagree with the idea that customers starting to demand reliability is something that should happen before you start thinking about hosting your own software.

Treating stability and reliability as an afterthought is one of the main things that's wrong with the tech industry at the moment, I would argue. It's a competitive market. Move fast and break things, but at least have the requirement to be as reliable as possible from the get-go.

Alas, this is written from a user's and tech enthusiast's standpoint. I know it's a futile argument since reliability is not a requirement for growing a project far enough to get a worthwhile exit, so we will continue to "enjoy" unstable products.


Sorry but couldn’t disagree more strongly about reliability and stability. Those are afterthoughts because a buggy, unreliable service that solves a critical pain point for a customer is fixable, but a perfectly operational service that doesn’t do much for people isn’t.

Startups trying to find a market need to cut every corner imaginable to minimize any work that isn’t related to finding the customer, as this is the existential crisis.


There is a lot of wiggle room between a buggy, unreliable service and a perfectly operational service.

And yes, as I said, from a business point of view, I absolutely agree with what you say. From a user's perspective, I don't. And I hope more people will start to reject adopting buggy, unreliable projects in the future.


There’s great precedent for low reliability products becoming popular. ChatGPT and Twitter both became widely adopted despite frequent downtime.

This extends well beyond software. The Ford Model T and other early motor vehicles weren’t exactly safe and reliable but still solved a real problem for millions of customers.


I understand your objection. I also value reliability. I would say I'm a bit more conservative about joining startups for this reason (and the fact that I have to be prepared to be out of a job in a year or two if things don't work out). I would prefer not to spend my time frustrated and compromising on my values unless I'm willing to take the risk for the payout.

But that's the reality of startups: spending too much time on reliability when it's not needed is wasted effort. As a startup you can at least pull the "choose boring tech" lever and rely on the frameworks/libraries/services to care about reliability for you.


I've been tinkering with a project and one thing that stands out to me is cost. For example, I had a super simple architecture on AWS with 2 load balancers in front of 2 ECS services running on Fargate. Using the smallest possible instances that simple system was costing ~$3.50/day or ~$100/month. That included basic things like the Route53 entries, certificates, the load balancers, the ECS containers, AWS code pipelines running a couple of times a day, S3 buckets for logs. It did NOT include databases, SQS queues, Redis/cache services, monitoring. Just one web server and one api server.

I'm not a novice with AWS but it is easy to get comfortable spinning up idealized service architectures without calculating the costs. Even at meagerly funded startups it is no big deal to spend $1000/month on infra costs. But I am not really interested in forking out that kind of money for my weekend hobby projects.

It is just not financially reasonable for me to have multiple micro-services alongside multiple databases, caches, CDNs, queues, etc. Even when I consider having my own EC2 cluster backing my ECS services (probably a cost savings but I haven't investigated) I feel confident the cost will be significantly higher than just building a monolith and obviating the need for multiple load balancers, queues, multiple databases, etc.


You are probably correct. Did you see the article on how Amazon Prime moved away from microservices on AWS due to cost?

That's a little bit amusing.

https://www.primevideotech.com/video-streaming/scaling-up-th...


Why couldn't a suite of microservices and supporting datastores be deployed on a single instance (via docker compose, et al.)?


I suppose it could. I'm spoiled in that the places I have worked on micro-service architectures had either massive kubernetes clusters with 100s of services or were using ECS backed by EC2. I have an intermediate level understanding of those deployments, how to make them robust and scalable, etc.

After seeing the bill I even considered looking into a single instance deploy of multiple containers. Outside of running my own kubernetes cluster I was aware of some blog posts I had read on using docker compose for production. I have no production experience doing this so I don't know how to make it robust, secure, scalable, etc. I could learn.

But then I remembered a tip on optimization I learned while I was working in the video games industry. I was working with a guy who was focused on performance optimization. He told me that a common mistake junior engineers made when given optimization problems was that they would try to make the code faster as their first step. One way that is a mistake is to assume something is taking a lot of time without profiling first. But another mistake is to optimize a process that can be eliminated. He quipped: The fastest thing you can do is nothing at all.

So rather than optimize container deployments ... why not eliminate them? That honestly seems faster to me than trying to productionize a docker compose deployment just so I can say I am using micro-services. I mean, as this post and many others have mentioned - micro-services add complexity already. It seems more optimal to me to remove the micro-service complexity and save myself the hassle of fighting with docker compose or whatever other single-instance solution might exist.

And if I get solid ARR and I feel fine spending $1000/month on infrastructure, I'll probably just use a managed container hosting solution. I've split up a lot of apps into micro-services and in general it is very easy to do when needed.


> Splitting your codebase into multiple independent services comes with a lot of disadvantages: Each service has to be maintained and deployed and increases the boilerplate needed for shared data structures

I think the author conflated "separate and isolated services" with "separate and isolated source code"; they are not the same.


Stopped reading at chat server. Some FAANGs use IRC when all else fails.

There is no reason a startup should be building a chat server in 2023. Best reason given in the article was 'integrations'. Use an IRC bot to post messages to the channel, don't waste time writing yet another insecure chat client and server.


Isn't the chat server they talk about for something in-app? Like if your game offers player-to-player chat, or something similar? I don't think they are talking about tools for software engineers themselves.


But then the length of the article wouldn’t seem substantial and you wouldn’t bother reading it at all because you’d assume it wasn’t worth your time and the author wouldn’t get to tell you about whatever random app they’re building and shilling for. Key insight: the article isn’t about the tech, the article is about them trying to drive traffic to their app or whatever.


It looks like monoliths are officially back in fashion.


When I researched it (around 2017), most microservices success stories began with "our monolith couldn't scale beyond our 200 devs, 300,000 clients and 150 million in revenue per year, so we had to do something".

So, how did you manage to go from nothing to 200 devs/300000 clients/150M of revenue? Exactly, with a monolith.


Getting to market so you can get the next round of funding or start making a profit is always in fashion.

Whether you did it perfectly right once you get there is the correct "technical debt" problem to have as a burden.


"We're very worried about tech debt. If we don't get this perfect now, we'll forever be flailing around trying to make it work even as other teams add worse baggage to the wrong pattern you laid down. We'll never have time to go back and fix it."

-- No one ever.


This made me think of Twitter's "fail whale". And then I remembered that mess of spaghetti PHP sold for 44 billion dollars and now I'm rethinking my life.


I don’t think they were ever “out of fashion”, it’s just that the people pushing microservices were extremely vocal about it.


Indeed. I never understood how they got synonymous to "big ball of mud". When I think of a monolith, I think of something heavy with a very simple structure like the Washington Monument.


The older I get, the more I understand my seniors.


Welcome to seniorhood. Also get out of my lawn.


I do worry the pendulum will swing too far the other way. Having a separate Lambda function for every web request "because microservices" sucks, but having a single codebase worked on by 200+ engineers also comes with a lot of problems.


Exactly, I do appreciate microservices, but not for the technical aspect.

They shine in large code bases where at least a handful of teams are responsible for some services and not others.

So you don’t have to get into their backlog too much.


Have a folder/subproject per team?


This helps, but doesn't fully solve the problem. You still end up using the same runtime, so it's not like the teams are fully independent. The classic problem is you have a /Team1 and /Team2 folder (or library that gets imported, doesn't matter). Team1 needs to upgrade to pandas 2. They try, but wait, Team2's code depends on Tensorflow which depends on pandas 1.x. You are now in dependency hell[0] and need to do something unpleasant, whether that's forcibly updating everything in that pandas/TF/etc dependency chain, which can be a long and painful project in large codebases, if it gets done at all. Or doing JAR shading or fork-and-rename to be able to support both pandas==1.x and pandas==2 in the same runtime, which may not even work in pandas' case. Or the most common thing, which is to just give up and stick with comically outdated libraries and runtimes for way too long. Hello, Java 6/7.

[0] https://en.wikipedia.org/wiki/Dependency_hell


No suggestion will fully solve all the problems. Are you looking for silver bullets in HN comments? :)

I would rather deal with folder per team in same repo than services spread across multiple repos that still depend on common libraries/utilities/databases, which has happened at most jobs I've had, and it's a mess.


lol

I work on a single codebase with 2000+ engineers


I wish blogger driven development would go out of fashion, but I guess it never will. People need to be engineers. Stop trying to figure out how to solve your problem by copying what others have done to solve theirs.


This. It's hard to manage a dozen services with only 1-2 devs - especially considering you're solving a problem you don't have yet, and by the time you do, you'll have more devs to handle/justify the complexity.


Let's start with the assumption that what you are doing at a hypothetical startup is already Pareto efficient. That means that if you try to improve your architecture in one way, it will mean a trade-off against something else. The most common trade-off like this is how long it takes to make a change to the product vs. how scalable the architecture is. Another might be the total effort to make a change vs. the ability for multiple large teams to work on the product in parallel.

If you are a startup and you have not had at least one phase of explosive growth which proves your market fit exists and you have found it, then you should always optimize for a solution that is just 'good enough' in every other way, but is the fastest possible to build and ship changes. You will die from not finding market fit, and if you find it you can turn your attention to scaling the product, or whatever else you have to solve. Don't 'design your data architecture' until you have to.

And for the love of god, never use microservices at a company that has fewer than 500 software engineers. Microservices are *BAD* for everything except allowing lots of people to work together. They are the most complicated, least efficient possible way to build software, except where you have no other choice, which you almost always do.


There is a lot of information out there on how to build software for enterprise systems. If you are designing a system for a startup, a lot of these patterns and techniques simply don't work well. High levels of uncertainty, the need for maximum flexibility and a quick pace as well as serious restrictions in (wo-)manpower are challenges that are often unique to startups. On the other hand, startups can make compromises that enterprises can't make.

Here is how we did it.


This still feels super complicated... idk I just used Sveltekit + Airtable and Notion as CMS (cached with Cloudflare). Deployed on Vercel's free tier. Cloudflare workers for certain functions (like grabbing stuff from Notion), and now FastAPI / Fly.io for langchain stuff. And that's it. Of course, we're not anywhere high # of users or profitable, so this might all just fall apart tomorrow.


What’re you building?


Really, the question here is "do you build a general-purpose solution or one customized to your use case?"

I'm not sure where things are these days on the design spectrum, but I think there's still a bias towards "reusable code" and "abstraction."

But why? If it's code for your MVP, the future is uncertain, and time is more important. Are you really going to reuse this? You're probably going to rewrite it in chunks. Design it for that.

One thing to watch out for in a "shared" architecture is update hell, in the sense that you have so many dependencies that you're basically forced to update everything all at once because of the way you've structured things. That's a fucking nightmare, especially if the various parts have different update latencies. Your app needs approval from the app store, so if it shares something with the back end, you literally need to trigger your backend update when app approval occurs. And you have to ensure all your clients update at the same time, which is unlikely.


> Auth services, obviously

Isn't it simpler to do plain cookie-based session-token authentication at the beginning?


From my point of view, yes, it is. And in most cases there is nothing wrong with it. Just not hyped. Probably the only real reason to use something like OAuth 2.0/OIDC is single sign-on. (BTW, probably all SSO providers use cookies and sessions to maintain their own user state.)
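For the simple case it really is just a page of code. A minimal sketch with Go's net/http (the in-memory store, hard-coded user id, and handler names are placeholders, not from the article):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"net/http"
	"sync"
)

// In-memory session store; a real app would use Redis or the database,
// but this is enough to show the idea.
var (
	mu       sync.Mutex
	sessions = map[string]string{} // session id -> user id
)

func newSessionID() string {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// login would normally check credentials first; here it just creates a session.
func login(w http.ResponseWriter, r *http.Request) {
	id := newSessionID()
	mu.Lock()
	sessions[id] = "user-123" // placeholder user id
	mu.Unlock()
	http.SetCookie(w, &http.Cookie{
		Name:     "session",
		Value:    id,
		Path:     "/",
		HttpOnly: true,
		Secure:   true,
		SameSite: http.SameSiteLaxMode,
	})
	w.Write([]byte("logged in\n"))
}

// me resolves the session cookie back to a user.
func me(w http.ResponseWriter, r *http.Request) {
	c, err := r.Cookie("session")
	if err != nil {
		http.Error(w, "not logged in", http.StatusUnauthorized)
		return
	}
	mu.Lock()
	user, ok := sessions[c.Value]
	mu.Unlock()
	if !ok {
		http.Error(w, "not logged in", http.StatusUnauthorized)
		return
	}
	w.Write([]byte("hello " + user + "\n"))
}

func main() {
	http.HandleFunc("/login", login)
	http.HandleFunc("/me", me)
	http.ListenAndServe(":8080", nil)
}
```

In production you'd add expiry and persist the sessions somewhere, but the shape stays the same.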


An external chat service can be a pain in the ass just like a microservice. If you want a lot of business rules integrated into that chat, you will have to keep data synced between your server and the chat provider's server. This can become a nightmare if you don't get it right.


> An external chat service can be a pain in the ass just like a microservice.

Of course, because it’s just someone else’s microservice. Socially, it’s nice to have someone else take care of it. But technically, it’s more opaque than even your own microservices.

> Depending if you want a lot of business rules integrated in that chat, you will have to keep data synced between your server and the chat's server.

Right, because both the chat and your core service share users. So the boundary criterion for microservices, i.e. being standalone, isn't there for chat, or for many other types of services.

From a technical perspective, these services would be better suited as libraries that require some integration work. However, that's not conducive to usage-based pricing, which is the snake inside the microservice oil bottle.


What are people using for auth these days?


Keycloak is good for most use cases. I heard that some teams failed with Cognito (AWS).

But for the current product I built my own server. I was tired of fighting some Keycloak specifics, like:

- No "stay logged in forever if you use the (web) application from time to time".
- Bugs around caching (especially in integration tests).
- No zero-downtime deployment (should be fixed soon, too late for me).
- Weaknesses around configuration. I used to use keycloak-config-cli, which is good enough for most use cases.
- Some missing functionality that needs to be implemented very often, like "accept ToS before the profile is created".

Having said that, Keycloak is a very good product and I haven't found a better free and open source product. But sometimes it is not the right choice.

By the way, I personally don't plan to outsource user data and credentials to a 3rd party service. Although Auth0 is very sexy.


I implemented it myself in Go using github.com/golang/oauth2 and github.com/golang-jwt/jwt.
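For anyone curious, the JWT half is roughly this (a minimal sketch assuming the v5 import path and HS256; the secret and helper names are just illustrative):

```go
package main

import (
	"fmt"
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// hmacSecret is a placeholder; in practice it would come from config/secrets.
var hmacSecret = []byte("change-me")

// issueToken creates a signed JWT for a user id with a one-hour expiry.
func issueToken(userID string) (string, error) {
	claims := jwt.MapClaims{
		"sub": userID,
		"iat": time.Now().Unix(),
		"exp": time.Now().Add(time.Hour).Unix(),
	}
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
	return token.SignedString(hmacSecret)
}

// verifyToken parses and validates a token, returning the "sub" claim.
func verifyToken(raw string) (string, error) {
	token, err := jwt.Parse(raw, func(t *jwt.Token) (interface{}, error) {
		// Reject tokens signed with an unexpected algorithm.
		if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
			return nil, fmt.Errorf("unexpected signing method %v", t.Header["alg"])
		}
		return hmacSecret, nil
	})
	if err != nil {
		return "", err
	}
	if !token.Valid {
		return "", fmt.Errorf("invalid token")
	}
	claims, ok := token.Claims.(jwt.MapClaims)
	if !ok {
		return "", fmt.Errorf("unexpected claims type")
	}
	sub, _ := claims["sub"].(string)
	return sub, nil
}

func main() {
	tok, _ := issueToken("user-123")
	uid, err := verifyToken(tok)
	fmt.Println(uid, err)
}
```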


The article mentions "Auth services, obviously". I'm curious, because to me it seems like people have mostly used open source things they include in their apps.


PocketBase, Supabase, Firebase… all your base are belong to auth.


I feel like the quickest I've ever moved and most innovative stuff I've ever done comes from doing it the "hacky" way. At our latest setup we're just relying on Framer (framer.com) to build all of our UI components using a free drawing canvas.

Then for more customization we have a setup where devs can just spin up a CodeSandbox link in the browser that imports the Framer component to add more interactivity (like custom hooks etc). Everything is deployed via CodeSandbox.

Devs/Designers/Anybody else just use CodeSandbox links, no editors. People responsible for content use these links to edit .json files inside the CodeSandbox repository to quickly add/edit things. New images can be uploaded via drag & drop, and when saving the file it becomes part of the git repo. No CMS needed.

We have a bunch of lock-ins with this setup (like if Framer/CodeSandbox goes down we're f**ed), but I feel like with anything else there will always be lock-in too (AWS, Vercel, some open-source project which is now dead and nobody can maintain anymore).

It's amazingly fast to work this way & we love it.


Interested in learning more about the codesandbox workflow. We use Webflow for the flexibility and abilities it gives a small team with some non-engineers, but I want to move off of it if I can find a solution that maintains some of the benefits while addressing some of the drawbacks. Is there any content you can point me to that inspired this?


What are the main drawbacks you see with Webflow?


Simple answer: don't build anything yourself if you can find something that fits 80% of your use case.


With one caveat: don't force a round peg into a square hole. If it works for 70% of your use case but the remaining 30% is mission-critical to your whole business and it doesn't really work for that, even with minor mods, be willing to find better ways.


Outsource peripheral technologies, build core technologies.


And this is why I use Oracle Apex lol


How is your app going (retention, user growth)? Is it only for private happenings or do you have public events in it?


A better title would be "How to f#ck up architecture in your startup". Sharing data models. Sharing a database. Good lord! This is NOT the way. Your service has an API; your service should only be called through that. Its client models are JSON or protobuf or <insert meta>. You don't need to share anything other than maybe a CI environment and a JWT token.
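Concretely, the client just defines its own minimal view of the response and decodes into it. A sketch (the URL and field names are made up):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// The consumer's own, minimal view of the user service's response.
// It declares only the fields this service actually uses; the producer
// keeps its own model on the other side of the API.
type User struct {
	ID    string `json:"id"`
	Email string `json:"email"`
}

func fetchUser(id string) (*User, error) {
	// Placeholder URL for the user service's public API.
	resp, err := http.Get("https://users.internal.example.com/v1/users/" + id)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("user service returned %d", resp.StatusCode)
	}
	var u User
	if err := json.NewDecoder(resp.Body).Decode(&u); err != nil {
		return nil, err
	}
	return &u, nil
}

func main() {
	u, err := fetchUser("123")
	fmt.Println(u, err)
}
```

Neither side imports a shared "core" library for this; the JSON contract is the only thing in common.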


Advice versa. Architecture designs, not the other way around.


Thanks. This was exactly the app I wanted to build :)


Conway's Law



