
Ask HN: What are some architectural decisions that improved your codebase? - happycoder97
Dear senior developers on HN,
What are some examples of design choices that helped you reduce the effort needed to change your code according to changes in requirements?
What are some of the architectural choices you made that made your codebase easier to work with?
======
ncmncm
Eliminate threads, queues, locks, buffer allocate & free, copying, system
calls, synchronous logging, file ops, dynamic memory allocation.

Replace with huge-page mapped ring buffers, independent processes, kernel-
bypass set-and-forget, buffer lap checks, file-mapped self-describing binary-
formatted stats, direct-mode disk block writes, caller-provided memory.

~~~
rramadass
You can't just tease us like that! I demand some details on every one of the
techniques you listed ;-)

>Replace with huge-page mapped ring buffers, independent processes, kernel-
bypass set-and-forget, buffer lap checks, file-mapped self-describing binary-
formatted stats, direct-mode disk block writes, caller-provided memory

Please elaborate.

~~~
ncmncm
Well, OK. All this is about high-throughput, low-latency systems.

The principle is decouple, decouple, decouple. Memory isn't just memory, it's
paged and mapped, and the mappings are in a small cache called the TLB, one
for each core. Each "hugetlb" page, 2MB or 1GB on x86, takes just one such
cache entry, so anything big, like buffers, should live in hugepages.

A ring buffer is a kind of queue with just a head, and one writer. Each new
item goes at the next place in the buffer, round-robin. A head pointer -- if
it's in shared memory, an index -- gets updated "atomically" to point to the
newest item. Downstream readers poll for updates to the head. New stuff
overwrites the oldest stuff, so downstream readers can look until it gets
overwritten, and can often avoid copying. They don't need to lock anything,
but need to check that the head hasn't swept in and overwritten what they
were looking at; that is called being lapped. It is their responsibility to
keep up, and prevent this.
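
A sketch of those mechanics, in TypeScript for concreteness (a
SharedArrayBuffer shared between Node worker_threads stands in for the
shared memory mapping; the layout and names are illustrative, not any
particular production design):

    // Layout: buf[0] = head (count of published entries); slots follow.
    const SLOTS = 1024;                  // entries in the ring
    const WORDS = 16;                    // 32-bit words per entry
    const sab = new SharedArrayBuffer(4 * (1 + SLOTS * WORDS));
    const buf = new Int32Array(sab);

    // Writer: fill the next slot round-robin, then publish atomically.
    function publish(entry: Int32Array): void {
      const seq = Atomics.load(buf, 0);
      buf.set(entry.subarray(0, WORDS), 1 + (seq % SLOTS) * WORDS);
      Atomics.store(buf, 0, seq + 1);    // readers poll this head index
    }

    // Reader: copy entry `seq`, then verify the writer has not lapped us.
    // No locks; it is the reader's job to keep up.
    function tryRead(seq: number): Int32Array | null {
      const base = 1 + (seq % SLOTS) * WORDS;
      const copy = buf.slice(base, base + WORDS);
      // If the head advanced a full lap while we copied, our slot may have
      // been overwritten mid-read: discard and skip ahead to catch up.
      return Atomics.load(buf, 0) - seq < SLOTS ? copy : null;
    }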

Because there is never any question where the next entry goes, hardware
devices understand ring buffers, and can be set to write to them whenever
there is data. Typically a proprietary library talks to a proprietary driver
to set this up, and then the hardware device runs free with no more
interaction. (io_uring, AF_XDP, libexanic, ef_vi, DPDK, PF_RING, netmap, etc.)

Usually the hardware ring buffer is pretty small, a few MB, so for high-rate
flows there might be cores dedicated to copying from it to one or more much,
much bigger ring buffers in shared, mapped memory. Typically, multiple
downstream readers watch for interesting traffic to show up on such a ring,
splitting the work out to multiple cores.

Threads famously interfere with one another, mainly when competing for locks;
but also, whenever they fool with the memory map, other threads may experience
TLB cache stalls. Separate programs are better isolated, and can be further
isolated by running on a dedicated core ("isolcpus", "nohz_full", and "taskset")
that is protected against the OS sticking other threads on it, or vectoring
interrupts to it. In extreme cases a core may offload its own RCU retirements,
or even not run any kernel code.

A unikernel may run on such a core, running a single program, so what it
thinks are system calls just call a static library. There is a lot of work
going on around variations of this theme -- exokernels, parakernels, etc.

Instead of getting the file system and buffer cache all mixed up in your
program, you can append to files with O_DIRECT writes, or store to mapped
memory and let the kernel expose it to other processes, and spool to disk,
asynchronously. A monitoring process can look at event counters in such memory
as they are updated in real time. It is generally better if the program
updating the counters also stores a generic description of them -- type, name,
a hierarchical structure that can be read out to a JSON record, periodically,
by a separate program. That might be written to a log and/or feed a status
dashboard. Thus, the code doing the work just updates memory words pointed to
from its working configuration, but doesn't ever need to format or write out
updates. If there is any actual text logging, it goes through another ring
buffer to a background logging process that, ideally, is also responsible for
formatting.
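
A minimal sketch of those counters in the same vein, with the counters in a
SharedArrayBuffer so a separate thread or process can watch them (the schema
names are made up):

    // The schema doubles as the self-description: field i of `counters`
    // is described by schema[i], so readers need no other knowledge.
    const schema = [
      { name: 'packets_in', type: 'counter' },
      { name: 'drops', type: 'counter' },
    ] as const;
    const counters =
      new BigUint64Array(new SharedArrayBuffer(8 * schema.length));

    // Hot path: one memory write, no formatting, no I/O.
    const bump = (i: number) => { counters[i]++; };

    // Cold path (ideally a separate process): format and emit periodically.
    setInterval(() => {
      const snapshot = Object.fromEntries(
        schema.map((f, i) => [f.name, counters[i].toString()]));
      console.log(JSON.stringify(snapshot));
    }, 1000);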

Memory management -- new and delete -- is a source of unpredictable delays.
Such allocations are always OK during startup, but often not after. A function
that needs memory, then, should use memory provided by its caller. The top
level can handle memory deterministically, pre-allocated or on the stack, with
a global view of program behavior.
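
A small sketch of the caller-provides-memory shape, in the spirit of
fs.read(fd, buffer, ...) (TextEncoder.encodeInto really does write into
existing memory; the names are otherwise made up):

    // The callee never allocates: it fills the buffer it is handed and
    // reports how much it wrote.
    function encodeInto(msg: string, out: Uint8Array): number {
      const { written } = new TextEncoder().encodeInto(msg, out);
      return written ?? 0;
    }

    // The top level owns the allocation, once, at startup.
    const scratch = new Uint8Array(64 * 1024);
    const n = encodeInto('hello', scratch);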

Using separate processes enables starting and stopping downstream processing
independently, and isolates crashes. Ring buffers being read are always mapped
read-only, so a crashed reader cannot corrupt any shared state.

~~~
non-entity
This is all fascinating. Do you have any recommended reading on designing said
high-throughput, low-latency systems?

~~~
rramadass
You might find the following useful.

* Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices - [https://www.amazon.com/Network-Algorithmics-Interdisciplinar...](https://www.amazon.com/Network-Algorithmics-Interdisciplinary-Designing-Networking-dp-0120884771/dp/0120884771/ref=mt_hardcover?_encoding=UTF8&me=&qid=)

* See MIPS Run - [https://www.amazon.com/Morgan-Kaufmann-Computer-Architecture...](https://www.amazon.com/Morgan-Kaufmann-Computer-Architecture-Design/dp/0120884216/ref=sr_1_1?keywords=see+miups+run&qid=1568954579&s=books&sr=1-1-spell)

* UNIX Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers - [https://www.amazon.com/UNIX-Systems-Modern-Architectures-Mul...](https://www.amazon.com/UNIX-Systems-Modern-Architectures-Multiprocessing/dp/0201633388/ref=sr_1_1?keywords=curt+unix+systems&qid=1568954680&s=books&sr=1-1)

* Advanced UNIX Programming - [https://www.amazon.com/Advanced-UNIX-Programming-Marc-Rochki...](https://www.amazon.com/Advanced-UNIX-Programming-Marc-Rochkind/dp/0131411543/ref=sr_1_2?crid=2FOM6IA8PYE2V&keywords=advanced+unix+programming&qid=1568954735&s=books&sprefix=Advcnced+unix+pro%2Cstripbooks-intl-ship%2C373&sr=1-2)

------
benologist
I have been making my test suite emit structured data for the API tests,
which is then used to document the API. This eliminated the margin for error
in manually keeping the API documentation up to date. It also improved the
test coverage a lot, as complete coverage is required for the documentation
to be complete. It looks great too -

[https://github.com/userdashboard/organizations/blob/master/a...](https://github.com/userdashboard/organizations/blob/master/api.txt)
derived from
[https://github.com/userdashboard/organizations/blob/master/t...](https://github.com/userdashboard/organizations/blob/master/tests.txt#L244)
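
The pattern is roughly this (a hedged sketch; the real projects above have
their own record format and helper names):

    // Each API test goes through a helper that records the exchange as
    // structured data; the docs are generated from those records, so
    // documentation coverage can only be as complete as test coverage.
    import { appendFileSync } from 'node:fs';

    async function documentedRequest(method: string, url: string, body?: object) {
      const res = await fetch(url, { method, body: body && JSON.stringify(body) });
      const response = await res.json();
      appendFileSync('api-examples.ndjson', JSON.stringify(
        { method, url, body, status: res.status, response }) + '\n');
      return response;
    }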

Another thing that helped was moving all my UI page tests to Puppeteer, a
NodeJS API for driving Chrome and, tentatively, Firefox. This let me
automatically generate screenshots for my entire UI to publish as
documentation, while simultaneously testing the responsive design on
different devices, which surfaced many issues.

[https://userdashboard.github.io/administrators/stripe-subscr...](https://userdashboard.github.io/administrators/stripe-subscriptions/deleting-subscription-at-period-end) generated by [https://github.com/userdashboard/userdashboard.github.io/blo...](https://github.com/userdashboard/userdashboard.github.io/blob/gh-pages/administrators/stripe-subscriptions/deleting-subscription-at-period-end.js)
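
A minimal sketch of that screenshot loop (the URLs and viewports here are
placeholders; setViewport and screenshot are real Puppeteer calls):

    import puppeteer from 'puppeteer';

    const viewports = [
      { name: 'desktop', width: 1280, height: 800, isMobile: false },
      { name: 'phone', width: 375, height: 667, isMobile: true },
    ];

    async function capture(url: string, slug: string): Promise<void> {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      for (const v of viewports) {
        // Each viewport exercises the responsive design as a side effect.
        await page.setViewport({ width: v.width, height: v.height, isMobile: v.isMobile });
        await page.goto(url, { waitUntil: 'networkidle0' });
        await page.screenshot({ path: `${slug}-${v.name}.png`, fullPage: true });
      }
      await browser.close();
    }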

~~~
2rsf
> moving all my UI page tests to Puppeteer

Why did you choose Puppeteer over other options like Protractor/Selenium or
Test Cafe ?

~~~
benologist
I already had some familiarity with Puppeteer, but mostly it's just because
Puppeteer is NodeJS and my project is NodeJS, so they work together without
extra setup steps, configuration, etc.

------
gitgud
Stateless components, or as I like to call them _dumb components_.

We found it much easier to reason about logic in the codebase by having many
small _dumb_ components which didn't have any state or complex
functionality. These would be controlled by a few smart parent components
that coordinated them.

The result was a lot cleaner. We implemented this on a Web client, but I
think the concept would work well in any codebase... _dumb classes are
easier to understand_.
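
A minimal React/TypeScript sketch of the split (component names are
hypothetical):

    import { useState } from 'react';

    // Dumb: no state, no side effects -- output depends only on props.
    function TodoItem(props: { text: string; done: boolean; onToggle: () => void }) {
      return (
        <li onClick={props.onToggle}>
          {props.done ? <s>{props.text}</s> : props.text}
        </li>
      );
    }

    // Smart: owns the state and coordinates the dumb components.
    function TodoList() {
      const [items, setItems] = useState([{ text: 'Ship it', done: false }]);
      const toggle = (i: number) => setItems(
        items.map((it, j) => (j === i ? { ...it, done: !it.done } : it)));
      return (
        <ul>
          {items.map((item, i) => (
            <TodoItem key={i} {...item} onToggle={() => toggle(i)} />
          ))}
        </ul>
      );
    }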

~~~
rumanator
> Stateless components, or as I like to call them dumb components.

You mean like pure functions?

[https://en.m.wikipedia.org/wiki/Pure_function](https://en.m.wikipedia.org/wiki/Pure_function)

~~~
gitgud
Yes, similar; I'm talking about reactive UI components though (as used in
React, Vue, Angular, etc.). They're a class that might have many functions,
but in this case all the component's functions would be pure functions.

Perhaps a better term would be _pure components_?

~~~
BubRoss
What does that mean? Something that is read only?

------
Someone1234
Immutable JavaScript/CSS/Blobs/etc.

We have a very typical [web] codebase, server-side code (e.g. business rules,
database access, etc), server-side Html generation, and
JavaScript/CSS/Images/Fonts/etc stored elsewhere. Two repositories (content
and code).

So the obvious question is: How do you manage deployment? Two repositories
means two deployments, which means potential timing problems/issues/rollback
difficulties.

The solution we use is painfully simple: We define the JavaScript/CSS/etc as
immutable (cannot edit, cannot delete) and version it. If you want to bug fix
example.js then it becomes example.js 1.0.1, 1.0.2, etc. You then need to
re-point to the new version. The old versions will still exist, and
old/outdated references will continue to function.

This also allows our cache policy to be aggressive. We don't have to worry
about browsers or intermediate proxies caching our resources for "too" long.
We've never found editing files in-place, regardless of cache policy, to be
reliable anyway. Some browsers seemingly ignore it (Chrome!).

We always deploy the "content" repository ahead of the "code" repository. But
if we needed to roll back "code," it wouldn't matter, because the old versions
of "content" were never deleted or altered.

There's never a situation where we'd roll back "content," because you only
add; you don't edit or delete. If you added a bad version/bug, just bump the
version number and add the fix (or reference the older version until a fix is
in "content"; the old version will still be there).

~~~
mattmanser
A much easier way than this is to append a hash of the file instead of
'versioning' it. Some people add it as a query string, some add it into the
filename.

Been doing this for years with infinite (well, practically) cache settings.

These days it's built into most js compression tools afaik.
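
A sketch of the idea in plain Node (bundlers like webpack do the same thing
via output filenames such as "[name].[contenthash].js"):

    import { createHash } from 'node:crypto';
    import { copyFileSync, readFileSync } from 'node:fs';

    // The name changes if and only if the bytes change, so the cache can
    // be infinite: a new build yields a new URL automatically.
    function fingerprint(path: string): string {
      const hash = createHash('sha256').update(readFileSync(path)).digest('hex');
      const named = path.replace(/\.js$/, `.${hash.slice(0, 8)}.js`);
      copyFileSync(path, named);   // e.g. app.js -> app.3f9c2a1b.js
      return named;
    }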

~~~
Someone1234
That doesn't work, because no one file exists in isolation. If you're using
version 32.14 of this, you want version 32.14 of that, and of this other
thing. Versioned directories make this kind of grouping natural and easy;
co-mingled hashes do not (and you could do both, but then you have the
downsides of both and no real upsides).

Plus semantic versioning can help cross-team communication, there's no human
understanding of raw hashes.

~~~
diggan
You don't necessarily need to use a hash based on randomness. Either using
the git commit as the version/hash or a hash based on the content of the file
itself works.

As long as your entrypoint and its references are versioned, everything
follows from that. So if I load version A of index.html, it also points to
version A of the scripts/styles. If you load version B, you get version B of
the scripts/styles, since everything is versioned the same way.

~~~
mattmanser
Git commit is a bad solution; you want to use the file hash so you can have
multiple bundles that version automatically. Also, if you push a change to
even some comments or something not related to code, your bundle will change.

Often you have a bundle of library code that you rarely ever push changes to,
and you don't want your clients to re-download it each time you make minor
changes.

------
cbanek
Simplify, simplify, simplify. Don't make tomorrow's problem today's
complexity.

Get rid of any configuration options that no one uses. These things sometimes
get passed around in flags to deep levels and can make the logic complicated.
Don't add a configuration option until you are sitting at someone's desk and
see that they need it and why. Only add the bare minimum. Same for APIs,
buttons, and features.

------
tnolet
Don’t use Kubernetes or Microservices. Solves most problems.

Not even being sarcastic.

~~~
rumanator
In your opinion what's wrong with Kubernetes or microservices?

~~~
quickthrower2
Not the OP, but I'd say only use Kubernetes if you have the time for the team
to dedicate to learning that technology and its mental model.

~~~
pbar
From a developer's point of view, one should not need a mental model of
Kubernetes; the standard 12-factor pattern should be it. Otherwise, the
infrastructure and the app are strongly coupled.

------
weitzj
For me, the biggest principles for letting good code emerge are:

Inversion of control (pass in your dependencies), keep your architecture
orthogonal (make it composable and really think about whether you need to
inherit things rather than delegate them), code-generate the transport API
via gRPC, and focus only on the business logic implementation.
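
A minimal TypeScript sketch of the first principle (names are hypothetical):
the dependency comes in through the constructor, so tests can hand in fakes:

    interface Mailer {
      send(to: string, body: string): Promise<void>;
    }

    class SignupService {
      // Injected, not constructed inside: inversion of control.
      constructor(private mailer: Mailer) {}

      async signUp(email: string): Promise<void> {
        await this.mailer.send(email, 'Welcome!');
      }
    }

    // In a test: new SignupService({ send: async () => {} })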

~~~
happycoder97
What is orthogonal architecture?

~~~
shoo
i would assume: the different parts of the architecture are independent, and
each addresses a completely distinct, non-overlapping responsibility, i.e. you
can add, remove, or adjust each part without interactions between the parts

~~~
happycoder97
Could you give some guidelines to keep an emerging architecture orthogonal?

------
valand
Keep states where they are needed.

Make most things immutable.

Prefer composition to extension.

Treat Types as contracts.

Sandbox "unsafe" codes (codes that interacts with network, file storage, etc).

Eliminate side effects.

Eliminate premature abstractions.

Prefer explicit over implicit.

Keep components functional.

Prioritize semantic correctness and readability.

Use events for inter-component communication when those components don't
need to care about each other's functionality.

Think protocol over data.

~~~
croo
I nodded along except for the last one. What do you mean by thinking protocol
over data?

~~~
valand
I meant: when creating an endpoint, a component, a feature, or a data
structure, I treat it like a protocol. Protocols enable other components to
do more things while staying robust and efficient. It must be, to a certain
degree, extensible and forward compatible. With that mindset, you're likely
to avoid more trouble in the future, while indirectly enforcing the
open-closed principle at every level.
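
A hedged TypeScript sketch of a message treated as a protocol rather than a
bare data dump (the field names are hypothetical):

    // The message carries a version and tolerates unknown fields, so old
    // and new components can interoperate: extension without modification.
    interface OrderPlaced {
      type: 'order.placed';
      version: number;               // 1 at the time of writing
      orderId: string;
      [extension: string]: unknown;  // consumers ignore what they don't know
    }

    function handle(msg: OrderPlaced): void {
      // Forward compatible: act on the fields we understand, skip the rest.
      if (msg.version > 1) console.warn('newer producer; best-effort handling');
      console.log(`order ${msg.orderId} placed`);
    }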

------
scarface74
Ripping out as much of the home-grown code for cross-cutting concerns
(logging, database access, retry logic, etc.) that previous developers used
as possible, and using third-party packages instead.

------
hellwd
There are many decisions that you can make to improve the quality of your
codebase. There is no recipe that you can follow, because each application is
different, but there are some general things that can make your life easier.

Here are some tips that helped me a lot:

- Keep your solution and tech stack as simple as possible

- Mark the parts that change often and try to make them configurable (when
something is configurable you don't need to change code and re-deploy for
every single adjustment)

- Make sure you have good and readable logging

- Use DI

- Separate your application's core logic from the infrastructure part (DAL,
network communication, log provider, file readers/parsers and similar)

- Keep your functions/methods clean and without side effects

- Methods should return something (try to minimize the usage of "void"
methods)

- Split each feature or functionality you are working on into small pieces
and compose the final thing from them

- Be disciplined about your naming conventions and code style

------
tnolet
One of the things in the Clean Code book really helps.

Methods and functions should be around 5 lines.

Doesn’t always work but is great to aim for.

~~~
caseymarquis
Is there an article on this? I feel like I must be missing some context, as 5
lines seems short enough to be counterproductive.
~~~
cbanek
I think 5 lines is pretty short but good. At the very longest, I like a
function to fit on one screen of text so I don't have to scroll to see the
entire function.

~~~
juangacovas
I like to use the curly-brace jump shortcut and interactively debug my code
and others' code, to avoid being too picky about this stuff, unless you have
to stick to an 80x24 kernel surface ;P

------
pryelluw
Small and simple over big and complex. Plus some functional patterns and a
lot of YAGNI-based thinking.

------
pearjuice
Honestly, tests, and by extension testable code. The number of enterprises
processing tens to hundreds of millions of dollars (either business value or
actual revenue) without tests for vital parts of their software is mind
blowing. You sometimes cannot fathom how they are comfortable changing a line
without having tests to back them up. They F5 a page or recompile the server
software, redeploy, click through it, and "yup it works, let's ship" -- and
then a few days later find out it broke a CSV import for the external
warehouse inventory system which runs once a week, because they removed a
dash between SKU and title for better SEO in the online catalog. Oops, good
luck finding out where the problem is, because you have zero integration
tests. A few million down the drain because the import division couldn't
possibly know what to forecast on due to missing stock data. And this is no
exception: dumb bugs and malfunctions keep occurring because developers don't
write tests.

You can start an entire business in consulting on test automation and you
would never run out of work.

------
oftenwrong
YAGNI, KISS

Choose Boring Technology
[http://boringtechnology.club/](http://boringtechnology.club/)

Build your system to be level-triggered as much as possible. Its default mode
should be reconciliation: examining its current state and transforming that
into the desired state, especially if the current state is "something went
wrong". Build in dumb reconciliation before worrying about making it more
real-time.
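
A minimal sketch of such a reconciliation loop (all names hypothetical): the
same code path handles first-time setup, drift, and recovery from failure:

    type Spec = { replicas: number };

    async function reconcile(
      desired: Map<string, Spec>,
      current: Map<string, Spec>,
      apply: (id: string, spec: Spec | null) => Promise<void>,
    ): Promise<void> {
      for (const [id, spec] of desired) {
        // Missing or drifted: converge toward the desired state.
        if (current.get(id)?.replicas !== spec.replicas) await apply(id, spec);
      }
      for (const id of current.keys()) {
        if (!desired.has(id)) await apply(id, null);  // unwanted: remove
      }
    }

    // Level-triggered: run on a timer, not only on change events, e.g.
    // setInterval(() => runReconcile().catch(console.error), 30_000);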

The fewer moving parts, the better. Don't go multi-service architecture until
you absolutely have to (see YAGNI, KISS).

Keep your business logic contained, separated from everything else, in ONE
place. If I open up your business logic code, I shouldn't see anything about
persistence, the network, etc. Similarly, I shouldn't find any business logic
in your other concerns. The business logic interacts with other concerns via
abstractions.
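
A minimal TypeScript sketch of that boundary (Invoice, InvoiceStore, and
Clock are hypothetical names):

    interface Invoice { id: string; total: number; dueBy: Date; paid: boolean }

    // Ports: the only view of persistence and time the core is allowed.
    interface InvoiceStore {
      byId(id: string): Promise<Invoice | undefined>;
      save(invoice: Invoice): Promise<void>;
    }
    interface Clock { now(): Date }

    // Pure business rule: no database driver, no HTTP client in sight.
    export async function isOverdue(
      store: InvoiceStore, clock: Clock, id: string): Promise<boolean> {
      const invoice = await store.byId(id);
      return !!invoice && !invoice.paid && invoice.dueBy < clock.now();
    }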

Be unforgiving when it comes to correctness guarantees. Use the type system as
much as possible to make errors impossible.
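
For example, a discriminated union makes "loaded but no data" simply
unrepresentable, so the compiler rules the error out (a sketch, not tied to
any particular framework):

    type Fetch<T> =
      | { state: 'loading' }
      | { state: 'loaded'; data: T }
      | { state: 'failed'; error: Error };

    function render(r: Fetch<string[]>): string {
      switch (r.state) {
        case 'loading': return 'spinner';
        case 'loaded': return r.data.join(', ');  // data exists only here
        case 'failed': return r.error.message;
      }
    }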

------
diehunde
Very basic ones:

- Strong test suite

- Delete duplication as much as possible, using techniques such as method
extraction and keeping classes and methods small.

------
he0001
Reduce the number of tools, use them to the max, and know those tools
intimately. When they fall short, consider a new tool.

------
happycoder97
I have been organizing all of my projects so far using a layered
architecture. Recently I read this article about layered architecture:
[https://dzone.com/articles/reevaluating-the-layered-architec...](https://dzone.com/articles/reevaluating-the-layered-architecture)
Now I feel that layered architecture was a poor choice for many of my
previous projects.

So I think, instead of layering, I should for example put everything that
needs access to a User entity's internal fields in the User class itself.

For example: User.getProfileAsJson() // for sending out to frontend

Now I am confused regarding where to put methods that involve two entities.
Suppose there is an Event entity which represents some online event that a
User can register for.

Where is the best place to put getEventsRegisteredByUser()?

~~~
neRok
I'm not a pro and don't do this for a living, but here are my 2 cents...

I recently started a large project, so did some reading on
architectures/patterns like DDD and Clean Architecture. One of the most
important points I took from both was to clearly define your domain. But
based upon past experiences, I have developed a dislike for "heavy" objects
like those used by DDD and ORMs in general. I like to keep things simple,
sort of "functional" in nature -- what your link refers to as anemic objects.
So I have stuck to the SOLID principles, and in particular the D = dependency
inversion (done via dependency injection). I've also taken a fancy to
RPC-style code, so that influences my code. BTW, Clean Architecture isn't too
different from the image of layered architecture in your link; it's more of
an evolution really.

So here is how I apply my concepts to your problems...

Users want to know the Events they are registered for, and Events want to
know the registered Users. You have a circular dependency! But really, the
problem to me is that you haven't expanded your domain enough. I think you
should have a third entity, something like UserEventRegistrations. Now Users
and Events don't depend on each other, and UserEventRegistrations depends
upon them both. No circle!

As per my liking for anemic objects, I would have a User model object to hold
properties like name, and a UserRepository for doing CRUD-style operations
with methods like GetByID() that return a User instance. The same would apply
for Event, and something similar for UserEventRegistrations, except its
repository would have a dependency on the User and Event repositories so that
it can offer methods like GetEventsByUserID().
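
A hedged TypeScript sketch of those shapes (the method names mirror the ones
above):

    interface User { id: string; name: string }
    interface Event { id: string; title: string }

    interface UserRepository {
      getByID(id: string): Promise<User | undefined>;
    }
    interface EventRepository {
      getByID(id: string): Promise<Event | undefined>;
    }

    // Depends on the other two repositories; they never depend on it or
    // on each other. No circle.
    interface UserEventRegistrationRepository {
      register(userID: string, eventID: string): Promise<void>;
      getEventsByUserID(userID: string): Promise<Event[]>;
    }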

Then to apply this in Clean-Arch style, I leverage whatever statically typed
language I am using (Go, TypeScript, etc) to implement interfaces. So I define
the domain layer as the model objects, and interfaces for the repositories.
For the persistence layer, I would create a concrete implementation of the
repository interfaces, and they would return instances of the domain model
objects. Then for presentation, I would create a layer that expects to be
dependency-injected with a concrete implementation of the repository
interface. So my layers are separate, based upon the "contract" that is my
domain layer.

Now your example for User.getProfileAsJson() is vague in meaning, but if you
wanted to return the data in a different format than the domain model, you
could have another layer on the presentation side of the equation that handles
this. It would utilise the repositories to build what you need. So your
"Profile" might be a single JSON payload containing a User with their Events.
Your function would do UserRepo.GetByID(), check you have a User, then do
UserEventRegistrationsRepo.GetEventsByUserID(User.ID). Then it would stick it
in your payload, and voila.

I've not completed my project yet, but I've implemented some functionality in
all layers (a Go server pulling data from an RDBMS and sending it to a
TypeScript UI), and it seems to be working well. I've also noticed after the
fact that my domain layer ends up looking exactly like a protocol buffers
definition, so maybe just use those.

------
matt_s
Specifically to allow easier changes: abstraction, encapsulation and
separation of concerns.

An example: if you have a module that calls a REST API to get/put something
(say, time sheets for your invoicing app), make that its own module that is
testable.

Create internal TimeSheet data structures that you pass to/from that module.
The core functionality of your app should be implemented using the TimeSheet
data structures and you can have tests that use those and then separate tests
around calling an API.

A new customer comes along and says they want to send you CSV files via SFTP
(yuck, but they've got money). You just have to write a new interface that
handles exchanging those files and gets them into your TimeSheet data
structures; the core of your app should remain unchanged.
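
A minimal sketch of that boundary (TimeSheet and TimeSheetSource are
hypothetical names): the REST and CSV-over-SFTP details live in
interchangeable adapters behind one interface:

    interface TimeSheet { employeeId: string; hours: number; weekOf: Date }

    interface TimeSheetSource {
      fetch(since: Date): Promise<TimeSheet[]>;
    }

    class RestTimeSheetSource implements TimeSheetSource {
      constructor(private baseUrl: string) {}
      async fetch(since: Date): Promise<TimeSheet[]> {
        const res = await fetch(`${this.baseUrl}/timesheets?since=${since.toISOString()}`);
        return res.json();  // a real adapter would validate and map here
      }
    }

    // A CsvSftpTimeSheetSource would implement the same interface; the
    // invoicing core is written against TimeSheetSource and never changes.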

------
srijanshetty
One controversial opinion: a monorepo, which is ideal for small teams
iterating really fast. The other one was figuring out the 12-factor app by
serendipity, as we were focusing on keeping our operations simple.

~~~
pbar
Make sure to take great care of the monorepo, and break it up _before_ that
becomes impossible but necessary.

~~~
srijanshetty
I completely agree, but we're constrained to two developers and won't hire
anytime soon. So the monorepo seems to be working great for us.

------
throwaway1954
Write proper git commit messages.[1]

A few examples here.[2]

1. [https://drewdevault.com/2019/02/25/Using-git-with-discipline...](https://drewdevault.com/2019/02/25/Using-git-with-discipline.html)

2. [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/)

------
afpx
Eliminate “broken windows”

[https://en.m.wikipedia.org/wiki/Broken_windows_theory](https://en.m.wikipedia.org/wiki/Broken_windows_theory)

~~~
Noumenon72
What, were people swearing in the comments and putting Easter Eggs in the
releases?

~~~
reilly3000
I'm assuming they mean that a clean and orderly codebase invites developers
to commit clean code. It's hard to stay motivated to make well-formed units
when there are dumpster fires everywhere you look.

------
ndreipoppa
For an event-driven app (a poker game) built with React, redux and
redux-saga, we deleted almost the entire project (100k lines of code) because
our logic was tightly coupled with the sagas and reducers. Now we have moved
our logic inside the state selectors (we use reselect), the reducers are
dumb, and sagas are only used to listen for/dispatch async actions.
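
A minimal reselect sketch of that shape (the state fields are hypothetical):
the reducer stores raw facts, and derived game logic lives in memoized
selectors:

    import { createSelector } from 'reselect';

    interface Player { id: string; chips: number; folded: boolean }
    interface GameState { players: Player[]; pot: number }

    const selectPlayers = (s: GameState) => s.players;

    // Recomputed only when s.players changes, not on every dispatch.
    export const selectActivePlayers = createSelector(
      [selectPlayers],
      (players) => players.filter((p) => !p.folded),
    );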

------
dustingetz
Clojure

~~~
xcubic
Where to start?

------
CameronBarre
Structuring software as a series of processes separated by queues, in the
small and in the large.

