
Ask HN: As a junior developer, how do I learn to better architect software? - greyskull
I don't quite know how to ask this, so bear with me. I feel like there's a large gap, a lack of direction, between the common advice - DRY, KISS, design patterns, clean code, etc. - and understanding how to tackle actual architecture problems.

On the small team I work on, we have various problems that are affecting our ability to maintain and extend our code and, more importantly, hindering our ability to scale.

An example: we've been using MySQL basically as a data processing platform, transforming data overnight (and storing it transformed in separate tables) so we can do even more aggregation for requests during business hours, albeit more efficiently. I want to simplify the role of the database, move the processing logic up, and get away from long overnight jobs. However, we would lose the benefit of having our pre-processed data resting in the database, waiting to be aggregated at a moment's notice; doing both the processing and the aggregation on the fly would be far too expensive. How do I research how to deal with that?

This also extends to architecting our code. The front-end especially is rough, the usual story of pent-up technical debt. Our web interface has gotten more and more complicated, and while we can get very far by just revisiting it with experienced eyes, there are still glaring problems with complex interactions of components.

I'm pushing (and succeeding) at moving towards efforts to clean up the mess that has accumulated (maybe even a whole rewrite, woo!), but I want to be able to really pull my weight and contribute to the process. I'm a recent graduate, and my design experience is relatively limited; but considering the number of resources available to me to learn how to become a better _developer_, I'm missing what's available to become a better _architect_ (beyond the usual "it comes with time"). I hope that makes sense.
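
One concrete direction for the overnight-batch problem above is incremental (streaming) aggregation: instead of recomputing everything at night, keep running aggregates that are updated as each record arrives, so reads stay cheap without a long batch window. A minimal Python sketch; all names here (`RunningAggregate`, the sample keys) are hypothetical, not taken from any particular system:

```python
from collections import defaultdict

class RunningAggregate:
    """Maintain per-key sums and counts incrementally, so a read is O(1)
    instead of re-scanning the raw rows in a nightly batch."""
    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def ingest(self, key, value):
        # Called once per incoming record, e.g. from a queue consumer.
        self.sums[key] += value
        self.counts[key] += 1

    def average(self, key):
        return self.sums[key] / self.counts[key]

agg = RunningAggregate()
for key, value in [("widgets", 10.0), ("widgets", 20.0), ("gears", 5.0)]:
    agg.ingest(key, value)
print(agg.average("widgets"))  # 15.0
```

In a real system the aggregates would live in a store rather than in memory, but the shape of the trade-off is the same: pay a small cost per write to avoid the big recompute.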
======
weddpros
Offering advice is difficult without more information... but I'll try. Though
maybe I'm talking more about systems architecture than software architecture.

A cutting-edge architecture often uses cutting-edge technologies. We don't
have many details about your system, but it feels like "big data" is your
keyword: NoSQL databases, map-reduce, and streaming/event systems are all
areas to investigate.

Apache has many projects related to big data, which may give you a broader
vision... and guide you. Be prepared for a paradigm shift: you can't apply old
recipes to new technologies. For example, believing Cassandra could replace
MySQL because it has "almost SQL" would be such a mistake: Cassandra requires
a different data model.

Try to understand how big data technologies could fit together for your
problem, and you'll gain valuable insight. Apply these technologies in a
successful project, and you'll gain experience.

As for your frontend, in terms of architecture, there's not much to be done
but to decouple it (with a REST API), if that's not already the case... It
could make a rewrite easier.

Sorry for such general advice. Maybe it's too basic to really help you.

~~~
Ace17
Software architecture is precisely about abstracting your application code
away from technology. Practising TDD is a good way to learn how to better
design your components, because all your components end up having two
clients: your application code and your test code. This will guide you toward
better component interfaces (and also make your code a lot more flexible!).
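
To make the "two clients" point concrete, here's a minimal Python sketch (the names `ReportService` and `fetch_totals` are invented for illustration): the same component is exercised by test code through an in-memory stand-in, and by application code through a real query function.

```python
class ReportService:
    """High-level component: depends on an abstract 'fetch' callable,
    not on any concrete database technology."""
    def __init__(self, fetch_totals):
        self.fetch_totals = fetch_totals

    def top_customer(self):
        totals = self.fetch_totals()
        return max(totals, key=totals.get)

# Client #1: the test, injecting an in-memory stand-in.
def test_top_customer():
    service = ReportService(lambda: {"acme": 120, "globex": 300})
    assert service.top_customer() == "globex"

# Client #2: production code would pass a real query function instead,
# e.g. ReportService(mysql_fetch_totals) -- hypothetical name.
test_top_customer()
```

Because the test forces the dependency to be injectable, the component can't quietly hard-wire itself to one database.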

~~~
weddpros
As for OP, I think his post mentions both systems architecture and software
architecture, although he doesn't make a distinction.

He cites scalability and maintainability as two problems linked to a single
definition of "architecture". One is more related to "systems architecture"
and the other to "software architecture"...

Also I'm not sure I agree 100% about "Software architecture precisely is about
abstracting your application code away from technology".

Say I want to go from MySQL and nightly batches to Cassandra and real-time
event processing: the code is unlikely to survive, because of the paradigm
shift.

Say I want to go from a monolithic Rails application to a Rails REST API
server + Angular front end. Same thing: most of the code will not survive.

A worthwhile change in systems architecture is probably one that some code
will not survive...

~~~
Ace17
OK, let me rephrase it. The better your software architecture is, the more
resistant to technology changes it will be. (It doesn't only apply to
technology changes, by the way.)

I understand your examples. When a technology change occurs, all code
directly depending on that technology doesn't "survive". If you make SQL
requests from your view code, the view code will probably need to be mostly
rewritten when you decide to change the database. But a good architecture
will not feature SQL requests from the view code. A good architecture will
prevent components from depending on things they don't need. A good
architecture will introduce layers to guarantee some degree of independence.
So when a technology change occurs, most of the code "survives".

It all boils down to managing dependencies: you don't want high-level code
(business rules, use cases) to depend on low-level code (storage, display).
High-level code is the reason why your application exists. It's independent
from low-level details such as "my storage system is a SQL database" or "my
webserver is Apache", or even "the code runs on MS Windows".

Porting is exactly this: changing some low-level concrete detail (OS,
middleware) while keeping the same logical behaviour (i.e. from the user's
point of view).
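
A minimal Python sketch of that dependency rule (the names `OrderStore` and `send_reminders` are hypothetical): the business rule depends only on an abstract boundary it owns, so swapping the storage implementation leaves it untouched.

```python
from abc import ABC, abstractmethod

class OrderStore(ABC):
    """Boundary owned by the high-level code; low-level modules implement it."""
    @abstractmethod
    def unpaid_orders(self): ...

def send_reminders(store: OrderStore):
    # Business rule: knows nothing about SQL, Apache, or the OS.
    return [f"reminder for order {o}" for o in store.unpaid_orders()]

class InMemoryStore(OrderStore):
    # A MySQLStore or CassandraStore (hypothetical) could be swapped in
    # without touching send_reminders -- that code "survives" the change.
    def unpaid_orders(self):
        return [101, 102]

print(send_reminders(InMemoryStore()))
```

The arrow of dependency points from the low-level detail up to the abstraction, never the other way around.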

~~~
weddpros
You can obtain some independence through software architecture, I agree...

But it's no panacea. We thought ORMs would shield us from database changes;
then NoSQL appeared. We thought 100 concurrent users were plenty and used one
thread per user, but now it's 100M and we need event loops and sharding. We
thought our server had one core; now it has 16, and we use hundreds of them.

OP's company thought using MySQL was a good idea, but maybe someday the
nights will not be long enough for the batches to run. Abstracting away MySQL
will not help.

Nowadays, the number of possible architectures has exploded. Moving from MySQL
to Postgres solves no hard problem. But moving from MySQL to a cluster of
Cassandra servers does, and it's hard. Just like moving from night batches to
real time.

I'm biased of course, because I'm working on projects where "old tools" can't
be used because of scale. For "old style projects", old tools are still
usable, and little benefit can be gained from newest systems architectures.
But sometimes, the real "business problem" is scale, not complex business
rules.

Last word about software architecture: I'm the one who introduced my employer
to modular development, TDD, and functional programming (among many others).
I'm convinced good software architecture is of paramount importance, but it
can't solve most paradigm shifts alone.

~~~
vezzy-fnord
What do you mean by "then NoSQL appeared"? It has always been around, it just
became an extremely peculiar buzzword for a while.

~~~
weddpros
"It has always been around": not in 2000... I'm 43 :-)

~~~
vezzy-fnord
So what's this, then:
[https://en.wikipedia.org/wiki/Dbm](https://en.wikipedia.org/wiki/Dbm)

~~~
weddpros
Not even networked, and surely not a replacement for an SQL database server.
That's not really what you have in mind when you say "NoSQL".

NoSQL appeared when they started to call it that... Redis/CouchDB were among
the first to be called NoSQL, I think (don't trust my word on that).

For sure I know that not-SQL predates SQL. Like IMS (1968) predates DB2, from
which we got SQL. But not-SQL was not called NoSQL :-)

------
brudgers
Start with Episode 1 of SE-Radio [1] and listen in order. Some of it will be
dated, some of it might be a challenge relative to your experience, and a lot
of it will probably not be directly applicable to your job. But they discuss
a lot of software architecture at the hardcore UML, big-systems,
legacy-integration, PhD-research level, and you will get to see how
experienced architects think about problems, ideas, solutions, and
discipline. You will encounter abstractions, and applying useful abstractions
is the heart of design.

Good luck.

[1]: [https://feeds.feedburner.com/se-radio](https://feeds.feedburner.com/se-radio)

------
kluck
In order to find solutions for your problem (redefining the role of the
database), you should look at how to analyze your software processes (data
flow, dependencies, coupling). Being a good architect, in my opinion, is all
about analyzing "what's there" and finding ways to transform that into "what
should be there, which is better because X". What helps me most when
analyzing an existing system/architecture is visualizing it. When I can see
the thing with my eyes, there are usually all sorts of visual cues that lead
to the actual problem(s) with a specific architecture, and that is often a
good starting point for making changes. Sorry if my advice seems a bit
generic or simplified.

------
asfarley
Personal projects. As a junior it's hard to get the authority to make long-
term architectural decisions in the workplace. It's not really enough to work
inside someone else's architecture; seeing something grow from nonexistence is
the one true path.

------
collyw
Without knowing much detail about your platform or what it does, you _should_
be using the database to do the heavy processing; it is likely to be more
efficient than application-level code. In my opinion, the declarative nature
of SQL means it is less buggy than application code (though that may not
always be the case).

I usually find that if my application code is getting too complex, it is time
to update the data model.

Is the database properly indexed so that the queries run optimally? If you
aren't sure, this site is an excellent resource:
[http://use-the-index-luke.com/](http://use-the-index-luke.com/)
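
As a small illustration of what an index changes, here's a sketch using Python's built-in sqlite3 module (the table and index names are made up; MySQL's own EXPLAIN works similarly, and the exact plan text varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'acme', 9.5), (2, 'globex', 4.0)")

# Without an index, filtering by customer scans the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'acme'"
).fetchone()
print(plan[-1])  # e.g. "SCAN orders"

# After adding an index, the planner switches to an index search.
conn.execute("CREATE INDEX idx_customer ON orders (customer)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'acme'"
).fetchone()
print(plan[-1])  # e.g. "SEARCH orders USING INDEX idx_customer (customer=?)"
```

On a two-row table the difference is invisible, but on millions of rows the scan-versus-search distinction is exactly what the site above teaches you to read.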

------
Jugurtha
Stand on the shoulders of giants:

"The Architecture of Open Source Applications"

[http://aosabook.org/en/index.html](http://aosabook.org/en/index.html)

You can start by checking Volume I.

There's also "500 Lines or Less" which is a part of the effort:

[https://github.com/aosabook/500lines](https://github.com/aosabook/500lines)

Enjoy yourself :)

------
enterx
1. See how others have solved the problem. 2. Sketch, sketch, sketch: it's
easier to throw away the paper than to rewrite the system.

------
chipsy
Think of the architecture as a wheel that you're running in like a hamster.
You never "finish" architecting, but you know you make progress when you
complete full revolutions of the wheel. In the first half of the wheel, you
code simply and naively and accumulate features and problems. In the second
half, you gradually sort out those problems: identify the low-hanging fruit
for refactors first, then apply more advanced techniques to go from an 80%
correct solution to 90%, giving the key components the more complex treatment
they would benefit from, while throwing out things that are unjustifiably
complex and replacing them with "dumber" code. Then in the next revolution,
you go from 90% to 95%, and then to 98%, 99%, 99.9%, and so on, until the day
the codebase is abandoned.

If you apply the procedures in the wrong order - trying to do advanced design
in the first half of the wheel, for example - the project immediately suffers
from premature assumptions. This is why rewrites are so risky: you're
throwing out the accumulated knowledge from previous turns of the wheel in
the hope that you can jump to a higher correctness level. Remember - even if
you succeed at the rewrite, you never get to 100%.

So, based on your description, we can say that you're somewhere in the second
half, and you currently have a problem with analysis of the current
architecture. First figure out which parts are highest priority, and which
are likely to last for a whole turn of the wheel unaided. Then dig into the
structure of the high-priority code looking for two things:

1. Bad factorings - things that would work better if they were inlined. Each
time you inline, you remove at least one point of dependency; each time you
factor out something done in two places, you add at least one point of reuse.
Thus you can make a ton of progress just by inlining the existing source,
reading the result, and then finding new reusable parts in it - typically the
outcome is a net positive on dependencies/reuse.

2. Bad data design - structures that cause more problems than they solve. At
first, data is always "just" parameters passed to an algorithm, but in any
real program data soon also has to carry around information determining the
future state of the program at a broader level (running different code based
on the type of the data, etc.). It's this second aspect that is key: what you
are looking for is how the current form of the data motivates dependencies in
the code, and whether a different canonical form would reduce them. Sometimes
this means more structure, sometimes less.
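
A tiny Python sketch of that second point (the shape example is invented): a type flag in the data forces every consumer to enumerate the variants, while a more canonical form localises that knowledge in one place.

```python
# Before: a type flag forces every consumer to know all the variants.
def area_v1(shape):
    if shape["kind"] == "circle":
        return 3.14159 * shape["r"] ** 2
    elif shape["kind"] == "rect":
        return shape["w"] * shape["h"]

# After: a canonical form (one callable per kind) localises that knowledge,
# so adding a variant no longer touches every consumer of the data.
AREA = {
    "circle": lambda s: 3.14159 * s["r"] ** 2,
    "rect":   lambda s: s["w"] * s["h"],
}

def area_v2(shape):
    return AREA[shape["kind"]](shape)

print(area_v2({"kind": "rect", "w": 3, "h": 4}))  # 12
```

Both versions compute the same thing; the difference is where the dependency on the set of variants lives.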

This is a "shaking off the dust" action where you discover the true nature of
the codebase - as opposed to everyone's initial conceptions - and it can be
done at any time. You can also use techniques like drawing the callgraph or
running benchmarking tools with an eye towards finding architectural
bottlenecks. Eventually you'll see patterns that warrant the inclusion of some
nice higher-level, shared construct. Those are the design wins you are looking
for - you only need one or two of them to make a huge impact, and they only
come right at the completion of the revolution, enabling a new round of more
naive, product-facing code to be written. Taking care of the low-hanging fruit
makes the code base less confusing to work on, thus it typically precedes the
big wins.

The introduction of new external dependencies like a different database
product is one of those ways in which you can make a big win, but you don't
want to add a big dependency lightly, since it raises the lower bound on how
much effort is needed just to maintain the system at a basic level. It's
always another tradeoff that adds new code "behind the scenes", and the main
advantage is that you aren't writing that new code yourself.

As you go through this process, testing and static analysis become crucial
for making sure all the shuffling around isn't causing issues, so don't
proceed until you have a process for that. Testing can always start as a
manually driven checklist and then be automated as automation wins are
identified. Likewise, static analysis can start as a human code-review
process and then be supplemented with automated tools. Too much automation
creates another dependency, so it has to be weighed against human time costs.

------
angersock
Probably the most useful thing you'll do to learn about architecture is to
screw up this project. Sorry, but there it is.

You can read a lot on Martin Fowler's site about how to screw up the project.
You can watch some great videos by Uncle Bob (I especially recommend
"Architecture: the Lost Years") on how to screw up the project. You can even
check out some ruminations on software development by reading through old game
project post-mortems on Gamasutra.

If you want advice about how to get better at this, I'd say sit down at lunch
and redesign the system. Do this like every day for a week (or month, or until
you stop leaving work hating the state of things). Then, go talk to the
business folks and really _learn_ how the business functions. Then, figure out
where your designs failed to meet these requirements, and use that as a guide
for how to evolve things.

I would say, though: learn how to pull features aggressively and without
hesitation in order to simplify the code. The best architecture is the one
that is hardly noticed but which, when seen, appears obvious - and yet is
mutually exclusive with bad implementation.

~~~
sheepmullet
"If you want advice about how to get better at this, I'd say sit down at lunch
and redesign the system. Do this like every day for a week (or month, or until
you stop leaving work hating the state of things). Then, go talk to the
business folks and really learn how the business functions. Then, figure out
where your designs failed to meet these requirements, and use that as a guide
for how to evolve things."

Spot on. And to add to this: you should also try to implement parts of your
better designs outside of work. This will help you gain a concrete
understanding of the trade-offs in your design.

Secondly, when you get requirement changes or feature additions at work, try
to apply those changes (even if just on pen and paper) to your architected
versions. Often this will give you an aha moment where you realise your
favourite alternative has serious flaws.

