
How to transition from “startup code” to “stable production code”? - hazz99
Hello!

Does anyone have any advice (or guides) on how to transition from messy
startup spaghetti code to something more stable and resilient?

I am a young software engineer and don't have much experience. I know about
testing and a _little bit_ of devops (containers, etc.) but not much.

Most guides I find on the internet are very entry-level. Should I wait till
we're profitable enough to hire a senior engineer?

Our stack is React/NextJS/Hasura, but I'm looking for more general advice.
======
throwawaywrench
A random bag of advice based on my personal experience.

1) If you are recruiting and attracting new recruits with phrases like
"greenfield projects" and "bleeding-edge tech", stop doing that, at least
some of the time. The problem you have is partly a recruitment issue.

Try to get at least one person who is content to chug through bugs,
low level security issues, performance improvements, and test coverage. Having
at least one person in the backroom who just does this stuff makes a massive
difference over a year or so.

2) If you know that specific areas need to be reworked allocate a non-trivial
part of your development time to do maintenance. That means shipping less new
stuff. You have to be ok with not shipping some new thing in favor of re-
engineering some internal system that no customer even knows exists.

3) If you don't know what unit tests, integration tests, and snapshot tests
are, you should learn. Figure out what your test coverage is like. Don't go
nuts and start aiming for 100% but you can likely cover the majority of your
code with automated testing.
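
A concrete way to start, assuming you use Jest (an assumption; the numbers
below are illustrative, not targets): make every test run report coverage and
fail the build if it drops below a floor, then ratchet the floor up.

    // jest.config.ts - minimal sketch of a coverage ratchet
    import type { Config } from "jest";

    const config: Config = {
      collectCoverage: true,
      coverageThreshold: {
        // start at whatever your current coverage is, then raise it over time
        global: { lines: 60, branches: 50 },
      },
    };

    export default config;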

4) When you need to fix a bug, think about whether you can create a test:
write a test that fails while the bug exists, fix the bug, and the test now
passes. That will prevent you from ever reintroducing that bug. Don't be
surprised if this significantly increases the time it takes to fix the bug,
but it pays off in long-term stability.
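
For example (a minimal sketch assuming Jest; formatPrice and its truncation
bug are hypothetical):

    import { formatPrice } from "./format"; // hypothetical module with the bug

    test("regression: prices round to the nearest cent", () => {
      // Fails while the truncation bug exists; passes once formatPrice rounds
      // correctly, and keeps the bug from ever coming back.
      expect(formatPrice(19.999)).toBe("$20.00");
    });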

5) Collect performance data. Look at it regularly until you have an intuitive
sense of what normal looks like. That way you will know when it gets
better/worse. If you can have a dashboard on a monitor showing stuff like
average request duration all the better. If you can get stuff like error
count, request volume etc on there too you will have gone a long way to
gaining an intuitive sense of 'something is wrong' before anyone else notices.
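
Since the stack is NextJS, one cheap way to start collecting that data is a
wrapper around API route handlers (a sketch, assuming the pages-router
NextApiHandler style; where the log line goes is up to your dashboard):

    import type { NextApiHandler } from "next";

    // Wrap any API route to record duration, status, and path.
    export function withTiming(handler: NextApiHandler): NextApiHandler {
      return async (req, res) => {
        const start = Date.now();
        res.on("finish", () => {
          // Ship this to your metrics/dashboard system instead of stdout.
          console.log(
            JSON.stringify({
              path: req.url,
              status: res.statusCode,
              ms: Date.now() - start,
            })
          );
        });
        return handler(req, res);
      };
    }

Then each route exports default withTiming(handler), and you have the raw
numbers for average request duration, error count, and request volume.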

------
bobblywobbles
My work process goes like this:

0. Talk to or get instruction from someone who understands the problem
1. Understand the problem yourself
2. Think about how to solve it
3. Solve it
4. Go back and clean up the code (making it production ready)
5. Repeat 4 if necessary

Eventually, you learn and internalize good practices, so that you can start
writing production code before you ever get to step 4. This is how I see
developers gain experience.

------
parentheses
why does the quality of your code matter? it matters if you plan on changing
it often. visiting code just to improve its quality will likely cause bugs
and cost time/money to no end. focus your energy on the code that's changing
most frequently or growing the fastest.

0) human angle

meet to determine which of these problems prevent sustainable operations or
growth. if something becomes quadratically more painful as the team grows,
deal with it quickly

determine which annoyances, if fixed, would save real time (getting angry at
the codebase costs time)

buy time from leadership; if leadership is not on-board with this, then it
will always suffer

1) code review

reviewing code when a company is growing is a good way for original authors to
impart knowledge or suggest architectural improvements (often ones they wish
they had made). it also creates accountability and a discussion forum for the
items that follow.

2) testing

testing is like a ratchet and it can be used to build a punch-list of the most
important things to fix over time.

coming up with a way to test that gets you a lot for a little is key at first.
here's an example for an API: (1) snapshot your database, (2) make a bunch of
api calls, (3) turn those api calls into test cases (there are creative ways
to do this quickly), (4) run them all the time, (5) reduce the number of
failing tests over time
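
a sketch of steps (2)-(4), assuming jest and node 18+ (recorded-calls.json,
captured from real traffic, and API_URL are both assumptions):

    // recorded api calls turned into test cases, one per entry
    import calls from "./recorded-calls.json"; // hypothetical capture file

    describe("api regression suite", () => {
      test.each(calls)("$method $path", async ({ method, path, body }) => {
        const res = await fetch(`${process.env.API_URL}${path}`, {
          method,
          headers: { "content-type": "application/json" },
          body: body ? JSON.stringify(body) : undefined,
        });
        // pin down status + payload; the failures become your punch-list
        expect({ status: res.status, body: await res.json() }).toMatchSnapshot();
      });
    });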

find ways to turn your system into a function (use property-based testing or
fuzzing and you get a _lot_ for a little).
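
for example, with fast-check (applyDiscount is a hypothetical pure function -
the point is the shape, not the function):

    import fc from "fast-check";
    import { applyDiscount } from "./pricing"; // hypothetical pure function

    test("a discounted price stays between 0 and the original price", () => {
      fc.assert(
        fc.property(
          fc.float({ min: 0, max: 10000, noNaN: true }), // price
          fc.float({ min: 0, max: 1, noNaN: true }), // discount rate
          (price, rate) => {
            const discounted = applyDiscount(price, rate);
            // the invariant, checked across hundreds of generated inputs
            return discounted >= 0 && discounted <= price;
          }
        )
      );
    });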

testing smartly is an endeavor in creativity. an ex-coworker has recently
written a lot about testing, so i needn't repeat his words
([https://blog.nelhage.com](https://blog.nelhage.com)).

3) quality metrics and agreed-upon standards

some linters are almost free to enable. you can automatically populate the
existing code with ignore/bypass annotations so that all new code has to
follow the standards.

enforce these in code review and monitor the chart of lints violated (also,
lints violated per line of code) - ideally both should trend downwards.
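
a sketch, assuming eslint 9's flat config (the two rules are just examples of
the almost-free kind):

    // eslint.config.mjs
    export default [
      {
        files: ["**/*.ts", "**/*.tsx"],
        rules: {
          // cheap rules that catch real bugs
          "no-unused-vars": "error",
          eqeqeq: "error",
        },
      },
    ];

existing violations get an inline bypass (// eslint-disable-next-line
no-unused-vars) so only new code has to comply, and the count of those
annotations is the chart to watch.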

4) architectural patterns

most companies have a few key architectural pieces that matter. make
patterns/base-classes/whatever so you can visit those pieces with improvements
without having to code-mod each call site. that's what these software tools
are for.
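
a sketch of the base-class version (all names are made up): every endpoint
extends one class, so a cross-cutting improvement - timing, auth, error
reporting - lands in exactly one place:

    // hypothetical base class for all api endpoints
    abstract class ApiEndpoint<I, O> {
      abstract handle(input: I): Promise<O>;

      async run(input: I): Promise<O> {
        const start = Date.now();
        try {
          return await this.handle(input);
        } finally {
          // one place to add timing, auth checks, error reporting, ...
          console.log(`${this.constructor.name} took ${Date.now() - start}ms`);
        }
      }
    }

    // endpoints only implement handle(); improvements to run() reach them all
    class GetUser extends ApiEndpoint<{ id: string }, { name: string }> {
      async handle({ id }: { id: string }) {
        return { name: `user-${id}` }; // placeholder lookup
      }
    }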

~~~
parentheses
there is something i recently started pondering:

there's a perception vs. reality effect here too. let's say an engineer can:

- implement feature X while shipping tests in 8 hours

- implement feature X without tests in 10 hours

if i were a manager and tests were delivered, i might assume the engineer
wasted time on tests when they could have been shipping more features. so, to
the detriment of the code-base, developers _could_ be taking longer (by
skipping tests) because of manager perceptions.

this is an over-simplification. there are second-order effects that take on
this shape. if i write cleaner code (even without tests), i make the next
engineer's life easier.

we work in an environment where it's possible these acts of future-helping go
unrewarded. so, as early-stage engineers we're conditioned to avoid issues of
quality.

------
muzani
Definition: A startup is an organization formed to search for a repeatable and
scalable business model.

Early in a startup, you're heavily focused on research, not development.

Stage 0: Make sure there's a market for your "code"

You should optimize for maximum pivoting speed early on. That means your code
is horrible, it is disposable, and it will break. You might even want to think
outside the box for this - your stack might not be React/MongoDB, and so on.
It might be HTML + Google Sheets + some $5/month thing that converts Google
Sheets into an API.

Disposable is key for speed. Expect to do hundreds of prototypes before you
hit something good.

At some point, you'll jump from a trickle of 4 users/month to 200 users/week.
I would say that it's not startup-worthy until there are people lining up to
pay for what you're doing.

Stage 1: Start selling

Now you've got people lining up to pay. You have to prove the next risk:
payment. Calculate how much money you need to actually build a self-sustaining
business where everyone can quit their jobs. What's the simplest thing you
need for this? 1000 customers paying $10/month?

Aim for that. If you want to be a little more ambitious, go for 10x, about
10,000 active users/month.

This depends a lot on what you're doing. Let's say you're building a sales
management tool for used-car salesmen and need to track a car's condition,
repairs, costs, complaints, etc. You can probably repurpose an existing CRM's
API, or use some open-source code. There will be issues with flexibility and
scaling, but you just want to hit 1k-10k paying users.

Or if you're doing debt collection for babysitters, you can use some debt
collection software, link it to a quick and dirty database, and give it a
suitable UI for babysitters.

Once you can secure a stable cash flow, you'll be in a good negotiation
position for investors. It's easier to say, "I have 1k customers and need
funds to get it up to a million," rather than say, "I want to build something
for a million customers and the estimated market size is _this_ big."

Stage 2: Start growing and scaling

So by now, you should have a loyal customer base that throws money at you, and
some funds to improve your software.

Next, look for the bottleneck to getting 7% more revenue next week. It could
be more users. It could be existing users wanting to pay more for another
feature, or requiring that feature before they pick you over the competition.
It's rare that people leave because of a few bugs.

Under that growth target, you can expect to scale about 10x per 8 months (7%
per week compounds to roughly 10x in about 34 weeks, since 1.07^34 ≈ 10). So,
you don't have to build something that supports a million active users right
away, but it should be on your roadmap for the next 16 months.

Build a good pipeline. You'll see cracks in yours - maybe compiling takes a
long time. Maybe there are a lot of null-pointer errors. Maybe juniors are
pushing breaking code to production. Find what it is, and patch a solution for
that. Every team is different.

Don't build the best pipeline, not yet. Overengineering is much harder to fix
than underengineering.

Stage 3: Start stabilizing

At some point, the sales/marketing team will no longer be tightly coupled with
the product/engineering team. Users are no longer looking for new features.
They just want it to do what it does, better and more reliably.

By now, you probably have a monolithic mess. You'll have spaghetti code from
juniors and freelancers, that feature you put in but nobody wanted, code left
behind by tech cofounders who ragequit, and hacky code written by the cheap
people you hired because you couldn't afford anyone else.

Start documenting and writing automated tests. I know it's controversial to
start this late, but start too early and you run the risk of
documenting/testing something that gets thrown away. Documentation tends to be
hard to read, and people are going to ask anyway.

"But what about if an early stage programmer leaves and won't communicate with
the team?" Then it's likely that the code _doesn 't work as documented_,
especially if you haven't had automated tests yet. It's also quite common that
code was intended to run one way and does an entirely different thing later
and someone forgot to update the documentation.

You might want to hire/promote a system analyst at this stage. This is a
person who knows all the hacks that were used at some point and whose full
time job is to provide support and say, "Oh that thing doesn't actually work
that way," or "We did that because of a rare use case or limitation in the
technology." If you make a programmer or manager do this part-time, they tend
to bury problems or blame someone for not RTFM. System analysts also help to
fix the process - if they're getting a lot of complaints and confusion for one
thing, they'll know that something in the pipeline is flawed.

So in short, the code itself isn't too tough. As long as it runs well, sells
well, and doesn't fall apart, it should be fine. In a startup, you end up
engineering your processes and your team more than the code.

Also don't forget to engineer the cycle between customer > product > UI/UX >
development > QA/testing.

------
shoo
> messy startup spaghetti code to something more stable and resilient?

Maybe the first thing is to try to identify what the actual specific problems
are. Solve the key problems you actually have now, not the problems you want
to have. This probably needs to be driven from a problem identified in the
business then mapped across to what is happening in the tech stack / codebase
/ organisational processes. Messy code in itself isn't a problem. Symptoms of
problems could be:

- our customers are leaving because we don't have enough uptime

- every second time we make a release we break something, but don't notice
until our customers email us weeks later with a support request

- there is a piece of business-critical code that we need to change, but
everyone is too afraid of introducing bugs to change it

- the former lead developer left and no one understands any of the code in
subsystem xyz

- releases are a huge scary manual process involving the whole company, with
a high failure rate

Each of these is a different problem and will probably require a different
solution.

Hopefully you're using version control, writing automated tests, have CI set
up that blocks changes from being merged if they cause test failures, have
mandatory code review from a peer as part of the standard development process,
and have some kind of repeatability/traceability built into the deployment
process so it is very easy to tell exactly what version of code was live at
any given time. Monitoring for production systems is a pretty great idea too.

Nygard's "release it!" book is a pretty good read with a mix of case
studies/war stories and discussion of techniques that can be used to build
systems that are able to cope better when failure occurs:
[https://pragprog.com/titles/mnee2/](https://pragprog.com/titles/mnee2/)

I got quite a bit out of it as a programmer who had largely been writing
application code in backend systems for a decade but didn't have much
experience of building reliable services.

A different kind of problem is to find yourself in a situation where there is
a lot of complex business critical code that must be changed (new feature/bug
fix) but the code has no automated test suite to give anyone confidence that
the proposed changes will not introduce regressions. A rough rule of thumb is
to first try to get an automated regression test suite wrapped around the code
before changing the code, so you have a reliable indicator that can detect if
you have accidentally changed the functional behaviour of the code. Depending
on exactly how the code is written, this can be very hard -- e.g. if the
behaviour of the code depends on a lot of side effects it might take a few
days or weeks of analysis and very paranoid surgical patches to modify the
code enough so it is possible to get it to produce deterministic output from
fixed inputs in a regression test harness. Once there are automated regression
tests in place that can detect changes in functional behaviour, it is much
easier to perform "refactor to test" changes with speed and confidence:
restructuring the code to decouple things so that fine-grained unit tests can
be injected to test specific functional behaviour.
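
A minimal sketch of such a harness, assuming Jest; calculateInvoice and the
fixtures are hypothetical stand-ins for the business-critical code and its
recorded production inputs:

    import { calculateInvoice } from "./legacy/billing"; // hypothetical legacy code

    // Fixed inputs, ideally drawn from real production cases (values made up).
    const fixtures = [
      { customerId: "a1", items: 3, discount: 0.1 },
      { customerId: "b2", items: 0, discount: 0 },
    ];

    test.each(fixtures)("calculateInvoice(%o) behaviour is pinned", (input) => {
      // Whatever the legacy code returns today becomes the recorded answer;
      // any behavioural drift during refactoring fails the suite.
      expect(calculateInvoice(input)).toMatchSnapshot();
    });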

