
Ask HN: How would you fix this failing startup? - thenewcto
Suppose that you've joined a small startup as the new CTO. This startup has a lot of users but its quality of service is terrible. Their website and apps are slow and they fail left and right.

How do you find the cause of all problems and fix them? And more importantly, how do you make sure they never happen again in future?

P.S: The tech stack is not important here. I'm interested to know your approach and management practices that you'd implement.
======
watt
This will happen again, and will keep happening, whenever product is
prioritized above a sound technology base and whenever something is developed
by incurring tech debt.

To solve this, the company will need to go into an investment phase again:
investing in sound technology, paying off all the tech debt, and building
resilience and self-healing into services.

Product development will suffer, but that's how debt works.

~~~
thenewcto
As I said, this is less a technical issue than a lack of processes and best
practices. The codebase does not look very bad.

> building resilience and self-healing into services

How would you do that?

~~~
watt
Circuit breakers, retries, back-off. Monitoring, heartbeat detection,
automated restarts when a service is unhealthy. Here's a no-nonsense paper, and
a lot of it can be relevant to a smaller company too:
https://www.usenix.org/legacy/event/lisa07/tech/full_papers/hamilton/hamilton.pdf
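
For what it's worth, here is a rough sketch of what the first three items could look like in Python. It is only an illustration, not taken from the paper: the names (CircuitBreaker, CircuitOpenError, retry_with_backoff) and the default thresholds are assumptions made up for this example, and a real service would more likely use an existing library or a service mesh feature.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        # Thresholds are illustrative assumptions, not recommendations.
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry_with_backoff(func, attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call func, retrying on failure with exponential back-off and jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # no point hammering a dependency the breaker has cut off
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # random jitter spreads out retries
```

The two pieces compose: wrap the remote call in the breaker, then retry the wrapped call, e.g. retry_with_backoff(lambda: breaker.call(fetch_profile, user_id)), where fetch_profile stands in for whatever remote call you have. Retries absorb short blips, and the breaker keeps a dependency that is really down from being flooded.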

------
kanobo
You should read The Phoenix Project by Gene Kim; you basically described its
premise. It's a pretty good novel that describes how DevOps practices can help
a company dig itself out of the hole you describe.

------
karmakaze
How do you know the tech stack isn't contributing?

> apps are slow and they fail left and right

Better tooling can improve performance and reduce errors.

> approach and management practices that you'd implement

What approach and management practices are currently used?

~~~
thenewcto
> What approach and management practices are currently used?

As far as I've seen, they don't have any in place.

------
icedchai
I'd start with monitoring: performance, error logging, general infrastructure.
You can't fix the problems if you don't know what's actually happening.
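
As a starting point, even something as small as the sketch below gives you call counts, error counts, and latency per operation. It is a minimal illustration using only the Python standard library; the monitored decorator, the stats dictionary, and the checkout handler are hypothetical names invented for this example, and in practice you would ship these numbers to a real metrics or error-tracking backend rather than keep them in memory.

```python
import functools
import logging
import time
from collections import defaultdict

logger = logging.getLogger("monitoring")

# In-memory counters; a real setup would export these to a metrics backend.
stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def monitored(name):
    """Record call count, error count and latency for the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                stats[name]["errors"] += 1
                logger.exception("%s failed", name)  # logs the full traceback
                raise
            finally:
                # Runs on both success and failure.
                stats[name]["calls"] += 1
                stats[name]["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator


@monitored("checkout")
def checkout(cart_total):
    # Hypothetical handler, used only to exercise the decorator.
    if cart_total <= 0:
        raise ValueError("empty cart")
    return cart_total
```

Once every endpoint is wrapped like this, the error rate (errors / calls) and mean latency (total_seconds / calls) per operation tell you where to look first.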

