

Ask HN: Has anyone implemented a zero-maintenance system? - _pdp_

I am more and more interested in designing and developing a zero-maintenance system - i.e. a system that once built requires no maintenance whatsoever.

Is this possible to achieve at all?
======
ChuckMcM
It is an interesting question, but it would help to be a bit more crisp. You
need to define 'maintenance'. Allow me to explain.

A sundial requires little maintenance after it has been set up. Cleaning off
the bird feces and the leaves is all. It is not "zero" maintenance though. A
trail duck is just a pile of rocks, and in the absence of getting knocked over
requires no maintenance, but may require repair from time to time. A
spacecraft on a mission to Pluto might require a software change or a course
correction, but its hardware is not changed once it launches. A hummingbird
feeder sits on my porch and provides sugar water to birds in the neighborhood;
when it runs out I pour in more.

So there are three things people often lump under the general heading of
maintenance:

Repair - making something work after breaking.

Resupply - replacing consumable supplies, cleaning, and adjusting

Renew - giving something new capabilities or changing its operation by
replacing parts or software

If you're being pedantic about the term, there is no such thing as 'zero
maintenance' if you want a system to remain stable. Entropy is a thing.

So perhaps the more crisp question is, are people working on systems with
really low maintenance requirements? Where "low" can be defined as hours of
maintenance vs hours of operation or skill level of maintainers vs skill level
of the builders, etc. Sure there are some really famous ones like the clock of
the Long Now. Every museum in the world with interactive exhibits tries to
design systems that will run with very little maintenance for a long time.
Pretty much every operations lead spends time working on ways to keep large
numbers of computer systems running with little human intervention.

------
ams6110
This was the default way software worked before widespread use of the
internet, and in particular game software up until the last few years. You buy
a game on a CD or cartridge, and that's it. There is no "maintenance" after
that.

Before internet use was common, there was just no really convenient way to
deploy bug fixes, at least to consumers. Enterprise software updates were
probably handled by a visiting rep who applied the updates, or a set of tapes
with instructions to the local administrators.

But for consumer/office stuff, when you shipped something, that is what it
was. You worked pretty hard to be sure it did what it was supposed to do.

~~~
Semiapies
You still had plenty of bugs, of course. Users just had to deal with them
longer - until the next upgrade, which usually wasn't free.

------
andyidsinga
This question should include a time frame (just like when someone desires a
"real time system", the deadline is key to answering the question "is it real
time or not").

One way to set this time frame could be "life of the system" -- which might
be, say, 2 years, or 5 years, or 50 years.

The more constrained your requirements and design are in terms of inputs and
outputs and error states, the more you can validate and get closer to
maintenance free (finite state machines are a nice pattern for this).
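
To make the finite state machine point concrete, here is a minimal Python sketch. The states and event names describe a hypothetical heater controller invented for illustration; nothing here comes from a real product. The idea is only that every accepted (state, event) pair is written down, so anything else is rejected, and gaps in the table can be listed before shipping.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    HEATING = auto()
    FAULT = auto()


# Every (state, event) pair the system accepts is listed explicitly.
# The event names are hypothetical, chosen only for this sketch.
TRANSITIONS = {
    (State.IDLE, "too_cold"): State.HEATING,
    (State.HEATING, "warm_enough"): State.IDLE,
    (State.HEATING, "sensor_error"): State.FAULT,
    (State.IDLE, "sensor_error"): State.FAULT,
    (State.FAULT, "reset"): State.IDLE,
}


def step(state, event):
    """Return the next state; reject anything that is not in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


def uncovered_pairs(events):
    """List the (state, event) pairs the table does not cover, for review."""
    return [(s, e) for s in State for e in events if (s, e) not in TRANSITIONS]
```

Calling `uncovered_pairs` with the full event list is a cheap way to see which combinations were never thought about.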

Many embedded systems (appliances, space program gear) achieve this, but I've
noticed that as those are built more and more with off the shelf general
purpose hardware and software they become more error prone and require
maintenance (due to competition and time to market tradeoffs?)

edit: one final thing re error states : watchdog timers. These can be
tremendously useful in reducing maintenance, and keeping functionality alive.
When certain error states are entered, the watchdog timer triggers and the
system can restart to a known good state and, hopefully, restore its core
functionality until that particular error state is entered again.
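
A rough software stand-in for the watchdog idea, as a Python sketch. A real watchdog is usually a hardware timer that resets the board; the class name, the `on_expire` callback and the injectable `clock` are assumptions made for this example so that it can be exercised without real hardware or real waiting.

```python
import time


class Watchdog:
    """Software stand-in for a hardware watchdog timer (illustrative only)."""

    def __init__(self, timeout_s, on_expire, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_expire = on_expire  # e.g. restart into a known good state
        self.clock = clock
        self.last_kick = clock()

    def kick(self):
        """Called by healthy code to show it is still making progress."""
        self.last_kick = self.clock()

    def poll(self):
        """Called periodically; runs the recovery action after a missed deadline."""
        if self.clock() - self.last_kick > self.timeout_s:
            self.on_expire()
            self.last_kick = self.clock()  # start a fresh window after recovery
            return True
        return False
```

The main loop calls `kick()` whenever it completes a unit of work, and a separate timer or supervisor calls `poll()`. If the main loop hangs in an unexpected error state, the kicks stop and the recovery action runs.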

~~~
_pdp_
Thanks for the comment. You are right. I did not think of all the space
exploration equipment that is up there. But maybe they are doing upgrades as
well? I don't know.

Also I like what you said about timeframe. Why doesn't software have a
warranty period after which the software is obsolete?

~~~
andyidsinga
Certainly they are doing upgrades and maintenance. They have probably set time
frame requirements for these too, for instance, "device must be able to
function with current sw for N hours/days/months while out of radio contact"

Your question was a great one though -- and worthy of pursuit as a sw
engineering goal in general!

------
forgottenpass
Of course it's possible, but I don't know where you're trying to go with this.

Set a lifespan the product should be able to operate without maintenance,
based on your goals and engineering constraints. Design with the lifespan in
mind, perform lifecycle testing. Then release your product, stop working on it
and don't offer support.

The feasibility of such depends on what you're trying to build. Customers have
different expectations of lifespan, maintenance needs, and manufacturer support
depending on the product. A garden hoe is not an alarm clock radio, which is
not a car, which is not a general purpose operating system.

The status quo of maintenance for any product domain is an inherent limitation
to putting out something better. You can't just jump ahead of the competition
because you will it so. You need some combination of: building with higher
quality components, insight into the problem domain that none of the current
commercial solutions have, or simply more productive (ie. more expensive) R&D.

If you're just looking at products that currently require maintenance and are
licking your lips at the idea of selling a turn-key solution, join the party.
Otherwise roll up your sleeves and make low maintenance a key design goal of
your product.

------
acd
Sure, I've seen it, though it did not run anything modern OS-wise. The MS-DOS
based control computer at my dad's previous workplace had an uptime of what was
close to forever. We are talking stability in the range of ten to fifteen
years. It ran a climate control program, and the control software never
crashed. Because it was MS-DOS, it was single-tasking, and the control
program was very stable.

At the time there was little computer hacking activity, so you did not have to
update the software either. Backups were done to floppy disks and there was a
reserve hot-swap backup computer. For remote control you would dial into the
computer via modem, first at 300/1200 baud, that is, up to 1200 bits per
second. Then later at a lightning 9600 baud.

I'm also sure you find very good old engineering in the computers of Voyager.

~~~
PaulHoule
People do maintenance all the time on spacecraft, including sending commands
to get stuck things unstuck, updating firmware, and other things:

[https://books.google.com/books?id=le-M5s191moC&pg=PA126&lpg=...](https://books.google.com/books?id=le-M5s191moC&pg=PA126&lpg=PA126&dq=telstar+repair+radiation+damage&source=bl&ots=nWVQAv2qIW&sig=t9nRqSJnQS9mRYOihilc6KrBvaU&hl=en&sa=X&ved=0CEsQ6AEwBmoVChMIs76ItIboxgIVxveACh3XnADR#v=onepage&q=telstar%20repair%20radiation%20damage&f=false)

------
Zenst
It is always something to aim for but with anything time always catches up. I
would say if it is anything internet or connectivity wise externally in any
form comm wise, then you have to plan for maintenance from a security and
protocol changes aspect over time (thinking transaction data formats).

So if security is not an issue, or is removed from the project via some
wrapper (say the system is locked in a room with access control and other
layers), then you still have to plan for how long it is meant to last, and be
realistic.

With most business assets you're talking a 5 year write-off at best tech-wise,
with many at 3 or maybe less, depending upon use.

So possible - in the right situation - yes. But do not forget security.

Though remember that time is always a factor, and also wear and tear: even a
pen eventually runs out of ink and requires refilling. So depending upon the
timeframe it is doable, but for many, at what cost?

Another aspect would be look at what you have got that has fulfilled this
criteria and not required maintenance and then ask, were you lucky or did you
plan it that way as for many it usually ends up as some legacy system, hardly
anybody uses or with low usage that has no internet connectivity at all. That
is when you see your Amiga or C64 or early PC working away without issue.

But things fail, and however well you plan, things happen. You should still
plan to check that everything is working and to monitor it, rather than
skipping that because it was designed to just work. Expect the unexpected.

------
deanclatworthy
Your question provoked an interesting thought process when I began to answer
it.

I had first thought that one of my projects was zero-maintenance as I truly
gave it zero-maintenance in the three years between when I built it, and when
I sold it. It was a website which indexed content from others. I had coded the
"spider" in such a way that it was quite flexible if the content on the page
changed a bit and it stood me well for those three years. I had a database
back up system in place that backed up to another server.

But then I began to realise that although I had given the site zero
maintenance, it wasn't truly zero-maintenance. If I'd had a hardware failure,
I'd have a backup of the database but no way to automatically spin up a new
server and put it in place. But even if I had some monitoring in place for
that, what if the monitoring service went down? I guess what I'm getting at is
that there can never be a zero-maintenance self-sufficient service - just one
that is incredibly well automated by great engineers.

------
kwhitefoot
My fridge is 30 years old and unless you count periodic cleaning and
defrosting as maintenance it has had no maintenance at all in that period. One
of my two freezers is 25 years old, ditto. Microwave oven is over twenty as
well, but as the internal light has failed I suppose it doesn't quite count.
My wife's hair dryer wasn't new when we met and that was forty years ago,
still going strong.

I very much doubt that anything I buy from now on will last so well, mostly
because of quite unnecessary use of electronic control systems that add
unnecessary features and extra failure modes. Of course my new fridge uses
only half as much power as the old one, which means that the new one saves me
about 125 kWh per year, which in turn means that it will take about twenty
years to pay off the investment. Good job that wasn't the reason I bought it.

As someone already pointed out you have to specify the expected life time to
be able to evaluate 'zero-maintenance-ness' of a system.

------
rbc
The idea of zero maintenance is appealing but out of reach in a lot of ways. I
suppose you mean some kind of software system. From the perspective of the
systems world the runtime dependencies are a moving target. Most platforms
evolve and their interfaces change over time.

The teams that develop these platforms (Java/Node.js/Ruby) have to balance
support for old code branches with the development of new ones. Once the
platform team leaves a code branch behind, the problems start to multiply.
Zero day exploits are developed for zero day vulnerabilities. The underlying
operating systems also evolve and leave the old versions of the platform
behind as well.

At some point you have to open your application's code base back up and update
it for the updated platforms. If you don't, the application dies with its
platform. Sort of like the ship that sinks from rust or the building that
collapses due to the accumulation of deferred maintenance.

------
pjc50
Many non-networked systems neither need nor support updates.

If your system is networked, there is always the risk of security updates
being required. This can be minimised by extreme effort, but if you have to
support SSL/TLS? At least you need a way of deprecating the vulnerable
ciphersuites.
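
One way to keep that deprecation path open, sketched in Python with the standard `ssl` module. The `CONFIG` dictionary and the `build_server_context` helper are hypothetical; the point is only that the minimum protocol version and cipher string live in operator-editable settings rather than in code, so tightening them later does not require a rebuild.

```python
import ssl

# Hypothetical operator-editable settings; in practice these would be read
# from a config file so the policy can change without touching the code.
CONFIG = {
    "min_tls_version": "TLSv1_2",  # name of an ssl.TLSVersion member
    "ciphers": "ECDHE+AESGCM",     # OpenSSL cipher string (TLS 1.2 and below)
}


def build_server_context(config=CONFIG):
    """Create a server-side TLS context that refuses older protocol versions."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion[config["min_tls_version"]]
    # set_ciphers restricts the suites offered for TLS 1.2 and earlier;
    # TLS 1.3 suites are not controlled by this call.
    ctx.set_ciphers(config["ciphers"])
    return ctx
```

A real server would also load a certificate with `load_cert_chain` before accepting connections; that is left out here because the paths would be site-specific.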

Would you want a web browser with zero updates? It would gradually become a
handicap.

For consumer equipment, "zero maintenance" means "disposable": in the event of
trouble, throw it away and buy a new one. Good for manufacturer turnover, not
so good for the environment.

The other option is simply 100% outsourced maintenance, which has long been an
option. Some mainframes would even do their own fault reporting, all you'd
have to do is say hi to the technician and let him into the building.

------
_pdp_
I think I should have been a bit more clear, but I also like that the question
was a bit vague, because there are a lot of interesting comments highlighting
things I never thought about.

I believe that software engineers failed in some ways because we always factor
the maintenance as part of the development cost. In other words, we assume
that we have to maintain a system once it is built. It is even more relevant
with web applications, which give the impression that they are in constant
flux. I don't buy the idea that web apps need to adapt, because standards
don't move that fast and almost all browsers are backwards compatible with
sites that were developed in the 90s. I don't know what is behind HN, but its
simplistic, non-fluid interface hasn't been changed (I don't know actually)
since it was made, which for me shows that it is possible to make something
useful without the need to constantly change. There are many examples like
that.

I also like the analogy that some of you wrote about physical systems. Indeed,
there are physical systems that haven't been changed for years and they still
work. There are also a lot of software systems, mainly SCADA stuff, that are
also designed never to be changed. Not long ago I heard a story about a
pentest on a SCADA system that was controlling a dam. Even though the pentest
found a bunch of vulnerabilities, it was not possible to change the system
because its author had died long ago. The dam operated just fine even with the
vulnerabilities in its software.

Another point I wanted to bring up is about defining time and version
constraints. Back in the day, software was not continuous. In other words, you
get Doom 1. Doom 2 is a different story. Therefore, Doom 1 source code can go
free. I would say that Doom was very much a low to zero maintenance system,
albeit a software one. Bugs were part of the character of the software.
Hacking the software through its bugs was as close as it gets to magic in the
digital world. As we get more connected, we are required to maintain the
software.

These are just a few things that came to my mind. I will add more thoughts as
I go over your comments.

~~~
PaulHoule
For operational software that runs a business you are going to need to change
the software because the business changes.

It is true that the visual UI of HN looks stable but behind the scenes they
are always making changes to improve the community. Part of it is dealing with
attacks & spam, part of it is trying to shape the behavior of users who are in
the grey area between positive and negative contribution.

So far as security, that is another force for perpetual change.

------
vinceguidry
If it's doing anything useful, then no. The definition of 'useful' will change
over time, so what the system is doing will fall away from that unless it is
changed to bring it in line with the new requirements. Cue maintenance.

The trick isn't eliminating maintenance, but in making maintenance as hassle-
free as possible. The best way I've found to do this, the way I do it
personally, is to build enough slack into the human system surrounding the
automated system (in other words, the business you're working for) to
adequately design, implement and iterate maintenance procedures. "Ship" the
procedures by handing them off to an unskilled person so you can get feedback
for iteration.

------
mattkrea
I don't think there is such a thing.

Best case scenario is a lot of automation and monitoring.

Where I work our dev / engineering team is investigating machine learning to
automate repair a bit better. For example, over a 7 day period we record data
for load on our backend and that ML model predicts and generates autoscaling
rules according to historical data. Some of our services are about as low
maintenance as I would think they can get (autoscaling, autoremoving poorly
performing instances, etc) but I'd like to improve and only make an engineer
get involved when the system can't be trusted to make the right call.
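
As a much simpler stand-in for the idea of deriving scaling rules from a week of load history (this is a per-hour peak heuristic, not a machine learning model, and not the commenter's actual system), a Python sketch might look like the following. The function name, the `capacity_per_instance` parameter and the `headroom` factor are all assumptions for illustration.

```python
import math
from collections import defaultdict


def hourly_instance_plan(samples, capacity_per_instance, headroom=1.25, min_instances=1):
    """Size each hour of the day for the peak load seen in that hour.

    samples: iterable of (hour_of_day, requests_per_second) pairs taken
    from the history window, e.g. the last 7 days.
    Returns {hour_of_day: instance_count}.
    """
    peak = defaultdict(float)
    for hour, load in samples:
        peak[hour] = max(peak[hour], load)
    return {
        hour: max(min_instances, math.ceil(load * headroom / capacity_per_instance))
        for hour, load in sorted(peak.items())
    }
```

For example, `hourly_instance_plan([(9, 100), (9, 250), (3, 10)], capacity_per_instance=100)` sizes hour 9 for the 250 requests per second peak plus headroom, and keeps hour 3 at the minimum.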

------
Spooky23
If the requirements are static and you don't require external connectivity,
it's very possible.

My wife is a financial person. Her billing system (it's a public utility)
until 2009 ran on an AS/400 with a 33 MHz processor. It was probably rolled out
in 1991 or 1992 and upgraded to support IP connectivity in the late 90s.

Its only connectivity was a modem that called IBM when something broke.

I made fun of her about it, but it actually worked great. The only reason it
was replaced was that IBM ran out of spare parts for the printer.

------
Pamar
Google Search Appliance was supposed to be "deploy and forget" (at least in
its earliest version) but it still had a connection (over phone line, IIRC) so
that Google support could reach it in case of need.

Such a design makes sense only if the work the system has to do is predictably
static though. Even in the case of Google Search Appliance, the moment your
company adds a document in a new format the device will require some kind of
external intervention to keep working.

------
lugus35
A Zero-maintenance system ? This is just what NASA is trying to build.

It's just a matter of relation of cost versus maintenance work.

If you want to build a zero-maintenance system by yourself, just try to
imagine your service and servers will be launched over to Pluto or a comet,
and will never come back.

But remember that you have zero-maintenance if you pay someone to do the
maintenance in your place. That's what people generally do when paying for
cloud services.

------
lazerwalker
I have a server that doesn't do anything at all. It has neither inputs nor
outputs connected to it, and isn't even connected to a power source.

Except that I suppose I'll still have to blow the dust out of it every so
often, and make sure it's in an environment where it won't rust.

(To be less glib: what do you mean by "system"? What do you mean by
"maintenance"?)

------
milankragujevic
My hair dryer has been working for 20 years without any maintenance. However I
don't think that would be a good idea for anything connected to the internet.
However, if it's a closed system, and it does the same thing constantly in a
constant environment, I think it's possible.

~~~
_pdp_
But what if anything you program is written exactly like that? The internet is
still based around 20-30 year old protocols and there are software systems
out there that still work. So what is the secret sauce of building a software
system that will survive without any maintenance for the next 20 years?

------
fizx
Any system is zero maintenance as long as you choose not to maintain it.

Also, zero is a very small number.

------
brianwawok
Most hardware is like this, right? Your microwave does not (yet) get firmware
updates.. what ships has to work right.

I don't think this is a good idea for a web server.. browsers change, you have
to do some things to keep current..

------
otikik
AFAIK the only zero-maintenance system is no system.

So in a way, every time I realize I don't need to build a "system" for a given
task, I am building "a zero-maintenance system" of sorts.

------
cgio
I have a couple of "quick fix" Excel-based solutions that still run, with no
maintenance, 7 years later. There is nothing more permanent than the temporary.

------
beginrescueend
It depends on what you mean, what your goals are, what you are trying to
build, etc.

"No maintenance whatsoever?" I doubt it. Things break, get old, go obsolete,
are insecure, and wear out.

Perhaps, you should worry about "nines of uptime" or "fixed requirements" or
something along those lines.

Simplicity is first. The more complex, the more maintenance. "Everything
Should Be Made as Simple as Possible, But Not Simpler" -
[http://quoteinvestigator.com/2011/05/13/einstein-simple/](http://quoteinvestigator.com/2011/05/13/einstein-simple/)

"No moving parts."

Make it highly available and redundant. Power, cooling, networking, hardware,
and software redundancies are needed.

Make it immutable. Change and mutable state will create maintenance. Implement
functional programming, if you write software.

Monitor it and make it self-restart. Somebody already mentioned watchdogs, for
hardware.

Make it ultra secure. No outside networking?

Program finite state machines....

If you imply a "hardware and software" solution, these points sound like you
need redundant hardware and Erlang/OTP. Take a peek at OTP and the Erlang-
based languages (Erlang, Elixir, Joxa, and LFE).

At least with redundant power, cooling, hardware, and Erlang/OTP (Elixir/OTP,
etc.,) you gain the ability to do all of these things.

With Erlang/OTP, you can achieve very high uptimes, and if you design it
correctly, you do have the ability to hot-patch running code, if you do have
to (rarely) perform maintenance.

While you're at it, you also get distributed programming, concurrency, and
parallelism, for free, with Erlang/OTP. This, in and of itself, can "reduce
maintenance."

See
[https://pragprog.com/articles/erlang](https://pragprog.com/articles/erlang)
and
[http://stackoverflow.com/questions/8426897/erlangs-99-999999...](http://stackoverflow.com/questions/8426897/erlangs-99-9999999-nine-nines-reliability)

------
lcfg
You should probably define "maintenance", otherwise it's very difficult to
agree on such a system.

~~~
_pdp_
Once built - you don't have to attend it anymore.

~~~
spookylukey
You need to be more specific. In the face of what kind of disruptions will it
continue? For example:

* do you need to keep supplying electricity?

* do you need to keep paying the bills so that someone else will supply electricity?

* will it continue working if there is a hardware failure? What kind of hardware failures can be tolerated? Do you need to intervene if the data gets migrated to a new system?

* will it continue working if there is a flaw (not just a bug) found in something? e.g. it uses TLS version X which turns out to have some flaw in Y years time

* what about if society changes in some way so that the load on it is quite different in the future?

* how long are you expecting to do zero maintenance? At some point the operating system you run on will be obsolete, and there will be no upstream fixes to critical security bugs - what will happen then?

~~~
_pdp_
I should have been more clear, but you have some excellent points that make me
think. I think the crucial elements are to define the timeframe and
usefulness, which is somehow related to time.

------
bdastous
This seems analogous to a perpetual-motion machine.

------
BurningFrog
Define maintenance.

Are you talking about software?

------
blincoln
Back when I was a systems engineer, I built a couple of systems that come as
close as I'm ever likely to see to zero-maintenance.

One of them is a piece of automation that looks for Active Directory accounts
that are "inactive" based on a variety of criteria (including correlation with
data not stored in AD). If they're considered "inactive", then they're
disabled and their description is updated to indicate when they were disabled
and why.

I originally wrote it as a 6-12 month stopgap until it was replaced by a fancy
commercial account lifecycle product. I believe it's now 7 years later and
it's still in use by the team I used to be on when I wrote it.

This may sound like a relatively simple task, but it actually wasn't, which is
why (IMO) almost no one ever succeeds in building low/zero-maintenance
systems. Some of the challenges I ran into:

- Some users only use their accounts in ways that don't update the last logon
timestamp in Active Directory. I can't remember all of the specifics offhand,
but one was that at the time, there were still BlackBerry users, and if they
only ever used their AD account for email, and only read email on their
BlackBerry, the timestamp wouldn't be updated, so the automation had to query
the BES database to look at their last usage there too and use the most recent
of that vs. AD. I think there were 3-4 things like this, and the automation
used the most recent of them all.

- Employees go on maternity and military leave, and disabling/deleting their
account would make for them being really unhappy when they got back. So the
automation also has to check the HR system to see if their record is flagged
as being on some sort of extended leave.

- The government requirement that spurred the development of the automation
only applies to accounts for people. There are plenty of service accounts that
log on infrequently enough that they would be disabled, so the automation also
has to differentiate between those accounts and accounts that represent
people.
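
The three challenges above boil down to one decision function. The following Python sketch is only an illustration of that logic, not the original automation: the `Account` fields, the source names and the rule that a never-seen account counts as inactive are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Account:
    name: str
    is_service_account: bool = False
    on_extended_leave: bool = False  # would come from the HR system
    # Last-use timestamp per data source (e.g. "ad", "bes"); None if never seen.
    last_seen: Dict[str, Optional[datetime]] = field(default_factory=dict)


def most_recent_use(account):
    """Take the newest timestamp across every source that has one."""
    seen = [t for t in account.last_seen.values() if t is not None]
    return max(seen) if seen else None


def should_disable(account, now, max_idle):
    """Decide whether a person's account has been idle for too long."""
    if account.is_service_account or account.on_extended_leave:
        return False
    last = most_recent_use(account)
    # Assumption for this sketch: an account never seen anywhere is inactive.
    return last is None or now - last > max_idle
```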

In addition to the basic functionality, I also felt that it had to have
significant safety features in place, because if something goes wrong and
_all_ accounts get disabled, then no one can log on to fix the problem. Among
some of the other safety features:

- With each iteration, the automation calculates how many accounts will be
disabled during the next iteration as well. If that number exceeds one
threshold, warning emails are sent to the account administration team. If it
exceeds a larger threshold, it will refuse to operate altogether.

- If no information was obtained from one of the data sources that it uses
(e.g. a database was moved to a different server and the connection string
wasn't updated in the account automation config), it will refuse to operate
and generate warning emails.

- I don't remember the details, but there are special conditions for things
like "too much time elapsed between iterations" and "the last iteration took
place after the current iteration" to catch edge cases where there are
problems with time synchronization.
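
A minimal Python sketch of safety checks along these lines. It is not the original code: the function name, the parameters and the choice to refuse (rather than only warn) on the time-related conditions are assumptions. Timestamps are plain seconds, such as values from `time.time()`.

```python
def safety_check(to_disable, source_row_counts, last_run, now,
                 warn_threshold, refuse_threshold, max_gap):
    """Return (may_run, messages) for one iteration of the automation.

    to_disable: accounts the run would disable.
    source_row_counts: {source_name: rows_returned} for every data source.
    last_run, now, max_gap: timestamps and a duration, all in seconds.
    """
    empty = sorted(name for name, rows in source_row_counts.items() if rows == 0)
    if empty:
        return False, ["no data from: " + ", ".join(empty)]
    if last_run > now:
        return False, ["last run is in the future; check time synchronization"]
    if now - last_run > max_gap:
        return False, ["too much time elapsed since the last run"]
    count = len(to_disable)
    if count > refuse_threshold:
        return False, [f"{count} accounts exceeds the refuse threshold"]
    if count > warn_threshold:
        return True, [f"{count} accounts exceeds the warning threshold"]
    return True, []
```

The caller emails the messages to the administrators and only proceeds when the first element is true, so a broken data source or a clock problem stops the run instead of disabling everyone.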

I was able to build the automation in a way that made maintenance as close to
zero as possible. In the ~7 years it's been running, AFAIK the only thing
that's needed to be changed were the thresholds for number of accounts
disabled in a single iteration, because the company expanded its use of AD and
suddenly it became normal to operate against a thousand or more accounts at
once instead of e.g. 200. A database connection string might have been updated
when a DB got moved to another server as well.

Anyway, trying to predict all of the things that can go wrong even for such a
simple system turned out to be a _lot_ of work. My experience is that it gets
dramatically worse as the system itself becomes more complicated (e.g.
complicated enough to be a commercial product as opposed to an engineering
maintenance task). I don't think it's really practical to do it with
significantly more complex systems - modern computing involves too many
changing variables.

For an analogy, consider trying to build a zero-maintenance system that waters
and fertilizes a garden in a way that keeps the plants healthy. It wouldn't be
_that_ hard to build one that would handle the current garden. Imagine trying
to build one that would handle literally any plant that someone could put in
the garden. It's not _impossible_, but you'd need all kinds of wacky sensors
and logic to figure out what each plant was and how frequently to
water/fertilize it. It's much easier and less error-prone to just require that
the gardener update a table that lists what types of plant are in which part
of the grid, and _maybe_ generate an alert if something changes that indicates
that the table may no longer be up to date.

------
umanwizard
The iPhone is pretty close, unless you physically damage it

Not sure what is meant by "system" here.

~~~
joshribakoff
The battery has a lifespan and eventually needs replacing as well. Not sure if
that falls within the scope of physical damage; it's subjective, I guess. Even
routine charging could count as maintenance. What about cleaning the screen
(to get fingerprints off the screen)? Surely that counts as maintenance too?
New wifi & cell protocols could be invented that also render it obsolete. If
you run out of space on it, old files aren't automatically pruned... going
through my photos deleting stuff counts as maintenance I'd say. I've also had
it become frozen / sluggish and had to reboot (even with iPhone 6)

------
marknadal
Yes. Or at least that is the goal.

What type of "system" is it? A database.

I got fed up having to do maintenance on my database, specifically worrying
about having to expand its storage capacity. I'd constantly be woken up in the
middle of the night because my entire web application had crashed because it
ran out of disk space, because the database greedily pre-allocated excess
space to improve performance. I'm not a DevOps guy, so figuring out LVM and
MDADM on the fly was a nightmare. And as my app got more popular, things got
worse and I just could not keep up.

But then I sat back and thought about it for a minute. I was deploying my app
in the cloud, not on my own physical machines. Why was I worrying about
expanding storage capacity when the cloud can sell me more of it than I could
possibly keep up with? Why was I maintaining finite space when there was an
infinite amount available? That doesn't make sense.

The other problem was that despite the fact everything else was working (the
web server, the frontend javascript, etc.), if the database server was down
then nothing would function properly because data is the life blood of my app.
This sucks, if one piece breaks the whole thing is defunct.

So after a while, I decided to change all this. I decided to build a database
that would be zero-maintenance. It hit the top of Hacker News several times,
and some big name investors (Tim Draper) got behind it. Check out this demo of
it automatically recovering from the complete loss of primaries -
[https://medium.com/@marknadal/gun-0-2-0-pre-release-auto-rec...](https://medium.com/@marknadal/gun-0-2-0-pre-release-auto-recovery-of-primary-fault-5f4ffbe63301)
.

So how is it zero maintenance? Well some other people in this thread mentioned
some important points:

- There are fewer moving parts - there is no database server. Instead it gets
embedded into your app server and your frontend (see next point), so the only
thing you have to maintain is your app. If your app is flawless or (more
likely) can auto-restart just fine, then you won't need to do any
maintenance.

- It is highly available and redundant because it uses Peer-to-Peer
architecture, like BitTorrent. So even if your server crashes, your app can
continue to work in offline mode or with WebRTC (not implemented yet) continue
to interact with other users. When the server auto-restarts at some point, the
offline data will sync back up and resolve any conflicts. If you have more
than one server running, then things will continue to work even if one or many
of them crash.

- Immutable data allows the system to recover and resolve conflicts without
your manual intervention. I'll let others talk about how awesome immutable
data is, I'm sure you've heard enough about how many problems it helps with.
Especially when it comes to maintaining systems, stuff can't get corrupted,
and even when it does, the immutable data allows it to reconstruct itself.

- State machines allow the system in advance to know what is valid and
invalid behavior, so it for the most part can avoid going down paths that are
"incorrect" which lead to having to do manual maintenance, because it is
already instructed in the first place what states to avoid or how to exit
those states if it gets into them.

I'm super glad you asked your question, because I feel like a lot of software
developers out there are super negative and bitter about it because most
systems they have worked with have basically ruined their lives (like what
happened to me, having to fix stuff in the middle of the night). But just
because a lot of things have been like this, doesn't mean we can't borrow from
engineers or mathematicians ways to make zero-maintenance systems. So I really
hope you find what you are looking for or build one yourself, if you do please
let me know mark@gunDB.io because I'm interested in that sort of stuff. Maybe
we could start some group/forum around zero-maintenance systems!

------
michaelochurch
Yes, I have. It failed. No one used it, so it didn't require maintenance.

I assume that that's not what you're looking for, though.

There's a fine line between "maintenance" and "improvement", and without the
latter, you have stagnation. There certainly are systems that require very low
levels of maintenance. I have a friend who built a program in Erlang that is
still running, 10 years later. (I don't mean that _the code_ is still in
production. I mean that _the program itself_ is still running.) Of course,
Erlang allows the definition of "a program" to span multiple machines, and
we're debating terminology here...

Pay-as-you-go maintenance is best. Don't allow technical debt if you can help
it, push back against The Business on deadlines, certainly don't allow that
micromanagement under the name of "Scrum" to get in or else you're just fucked
when it comes to quality because you'll get a fuck-quality-I-need-to-complete-
story-points culture, and create a culture of doing things right the first
time.

Not that you'll necessarily use them, but learn a few things about strong
statically typed languages like Haskell or OCaml (Java doesn't count; that's
shitty static typing). One of the great things about Haskell is that it allows
safe refactoring. You're not holding your breath every time you change the
code, because the compiler will usually tell you where your change broke
things, and you can just go in and fix them. It is possible to write highly
reliable software in dynamically typed languages (such as Erlang, mentioned
above) and I don't mean to denigrate those tools at all, but it's a bit
harder, especially when you're fairly new to programming, to do so.

Finally, once your system reaches a certain size, you will need tests no
matter how good your type system is. They start to become an obvious win
around a thousand lines of code. Consider generative testing (e.g. QuickCheck)
rather than hand-written tests if you can.
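
To show what generative testing means in practice, here is a hand-rolled Python sketch using only the standard library. It is not QuickCheck and omits features such as shrinking of failing inputs; the run-length encoder is just a hypothetical function under test. Instead of writing individual cases, you state a property (decoding an encoding gives back the input) and check it against many random inputs.

```python
import random


def run_length_encode(text):
    """Collapse runs of equal characters into (char, count) pairs."""
    out = []
    for ch in text:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out


def run_length_decode(pairs):
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials=500, seed=0):
    """Generate random inputs and check decode(encode(x)) == x for each one."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        # A two-letter alphabet makes long runs (the interesting case) likely.
        text = "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))
        assert run_length_decode(run_length_encode(text)) == text, text
    return trials
```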

