

Ask HN: Genetic algorithms for detecting Single Point Of Failure - fizwhiz

I've always been curious about the level of sophistication of the simulation tools wielded by our peers in natural sciences like physics. They seem able to simulate a multitude of complex interactions and consequences through both mathematical and computational models. Perhaps I'm being extremely naive, but I have yet to come across a comprehensive enough tool (or set of tools) that can model a "perturbation" (a request, an input, etc.) rippling through an entire software system and ultimately causing some severe failure. Such a failure would be the result of some hidden Single Point of Failure, or of a confluence of multiple side effects that ends in a system-wide halt or performance degradation. It could be modeled by initializing a sense of "state" per component of your system (ex: my web server is servicing 5000 requests, AND my message queue has applied some backpressure and is 80% full, AND a few other machines are doing heavy asynchronous processing to crunch matrices, AND one of my DBs has gone down and its read replica is going to stand in temporarily, AND there's some arbitrary network jitter -> BOOM!)

I understand that architects try to think through these cases on their own, but as you add more components to your system and interface with more open source software that you may not understand as deeply as its authors do, there needs to be a more scalable way to 'discover' hidden weaknesses. Could we use genetic algorithms to model each subsystem's "state" and throw some kind of perturbations (requests) its way to see how the system reacts? Is there much value in doing this, and if so, why haven't we had our computers simulate computers and subsystems in the realm of software development and architecture?
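To make the idea concrete, here's a minimal sketch of the kind of search I have in mind: a toy genetic algorithm that evolves "system state" vectors toward failure. The component names, thresholds, and the stress() scoring function are all invented stand-ins; a real version would replay each candidate state against an actual model or staging environment instead of a hand-written formula.

```python
import random

# Hypothetical system state: each gene is one component's condition.
# All names and thresholds below are invented for illustration.
GENES = {
    "web_requests":   lambda: random.randint(0, 10000),  # in-flight requests
    "queue_fill_pct": lambda: random.uniform(0, 100),    # message-queue fullness
    "cpu_load":       lambda: random.uniform(0, 1),      # async crunching load
    "db_primary_up":  lambda: random.choice([0, 1]),     # primary DB alive?
    "net_jitter_ms":  lambda: random.uniform(0, 200),    # network jitter
}

def random_state():
    return {name: gen() for name, gen in GENES.items()}

def stress(state):
    """Toy 'simulator': a weighted stress score, where higher means closer
    to a system-wide failure. A real fitness function would observe an
    actual system under this state rather than compute a formula."""
    s = 0.0
    s += 0.3 * (state["web_requests"] / 10000)
    s += 0.3 * (state["queue_fill_pct"] / 100)
    s += 0.2 * state["cpu_load"]
    s += 0.3 * (1 - state["db_primary_up"])   # failover adds strain
    s += 0.2 * (state["net_jitter_ms"] / 200)
    return s

def mutate(state, rate=0.3):
    # Re-roll each component's condition with some probability.
    child = dict(state)
    for name in GENES:
        if random.random() < rate:
            child[name] = GENES[name]()
    return child

def crossover(a, b):
    # Mix two parent states gene by gene.
    return {name: random.choice((a[name], b[name])) for name in GENES}

def evolve(pop_size=40, generations=60):
    pop = [random_state() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=stress, reverse=True)    # fitness = closeness to failure
        survivors = pop[: pop_size // 2]      # keep the nastiest states
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=stress)

worst = evolve()
print(worst)
print("stress:", stress(worst))
```

The point of the sketch is the shape of the search, not the numbers: the population converges on exactly the kind of "queue full AND primary down AND jitter high" confluence described above, without anyone having to enumerate that combination by hand.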
======
jesusmichael
I'm not sure this type of testing is necessary, since we (computer scientists)
can break a complete system down into its parts and then test them in various
ways. Production performance shouldn't deviate much from that of the test
environment.

~~~
fizwhiz
> "test them in various ways"

I'm aware; all of us test these subsystems via unit tests, functional tests,
integration tests, and load tests. The point of my post, however, was
different.

The point was to ask whether there is an existing way to "definitively"
capture hidden weaknesses in a system by mutating each component's state to
the point where the whole system exhibits some flavor of breakdown. That
scenario is significantly harder to capture via the testing methodologies
mentioned above, which is why I wondered how people attempt to capture it
today. Our peers in the natural sciences seem to follow much more rigorous
simulation practices, so it seemed like low-hanging fruit to borrow some
ideas from them.

~~~
jesusmichael
Hmm... I guess I don't understand. My view is that in the natural sciences
there are so many variables acting on a given system that you could never
truly do isolation testing.

However, in your scenario, you're saying mutate a "piece of data"? I guess
I'm wondering why a piece of data would mutate in the system if it wasn't
manipulated by a user.

