Most practically: your personal knowledge of the system will grow tremendously through the process of trying to formalize it and debug the errors you encounter. It's likely that basic assumptions you have about the protocol are false and that real bugs arise due to these subtle imperfections.
Some formal-methods technologies let you "extract" algorithms into other languages. Model checkers like Alloy don't: what you take away is what you learn, and whatever of that learning you can use to convince others.
A good concrete takeaway would be information you could use to design a test suite that breaks the system even where others believe it to be solid.
My background in hardware and software validation is long and deep, so I clearly see the benefit of getting information about the test space that needs to be covered. Still, it seems like it ought to be possible to do some kind of automated test stimulus generation using what Alloy already does. I am imagining a machine-readable report in some format that is easy to feed into a problem-specific test generator script.
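For what it's worth, the Alloy Analyzer can already export a found instance or counterexample as XML, which gets you most of the way to that machine-readable report. Here's a minimal sketch of the consuming side in Python; the element names below match the Alloy XML export as I remember it, so verify them against an actual export before relying on this:

    # alloy_to_stimulus.py -- turn an exported Alloy instance into raw material
    # for a problem-specific test generator. Sketch only: element names follow
    # the Alloy 4/5 XML export as I remember it; check against a real export.
    import xml.etree.ElementTree as ET

    def load_instance(path):
        """Collect atoms per signature and tuples per relation."""
        instance = ET.parse(path).getroot().find("instance")
        sigs = {s.get("label"): [a.get("label") for a in s.findall("atom")]
                for s in instance.findall("sig")}
        fields = {f.get("label"): [tuple(a.get("label") for a in t.findall("atom"))
                                   for t in f.findall("tuple")]
                  for f in instance.findall("field")}
        return sigs, fields

    if __name__ == "__main__":
        sigs, fields = load_instance("counterexample.xml")
        # The problem-specific part starts here: map atoms/tuples to stimulus,
        # e.g. each tuple in a 'step' relation becomes one input vector.
        for relation, tuples in fields.items():
            print(relation, "->", tuples)

The problem-specific script you're imagining is exactly the mapping from those atoms and tuples to concrete stimulus, which only someone who knows the system can write.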
In a past life, I worked on automatic test pattern generation for sequential logic circuits. Which, if you want to tilt at windmills for little $return (1), is as good a pursuit as any. But it strikes me that this is a very similar problem: given a description of a state space, create stimulus that covers it, sensitizing all the logic paths in a way that produces an observable result that differs from expectations. (A toy sketch of the coverage side follows the footnote.)
(1) The central lesson of trying to sell test: Electronic Design Automation dollars are like western US water law, where the first to divert wins. In the case of EDA, the up-front design tools get all the dollars, and there is hardly a trickle of money left by the time it reaches the test phase.
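Here's that sketch: given a finite state machine description, plain breadth-first search already yields input sequences that exercise every transition at least once. The machine below is invented for illustration, and real sequential ATPG also has to handle fault models, fault propagation, and observability, which this ignores:

    # cover_transitions.py -- toy stimulus generator for a finite state machine.
    from collections import deque

    # Hypothetical machine: state -> {input: next_state}.
    FSM = {
        "IDLE":  {"go": "RUN"},
        "RUN":   {"halt": "IDLE", "err": "FAULT"},
        "FAULT": {"reset": "IDLE"},
    }

    def shortest_input_sequence(start, goal):
        """BFS for the shortest input string driving the FSM from start to goal."""
        frontier, seen = deque([(start, [])]), {start}
        while frontier:
            state, inputs = frontier.popleft()
            if state == goal:
                return inputs
            for inp, nxt in FSM[state].items():
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, inputs + [inp]))
        return None  # goal unreachable from start

    def cover_all_transitions(reset_state="IDLE"):
        """One test per transition: drive to its source state, then apply it."""
        tests = []
        for src, edges in FSM.items():
            prefix = shortest_input_sequence(reset_state, src)
            if prefix is None:        # source state unreachable; nothing to do
                continue
            for inp in edges:
                tests.append(prefix + [inp])
        return tests

    for test in cover_all_transitions():
        print(test)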
This is essentially what model checkers do. The big difference is that they exploit the structure of the problem description in order to reach exhaustiveness in "reasonable" time. This can be a big time saver over trying to do it manually.
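For intuition, the skeleton of an explicit-state checker is exhaustive breadth-first search with a visited set; that is what turns the answer into a statement about every reachable state rather than a sample of them. A toy sketch; real tools layer symbolic encodings, symmetry, and partial-order reduction on top to keep "exhaustive" inside "reasonable time":

    # check.py -- skeleton of explicit-state model checking.
    from collections import deque

    def model_check(initial, successors, invariant):
        """Visit every reachable state; return a counterexample trace or None."""
        frontier, seen = deque([(initial, (initial,))]), {initial}
        while frontier:
            state, trace = frontier.popleft()
            if not invariant(state):
                return trace              # BFS gives a shortest counterexample
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, trace + (nxt,)))
        return None                       # invariant holds everywhere reachable

    # Toy example: a counter stepping by 1 or 3 mod 8; claim it never hits 5.
    trace = model_check(0, lambda s: [(s + 1) % 8, (s + 3) % 8], lambda s: s != 5)
    print("counterexample:", trace)       # prints (0, 1, 2, 5): the claim is false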
I'd emphasize that using Alloy/TLA+ is often an iterative process in which you refine your model over time. Even so, digging through several iterations in Alloy is often faster than hunting for a counterexample to a property of interest by hand.
But if you are mostly interested in finding interesting test sets, then you should be able to stop once you've modeled your actual implementation well enough.
Hmm. I don't use Alloy, nor do I know hardware yet, but I'll still try to dig something up. (comes back) It looks like a lot of CompSci folks have built test generation on top of Alloy for languages such as Java. They might not have released any general tools for Alloy itself, though. The good news is that they describe the benefits and enough specifics that hardware people might get ideas for homebrew tools.
Honestly, I'd have thought TLA+ would be the better of the two for hardware, given that it deals with concurrency. It can also do things similar to the SPIN model checker, which was used extensively for hardware verification in the past. For hardware, I'd experiment with SPIN as the lightweight method and ACL2 as the heavyweight one, especially since Jared Davis at Centaur has contributed the ACL2 components he used.
Just for kicks, I did some quick Googling on test generation for sequential circuits to see if I could find anything interesting for you (and for me, as I enjoy seeing what people do in your field). Most results were paywalled, but this one wasn't:
Thanks for the links. My current application is not test generation for hardware, though. I'm working with autonomous robots that need to interact with real-world obstacles that can't be easily predicted (laundry carts, golden retrievers, bridesmaids...), need to navigate spaces where WiFi comes and goes, and need to interact with other automated systems running their own state machines that I don't control. Something that could identify "recovery scenario needed here" has good value.
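In case it helps to picture it, here is the kind of analysis I mean, reduced to a toy: flag every state from which no sequence of modeled actions gets the robot back to a safe state. The states and transitions below are invented for illustration:

    # recovery_gaps.py -- flag states with no modeled path back to safety.
    # Hypothetical robot model; states and actions are invented for illustration.
    WORLD = {
        "DOCKED":     ["NAVIGATING"],
        "NAVIGATING": ["DOCKED", "BLOCKED", "NO_WIFI"],
        "BLOCKED":    ["NAVIGATING"],     # obstacle clears, or we replan around it
        "NO_WIFI":    ["NO_WIFI"],        # oops: nobody modeled a way out
    }
    SAFE = {"DOCKED"}

    def can_recover(state):
        """Depth-first search: is any safe state reachable from here?"""
        stack, seen = [state], set()
        while stack:
            s = stack.pop()
            if s in SAFE:
                return True
            if s in seen:
                continue
            seen.add(s)
            stack.extend(WORLD[s])
        return False

    for s in WORLD:
        if not can_recover(s):
            print("Recovery scenario needed:", s)   # flags NO_WIFI

A model checker gets you the same answer against a much richer model, but even this much would have flagged the WiFi dead zone.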
Apparently, I totally misread what you needed. Sorry. Your domain is very interesting; I briefly studied it a long time ago. Back then they were fusing data from different components with quite non-deterministic algorithms such as fuzzy logic or neural networks, and there wasn't any way to verify them at the time. I've seen a few attempts at such things since, but I would need to know what kind of algorithms you use for sense, learn, and act (or whatever they call it). No guarantees here.