Hacker News new | past | comments | ask | show | jobs | submit | pramodbiligiri's favorites login
   submissions | comments

> 1) How do you model the myriad fault types that could possibly occur in distributed systems?

You need to think at the right level of abstraction. Are packet loss and reordering really different faults? Are packet loss to a node and the stop failure of that node different faults? The answers depend on your model.

Lamport's Fast Paxos model is a good example of how different faults are handled in different levels of models (http://research.microsoft.com/pubs/64624/tr-2005-112.pdf).

> Is it possible to partially model a system? For example, in SOA, the behavior of a service can depend on and be greatly affected by the behavior of other services. Can each service be modeled in isolation with meaningful results?

Yes, there are several powerful techniques for partially modelling systems. One of the most successful is the idea of abstraction (modelling less) and refinement (modelling more). Abstraction and refinement let you move up and down the levels of detail, and "refinement mappings" provide a rigorous tool for ensuring the validity of these moves.

"The existence of refinement mappings" (http://research.microsoft.com/en-us/um/people/lamport/pubs/a...) is the classic paper in this area, and Lamport's earlier "What good is temporal logic?" was one of the first to introduce the idea (http://research.microsoft.com/en-us/um/people/lamport/pubs/w...). This paper from Van Renesse et al (http://arxiv.org/abs/1309.5671) is a good illustration of the kind of formal things that can be done with refinement.

In practice, choosing the level of abstraction to apply these techniques is key to their success, but their flexibility makes it easy to work at any level. Choosing the right one is very application specific, and requires engineering and mathematical judgement.

> 3) Is the run time of model checking managable? The paper mentioned a few seconds to find bugs, but how long to exhaustively check a model without bugs?

That depends hugely on the complexity of the model. TLC is a very brute-force solution to the problem of model checking (with a few clever optimizations), and run time rises very quickly with model complexity. Model checking is tractable for some very meaningful models.

This is why "the objects of the future" have to be ambassadors that can negotiate with other objects they've never seen.

Think about this as one of the consequences of massive scaling ...

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact