A major difficulty here is that undefined behaviour is a property of a particular execution of a C program, rather than a static property of the program itself. Tools that dynamically detect UB are useful, but will not demonstrate that there are no inputs for which a program will go wrong.
As powerful as an interpreter might be for running programs, it isn't the right tool for designing programs. Validating a program design requires static analysis, to be performed either mechanically (e.g., type checking) or manually (e.g., manipulating predicate transformers in the way advocated by Dijkstra).
A major difficulty here is that undefined behaviour is a property of a particular execution of a C program, rather than a static property of the program itself. Tools that dynamically detect UB are useful, but will not demonstrate that there are no inputs for which a program will go wrong.