
Tis-interpreter: An interpreter for detecting undefined behavior in standard C - modeless
https://github.com/TrustInSoft/tis-interpreter
======
modeless
Here's an example of how it was used to detect undefined behavior in libwebp:
[https://trust-in-soft.com/out-of-bounds-pointers-a-common-
pa...](https://trust-in-soft.com/out-of-bounds-pointers-a-common-pattern-and-
how-to-avoid-it/)

And musl libc: [https://trust-in-soft.com/variadic-functions-in-tis-
interpre...](https://trust-in-soft.com/variadic-functions-in-tis-interpreter/)

~~~
unwind
Thanks, very interesting.

The second one about musl was very long-winded, it seemed it was written for
people who have never even heard about varargs, which made me wonder what the
intended audience is, heh. I mean, if you know about undefined behavior enough
to use a tool to track it down, wouldn't it make sense to also know about C's
varargs mechanism?

Also it made it sound as if varargs is super-archaic and "best avoided" scary
monster thing, which it really isn't in my opinion. It's a well-defined and
useful part of the language and really helps make better APIs.

------
MaxBarraclough
I'm not convinced by their justification for doing arithmetic on uintptr_t and
then casting back to a pointer type.

They say

> It should work fine for all the ordinary platforms we are used to, because
> of footnote 67 in the C11 standard.

which admits things aren't assured to work. Aren't we in the business of
producing good quality code, here? Following their link to the footnote, it is
indeed far from a rock-solid assurance of expected behaviour.

These two StackOverflow comments make a good case against it -
[https://stackoverflow.com/a/19476805/](https://stackoverflow.com/a/19476805/)
,
[https://stackoverflow.com/a/43419534/](https://stackoverflow.com/a/43419534/)

but our author isn't the only person who thinks it might be ok -
[https://wiki.sei.cmu.edu/confluence/display/c/INT36-C.+Conve...](https://wiki.sei.cmu.edu/confluence/display/c/INT36-C.+Converting+a+pointer+to+integer+or+integer+to+pointer#comment-88032748)

~~~
tom_mellior
Out of interest, where is that quoted sentence from?

My understanding is that while this tool aims to catch _undefined_ behavior,
it also tries to keep going when it encounters _implementation-defined_
behavior. I think you can choose a concrete target platform (CPU/OS/compiler)
to model, at least you can do that in Frama-C, which tis-interpreter is based
on.

So if you specified, say, x86-64 Linux with GCC as your platform, and you
happen to be mostly sure that x86-64 GCC on Linux generates code as they
expect, then that's fine.

It makes no sense to interpret C code independent of a model of a target
platform. There is just too much implementation-defined stuff.

~~~
MaxBarraclough
> Out of interest, where is that quoted sentence from?

Oops, I should've replied to modless's comment - it's from this blog post:
[https://trust-in-soft.com/out-of-bounds-pointers-a-common-
pa...](https://trust-in-soft.com/out-of-bounds-pointers-a-common-pattern-and-
how-to-avoid-it/)

> you happen to be mostly sure that x86-64 GCC on Linux generates code as they
> expect, then that's fine.

Being 'mostly sure' isn't good enough; that's the point of rigorous tools like
this. If a new GCC optimisation breaks your code because you failed to follow
the letter of the law and had UB in your code, that's your fault, not GCC's.

> There is just too much implementation-defined stuff.

Take that thinking to an extreme and you're barely able to reason rigorously
about C code at all. Any number of dangerous constructs might happen to work
on a given CPU+compiler. Good C code should be portable. Detecting dangerous,
poor quality code is the whole point of tools like tis-interpreter.

Things like casting int* to int and back to int* might happen to work on your
CPU+compiler, but it _should_ cause warnings.

~~~
tom_mellior
> Take that thinking to an extreme and you're barely able to reason rigorously
> about C code at all.

No reason to take things to the extreme. Is the code

    
    
        int i = 32767;
        int n = i + i;
    

well-defined? It is implementation-defined whether it is undefined [0], since
that depends on the implementation's size for int.

You _are_ barely able to reason rigorously about C code without considering
some concrete real-world implementation's behavior. (I agree that the
uintptr_t thing would merit an optional warning; maybe there is one.)

[0] Language lawyers might disagree with this statement and just call it
undefined. But if anything, that's even less useful.

~~~
MaxBarraclough
That's a good example. I see two ways to handle that one:

1\. Don't use int, use a type that gives stronger range assurances, like
int_fast32_t (though I'm not sure at a glance what's the minimum legal value
of INT_FAST32_MAX)

2\. Put a compile-time assert to ensure that your range assumptions are met by
the target platform's 'int'

A pity that Ada's 'range types' approach to fixed-point types never caught on.
It solves this platform-variance question, expresses the programmer's intent
better, and enables useful runtime checks. It's possible with C++ templates
(libraries exist for the purpose), but no-one does it.

As for the question of what our static/dynamic analysis tools should do: they
should be paranoid, and warn that the code might invoke UB, unless
specifically configured to make strong platform assumptions. By default, we
should aim to write good portable C.

~~~
dhekir
Even CompCert, the formally verified C compiler, had to make some concessions
concerning the standard: it does not verify _C per se_ , but CompCert C
([http://compcert.inria.fr/man/manual005.html](http://compcert.inria.fr/man/manual005.html)),
which is very close to C99 but with fixed-size integers (32-bit), two's
complement signed integers (so no UB in signed overflows), etc.

Otherwise, it is practically impossible to find some actually useful C code
that is fully portable and compliant.

But I agree that analysis tools should be as configurable as possible, and
report UB whenever they can. However, if the default settings lead to way too
many false positives, the users might end up giving up before learning how to
turn some of them off. It's a delicate balance between paranoia and usability.
Even -Wall -Wextra -pedantic do not activate all warnings about detectable
UBs, and that's about as paranoid as you can get in GCC/Clang.

~~~
MaxBarraclough
As tom_mellior says, CompCert is just a compiler, and it's not deviating from
the C standard.

The C standard says that sizeof(int) is _implementation defined_. CompCert has
to choose a size, and they go with the unsurprising choice of 32-bit.

As for no UB in signed overflows: where the C standard says 'undefined
behavior', that doesn't prohibit an implementation (i.e. a compiler) from
picking a well-defined behavior. It just means that portable code mustn't rely
on that behavior - build such code on a different standards-compliant compiler
and you might end up with demons flying out of your nose, etc.

> it is practically impossible to find some actually useful C code that is
> fully portable and compliant.

Disagree. The C/C++ code I've written has been portable and standard. The
hardest part was that CMake is a trainwreck.

> if the default settings lead to way too many false positives, the users
> might end up giving up before learning how to turn some of them off

That's more an issue with legacy code that when writing new code.

------
krasin
Thank you for posting this. Would you like to compare tis-interpreter with
UBSan? In which case using tis-interpreter will be better?

Speaking as someone who had rather good experience fighting this kind of bugs
in Chromium using UBSan.

~~~
dhekir
One case in which I couldn't get UBSan (nor MemSan) to report issues was with
the usage of unitialized values, which may constitute undefined behavior.

UBSan does not deal with uninitialized reads, but Memory Sanitizer does
("MemorySanitizer is a detector of uninitialized reads"). However, unless it
changed, it only detects them when they are used in conditional branches. So
in the following code, for instance, I couldn't get UBSan/MemSan to tell me
the printf will read some uninitialized value.

    
    
      #include <stdio.h>
      int main() {
        int uninit;
        printf("%d", uninit);
        return 0;
      }
    

tis-interpreter (and Frama-C/EVA, which is related) will report the
uninitialized read.

Note that both tools are _very_ different in nature and usage, since tis-
interpreter is a sound static analyzer (for all entry values), while UBSan is
a runtime monitoring tool (Frama-C also has E-ACSL, its own runtime monitoring
tool). The former provides stronger guarantees, especially when the set of
inputs can be large, while the latter will surely be more efficient when doing
bug detection work, since there won't be any false positives.

Overall, I'd say, use UBSan/MemSan/etc. for the majority of the code, and
focus on sound analysis for critical components or libraries, where mistakes
can be expensive.

~~~
MaxBarraclough
> tis-interpreter is a sound static analyzer (for all entry values), while
> UBSan is a runtime monitoring tool

Isn't it a dynamic analysis tool, i.e. a C interpreter that keeps an eye out
for UB? From the GitHub README.md:

> tis-interpreter works by interpreting C programs statement by statement from
> beginning to end, verifying at each statement whether the program can invoke
> undefined behavior. This makes it comparable to Valgrind and C compiler
> sanitizers (UBSan, ASan, …)

~~~
dhekir
> Isn't it a dynamic analysis tool, i.e. a C interpreter that keeps an eye out
> for UB?

Indeed, I'm sorry, I had tis-analyzer in mind. tis-interpreter is indeed
closer to Valgrind. It interprets the C code according to its semantics,
without going down to assembly level nor executing the code.

------
nickpsecurity
Check out KCC, too, since the foundation is open-source, offers extra benefits
for languages modeled in it, and has a commercial tool for C with either low
or no false positives depending on how it measures up to claims. No
affiliation. I just find the K framework plus an executable, formal semantics
for C cool as heck. :)

[https://github.com/kframework/c-semantics](https://github.com/kframework/c-semantics)

[https://runtimeverification.com/match/](https://runtimeverification.com/match/)

They also have an article comparing it to Tis-interpreter here:

[https://runtimeverification.com/blog/?p=307](https://runtimeverification.com/blog/?p=307)

Their K has been used for languages from Scheme to Prolog to C. More recently,
they also did a semantics for a VM for smart contracts. They're doing a lot of
interesting stuff. Builds on Maude and rewriting logic if anyone is curious.

~~~
dhekir
tis-interpreter also has an open-source foundation, which is Frama-C
([http://www.frama-c.com](http://www.frama-c.com)). Both share some common
code.

Their purpose is not exactly the same as RV-Match's; they are more concerned
with soundness and absence of run-time errors (as well as other properties and
analyses), while RV-Match is a bug detection tool, focusing on not having
false positives. It's important to realize the difference when comparing such
kinds of tools.

Depending on the situation, one is more appropriate than the other, but in
some cases the results may be complementary (as in RV-Match's comparison,
where the reported bugs were different for each of them), so if there's
already some effort to try one of them (plus the *San's from LLVM), it may be
worth trying the others as well. Every bit of help is welcome when dealing
with a language such as C.

Disclaimer: I work with Frama-C.

