
µUBSan: clean-room reimplementation of the Undefined Behavior Sanitizer runtime - ingve
https://blog.netbsd.org/tnf/entry/introduction_to_%C2%B5ubsan_a_clean
======
alexhutcheson
Why was a clean-room approach necessary here? The UIUC License used by UBSan
is extremely permissive, so going through extra effort to avoid creating a
derivative work doesn't make much sense to me.

~~~
ajross
It's not. They're just using the term as a synonym for "rewrite". There's no
documentation of any actual IP isolation in the linked article. They just want
people to know it's new and not based on the existing LLVM or Linux runtimes.

~~~
alexhutcheson
This seems correct. Apologies for the unintentionally pedantic comment.

------
tux1968
This is an optional runtime used to improve error reporting with the Clang
sanitizer:

[https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html)

------
brandmeyer
Darned Good Stuff!

Is the runtime API shared between both GCC and LLVM/Clang?

------
berti
It's sad to see so much duplicated effort. What does this accomplish that a
few patches to the llvm implementation couldn't?

------
Sir_Cmpwn
>I've decided to write the whole µUBSan runtime as a single self-contained .c
source-code file, as it makes it easier for it to be reused by every interested
party.

I don't really get why people do this. Linking is one of the easiest and most
broadly supported features of C environments on every platform.

~~~
cryptonector
You're not going to like how PuTTY is written then either. The entire initial
portion of the SSHv2 protocol, from version string exchange to the end of user
authentication, is coded as one huge (10Kloc) C function that uses a
macro-driven Duff's device to implement coroutine behavior (it's all async I/O
under the covers).

To me, one huge file or many small ones doesn't matter all that much because I
have cscope on my side. What matters is being able to find things quickly.
Still, one huge file means I can press * on a symbol in vim to quickly find
all references to it without having to switch to cscope -- that's not nothing.

Also,
[https://github.com/NetBSD/src/blob/trunk/common/lib/libc/mis...](https://github.com/NetBSD/src/blob/trunk/common/lib/libc/misc/ubsan.c)
is just not that big... only 1639 lines, 1310 sloc. Compare to, let's say,
OpenSSL, where 29 C source files have more lines than ubsan.c, with several
being more than twice the size. Maybe you think OpenSSL is not a fair example
because there's lots of tables and what not? Even if you look at files outside
crypto/, you'll find lots of big ones. Just for fun I looked at a variety of
other open source projects: Heimdal, MIT Kerberos, glibc, PostgreSQL -- you'll
be shocked when you look at their file sizes, even when you elide files that
are obviously mostly-data.

Source file size is not that interesting. The _contents_ are. When a set of
sources is small enough, organizing it into multiple files is not necessarily
a win. 1.3kloc doesn't seem like that big a file.

~~~
Sir_Cmpwn
1639 lines of code actually isn't bad at all. I think that's less of a selling
point and more happenstance, though. I certainly wouldn't want to reject
changes which split it up in the future as it gets more unwieldy on the basis
of "but being in one file is a feature!".

~~~
cryptonector
You could have looked at the file size first. What if it had been 500 sloc?
Would you still have made the same comment? What is the point at which you'd
absolutely insist on splitting it up if you were doing a code review? Surely
1.3ksloc is smaller than that. The commentary on splitting this up strikes me
as so much bikeshedding.

~~~
abenedic
I think, not to put words in his mouth, that he is objecting to the idea of
single file libraries as inherently good or better than a multi-file library.
I think the objection is more about future design choices the maintainer will
make. If you want to keep it single file, it may be necessary to avoid adding
some features which are too complex.

Complex features usually necessitate modularization, which is against the idea
of the single file library. Modularization in C and C++ is very poor and based
on having multiple files, some of which represent the interface and others
that represent the implementation. I at least partly think this is his
objection.

~~~
cryptonector
OK, sure,
[Sir_Cmpwn](https://news.ycombinator.com/user?id=Sir_Cmpwn)'s
comment was:

> > I've decided to write the whole µUBSan runtime as a single self-contained
> .c source-code file, as it makes it easier for it to be reused by every
> interested party.

> I don't really get why people do this. Linking is one of the easiest and
> most broadly supported features of C environments on every platform.

Sure, this is true, and users already have to know how to link (unless they
are #include'ing this). Still, a single file is much easier to share,
distribute, and use, and in any case, splitting up such a small file (by the
standards of a number of open source projects I looked at, it's small) seems
unnecessary.

~~~
abenedic
I totally agree with you, I personally love single file implementations.

I was just trying to explain what I thought his argument was against them,
since I don't think he really explicitly stated it outside of generally
speaking of maintenance issues. He just said he would prefer to contribute
patches to a project that had multiple files and a Makefile-based build
system.

I was assuming his argument was about the future directions the project could
go, which I can see as being a valid criticism. As I said, the main issue with
a single file implementation is that potential users may end up using parts of
the implementation instead of just the public-facing interface you would like
them to use.

~~~
cryptonector
Fair enough.

