
Google Address Sanitizer ("compile-time valgrind") to be part of GCC 4.8 - willvarfar
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00088.html
======
sparky
Have these patches been accepted? From the mailing list thread, it looks like
there are still some minor issues to work through, and I wasn't able to get a
sense of whether the mainline committers wanted to do this or not. The fact
that it's been in Clang+compiler-rt mainline for a while, and has been
supported, should be a point in their favor.

ASan works great! To save you a bit of effort in figuring out how it works,
this is the best source I found (besides the source):
[https://www.usenix.org/system/files/conference/atc12/atc12-f...](https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf)

~~~
beagle3
Do you have experience with valgrind? Can you compare & contrast the two?

Also, I thought that the compiler additions were implemented years ago, several
times (tristan gingold's "checker-gcc" first, then the bounds-checking patches
and most recently "-fmudflap"), so that asan's functionality would basically
only require rewriting libmudflap - but it apparently requires a much deeper
surgery of gcc. Can anyone familiar with asan and/or the previous
implementations comment on that?

~~~
sparky
This page is a good high-level summary:
[http://code.google.com/p/address-sanitizer/wiki/ComparisonOf...](http://code.google.com/p/address-sanitizer/wiki/ComparisonOfMemoryTools).

I've used most of the tools on that page, including Valgrind. Valgrind
_interprets_ your program, and hooks loads, stores, malloc/free, etc. to track
memory usage. ASan is a compiler pass that inserts extra instructions around
loads and stores to drive per-address state machines that live in a shadow
area of memory.
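
For anyone who wants to picture the shadow area, here is a toy model in Python. It is only a sketch, not ASan's code: it follows the shadow-byte encoding described in the USENIX paper linked earlier in the thread (one shadow byte per 8 bytes of application memory), while the `ToyHeap` class, its method names, and the 32-byte redzone are made up for illustration.

```python
# Toy model of ASan-style shadow memory (illustration only, not ASan's code).
# Each 8-byte granule of "application memory" has one shadow value:
#   0     -> all 8 bytes are addressable
#   1..7  -> only the first k bytes are addressable
#   -1    -> the whole granule is poisoned (redzone or freed memory)

GRANULE = 8
REDZONE = 32  # hypothetical redzone size, chosen only for this sketch


class ToyHeap:
    def __init__(self, size):
        # Everything starts poisoned; only malloc() makes bytes addressable.
        self.shadow = [-1] * (size // GRANULE)
        self.next_free = REDZONE  # leave a redzone before the first object

    def malloc(self, nbytes):
        addr = self.next_free
        full, partial = divmod(nbytes, GRANULE)
        first = addr // GRANULE
        for g in range(first, first + full):
            self.shadow[g] = 0
        if partial:
            self.shadow[first + full] = partial
        # Round the object up to whole granules, then skip a redzone.
        used = (full + (1 if partial else 0)) * GRANULE
        self.next_free = addr + used + REDZONE
        return addr

    def free(self, addr, nbytes):
        # Re-poison the object so later accesses are flagged (use-after-free).
        granules = -(-nbytes // GRANULE)  # ceiling division
        first = addr // GRANULE
        for g in range(first, first + granules):
            self.shadow[g] = -1

    def access_ok(self, addr, size=1):
        """The check that instrumentation would run before a load or store.

        Assumes the access does not straddle two granules.
        """
        k = self.shadow[addr // GRANULE]
        if k == 0:
            return True
        if k < 0:
            return False
        # Partially addressable granule: the last byte touched must be < k.
        return (addr % GRANULE) + size - 1 < k


heap = ToyHeap(256)
p = heap.malloc(10)
print(heap.access_ok(p + 9))   # True: last byte of the 10-byte object
print(heap.access_ok(p + 10))  # False: one byte past the end
print(heap.access_ok(p - 1))   # False: redzone in front of the object
heap.free(p, 10)
print(heap.access_ok(p))       # False: freed memory is poisoned again
```

The point of the design is that the check is a lookup plus a compare, which is why it can be inlined around every load and store at a much lower cost than running the whole program under a translator.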

Valgrind is slow (20x-50x for serial programs, more for parallel programs
because Valgrind only executes one thread at a time), but can detect reads
from uninitialized memory because your program is essentially executing in a
VM. ASan is much faster, especially if the compiler pass can avoid
instrumenting some loads/stores based on static analysis, and cannot detect
reads from uninitialized memory, but can detect most other kinds of memory
errors.

They're both thread-safe, so you can use them to debug parallel programs. As
mentioned above, Valgrind is much slower at this, and until recently [1] you
would never see most race conditions in multithreaded programs because
Valgrind's scheduler would run each thread until it yielded voluntarily, rather
than pre-empting and time-multiplexing between threads. In contrast, with ASan
all threads are running close to at-speed and simultaneously, so in my case
performance was >100x better _and_ I actually saw the race conditions I cared
about.
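
To make the scheduling point concrete, here is a small deterministic simulation in Python. It is a sketch under simplifying assumptions, not how Valgrind or a real OS scheduler works: the "threads" are generators, `worker`, `run_until_done`, and `run_round_robin` are hypothetical names, and the serializing schedule simply runs each thread to completion because these workers never yield voluntarily.

```python
def worker(shared, iterations):
    """A 'thread' that increments shared['n'] without taking a lock.

    Each yield marks a point where a pre-emptive scheduler could switch threads.
    """
    for _ in range(iterations):
        tmp = shared["n"]      # load
        yield                  # possible pre-emption between load and store
        shared["n"] = tmp + 1  # store
        yield


def run_until_done(threads):
    """Serializing schedule: each thread runs until it finishes."""
    for t in threads:
        for _ in t:
            pass


def run_round_robin(threads):
    """Pre-emptive schedule: switch threads at every possible point."""
    pending = list(threads)
    while pending:
        for t in list(pending):
            try:
                next(t)
            except StopIteration:
                pending.remove(t)


serial = {"n": 0}
run_until_done([worker(serial, 3), worker(serial, 3)])
print(serial["n"])       # 6: the race never shows up

interleaved = {"n": 0}
run_round_robin([worker(interleaved, 3), worker(interleaved, 3)])
print(interleaved["n"])  # 3: both threads load the same value, updates are lost
```

The code under test is identical in both runs. Only the schedule changes, which is why a tool that serializes threads can hide a race that shows up immediately when threads really run side by side.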

Valgrind supports more platforms than ASan (see [2] vs. [3]), and does not
require compiler support (so you can use it to debug code from any compiler),
but does tend to lag behind new platform features a bit. For example, until
recently it borked if you used the new x86_64 RDTSCP instruction.

Mudflap seems similar in concept, but there appear to be issues with the
implementation (see "Known Shortcomings" at [4])

[1] 3.8.0 and on have the --fair-sched=yes option
[http://valgrind.org/docs/manual/manual-core...](http://valgrind.org/docs/manual/manual-core.html#manual-core.pthreads_perf_sched)
[2] <http://valgrind.org/info/platforms.html>
[3] [http://llvm.org/releases/3.1/tools/clang/docs/AddressSanitiz...](http://llvm.org/releases/3.1/tools/clang/docs/AddressSanitizer.html)
[4] <http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging>

~~~
nnethercote
> Valgrind interprets your program

No, it has a JIT. It's running compiled code. See
<http://valgrind.org/docs/valgrind2007.pdf> for details, section 3 in
particular.

~~~
sparky
You're right (and you should know!), my mistake. I knew it too, and just
mistyped. I would edit the post above if I could.

~~~
andrewcooke
and is it still 20-50x slower? i haven't done measurements, but it feels like
a factor of 2 or 3 these days, to me (single threaded).

------
dllthomas
"Compile-time valgrind" seems incorrect. It is compile-time addition of run-
time valgrind-like instrumentation.

~~~
xyzzy123
Yes, that's right. It does do the same thing as valgrind but the practical
benefit of compile-time instrumentation rather than dynamic translation is
that ASAN is enormously faster.

A lot of people are using this on their fuzzing rigs with large software
applications like Firefox. ASAN hugely decreases cost per test cycle (and
therefore cost per bug, without changing fuzzers).

A friend and I chipped in to get a fuzz server (quad xeon X5660, 96 GB RAM,
dual SSD). It's paid for itself twice over in bug bounties and there are more
in the queue. Valgrind was always too expensive but using ASAN builds we can
find more bugs.

Having ASAN support in GCC would be really handy, because for large projects
it can be a major effort to get everything to compile in CLANG.

If you are sufficiently paranoid, and willing to accept a speed and memory hit
(roughly factor of two) you could use ASAN in production. Personally, I am
beginning to entertain the idea of using an ASAN-instrumented browser for day-
to-day use.

~~~
_delirium
> If you are ... willing to accept a speed and memory hit (roughly factor of
> two) you could use ASAN in production

That's a pretty interesting idea, and seems to be a practical realization of
something people have been trying to do for ages: produce a C variant with
more safety. The most prominent project I know trying to do that is the C-like
language Cyclone (<http://cyclone.thelanguage.org/>), but this seems like an
alternate approach that lets you get "C but safer" without actually moving
away from C.

~~~
pcwalton
We're also trying to do that with Rust (it's based on the work of Dan Grossman
and others with the Cyclone region-based memory management). It depends on
whether you consider it a C variant, of course.

Cyclone is something every programming language enthusiast should look into,
IMHO. It's extremely interesting, well-done work.

------
ThomasQue
One advantage of Valgrind over Address Sanitizer is that you're not limited to
checking C/C++ programs. For example, I am developing a programming language
for fun and I can check for bad memory usage using Valgrind. From what I
understand, that wouldn't be possible with Google's tool.

------
mrich
I didn't know it is that close to becoming part of GCC, that's great!

I have been trying clang+ASan on a larger project with mixed success. While it
could catch a known bug, it seemed to miscompile some essential startup code
and therefore I could not get it running in regression tests. Never got around
to debugging it, but now I can try again with GCC.

There is also another interesting project out there, although it is not as far
along as ASAN:

<http://safecode.cs.illinois.edu/>

This project aims to track exact memory bounds of all objects. ASan will not
detect out of bounds accesses that go to allocated memory of another object,
but SAFECode would catch that.
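
A rough way to see the difference, sketched in Python. This is a simplified model, not how either tool is implemented: the `layout`, `redzone_check`, and `bounds_check` helpers and the 16-byte redzone are hypothetical, and `bounds_check` only stands in for the idea of tracking exact per-object bounds.

```python
REDZONE = 16  # hypothetical redzone size in bytes


def layout(sizes):
    """Place objects one after another, separated by redzones.

    Returns (bases, addressable), where addressable is the set of byte
    addresses that a redzone-based checker would allow.
    """
    bases, addressable, addr = [], set(), REDZONE
    for size in sizes:
        bases.append(addr)
        addressable.update(range(addr, addr + size))
        addr += size + REDZONE
    return bases, addressable


def redzone_check(addressable, addr):
    """Redzone-style check: is this byte inside *any* live object?"""
    return addr in addressable


def bounds_check(base, size, addr):
    """Per-object check: is this byte inside the object the pointer came from?"""
    return base <= addr < base + size


(a, b), addressable = layout([32, 32])

# Small overflow: a[32] lands in the redzone, so both checks reject it.
print(redzone_check(addressable, a + 32))  # False
print(bounds_check(a, 32, a + 32))         # False

# Large overflow: a[50] jumps the redzone and lands inside b.
print(redzone_check(addressable, a + 50))  # True: the bad access is missed
print(bounds_check(a, 32, a + 50))         # False: exact bounds still catch it
```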

------
hayksaakian
As a person unfamiliar with the concepts presented in the OP, I would not mind
if someone explained its significance in layman's terms.

~~~
chubot
Not sure how "layman" to get... but here you go :)

ASAN helps find difficult-to-reproduce bugs, e.g. data races. A data race
occurs when two or more threads access the same memory locations in an
undefined order. Because the OS can interrupt threads at any time, it's hard to test
all possible interleavings of instructions from multiple threads. Programmers
typically have some invariants in their mind but it's really easy to make a
mistake, and hard to detect it.
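
For a sense of scale, here is a quick back-of-the-envelope count in Python. It assumes an idealized model in which each thread is a fixed sequence of atomic steps; the `interleavings` helper is just a name picked for this sketch.

```python
from math import comb


def interleavings(steps_a, steps_b):
    """Number of ways to interleave two threads' steps while keeping
    each thread's own steps in order: C(a + b, a)."""
    return comb(steps_a + steps_b, steps_a)


print(interleavings(2, 2))    # 6
print(interleavings(10, 10))  # 184756
print(interleavings(20, 20))  # 137846528820
```

Even two threads of 20 steps each have far more orderings than a test suite will ever exercise, and real programs have more threads and many more steps.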

To use the tool, you compile a C/C++ program in a different way. The ASAN
toolchain instruments your memory accesses. Then you run your program (or unit
tests for library code). And it will tell you if there were memory errors like
data races. Then you look at your source code and fix the bug.

Valgrind does a similar thing and is a standard open source tool. As mentioned
on the page, this is faster.

~~~
mrich
What you described is Thread Sanitizer (clang toolchain). The corresponding
valgrind-based tools would be Helgrind and DRD.

ASan can only catch memory errors in C/C++ programs. A memory error is an
access outside the allocated memory of an object, e.g. due to incorrect
pointer arithmetic. Such errors would otherwise go undetected but can cause
all kinds of errors in the program, like corrupted data, crashes etc.

~~~
sparky
There's some overlap between the two tools. Race conditions often lead to
memory errors of the type that ASan can detect. TSan/Helgrind are the way to
go if you want to find races that happen to be benign most of the time, or
lock ordering problems, or races that lead to non-memory problems.

------
scoopr
Nice! Didn't know there was work to add ASan to gcc too.

ASan is already in Clang[1], and I've heard some rumours of a ‘Thread
Sanitizer’ as well. All kinds of interesting projects brewing inside of
google, it seems :)

[1] <http://llvm.org/releases/3.1/docs/ReleaseNotes.html#whatsnew>

------
chacham15
Is there any chance in the future of this working on Windows?

~~~
xyzzy123
See: <http://code.google.com/p/address-sanitizer/wiki/WindowsPort> for info on
Clang and ASan.

Will probably see GCC 4.8 on Windows before Clang++ and ASan.

------
jameswilsterman
Speaking of which, does anyone know of any good USPS address sanitizer /
matching software?

~~~
DannyBee
Depends on what your goal is. The USPS has a number of products that do this
kind of thing (See <https://ribbs.usps.gov> under Address Quality Services).

So if you are looking for something specifically to clean up or test software
that cleans up USPS addresses, there you go.

If you want something that takes random address-looking things and tries to
find a USPS address that matches it, that's harder :)

