Google Address Sanitizer ("compile-time valgrind") to be part of GCC 4.8 (gnu.org)
123 points by willvarfar on Nov 3, 2012 | 27 comments



Have these patches been accepted? From the mailing list thread, it looks like there are still some minor issues to work through, and I wasn't able to get a sense of whether the mainline committers wanted to do this or not. The fact that it's been in Clang+compiler-rt mainline for a while, and has been supported, should be a point in their favor.

ASan works great! To save you a bit of effort in figuring out how it works, this is the best source I found (besides the source): https://www.usenix.org/system/files/conference/atc12/atc12-f...


Do you have experience with valgrind? Can you compare & contrast the two?

Also, I thought that the compiler additions were implemented years ago, several times (Tristan Gingold's "checker-gcc" first, then the bounds-checking patches and most recently "-fmudflap"), so that ASan's functionality would basically only require rewriting libmudflap. But it apparently requires much deeper surgery on GCC. Can anyone familiar with ASan and/or the previous implementations comment on that?


This page is a good high-level summary: http://code.google.com/p/address-sanitizer/wiki/ComparisonOf... .

I've used most of the tools on that page, including Valgrind. Valgrind interprets your program, and hooks loads, stores, malloc/free, etc. to track memory usage. ASan is a compiler pass that inserts extra instructions around loads and stores to drive per-address state machines that live in a shadow area of memory.
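
To make the shadow-memory idea concrete, here is a toy model in C. It assumes the encoding described in the USENIX paper linked upthread: one shadow byte per 8 application bytes, where 0 means all 8 are addressable, k in 1..7 means only the first k are, and a negative value means poisoned. The arena, the `toy_*` names and the sizes are made up for illustration. Real ASan derives the shadow address from the real address and has the compiler emit the check inline, so treat this as a sketch and not as the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 256u  /* toy address space, in bytes (hypothetical) */
#define GRANULE    8u    /* application bytes covered by one shadow byte */

/* One shadow byte per granule: 0 = all 8 bytes addressable,
 * k in 1..7 = only the first k bytes addressable, negative = poisoned. */
static int8_t shadow[ARENA_SIZE / GRANULE];

/* Poison the whole arena, as if everything were redzone. */
void toy_poison_all(void) {
    memset(shadow, -1, sizeof shadow);
}

/* Make [offset, offset + size) addressable. offset must be 8-aligned. */
void toy_unpoison(size_t offset, size_t size) {
    assert(offset % GRANULE == 0 && offset + size <= ARENA_SIZE);
    size_t g = offset / GRANULE;
    while (size >= GRANULE) {
        shadow[g++] = 0;
        size -= GRANULE;
    }
    if (size > 0)
        shadow[g] = (int8_t)size;  /* partially addressable last granule */
}

/* The check an instrumented load/store of n bytes would perform.
 * Assumes n is 1, 2, 4 or 8 and the access is n-aligned, so it never
 * straddles two granules. Returns 1 if allowed, 0 if it would be reported. */
int toy_access_ok(size_t offset, size_t n) {
    assert(n >= 1 && n <= GRANULE && offset % n == 0);
    if (offset >= ARENA_SIZE)
        return 0;
    int8_t k = shadow[offset / GRANULE];
    if (k == 0)
        return 1;                   /* whole granule is addressable */
    if (k < 0)
        return 0;                   /* redzone or freed memory */
    return (offset % GRANULE) + n <= (size_t)k;
}
```

Calling `toy_poison_all()` and then `toy_unpoison(32, 13)` models a 13-byte object surrounded by redzones: `toy_access_ok(44, 1)` passes (the last valid byte) while `toy_access_ok(45, 1)` fails.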

Valgrind is slow (20x-50x for serial programs, more for parallel programs because Valgrind only executes one thread at a time), but can detect reads from uninitialized memory because your program is essentially executing in a VM. ASan is much faster, especially if the compiler pass can avoid instrumenting some loads/stores based on static analysis. It cannot detect reads from uninitialized memory, but can detect most other kinds of memory errors.

They're both thread-safe, so you can use them to debug parallel programs. As mentioned above, Valgrind is much slower at this, and until recently [1] you would never see most race conditions in multithreaded programs because Valgrind's scheduler would run each thread until it yielded voluntarily, rather than pre-empting and time-multiplexing between threads. In contrast, with ASan all threads run simultaneously at close to full speed, so in my case performance was >100x better and I actually saw the race conditions I cared about.

Valgrind supports more platforms than ASan (see [2] vs. [3]), and does not require compiler support (so you can use it to debug code from any compiler), but does tend to lag behind new platform features a bit. For example, until recently it borked if you used the new x86_64 RDTSCP instruction.

Mudflap seems similar in concept, but there appear to be issues with the implementation (see "Known Shortcomings" at [4])

[1] 3.8.0 and on have the --fair-sched=yes option http://valgrind.org/docs/manual/manual-core.html#manual-core...
[2] http://valgrind.org/info/platforms.html
[3] http://llvm.org/releases/3.1/tools/clang/docs/AddressSanitiz...
[4] http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging


> Valgrind interprets your program

No, it has a JIT. It's running compiled code. See http://valgrind.org/docs/valgrind2007.pdf for details, section 3 in particular.


You're right (and you should know!), my mistake. I knew it too, and just mistyped. I would edit the post above if I could.


And is it still 20-50x slower? I haven't done measurements, but it feels like a factor of 2 or 3 these days to me (single-threaded).


Thanks! That is very helpful. I'm looking forward to trying it on the stable GCC.


Long-time Valgrind user here, who has been using asan sporadically for a while inside Google. asan is way faster (and uses way less memory, I suspect), but is less exhaustive. Check out this page for more info: http://code.google.com/p/address-sanitizer/wiki/ComparisonOf...

Most notably, asan does not yet catch memory leaks, and will never be able to detect use of uninitialized values. Valgrind can tell you when you're using uninitialized values at bit-level granularity.
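
A small C sketch of the distinction: the function below only ever reads inside its own allocation, so a redzone-based checker has nothing to flag. With the hypothetical `TRIGGER_BUG` macro defined, though, it branches on values nobody wrote, which is the kind of thing Memcheck reports as a conditional jump depending on uninitialised values. The function name and the macro are made up for illustration and come from neither tool.

```c
#include <stdlib.h>

/* Returns 1 if the sum of n freshly allocated ints is positive, 0 if not,
 * and -1 on allocation failure. Build with -DTRIGGER_BUG to skip the
 * initialisation and get the uninitialised read. */
int sum_is_positive(size_t n) {
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;
#ifndef TRIGGER_BUG
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;                 /* every element gets a defined value */
#endif
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];            /* always in bounds */
    int result = 0;
    if (total > 0)                  /* branches on uninitialised data when TRIGGER_BUG is set */
        result = 1;
    free(buf);
    return result;
}
```

Without `TRIGGER_BUG` the function is well defined and `sum_is_positive(4)` returns 0. With it, the result depends on whatever happened to be in the heap.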

So Valgrind is still king in absolute capability, but it's likely that asan's speed will open it up to being used in cases where Valgrind simply isn't possible due to its speed/size overhead.


You're probably using ASAN a lot more than sporadically within Google. A lot of the continuous builds run with it on.

I think that's really the biggest advantage ASAN has - it's cheap enough that you can just make it the default when running tests or developing on your local machine, and catch a number of memory errors immediately instead of having to actively debug them.


"Compile-time valgrind" seems incorrect. It is compile-time addition of run-time valgrind-like instrumentation.


Yes, that's right. It does do the same thing as valgrind but the practical benefit of compile-time instrumentation rather than dynamic translation is that ASAN is enormously faster.

A lot of people are using this on their fuzzing rigs with large software applications like Firefox. ASAN hugely decreases cost per test cycle (and therefore cost per bug, without changing fuzzers).

A friend and I chipped in to get a fuzz server (quad xeon X5660, 96 GB RAM, dual SSD). It's paid for itself twice over in bug bounties and there are more in the queue. Valgrind was always too expensive but using ASAN builds we can find more bugs.

Having ASAN support in GCC would be really handy, because for large projects it can be a major effort to get everything to compile in Clang.

If you are sufficiently paranoid, and willing to accept a speed and memory hit (roughly factor of two) you could use ASAN in production. Personally, I am beginning to entertain the idea of using an ASAN-instrumented browser for day-to-day use.


> If you are ... willing to accept a speed and memory hit (roughly factor of two) you could use ASAN in production

That's a pretty interesting idea, and seems to be a practical realization of something people have been trying to do for ages: produce a C variant with more safety. The most prominent project I know trying to do that is the C-like language Cyclone (http://cyclone.thelanguage.org/), but this seems like an alternate approach that lets you get "C but safer" without actually moving away from C.


We're also trying to do that with Rust (it's based on the work of Dan Grossman and others with the Cyclone region-based memory management). It depends on whether you consider it a C variant, of course.

Cyclone is something every programming language enthusiast should look into, IMHO. It's extremely interesting, well-done work.


Don't get me wrong; I don't mean to say it's not fantastic, just that the headline was misleading. My expectation for "Hey, we managed to move this stuff to compile time" is that it will be 1) more interesting theoretically, and 2) less interesting practically (at least in the short term). Either can be fantastic, and these expectations are sometimes violated besides, but I just wanted to give a heads up to others (at least, those that skim the HN comments first) or be corrected if my reading of it was wrong.


Yeah, it's not static analysis... it's runtime instrumentation. And BTW it's very useful and helped me find a race condition.


One advantage of Valgrind over Address Sanitizer is that you're not limited to checking C/C++ programs. For example, I am developing a programming language for fun and I can check for bad memory usage using Valgrind. From what I understand, that wouldn't be possible with Google's tool.


I didn't know it is that close to becoming part of GCC, that's great!

I have been trying clang+ASan on a larger project with mixed success. While it could catch a known bug, it seemed to miscompile some essential startup code and therefore I could not get it running in regression tests. Never got around to debugging it, but now I can try again with GCC.

There is also another interesting project out there, although it is not as far along as ASAN:

http://safecode.cs.illinois.edu/

This project aims to track exact memory bounds of all objects. ASan will not detect out-of-bounds accesses that land in the allocated memory of another object, but SAFECode would catch that.
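
A simplified C model of that difference, not the real implementation of either tool: two objects sit in a toy arena with a redzone between them. A redzone-style check only asks whether the target byte belongs to some live object, while a bounds-style check asks whether the index is inside the object the pointer came from. All names and sizes here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum { ARENA = 128, REDZONE = 16, OBJ_SIZE = 32 };

/* 1 = byte belongs to some live object, 0 = redzone or unused. */
static unsigned char addressable[ARENA];

struct toy_obj { size_t base, size; };

/* Lay out two objects with a redzone before, between and after them. */
void toy_layout(struct toy_obj *a, struct toy_obj *b) {
    memset(addressable, 0, sizeof addressable);
    a->base = REDZONE;
    a->size = OBJ_SIZE;
    b->base = a->base + a->size + REDZONE;
    b->size = OBJ_SIZE;
    memset(addressable + a->base, 1, a->size);
    memset(addressable + b->base, 1, b->size);
}

/* Redzone-style check: is the byte at obj.base + index inside ANY live object? */
int redzone_check(struct toy_obj obj, size_t index) {
    size_t addr = obj.base + index;
    return addr < ARENA && addressable[addr];
}

/* Bounds-style check: is index inside the object the pointer came from? */
int bounds_check(struct toy_obj obj, size_t index) {
    return index < obj.size;
}
```

After `toy_layout(&a, &b)`, index 32 on `a` lands in the redzone and both checks reject it. Index 50 jumps over the redzone into `b`, so `redzone_check(a, 50)` accepts it while `bounds_check(a, 50)` still rejects it.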


As a person unfamiliar with the concepts presented in the OP, I would not mind if someone explained its significance in layman's terms.


Not sure how "layman" to get... but here you go :)

ASAN helps find difficult-to-reproduce bugs, e.g. data races. A data race occurs when two or more threads access the same memory location in an undefined order. Because the OS can interrupt threads at any time, it's hard to test all possible interleavings of instructions from multiple threads. Programmers typically have some invariants in mind, but it's really easy to make a mistake and hard to detect it.

To use the tool, you compile a C/C++ program in a different way. The ASAN toolchain instruments your memory accesses. Then you run your program (or unit tests for library code). And it will tell you if there were memory errors like data races. Then you look at your source code and fix the bug.

Valgrind does a similar thing and is a standard open source tool. As mentioned on the page, this is faster.


What you described is Thread Sanitizer (clang toolchain). The corresponding valgrind-based tools would be Helgrind and DRD.

ASan can only catch memory errors in C/C++ programs. A memory error is an access outside the allocated memory of an object, e.g. due to incorrect pointer arithmetic. Such errors would otherwise go undetected but can cause all kinds of errors in the program, like corrupted data, crashes etc.
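
For a concrete picture, here is a minimal sketch of the classic off-by-one. The function and the `TRIGGER_BUG` macro are hypothetical. Built normally it is correct. Built with `-DTRIGGER_BUG` the allocation is one byte short, and the write of the terminator lands just past the block. As far as I know, compiling that variant with `-fsanitize=address -g` (the flag spelling used by GCC 4.8 and later and by recent Clang) should make the program stop at that write with a heap-buffer-overflow report.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL on allocation failure. Caller frees. */
char *dup_string(const char *s) {
    size_t len = strlen(s);
#ifdef TRIGGER_BUG
    char *copy = malloc(len);       /* BUG: no room for the terminating '\0' */
#else
    char *copy = malloc(len + 1);   /* length plus terminator */
#endif
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';               /* with TRIGGER_BUG: 1 byte past the block */
    return copy;
}
```

Without instrumentation the buggy build will often appear to work, because the stray byte tends to land in allocator padding. That is why this kind of error goes unnoticed for so long.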


There's some overlap between the two tools. Race conditions often lead to memory errors of the type that ASan can detect. TSan/Helgrind are the way to go if you want to find races that happen to be benign most of the time, or lock ordering problems, or races that lead to non-memory problems.


Nice! Didn't know there was work to add ASan to gcc too.

ASan is already in Clang[1], and I've heard some rumours of a ‘Thread Sanitizer’ as well. All kinds of interesting projects brewing inside of Google, it seems :)

[1] http://llvm.org/releases/3.1/docs/ReleaseNotes.html#whatsnew


Is there any chance in the future of this working on Windows?


See: http://code.google.com/p/address-sanitizer/wiki/WindowsPort for info on Clang and ASan.

We will probably see GCC 4.8 on Windows before Clang++ and ASan.


Speaking of which, does anyone know of any good USPS address sanitizer / matching software?


Depends on what your goal is. The USPS has a number of products that do this kind of thing (See https://ribbs.usps.gov under Address Quality Services).

So if you are looking for something specifically to clean up or test software that cleans up USPS addresses, there you go.

If you want something that takes random address-looking things and tries to find a USPS address that matches it, that's harder :)


You probably want CASS software. The only vendor I've dealt with is Melissa Data.



