Hacker News
Twenty years of Valgrind (nnethercote.github.io)
626 points by nnethercote on July 26, 2022 | hide | past | favorite | 112 comments



I sort of owe callgrind a big chunk of my career.

I was working at a company full of PhDs and well seasoned veterans, who looked at me as a new kid, kind of underqualified to be working in their tools group. I had been at the firm for a while, and they were nice enough, but didn't really have me down as someone who was going to contribute as anything other than a very junior engineer.

We had a severe problem with a program's performance, and no one really had any idea why. And as it was clearly not a sophisticated project, I got assigned to figure something out.

I used the then very new callgrind and the accompanying flamegraph, and discovered that we were passing very large bit arrays for register allocation by value. Very, very large. They had started small enough to fit in registers, but over time had grown so large that a function call to manipulate them effectively flushed the cache, and the rest of the code assumed these operations were cheap.

Profiling tools at the time were quite primitive, and the application was a morass of shared libraries, weird dynamic allocations and JIT, and a bunch of other crap.

Valgrind was able to get the profiles after everything else I could try had failed.

The presentation I made on that discovery, and my proposed fixes (which eventually sped everything up greatly), finally earned the respect of my colleagues, and not having a PhD wasn't a big deal after that. Later on, those colleagues who had left the company invited me to my next gig. And the one after that.

So thanks!


I have a very similar experience, but with a different profiling tool. When I first graduated from school and joined a big internet company, I wasn't that "different". The serving stack was all in C++. My colleagues were really capable but not that into "tools"; they'd rather depend on themselves (guess, tune, measure).

But I, as a fresh member in the team, learned and introduced Google perftools to the team and did a presentation of the breakdown of the running time of the big binary. I have to say that presentation was a life-changing moment in my career.

So together with you, I really want to thank those who put so much into building these tools. When I was doing the presentation, I really felt I was standing on the shoulders of giants, and those giants were helping me.

And over the years, I used more and more tools like valgrind, pahole, asan, tsan.

Much appreciated!


I've mentioned this before on HN as a way for a "newbie" to look like a superhero in a job very quickly; nice to hear a story of it actually working!

There is so much code in the world that nobody has even so much as glanced at a profile of, and any non-trivial, unprofiled code base is virtually guaranteed to have some kind of massive performance problem that is also almost trivial to fix like this.

Put this one in your toolbelt, folks. It's also so fast that you can easily try it without having to "schedule" it, and if I'm wrong and there aren't any easy profiling wins, hey, nobody has to know you even looked. Although in that case, you just learned something about the quality of the code base; if there aren't any profiling quick wins, that means someone else claimed them. As the codebase grows the probability of a quick win being available quickly goes to 1.


Always find it weird when people berate C++ tooling, Valgrind and adjacent friends are legitimately best in class and incredibly useful. Between RAII and a stack of robust static analyzers you'd have to deliberately write unsafe code these days.


That sounds great until you realise in other languages you get that by default without any tooling. And with better guarantees too (C++ static analysers aren’t foolproof).

Where C++ tooling really lacks is around library management and build tooling. The problem is less that any of the individual tools don’t work and more that there are many of them and they don’t interoperate nicely.


What language has anything like cachegrind, which is the topic of this thread? Cache misuse is one of the largest causes of bad performance these days, and I can't think of any language that has anything built in for that.

Sure, other languages have some nice tools to do garbage collection (so does C++, but it is optional, and reference counting does have drawbacks), but there is a lot more to tooling than just garbage collection. Even rust's memory model has places where it can't do what C++ can. (you can't use atomic to write data from two different threads at the same time)

No language has good tools around libraries and builds. So long as you stick to exactly one language with that language's build system, things seem nice. However, in the real world we have a lot of languages, and a lot of libraries that already exist. Let me know how I can use any build/library tool with this library that builds with autotools, that other one with cmake, and here is one with qmake (though at least qt is switching to cmake, which is becoming the de-facto C++ standard) - just to name a few that handle dependencies in very different ways.


> Even rust's memory model has places where it can't do what C++ can. (you can't use atomic to write data from two different threads at the same time)

Perhaps not in safe Rust, but can you provide an example of something Rust can't do that C++ can? It has the same memory model as C++20: https://doc.rust-lang.org/nomicon/atomics.html



The atomics themselves sure, but I guess often they'll be used as a barrier to protect an UnsafeCell or something, like in the implementation of Lazy<T>: https://docs.rs/lazy-init/0.5.0/src/lazy_init/lib.rs.html#85


To be fair, as an outsider to both Rust and JS, they seem to have pretty robust package management between cargo and npm, although npm is kinda cheating as collating scripts isn't quite as complex as building binaries, whereas pip is absolutely unbearable with all the virtual env stuff.

I've been quite lucky with CMake, after the initial learning period I've found everything "just works" as it is quite well supported by modern libs.


Cargo and npm are very robust so long as you stick only to their respective ecosystems. However, as soon as you need something from a different ecosystem they each become hard. The initial import into an ecosystem isn't hard, but the reimport after every update upstream is very annoying.


I love this story. I'm becoming an older dev now and I've often been blindsided by some insight or finding by juniors - it's really great to see & you've always got to make sure they get credit!


I’m surprised to see the attribution to the tools and not your proposed fixes. Sure the discovery was the first step in the order of operations, but can you elaborate on what enabled you to understand the problem statement and subsequent resolution?

There has to be a deeper understanding I think


I can share mine. It's an ads retrieval system. Latency is very sensitive and it has to be efficient. To avoid memory allocations, special hashtables with a fixed number of buckets (also open addressing) are used in multiple places in query processing. The default is 1000. However, there are cases where the number of elements is only a handful. In those cases it fails to utilize the cache, and hence is slower.

The solution was to tune the number of buckets using info derived from the pprof callgraph.

There were others too, like redundant serialization, etc. But this one is the most interesting.


That's surprising. If I was writing this I'd have instrumented the code for the buckets to (optionally) log the use, and probably add an alert.

(being an armchair expert is easy though)


I also heavily used callgrind/cachegrind to tune critical paths in our high performance web proxy, where every micro/millisecond counts... For example, in media type detection that is called multiple times per request (minimum twice, for request/response), etc.


Sounds like the solution probably had something to do with switching to passing by reference + other changes I would assume.


A big pain point for using coroutines is having to pass by value more frequently due to uncertain lifetimes... it's jarring when you come from zero-copy programming.


That is what many people fail to understand as to why us C programmers dislike C++


Indeed, because languages with reference parameters predate C by about 15 years, and are present in most ALGOL derived dialects.


I have a similar experience with xdebug for a PHP shop I used to work at. It feels very similar to being a nerd back at school, rescuing peoples home work, and being rewarded with some respect.


I wish I hadn't read this article because now I know that I've been mispronouncing Valgrind for nearly 20 years but I'm not going to stop.

(Kidding. Thanks for Valgrind! I still use it for assessing memory corruption vulnerabilities along with ASan.)


Our pipelines have asan (and cppcheck, clang-tidy, coverity and coverage stuff) but no valgrind; is there something it is good at that we are missing?


ASAN on its own doesn't detect uninitialized memory. MSAN can, though. Valgrind is also more than just the memcheck sub-tool - there are others, like Cachegrind, which is a cache and branch-prediction profiler.

https://github.com/google/sanitizers/wiki/AddressSanitizerCo... https://github.com/google/sanitizers/wiki/MemorySanitizer https://valgrind.org/docs/manual/manual.html


Yeah, valgrind can report L1/L2 cache misses and the percentage of branch mispredictions. It also reports the exact number of instructions executed, and how many of those instructions missed the cache. It's great for improving small code that needs to be performant.

I'd use asan over valgrind only for memory leaks. It's faster.


If you only want memory leaks, LSan will do that for you.

In general, I tend to use ASan for nearly everything I used Valgrind for back in the day; it's faster and usually more precise (Valgrind cannot reliably detect small overflows between stack variables). Valgrind if I cannot recompile, or if ASan doesn't find the issue. Callgrind and Cachegrind never; perf does a much better job, much faster. DHAT never; Heaptrack gives me what I want.

Valgrind was and is a fantastic tool; it became part of my standard toolkit together with the editor, compiler, debugger and build system. But technology has moved on for me.


Amen. Between the various sanitizers and perf, I stopped needing valgrind a few years ago.

But when it was the only option it was fantastically useful.


If I understand correctly valgrind (cachegrind) reports L1/L2 cache misses based on a simulated CPU/cache model.

On Linux, you can easily instrument real cache events using the very powerful perf suite. There is an overwhelming number of events you can instrument (use perf-list(1) to show them), but a simple example could look like this:

  $ perf stat -d -- sh -c 'find ~ -type f -print | wc -l'
  ^Csh: Interrupt
   Performance counter stats for 'sh -c find ~ -type f -print | wc -l':
  
               47,91 msec task-clock                #    0,020 CPUs utilized
                 599      context-switches          #   12,502 K/sec
                  81      cpu-migrations            #    1,691 K/sec
                 569      page-faults               #   11,876 K/sec
         185.814.947      cycles                    #    3,878 GHz                      (28,71%)
         105.650.405      instructions              #    0,57  insn per cycle           (46,15%)
          22.991.322      branches                  #  479,863 M/sec                    (46,72%)
             643.767      branch-misses             #    2,80% of all branches          (46,14%)
          26.010.223      L1-dcache-loads           #  542,871 M/sec                    (36,80%)
           2.449.173      L1-dcache-load-misses     #    9,42% of all L1-dcache accesses  (29,62%)
             517.052      LLC-loads                 #   10,792 M/sec                    (22,53%)
             133.152      LLC-load-misses           #   25,75% of all LL-cache accesses  (16,02%)
  
         2,403975646 seconds time elapsed
  
         0,005972000 seconds user
         0,046268000 seconds sys
Ignore the command, it's just a placeholder to get meaningful values. The -d flag adds basic cache events, by adding another -d you also get load and load miss events for the dTLB, iTLB and L1i cache.

But as mentioned, you can instrument any event supported by your system. Including very obscure events such as uops_executed.cycles_ge_2_uops_exec (Cycles where at least 2 uops were executed per-thread) or frontend_retired.latency_ge_2_bubbles_ge_2 (Retired instructions that are fetched after an interval where the front-end had at least 2 bubble-slots for a period of 2 cycles which was not interrupted by a back-end stall).

You can also record data using perf-record(1) and inspect them using perf-report(1) or - my personal favorite - the Hotspot tool (https://github.com/KDAB/hotspot).

Sorry for hijacking the discussion a little, but I think perf is an awesome little tool and not as widely known as it should be. IMO, when using it as a profiler (perf-record), it is vastly superior to any language-specific built-in profiler. Unfortunately some languages (such as Python or Haskell) are not a good fit for profiling using perf instrumentation as their stack frame model does not quite map to the C model.


If your tests can take the performance hit, Valgrind would tell you about uninitialized memory reads, which isn't covered by those tools you mentioned. If however, you are able to add MSAN (i.e. able to rebuild the entire product, including dependencies, with -fsanitize=memory) to the pipeline, then you would have the same coverage as Valgrind.


The main reason for Valgrind would be if you're working with a binary that you can't recompile to add the ASAN instrumentation.


Fwiw I've literally worked with Nicholas (but not on valgrind) and I only learned this today somehow.


I was introduced to valgrind by Andrew Tridgell during the main content of a vaguely famous lecture he gave that finished with the audience collectively writing a shell script bitkeeper client [1], demonstrating beyond doubt that Tridge had not in any way acted like a "git" when bitkeeper's licenseholder pulled the license for the linux kernel community.

Tridge said words to the effect of "if you program in C and you aren't using valgrind you flipping should be!" and went on to talk about how some projects like to have a "valgrind clean" build, the same way they compile without warnings, and that it's a really useful thing. As ever, well expressed with examples from samba development.

He was obviously right and I started using valgrind right there in the lecture theatre. apt-get install is a beautiful thing.

He pronounced it val grind like the first part of "value" and "grind" as in grinding coffee beans. I haven't been able to change my pronunciation since then regardless of it being "wrong".

[1] https://lwn.net/Articles/132938/

Corbet's account of this in the lwn link above is actually wrong, as noted by akumria in the comments below it. Every single command and suggestion came from the audience, starting with telnetting to Ted Ts'o's bitkeeper ip & port that he made available for the demo. Typing "help" came from the audience, as did using netcat and the entire nc command. The audience wrote the bitkeeper client in 2 minutes, with tridge doing no more than encouraging, typing, and pointing out that "tridge is a wizard reverse engineer who has used his powers for evil" was clearly just some "wrong thinking." Linus claimed thereafter that Git was named after himself and not Tridge.


Tridgell is possibly the most intelligent person I've ever met, and I've met Torvalds and a bunch of other Linux developers -- not that they aren't intelligent too, among them might be a challenger to that title.

Tridge has a way of explaining complicated ideas in a way that pares them down to their essence and helps you to understand them that just really struck me (a smart person is able to talk about a complicated thing in a way that makes you feel dumb, a really smart person is able to talk about a complicated thing in a way that makes you feel like a genius). As well as the ability and intellectual curiosity to jump seemingly effortlessly across disciplines.

And he's a fantastic and very entertaining public speaker. Highly recommend any talk he gives.


I've been promoting proper pronunciation of Valgrind at work, and am making passable progress...


Valarie smiled. Is how I will remember it.

That said I sometimes get the "V" tools mixed up (Vagrant, Valgrind, Varnish)


I've known the right pronunciation for about 10 years. I still say it wrong.


It's giving me flashbacks to the hard G vs soft G in gif image format.


I learned of the tool from a native German speaker who pronounced it wall-grinned, which is apparently half-right. Like latex, I can't keep the pronunciation straight from one sentence to the next.


How do you pronounce it? I hoped it'd be near the start, but several paragraphs in and I'm still not sure.

edit: val as in value + grinned


What other ways are there to (mis)pronounce it?


There are so many amazing ways! ;)

Since it’s an old Norse word, try using Google Translate to hear what happens in Danish, Dutch, German, Icelandic, Norwegian, and Swedish. I don’t know if it’s a modern word in those languages, but Translate is showing translations “election gate” for several languages, and “fall gravel” for Swedish.

According to the audio pronunciations on Translate…

Danish: “vale grint”, long a, hard tapped r, hard d sounds like t

Dutch: sounds like “fall hint” but there’s a slight throaty r in there hard to hear for English speakers, so maybe “hrint”

German: “val grinned”, val like value, grinned with the normal German r

Icelandic: “vall grint”, vall like fall, hard tapped r

Norwegian: “vall grin”, hard tapped r, almost “vall g’din”, silent or nearly silent d/t at the end.

Swedish: “voll grint / g’dint”, hard tapped r, hard d

German is the only one that has “Val” like “value”, all the rest sound more like “fall”. The word valgrind is the door to Valhalla, which means literally “fall hall”, as in hall of the fallen. For that reason, I suspect it makes the most sense to pronounce valgrind like “fall grinned”, but Old Norse might have used val like value, I’m not sure.

BTW Valhalla has an equally amusing number of ways to pronounce it across Germanic languages, “val” sometimes turns into what sound like “fell” instead of “fall”, and in Icelandic the double ell makes it fall-hat-la.

Languages are cool!


Pronouncing the "-grind" like the word "grind". I think that's probably how most English-speakers first assume it's pronounced.


Safe to assume many pronounce grind as "grind".


Now you are really on track to mispronounce Valgrind for nearly 21 years :P


I once submitted a bug fix for an obscure issue to valgrind. They asked for a test case, which I managed to provide, but I was a bit nervous as I couldn't immediately see how to fit it into their test suite.

The response from Julian Seward was so nice it set a permanently high bar for me when random people I don't know report bugs on my projects!

We still run our entire testsuite under valgrind in CI. Amazing tool!


What was the response?


Well damn, no wonder he’s so good at optimising the Rust compiler. He literally has a PhD in profiling tools!


Valgrind is an amazingly useful tool. The biggest pain point, though, has always been to read through and process the huge amount of false positives that typically come from 3rd-party support libraries, such as GLib. It provides some suppression files to be used with Valgrind, but still, GLib has its own memory allocator, so things tend to go awry.

Running Helgrind or DRD (for threading issues) with GLib has been a bit frustrating, too. If anyone has some advice to share about this, I'm all ears!

(EDIT: I had mistakenly left out the phrase about suppression files)
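For what it's worth, a Memcheck suppression entry for a GLib-internal allocation looks like the following (the entry name is arbitrary, and real entries are best generated with valgrind's --gen-suppressions=all option rather than written by hand):

```
{
   glib_internal_leak_example
   Memcheck:Leak
   match-leak-kinds: reachable
   fun:malloc
   ...
   obj:*libglib-2.0.so*
}
```

Also, GLib historically honored the environment variables G_SLICE=always-malloc and G_DEBUG=gc-friendly, which make its slice allocator fall back to plain malloc and zero out freed memory; running tests with those set cuts down Valgrind's noise from GLib considerably.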


Hah, I teach my students to use Valgrind, and I’ve been pronouncing it wrong this whole time. Guess I’ll have to make sure to get that right next semester :)

The magic of Valgrind really lies in its ability to detect errors without recompiling the code. Sure, there’s a performance hit, but sometimes all you have is a binary. It’s damn solid on Linux, and works even with the custom threading library we use for the course; shame the macOS port is barely maintained (last I checked, it only worked on OSes from a few years back - anything more recent will execute syscalls during process startup that Valgrind doesn’t handle).


There are times when LeakSanitizer (in gcc-8.2) would not give me the full backtrace of a leak, while valgrind would, so to me it's still an indispensable tool for debugging leaks. One caveat is that valgrind is magnitudes slower than LeakSanitizer. Now, if only I knew how to make valgrind run as fast as LeakSanitizer... (command line options?)


You might need to add -fno-omit-frame-pointer to help ASAN unwind the stack.


This is definitely an option you want to be using when using ASan or LSan. You may also want to consider additionally using -momit-leaf-frame-pointer to skip frame pointers only in leaf functions while keeping frame pointers for non-leaf functions. This can make small leaf functions significantly shorter, limiting some of the negative impact of using -fno-omit-frame-pointer alone.

Sometimes even -fno-omit-frame-pointer won't help, like if the stack is being unwound through a system library that was built without frame pointers. In that case you can switch to the slow unwinder. Set the environment variable `ASAN_OPTIONS=fast_unwind_on_malloc=0` when running your program. But note that this will make most programs run significantly slower so you probably want to use it only when you really need it and not as the default setting for all runs.


Happy birthday Valgrind. Next year you'll be able to drink in the US!

Being a UK PhD holder, a sentence that stood out to me was a commentary/comparison between UK and US PhDs: "This was a three year UK PhD, rather than a brutal six-or-more year US PhD."

My cousin has a US PhD, and judging from what he tells me, it is a lot more rigorous than UK PhDs.


The UK PhD is 3 yrs, after a 1 yr Masters and 3 yr bachelors. (7 years)

The US PhD is usually 4-5 years after a 4 year bachelors (8-9 years). It is a little bit longer with more graduate-level coursework.

That said, the US bachelors starts at age 17 while a UK bachelors starts after 2 years of A-levels. So in terms of length it’s a wash.


FWIW, you have to be slightly careful as Scotland has a different post-16 education provision.

AIUI you can do Highers (equivalent to GCSE, at 16) and enter Uni then with sufficiently high grades (aged 16/17). Or, stay on for one more year to do Advanced Higher (most common). Uni courses can then be 4 or occasionally 3 years. Don't quote me!


US college starts around age 18, which I understand is about the time A-levels are completed, so I believe there are 2 more years of education associated with a US PhD.


It took me four years for my US PhD, but I had a masters and industrial experience which might have helped speed things up.


I was working on an application for Symbian mobile phones and I was able to implement large parts of it as a portable library - the bits which compressed results using a dictionary to make them tiny enough to fit into an SMS message or a UDP frame. This was before the days of flat-rate charges for internet access and we were trying to be very economical with data.

I was able to build and debug them on Linux with Valgrind finding many stupid mistakes and the library worked flawlessly on Symbian.

It's just one of the many times that Valgrind has saved my bacon. It's awesome.


Many thanks for Valgrind. I can honestly say that it helped me become a better C++ programmer.


Beyond raw technical ability, Nick and Julian were the kindest, most reasonable developers I've ever interacted with. I think a lot of Valgrind's success stems from the combination of sophisticated tech and the approachability of the core team.


Valgrind's maintainers are super pleasant and have been quite helpful in a number of cases where I've personally had to reach out to them.

Lovely piece of software, to which I owe a lot of gratitude.


I am old enough that I started with Purify, and I used Valgrind starting from version 1.0, because Purify was commercial and Solaris-only. It saved my behind multiple, multiple times.


Purify was an amazing tool. I recently noticed that one of my libraries (libffi) still has an --enable-purify configure option, although it probably hasn't been exercised in.. 20 years? A Purify patent prevented work-alikes for many years, but valgrind eventually emerged as a more-than-worthy successor.

Fun fact: the creator of Purify went on to found Netflix and is still their CEO.


Ha! And I thought that the same person having written bzip2 and Valgrind was my surprise for the day.


And Ardupilot


And BoundsChecker was also great!

https://en.m.wikipedia.org/wiki/BoundsChecker


That tool saved me tons of time tracking down bugs. It also taught me to be a better C/C++ programmer. Runtime sanitizers like Purify/Valgrind/BoundsChecker do not tolerate poor C code. What is kind of cool is that you can find whole classes of bugs in your code. Because as devs, once we get something working we tend to copy and paste that pattern everywhere, so if you find a bug in one place you will probably find it in a few dozen other places in your codebase.


I worked at a company 11 years ago that was still using Purify!


I used Purify 8 years ago. On Windows. I don't remember the specifics but the company kept a few XP machines around just so they could continue using Purify.


Purify fanboy over here.


> Speaking of software quality, I think it’s fitting that I now work full time on Rust, a systems programming language that didn’t exist when Valgrind was created, but which basically prevents all the problems that Memcheck detects.

Just like Ada has been doing since 1983.


My understanding is that dynamically freeing memory is an unsafe operation in Ada, do I have that right?


Depends on which dynamic memory you are talking about.

Ada can manage dynamic stacks, strings and arrays on its own.

For example, Ada has what one could call type safe VLAs, instead of corrupting the stack like C, you get an exception and can redo the call with a smaller size, for example.

As for explicit heap types and Ada.Unchecked_Deallocation, yes if we are speaking about Ada 83.

Ada 95 introduced controlled types, which via Initialize, Adjust, and Finalize, provide the basis of RAII like features in Ada.

Here is an example on how to implement smart pointers with controlled types,

https://www.adacore.com/gems/gem-97-reference-counting-in-ad...

There is also the possibility to wrap heap allocation primitives with safe interfaces exposed via storage pools, as in this tutorial: https://blog.adacore.com/header-storage-pools

Finally, thanks to SPARK, nowadays integrated into Ada 2012[0], you can also have formal proofs that it is safe to release heap memory.

On top of all this, Ada is in the process of integrating affine types as well.

[0] - Supported in PTC and GNAT, remaining Ada compilers have a mix of Ada 95 - 2012 features, see https://news.ycombinator.com/item?id=27603292


That said, I still use valgrind because we have to integrate C libraries sometimes (libpcl is my favorite culprit), and there's still the possibility of blowing the stack (yeah, you can use gnatstack to get a good idea of your maximum stack size, but it doesn't cover the whole Ada featureset, and stack canaries / -fstack-check don't catch everything).

edit Also massif, call/cachegrind and helgrind have saved our bacon many, many times.

Even more interesting is writing your own tools with valgrind. Here https://github.com/AdaCore/gnatcoverage/tree/master/tools/gn... is the code of a branch-trace adapter for valgrind (it outputs all branches taken/not-taken in 'qemu' format). Very useful if you can't run a pintool or Intel Processor Trace just for that.

And if you keep digging, the angr symbolic execution toolkit use (used?) VEX as an intermediate representation. end of edit

Ada doesn't catch uninitialized variables by default (although warnings are getting better). You can either go Spark 'bronze level' (dataflow proof, every variable is initialized) or use 'pragma Initialize_Scalars' combined with -gnatVa.

Some of these techniques are described in that now-old blog post full of links https://blog.adacore.com/running-american-fuzzy-lop-on-your-... (shameless plug), where one can infer that even proof of absence of runtime errors isn't a panacea, and fuzzing still has its use even on fully-proved SPARK code.


When we moved to Linux, Valgrind was THE tool that saved our as*es day after day after day. An issue in production? Rollback, valgrind, fix, push, repeat. Thank you for all the hard work; in fact I don't think I can thank you enough.


It's unfortunate that so many of these great tools (like `perf` and I believe `valgrind`) are basically not available locally on the Mac.

And running in a container is not really a solution for most of these.


Sanitizers and electric fence are ultra portable, they're definitely available on macos. The feature set from valgrind is a bit richer but not by much.


Valgrind does a lot of low level trickery so it hasn’t always supported the latest macOS releases straight away (or sometimes would support them with serious gotchas/limitations)


I am not familiar with electric fence but I remember from my experience that there are definitely important things that I got from `perf` and `valgrind` that the alternative sanitizers did not provide. Can't recall what now of course.


asan/ubsan do not detect uninitialized memory reads (though ubsan can detect when bools take on invalid bit patterns from uninitialized memory), and msan requires rebuilding the standard library or something, so I've never used msan. Valgrind is slow, but detects uninitialized memory reads properly, and doesn't require rebuilding the app (which is useful when running a complex or prebuilt app for short periods of time).

On the topic of profiling, callgrind can count exact function calls and generate accurate call graphs, which I find useful not only for profiling, but for tracing the execution of unfamiliar code. I just wish rr had similarly fast tooling (pernosco is close enough to be useful, but I think there's value in exploring different workflows than what they picked).


>msan requires rebuilding the standard library or something

Yes, which is a PITA. But even then, macOS is not supported anyway:

https://clang.llvm.org/docs/MemorySanitizer.html#supported-p...


valgrind is available on mac. From the homepage: "It runs on the following platforms: (...) X86/Darwin and AMD64/Darwin (Mac OS X 10.12).". There's a notable omission of ARM64/Darwin in there, and I don't think it's an oversight.

What Mac is definitely lacking, though, is reverse debugging. Linux has rr, Windows has Time Travel Debugging. macOS still doesn't have an equivalent.


Valgrind, as I understand it, was essentially maintained by one engineer at Apple who has since left the company, so nobody has really updated it.


That's my understanding too, and I believe you're referring to Greg Parker:

http://www.sealiesoftware.com/valgrind/


He's still at Apple, but he works on the Swift runtime these days rather than C/C++ tooling.


There have been 6 major releases since 10.12 (which was from late 2016). In other words, valgrind has basically stopped supporting macOS.


I don't think it means it doesn't work with newer versions.


I'm afraid you're wrong. It does not work with newer macOS versions, I've tried.


One problem with Valgrind is that the thing you're debugging should have been tested with Valgrind from the start, otherwise you're just going to be flooded with false triggers.

Now imagine that you're developing a new application and you want to use some library, and it hasn't been tested with valgrind and generates tons of false messages. Should you then use it? Or look for an alternative library?
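One mitigation short of switching libraries is Valgrind's suppression mechanism: run once with --gen-suppressions=all, paste the emitted entries into a file, and pass that file back with --suppressions=. A sketch of such a file (library and entry names are made up):

```
{
   ignore-uninit-cond-in-noisy-lib
   Memcheck:Cond
   ...
   obj:*/libnoisy.so*
}
{
   ignore-leak-from-noisy-lib-init
   Memcheck:Leak
   fun:malloc
   ...
   obj:*/libnoisy.so*
}
```

The "..." frame wildcard matches any number of stack frames, so one entry can cover every report that bottoms out in the offending library.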


I see the article mentions Solaris, an OS that I am very familiar with, which had me thinking about the memory corruption detection Solaris offered. Among the development features Solaris supported were two memory corruption checking libraries (libumem, watchmalloc) that could easily be used without having to recompile binaries to link with them. Libumem had support for detecting memory leaks, buffer overruns, multiple frees, use of uninitialized data, use of freed data, etc., but it could not detect a read past an allocated buffer, which is where watchmalloc came in handy. To use either with an executable binary was as easy as:

$ LD_PRELOAD=libumem.so.1 <executable filename>

I found a lot of memory corruption bugs using libumem in particular including some in MIT Kerberos that were severe enough to be considered security vulnerabilities. Sadly, Solaris is now in support mode thanks to Ellison and friends at Oracle.


Valgrind is fantastic.

Memcheck decreases the memory safety problem of C++ by about 80% in my experience - it really is a big deal. The compiler-based tools that require recompiling every library used are a bit impractical for large stacks such as the ones under Qt-based GUI applications. Several libraries, several build systems. But I hear that they are popular for CI systems in large projects such as web browsers, which probably have dedicated CI developers. There are also some IME rare problems that these tools can find that Memcheck can't, which is due to information unavailable in compiled code. Still, Memcheck has the largest coverage by far.

Callgrind and Cachegrind give very precise, repeatable results, complementary to but not replacing perf and AMD / Intel tooling which use hardware performance counters. I tend to use all of them. They all work without recompiling.
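For reference, a typical session looks something like this (the binary name and output-file pid are placeholders; cg_annotate and callgrind_annotate ship with Valgrind):

```shell
# Repeatable instruction and simulated-cache counts
valgrind --tool=cachegrind ./myapp
cg_annotate cachegrind.out.12345        # per-function breakdown

# Call-graph profile; the output file can also be opened in KCachegrind
valgrind --tool=callgrind ./myapp
callgrind_annotate callgrind.out.12345
```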


Not sure if it was still doing it in 2001, but in the 1997-1998 time-frame Purify also ran on HP-UX. The company I was working for at the time used it and we ended up finding a two-byte (IIRC) leak in the HP gethostbyname() library call (well, at least I think it was gethostbyname, it's more than two decades ago).

That was one of the more annoying tickets to file. We could of course send them the binary, but it would not run without the Purify license file, and we weren't comfortable sending off the license file as well. But, in the end, they accepted the bug. Not sure if there was ever any fix, though.


Using Cachegrind to get hardware-independent performance numbers (https://pythonspeed.com/articles/consistent-benchmarking-in-...)

Also used by SQLite in their performance measurement workflow (https://sqlite.org/cpu.html#performance_measurement)


First of all, congratulations to Valgrind and the team behind it! This is an essential tool that has helped me personally over the years while developing.

What needs to be done to get Valgrind binaries available for macOS (M1)? From a company perspective we are happy to support this work. If you know who's interested and can accomplish this, please drop me an email at eduardo at calyptia dot com.


I still use Valgrind memcheck for memory leak verification of a large piece of code I have developed, with a long end-to-end test.

Also, it has a nice integration with Eclipse which reflects the Valgrind memcheck output to the source files directly, enabling you to see where problems are rooted.

All in all, Valgrind is a great toolset.

P.S.: I was pronouncing Valgrind correctly! :)


> I still use Cachegrind, Callgrind, and DHAT all the time. I’m amazed that I’m still using Cachegrind today, given that it has hardly changed in twenty years. (I only use it for instruction counts, though. I wouldn’t trust the icache/dcache results at all given that they come from a best-guess simulation of an AMD Athlon circa 2002.)

I'm pretty sure I've seen people using the icache/dcache miss counts from valgrind for profiling. I wonder how unreliable these numbers are.


https://sqlite.org/cpu.html#microopt:

> Cachegrind is used to measure performance because it gives answers that are repeatable to 7 or more significant digits. In comparison, actual (wall-clock) run times are scarcely repeatable beyond one significant digit [...] The high repeatability of cachegrind allows the SQLite developers to implement and measure "microoptimizations".

There's a bunch of ways for caches to behave differently but have they changed much over the past 20 years? i.e. is the difference between [2022 AMD cache, 2002 AMD cache] significantly greater than the difference between [2002 PowerPC G4 cache, 2002 AMD cache, 2002 Intel cache] ?


I would guess yes, just based on the L1/L2 (later L3) use and sizing between all those systems. 2002 vs 2022 is K8 vs 5800X3D for AMD, so you're looking at having 1 core and 64+64KB of L1 cache, 512KB of L2 cache[1] vs 8 cores (+ht) and 32+32KB L1 per core, 512KB L2 per core, 96MB L3.

Just managing the cache access between L2 and L3 I think would be additional consideration, but then you have to consider the actual architectural differences and on server chips locality will matter quite a bit.

[1]: https://en.wikipedia.org/wiki/Athlon_64


I don't know how sophisticated the streaming/prefetch/access-pattern prediction in 2002 CPUs was.

I'm speculating, but if that's not modeled, cachegrind may pessimize some less simple but still predictable patterns, reporting a lot of expected misses where a modern CPU would have been able to prefetch.


Agreed, I suspect it'd be most accurate to say the SQLite folks are minimizing their working set.

I picked a couple of random performance commits out of their code repo, and they look like they might keep 1 or 2 lines out of i-cache: https://sqlite.org/src/info/f48bd8f85d86fd93 https://sqlite.org/src/info/390717e68800af9b


What other great tools are there in the vein of valgrind and AFL?


In my obviously biased opinion: very specialised, but sometimes exactly what you need. I have used this in anger maybe 2-3 times in my career, which is why I wrote the C version:

https://github.com/tialaramex/leakdice (or https://github.com/tialaramex/leakdice-rust)

Leakdice implements some of Raymond Chen's "The poor man’s way of identifying memory leaks" for you. On Linux at least.

https://bytepointer.com/resources/old_new_thing/20050815_224...

All leakdice does is: You pick a running process which you own, leakdice picks a random heap page belonging to that process and shows you that page as hex + ASCII.

The Raymond Chen article explains why you might ever want to do this.


rr, for record and replay

I'm also a fan of systemtap, for when your probing problems push into peeking at the kernel
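The basic rr workflow, for anyone who hasn't tried it (the program name is just a placeholder):

```shell
rr record ./myapp --some-args   # run once, capturing a deterministic trace
rr replay                       # reopen the latest trace under gdb
# inside gdb: reverse-continue, reverse-step, and reverse-next all work,
# so you can run backwards from a crash to the write that caused it
```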


Seconding `rr` as suggested by @tux3, it's great for debugging.

Also, the sanitizers for GCC and Clang (https://github.com/google/sanitizers), and the Clang static analyzer (and tidy too) through CodeChecker (https://codechecker.readthedocs.io/).

For the Clang static analyzer, make sure your LLVM toolchain has the Z3 support enabled (OK in Debian stable for example), and enable cross translation units (CTU) analysis too for better results.


Starting to stretch, but would have to pick strace next. Can't believe macOS devs don't get to use it (at least without hoops like disabling SIP).


Are people using Valgrind on Python packages?

It seems some packages (even basic ones) are not compatible with Valgrind, thereby spoiling the entire debugging experience.
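Part of the problem is that CPython's pymalloc allocator hides most allocations from Memcheck and produces spurious reports. Since Python 3.6 you can route everything through raw malloc at startup, which makes runs much cleaner (script name is a placeholder; CPython's source tree also ships a suppression file, Misc/valgrind-python.supp):

```shell
PYTHONMALLOC=malloc valgrind --leak-check=full python3 myscript.py
```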


I live not far from Valgrindvegen (Valgrind road); I've always wondered whether the developers knew it existed. :-)


Is Valgrind any use in Rust?


I work full-time with Rust, use it all the time to see how much memory is being allocated to the heap, make a change and then see if there's a difference, and also for cache misses:

valgrind target/debug/rustbinary

==10173== HEAP SUMMARY:
==10173==     in use at exit: 854,740 bytes in 175 blocks
==10173==   total heap usage: 2,046 allocs, 1,871 frees, 3,072,309 bytes allocated
==10173==
==10173== LEAK SUMMARY:
==10173==    definitely lost: 0 bytes in 0 blocks
==10173==    indirectly lost: 0 bytes in 0 blocks
==10173==      possibly lost: 1,175 bytes in 21 blocks
==10173==    still reachable: 853,565 bytes in 154 blocks
==10173==         suppressed: 0 bytes in 0 blocks
==10173== Rerun with --leak-check=full to see details of leaked memory

valgrind --tool=cachegrind target/debug/rustbinary

==146711==
==146711== I   refs:      1,054,791,445
==146711== I1  misses:       11,038,023
==146711== LLi misses:           62,896
==146711== I1  miss rate:          1.05%
==146711== LLi miss rate:          0.01%
==146711==
==146711== D   refs:        793,113,817  (368,907,959 rd + 424,205,858 wr)
==146711== D1  misses:          757,883  (    535,230 rd +     222,653 wr)
==146711== LLd misses:          119,285  (     49,251 rd +      70,034 wr)
==146711== D1  miss rate:           0.1% (        0.1%   +         0.1%  )
==146711== LLd miss rate:           0.0% (        0.0%   +         0.0%  )
==146711==
==146711== LL refs:          11,795,906  ( 11,573,253 rd +     222,653 wr)
==146711== LL misses:           182,181  (    112,147 rd +      70,034 wr)
==146711== LL miss rate:            0.0% (        0.0%   +         0.0%  )


Not used it with Rust, but have used it with OCaml, Perl, Ruby, Tcl successfully. In managed languages it's mainly useful for detecting problems in C bindings rather than the language itself. Languages where it doesn't work well: Python and Golang.


Depends on how many unsafe code blocks you make use of.


I've used Valgrind quite extensively. A big thank you to the folks behind this!



