rr is insanely useful. There are some debugging patterns that commonly come up f...

db48x · on May 4, 2021

These are excellent examples. I’ll throw one of my stories into the pot:

One time I had a client that was using a large and complicated C++ library to build a larger and yet more complicateder C++ program. It had an intermittent crash that they just couldn’t track down. The stack traces they showed me were always deep in the bowels of the C++ library, in places no crash should ever be. The library was open source and widely used; I _knew_ that it didn’t just crash in any of those places.

I recorded it in rr a few times until I captured the crash. I set a watchpoint on the memory address of the crash, and ran the program backwards from the crash. (Memory addresses are stable in the replay, making this kind of thing super easy.) A few seconds later it stopped on the line that was responsible; turns out they were accidentally overwriting the pointer to the destructor in the vtable of this one class. Eventually one of those objects would go out of scope on some random thread, and need to be destroyed. BOOM. I checked a few of the recordings where it didn’t crash, and in those cases it had just overwritten something much less obvious, like a string containing HTML that it had downloaded.

My advice? First, always make sure your pointers are initialized correctly before you go dereferencing them. You won’t like what happens otherwise.

Second, learn rr. With rr, and a little brain–sweat, you can do in a couple of hours what an entire team of engineers couldn’t do in months. Admittedly they had a lot of things on their plate, but it’s definitely a superpower. I think this might have been the first time I ever ran rr, too; some of that time was getting rr built and installed. I think I even had to ask a question on the IRC channel, because there was a confusing error message.

Third, learn Rust. This was a few years back, when Rust was little more than a crazy idea. If the program had been written in Rust, there would have been a lot fewer landmines for them to step on. That’s a rather different kind of superpower. Incidentally, you can combine these superpowers… but that’s a story for another day.

ncmncm · on May 4, 2021

Compare the time spent using rr to track down the occasional failure against the sum of time spent waiting for the Rust compiler, every build, every day. I know which number is bigger, at least for the sort of code I work on.

In the last 10 years I have spent strictly more time preparing bug reports against compilers than on tracking down memory usage errors. So, while Rust solves a problem all C coders have, it is not a problem that modern C++ coders necessarily experience enough to justify the extra coding effort, build time, and tool maturity risk.

Rust, at this stage of maturity, is fun in a puzzle-solving sense, and modern, and enlightening. No one who learns Rust will regret the effort spent. The only serious risk is that it may take many months to restore one's habit of putting a terminating semicolon where C++ demands one but Rust does not. A shift in preference against designs requiring mutex locks may improve the performance of your C++ code.

db48x · on May 5, 2021

The real cost to my client was that for months or years their system had been crashing, and they could do nothing about it. They just had to live with it, rerunning jobs that had crashed. Maybe the compile times would have been unfortunate, but I think they would have come out ahead. On the other hand, they probably wouldn’t have had to hire me.

Speed isn’t everything: https://www.youtube.com/watch?v=2wZ1pCpJUIM

ncmncm · on May 5, 2021

I watched it (at 2x, 17 mins): The enthusiasm of the convert.

Values are in tension, but time is fungible. When you add up hundreds of hours waiting for very, very, very slow builds, you should wonder if maybe those hundreds of hours would be better spent elsewhere. If you have traded them for two hours of debugging, have you come out ahead?

(Is there any objective reason for the Rust compiler to be two orders of magnitude slower than a normal compiler? Maybe an alternative implementation strategy would help?)

It was telling when he posted his "values" of C++. Suffice to say, when you need to lie to make your case, the argument is already lost.

Your client lived with crashes because they couldn't be bothered to use valgrind, to use the address sanitizer, to use the UB sanitizer, to set a watchpoint in gdb? They didn't need rr.

There is no substitute for competence. If they had been competent to (re-?)write it in Rust, they would be more than competent to spend 0.1% as much time just fixing it, or (better) not coding the bug in the first place. But who could they hire to code it in Rust? There are orders of magnitude too few Rust coders for that to work.

Rust might be a way not to code bugs in the first place, but modern C++ is another way. It doesn't pretend to make bugs impossible, as Rust pretends, but it does remove temptation to bugs: bug-prone code is ugly, and better ways are equally fast. And, modern C++ is mature, fun, fast building, and can "#include" C headers for essential libraries unchanged.

db48x · on May 5, 2021

People frequently overstate the slowness of the Rust compiler. Sometimes it’s slower than you’d like, but when you investigate you find that you are doing specific things that are really hurting your compile times. Rust’s procedural macros are powerful, but can turn out to be surprisingly expensive. Breaking your code up into smaller crates can have an unexpectedly large beneficial effect on compile times as well. Etc.

Contrast this with C and C++ where the compile time for a large project is often dominated not by the compiler itself, but by running the linker at the end. For the project I mention here, a full rebuild took over an hour, with the linker taking a few minutes of that. A partial build, when you have changed only one file, took several minutes. Running the compiler on the one file that you changed took just a second or two, but running the linker takes the same amount of time in either case.

But yes, they certainly could have saved time and money if their builds had been faster.

> It was telling when he posted his "values" of C++. Suffice to say, when you need to lie to make your case, the argument is already lost.

Which value or values of C++ do you think he got wrong?

> There is no substitute for competence.

This is true, but I suspect that we disagree on the definition of competence. In my experience, using any of these tools, ever, puts you in a better situation than most programmers. Of course, programmers that use safe languages never need to run Valgrind, UBSan, or similar tools (because the language takes care of memory safety for them), so they’re ahead of the pack as well. But I would bet that 90% of programmers have never used a profiler either. I don’t think that we can call 90% of programmers incompetent simply because they’ve never used a profiler.

You might even say that even the existence of tools like Valgrind and UBSan is a pretty strong condemnation of C and C++.

The reason I recommend learning rr is not merely because it can help you find memory safety bugs in your C++ programs. If all you want to do is that, then there are other more specialized tools that will do the job.

rr is a _general purpose_ tool. It is a debugging system that has powerful features that can be used to debug _any_ problem your program exhibits. I did not at the time know that this bug was due to memory corruption. (Obviously since it was an intermittent crash in a C++ program, it was pretty high on my list of suspects.) I didn’t know that they had never run valgrind on the thing either. But because rr is a general purpose tool, I didn’t need any more information about the nature of the problem in order to find the bug and fix it.

I think that these powerful features should and eventually will be available in every debugger. Do you use pdb to debug your Python code? Then pdb should be able to record a Python program and replay that recording for debugging. Do you use your browser’s dev tools to debug JavaScript? Then you should be able to trace the data flow of a value backwards in time to where it originated. Do you use desed to debug your sed scripts? Bashdb? Edebug? All of them will gain these features eventually. In the meantime, people should learn rr.

ncmncm · on May 5, 2021

The need for valgrind etc. is indeed a black eye for C. But, as I said, I have not found occasion to use it on C++ code.

Certainly there is plenty of C code, and also plenty of bad and un-modern C++ code, that could benefit from these tools. There is not much Rust code in existence. Imagine running the Rust compiler just once over an amount of Rust code commensurate with those bodies. How many core-millennia would that take? The mind boggles.

Familiarity with essential tools is certainly a prerequisite for competence. One who has not used a profiler might never have been asked to make a program faster. One who has not used valgrind uses some other means to discover the causes of problems; if they spend notably more time than using valgrind or other tools would have needed, that would mark incompetence.

It seems like rr could save many people a great deal of time. It would not have saved me much, this past decade, because it could save no more time than I did spend tracking down proximate causes of trouble.

In general, it is always much better to achieve correctness by construction, using tools that only produce correct or, at worst, easily diagnosed results. Such a toolset might include a compiler alone, but I have found that a powerful language that enables good libraries, together with such good libraries, yields the same benefit. You need good libraries anyway.

> Which value or values of C++ do you think he got wrong?

Srsly?

albinofrenchy · on May 5, 2021

This kind of thing would have been found by address sanitizer immediately though, no? rr seems useful in its own right but not for this kind of issue.

db48x · on May 5, 2021

Sure. In fact, once this bug was fixed I ran the same program through some tests with valgrind and found two more problems that were occurring less frequently. The difference is that with rr I can record and diagnose a specific crash. With valgrind I would have found three problems, fixed them, and the intermittent crashes would have gone away. On the one hand that’s good enough, but on the other hand I wouldn’t have found out that the weird crashes that couldn’t happen were because we were overwriting a vtable.

ynik · on May 4, 2021

What I like most: you can debug data flow!

To see where a value came from, set a memory breakpoint and reverse-continue. This is not just useful when you suspect memory corruption (your point 2). I use it more often with a large legacy C code base. It outputs a complex data structure. There are often several code paths computing values for a particular field. For someone unfamiliar with the code base, it can be tricky to tell which code path was responsible for computing the value in a particular instance of the struct (where I see incorrect output). By reverse debugging with memory breakpoints, I can trace the data flow backwards until I find where the computation went wrong.

Note: Since I'm developing on Windows, I'm not using rr for that, but Microsoft's Time-Travel Debugger (WinDbg Preview).