IIRC, the gdb team was previously talking about implementing this by forking after every instruction. Does anyone know if the current functionality is built on that model?
I guess the advantage of doing it that way is that the majority of the memory pages can be shared if the OS uses a copy-on-write virtual memory system.
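The fork trick is roughly the following (a toy sketch of the idea, not anything gdb actually shipped; in a real debugger you'd do this per-instruction under ptrace, which is omitted here):

    #include <cstdio>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    // Checkpoint by forking: the child stops itself immediately, keeping a
    // copy-on-write snapshot of the whole address space alive for free.
    pid_t checkpoint() {
        pid_t pid = fork();
        if (pid == 0) {                    // child: we *are* the snapshot
            raise(SIGSTOP);                // freeze until someone rewinds to us
            return 0;                      // resumed: continue from here
        }
        waitpid(pid, nullptr, WUNTRACED);  // ensure the snapshot is frozen
        return pid;
    }

    // "Rewind": wake the snapshot and throw the current state away.
    void rewindTo(pid_t snapshot) {
        kill(snapshot, SIGCONT);
        _exit(0);
    }

    int main() {
        int x = 1;
        pid_t snap = checkpoint();   // take a snapshot here
        if (snap != 0) {             // original timeline
            x = 42;                  // mutate some state...
            rewindTo(snap);          // ...then travel back
        }
        printf("x = %d\n", x);       // prints 1: the pre-snapshot value
        return 0;
    }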
This:
Breakpoints and watchpoints will work in reverse -- allowing you for instance to proceed directly to the previous point at which a variable was modified.
is going to be absolutely killer functionality for reverse engineers.
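For the curious, a session with the new reverse commands might look roughly like this (the variable name is made up):

    (gdb) break main
    (gdb) run
    (gdb) record                # start recording execution
    (gdb) continue              # ... run forward until things go wrong ...
    (gdb) watch suspect_var     # hypothetical variable to track
    (gdb) reverse-continue      # run *backwards* to the previous write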
No, it doesn't fork at all; it simply records the values of registers and memory that will be modified by each instruction, so that they can be restored later.
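A minimal sketch of that undo-log idea, assuming a single flat memory image and made-up names (this is not GDB's actual code):

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // One undo record: the location an instruction wrote to and the bytes
    // it overwrote. (Register writes can be modeled the same way.)
    struct UndoRecord {
        size_t addr;                    // offset into the memory image
        std::vector<uint8_t> oldBytes;  // contents before the write
    };

    static std::vector<UndoRecord> undoLog;

    // Call this just before single-stepping an instruction that will
    // write `len` bytes at `addr`.
    void recordWrite(const uint8_t* mem, size_t addr, size_t len) {
        undoLog.push_back(UndoRecord{addr,
            std::vector<uint8_t>(mem + addr, mem + addr + len)});
    }

    // "Reverse step": restore the most recent write and pop it off the log.
    void undoLastWrite(uint8_t* mem) {
        const UndoRecord& r = undoLog.back();
        std::memcpy(mem + r.addr, r.oldBytes.data(), r.oldBytes.size());
        undoLog.pop_back();
    }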
Very slick. I wonder if this upcoming release adds proper support for C++ templates. As is, I don't know of anything other than Visual Studio which makes debugging STL or Boost-heavy C++ code a reasonable task.
Ever since I saw Omniscient Debugger (ODB), I've wanted this feature integrated into my IDE. ODB was interesting but didn't seem to take off. I'm glad GDB is raising the debugger bar.
I've been toying with the idea myself; I even considered writing a Valgrind plugin to keep a record of how values change in registers and memory, until I got scared off by the complexity and sheer size of the code.
I wonder how much memory it will take to run a "usefully large" program with state recording on? In theory you could compress some memory mutations pretty easily (like bzero()ing something), while other mutations would need more memory (writing single ints to various locations). I bet you could generate some nifty 10GB logs with this thing!
You're right, of course. It consumes tens of bytes of RAM per executed instruction. But computers have gigabytes of RAM these days, so it all comes down to what you consider a "usefully large" program. Millions of instructions -- no problem. Billions -- can't do that. ;-)
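To put rough numbers on it: at, say, 50 bytes per instruction, ten million instructions cost about 500 MB, which fits comfortably in RAM; a billion would need around 50 GB, which doesn't.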
Thinking about it, that would be the ultimate hack. If the core file does indeed have all the regions that were mapped when the core was dumped, you could theoretically load them as they were and then back-step each instruction, since instructions are fairly well defined.
The huge issue here is that the most probable reason for segfaulting was an unexpected value coming from the environment, and it's impossible to know post-mortem what environment the process was running in.
In fact, gdb can record the state changes and then store them into an "enhanced" core file, which can then later be used to debug the program in reverse.
It's rather like adding a "time" dimension to your core file. ;-)
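If the execution log can be written out to disk, I'd expect the workflow to look something like this (the command names below are my guess at the interface, so treat them as approximate):

    (gdb) record                    # start recording
    (gdb) continue                  # ... run up to the crash ...
    (gdb) record save crash.log     # save the execution log alongside the core
    ... later, possibly on another machine ...
    (gdb) record restore crash.log  # reload the log, then reverse-step at will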
I was approaching this from the opposite direction: writing and debugging shellcode will be more pleasant. Or rather, whenever I have spent time writing shellcode, I've wished for such a feature.
In 2006, I wanted to see if it was possible, so I wrote (in C++) a virtual CPU with 2 registers (A and Program Counter) and input (In/Out ports); the basic idea was that it'd log everything (per cycle) in a reversible form.
Unfortunately I was too lazy to finish it, but I think it would have worked. It was just a proof of concept.
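For illustration, here's roughly what that proof of concept could look like in C++ (opcodes and names invented; the In/Out ports are omitted, though port input would need journaling too):

    #include <cstdint>
    #include <vector>

    // Per-cycle journal entry: enough state to rewind exactly one cycle.
    struct Cycle {
        uint16_t oldPC;
        int32_t  oldA;
    };

    struct TinyVM {
        int32_t  A  = 0;               // the single general register
        uint16_t PC = 0;               // program counter
        std::vector<uint8_t> prog;     // program memory
        std::vector<Cycle> journal;    // the reversible log

        enum Op : uint8_t { NOP, INC, DEC, JMP };

        void step() {
            journal.push_back({PC, A});            // snapshot before the cycle
            switch (prog[PC]) {
                case INC: ++A; ++PC; break;
                case DEC: --A; ++PC; break;
                case JMP: PC = prog[PC + 1]; break; // operand = jump target
                default:  ++PC; break;              // NOP
            }
        }

        void stepBack() {                          // reverse execution
            A  = journal.back().oldA;
            PC = journal.back().oldPC;
            journal.pop_back();
        }
    };

Since each instruction touches at most A and PC, two saved values per cycle are enough to rewind it; reads from the In port would also have to be journaled so that re-running forward replays the same inputs.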
As long as you run it forward once, you should be able to do reverse execution. What you consider "impossible" is taking a binary and executing it from its last instruction back to the first: that would be impossible, or at least ill-specified, because a program can have many exit points but only one entry point.
Yes, but AFAIK that was only for a very limited duration trace buffer. You're right, it is basically the same thing, but gdb can now do it for millions of instructions.
I remember that -- but I never knew any details. Do you?
Did it have a fixed-size buffer for the instruction trace? Did it perform well on reverse step into/over functions? How was the performance?
Back in my student days long ago, a friend of mine who was working on a very difficult assignment hatched a plan to include a SIGSEGV handler that would print "NFS server not responding, still trying..." for use on the day when we had to demo the results to the TAs. The NFS system was pretty flaky and his hope was that the TA would move on to the next demo and get back to him, giving him time to fix whatever problem was happening. Not sure if he actually put it in though.
When run, it made the OS say "Out of memory", and then it was the lab assistant's headache to unload all that resident stuff that was sitting there eating a good 20-30% of available RAM. They were typically opposed to doing that, so the TAs assumed the program actually worked. The end :)
I must be the only moron who used to be eager for others to see his programs. No teacher ever looked at them long enough to appreciate them, though; I quickly established a reputation as a computer show-off, and the teachers were dismissive of me, spending more time with the kids who actually had problems.
"undefined" doesn't mean it has no state. it does have some state. it's just that this state can't be predicted. it can still be recorded and used in debugging backwards.
Two comments about that: First, the state must be well defined in order for the operating system to be able to do something with the process. It may not be consistent, predictable from userspace, or runnable, but it must be something sensible. So the "undefined" part doesn't matter: the debugger can still inspect the state, and the programmer can make sense of it.
Second, the debugger generally catches the signal before it gets handled.
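For instance, gdb's default disposition is to stop on SIGSEGV before the inferior's handler ever runs; you can check or change that:

    (gdb) info signals SIGSEGV               # show current disposition
    (gdb) handle SIGSEGV stop print nopass   # stop in gdb, don't forward it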