Reverse Execution in GDB 7 is imminent (gnu.org)
124 points by grogers on Sept 23, 2009 | 40 comments



IIRC, the gdb team was previously talking about implementing this by forking after every instruction. Does anyone know if the current functionality is being built on that model?

I guess the advantage of doing it that way is that the majority of the memory pages can be shared if the OS uses a copy-on-write virtual memory system.

This:

Breakpoints and watchpoints will work in reverse -- allowing you for instance to proceed directly to the previous point at which a variable was modified.

is going to be absolutely killer functionality for reverse engineers.
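
For illustration only, the fork-as-checkpoint idea can be sketched in C. This is not taken from gdb; the function name `value_seen_by_checkpoint` and the pipe used for synchronization are assumptions made for the example. The child acts as a frozen snapshot: it keeps seeing the old value even after the parent overwrites its own copy, and with a copy-on-write VM only the touched page gets duplicated.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for fork/pipe under strict -std=c11 */
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a "checkpoint", change a variable in the parent,
   then ask the checkpoint what it still sees. `before` should be in 0..126
   because it travels back through the child's exit status.
   Returns the value seen by the checkpoint, or -1 on error. */
int value_seen_by_checkpoint(int before, int after)
{
    volatile int state = before;
    int go[2];
    if (pipe(go) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child = checkpoint. Block until the parent has changed its copy. */
        char c;
        close(go[1]);
        if (read(go[0], &c, 1) < 0)
            _exit(127);
        _exit(state & 0x7f);
    }

    /* Parent keeps running and writes to its private copy of the page. */
    close(go[0]);
    state = after;
    ssize_t n = write(go[1], "x", 1);  /* wake the child; EOF also wakes it */
    (void)n;
    close(go[1]);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling `value_seen_by_checkpoint(7, 42)` should give back 7: the parent moved on to 42, but the snapshot did not. "Going back" in this model would mean resuming from such a child instead of just querying it.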


No, it doesn't fork at all, it simply records the values of registers and memory that will be modified by each instruction, so that they can be restored later.
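
A minimal sketch of that record-and-restore scheme, assuming a toy byte-addressed memory rather than gdb's actual data structures (all names here, such as `recorded_write` and `reverse_step`, are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 256
#define LOG_CAP  1024

/* One journal entry: where a write landed and what was there before. */
struct undo_entry {
    size_t  addr;
    uint8_t old_value;
};

struct machine {
    uint8_t mem[MEM_SIZE];
    struct undo_entry log[LOG_CAP];
    size_t log_len;
};

/* Forward step: save the old byte, then perform the write.
   Returns 0 on success, -1 if the address is bad or the log is full. */
int recorded_write(struct machine *m, size_t addr, uint8_t value)
{
    if (addr >= MEM_SIZE || m->log_len >= LOG_CAP)
        return -1;
    m->log[m->log_len].addr = addr;
    m->log[m->log_len].old_value = m->mem[addr];
    m->log_len++;
    m->mem[addr] = value;
    return 0;
}

/* Reverse step: pop the newest entry and restore the saved byte.
   Returns 0 on success, -1 if there is nothing left to undo. */
int reverse_step(struct machine *m)
{
    if (m->log_len == 0)
        return -1;
    m->log_len--;
    m->mem[m->log[m->log_len].addr] = m->log[m->log_len].old_value;
    return 0;
}
```

Writing 5 and then 9 to the same address and calling `reverse_step` twice brings the byte back to 9, then 5, then its initial value. A real recorder would log registers the same way and would need to know, per instruction, which locations are about to change.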


Very slick. I wonder if this upcoming release adds proper support for C++ templates. As is, I don't know of anything other than Visual Studio which makes debugging STL or Boost-heavy C++ code a reasonable task.


For that you can check out Project Archer:

http://sourceware.org/gdb/wiki/ProjectArcher


yawn check out the ODB (for Java) which allows backward and forward debugging.

http://www.lambdacs.com/debugger/

watch the author give a talk at Google: http://video.google.com/videoplay?docid=3897010229726822034


Ever since I saw Omniscient Debugger (ODB), I've wanted this feature integrated into my IDE. ODB was interesting but didn't seem to take off. I'm glad GDB is raising the debugger bar.

http://www.lambdacs.com/debugger/


I've been toying with the idea myself, I even considered writing a Valgrind plugin to keep a record of how values change in register and memory until I got scared enough of the complexity and enormity of the code.

I wonder how much memory it will take to run a "usefully large" program with state recording on? In theory you could compress some memory mutations pretty easily (like bzero()ing something) while other mutations would need more memory (writing single ints to various locations). I bet you could generate some nifty 10GB logs with this thing!


You're right of course. It consumes tens of bytes of ram per executed instruction. But computers have gigabytes of ram these days, so it all comes down to what you consider a "usefully large" program. Millions of instructions -- no problem. Billions -- can't do that. ;-)
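
Back-of-the-envelope version of that estimate. The 32 bytes per instruction used in the note below is an assumed figure picked for illustration; the comment only says "tens of bytes", and `record_log_bytes` is a made-up helper name.

```c
#include <stdint.h>

/* Estimated record-log size in bytes. bytes_per_insn is an assumed
   average cost per executed instruction, not a measured gdb number. */
uint64_t record_log_bytes(uint64_t instructions, uint64_t bytes_per_insn)
{
    return instructions * bytes_per_insn;
}
```

With the assumed 32 bytes per instruction, one million instructions come to 32,000,000 bytes (about 32 MB), while one billion come to 32,000,000,000 bytes (about 32 GB), which matches the "millions fine, billions not" rule of thumb.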


I assume that this will only work if you've executed the program inside of GDB, not if you load a core file that was dumped earlier outside of GDB?


Thinking about it, that would be the ultimate hack. If the core file does indeed have all the regions mapped when the core was dumped, you can theoretically load them as before then back-step each instruction since they are fairly well defined.

The huge issue here is, the most probable reason for segfaulting was an unexpected value coming from the environment and it's impossible to know the environment the process was running in post-mortem.


Lots of state is lost! Like the prior values of variables.
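
A tiny illustration of the lost-state point (the function name is hypothetical): once a value is overwritten, the final state alone cannot tell you what it used to be, which is all a plain core file has to offer.

```c
/* A "program" whose last step clobbers x. */
unsigned clobbering_program(unsigned x)
{
    x = x * 2u;  /* the old x is still recoverable here for small x: halve it */
    x = 0u;      /* after this, every input ends in the same final state */
    return x;
}
```

Both `clobbering_program(3)` and `clobbering_program(1000)` end at 0, so there is no way to step backward from the result to the input without a log of what was overwritten.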


Or, even simpler: you can't reverse an infinite loop without some prior knowledge of when (and how) you entered it.


Given that it has to record state changes somehow, that makes sense.


In fact, gdb can record the state changes and then store them into an "enhanced" core file, which can then later be used to debug the program in reverse.

It's rather like adding a "time" dimension to your core file. ;-)


This has great potential for security researchers.


Can you elaborate on this? Are you talking about the fact that some silly bugs like buffer overflows will be fixed more easily?


I was approaching this from the opposite direction; writing and debugging shellcode will be more pleasant. Or rather, when I have spent time writing shell code, I wished for such a feature.


Till now, I believed this was impossible to do.


It's been possible for quite some time. Someone I vaguely know set up a startup back in 2006 looking to do just this: http://undo-software.com/


In 2006, I wanted to see if it was possible, so I wrote (in C++) a virtual CPU with 2 registers (A and Program Counter) and input (In/Out ports) and the basic idea was it'd log everything (per cycle) in a reversible form.

Unfortunately I was too lazy to finish it, but I think it would have worked. It was just a proof of concept.
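
A rough reconstruction of what such a proof of concept might look like, written in C and not the poster's actual code. The instruction set (`OP_LOAD`, `OP_ADD`, `OP_JMP`, `OP_HALT`) and all names are invented for the sketch, and the In/Out ports are left out.

```c
#include <stddef.h>
#include <stdint.h>

enum opcode { OP_LOAD, OP_ADD, OP_JMP, OP_HALT };

struct insn { enum opcode op; uint8_t arg; };

struct regs { uint8_t a; uint8_t pc; };  /* A and the program counter */

#define TRACE_CAP 256

struct cpu {
    struct regs r;
    struct regs trace[TRACE_CAP];  /* register state before each cycle */
    size_t trace_len;
};

/* Run one cycle. Returns 1 if an instruction executed, 0 on HALT,
   a PC outside the program, or a full trace buffer. */
int cpu_step(struct cpu *c, const struct insn *prog, size_t prog_len)
{
    if (c->r.pc >= prog_len || c->trace_len >= TRACE_CAP)
        return 0;
    struct insn in = prog[c->r.pc];
    if (in.op == OP_HALT)
        return 0;
    c->trace[c->trace_len++] = c->r;  /* log before mutating anything */
    switch (in.op) {
    case OP_LOAD: c->r.a = in.arg; c->r.pc++; break;
    case OP_ADD:  c->r.a = (uint8_t)(c->r.a + in.arg); c->r.pc++; break;
    case OP_JMP:  c->r.pc = in.arg; break;
    case OP_HALT: break;              /* already handled above */
    }
    return 1;
}

/* Undo one cycle by restoring the saved registers.
   Returns 1 on success, 0 if the trace is empty. */
int cpu_step_back(struct cpu *c)
{
    if (c->trace_len == 0)
        return 0;
    c->r = c->trace[--c->trace_len];
    return 1;
}
```

With the program `LOAD 5; ADD 3; JMP 1`, four forward cycles leave A at 11, and each `cpu_step_back` call unwinds one cycle, including the jump. Saving both registers every cycle is wasteful but keeps the reverse step trivial.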


As long as you run it forward once, you should be able to do reverse execution. What you consider "impossible" is taking a binary and executing it from its last instruction up to the first: that would be impossible, or at least ill-specified, because a program could have many exit points, but only one entry point.


You believe incorrectly. Borland C++ IDE supported reverse execution back in the early 90s.


Yes, but AFAIK that was only for a very limited duration trace buffer. You're right, it is basically the same thing, but gdb can now do it for millions of instructions.


VMware Workstation can do this for entire VMs. They call it "replay" IIRC.


Ocaml's debugger has a similar time travelling feature.


FWIW, Borland C++ had this on PC back in ~1994. It worked pretty well most of the time.


I remember that -- but I never knew any details. Do you? Did it have a fixed-size buffer for instruction trace? Did it perform well on reverse step into/over functions? How was the speed performance?


Do they need any substantial changes in DWARF to pull this off?


No changes in elf or dwarf.


Can this reverse through a seg fault?


Why wouldn't it be able to? A segfault is just a signal like any other.


I thought it couldn't be caught, but I checked the docs, and it seems it can.

Hmm, maybe I should put that in all my programs - then I won't ever have bugs from segfaults.

:)
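
For reference, a small sketch of catching SIGSEGV with `sigaction`. To stay well-defined it sends the signal with `raise()` instead of triggering a real bad memory access; the names `on_segv` and `catch_segv_demo` are made up for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigaction under strict -std=c11 */
#endif
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t segv_seen = 0;

static void on_segv(int signo)
{
    (void)signo;
    segv_seen = 1;  /* only async-signal-safe work belongs in a handler */
}

/* Installs a SIGSEGV handler, sends the signal with raise(), and reports
   whether the handler ran. Returns 1 if caught, 0 if not, -1 on error. */
int catch_segv_demo(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    segv_seen = 0;
    raise(SIGSEGV);                  /* synthetic signal, not a real fault */

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous disposition */
    return segv_seen ? 1 : 0;
}
```

The catch in the joke: for a genuine fault, POSIX leaves the behavior undefined if the handler simply returns, and in practice the faulting instruction typically runs again and faults again. Catching the signal does not make the bug go away.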


Back in my student days long ago, a friend of mine who was working on a very difficult assignment hatched a plan to include a SIGSEGV handler that would print "NFS server not responding, still trying..." for use on the day when we had to demo the results to the TAs. The NFS system was pretty flaky and his hope was that the TA would move on to the next demo and get back to him, giving him time to fix whatever problem was happening. Not sure if he actually put it in though.


Back in my student days for DOS-based labs we just did

  char foo[640*1024];
  int main(int argc, char ** argv) { return 0; }

When run, it made the OS say "Out of memory" and then it was the lab assistant's headache to unload all that resident stuff that was sitting there and eating a good 20-30% of available RAM. They were typically opposed to doing that, so the TAs assumed the program actually worked. The end :)


I must be the only moron who used to be eager for others to see his programs. No teacher ever looked at it long enough to appreciate it though; I quickly established a reputation as a computer show off and the teachers were dismissive of me, spending more time with kids who actually had problems.


>> I quickly established a reputation as a computer show off

Hmm ... :)


Because the state of a process is undefined after a seg fault.


"undefined" doesn't mean it has no state. it does have some state. it's just that this state can't be predicted. it can still be recorded and used in debugging backwards.


2 comments about that: First, it must be well defined in order for the operating system to be able to do something about the process. It may not be consistent, predictable to userspace, or runnable, but it must be something possibly sensible. Therefore, it doesn't matter, the debugger can still inspect it, and the programmer can make sense of this state.

Second, the debugger generally catches the signal before it gets handled.


Finally!



