
Time Travel Debugging - clouddrover
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/time-travel-debugging-overview
======
timmisiak
I'm the dev lead for WinDbg and the windows debugging platform, and I'm a
former dev working on time travel debugging (I wrote the cpu emulation for the
project). Happy to answer any questions that anyone has.

~~~
qod
First I would just like to say how impressed I am with the new WinDbg and TTD,
this is fantastic work and a huge improvement on the existing debugging tools.

My question is: if I want to start taking a TTD trace after hitting a
particular breakpoint in WinDbg, can this be done with the current tooling?
Use case would be if there is a large amount of initialization code that
doesn't need to be traced because the bug or crash is known to occur at a
later stage in the program.

~~~
timmisiak
Right now you can attach TTD to a process after initialization (or any point
in time prior to the repro), but you can't trace starting from a breakpoint.
We'd love to add that functionality though, so I'm sure you'll see it in a
future version of the tool.

Glad you like the new WinDbg! It's been very polarizing (which you can see
just reading the comments on this post!), so it's good to hear the positive
feedback sometimes :)

We've got a ton of plans to make debugging even faster and more effective in
WinDbg, so hopefully we'll win more people over as we make our tools easier to
use and more powerful.

------
kuwze
There is also UndoDB[0], and gdb supports reverse debugging[1]. There is also the rr[2]
project from Mozilla. All of those support Linux.

I also recently discovered WDD[3] (which is for Windows) as well.

[0]: [https://undo.io/](https://undo.io/)

[1]: [https://www.gnu.org/software/gdb/news/reversible.html](https://www.gnu.org/software/gdb/news/reversible.html)

[2]: [http://rr-project.org/](http://rr-project.org/)

[3]: [https://github.com/ipkn/wdd](https://github.com/ipkn/wdd)

~~~
timmisiak
I believe all of those projects are based on single-core debugging.
Restricting the trace to a single core is one way of introducing determinism
in the process as a way of allowing replay later. Microsoft time travel
debugging is a multi-core technology, which allows you to capture interactions
between threads in multi-threaded processes.

Edit: Since I'm still "posting too fast", let me respond here to clarify. Both
rr and UndoDB support multiple threads, but do not record them
simultaneously. They restrict execution to a single core. That's what I mean
by recording interactions between threads. For instance, if you have shared
memory between processes, that won't work with rr/UndoDB generally (although I
know at least rr can work around this by recording both processes).

I'm not saying one is intrinsically better than the other since there are
tradeoffs with both approaches. There is a high constant overhead for
recording all cores, but it does have the advantage of scaling well to large
numbers of cores.
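
The determinism argument above (run everything on one core, so the only scheduling nondeterminism left is which thread runs next) can be illustrated with a toy model. This is a simplified Python sketch, not how rr or UndoDB are implemented: generators stand in for threads, `worker` and `run` are hypothetical names, and real recorders must also log system-call results, signals and other outside inputs.

```python
import random
from typing import Callable, Dict, Generator, List, Optional, Tuple

# Toy model: each "thread" is a generator that does one step per yield.
# Only one step runs at a time, as if all threads shared a single core.
Thread = Generator[None, None, None]
Shared = Dict[str, List[str]]


def worker(name: str, shared: Shared, steps: int) -> Thread:
    for i in range(steps):
        shared["log"].append(f"{name}{i}")  # one observable side effect per step
        yield


def run(make_threads: Callable[[Shared], List[Thread]],
        schedule: Optional[List[int]] = None,
        seed: int = 0) -> Tuple[List[str], List[int]]:
    """Record mode (schedule is None): pick the next thread at random and log
    each choice. Replay mode: follow the logged choices exactly."""
    shared: Shared = {"log": []}
    threads = make_threads(shared)
    rng = random.Random(seed)
    choices = iter(schedule) if schedule is not None else None
    recorded: List[int] = []
    while threads:
        idx = next(choices) if choices is not None else rng.randrange(len(threads))
        recorded.append(idx)
        try:
            next(threads[idx])  # run exactly one step of the chosen thread
        except StopIteration:
            threads.pop(idx)    # finished threads leave the run queue
    return shared["log"], recorded


if __name__ == "__main__":
    def make(shared: Shared) -> List[Thread]:
        return [worker("A", shared, 3), worker("B", shared, 3)]

    trace, schedule = run(make, seed=42)
    replayed, _ = run(make, schedule=schedule)
    print(trace)
    print(replayed == trace)  # same recorded schedule, same interleaving
```

The recorded schedule is just a list of small integers, which is why this style of recording is cheap; the cost is that the threads never actually run in parallel.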

~~~
khuey
Yes, there is definitely a significant tradeoff here. If you're recording
something like Exchange Server having true multi-core execution is a big deal.
If you're recording a web browser it's not as important, because most things
are still gated on the UI thread. Hopefully y'all can get the constant factor
overhead for recording down further over time.

With rr and Firefox we found that with sufficiently pathological scheduling we
can eventually tease out many races and other bad thread interactions. This
became rr's "chaos mode". I think the biggest drawback of not having true
multi-core support in rr today is simply the performance hit that a parallel
program will take during recording.
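
As a loose analogy for "pathological scheduling" teasing out races (this does not model what rr's chaos mode actually does internally), the sketch below tries many seeded random interleavings of a non-atomic counter increment until one loses an update. `incrementer`, `run_once` and `find_lost_update` are hypothetical names. Because each seed fully determines the schedule in this toy, the failing seed doubles as a recording that reproduces the bug.

```python
import random
from typing import Dict, Generator, Optional


def incrementer(shared: Dict[str, int], times: int) -> Generator[None, None, None]:
    """Non-atomic read-modify-write: the yield between read and write is a
    point where the scheduler may switch to another thread."""
    for _ in range(times):
        value = shared["counter"]       # read
        yield                           # possible context switch
        shared["counter"] = value + 1   # write back (may clobber another update)
        yield


def run_once(seed: int, threads: int = 2, times: int = 2) -> int:
    """Run all threads one step at a time in a seeded random order."""
    shared = {"counter": 0}
    rng = random.Random(seed)
    live = [incrementer(shared, times) for _ in range(threads)]
    while live:
        idx = rng.randrange(len(live))
        try:
            next(live[idx])
        except StopIteration:
            live.pop(idx)
    return shared["counter"]


def find_lost_update(max_seeds: int = 1000, threads: int = 2,
                     times: int = 2) -> Optional[int]:
    """Try many schedules; return the first seed whose run loses an update."""
    expected = threads * times
    for seed in range(max_seeds):
        if run_once(seed, threads, times) != expected:
            return seed
    return None


if __name__ == "__main__":
    seed = find_lost_update()
    if seed is None:
        print("no lost update found")
    else:
        print(f"seed {seed}: counter = {run_once(seed)} (expected 4)")
```

With a single thread no schedule can lose an update, so the search only finds failures when there is real contention.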

------
JoshTriplett
Here's a review from the Mozilla folks:
[http://robert.ocallahan.org/2017/10/thoughts-on-microsofts-t...](http://robert.ocallahan.org/2017/10/thoughts-on-microsofts-time-travel.html)

------
RyanRies
Here is a recent CppCon video:
[https://www.youtube.com/watch?time_continue=3389&v=l1YJTg_A9...](https://www.youtube.com/watch?time_continue=3389&v=l1YJTg_A914)

I'm an escalation engineer at Microsoft, and we have been using this for many,
many years to solve complex customer cases. Having it go public is a pretty
big deal for us.

------
mncharity
For more than two decades now, I've been using the limited availability of
reversible debugging as an example of just how dysfunctional software
engineering is as a field. The shoemaker's family's tattered feet.

Reversible debugging has been around for six decades (e.g. EXDAMS on Multics), has
been generally practical for two or three, has been available in gdb for one, and
now, finally, here we are. It's taken two bloody human generations to get this
close to widely deploying TTD.

So yay. Maybe I can now retire TTD as an example of glacial progress. There's
no shortage of others. Sigh.

~~~
roca
As far as I can tell EXDAMS was never implemented. The RAND EXDAMS memo is
more of a thought experiment than anything else.

Getting reverse-execution debugging to work adequately at scale is difficult.
gdb's approach doesn't scale at all. I dispute that reversible debugging has
been "generally practical for two or three decades"; as far as I know the
first practical products were UndoDB and VMware.
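
One way to see why a naive approach scales poorly: support reverse execution by logging, for every executed instruction, the state it overwrites, and step backwards by undoing those entries. gdb's full process-record mode works roughly along these lines (it records register and memory changes per instruction), so the log and the slowdown grow with every instruction executed. The toy interpreter below (`RecordingMachine` is a hypothetical name) illustrates only that idea; it is not gdb's implementation.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, str, int]  # (op, register, operand); op is "set" or "add"


class RecordingMachine:
    """Toy interpreter that logs, before each instruction, the value it is
    about to overwrite, so execution can later be stepped backwards."""

    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.pc = 0
        self.regs: Dict[str, int] = {}
        # One entry per executed instruction: (pc, register, old value or None).
        self.undo_log: List[Tuple[int, str, Optional[int]]] = []

    def step(self) -> bool:
        if self.pc >= len(self.program):
            return False
        op, reg, operand = self.program[self.pc]
        if op not in ("set", "add"):
            raise ValueError(f"unknown op {op!r}")
        self.undo_log.append((self.pc, reg, self.regs.get(reg)))
        if op == "set":
            self.regs[reg] = operand
        else:
            self.regs[reg] = self.regs.get(reg, 0) + operand
        self.pc += 1
        return True

    def reverse_step(self) -> bool:
        if not self.undo_log:
            return False
        pc, reg, old = self.undo_log.pop()
        if old is None:
            del self.regs[reg]   # register did not exist before this instruction
        else:
            self.regs[reg] = old
        self.pc = pc
        return True


if __name__ == "__main__":
    m = RecordingMachine([("set", "x", 1), ("add", "x", 5),
                          ("set", "y", 2), ("add", "x", -3)])
    while m.step():
        pass
    print(m.regs, len(m.undo_log))  # {'x': 3, 'y': 2} 4
    m.reverse_step()
    m.reverse_step()
    print(m.regs, m.pc)             # {'x': 6} 2
```

Every forward step appends to `undo_log`, so memory use is proportional to the number of instructions executed, which is exactly what becomes untenable for long-running programs.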

~~~
mncharity
> As far as I can tell EXDAMS was never implemented. The RAND EXDAMS memo is
> more of a thought experiment than anything else.

The memo[1] mentions

> Since EXDAMS is currently being debugged and is not operational, no
> performance statistics are available. [p12]

Suggesting implementation was attempted, though perhaps not successfully?

[1] "EXDAMS: extendable debugging and monitoring system" Memo RM-5772-ARPA
APRIL 1969 [http://www.dtic.mil/get-tr-doc/pdf?AD=AD0686373](http://www.dtic.mil/get-tr-doc/pdf?AD=AD0686373)

------
dmitrygr
We at VMware had this working and released in 2008. We supported Windows and
Linux. We even supported things like network applications that had lots of
outside state. It was called replay debugging and, besides Mozilla, few used
it. Sadly it was killed when Workstation 8.0 came out.
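
The general trick behind replaying programs that depend on outside state is to log every nondeterministic input at record time and feed the logged values back at replay time instead of contacting the outside world. The sketch below shows that pattern in miniature; `InputLog` and `checksum_packets` are hypothetical names, `os.urandom` merely stands in for network reads, and nothing here reflects how VMware's replay debugging was actually implemented.

```python
import os
from typing import Callable, List, Optional


class InputLog:
    """Record/replay wrapper around a nondeterministic input source.

    Record mode calls the real source and saves every result; replay mode
    returns the saved results in order and never touches the source."""

    def __init__(self, source: Optional[Callable[[], bytes]] = None,
                 recorded: Optional[List[bytes]] = None) -> None:
        if (source is None) == (recorded is None):
            raise ValueError("pass exactly one of source or recorded")
        self.source = source
        self.entries: List[bytes] = [] if recorded is None else list(recorded)
        self.pos = 0

    def read(self) -> bytes:
        if self.source is not None:       # recording
            data = self.source()
            self.entries.append(data)
            return data
        data = self.entries[self.pos]     # replaying
        self.pos += 1
        return data


def checksum_packets(inp: InputLog, count: int = 3) -> int:
    """Stand-in for application logic that consumes outside state."""
    return sum(sum(inp.read()) for _ in range(count))


if __name__ == "__main__":
    rec = InputLog(source=lambda: os.urandom(4))  # unpredictable "network" input
    original = checksum_packets(rec)
    replay = InputLog(recorded=rec.entries)
    print(checksum_packets(replay) == original)  # True: same inputs, same result
```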

~~~
pmoriarty
_" Sadly it was killed when workstation 8.0 came out."_

Whatever happened to the intellectual property for that? Any chance it could
be open sourced?

~~~
dmitrygr
Internal politics.

------
arbesfeld
Very cool stuff! Microsoft also developed a TTD system for JavaScript/Node.js:
[https://www.microsoft.com/en-us/research/publication/time-tr...](https://www.microsoft.com/en-us/research/publication/time-travel-debugging-javascriptnode-js/)

At LogRocket ([https://logrocket.com](https://logrocket.com)) we're building a
TTD of sorts for recording production users in your web app. While we can't
replay exact code, we can get you most of the way there with DOM
recording/logs/network waterfall.

------
revelation
Oh my god they redesigned WinDbg with a Microsoft Office UI lookalike starter
kit including ribbons. And at least one of the screenshots showed JavaScript.

So many fingers crossed hoping they don't ruin this indispensable basic tool.
Every platform needs to have a no-nonsense debugger that doesn't fall over
immediately when it has no symbols and source. WinDbg (was?) that.

~~~
asveikau
I like the 90s UI of old WinDbg too. Maybe stick to cdb.exe to avoid bells and
whistles.

------
alexashka
Looks very nifty. Is this being developed for Windows only?

------
DSingularity
Mozilla has a similar tool, check it out:

[https://github.com/mozilla/rr](https://github.com/mozilla/rr)

------
jasonwelk
This is cool stuff to see in the world of C++. It's great to see language
communities feed off each other and drive innovation like this.

~~~
cma
I think it has been in gdb before Swift.

~~~
userbinator
Yes:
[https://www.gnu.org/s/gdb/news/reversible.html](https://www.gnu.org/s/gdb/news/reversible.html)

This has approximately nothing to do with C++ or even "language communities".

~~~
jasonwelk
I can't speak for Microsoft itself, but wouldn't the effort to implement a
production-ready time-traveling debugging system (as explained here by
SteveJS) be at least somewhat influenced by the growing popularity of time-
travel debugging in other languages?

~~~
zokier
What other languages got time travel first? The biggest I've heard of is rr from
2014, which is for native code or basically C/C++.

~~~
jasonwelk
One language that has popularized it in very recent years is Elm. It's
mentioned everywhere.

------
lj3
Reminds me of Qira. [http://qira.me/](http://qira.me/)

------
libeclipse
Just waiting for the rust/lisp squad to come along and tell us that we
wouldn't need all this if we just joined them.

~~~
roca
I'm a notorious Rust fan and also the lead author of rr :-).

------
userbinator
_TTD is efficient and works to add as little as possible overhead as it
captures code execution in trace files._

...and then in the table below the screenshot...

 _Large overhead at record time. May collect more data that is needed. Data
files can become large._

Based on past experiences with debugging using tracepoints etc., I'm more
inclined to believe the latter.

Also, as a side-note, the "more modern visuals" of "WinDbg Preview" look
horrible. It's a debugger, not a toy for the barely-computer-literate. Those
who want the "friendly experience" will use Visual Studio instead, which
amusingly enough continues to have menus instead of the disgusting ribbons as
of the latest 2017 version. The screenshot also shows _two_ lines of slightly
misaligned "Command" "Memory" "Source". Yuck.

(For those who don't understand, this is what the original WinDbg looks like ---
simple and functional:
[http://sandsprite.com/blogs/images/main_ui.png](http://sandsprite.com/blogs/images/main_ui.png)
)

~~~
timmisiak
The overhead varies depending on how cpu/IO bound the application is. IO isn't
really affected, so IO bound applications tend to not see a big slowdown. In
theory, you could see a very large slowdown in the worst case, but in the
average case for a "medium sized" process the slowdown would be noticeable but
not affect the usability. This technology isn't based on tracepoints, it's
based on in-process cpu emulation. The emulation overhead is on the order of
10-20x in many cases, whereas tracepoint overhead would be on the order of
1000x I believe (maybe worse).
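
The point about IO-bound programs follows from simple arithmetic if one assumes (a simplification, not a claim about TTD's internals) that only CPU-bound time is multiplied by the recording overhead while time spent waiting on IO is unchanged. The sketch below plugs in the rough factors quoted above (about 10-20x for in-process emulation, about 1000x for tracepoints); the function names and workloads are made up for illustration.

```python
def recorded_seconds(cpu_seconds: float, io_seconds: float,
                     cpu_slowdown: float) -> float:
    """Simplified model: only CPU-bound time is multiplied by the recording
    slowdown; time spent waiting on IO is assumed unchanged."""
    return io_seconds + cpu_seconds * cpu_slowdown


def overall_slowdown(cpu_seconds: float, io_seconds: float,
                     cpu_slowdown: float) -> float:
    native = cpu_seconds + io_seconds
    return recorded_seconds(cpu_seconds, io_seconds, cpu_slowdown) / native


if __name__ == "__main__":
    # Hypothetical workloads as (cpu_seconds, io_seconds) of native run time.
    workloads = {"cpu-bound": (10.0, 0.0), "io-bound": (1.0, 9.0)}
    for name, (cpu, io) in workloads.items():
        for label, factor in (("emulation 20x", 20.0), ("tracepoints 1000x", 1000.0)):
            print(f"{name:9} {label:17} {overall_slowdown(cpu, io, factor):7.1f}x")
```

Under this model a run that spends 90% of its time in IO sees about 2.9x overall with a 20x CPU factor, while a fully CPU-bound run sees the full 20x; with a 1000x factor even the IO-bound run slows down about 100x.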

If there is something specific you dislike about the visuals of WinDbg
Preview, let us know through the feedback hub or emailing
windbgfb@microsoft.com. We realize that folks that have been using WinDbg for
20 years are likely to not be interested in a new UI, but we face 20 years of
legacy code every time we want to add a new feature to the UI. As an example,
the new WinDbg UI has a javascript window for writing scripts that extend and
automate the debugger. It took us approximately 8x less time to implement in
WinDbg Preview than what we estimated it would cost in the legacy UI (and a
much more junior dev was able to do it as well). We want to innovate without
disrupting folks that have effective workflows in WinDbg, so we really want to
hear feedback on the new UI. If there are specific things that we can change
to make you more efficient in the new UI, please let us know.

~~~
userbinator
_We realize that folks that have been using WinDbg for 20 years are likely to
not be interested in a new UI, but we face 20 years of legacy code every time
we want to add a new feature to the UI._

You can rewrite the UI code so it's easier for you to work on internally, but
to the user it looks and acts like it was before --- but proceed
cautiously[1]. I'm not against adding features and TTD sounds extremely
useful, but having in the overview page some slight contradictions and a
screenshot of a dumbed-down UI with obvious WTFs like the two duplicate lines
_really_ soured the first impression for me. (Looking at it again, I now see
the path in the titlebar has been cut off, despite plenty of empty space after
it...)

[1] [https://www.joelonsoftware.com/2000/04/06/things-you-should-...](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/)

~~~
timmisiak
What duplicate lines are you talking about?

The debugger isn't written from scratch, just the UI. All of the underlying
functionality is essentially the same, just in a more usable shell, and if you
collapse the ribbon and retheme the UI to look like the 90s, you could almost
squint and think it was the old WinDbg. The change is clearly very polarizing,
but we're nearly at parity with what you could do in the old WinDbg UI, and
we've already been able to give people features that we could have never
dreamed of supporting in the old UI (not for lack of trying). The JavaScript
support is just one example.

(Also, the cut-off title bar was an issue in the Fluent.Ribbon component we
use, and I think it's fixed in an updated version that we're taking soon)

~~~
bluem-ap
> What duplicate lines are you talking about?

I'd guess that userbinator is referring to the ribbon items "Command",
"Memory", and "Source" where the group name on the ribbon has been given the
same name as the single menu below it (in contrast to something like Word
where the contextual ribbon items for a table are Table, with Design and
Layout as children).

~~~
timmisiak
Oh OK, that makes sense then. Yeah, that's been fixed in current internal
builds and will go out on the store soon.

