Debugging in the Multiverse (antithesis.com)
201 points by wwilson 8 days ago | 58 comments





Would love to hear a technical comparison between this and King et al.'s classic paper on time-traveling VMs from USENIX ATC 2005: "Debugging operating systems with time-traveling virtual machines" (https://www.usenix.org/legacy/events/usenix05/tech/general/k..., 505 citations).

Seguing to talks regarding literal time traveling VMs, I'm reminded of Damian Conway's "Temporally Quaquaversal Virtual Nanomachine Programming In Multiple Topologically Connected Quantum-Relativistic Parallel Spacetimes... Made Easy!" presentation.

Essentially, it involves a series of sci-fi concepts, and then showing the kind of program (in modified perl) that someone might write to take advantage of those capabilities.


Does anything like the Antithesis hypervisor exist as open source?

The closest I've seen is Qemu record/replay, but that's very slow (no KVM acceleration, no multicore), and broken in current Qemu versions (replayed system just gets stuck).


https://github.com/facebookexperimental/hermit but it hasn't worked for me and is now unmaintained.

There are tools and languages that support time-travel debugging, like rr (which builds on GDB) or Smalltalk, but no open-source, system-wide equivalent of Antithesis that I know of yet.

rr can record process trees; i.e. basically any part/descendant of a process you spawn will be recorded and can be replayed (userspace CPU & memory, that is); won't record the entire OS though.

My experience with RR is that the chance of it working without hitting a missing syscall or desync is only about 50%, which is why I want a different solution that doesn't rely on the fragile syscall recording approach.

Huh. In my experience it works nearly flawlessly, certainly far above 50%. And even when there are spurious failures in replaying it's easy enough to just re-replay (though I do wish there was some way to export the current position & checkpoints with instruction-level precision to import in a fresh replay). I suppose it depends massively on the recorded program (most of mine are simple C programs, but also a decent bit of Java for JIT inspection or FFI, and I've also recorded an Electron app a couple times, and for fun Factorio)

Same, I haven't had it have too many problems but it's not perfect & missing support for io_uring is a problem (they'll add it eventually I suspect once someone ponies up the money for it).

It is really interesting to me that this sort of thing didn’t come from programming language folks like I’d expect. You’d think PLs are in the absolute perfect spot to implement things, because they define the semantics and runtime. And there are a few PLs who have time-travel demos, but they’ve never really been seen as more than a cool tech demo.

Perhaps the language is too small a vantage point to really get into what’s happening when debugging.


From the little I have seen, most programming language folks don't seem to care much about debugging. They care a lot about bugs not happening in the first place, which is good, and testability is sometimes taken into consideration, but they care much less about what to do after a bug has happened.

No language will prevent you from misimplementing the specs, but languages can be designed in such a way that it is easy to trace back why the button is green and not red.

It seems like those who are the most serious about debugging are from the video game industry. They get all the cool stuff with time travel, hot reload, etc... So much that I expected to see something about video games, and was surprised it wasn't.


Coming out of the games industry, I am constantly amazed by how rarely people outside of games use debuggers. And, how slow they are to debug everything because of that...

You’re not wrong, but I do feel like the distributed nature of a lot of systems today is a big reason for that.

And Antithesis is (theoretically) one solution to that, which is very neat!

(I know modern games are also often operating in a distributed environment, but to generalize very broadly, there’s a lot more happening “in one place”)


> No language will prevent you from misimplementing the specs

If the spec is written in the language itself, then some languages certainly will.

See Lean, Rocq, Isabelle, etc


Unfortunately we do not yet have a way to write reality in any of these languages, so there is always at-least-one-level of translation where errors in representation may creep in.

> Perhaps the language is too small a vantage point to really get into what’s happening when debugging

A little bit. The big thing that others are missing is that it's basically impossible for a PL to accomplish this. Antithesis is basically recording all the state including I/O, network I/O, all RNGs (including the OS's) and the big one which everyone has trouble with, which is time. So basically you don't need to set up your code and how it interfaces with its environment to be deterministic - you can run within a deterministic container instead, which flips the problem on its head and makes it much easier. I'm sure there are tradeoffs. A notable one is how expensive and slow this approach is vs making your code deterministic. But given how basically no one bothers to make their code deterministic and this is a drop-in solution for scenarios like that, it's really worth it. Additionally, unlike approaches like rr which offer similar capabilities, this is even more generic & not dependent on adding support for every OS interface (e.g. rr doesn't support io_uring yet but I believe Antithesis would since it's running at the VM level)
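To make the "deterministic container" idea concrete, here is a minimal Python sketch. It is not how Antithesis works internally; `DeterministicEnv` and `flaky_service` are hypothetical names invented for illustration. The assumption is simply that every source of randomness and time the program can see is derived from one seed, so the same seed always produces the same run.

```python
import random


class DeterministicEnv:
    """Hypothetical stand-in for a deterministic container: the only
    sources of randomness and time are derived from a single seed."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)  # private RNG, not the global one
        self._now = 0.0                  # virtual clock, starts at zero

    def random_int(self, lo: int, hi: int) -> int:
        return self._rng.randint(lo, hi)

    def time(self) -> float:
        return self._now

    def sleep(self, seconds: float) -> None:
        # Advancing the virtual clock is instantaneous and repeatable.
        self._now += seconds


def flaky_service(env: DeterministicEnv, requests: int) -> list[str]:
    """Toy program whose behaviour depends on randomness and timing."""
    log = []
    for i in range(requests):
        env.sleep(env.random_int(1, 50) / 1000)  # simulated latency
        if env.random_int(0, 9) == 0:            # simulated rare fault
            log.append(f"{env.time():.3f} request {i}: TIMEOUT")
        else:
            log.append(f"{env.time():.3f} request {i}: ok")
    return log


if __name__ == "__main__":
    # Two runs with the same seed produce identical logs, timestamps included.
    print(flaky_service(DeterministicEnv(42), 5) == flaky_service(DeterministicEnv(42), 5))
```

The program itself was not rewritten to be deterministic; only the environment it talks to was swapped out, which is the inversion the comment describes.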


I know time-travel debugging is very very close to Gilad Bracha's heart and something he was really hoping would make its way into Dart.

I don't know to what degree this is true for other language teams but one thing I've observed is that language designers, compiler people, VM people, and IDE/debugger people have more distinct cultures than you might expect. That can make it hard to ship features that cut across those domains. I think we've gotten a lot better at doing that kind of holistic design on the Dart team, but it took years of team-building to get there.


There's reverse debugging and then there's what antithesis does which is a deterministic guarantee of the state. So for example, if you rewind, you'll get the exact same disk & network I/O happening across each call. And it supports arbitrary OS operations whereas typically at the PL level you'll be left at the mercy of whichever OS APIs the PL chooses to support for recording (i.e. similar to rr in terms of what it'll be able to do). Often times, PLs don't even bother with recording state across OS calls since they don't actually know what are OS calls vs normal function calls.

Yeah, I can believe this. I've been working in static analysis for C++ for only 2 years or so, but balancing soundness, precision, and reasonable analysis time really does a number on being more holistic about how to approach these problems. Very much feels like your brain just cannot see other ways to reason about programs because it is so sunk into the current way.

I sometimes wonder if this sort of determinism is the sort of thing that is either designed in from the start of a system/PL, or you need near-hardware level control (like Antithesis).


Elm debugger did something like this, but it's much more limited in scope.

As does Vue - in 5+ years I've never used it

Emm.. so many tools are/were available for C and Java. But yeah, we need to reinvent wheels every so often

All I know of is rr :)

Yeah it's fun that we get to do this at the hypervisor level. This opens up time-traveling in systems where there's cross-machine or inter-process communication, which really widens what we're able to do.

(I work at Antithesis. If you're interested in chatting more once this thread has gone cold, come join discord.gg/antithesis)


I've enjoyed reading many of the blog posts by Antithesis, really cool work.

I don't really see a fit for the automated testing product in our stack at the moment, but I would love to use a time traveling hypervisor that I can hop into whenever I'd like.

Currently, it seems your pricing is pretty focused on the automated testing service. Do you have pricing or plans that offer just the deterministic dev environment?


(antithesis employee here) We don't currently just offer the deterministic dev environment, but we do offer extended 30 day demos for prospects interested in trying out the tech and seeing how it works. If you're interested contact us directly! contact@antithesis.com

How do you handle side effects that interact with third party systems? In my own tests, I use network request mocks. Do you need to provide a test mode flag to indicate that mocks should be used?

Any third party service does need to be mocked or stubbed out. We have a partnership with Localstack that lets us provide very polished AWS mocks that require zero configuration on your part (https://antithesis.com/docs/using_antithesis/environment.htm...).

If you need something else, reach out and ask us about it, because we have a few of them in the pipeline.


I was once working in a company producing software / operating systems for smart cards (such as the chips on your credit cards). We developed a simulator for the hardware that logged all changes to registers, memory and other states in a very large ring buffer, allowing us to undo / step backwards through code. With RAM being large, those chips being slow, and some snapshotting, we were usually able to undo back to the reset of the card. That was a game changer regarding debugging the OS.
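A rough sketch of the ring-buffer undo log described above, in Python. The register names, the single "add" instruction, and the `UndoableMachine` class are all made up for illustration; the real simulator is not described in enough detail to reproduce. The only idea taken from the comment is that every state change records the old value in a bounded buffer, so execution can be stepped backwards.

```python
from collections import deque


class UndoableMachine:
    """Toy register machine that logs every state change so it can step backwards."""

    def __init__(self, history_size: int = 1024) -> None:
        self.regs = {"pc": 0, "acc": 0}
        # A deque with maxlen acts as a ring buffer: the oldest entries fall off.
        self._undo = deque(maxlen=history_size)

    def step(self, operand: int) -> None:
        """One hypothetical instruction: add operand to acc, then advance pc."""
        new_values = (("acc", self.regs["acc"] + operand),
                      ("pc", self.regs["pc"] + 1))
        changes = []
        for reg, value in new_values:
            changes.append((reg, self.regs[reg]))  # remember the old value
            self.regs[reg] = value
        self._undo.append(changes)  # one log entry per instruction

    def step_back(self) -> bool:
        """Undo the most recent instruction; return False if history is exhausted."""
        if not self._undo:
            return False
        for reg, old_value in reversed(self._undo.pop()):
            self.regs[reg] = old_value
        return True


if __name__ == "__main__":
    m = UndoableMachine(history_size=3)
    for operand in (5, 7, 11, 13):
        m.step(operand)
    while m.step_back():
        print(m.regs)
```

With a history of 3 entries and 4 executed instructions, only the last three can be undone. Combining the log with occasional full snapshots, as the comment mentions, is what lets you get all the way back to reset.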

Antithesis employee here. Happy to jump in and answer any burning questions people might have about multiverse debugging.

Is the hypervisor multicore? How do you handle shared memory non-determinism? What is the runtime slowdown for shared memory multicore (let's say 16 cores if you need a concrete example) execution?

Found the answer in a different post [1]. The hypervisor and virtual machines are single-core only. The talk also indicates that all I/O operations need to be manually rewritten to use the instrumented mechanism, so it demands a highly paravirtualized guest OS. Logically, that means there are probably no cross-VM shared memory interfaces either. So, no shared memory and thus no need to deal with shared memory non-determinism.

This is just a standard replay engine from what I can tell.

[1] https://news.ycombinator.com/item?id=41501577


No, we don’t require any paravirtualization at all, and nothing needs to be manually rewritten. I’m not sure where you got that impression.

It also is not in any sense a replay engine. We don’t need to record anything except the inputs!


At timestamp 23:40 in the video by Alex Pshenichkin from 2024-06-10 it says data ingestion comes via VMCALL interactions. As such a call is literal nonsense if you are not virtualized, any such call inherently means you are using a paravirtualized interface. Now maybe FreeBSD has enough standardized paravirtualized drivers similar to virtio that you can just link it up, but that would still be a paravirtualization solution with manual rewrites, just one where somebody else already did the manual rewrites. Has the fundamental design changed in the last 3 months?

This is exactly a replay engine (or I guess you could say replay engines are deterministic simulators). How do you think you replay a recording except with a deterministic execution system that injects the non-deterministic inputs at precise execution points? This is literally how all replay engines work. Furthermore, how do you think recordings work except by recording the inputs? That is literally how all recording systems designed to feed replay engines work. The only distinction is what constitutes non-determinism in a given context. At the whole hypervisor level, it is just I/O into the guest; at the process level, it is just system calls that write into the process; at the threading level, it is all writes into the process. These distinctions are somewhat interesting at an implementation level, but do not change the fundamental character of the solution, which is that they are all replay engines or deterministic simulators, whatever you want to call them.
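For readers who haven't seen one, the record/replay structure being argued about looks roughly like this in Python. It is a toy: `program`, `record`, `replay`, and `live_source` are hypothetical names, and the "non-deterministic input" is just a random integer. It only illustrates the claim that recording the inputs plus deterministic execution is enough to reproduce a run.

```python
import random
from typing import Callable


def program(read_input: Callable[[], int], steps: int) -> list[int]:
    """Deterministic logic; all non-determinism arrives through read_input."""
    state, trace = 0, []
    for _ in range(steps):
        state = (state * 31 + read_input()) % 1000
        trace.append(state)
    return trace


def live_source() -> int:
    """Stand-in for real non-determinism (network, clock, device I/O)."""
    return random.randint(0, 255)


def record(source: Callable[[], int], steps: int) -> tuple[list[int], list[int]]:
    """Run against a live source, logging every input as it is consumed."""
    log: list[int] = []

    def recording_read() -> int:
        value = source()
        log.append(value)
        return value

    return program(recording_read, steps), log


def replay(log: list[int], steps: int) -> list[int]:
    """Re-run the same program, injecting the logged inputs in order."""
    inputs = iter(log)
    return program(lambda: next(inputs), steps)


if __name__ == "__main__":
    trace, log = record(live_source, 10)
    print(replay(log, 10) == trace)
```

Note that only the inputs are stored, never the intermediate states; whether you call that a replay engine or a deterministic simulator is the terminology dispute in this subthread.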


> Let’s get more concrete. Let’s use this to solve a real problem. My server has crashed and its process has exited! No worries, I’ll just rewind time, attach a debugger to the process, and set a breakpoint or capture a thread dump:

Is this kind of stuff only possible in an Antithesis Environment?


Yes, unfortunately we have not figured out how to rewind time in the real world yet. When we do, there are a lot of choices I'm going to revisit...

... but the intro makes it sound like this system is valuable in investigating bugs that occurred in prod systems:

> I’ve been involved in too many production outages and emergencies whose aftermath felt just like that. Eventually all the alerts and alarms get resolved and the error rates creep back down. And then what? Cordon the servers off with yellow police tape? The bug that caused the outage is there in your code somewhere, but it may have taken some outrageously specific circumstances to trigger it.

So practically, if a production outage (where I think "production" means it cannot be in a simulated environment, since the customers you're serving are real) is caused by very specific circumstances, and your production system records some, but not every attribute of its inputs and state ... how does one make use of antithesis? Concretely, when you have a fully-deterministic system that can help your investigation, but you have only a partial view of the conditions that caused the bug ... how do you proceed?

I feel like this post is over-promising but perhaps there's something I just don't understand since I've never worked with a tool set like this.


(I work at Antithesis)

I think you're right that the framing leans towards providing value in prod issues, but we left out how we provide value there. I think you're also right that we're just used to experiencing the value here, but it needs some explanation.

Basically this is where guided, tree-based fuzzing comes in. If something in the real world is caused by very specific circumstances, we're well positioned to have also generated those specific circumstances. This is thanks to parallelism, intelligent exploration, fault injection, our ability to revisit interesting states in the past with fast snapshots, etc.

We've had some super notable instances where a customer finds a bug in prod, recalls it's that weird bug they've been ignoring that we surfaced a month ago, and then uses this approach to debug it.

The best docs on this are probably here: https://antithesis.com/docs/introduction/how_antithesis_work...
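As a very rough illustration of what "tree-based exploration with snapshots" can mean (this is a toy sketch, not the Antithesis algorithm; `ToySystem`, its secret input sequence, and `explore` are invented for the example): instead of restarting from scratch for every attempt, the explorer keeps snapshots of states it has not seen before and branches from each of them with every possible next input.

```python
import copy
from collections import deque
from typing import Optional


class ToySystem:
    """Hypothetical system under test: crashes only after the exact
    input sequence a, b, c, a; any wrong input resets progress."""

    SECRET = "abca"

    def __init__(self) -> None:
        self.progress = 0
        self.crashed = False

    def apply(self, event: str) -> None:
        if event == self.SECRET[self.progress]:
            self.progress += 1
        else:
            self.progress = 0
        if self.progress == len(self.SECRET):
            self.crashed = True


def explore(events: str, max_branches: int = 1000) -> Optional[list[str]]:
    """Breadth-first search that branches from saved snapshots of new states."""
    start = ToySystem()
    seen = {start.progress}
    frontier = deque([(start, [])])  # (snapshot, inputs that led to it)
    branches = 0
    while frontier and branches < max_branches:
        snapshot, path = frontier.popleft()
        for event in events:
            branch = copy.deepcopy(snapshot)  # resume from the snapshot
            branches += 1
            branch.apply(event)
            if branch.crashed:
                return path + [event]  # the inputs that reproduce the bug
            if branch.progress not in seen:  # "interesting" = new state
                seen.add(branch.progress)
                frontier.append((branch, path + [event]))
    return None


if __name__ == "__main__":
    print(explore("abc"))
```

The returned list of inputs is a reproduction recipe: because the system is deterministic, feeding the same inputs to a fresh instance hits the same crash, which is what makes an earlier finding usable when the same bug later shows up in prod.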


This was my thinking as well. Prod environments can be extremely complicated and issues often come down to specific configuration or data issues in production. So I had a lot of trouble understanding how the premise is connected to the product here.

> Yes, unfortunately we have not figured out how to rewind time in the real world yet.

10 bucks says you get complaints for not implementing the "real world" feature.


The intro mentions that ordinarily, we have to pay a high upfront cost to record info that we might need to debug later.

> When we succeed at this, we collect huge volumes of logs “just in case” they provide some crucial clue, incurring equally huge storage costs.

The 'packets from the past' section says we can just retroactively decide what we should have recorded.

Doesn't that mean we're effectively recording everything always? What's the cost of this? Or is all of this under the assumption that we never have to debug something that happened outside of the simulation environment, e.g. in response to an actual in-bound request from a customer? If this is just saying we can afford to save everything in our development environment ... well in that context recording the logs probably wasn't a "huge storage cost" either, right? Or am I missing something basic here?


You're right that if you tried to do something like this using record/replay, you would pay an enormous cost. Antithesis does not use record/replay, but rather a deterministic hypervisor (https://antithesis.com/blog/deterministic_hypervisor/). So all we have to remember is the set of inputs/changes to entropy that got us somewhere, not the result of every system operation.

The classic time space tradeoff question: If I run Antithesis for X time, say 4 hours, do you take periodic snapshot / deltas of state so that I don't have to re-run the capture for O(4 hours) again, from scratch just to go back 5 seconds?

Yes! See Alex's talk here: https://www.youtube.com/watch?v=0E6GBg13P60

In fact, we just made a radical upgrade to this functionality. Expect a blog post about that soon.


Is there a way to take this for a test drive, without talking to sales :)

There's a binary analysis time travel debugger similar to this, Qira [0][1].

[0]: https://www.usenix.org/conference/enigma2016/conference-prog...

[1]: https://qira.me/


This looks very interesting! Is it possible to implement this in a node.js web app? Does it work with any build tool? How much latency does it add to a production server?

The simulation is a completely generic Linux system, so we can run anything (including NodeJS). If your build tool can produce Docker containers, then it will work with us.

We don't run this on your production server, but in the same simulation that we use to find your bugs. See also: https://antithesis.com/product/how_does_antithesis_work/


>Seems obvious enough. But maybe, just maybe, the brake lines were cut by somebody who wanted the driver dead. Or what if he was drugged? Can we distinguish that scenario from him being sleepy?

If this is prod, your job is going to be finding what combination of these things caused it this time.


Pretty much no software, even when run deterministically, is bijective. There are almost always cases where two different states map to the same state.

How does this tooling deal with that?


This makes the mapping "injective": https://antithesis.com/blog/deterministic_hypervisor/

The "onto" direction doesn't really matter.


How can it reverse time? Does it record a stack of every decision point?

You don’t need to reverse time if you can deterministically reproduce everything that led up to the point of interest. (In practice we save a snapshot of your system at some intermediate point and replay from there.)
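A small Python sketch of "snapshot at an intermediate point and replay from there". The `Sim` and `TimeTravel` classes are hypothetical and the snapshot interval is arbitrary; the only assumption is that the system is deterministic, so re-executing forward from a saved snapshot always reaches the same state.

```python
import copy
import random


class Sim:
    """Toy deterministic system: its state evolves from a seeded RNG only."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.step_no = 0
        self.value = 0

    def step(self) -> None:
        self.value = (self.value + self.rng.randint(1, 100)) % 9973
        self.step_no += 1


class TimeTravel:
    """Keeps periodic snapshots so any past step can be reached by replaying
    forward from the nearest earlier snapshot. Expects a fresh Sim (step 0)."""

    def __init__(self, sim: Sim, snapshot_every: int = 100) -> None:
        self.sim = sim
        self.snapshot_every = snapshot_every
        self.snapshots = {sim.step_no: copy.deepcopy(sim)}

    def run(self, steps: int) -> None:
        for _ in range(steps):
            self.sim.step()
            if self.sim.step_no % self.snapshot_every == 0:
                self.snapshots[self.sim.step_no] = copy.deepcopy(self.sim)

    def goto(self, step_no: int) -> Sim:
        """Return a copy of the system as it was at step_no."""
        base = max(s for s in self.snapshots if s <= step_no)
        sim = copy.deepcopy(self.snapshots[base])
        while sim.step_no < step_no:  # fewer than snapshot_every steps
            sim.step()
        return sim


if __name__ == "__main__":
    tt = TimeTravel(Sim(seed=7), snapshot_every=100)
    tt.run(1000)
    past = tt.goto(995)  # "rewind" 5 steps by replaying 95 from step 900
    print(past.step_no, past.value)
```

The snapshot interval is the time/space knob: more frequent snapshots cost more storage but bound how much has to be re-executed to reach any given point.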

Is this like UndoDB[0]?

[0]: https://undo.io/products/udb/


I know I'm taking the wrong thing from this - but I really struggle to read this site. Something about the contrast and aggro gradients.

Hey, designer here. Thank you for this feedback. Do you prefer dark or light theme usually? And do you find reading text on this background here https://antithesis.com/security/manifesto/ is any easier?

Is this designed to be run in production?

No, it is for testing your system before it reaches production.


