While this looks like a nice idea on paper, I would not recommend using the current implementation of 'maybe' on a system that hosts valuable data.
The tool seems to work by intercepting individual "blacklisted" system calls and then, instead of executing them, returning a made-up value.
The issue is that this breaks the POSIX-specified semantics of those calls, and will therefore break any program that does more than a few trivial IO operations and relies on those operations behaving as specified.
So it might work for a simple demo case where a small script does a single file modification and never checks the result, but for any serious program (think a database, a complex on-disk format, or really anything that does network IO) this will lead to corruption and undefined behaviour, since system calls will return erroneous success values or invalid file descriptors.
I think to actually make this work one would have to emulate the system calls and make sure everything stays POSIX compliant. Doing this correctly for calls like mmap might get tricky, though (and won't be possible from within a Python runtime). And even then it isn't obvious how something like network IO would be handled.
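For the curious, the interception part looks roughly like this with python-ptrace (which maybe appears to build on). This is a minimal sketch, not maybe's actual code: Linux x86-64 only, and the blacklist, target command, and fake return value are all illustrative placeholders.

    # Sketch of ptrace-based syscall nullification, in the spirit of maybe.
    from ptrace.debugger import PtraceDebugger, ProcessExit, ProcessSignal
    from ptrace.debugger.child import createChild
    from ptrace.func_call import FunctionCallOptions

    BLACKLIST = {"unlink", "unlinkat", "rename", "rmdir", "write"}  # illustrative

    debugger = PtraceDebugger()
    pid = createChild(["rm", "-f", "some-file"], no_stdout=False)  # fork+exec under PTRACE_TRACEME
    process = debugger.addProcess(pid, is_attached=True)
    process.syscall()  # run until the first syscall stop

    while True:
        try:
            event = debugger.waitSyscall()
        except ProcessExit:
            break  # traced command finished
        except ProcessSignal as signal:
            signal.process.syscall(signal.signum)  # forward signals to the tracee
            continue
        proc = event.process
        syscall = proc.syscall_state.event(FunctionCallOptions())
        if syscall is not None and syscall.name in BLACKLIST:
            if syscall.result is None:
                # Syscall entry: swap in a harmless syscall (39 == getpid on
                # x86-64) so the kernel never performs the real operation.
                proc.setreg("orig_rax", 39)
            else:
                # Syscall exit: overwrite the return register so the program
                # believes the (never-executed) call succeeded.
                proc.setreg("rax", 0)
        proc.syscall()  # resume until the next syscall stop
    debugger.quit()

Faking a flat success return like this is exactly what breaks real programs: a nullified open has no valid file descriptor to hand back, and nothing remembers the suppressed writes, so later reads see stale data.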
Obviously, that's why the GitHub description says:
> That being said, maybe should :warning: NEVER :warning: be used to run untrusted code on a system you care about! A process running under maybe can still do serious damage to your system because only a handful of syscalls are blocked. Currently, maybe is best thought of as an (alpha-quality) "what exactly will this command I typed myself do?" tool.
You've missed Paul's point, I think. Even running trusted code is unlikely to work as desired (and could still have negative consequences, given what isn't covered by the "sandbox") for anything other than trivial programs.
I understood that. Obviously the author knows about these limitations, otherwise he wouldn't have been able to write such a tool... but it can still be useful:
it's alpha, the tool can be used on trivial programs, and you should be aware of its limitations.
My simple Perl todo script (which just reads or writes a text file) causes 'maybe' to fail partway through reading/outputting the todo archive file.
Yes, the current 'maybe' implementation will break practically any properly written IO code.
At this point, it will only work for the most trivial of demo cases. Still, IMO it's a cool demonstration of the Linux ptrace facility. And if the author implements the missing sandboxing/emulation layer in a future version and switches to whitelisting syscalls instead of blacklisting them, I think it could actually run a limited set of programs (forbidding stuff like mmap and network IO).
An absolutely cool project, but I fear the amount of work required to bring it out of the alpha stage. Python seems a good choice for a fun project, but I don't see this project evolving much without resorting to lower-level languages.
In Plan 9 one could create a clean, separate filesystem and run the unsafe program there, or write a filesystem server that logs each of the ~20 calls of the 9P protocol, since those are all you have for manipulating files.
Why would anyone ever invoke something like this and expect maybe to somehow intercept those pipe characters? It's almost meaningless, and it displays just enough know-how that its author should already understand how a shell works.
Is the "right solution" to use `tee`? I saw that once, and it seemed like we should be able to do better -- as if, had `tee` not been in the standard we wouldn't have any way to do it...
The parent is more similar to doing `sudo find . | grep -- "*-foo.bar" | xargs rm`, which I can definitely say I've never done because it's absurd. The consequences of redirecting to a file are usually nil, unlike running rm on the output of a pipeline without at least using -n on the destructive util...
It doesn't work this way, but my first expectation was that this would take an LVM2/ZFS snapshot and diff the file trees afterward. Then it'd be easily ported to Windows (VSS) and wouldn't have the subprocess issues on *BSD, but the diff would be slower, unordered, and contain changes made by unrelated processes.
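For the ZFS case this is already easy to do by hand; a sketch, with `rpool/home` as a placeholder dataset name:

    zfs snapshot rpool/home@before
    ./risky-script.sh
    zfs diff rpool/home@before      # paths created/modified/removed since the snapshot
    zfs rollback rpool/home@before  # optional: undo everything since the snapshot

As noted, the diff (and the rollback) also sweeps up changes made by unrelated processes in the meantime.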
Combining the ptrace approach for logging with mount namespaces, snapshotting, or overlayfs would probably be a more consistent approach if the program actually tries to use the files.
Just stubbing out the system calls sounds like it'll quickly break down once the program tries to do something more complicated with the files.
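A rough sketch of the overlayfs variant (run as root; the paths are placeholders, and a real tool would also want a mount namespace so the overlay isn't visible system-wide):

    mkdir -p /tmp/sandbox/{upper,work,merged}
    mount -t overlay overlay \
        -o lowerdir=/,upperdir=/tmp/sandbox/upper,workdir=/tmp/sandbox/work \
        /tmp/sandbox/merged
    chroot /tmp/sandbox/merged /bin/sh -c 'some-command'
    ls -R /tmp/sandbox/upper    # every file the command created or modified
    umount /tmp/sandbox/merged

The command runs against a writable view of the real filesystem, so reads and read-backs behave normally, and the upper directory ends up holding exactly the changes it made.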
Ideally, you want an ad hoc sandbox (like Sandboxie on Win32).
Ideally, Docker could offer ad hoc commands to launch a process in a sandbox. You could then launch a file-explorer process in the same sandbox (like Sandboxie) to inspect it, or run a diff tool that outputs statistics like the tool in the headline.
libguestfs can be used to do the diffing: create the initial disk, do stuff in the VM, then use libguestfs to compute the diff. I almost built a hacky omnibus packaging tool this way. I say almost because it turned out to be just as simple to compile stuff from source, install it into /opt, and then package those files.
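Roughly, with the image names as placeholders:

    qemu-img create -f qcow2 -b base.qcow2 -F qcow2 work.qcow2   # thin overlay of the original disk
    # ...boot a VM from work.qcow2, run the command inside, shut down...
    virt-diff -a base.qcow2 -A work.qcow2                        # libguestfs file-level diff of the two images
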
This is like Sandboxed Execution Environment (https://github.com/F-Secure/see), which was originally created for malware testing. And yes, we used guestfs.
I would have used something like this when I was first learning rsync...
Even though I had read all of the man pages and knew the commands inside and out, it still seemed incredibly risky and scary to run rsync with the --delete flag when I was backing up my main USB drive for the very first time.
Basically my biggest fear was that I had source and destination mixed up, which I didn't, but it would have been nice to run a test trial of that command before doing so.
Instead of a control approach (having to confirm every little action), it would be nicer to have undo-able behavior: I actually run the script the first time, but I can easily undo it if it didn't work as expected.
The current state of the art is of no use to me.
Yes, that is possible. As the README says:
> maybe should :warning: NEVER :warning: be used to run untrusted code on a system you care about! A process running under maybe can still do serious damage to your system because only a handful of syscalls are blocked. Currently, maybe is best thought of as an (alpha-quality) "what exactly will this command I typed myself do?" tool.
This seems neat, but from a security standpoint I'd much rather see a command which spawned a new VM with a copy of my current file-system.
I would then want to capture all disk and network i/o that the "maybed" command generated in the VM.
Even that wouldn't be that secure, because the command would still be able to send sensitive data out. You could intercept the network i/o, but that would cause most installers to fail.
Suggestion: add a way to distinguish between parameters meant for maybe and parameters meant for the program. You don't accept any parameters at the moment, but you may want to in the future, and when you do, you won't want to break existing usages by changing behavior. For that reason I think it would be good to introduce the disambiguation now.
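E.g. the common `--` convention; `--log` here is a made-up flag purely for illustration (maybe accepts no options today):

    maybe --log trace.txt -- rm -rf ./build   # everything after "--" goes to the traced command
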
You can try it yourself by running "strace" on itself, e.g.:
strace -f strace -f ls
I guess this isn't possible because the kernel developers only envisioned ptrace as a debugging facility (they supposedly didn't think about sandboxing applications), and implementing a "recursive" version of ptrace is probably more difficult.
A really cool idea. Is this more or less how rootkits work and hide themselves? A system call is made to list the contents of a directory, and the rootkit excludes itself from that listing?
My understanding is that ktrace back-end support is (or was, or will be) coming to python-ptrace. Unfortunately I don't have the skills for that, but what I meant is that I hope someone does; then tools like `maybe` could presumably use python-ptrace there just the same as on other Unices?
ktrace doesn't exist anymore on OS X as far as I know. You need to use dtrace, which means you need to write dtrace programs. Python's dtrace interface would probably only let you load a dtrace program.
It might be possible to implement `maybe` as a dtrace program by instrumenting syscall entry and raising a signal or stopping the program immediately, which you can then recover from in your debugger (though I'm not sure this actually stops the syscall). That said, I tried it, and OS X doesn't seem to allow "destructive" dtrace actions even as root with SIP disabled:
$ sudo dtrace -n 'syscall::open:entry { stop(); }' -c 'cat Makefile'
dtrace: could not enable tracing: Destructive actions not allowed
$ sudo dtrace -n 'syscall::open:entry { raise(9); }' -c 'cat Makefile'
dtrace: could not enable tracing: Destructive actions not allowed
Alternatively, you can use DynamoRIO, Intel Pin, QEMU, or a quick instruction scan/patch for SYSCALL instructions to manually break on syscalls.
Either way, you almost certainly will not be able to do this with python-ptrace alone. I filed an issue to write a `maybe`-style tool using Usercorn [1] (which supports OS X) with my VFS overlay work, which means writes could still succeed but be non-destructive.
Reliable software should definitely check return codes, but maybe returns success. Inspecting the filesystem to verify the result would be overdoing it, and racy besides.
It's intercepting system calls with ptrace, so it could easily fool the process into thinking its calls did exactly what was expected. A cursory glance through the code indicates that it mostly does. However, it looks like it doesn't record changes to files so that they could be read back in by the application. So this will work for applications that read once and write once, but if the same file is written and read, it'll probably break.
This is what I immediately thought. This command should not be considered fail-proof. Even if you fool the caller by returning success while actually doing nothing, the trace this command displays can't be taken as exactly what the program would have done running untraced.
Sometimes I write a script that I am 99% sure is correct. Let's say I write a regex that should delete files matching a specific pattern. With maybe, I could do a dry run and see the files it /actually/ intends to delete.
I might not want to actually delete them without confirming it.
My favourite pattern for this use-case is to write script A that generates script B; instead of A invoking 'rm' directly, it runs 'echo rm', or even 'echo #rm'.
Then I open up script B in an editor and review the planned changes. Once I like what I see (potentially after a few edits) I run 'bash B'.
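Concretely, a toy version of the A-generates-B pattern (bash; the find pattern is just an example):

    # script A: emit the planned deletions instead of performing them
    find . -name '*.tmp' -print0 | while IFS= read -r -d '' f; do
        printf 'rm -- %q\n' "$f"
    done > B
    $EDITOR B   # review (and maybe prune) the plan
    bash B      # apply it
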