
Show HN: Sysdig, a tool for Linux system exploration - degio
https://github.com/draios/sysdig
======
brendangregg
Impressive. Easy to get going, low overhead, powerful one-liners.

I like the filter syntax - would be nice for perf_events to pick this up.
Although, if it did, I hope that the stable filter fields API can be extended
with unstable arbitrary expressions as needed, for when dynamic probes are
used.

What perf_events really lacks is a way for custom processing of data in kernel
context, to reduce the overheads of enablings. Eg, let's say I want a histogram
of disk I/O latency. sysdig has chisels, which look like they do what I want,
but from the Chisels User Guide: "Usually, with dtrace-like tools you write
your scripts using a domain-specific language that gets compiled into bytecode
and injected in the kernel. Draios uses a different approach: events are
efficiently brought to user-level, enriched with context, and then scripts can
be applied to them." Oh no, not user-level!
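
To make the user-level approach quoted above concrete, here is a minimal sketch of a latency histogram aggregated entirely in user space. It is not sysdig's chisel API (chisels are written in Lua); the event tuples, `bucket`, and `latency_histogram` are hypothetical names chosen for illustration, and the power-of-two bucketing is only similar in spirit to what DTrace's `quantize()` does in kernel context.

```python
from collections import Counter


def bucket(latency_us: int) -> int:
    """Return the lower bound of the power-of-two bucket holding latency_us."""
    if latency_us <= 0:
        return 0
    return 1 << (latency_us.bit_length() - 1)


def latency_histogram(events):
    """Aggregate (start_us, end_us) pairs into power-of-two latency buckets."""
    hist = Counter()
    for start_us, end_us in events:
        hist[bucket(end_us - start_us)] += 1
    return dict(sorted(hist.items()))


# Hypothetical disk I/O events: (issue time, completion time) in microseconds.
sample_events = [(0, 90), (10, 140), (20, 150), (30, 1030), (40, 45)]
histogram = latency_histogram(sample_events)

for lower, count in histogram.items():
    print(f"{lower:>6} us | {'#' * count} {count}")
```

The cost being worried about is that every event has to cross into user space before it is counted, whereas an in-kernel aggregation only ships the finished buckets.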

I tested this quickly, expecting DTrace's approach (which is the same as
SystemTap and ktap) to blow sysdig out of the water. But the results were
surprising (take these quick tests with a grain of salt). Here's my target
command, along with sysdig and DTrace enablings, and strace for comparison:

    
    
      Target: dd if=/dev/zero of=/dev/null bs=1k count=1000k
      sysdig: sysdig -c topfiles_bytes
      DTrace: dtrace -n 'syscall:::entry /execname == "dd"/ { @[probefunc] = count(); }'
      strace: strace -c dd ...
    

sysdig slowed the target by about 4x. DTrace, between 2.5 and 2.7x. strace
(for comparison), over 200x. This is a worst-case test, and if I'm willing to
slow a target by 2x then taking that to 4x doesn't make much difference. With
what I normally trace, the overheads are 1/100th of that, so DTrace is
negligible. The take-away here is that the overheads are closer to the
"negligible" end of the spectrum than strace's "violent" end. Which I found
surprising for user-level aggregation.

The Sysdig Examples could do with some sanity checking. Eg:

"See the top processes in terms of disk bandwidth usage: sysdig -c
topprocs_file"

I saw:

    
    
      Bytes     Process
      ------------------------------
      134.65M   dd
      4.82KB    snmp-pass
      603B      snmpd
      332B      sshd
      220B      bash
      107B      sysdig
    

That's while my dd between /dev/zero and /dev/null was running. No "disk
bandwidth"! :)

edit: formatting

~~~
degio
Brendan, thanks for the feedback. It's really cool to hear comments like this
from someone like you. We really respect your work in the field.

Good catch on topprocs_file, we'll have to find a better name for it.

In terms of overhead, we put a lot of effort into it and, as you pointed out,
we're already extremely optimized. But we think we can do even better. For
example, we don't have any kind of kernel-level filtering yet. Coming soon! :)

~~~
SEJeff
Any chance you are working on getting this upstream? I noticed Greg KH as one
of the contributors

~~~
degio
I guess it's early to tell, but if the kernel folks don't object we would be
happy to work on including our driver in the kernel.

------
otterley
I had the privilege of early access to sysdig thanks to the developers. It's
not as powerful as SystemTap or DTrace but it is very useful and easy to use.
Think of it as strace(8) with global dump capability (not just per-process),
more powerful filters, replayable logging à la tcpdump(8), and Lua plugin
support.

Plus the packaging is top-notch; its kernel modules are rebuilt automatically
on kernel upgrade via DKMS (which I wish other vendors like FusionIO would
do).

------
peterwwillis
I like that you link to the github, where the README is a link to your more-
slick website, which has nothing but a couple of examples and an install page,
all of which is really linkbait for your company Draios. It almost seemed like
you were just sharing a useful tool. The tool might be really useful, but at
this point I'm still clicking through links trying to figure out what it does
and how.

edit: Nevermind, I found it. It's a kernel module and user app that uses Lua
scripts for interpreting data. Sorry about my harsh tone before, but jesus I
hate it when there's more gloss than content.

~~~
degio
Thanks.

To answer the question "what it does and how", sysdig captures system calls
and other system-level events using a Linux kernel facility called
tracepoints, which means much less overhead than strace.

It then "packetizes" this information, so that you can save it into trace
files and filter it, a bit like you would do with tcpdump. This makes it very
flexible to explore what processes are doing.

We also pack it with a set of scripts that make it easier to extract useful
information and do troubleshooting.

~~~
peterwwillis
See, _that_ is a really good description that would be useful in a README.
Right away I know what it is, what it does and whether I should use it.

~~~
degio
As you suggested, we've updated the README with the content above.

------
zokier
I feel like some introductory article about the different instrumentation
facilities available for Linux systems would be welcome. Just checking
Wikipedia and Google, I found the following items: SystemTap, Dprobes, LTTng,
DTrace, strace, ltrace (and latrace), ktap, utrace, ftrace, kprobes, jprobes.
And now we have sysdig too.

~~~
zokier
Replying to myself; found this
[http://www.brendangregg.com/linuxperf.html](http://www.brendangregg.com/linuxperf.html)
page which does at least some sort of summary of the tools

------
shubb
Looks very useful. Some things you can do with it:

Dump system activity to file, so that sysdig can be used to process it later.

* sysdig -w trace.scap

Print process name and connection details for each incoming connection not
served by apache.

* sysdig -p "%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"

See the files where apache spends the most time doing I/O.

* sysdig -c topfiles_time proc.name=httpd

Show the network data that apache exchanged with 192.168.0.1.

* sysdig -A -c echo_fds fd.sip=192.168.0.1 and proc.name=httpd

Show every time a file is opened under /etc.

* sysdig evt.type=open and fd.name contains /etc

~~~
degio
Thanks! A full list of examples can be found here:
[https://github.com/draios/sysdig/wiki/Sysdig%20Examples](https://github.com/draios/sysdig/wiki/Sysdig%20Examples)

------
joshbaptiste
I would like to know what's going on at a lower level. Ktap gives a good
breakdown of how it differs from SystemTap: dynamically typed, byte-code
design, etc.

[http://www.ktap.org/doc/tutorial.html#faq](http://www.ktap.org/doc/tutorial.html#faq)

Is Sysdig's design similar?

~~~
degio
The design is actually quite different.

From the architectural point of view, sysdig is closer to tcpdump/wireshark
than to systemtap/ktap.

systemtap/ktap work similarly to dtrace:

- a script is loaded into a user-level process
- the process compiles the script and dispatches it to a kernel module
- the kernel module hooks the script into specific places in the kernel
- the kernel module sends the results back to userspace where the user can
  see them

sysdig works this way:

- the kernel module hooks into specific places in the kernel (using
  tracepoints), captures everything, and puts it into a shared memory buffer
- the buffer is accessed from the user-level sysdig process that reconstructs
  state (so it knows that fd 23 means /etc/passwd)
- filtering is applied
- scripting in Lua is applied
- the whole thing is optionally saved to disk so you can analyze later
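
The state-reconstruction and filtering steps described above can be sketched roughly as follows. This is an illustration only, not sysdig's implementation: the event dictionaries, `reconstruct`, and `under_etc` are hypothetical, and the real tool tracks far more state than a single fd table.

```python
def reconstruct(events):
    """Yield events enriched with file names, tracking state from open/close."""
    fd_table = {}  # (pid, fd) -> file name, built purely from the event stream
    for ev in events:
        key = (ev["pid"], ev["fd"])
        if ev["type"] == "open":
            fd_table[key] = ev["name"]
        # Attach the name the descriptor currently maps to, if we know it.
        enriched = dict(ev, name=fd_table.get(key, "<unknown>"))
        if ev["type"] == "close":
            fd_table.pop(key, None)
        yield enriched


def under_etc(ev):
    """Hypothetical filter, loosely modeled on 'fd.name contains /etc'."""
    return ev["name"].startswith("/etc")


# Hypothetical raw events: only the open carries a file name.
raw_events = [
    {"type": "open", "pid": 100, "fd": 23, "name": "/etc/passwd"},
    {"type": "read", "pid": 100, "fd": 23},
    {"type": "read", "pid": 100, "fd": 7},
    {"type": "close", "pid": 100, "fd": 23},
    {"type": "read", "pid": 100, "fd": 23},
]

matches = [ev for ev in reconstruct(raw_events) if under_etc(ev)]
for ev in matches:
    print(ev["type"], ev["fd"], ev["name"])
```

Because the filter runs after enrichment, it can match on a file name even for events, such as the read, that only carry a numeric descriptor.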

Both approaches have pros and cons. We think that the sysdig approach creates
a more natural workflow, ideal for troubleshooting and system administration
tasks. Plus, writing scripts in Lua, with access to its rich libraries, is
quite fun. :)

I want to give more details in a future blog post, so stay tuned.

~~~
annulen
"Reconstruction" step looks like a source of unjustified inefficiency: why
reconstruct state by traversing proc and doing lots of system calls instead of
capturing all data in the first place where it's much cheaper to do?

~~~
degio
Of course, we create and update the state by inspecting the incoming stream of
system calls. We traverse proc only once, when you start a capture, and the
reason to do that is collecting info for the PIDs/FDs that existed _before_ we
start the system call collection. That way, you can for example create a
filter on the IP address of a socket even if that socket was created before
sysdig started.
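
A one-time snapshot of pre-existing descriptors, as described above, might look something like the sketch below. It assumes the standard Linux layout where `/proc/<pid>/fd/<n>` is a symlink to the open file; `snapshot_fds` is a hypothetical helper, not sysdig code, and the demo runs against a fake tree in a temporary directory so that it behaves the same on any machine.

```python
import os
import tempfile


def snapshot_fds(proc_root="/proc"):
    """Return {(pid, fd): target} for descriptors already open under proc_root."""
    table = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack the privileges to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
                table[(int(entry), int(fd))] = target
            except (OSError, ValueError):
                continue
    return table


# Build a fake /proc-like tree so the sketch does not depend on the host.
fake_proc = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_proc, "123", "fd"))
os.symlink("/etc/passwd", os.path.join(fake_proc, "123", "fd", "23"))
os.makedirs(os.path.join(fake_proc, "self"))  # non-numeric entry, ignored

initial_state = snapshot_fds(fake_proc)
print(initial_state)
```

On a real Linux system, calling `snapshot_fds()` with the default root as an unprivileged user will typically see only that user's own processes, since other processes' `fd` directories are not readable.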

------
zobzu
"The definitive tool" they name it, yet it's not as powerful as dtrace. So, it's
not definitive.

Looks nice otherwise. Too bad it needs a kernel module.

~~~
prakashsurya
Is dtrace available on Linux? I know there's been work towards that goal, but
I haven't paid much attention to it recently.

~~~
gregkh
It's "out of tree" due to licensing issues (i.e. Oracle is not releasing it
under a GPLv2 compatible license for various reasons...)

------
yxhuvud
Ah, the good ol' pipe through sudo bash installation instructions. I wish
there was a more structured platform independent way of distributing stuff
before the stuff is packaged by distros.

~~~
degio
We decided to offer bash-piping to make it simpler, but it's actually nothing
more than a clean deb/rpm package.

And you have the option to install it manually if you want
[https://github.com/draios/sysdig/wiki/How%20to%20Install%20S...](https://github.com/draios/sysdig/wiki/How%20to%20Install%20Sysdig%20for%20Linux#automatic-installation).

~~~
e12e
Are you maintaining the debian build-scripts in some other repo? I was hoping
for a /debian directory and being able to simply dpkg-buildpackage from the
git repo, but I can't find any way to build debs (or rpms)?

Does appear to build fine with cmake/make, though.

~~~
gighi
We use CPack for the moment, so you can just run "make package" inside the
CMake build directory and it will generate RPM/DEB.

------
simonebrunozzi
Wow, this is really great. From the creator of Wireshark, no less :)

~~~
dfc
Wrong.

Gerald Combs is the creator of Ethereal/Wireshark:
[https://www.wireshark.org/about.html](https://www.wireshark.org/about.html)

~~~
nodata
> Wrong.

Less Dwight please.

~~~
dfc
???

------
krakensden
Given that it involves a kernel module, I was kind of skeptical, but Greg KH
seems to have looked it over and fixed it up, which I'd call a compelling seal
of approval:

[https://github.com/draios/sysdig/commits/master/driver](https://github.com/draios/sysdig/commits/master/driver)

------
perryh2
This tool is very similar to what I had created last summer as an intern
(strace/lsof analysis), but it seems to be a lot more rich in features. I
analyzed system calls as well as application tracing (New Relic) to find/fix
performance bottlenecks.

~~~
mh-
is the strace analysis stuff open source? have been messing around with
creating something reusable in my spare time, based on hacky methods I've long
been using. would be interested to see.

------
mesuutt
I am getting an error while compiling on Arch Linux:

[https://github.com/draios/sysdig/issues/39](https://github.com/draios/sysdig/issues/39)

Has anyone encountered this error before? Any help would be appreciated.

------
neuronsourcing
After installing sysdig, when I try to run it I get the following error:

# sysdig fd.type=ipv4

error creating the process list

Has anyone seen this one before? Any help would be appreciated.

~~~
degio
It looks like you don't have enough privileges to read /proc. Are you using
this as root?

------
digitalyatri
Some observations

sudo sysdig -w file1.log

file1.log contains lots of junk characters (fix this)
^@^@^@^@^@^@^@^@^@^@^@^@^

Better alternative

sudo sysdig > file2.log

file has proper logs

~~~
gighi
That's the wrong way to use it.

The "sysdig -w" switch generates a binary dump (in a pcap format) containing
the "raw events" coming from the kernel (plus a snapshot of information
gathered from /proc), so it's not supposed to be human-readable. You have to
use "sysdig -r" on the dump file to get the output.

If you're used to tcpdump, it's the same thing.
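
As a rough illustration of why a binary dump looks like junk in a text viewer and needs a reader to decode it, here is a toy write/replay round trip. The record layout is invented for this sketch and is not sysdig's actual trace format; `RECORD`, `write_events`, and `read_events` are hypothetical names.

```python
import io
import struct

# Invented layout: timestamp (u64), pid (u32), length of the event-type string (u16).
RECORD = struct.Struct("<QIH")


def write_events(stream, events):
    """Serialize (timestamp, pid, event_type) tuples as packed binary records."""
    for ts, pid, evt_type in events:
        name = evt_type.encode()
        stream.write(RECORD.pack(ts, pid, len(name)) + name)


def read_events(stream):
    """Decode records written by write_events until the stream is exhausted."""
    while True:
        header = stream.read(RECORD.size)
        if len(header) < RECORD.size:
            return
        ts, pid, length = RECORD.unpack(header)
        yield ts, pid, stream.read(length).decode()


captured = [(1000, 42, "open"), (2000, 42, "read")]

dump = io.BytesIO()          # stands in for the file given to a -w style switch
write_events(dump, captured)
raw = dump.getvalue()

dump.seek(0)                 # stands in for reading the file back, -r style
replayed = list(read_events(dump))

print(raw)       # packed integers, full of NUL bytes
print(replayed)  # the original events, readable again
```

The NUL bytes in the packed integers are the kind of thing an editor renders as `^@`.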

~~~
digitalyatri
My bad, works well with -r

------
pinturic
It is amazing how easy it seems to collect such information with this tool

------
wesleyac
Just looked at the website, and had a very "small world" feeling:

They're located in my town O.o

