

Google releases Skipfish, an open-source web security scanner - tptacek
http://code.google.com/p/skipfish/

======
tptacek
It's Michal Zalewski's code (he's known to his friends, and to those like me
who fear him, as "lcamtuf").

It's a command-line Unix tool. It builds with almost no exotic deps other than
GNU IDN.

It's a pure-async wordlist-driven crawler/fuzzer. It is screaming fast on the
network I'm testing it on. Because it's async, it's not bottlenecked on
demand-threading for each request.
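To make "wordlist-driven" concrete, here is a minimal sketch. It assumes the scanner builds candidate URLs by trying each dictionary word on its own and with each of a set of file extensions, and that it queues those candidates instead of fetching them one at a time. `expand_wordlist()`, `URL_MAX` and the fixed-size queue are hypothetical illustrations, not Skipfish's dictionary format or internals.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define URL_MAX 256   /* hypothetical per-URL buffer size */

/* Hypothetical sketch, not Skipfish code: expand a wordlist into candidate
   URLs such as "admin", "admin.php", "admin.bak". Candidates are written
   into out[] (at most max_out entries), which stands in for the request
   queue an async I/O loop would drain. Returns the number written. */
size_t expand_wordlist(const char *base,
                       const char *const words[], size_t nwords,
                       const char *const exts[], size_t nexts,
                       char out[][URL_MAX], size_t max_out) {
    size_t count = 0;
    for (size_t w = 0; w < nwords; w++) {
        for (size_t e = 0; e <= nexts; e++) {  /* e == 0: bare word, no extension */
            if (count == max_out)
                return count;                  /* queue is full */
            int len = (e == 0)
                ? snprintf(out[count], URL_MAX, "%s%s", base, words[w])
                : snprintf(out[count], URL_MAX, "%s%s.%s", base, words[w], exts[e - 1]);
            if (len < 0 || len >= URL_MAX)
                continue;                      /* would be truncated: drop it */
            count++;
        }
    }
    return count;
}
```

Because generation only fills a queue, the I/O loop can keep many of these requests in flight at once rather than dedicating a thread to each one.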

It generates pretty HTML reports. Well, pretty for a C program.

It's Apache licensed. There are things in here I'd probably steal, like the
URL parser; it's much tighter than mine.

How useful is this going to be for YC-style apps? Meh? You should definitely
run it on your QA instance. Make sure you give it a login cookie to run with.
It will find things. But where it looks like Skipfish is _really_ going to do
some damage is on the enterprisey J2EE- and .NET-stack apps.

~~~
axod
Why do you say that about J2EE and .NET-stack apps?

I would have guessed most vulnerabilities are in PHP code.

~~~
tptacek
Uh, very no.

~~~
axod
Back it up with some data?

I'd say WordPress vulnerabilities alone account for quite a bit.

PHP has such a low barrier to entry, plus there's just so much PHP code about.

~~~
tptacek
For now, you should just take my word for this.

PHP is the least of WordPress' problems. Or. Well. Not the worst of WordPress'
problems.

------
ErrantX
_actual results are stored as a hierarchy of JSON files, suitable for machine
processing if needs be_

That's a quality touch. I can't count the number of times I've had to write
"quick" tools to parse results into a form we prefer/need :P

------
rabidgnat
Wow, this actually found a moderate security vulnerability in my website!

Fetching <http://mysite.com/static/> returned the plaintext template of my
index.html file, which I must have accidentally copied there during some
manual hackery in a broken push (none of my scripts normally copy it).

However, I almost missed the warning: Skipfish complained because the page
lacked a content type, and that warning was buried among several similar ones.
I'd like it to recognize potentially templated files, which are a much more
serious vulnerability than a missing 'text/plain' content type. Years of
staring at unimportant compiler warnings might cause people to miss gems like
this.
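The check rabidgnat asks for could start as a cheap heuristic. A minimal sketch, assuming a response is "probably an unrendered template" when its body still contains common template delimiters; `looks_like_template()` and the marker list are hypothetical and not part of Skipfish.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical heuristic, not Skipfish code: raw template delimiters in a
   served file suggest the server sent template source instead of
   rendered output. The list is illustrative, not exhaustive. */
static const char *const template_markers[] = { "{{", "{%", "<%", "<?php" };

int looks_like_template(const char *body) {
    for (size_t i = 0; i < sizeof template_markers / sizeof template_markers[0]; i++) {
        if (strstr(body, template_markers[i]) != NULL)
            return 1;   /* found an unrendered template delimiter */
    }
    return 0;
}
```

Client-side frameworks legitimately ship `{{ }}` in HTML, so a real check would need to weigh false positives before ranking this above the missing-content-type warning.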

------
dschobel
And it even builds on Cygwin out of the box. All C with minimal dependencies:
is that the next big fad? (I sure hope so.)

~~~
tptacek
Redis was the same way. I very much like this style of packaging.

~~~
leftnode
For those of us who kind of know C but want to learn it in depth, I like it as
well. It makes compiling open-source programs really easy.

I took a cursory glance at the code and it seems good, but is it good code to
learn from?

~~~
tptacek
Yes. Start with "analysis.c" --- the code in there works with completed
request/response pairs. Look for "scrape_response()". Note how the strcspn()
calls and pointer math approximate simple, unrolled regular expressions.
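A minimal sketch of that style, not code taken from `scrape_response()`: the hypothetical `extract_href()` pulls the first `href` value out of an HTML fragment with `strstr()`, `strspn()`, `strcspn()` and pointer arithmetic, roughly the job a small regular expression would do. It assumes lowercase attribute names and does no entity decoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not Skipfish code: copy the value of the first
   href= attribute in html into out (at most outlen-1 bytes).
   Returns 1 if one was found, 0 otherwise. */
int extract_href(const char *html, char *out, size_t outlen) {
    if (outlen == 0)
        return 0;
    for (const char *p = strstr(html, "href"); p != NULL; p = strstr(p + 1, "href")) {
        const char *v = p + 4;              /* skip "href" */
        v += strspn(v, " \t\r\n");          /* optional whitespace */
        if (*v != '=')
            continue;                       /* e.g. "hreflang": keep looking */
        v++;
        v += strspn(v, " \t\r\n");
        size_t len;
        if (*v == '"' || *v == '\'') {
            const char stop[2] = { *v, '\0' };
            v++;
            len = strcspn(v, stop);         /* value runs to the closing quote */
        } else {
            len = strcspn(v, " \t\r\n>");   /* unquoted value ends at space or '>' */
        }
        if (len >= outlen)
            len = outlen - 1;               /* truncate to fit the buffer */
        memcpy(out, v, len);
        out[len] = '\0';
        return 1;
    }
    return 0;
}
```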

The database code (the "pivot tree") is a bit tangley as a data structure.

http_client.c has really tight URL parsing code.
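A minimal sketch in the same spirit, not the parser from http_client.c: the hypothetical `split_url()` breaks an absolute URL into scheme, host, port and path using `strstr()`, `strcspn()` and `strtol()`. It assumes a lowercase scheme and ignores userinfo, IPv6 literals and percent-decoding, all of which a real parser has to handle.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not Skipfish code. */
struct url_parts {
    char scheme[8];
    char host[256];
    int  port;
    char path[1024];   /* rest of the URL (path and query), "/" if empty */
};

/* Copy n bytes of src into dst (capacity cap); 0 if it won't fit. */
static int copy_part(char *dst, size_t cap, const char *src, size_t n) {
    if (n >= cap)
        return 0;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 1;
}

/* Returns 1 on success, 0 if the URL is malformed or a part is too long. */
int split_url(const char *url, struct url_parts *u) {
    const char *p = strstr(url, "://");
    if (p == NULL || !copy_part(u->scheme, sizeof u->scheme, url, (size_t)(p - url)))
        return 0;
    p += 3;
    size_t hlen = strcspn(p, ":/?#");   /* host ends at port, path, query or fragment */
    if (hlen == 0 || !copy_part(u->host, sizeof u->host, p, hlen))
        return 0;
    p += hlen;
    u->port = strcmp(u->scheme, "https") == 0 ? 443 : 80;
    if (*p == ':') {
        char *end;
        long port = strtol(p + 1, &end, 10);
        if (end == p + 1 || port <= 0 || port > 65535)
            return 0;
        if (*end != '\0' && strchr("/?#", *end) == NULL)
            return 0;                   /* junk after the port number */
        u->port = (int)port;
        p = end;
    }
    size_t plen = strcspn(p, "#");      /* drop any fragment */
    if (plen == 0)
        return copy_part(u->path, sizeof u->path, "/", 1);
    return copy_part(u->path, sizeof u->path, p, plen);
}
```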

This isn't written in a very modern style (I like that about it, though). For
instance, look at "check_for_stuff()", which implements basic content
sniffing. In a modern C program, you'd probably see an array of structs
containing function pointers and names, each pointing to a tiny function
looking for a different bit of content. Here it's one big unrolled function.
Likewise, modern C code would probably just regex the HTML responses instead
of hand-coding HTML parsing. But on the other hand this actually exists and
works, and that's a good goal to have too.
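For contrast, a sketch of the table-driven style described above: each check is a tiny function, and an array of structs pairs it with a name, so adding a check means adding one row. The checks and their match strings are illustrative assumptions, not Skipfish's actual content checks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not Skipfish's check_for_stuff(). */
typedef int (*sniff_fn)(const char *body);

static int has_password_field(const char *body) {
    return strstr(body, "type=\"password\"") != NULL;
}

static int has_sql_error(const char *body) {
    return strstr(body, "You have an error in your SQL syntax") != NULL;  /* MySQL */
}

static int has_dir_listing(const char *body) {
    return strstr(body, "<title>Index of /") != NULL;
}

struct sniff_check {
    const char *name;
    sniff_fn    fn;
};

static const struct sniff_check checks[] = {
    { "password form",     has_password_field },
    { "SQL error message", has_sql_error },
    { "directory listing", has_dir_listing },
};

/* Return the name of the first matching check, or NULL if none match. */
const char *sniff_content(const char *body) {
    for (size_t i = 0; i < sizeof checks / sizeof checks[0]; i++) {
        if (checks[i].fn(body))
            return checks[i].name;
    }
    return NULL;
}
```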

The I/O loop in Skipfish is definitely the right way to do network programming
in C. This program is simpler and faster because it doesn't waste time with
threads. But if you do something similar, use libevent.
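A minimal sketch of a single-threaded I/O loop using POSIX `poll()`, assuming the descriptors are already connected (pipes can stand in for sockets in a quick test). It is not Skipfish's loop; `run_io_loop()` and `MAX_CONNS` are hypothetical. libevent, as suggested above, wraps the same pattern behind callbacks and selects a suitable backend (epoll, kqueue, poll and so on) for the platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_CONNS 64   /* hypothetical cap on descriptors watched at once */

/* Hypothetical sketch: one poll() call waits on every descriptor and the
   loop services whichever are ready, with no thread per connection. Each
   descriptor is read to EOF and its byte count stored in counts[]; a
   scanner would parse responses and queue follow-up requests there.
   Returns 0 on success, -1 on error. */
int run_io_loop(const int fds[], size_t n, size_t counts[]) {
    struct pollfd pfds[MAX_CONNS];
    size_t open_fds = n;

    if (n > MAX_CONNS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
        counts[i] = 0;
    }

    while (open_fds > 0) {
        /* Block until something is ready. Real code would use a timeout to
           expire stalled connections and retry on EINTR. */
        if (poll(pfds, (nfds_t)n, -1) < 0)
            return -1;
        for (size_t i = 0; i < n; i++) {
            if (pfds[i].fd < 0 || pfds[i].revents == 0)
                continue;
            char buf[4096];
            ssize_t r = read(pfds[i].fd, buf, sizeof buf);
            if (r > 0) {
                counts[i] += (size_t)r;   /* data arrived: process it here */
            } else {
                pfds[i].fd = -1;          /* EOF or error: poll() ignores it now */
                open_fds--;               /* (the caller still owns and closes it) */
            }
        }
    }
    return 0;
}
```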

------
tocomment
I'm not really understanding this tool. What does it find exactly?

~~~
tptacek
You aim it at a URL.

It asynchronously launches thousands of requests based on a very large
wordlist.

It scrapes the responses and spiders them.

As it identifies actual pages, it fuzzes them with strings that tickle web app
flaws. It analyzes the responses. For instance, it tries to inject
"skipfish://whatever" URLs into fields, parameters, and links; then it looks
to see whether those URLs appear in "hot" places in the response, like headers
or link tags.
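A minimal sketch of that "hot places" check, assuming the injected marker is a string like `skipfish://probe` and that "hot" means the start of a `Location:` header value or an `href`/`src` attribute value. `marker_in_hot_spot()` is hypothetical, not Skipfish's analysis code, and it matches case-sensitively for brevity even though header names are case-insensitive.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not Skipfish code: 1 if marker begins a value that
   follows prefix anywhere in text (after spaces or an opening quote). */
static int marker_after(const char *text, const char *prefix, const char *marker) {
    size_t plen = strlen(prefix);
    for (const char *p = strstr(text, prefix); p != NULL; p = strstr(p + 1, prefix)) {
        const char *v = p + plen;
        v += strspn(v, " \"'");          /* skip spaces and an opening quote */
        if (strncmp(v, marker, strlen(marker)) == 0)
            return 1;
    }
    return 0;
}

/* A marker merely echoed in page text is not flagged; one that lands where
   a browser would follow or load it is. */
int marker_in_hot_spot(const char *headers, const char *body, const char *marker) {
    return marker_after(headers, "Location:", marker) ||
           marker_after(body, "href=", marker) ||
           marker_after(body, "src=", marker);
}
```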

Primarily, it's looking for:

* Cross-site scripting and content injection

* Best practices problems (like failing to declare charsets properly)

* Forms without XSRF tokens

* SQL injection

It's better than anything that I've written, but I'll hazard a guess that how
well it finds each of these follows that order. It's bound to get better over
time.
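For the XSRF item in the list above, one naive heuristic is to look for a hidden form field whose markup mentions a token. The sketch below assumes that naming convention; `form_has_xsrf_token()` and its hint strings are hypothetical, and Skipfish's actual check may work quite differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical heuristic, not Skipfish code: given the HTML of one form,
   return 1 if some hidden <input> mentions "csrf", "xsrf" or "token".
   Case-sensitive and naive on purpose. */
int form_has_xsrf_token(const char *form_html) {
    static const char *const hints[] = { "csrf", "xsrf", "token" };
    char tag[512];
    for (const char *p = strstr(form_html, "<input"); p != NULL;
         p = strstr(p + 1, "<input")) {
        size_t len = strcspn(p, ">");    /* extent of this <input ...> tag */
        if (len >= sizeof tag)
            len = sizeof tag - 1;
        memcpy(tag, p, len);
        tag[len] = '\0';
        if (strstr(tag, "hidden") == NULL)
            continue;                    /* only hidden fields count */
        for (size_t i = 0; i < sizeof hints / sizeof hints[0]; i++) {
            if (strstr(tag, hints[i]) != NULL)
                return 1;
        }
    }
    return 0;
}
```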

------
aschobel
Simple to build, just make sure you grab libidn:

<http://ftp.gnu.org/gnu/libidn/libidn-1.18.tar.gz>

It also defaults to white text, so set your terminal background to black.

------
arohner
And coincidentally, Leiningen is having problems downloading an unrelated
project from Google Code. Surely HN didn't overload Google Code?

~~~
durin42
No, we had some unexpected infrastructure problems today.

------
phatbyte
For windows users:

I've installed it under Cygwin with no problems. Make sure you have all the
"dev" packages installed, then type "make" and you're ready to go.

------
alnayyir
I'm having trouble compiling it, and libidn is installed.

See:

<http://gist.github.com/338360>

for make error output.

It appears some files are missing; I tried redownloading, but no dice.

Has anyone else encountered this?

~~~
tptacek
You're building it on a system without zlib or OpenSSL. Install those too.

~~~
alnayyir
Both were installed; it looks like I needed the dev packages.

