

Web Server Log Forensics App Wanted - bensummers
http://ha.ckers.org/blog/20100613/web-server-log-forensics-app-wanted/

======
wglb
Ages ago, as a contractor for a BFE, in the early days of the internet there,
I wrote something like this for a very specific purpose. We were tasked with
identifying "non business-appropriate activity". Make of that what you will.
So I wrote a C++ program to pull the logs apart and generate a static web page
with findings.

First, I looked for triple X anywhere in the URL. Then, from the workstations
that referred to those, I found other interesting words in the URLs (a pivot
of sorts there). By the end of the week, I could not read the word list over
the phone without running foul of the indecency laws.
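For illustration only, the pivot described above could be sketched roughly like this in Python. This is not the original C++ program; the record layout, workstation names, and URLs are made-up assumptions.

```python
import re
from collections import Counter

# Hypothetical log records: (client workstation, requested URL).
records = [
    ("ws-01", "http://example.com/xxx/gallery"),
    ("ws-01", "http://example.net/casino/poker"),
    ("ws-02", "http://example.org/news/today"),
    ("ws-01", "http://example.net/casino/slots"),
    ("ws-03", "http://example.com/xxx/video"),
    ("ws-03", "http://example.org/news/today"),
]

def flagged_clients(records, seed="xxx"):
    """Return the clients that requested any URL containing the seed term."""
    return {client for client, url in records if seed in url.lower()}

def pivot_words(records, clients, seed="xxx"):
    """Count words in the URL paths requested by flagged clients."""
    counts = Counter()
    for client, url in records:
        if client not in clients:
            continue
        # Drop the scheme and host so only the path contributes words.
        parts = url.split("://", 1)[-1].split("/", 1)
        path = parts[1] if len(parts) > 1 else ""
        for word in re.findall(r"[a-z]+", path.lower()):
            if word != seed:
                counts[word] += 1
    return counts

flagged = flagged_clients(records)
candidates = pivot_words(records, flagged)
```

The counts are only candidates: a human still has to decide which words go on the list for the next pass, since ordinary words from the same workstations show up too.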

It was an effective effort, and ultimately they installed some blocking
software whose site blocks were updated frequently.

Further, there seemed to be a lot of tools out there, but they tilted toward
the analytical side of things with pretty graphs.

That work was property of where I was engaged, but there was nothing
particularly difficult about that.

Were I to do this today, I would likely use Lisp and if the set was truly
huge, consider getting the Franz triples software to help manage access.

This software would take into account time drift, either by synchronizing on a
known event, or by manual input for each file or server.
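A minimal sketch of that drift correction, in Python rather than Lisp. The server names, timestamps, and tuple layout are hypothetical; the offset comes either from an event both clocks saw or from a value typed in by hand.

```python
from datetime import datetime, timedelta

def offset_from_known_event(observed, reference):
    """Offset to add to a server's timestamps so that a shared event
    lines up with the reference clock."""
    return reference - observed

def normalize(entries, offsets):
    """Apply per-server offsets and sort entries on the corrected timeline.

    entries: iterable of (server, timestamp, line)
    offsets: dict of server -> timedelta; servers not listed are left as-is
    """
    corrected = [
        (ts + offsets.get(server, timedelta(0)), server, line)
        for server, ts, line in entries
    ]
    return sorted(corrected)

# web-a logged a shared event at 12:01:30 that really happened at 12:00:00.
offsets = {
    "web-a": offset_from_known_event(
        datetime(2010, 6, 13, 12, 1, 30), datetime(2010, 6, 13, 12, 0, 0)
    ),
    "web-b": timedelta(seconds=5),  # manual input for this server
}

entries = [
    ("web-a", datetime(2010, 6, 13, 12, 1, 40), "GET /login"),
    ("web-b", datetime(2010, 6, 13, 12, 0, 0), "GET /admin"),
]
timeline = normalize(entries, offsets)
```

With the offsets applied, the web-b request sorts ahead of the web-a one, which is the opposite of what the raw timestamps suggest.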

I don't see the parsing differences as much of a problem. They should be
relatively self-identifying.
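As a rough illustration of "self-identifying", a few sample lines are usually enough to guess the format. The sketch below only knows three formats, and the regular expressions are simplified assumptions rather than full grammars.

```python
import re

# Simplified patterns for Apache-style common and combined log lines.
COMMON = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+$')
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "[^"]*"$'
)

def identify_format(sample_lines):
    """Guess a log format from the first recognizable line."""
    for line in sample_lines:
        line = line.strip()
        if not line:
            continue
        # W3C extended logs announce their own columns in a directive.
        if line.startswith("#Fields:"):
            return "w3c-extended"
        if COMBINED.match(line):
            return "combined"
        if COMMON.match(line):
            return "common"
    return "unknown"
```

Anything that comes back as "unknown" would need a pattern added by hand.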

The fun part, as you note, is that you don't know quite what you are looking
for, so you need to be able to easily pivot on something that you don't know
until you see it, and to be able to spot patterns over long periods of time.

But this was ages ago, so I would be astonished if this doesn't exist in some
form already. Fun project.

As an aside, what was not funny was seeing what some of the URLs were--some
were bad enough to turn your stomach. That is without looking at the site.

The funny part was noticing repeat visits, kind of nullifying what I imagined
to be the "banana peel" defense.

------
keltex
I really like Microsoft's Log Parser. It does SQL style queries on log files.
Really quite powerful. It's also command-line based so it's easily scriptable:

[http://www.microsoft.com/downloads/details.aspx?FamilyID=890...](http://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en)
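To give a flavor of what SQL over log files looks like, here is a stand-in sketch using Python's built-in sqlite3. This is not Log Parser's own syntax or API; the table name, columns, and sample lines are assumptions made up for the example.

```python
import re
import sqlite3

# Simplified pattern for Apache-style common log lines.
CLF = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)'
)

lines = [
    '10.0.0.5 - - [13/Jun/2010:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.9 - - [13/Jun/2010:10:00:02 +0000] "GET /admin.php HTTP/1.1" 404 0',
    '10.0.0.9 - - [13/Jun/2010:10:00:03 +0000] "GET /setup.php HTTP/1.1" 404 0',
    '10.0.0.5 - - [13/Jun/2010:10:00:04 +0000] "GET /about.html HTTP/1.1" 200 734',
]

def load(lines):
    """Parse log lines into an in-memory SQLite table named hits."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE hits (ip TEXT, ts TEXT, method TEXT, url TEXT, status INTEGER)"
    )
    for line in lines:
        m = CLF.match(line)
        if m:  # silently skip lines that do not parse
            ip, ts, method, url, status, _size = m.groups()
            db.execute(
                "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
                (ip, ts, method, url, int(status)),
            )
    return db

db = load(lines)
# Which clients are generating the most 404s?
rows = db.execute(
    "SELECT ip, COUNT(*) AS n FROM hits "
    "WHERE status = 404 GROUP BY ip ORDER BY n DESC"
).fetchall()
```

Once the lines are in a table, each new question is just another query.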

------
lsb
Strings are an easy way out.

You don't actually want a log parser. You want to easily work with your logs
as a data structure, and your server writes its log as strings. I wish logs
were n-ary trees of objects. Imagine: we could replay logs dead easily, and we
could do analytics on our logs in a really straightforward manner.
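A rough sketch of the idea, assuming one JSON object per log entry with nested request and response fields. The field names and the handler are hypothetical; no real server is implied to log this way.

```python
import json
from collections import Counter

def log_event(sink, **fields):
    """Append one structured event to the log as a line of JSON."""
    sink.append(json.dumps(fields, sort_keys=True))

def read_events(sink):
    """Read the log back as nested Python objects, no parsing rules needed."""
    return [json.loads(line) for line in sink]

def replay(events, handler):
    """Feed each logged request back through a handler, in order."""
    return [handler(event["request"]) for event in events]

sink = []
log_event(sink, request={"method": "GET", "path": "/a"}, response={"status": 200})
log_event(sink, request={"method": "GET", "path": "/missing"}, response={"status": 404})

events = read_events(sink)

# Analytics becomes plain attribute access instead of string slicing.
status_counts = Counter(e["response"]["status"] for e in events)

# Replay against a stand-in handler and compare with what was logged.
replayed = replay(events, lambda req: 200 if req["path"] == "/a" else 404)
```

Comparing the replayed statuses against the logged ones is a cheap regression check that a string log would not give you without writing a parser first.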

------
astrofinch
Have you looked at this?

<http://www.loggly.com/>

~~~
kordless
Heh. Yeah, we will do most of that in a hosted model for people. We're not
launched yet though, but you can sign up for the beta here:
<http://logg.ly/signup/>

------
fossguy
Try OSSEC: <http://www.ossec.net> (open source log-based intrusion detection
tool).

------
nwatson
Try SenSage.

