

Ask HN: what software do you use for analyzing apache logs? - hashtable

What software do you use for analyzing apache logs? I would prefer open source ones that will work on Linux, although this is not a deal breaker.
======
davidw
I use 'visitors' ( <http://www.hping.org/visitors/> ), which was written by my
friend, YC.news visitor antirez.

I tried Google Analytics too, but found it kind of annoying in that it's sort
of a pain to get the information I want.

~~~
antirez
Thanks David. Note that in most Debian-based distributions all you need is
apt-get install visitors.

_Edit:_ Also, if you would like to play with real-time analysis to get
detailed information about how your users are interacting with your web site,
please feel free to write me at antirez -at- gmail ~dot~ com for a free
invitation to <http://lloogg.com>

~~~
apathy
cool, you wrote Visitors? thanks for saving me a lot of work!

I used it from 2004-2005 for clicktrails... great idea and implementation.
Graphviz makes everything better (corollary: anything that needs Graphviz is
interesting enough to get better)

Google Analytics and/or Visitors are about all I would think a person needs.
And now that you have lloogg.com out there, it looks like clicktrails are
covered in a GA-style interface. Kick ass! Great idea, _again_ , best of luck.

I haven't run into many websites that couldn't be improved by clickpath
analysis and refocusing navigation on the ways people actually use a site --
optimizing for the common case, in other words. It baffles me that any site
would fail to do so. Now they _really_ have no excuse.

~~~
antirez
Nice to hear that visitors was useful :) And sure, Graphviz is very cool; it
was a great help in our startup (the main product we developed is a
digg/reddit/...-alike system for the Italian market) for visualizing voting
patterns and improving our algorithm for detecting fraudulent voting
patterns.

LLOOGG is still pretty raw and we are developing the "filters" part of the UI,
but it already seems pretty useful for checking what's really happening on a
web site. We have a lot of success stories of people using LLOOGG for a few
weeks and then modifying the site structure to optimize the user experience.

Thanks for the comment.

------
pk
Started with Webalizer, switched to Google Analytics.

Webalizer's ok, but it's missing a bunch of features like sane user agent
string parsing (to give an overview of the browsers accessing your site). It
also displays most of the stats (such as country and user agent) by page hits
rather than by "visits" or unique IPs, which I think is a better way to group
them.

I've been pretty happy with Google Analytics so far - it has a ton of options
for sorting and grouping data (like viewing users' paths through the links on
the site) and good IP geolocation. Plus, the JavaScript tracker gives you
stats on visitors' screen resolution, which can be handy. On the downside, all
our data is belong to Google.

~~~
nickb
If you have $3K to spare, you can purchase Google Urchin 5/6 (Analytics was
based on it) and keep all the data. As a bonus, it also analyzes log files.

~~~
rms
Free trial for Urchin 6? <http://www.google.com/urchin/download.html>

------
sam
apachelog is a nice python module for parsing log lines from apache. it works
as a great base for doing your own analysis.
<http://code.google.com/p/apachelog/>

It's based on this Perl module:
<http://cpan.uwinnipeg.ca/~peterhi/Apache-LogRegex>

To manage log files, use cronolog

edit: I use cronolog to break up the logfiles daily and then I run a python
script (which uses apachelog to handle the nasty parsing) to create a summary
dictionary of parameters from that day. For example:

{'num_unique_ips': 140, 'num_pageviews': 532, ...}

I pickle that dictionary and save it as a file. So every day has a raw log
file and a "summary dictionary" file. To make plots I go through the summary
files and unpickle them to extract the quantities of interest.
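A minimal sketch of that daily-summary approach, assuming a common-log-format
access log; the regex here is a stand-in for the apachelog module (which
handles quoting and custom LogFormat strings far more robustly), and the
function names are illustrative, not from sam's actual script:

```python
import pickle
import re

# Crude common-log-format matcher: client IP, then the request path.
# apachelog would parse every field properly; this only grabs what the
# summary dictionary needs.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

def summarize(log_path):
    """Build one day's summary dictionary from a raw log file."""
    ips = set()
    pageviews = 0
    with open(log_path) as f:
        for line in f:
            m = LINE_RE.match(line)
            if not m:
                continue  # skip malformed lines
            ips.add(m.group(1))
            pageviews += 1
    return {'num_unique_ips': len(ips), 'num_pageviews': pageviews}

def save_summary(log_path, out_path):
    """Pickle the summary so plotting scripts can unpickle it later."""
    with open(out_path, 'wb') as f:
        pickle.dump(summarize(log_path), f)
```

A plotting script would then just loop over the pickled summary files,
unpickle each one, and pull out the quantities of interest per day.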

------
hoyhoy
AWStats with GeoIPFree is pretty good, but a major hassle to configure.

------
PStamatiou
#!/bin/bash

sudo awk '{print $11}' access_log \
    | grep -v 'yourowndomain.com' \
    | grep -v 'bloglines.com' \
    | grep -v '"-"' \
    | grep -v 'feedburner.com' \
    | sort | uniq -c | sort -rn | head -20
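The same top-referrers tally can be sketched in Python for anyone who prefers
it over a shell pipeline. This is an illustrative rewrite, not PStamatiou's
script; it assumes the combined log format, where the quoted referrer is the
11th whitespace-separated field (awk's $11), and the excluded domains are the
same placeholders used in the one-liner above:

```python
from collections import Counter

def top_referrers(log_path,
                  exclude=('yourowndomain.com', 'bloglines.com',
                           'feedburner.com'),
                  n=20):
    """Count external referrers in a combined-format access log."""
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 11:
                continue  # skip short/malformed lines
            ref = fields[10].strip('"')  # referrer is field 11 (awk's $11)
            if ref == '-' or any(d in ref for d in exclude):
                continue  # drop direct hits and your own domains
            counts[ref] += 1
    return counts.most_common(n)
```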

------
SwellJoe
We use Google Analytics, Webalizer, and AWStats, plus some custom Perl bits.
Clicky looks pretty swish, but I haven't taken the time to try it.

------
dangrossman
I'm using W3Counter (<http://www.w3counter.com>). It's like Google Analytics
minus the 1-24 hour delay, plus all the goodness that comes with realtime
reporting. But I keep GA installed as well for some of the more detailed
back-reporting it doesn't have.

------
ivank
Stone Steps Webalizer: <http://www.stonesteps.ca/projects/webalizer/>

along with cronolog and a bunch of custom Python scripts to autogenerate
webalizer configs.

------
slurpme
I use <http://polliwog.sourceforge.net> open source, runs on Java. Not
suitable for large websites since it provides a LOT of information about your
site.

------
dpapathanasiou
Webtrax (<http://www.multicians.org/thvv/webtrax-help.html>) is a good open-
source tool.

------
tom_rath
A Windows solution, but WebLog Expert works great for us:
<http://www.weblogexpert.com/>

------
mleonhard
<http://code.google.com/p/recordstream/>

------
ubudesign
I use analog.

~~~
xirium
I use a combination of analog and some custom scripts. analog is written by a
professional statistician and is a steadfast tool. The custom scripts look for
interesting searches. Classics include "What does the fur on a rat do?" and
"How do I connect an airbrush to a scuba-diving tank?"

------
uruzseven
Use Awk or Perl. They are already available on every Linux system, so the
approach is highly portable and very powerful.

------
agentbleu
I like bbclone a lot. It's not like the usual stats packages, but it gives me
uniquely useful information, and it's much better than many of the others
since it's so direct.

