

Large-scale Apache log file analysis - bdwalter

Does anyone know of any good commercial products that can analyze about 500 million to 1 billion rows (one week's worth) of Apache access logs?<p>We're looking for fairly simple analysis: something similar to what WebLog Expert does, but on a larger scale.<p>Are there any commercial tools out there that do this, or are we stuck writing something ourselves?<p>I know we could use Hive/Hadoop/Cassandra etc. and build something. We are about to roll out another product that uses Cassandra as a backend, but this is simply to get an idea of our basic web stats in a simple report. This is not part of our core product, so I am just looking to see if there is an easy way to do this without a bunch of development.
======
traviscline
I'm not aware of any commercial solutions, but a tool you might find useful is
Hive: <http://hadoop.apache.org/hive/>

These videos are a great introduction:
<http://www.cloudera.com/videos/introduction_to_hive>
<http://www.cloudera.com/videos/hive_tutorial>

------
Travis
Splunk is very popular, although I think it's software that you have to host
yourself. I have no idea how fast it is, but with a recent-ish machine I wonder
if you could have it process the logs in a weekly batch with a minimum of
frustration. They have a free trial that might be worth a look: www.splunk.com
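
For a sense of what a simple weekly batch pass involves if you do end up
writing something, here is a minimal sketch in Python. It assumes the logs are
in Apache's common or combined log format; the regex and the `summarize` name
are illustrative assumptions, not part of Splunk, Hive, or any other tool
mentioned in this thread.

```python
import re
from collections import Counter

# Assumed layout: the leading fields of Apache common/combined log format.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def summarize(lines):
    """Stream over log lines, counting hits per status code and per path.

    Hypothetical helper: returns (status counts, path counts, skipped lines).
    """
    statuses = Counter()
    paths = Counter()
    skipped = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            skipped += 1  # line does not fit the assumed format
            continue
        statuses[match.group("status")] += 1
        paths[match.group("path")] += 1
    return statuses, paths, skipped

# Small in-memory sample; for real data, pass an open file object instead.
sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 209',
    'not a log line',
]
statuses, paths, skipped = summarize(sample)
print(statuses.most_common(), paths.most_common(5), skipped)
```

Because it reads one line at a time, memory use depends on the number of
distinct paths rather than the number of rows. At a billion rows a week you
would probably still want to split the files and run several of these in
parallel, then merge the counters.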

