

Why web log analyzers are better than JavaScript based analytics - mindaugas
http://www.datalandsoftware.com/blog/2009/07/06/10-reasons-why-web-log-analyzers-are-better-than-javascript-based-analytics/

======
jacquesm
The reasons 1 by 1:

1) you don't need to edit HTML code to include scripts

The author asserts that you'd have to do this by hand if you had a lot of
static html. This is incorrect (you could easily insert the code using a
script), and the premise doesn't hold anyway: most larger sites (if not all
these days) are dynamically constructed, and adding a bit of .js is as easy as
changing a footer.
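For what it's worth, the 'script' for the lots-of-static-html case is only a
few lines. A rough sketch in Python (the snippet and the directory layout are
hypothetical, not anything the article prescribes):

```python
import os

# Hypothetical analytics tag -- substitute whatever your vendor gives you.
SNIPPET = '<script src="/js/analytics.js" async></script>'

def tag_static_pages(root):
    """Insert the tracking snippet before </body> in every .html file under root."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                html = f.read()
            if SNIPPET in html or "</body>" not in html:
                continue  # already tagged, or no obvious insertion point
            html = html.replace("</body>", SNIPPET + "\n</body>", 1)
            with open(path, "w", encoding="utf-8") as f:
                f.write(html)
```

Run it once over the document root and the "editing by hand" objection goes
away.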

2) scripts take additional time to load

This is true, but it only matters if you place your little bit of javascript
in the wrong place on the page (say in the header). When positioned correctly
(at the bottom of the body, or loaded asynchronously) it does not need to hold
up the rest of the page at all.

3) 'if website exists, log files exist too' (...)

This is really not always the case. Plenty of very high volume sites rely
almost entirely on 3rd party analysis simply because storing and processing
the logs becomes a major operation by itself.

4) 'server log files contain hits to all files, not just pages'

That's true, but for almost every practical purpose I can think of, that is a
very good reason to use a tag based analysis tool rather than to go through
your logs. The embedding argument the author makes is fairly easily taken care
of with some cookie magic and/or a referrer check.
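The referrer check is easy to sketch: when someone embeds or hotlinks your
files, the Referer header won't be your own domain, so those hits can be
excluded or counted separately. Something like this (the domain names are made
up for illustration):

```python
from urllib.parse import urlparse

# Hosts we consider "our own site" -- example values only.
OWN_HOSTS = {"example.com", "www.example.com"}

def is_embedded_elsewhere(referer):
    """True if the hit's Referer points at a page not hosted on our own site."""
    if not referer:
        return False  # no referer: direct hit, bookmark, or privacy setting
    host = urlparse(referer).hostname or ""
    return host not in OWN_HOSTS
```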

5) you can investigate and control bandwidth usage

Bot detection and blocking is a reason to spool your log files to a ramdisk
and analyze them in real time; doing it the next day is totally pointless.
Interactive log analysis (such as the product this company sells) can help
there, but a simple 50-line script will do the same thing just as well and can
run in the background instead of requiring 'interaction'.
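The core of such a script is just a per-IP hit counter over the log. A sketch
(common log format is assumed, where the first field is the client IP, and the
threshold is a made-up number you'd tune for your own traffic):

```python
import re
from collections import Counter

# First whitespace-delimited field of a common-log-format line is the client IP.
LINE_RE = re.compile(r'^(\S+) ')

def suspicious_ips(log_lines, threshold=1000):
    """Return IPs with at least `threshold` hits in the given lines, busiest first."""
    hits = Counter()
    for line in log_lines:
        m = LINE_RE.match(line)
        if m:
            hits[m.group(1)] += 1
    return [ip for ip, n in hits.most_common() if n >= threshold]

# e.g. feed suspicious_ips(open("access.log")) to your firewall rules
```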

6) see 5

7) log files record all traffic, even if javascript is disabled

Yes, but trust me on this one: almost everybody has javascript enabled these
days, because more and more of the web stops working if you don't. The biggest
source of missing traffic is not people who have javascript turned off, but
bots.

8) you can find out about hacker attacks

True, but your sysadmin probably has a whole bunch of tools looking at the
regular logs already to monitor this. Basically when all the 'regular' traffic
is discarded from your logs, the remainder is bots and bad guys. A real attack
(such as a DDoS) is actually going to work much better against you if you are
writing log files, because you'll be writing all that totally useless logging
information to disk. Also, in my book a 'hacker' is going to go after ports
other than port 80.

9) log files contain error information

This is very true, and should not be taken lightly: your server should log
errors, and you should poll those error logs periodically to make sure they're
blank (or nearly so), so that you find out quickly when there's a problem on
your site.
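Polling the error log is itself a tiny script. A minimal sketch (the log path
and the noise filter are assumptions for illustration; run it from cron and
alert when the returned list isn't empty):

```python
import os

ERROR_LOG = "/var/log/apache2/error.log"   # assumed path, adjust for your server
IGNORABLE = ("favicon.ico",)               # error lines you've decided are harmless

def new_errors(path=ERROR_LOG, offset=0):
    """Return (error lines appended since byte `offset`, new offset), minus known noise."""
    if not os.path.exists(path):
        return [], offset
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    lines = [line for line in chunk.decode("utf-8", "replace").splitlines()
             if not any(noise in line for noise in IGNORABLE)]
    return lines, offset + len(chunk)
```

Persist the returned offset between runs and you only ever see fresh errors.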

10) by using (a) log file analyzer, you don't give away your business data

Well, you're not exactly giving away your business data, but the point is well
taken. For most sites, however, the trade-off between fairly detailed
real-time site statistics for $0 and 'giving away business data' comes out
clearly in favor of giving away that data.

Google and plenty of others of course have their own agenda on what they do
with 'your' data, but as long as they don't get too evil with it it looks like
the number of sites that analyse via tags is going to continue to expand.

~~~
snprbob86
RE #10: It actually may be quite beneficial to give away your traffic data if
your site doesn't have a lot of inbound links. PageRank is only one small
component of your ultimate placement in Google results; high traffic sites are
obviously also ranked better.

~~~
jacquesm
Is that true? It sounds self-serving; after all, if high traffic sites are
ranked higher they become even higher traffic sites, and so on...

~~~
snprbob86
I don't have any inside knowledge, but if you were Google, wouldn't you at
least factor it into the equation?

We use Google Analytics. I noticed that Googlebot, but not other bots, has
increased crawl frequency steadily with our recent traffic spike, but the
number of visitors from search engines is still quite low at this point. This
correlation hints that Google may be using traffic data to prioritize crawl
rates and it would seem a logical extension to prioritize search results.

------
timmaah
Note that Data Land Software (host of blog) sells an "interactive web log
analyzer"

------
axod
JS based:

      * Can record java version, flash version, other plugin info
      * Can record screen size, browser window size, color space
      * Can detect and record ad block presence

etc

Both have their uses.

~~~
mooism2
Enough sites use Google Analytics wrongly (by including it at the start of the
page rather than at the end of the page) that I have used AdBlock+ to block
it.

Edit, since someone has downvoted this comment without offering a contrary
opinion:

My comment is an argument against trusting figures on ad-blocking obtained
using externally hosted javascript. If you host the analytics javascript
yourself, no-one will be motivated to block you. And the reliability of
statistics such as colour depth and browser window size is probably
unaffected.

~~~
coopr
It is not "wrong" to put Google Analytics at the start of the page - in fact,
Google _requires_ it for some GA features to work (like Event Tracking).

------
eli
Uh, the author _greatly_ underestimates the headache of filtering out bot
traffic. It's bad enough that some of the fancier comment spam bots load
javascript, but going through the server logs would be nuts. The "Contact Us"
form would show as the most popular page, since it's constantly being
assaulted by automated bot-net based attacks.
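If you do want to try scrubbing bots out of the logs, the crude version is a
user-agent filter over the combined log format. A sketch (the patterns are
illustrative only, and as noted above the fancier spam bots fake a browser
user agent, so this only catches the honest ones):

```python
import re

# Honest crawlers usually say so in their user agent.
BOT_UA = re.compile(r'bot|crawl|spider|slurp', re.IGNORECASE)
# In combined log format the user agent is the last quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def human_lines(log_lines):
    """Yield only the lines whose user agent doesn't look like a crawler."""
    for line in log_lines:
        m = UA_FIELD.search(line)
        if m and not BOT_UA.search(m.group(1)):
            yield line
```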

~~~
vradmilovic
True, bots can really be a PITA. Still, with log analysis you can remove (at
least some of) them, but with JS you can't add them in the first place. The
same applies to non-page files.

~~~
eli
Removing them from the logs sounds difficult. And adding them to JS-based
stats doesn't sound very useful. I don't particularly care whether Google
indexes my site at 10am or 11am so long as it gets indexed. And I certainly
don't care about the comment spam botnets.

~~~
vradmilovic
Of course the hour of a bot's visit is not important, but it can be important
to see whether it comes every day or not. Not to mention hits to images and
other files.

~~~
jonknee
Google's Webmaster Tools show how many pages Googlebot grabs daily and how
long requests take, so you can monitor that and rest easy at night.

------
pie
I don't think anyone suggests (for serious websites) that log files should be
abandoned or ignored. On the contrary, script based analytics - which do
indeed offer significant value that's ignored in this article - should be
considered supplemental or complementary to more traditional methods.

------
bjplink
6\. Bots (spiders) are excluded from JavaScript based analytics

To me that is actually a benefit of JS based analytics programs. When I check
Google Analytics in the morning I don't want to see how many search engine
bots and scrapers hit my site the previous day. I want to know how many actual
human beings used my site instead.

Also, and this is probably obvious since it's been pointed out that these
people have a vested interest in log parsers, this article would better be
titled as "10 Reasons Why Web Log Analyzers Should Be Used WITH JavaScript
Based Analytics." I would argue most people serious about tracking traffic use
both anyway but those that don't should see the benefits.

------
vradmilovic
I'm the author of this article - thank you all for commenting. I have no
intention of starting a flame war - both methods have pros and cons, but in
this "GA craziness" people tend to forget that log analysis even exists. Hence
the article. :)

And yes, Dataland Software "sells" an interactive web log analyzer, but I
can't really see how that's important.

~~~
jacquesm
Asking the provider of a service to show reasons why their product is 'better'
is going to give you a fairly biased story; it is good to declare such bias up
front.

Personally I found most of their 'reasons' to be fairly contrived.

I think it would be possible to come up with a much more balanced point of
view than the one given in the article when comparing tag based analysis and
log based analysis.

Both have their uses, some of the reasons given hold water but most of them
are pretty thin.

You could make an equally unbalanced list of 10 reasons why tag based analysis
is 'better'. The important thing to notice is that tag based analysis is very
convenient and can give you a bunch of information that would be fairly hard
for a log file based analyzer to provide.

Log file analysis has its place though, especially when you need to dig in to
locate a problem. But that's exactly when a log file analyzer is next to
useless: you are basically going to go hunting through the raw logs in order
to find your evidence.

The article smacks of a business giving me reasons to buy their product in the
face of free competition. (The other alternative to paid log analyzers,
besides tag based analysis, is of course a FOSS implementation of such a log
analyzer.)

~~~
vradmilovic
From the first paragraph of the article: "Depending on your preferences and
type of the website, you might find some or all of these arguments applicable
or not. In any case, everyone should be at least aware of differences in order
to make a right decision."

Sorry, I'm not in a mood for a flame war...

~~~
scott_s
That's substantially different from declaring up front what your biases are.
No part of that sentence indicates to me that you stand to gain from
convincing me your points are valid.

~~~
encoderer
Oh, come on.. This article is on the company's website..

------
eggnet
One of the great things about javascript based analytics is that the cached
version of your page is just as good as someone grabbing it directly. You can
set long cache times on all of your pages without worrying about people
viewing your site without you knowing. This more than counteracts the handful
of people who have javascript turned off.

This is also particularly important for sites hosted on something like Heroku,
which puts an HTML cache in front of your site. If your pages are served from
that cache, javascript logging is your only option.

~~~
jacquesm
doesn't your cache have the ability to write log data ?

~~~
eggnet
Not Heroku's, and certainly not a cache in a client's browser or their network
squid cache.

------
jawngee
Log analysis is a major PITA, specifically if you're operating a farm of web
servers like we do. We use an epic shit ton of realtime stats (Woopra, Mint,
GA) so we have most needs covered and have a real time view into what's going
on.

We do rotate our logs up to S3, but haven't done anything with them thus far.

------
davidw
IIRC, a while ago there was an analysis system that you'd place at the
appropriate location in your network; it would sniff packets and piece
together its own log files. I don't recall what the advantages were supposed
to be... perhaps that you could get some information on speed/latency.

~~~
jacquesm
the advantage would be 0 overhead on the machine serving up the data.

