
Goodbye Google Analytics. Hello Piwik - tonylampada
http://dicasdolampada.wordpress.com/2013/07/22/goodbye-google-analytics-hello-piwik/
======
tombrossman
Our local police force has set up a site for 'anonymous' reports from rape
victims, and it had GA tracking on every page (plus Google CDN content,
another issue). I wrote them to explain why this wasn't the best idea and how
Piwik was a better choice.

Crimestoppers (A UK charity) are doing this too, and I wrote them to explain
the potential for privacy issues. 'You've outsourced crime victims' privacy to
an ad company' was my basic message.

I told both the local police and Crimestoppers how easy Piwik was, and how I
thought it was a better idea but I don't think I got my point across. There is
a gap in the understanding where the site owner doesn't see the raw data (but
Google does) and so they think it's okay.

Anyhow, interesting privacy issues and it may be that I am overlooking
something or being overly cautious.

I posted a related question over on Stack Exchange if anyone's interested in
providing some feedback there.
[http://webmasters.stackexchange.com/q/47069](http://webmasters.stackexchange.com/q/47069)

~~~
toble
Ouch. That crimestoppers form has code for a Facebook like, Google Analytics,
New Relic and Fonts.com.

Maybe you sounded like a salesperson? Piwik was unknown to me until I read it
in this comment.

------
liquidcool
My clients are already using Google Webmaster Tools and Adwords and want
everything integrated, plus they want the reliability of Google.

But another FOSS analytics platform is Snowplow
([http://snowplowanalytics.com](http://snowplowanalytics.com)), and while it
wouldn't replace GA for them, it might replace another commercial analytics
package. Few high volume ecommerce sites use just GA these days.

~~~
alexatkeplar
Snowplow co-founder here. Thanks for mentioning us liquidcool :-)

Snowplow is a little different from Piwik - Piwik is a LAMP-stack opensource
app which replicates a GA-style analytics experience.

Snowplow is more of a scalable event analytics platform - it is built on AWS
(CloudFront, Elastic MapReduce, Redshift), does _not_ have a UI but has a very
clean & simple event model[1] and scales horizontally to billions of events.

To date, Snowplow is mostly used by web companies that want to warehouse their
granular event data to build custom analyses, segment users, personalize sites
etc.

If anybody has any questions, just shout!

[1]
[https://github.com/snowplow/snowplow/blob/master/4-storage/r...](https://github.com/snowplow/snowplow/blob/master/4-storage/redshift-storage/sql/table-def.sql)

~~~
X4
wow! May I ask where you advertised Snowplow so far? I can't believe that I
don't know it.

Awesome product, looks a little hard to setup, due to the Hadoop linkage, but
definitely worth using on my next projects. Maybe worth making a VM
appliance, puppet/chef, or docker script for easier deployment.

Why haven't I ever heard of it? I usually "snowplow" the web for all kinds of
software, but it appears that I haven't crossed the marketing channels you've
targeted.

~~~
rcsorensen
The setup is pretty annoying. I still haven't figured out what permissions
need to be granted to IAM users on AWS to do things like spin up the Hadoop
jobs.

Having your granular data end up in Redshift, ready to be enriched with other biz
data sources is a beautiful, beautiful thing, and completely worth it.

~~~
alexatkeplar
Funnily enough, have been working on that exact setup gripe this afternoon:

[https://github.com/snowplow/snowplow/wiki/IAM-setup](https://github.com/snowplow/snowplow/wiki/IAM-setup)

(work in progress ;-)

------
tonylampada
Author here. Seems like Hackernews traffic "DDOS'ed" the original blog. I made
a clone of the post in my personal blog (hosted on wordpress).
[http://dicasdolampada.wordpress.com/2013/07/22/goodbye-googl...](http://dicasdolampada.wordpress.com/2013/07/22/goodbye-google-analytics-hello-piwik/)

~~~
tonylampada
My thanks to the hackernews folks who changed the original link (Y)

~~~
laurent123456
Ironically, could the DDOS actually be because of Piwik? With the sudden
increase in traffic, it's probably generating a massive amount of data and
overloading the database.

------
aram
Piwik is pretty good alternative and unlike GA, it gives instant analytics and
is extensible.

However, it tends to behave clunky after some time, depending on the number of
sites, traffic and server where you host it. This is most visible when you try
to display a larger date range or just dig in the history.

There is one more important thing to consider: it collects IP addresses of all
your visitors out of the box, which might be in conflict with your local laws.
Be sure to check that out before adding it to the site.

~~~
dkuntz2
You can get Google Analytics to show you real-time info too; it just doesn't
save it as the default view.

~~~
arb99
I've found that with GA the previous 24 hrs or so stats are not really
accurate, especially with bigger traffic sites.

~~~
grey-area
I think they use sampling once your traffic gets to a certain size, so though
they report figures down to individual hits, it's not actually as accurate as
that might imply.

[https://developers.google.com/analytics/devguides/collection...](https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiBasicConfiguration#_gat.GA_Tracker_._setSampleRate)

------
lazyjones
Our attempts to install and use Piwik have always been disappointing in the
past. It seems to fall apart (slow display etc.) quickly for medium to high
traffic sites (50+ million page views/month). Are there any people here who
are using it successfully at this order of magnitude and are willing to share
some configuration hints?

~~~
asdasf
We gave up on it for the same reason. It is absurd for a simple web analytics
app to require massively more powerful hardware than our actual app does. And
since it is a mysql mess, any updates that touch the schema put your stats
offline for hours. I really can't fathom how using mysql is still considered
acceptable in 2013.

~~~
ambientwhsiper
I completely agree with the statement. I love Piwik and we use it to monitor a
lot of our sites. But I hate the fact that it's LAMP. Archive script runs out
of memory at least once a week.

~~~
petet
We run Piwik on 40M page views per month on dedicated server. Archiving data
takes a few hours, but the UI is fast and it works. We used tips in:
[http://piwik.org/docs/optimize/how-to/](http://piwik.org/docs/optimize/how-to/)

------
Nursie
Site has slowed down considerably now.

Anyone fancy telling the lot at gov.uk about this? It would be nice if they
weren't using foreign owned (and likely foreign-located) servers to record and
analyse what UK citizens do on UK government websites.

~~~
afandian
GDS know and have taken a position on this
[http://digital.cabinetoffice.gov.uk/2012/03/19/its-not-about...](http://digital.cabinetoffice.gov.uk/2012/03/19/its-not-about-cookies-its-about-privacy)

~~~
Nursie
It seems to be entirely cookie-focussed though, and not even consider that
data on UK citizen interaction with government, regardless of cookies, may be
something to keep private.

I mean this - "despite the fact that no personal data was collected, it was
good practice not to share analytics information with third parties in order
to reassure government websites’ users." is ludicrous.

It's trivial for a service with as many hooks into _everything_ as google to
correlate cookies and IP addresses of visitors to GA-using sites with their
Google accounts and other tracking data. It's almost as ludicrous as the
answer I got direct from them "we don't allow google to use the data".

You don't allow it? How exactly is it that you stop them keeping records when
every time I visit your pages my computer also tells them what I'm doing?

What a load of carp.

------
g-garron
Hi, I found out about this post here on HN thanks to Piwik.

I am the author of the post Tony is linking at the top of his post, and I also
use Piwik, where I saw my article had 300+ visits instead of the 20+ it
gets daily. :)

Piwik is great, and I use it to track visits to a few sites I own, all of them
are some 6000 page views per day, so no real traffic.

Because my site is powered by Jekyll, I also use vanilla forums as commenting
system, so no Disqus or Intense Debate either :)

Have a nice day, and thanks Tony for the link and credit .

------
nodefortytwo
Piwik is great until you get a decent amount of traffic, I had it on a client
site as an experiment. at 100k page views per hour Piwik nuked the server :(

~~~
moepstar
Did you / your client set it up as recommended for high-traffic sites?

[http://piwik.org/docs/optimize/](http://piwik.org/docs/optimize/)

------
edoceo
I'm a lover of Piwik; I switched from GA to Piwik across all the domains I
operate for myself and for some clients. One install can handle multiple
domains/site/accounts/groups

Many of the clients appreciate the increased "privacy". And for applications
(internal/public/private) it just makes more sense to me than GA.

My favourite part is that it's doing server-side log analysis for my traffic -
not using those JS based widgets.

It's got event tracking (sweet) if you choose to use a JS based tickler to do
that kind of thing.

Here's a quick and dirty doc I made about it:
[http://praxis.edoceo.com/howto/piwik](http://praxis.edoceo.com/howto/piwik)

------
sergiotapia
One thing I feel you miss out if you don't use GA is 'Google-juice'. Maybe
it's just like blowing into a Super Nintendo cartridge, but I think using GA
increases your SEO with Google.

Am I mistaken?

~~~
mdasen
You're probably mistaken.

The reason why this probably isn't true is that it would be a
regulator's/anti-trust-buster's dream come true. Imagine it: Google ranks you
lower if you don't use other Google products. Don't use GA? Ranked lower.
Don't pay for ads? Maybe your organic results drop a bit. . .

Google has to be very careful about its organic search results. Giving extra
weight to sites that use things like Google Analytics would be the
"confirmation" that regulators would need. As such, it seems like Google
wouldn't risk its core business over something like this.

I totally understand the logic: if I use Google Analytics, Google knows that
my site is getting traffic and it makes a certain sense to take that into
consideration. However, I think the opposite side of that (if sites don't give
Google whatever Google wants, they're going to be ranked lower) is a can of
worms that Google doesn't want to be seen dipping into.

~~~
hayksaakian
This is basically true for Google + already.

Authorship on your sites alters your sites appearance on SERPs thereby giving
you a distinct advantage or disadvantage that you can only gain by using
Google's product.

------
handzhiev
Piwik is great and so much better than Analytics. It works especially well for
us when we want to track sources of sales, as sales are handled by 3rd party
reseller and Analytics goals are of no use.

In Piwik you have the visitors log at a glance, we just match IP/time to the
log and BAM - we know where the buyer came from, what pages they visited and
how long they stayed there.
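
That IP/time matching could be sketched roughly like this. The record shapes (`ip`, `time`, `lastActionTime`, `referrer`), the function name and the 30-minute window are all assumptions for illustration, not Piwik's actual visitor log format:

```javascript
// Hypothetical shapes: a sale is { ip, time }, a visit is
// { ip, lastActionTime, referrer }; times are milliseconds since the epoch.
function findVisitForSale(sale, visits, windowMs = 30 * 60 * 1000) {
  // Keep visits from the same IP whose last action happened at or before
  // the sale, and no more than windowMs earlier.
  const candidates = visits.filter(
    (v) =>
      v.ip === sale.ip &&
      v.lastActionTime <= sale.time &&
      sale.time - v.lastActionTime <= windowMs
  );
  // Prefer the visit whose last action is closest to the sale.
  candidates.sort((a, b) => b.lastActionTime - a.lastActionTime);
  return candidates.length > 0 ? candidates[0] : null;
}
```

Matching on IP alone will mix up buyers behind a shared address (offices, mobile carriers), so a result from a sketch like this is a best guess rather than proof.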

On top of this, our site traffic is hidden from the eyes of Google.

------
generj
Piwik can be set to follow the Do Not Track header. It rigidly follows DNT,
not storing the request at all.
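
To illustrate what "not storing the request at all" amounts to, here is a minimal sketch of the check in JavaScript. It is not Piwik's code; the function name `shouldRecord` and the plain-object `headers` argument are assumptions. The only fact relied on is that a browser with Do Not Track enabled sends the header `DNT: 1`:

```javascript
// Returns true when the request may be recorded, false when the visitor
// sent "DNT: 1". HTTP header names are case-insensitive, so normalize them.
function shouldRecord(headers) {
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === 'dnt') {
      return String(value).trim() !== '1';
    }
  }
  return true; // no DNT header: the visitor expressed no preference
}
```

A tracking endpoint would call this before writing anything and simply return early when it yields `false`.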

It looks like Google and other analytics players are going to refuse to follow
DNT, after a hilariously weak proposal by the DAA was rejected by the W3C
committee.

------
ridruejo
If you want to give a try to Piwik on your own laptop, AWS or Azure we
(BitNami) have free one-click installers, VMs and cloud images
[http://bitnami.com/stack/piwik](http://bitnami.com/stack/piwik)

------
thomaslutz
Edit: Nevermind - I just read to the end, Piwik can auto-update to latest
1.12. Missed that when cross-reading and trying it myself while reading.

The guide and the mentioned Github repo use Piwik 1.5.1. There are several
security issues with this version (perhaps many more):
[http://www.cvedetails.com/vulnerability-list/vendor_id-9612/...](http://www.cvedetails.com/vulnerability-list/vendor_id-9612/product_id-17168/version_id-137845/Piwik-Piwik-1.5.1.html)
Latest version is 1.12 - I advise against this "simple" solution to use Piwik.
Perhaps there is a github repo with the latest Piwik?

------
bcRIPster
I've used Piwik myself for years and swear by it for all of my personal
projects or sites where I need full data privacy. It provides all the basic
data I need for my clients and then some. Also once it was setup I've found
maintenance to be pretty simple. I use it for regular web sites, WordPress
sites and MediaWiki installations.

Granted, I miss the days of being able to use tools like Analog but it's so
badly out of date and not maintained anymore so I only use it when I need to
process raw traffic numbers from a server.

------
bpatrianakos
I've used Piwik for years and it is incredibly simple to use and set up but
this post makes it much more complex than it needs to be. In all honesty, it's
just as simple as setting up Wordpress. Drop the Piwik folder on a server
somewhere, run the installation (connecting to your database and if I recall
correctly you don't need to use the root user, just a user with sufficient
privileges), and you're done.

I want to love Piwik, and I do like it a lot, but I do have some problems.
Piwik gets slow after a while. This may have to do with the server it's running
on partly but over time the software will slow down especially if you try to
pull out somewhat longer date ranges.

It isn't as pretty as GA. I know this is petty and that it's themeable but the
UI was important to me. Keeping it up to date and maintaining it was also
something that requires vigilance. It isn't hard to update but you have to
make sure to check for updates. Sounds simple but you'd be surprised how lazy
one can be. Also, integration with Webmaster Tools isn't available which is
kind of a bummer.

On the plus side there's very little that GA offers that Piwik doesn't.
There's even a great mobile app which GA doesn't yet have to my knowledge. You
can monitor multiple sites on different servers using a simple JavaScript
snippet just like GA, and it breaks down the data in just about every way
you'd want.

In the end, despite really wanting to use Piwik long term I wasn't able to do
it. I don't see a problem with using Google Analytics for tracking purposes.
Google has the power to abuse the data they collect but I trust them not to.
I'm not running a site where visitor privacy is a big priority. If I were
running such a site I'd reconsider this position. But from an ethical
standpoint if it's somehow not okay for Google to collect tracking data on
your visitors (and promise not to exploit it) why is it okay for any of us to use
Piwik and collect that data ourselves. Google has far more data that can do
far more damage but they also have far more resources to put into security
than most of us. I can take a pledge not to exploit my user's data but Google
does too? I know I can trust myself but my users don't. My users might even
prefer that if I were to use analytics software that I use software that comes
from Google, a name they know and trust, rather than me, a guy who they know a
little bit but doesn't have a reputation that can even remotely compete with
Google. To me, that's the more interesting aspect of Piwik. The question of
why running your own analytics software is more ethical than using Google.

Edit: When I said I wasn't running a site that made visitor privacy a priority
I was excluding the site I run that actually does make user privacy a huge
priority. I'm aware I look like a hypocrite now and I think I might actually
spend some time thinking of whether or not to switch over to a self-hosted
analytics solution for that site. I'm still not sure that a self-hosted
service is preferable in my case but I'm open to the idea.

~~~
halfdan
Full Disclosure: I am a Piwik Dev

Whenever Piwik gets slow you will have to setup cron archiving:
[http://piwik.org/docs/setup-auto-archiving/](http://piwik.org/docs/setup-auto-archiving/)

By default, Piwik will aggregate data when you a) make an API request - b)
Load the dashboard (which in fact calls the API). Cron archiving makes this
process faster by processing all the data beforehand so that the API can
simply request it from the DB.

~~~
hsshah
Apologize for hijacking the thread. I am trying to convince one of my clients
to do exactly this. Switch from GA to Piwik for their site which is deployed
on Adobe CQ5. I know I can use client side tracking easily (hence my
recommendation). However, do you know if there is support for server side CQ5
logs?

~~~
nadaviv
I'm tracking sites by loading their Apache access logs into the database. I'm
not sure what the CQ5 log structure is, but it should be possible (though it might
require some modifications).

See here: [http://piwik.org/log-analytics/how-to/](http://piwik.org/log-analytics/how-to/)

Edit: It should be noted that you will get lots of bots requests showing up
when you load access logs (the regular tracker relies on javascript, which
gets rid of most of the bots). I have a small awk script that cleans the
access logs prior to importing them, by trying to detect bots. I can upload
that if you're interested.
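
For anyone who wants the general idea before seeing the awk script, here is a rough JavaScript sketch of that kind of pre-import cleanup. It is not the script mentioned above. It assumes the Apache "combined" log format (user agent as the last quoted field), and the marker list is a tiny hypothetical sample; real bot lists are far longer:

```javascript
// Hypothetical sample of user-agent fragments that suggest a crawler.
const BOT_MARKERS = ['bot', 'crawler', 'spider', 'slurp'];

// In the Apache "combined" format the user agent is the last quoted field.
function userAgentOf(line) {
  const match = line.match(/"([^"]*)"\s*$/);
  return match ? match[1] : '';
}

function isLikelyBot(line) {
  const ua = userAgentOf(line).toLowerCase();
  // A missing or "-" user agent is treated as suspicious as well.
  if (ua === '' || ua === '-') return true;
  return BOT_MARKERS.some((marker) => ua.includes(marker));
}

// Keep only the lines that do not look like bot traffic.
function dropBots(lines) {
  return lines.filter((line) => !isLikelyBot(line));
}
```

Substring matching like this only catches bots that identify themselves; anything faking a browser user agent will still get through.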

------
richardv
Has anyone installed Piwik for multiple accounts?

My product allows users to create and launch their own website. I'd quite like
to be able to quickly provide basic statistics for users (alongside Google
analytics if they want it).

We run a multi-tenant application so have thousands of sites running from the
same codebase, however, each site owner would need its own statistics. At the
moment, we just let users provide their own Google Analytics, but it would be
nice to report to Piwik I think and give them their own preconfigured stats
area?

~~~
aram
Yes, you can create additional users within Piwik and grant them access to
specific websites.

~~~
PavlovsCat
Looks like it's possible to do this programmatically even:

[http://piwik.org/docs/analytics-api/reference/#UsersManager](http://piwik.org/docs/analytics-api/reference/#UsersManager)

[http://piwik.org/docs/analytics-api/reference/#SitesManager](http://piwik.org/docs/analytics-api/reference/#SitesManager)

(and if there are some quirks in the way, file a ticket etc.)
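
A very rough sketch of what a programmatic call might look like from JavaScript, building the request URL only. The host, the token value and the helper name `piwikApiUrl` are placeholders. The `module=API` / `method` / `token_auth` query layout and the `SitesManager.addSite` parameters (`siteName`, `urls`) are written from memory of the API reference linked above, so treat them as assumptions and check them there before relying on this:

```javascript
// Builds a Piwik HTTP API URL. Nothing is sent; the caller decides how to
// fetch it. baseUrl and tokenAuth are placeholders supplied by the caller.
function piwikApiUrl(baseUrl, method, params, tokenAuth) {
  const query = new URLSearchParams({
    module: 'API',
    method,
    format: 'JSON',
    ...params,
    token_auth: tokenAuth,
  });
  return `${baseUrl}/index.php?${query.toString()}`;
}

// Example: URL for registering a new tenant site (assumed parameter names).
const addSiteUrl = piwikApiUrl(
  'https://stats.example.com',
  'SitesManager.addSite',
  { siteName: 'Tenant 42', urls: 'http://tenant42.example.com' },
  'YOUR_TOKEN_AUTH'
);
```

For a multi-tenant setup you would presumably run something like this once per new site and then grant the owner access through the UsersManager methods.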

------
mitchwainer
I would not recommend Piwik over GA. The #1 reason is that Piwik does not
track the (not provided) Google keyword searches. Google now provides the (not
provided) keywords with the site page attached now. For example: (np -
/pricing), so you at least have an understanding of what they searched for.
This is a big factor if you're serious about SEO.

------
sgarbi
I have implemented Piwik widgets in my latest project for tracking visitors to
personal pages. When you visit this page
[http://reminderof.me/ruggero](http://reminderof.me/ruggero) I see the insight
on my dashboard.

IMHO Piwik is a valid open-source alternative to Google Analytics and will
erode its application marketplace.

------
noinput
I setup a Pagodabox app to launch a new self hosted Piwik instance a few back:
[https://pagodabox.com/cafe/noinput/piwik](https://pagodabox.com/cafe/noinput/piwik)

Hope some find it helpful for a quick way to test out and run it for free to
see if they like it (I do).

------
belorn
For hosting providers (or those nice people who share their servers with
friends), you can have Piwik automatically installed by default when creating
new sites.

This is impossible to do with GA, as GA require the creation of personal
accounts and injection of code into the customers/users website.

------
seriocomic
I tried Piwik on a number of sites to move away from GA, but I was simply not
able to match the minimal impact on my load-times compared to Google. Self-
hosting your analytics solution still requires an optimized web-server -
things like TTFB were a real issue for me.

------
ksec
The problem is Piwik doesn't scale, even if you set it up the way they
said.

The problem is there isn't a single viable GA alternative out there that
doesn't cost a fortune.

Clicky and Chartbeat are the only two relatively inexpensive options.

------
felxh
Unfortunately it looks like it doesn't support event tracking at the moment.
Does anybody know more about that?

~~~
moepstar
If I didn't completely misunderstand you, there's Goal and Campaign tracking,
see their docs about that:

[http://piwik.org/docs/tracking-campaigns/](http://piwik.org/docs/tracking-campaigns/)

[http://piwik.org/docs/tracking-goals-web-analytics/](http://piwik.org/docs/tracking-goals-web-analytics/)

~~~
winslow
I believe he is referring to an event such as a javascript based event when a
user clicks a button etc. I'm also interested in event tracking because my
WebGL site uses it to track user interaction since users only visit the WebGL
view and don't visit multiple pages.

~~~
__david__
Piwik calls those "goals". You can definitely kick those off from javascript.
We run piwik on our web solitaire site
([http://greenfelt.net](http://greenfelt.net)) and kick off goals when users
do things like complete a game. The code looks like this:

    
    
        if (piwikTracker)
            piwikTracker.trackGoal(2);
    

My only annoyance is that you have to reference things by numbers and not some
nice string that reads well in the code... But it works and it's something you
setup once and forget about, so I got over it.
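
One way around the bare numbers, sketched here as an assumption rather than anything Piwik provides, is a small lookup table so call sites read as names. The table `GOAL_IDS`, the name `gameCompleted` and the helper `trackNamedGoal` are hypothetical; only `trackGoal(id)` comes from the snippet above:

```javascript
// Hypothetical name-to-ID table. The IDs must match the goals defined in
// your own Piwik install; 2 is taken from the trackGoal(2) call above.
const GOAL_IDS = Object.freeze({
  gameCompleted: 2,
});

// Fires a goal by name. Returns false and does nothing if the tracker is
// missing (for example, blocked by the browser) or the name is unknown.
function trackNamedGoal(tracker, name) {
  if (!tracker || !Object.prototype.hasOwnProperty.call(GOAL_IDS, name)) {
    return false;
  }
  tracker.trackGoal(GOAL_IDS[name]);
  return true;
}
```

In the page this would be called as `trackNamedGoal(window.piwikTracker, 'gameCompleted')`; going through `window` avoids a ReferenceError when the tracker script never loaded.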

------
af3
Requesting a similar guide for iredmail on Openshift.

