
Fair Analytics: A fair and distributed alternative to Google Analytics - vesparny84
https://github.com/vesparny/fair-analytics
======
buro9
If this gains widespread use then the transparency points will make it far
worse for privacy than using Google.

Why? Google have access control over the data or view it in anonymised form
when crunching. But this data will be open to anyone and would allow people to
correlate visitor sessions not just on a site but across other sites that use
this, to the degree that you could start deanonymising the internet activity
of people.
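
To make the correlation risk concrete, here is a rough sketch, not anything taken from the project. It assumes each public log entry carries some stable value (called `fingerprint` here, a hypothetical field) that stays the same for one visitor across sites. Anyone who can read the logs could then group by it:

```python
from collections import defaultdict

def cross_site_visitors(public_logs):
    """Return fingerprints that show up on more than one site.

    public_logs maps a site name to a list of event dicts. The
    'fingerprint' key is a hypothetical field, assumed for illustration.
    """
    sites_by_fingerprint = defaultdict(set)
    for site, events in public_logs.items():
        for event in events:
            sites_by_fingerprint[event["fingerprint"]].add(site)
    # Keep only visitors observed on two or more sites.
    return {
        fingerprint: sorted(sites)
        for fingerprint, sites in sites_by_fingerprint.items()
        if len(sites) > 1
    }
```

If no stable per-visitor value is ever written to the logs, this particular join has nothing to key on.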

It's a strange world in which we can say, "At least today only Google has the
data and they've made it hard to get to".

~~~
yuhong
I am trying to figure out what the problems are with Google Analytics anyway.

~~~
sudojudo
Some of us just don't want to live under Google's umbrella.

Personally, I have no interest in sacrificing my privacy for "free" services.
I'm not important, nothing I do is controversial or of any significance, but
having my activities logged and monitored by a giant all-seeing eye is still
uncomfortable. It's disturbing, to me, how accepting of these things people
have become.

------
throwaway2016a
I'm not sure I'd go with the idea (yet). The power of Google Analytics is in
the "Analytics" pieces. This seems to be only the data gathering piece.
Granted that is also important. But this would need an ETL & BI system or
something like that in front of it to be useful.

It is a great start.

I also really enjoy looking through the source code for projects like this and
seeing how other coders do things.

~~~
vesparny84
As you said, this is just a starting point :) It's meant to be the engine
gathering data.

It's possible to build any type of chart or fancy dashboard on top of it.

------
morecoffee
At least for me, the real value that GA provides is being up all the time and
giving me peace of mind that hits won't get dropped. Running my own analytics
infrastructure is possible, but would get in the way of what my site was
actually trying to do.

What the open source analytics world needs is better infra, not better
collection.

------
KaoruAoiShiho
"Fair" aka "Missing features".

Is there an OS alternative to Google Analytics that's comparable in
featureset?

~~~
Paul_O_Meany_Jr
Maybe something like Piwik is what you're looking for. Open Source, self
hosted analytics.

[https://piwik.org](https://piwik.org)

------
londons_explore
It seems that users will be very tempted to write additional "session"
information, for example some kind of visitor id to the logs. They would need
that to, for example, correlate which people visit the homepage with which
referrer then buy a product. That's a core use of most analytics products.

Considering the raw logs are timestamped and public, that then sounds like a
rather large privacy hole for the users.

I would imagine some kind of pre-aggregation happening, and then making
available aggregated data instead of raw logs.
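
A minimal sketch of that kind of pre-aggregation, under assumptions of my own: events are dicts with a Unix `timestamp` and a `pathname` (hypothetical field names), counts are bucketed per hour, and buckets below a `min_count` threshold are dropped so that a lone visit is not published.

```python
from collections import Counter
from datetime import datetime, timezone

def aggregate_hourly(events, min_count=5):
    """Collapse raw events into per-hour, per-page counts.

    The 'timestamp' and 'pathname' keys are assumed for illustration.
    Buckets with fewer than min_count hits are suppressed.
    """
    counts = Counter()
    for event in events:
        moment = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        # Truncate to the hour so exact visit times are not exposed.
        hour = moment.strftime("%Y-%m-%dT%H:00Z")
        counts[(hour, event["pathname"])] += 1
    return {key: n for key, n in counts.items() if n >= min_count}
```

Only the output of something like this would be made public; the raw, timestamped rows would stay private or be discarded.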

~~~
vesparny84
Writing additional "session" information without explicitly asking for user
consent often ends up threatening users' privacy.

This is one of the main reasons for fair analytics to exist.

~~~
NetOpWibby
I think you're missing the point. People are going to add on the missing
features anyway.

------
irq-1
Advertising needs to filter out fake views, or they'd end up paying for click
farms. I don't see how this protects against this.

~~~
vesparny84
At the moment there is a configuration option to only accept requests from a
specific host. This should help avoid fake clicks.

I'd love to have suggestions on how to improve this particular aspect :)
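
For illustration only, a host restriction of this kind could look roughly like the sketch below. It is not the project's actual code; the function name and the choice to read the `Origin` header first and fall back to `Referer` are assumptions.

```python
from urllib.parse import urlsplit

def is_allowed_origin(headers, allowed_host):
    """Accept a request only if it claims to come from allowed_host.

    headers is a plain dict keyed by the exact names 'Origin' and
    'Referer' (an assumption of this sketch).
    """
    source = headers.get("Origin") or headers.get("Referer") or ""
    host = urlsplit(source).hostname  # None when the header is missing or empty
    return host is not None and host == allowed_host.lower()
```

Both headers are set by the client, so a script outside a browser can send whatever value it likes. A check like this filters casual cross-site noise but not a determined click farm.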

~~~
irq-1
I don't think there is an answer. Fake clicks can only be determined by a 3rd
party -- or else sites could host the ads and report whatever numbers they
want.

The only way forward that I can see is anonymous aggregation by a 3rd party
that is not the advertiser or the website, and which has incentives to protect
the users.

Imagine a system with dozens of 3rd party collectors that anonymize browsing
data. The data is submitted by users who have a plugin for their collector.
The collectors then sell the anonymous data to the advertisers. Advertisers
can then compare data from different collectors to determine fake clicks, and
what the level of payment should be.
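
As a rough sketch of that comparison step: assume each collector reports a click-through rate for the same ad placement, and the advertiser treats the median as the consensus. The function name, the use of a median, and the `tolerance` parameter are all assumptions for illustration.

```python
from statistics import median

def check_reported_rate(site_rate, collector_rates, tolerance=0.5):
    """Compare a site's self-reported click rate with independent collectors.

    collector_rates maps a collector name to the rate it observed.
    The site's figure is flagged when it exceeds the median collector
    rate by more than the given tolerance (0.5 means 50% higher).
    """
    consensus = median(collector_rates.values())
    limit = consensus * (1 + tolerance)
    return {"consensus": consensus, "suspicious": site_rate > limit}
```

Payment could then be based on the consensus figure rather than on whatever the site reports.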

There are two big problems with a scheme like this. First, anonymous data is
much less valuable to advertisers. I think this can only be solved by
technical means; making anonymous data the only information available. Second,
why should users have a plugin that tracks them? Access to content seems like
the obvious answer; users with a plugin (being tracked by a collector) could
be allowed to read the NYT and the NYT website could be paid by the collector.

Realistically, I don't see either of the problems being resolved.

------
brudgers
If it meets the guidelines, this might make a good 'Show HN'. Show HN
guidelines:
[https://news.ycombinator.com/showhn.html](https://news.ycombinator.com/showhn.html)

------
awinter-py
re: privacy.

Important to distinguish between a blogger's utility from getting session
information and the huge privacy risk of sending everyone's reading to a
central provider (GA).

I use piwik and I have it set to respect do-not-track; I assume that means if
someone visits from a private tab in FF, I won't even see a hit (piwik experts
-- correct if wrong).
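
Respecting do-not-track on the server side can be as small as the sketch below. This is a generic illustration, not Piwik's implementation. It assumes the request headers are available as a dict and that the browser sends `DNT: 1` when the user has opted out.

```python
def should_record(headers):
    """Return False when the request carries a DNT: 1 header.

    Header names are compared case-insensitively; headers is assumed
    to be a plain dict of name -> value strings.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"
```

Whether a given browser mode actually sends the header is up to the browser and the user's settings.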

But I also benefit greatly from the analytics from non-private browsers --
seeing that users are, for example, clicking on the footnotes incentivizes me
to deliver thorough arguments in the future.

------
Kinnard
I find the auditability most interesting. I bet this could be useful for some
blockchain-based decentralized applications.

