1. I would rename pixel.png to something like image.png. Never call your script anything like tracking, analytics, or pixel if you don't want to be blocked by ad blockers. We use hello.js and hello.gif.
> Log file parsing is an old-skool but effective way of measuring the traffic to your site.
2. By using a pixel image you can bypass caching. When you rely on server logs without the image, you only see the non-cached requests. So an image like yours is the better approach.
3. Your image is being cached. So if somebody revisits your website your image will not be loaded and you will not find anything in your logs. Just disable your ETag and set an expiry date in the past.
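If it helps, here's a minimal sketch (mine, not the author's setup) of serving such a pixel with caching disabled; the filenames, port, and log format are just placeholders:

```javascript
// pixel.js -- minimal sketch: serve a 1x1 GIF that is never cached,
// so every page view hits the server and shows up in the log.
const http = require("http");
const fs = require("fs");

// 1x1 transparent GIF, base64-encoded
const GIF = Buffer.from(
  "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7", "base64");

http.createServer((req, res) => {
  // append a log line (path, referrer, user agent) -- that's the "analytics"
  fs.appendFile("hits.log",
    JSON.stringify({ t: Date.now(), url: req.url,
                     ref: req.headers.referer,
                     ua: req.headers["user-agent"] }) + "\n",
    () => {});

  res.writeHead(200, {
    "Content-Type": "image/gif",
    // no ETag is set, and the expiry is in the past, so browsers re-request it
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
    "Pragma": "no-cache",
  });
  res.end(GIF);
}).listen(8080);
```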
Awesome that more and more people are replacing Google Analytics with simpler tools. GA is overly complicated and doesn't have the best privacy mindset. I built Simple Analytics as a privacy-friendly alternative.
But this is a tracking script, no? If I were you I'd keep the name that way so that if people don't want to be tracked, they don't.
If you really want to block it, you can enable the Do Not Track setting. Although I think that setting should only apply when you are actually tracking people (which we don't), so this feature might be removed in the future. Safari has already removed it because it is just another parameter for fingerprinting a browser.
You don't have to play the privacy game -- there is a lot of space between really respecting users' privacy and breaking privacy laws. But if you do, you should put the power back in the users' hands.
(Disclaimer: I work for Google.)
I think saying nothing bad has happened is disingenuous. As soon as Google gets the same kind of exposure FB is getting right now, internal whistleblowers might come forward with more stories too.
Also, a couple of years ago Google was thoroughly compromised by at least one foreign government. Ever wonder how much data was stolen?
You've basically said you know some users block your stuff, but you think they probably shouldn't really want to block your stuff, so you devise an end run around them.
Then, finally, you conclude that if they really want to block your stuff, they should do something that you yourself admit won't block your stuff, but that's okay because they shouldn't want to block your stuff.
If you don't like someone running log analysis on their traffic then don't go to their website, nobody forced you to.
> whiny, entitled users who think they have a right to use any website they want on their terms
Users do have a right to use any website they want to on their own terms. If I make an HTTP GET to your site, it's up to you to decide what HTML to return. Once you do, it's up to me whether to request the images, scripts, etc, and whether to read the sidebar, etc.
It's up to you to decide whether to show me any content of substance without first collecting payment. I can't demand that you publish content for free. I'm not entitled to that.
But it's up to me to decide what I consume. You can't demand that I view ads, send back tracking cookies, etc. You're not entitled to that.
If you don't like site visitors refusing to be tracked, then don't let them view your website. Nobody forced you to.
Businesses often count footfall with IR or laser sensors, generally on the door. How do I know that businesses using cameras, claiming to only count traffic, are not actually gathering a whole lot more information to use, sell, or change their mind about later?
I have no skin in this game but the original comment was more focused on "we've decided to bypass your adblocker because we feel that our interests outweigh yours"
How do you deal with this sort of tracking?
No. This is self-hosted analytics, no 3rd-party involved, the way it's supposed to be.
Tracking involves a 3rd-party. Be it Google Analytics or Cambridge Analytica UserTracking (TM).
To be clear about the other thread on bypassing ad blockers: we will not have this feature when tracking user flow. People should have the right to block events if they want. But for basic info like page views we will keep offering the ad-blocker bypass feature.
This also bypasses ad blockers. If you have a largely technical audience (who presumably have an ad blocker installed), these logs can be way more accurate than GA.
However, this still requires setting up the image on a third-party server. I would really love it if GitHub Pages or Netlify could provide some simple server-side tracking. It doesn't match GA, but in some cases that's all I need.
That would be a great feature for GitHub Pages! Just a simple interface like https://simpleanalytics.io/simpleanalytics.io would serve most GH Pages use cases, I would think.
Not for me. `/pixel.png?` is blocked by default in uBlock.
Cloudflare does this for free, and you can run it on top of Netlify, GitHub/GitLab Pages, etc.
While this is true, we found a solution to bypass ad blockers (which could be implemented by Google Analytics as well). My experience is that ad blockers only block scripts and pixels that are implemented on multiple websites. With a custom domain and a script or URL that isn't named after analytics, ad blockers are unlikely to block you. At Simple Analytics we built a feature for this where customers can point a CNAME to our server. We set up SSL and proxy all requests to our server. This makes it almost impossible for ad blockers to block the stats of those customers.
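As a rough illustration of the idea (not Simple Analytics' actual implementation; hostnames and ports here are made up): the customer's CNAME, e.g. stats.customer.example, resolves to a host that terminates SSL for that name and simply forwards everything to the analytics backend, roughly like this:

```javascript
// proxy.js -- illustrative only: forward requests arriving on a customer's
// own subdomain to the analytics backend, so the script/pixel is first-party.
const http = require("http");

const BACKEND = { host: "collector.example.net", port: 80 }; // hypothetical backend

http.createServer((clientReq, clientRes) => {
  const upstream = http.request({
    host: BACKEND.host,
    port: BACKEND.port,
    method: clientReq.method,
    path: clientReq.url,
    headers: { ...clientReq.headers, host: BACKEND.host },
  }, (upstreamRes) => {
    clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
    upstreamRes.pipe(clientRes);
  });
  clientReq.pipe(upstream);
}).listen(8080); // in practice this would sit behind TLS for stats.customer.example
```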
GA does a good job at extrapolating your data to account for users with ad block. Obviously not perfect, but good enough for most cases.
That said, behind the curtain, awstats has plenty of problems and shows its age. Most of it is a single ~20k-line script with hundreds of global variables, so it's very challenging to debug. There are no tests. Over time it has also had plenty of security issues. I wouldn't recommend running it in any other mode than for generating static HTML reports from an unprivileged cronjob.
I've made my own test suite and I'm using a slightly patched version with ~20 commits on top of the latest release that fix problems I found and that upstream didn't merge (that was still in the SourceForge days - since they switched to GitHub they do seem to be a bit better about accepting pull requests). However, it doesn't help when submitting patches that, for example, concerns regarding GDPR compliance are met with responses like that.
I always use awstats to generate static pages, and that's what any security-conscious operator should do.
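For reference, a typical crontab for that setup might look something like this (a sketch; install paths, config names, and exact flags vary by distribution):

```
# run as an unprivileged user: update counters hourly, rebuild static pages nightly
10 * * * *  perl /usr/lib/cgi-bin/awstats.pl -config=example.com -update
30 3 * * *  perl /usr/share/awstats/tools/awstats_buildstaticpages.pl -config=example.com -dir=/var/www/stats
```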
I see users with a valid issue (even quoting relevant laws) being called names and told in a patronizing tone that widely accepted interpretations of said laws are wrong.
"We do not access or use your content for any purpose without your consent. We never use your content or derive information from it for marketing or advertising"
"That would be corporate suicide"
Yes, sometimes protecting personal privacy might be opposed to a corporation's profits. The horror of it!
Not doing so would be a major hit to their business. Whether you trust their reputation and compliance certifications is up to you, but a completely different issue.
Are you being serious with this question? Regardless of whether or not they look at your data behind the scenes without telling you, they don't need to know exactly what data and metadata you have in order to charge you. My AWS bill charges me for CPU hours of uptime for the server, network requests, load balancing, etc. None of this requires them to know your data or even much of your metadata (other than what domain and server to route to, obviously, but that's public information anyway).
By your logic, if someone is using end to end encryption, are you saying Amazon wouldn't be able to charge them because they can't look at the encrypted data?
Sorry, but:
> How else can they tell where your data is and track how much they need to charge you?
You say companies need to track metadata for pricing and that they don't know where your data is without tracking? Huh. I mean, that answer is surreal. I thought you had a console to set everything up and that pricing is based on cost + desired profit.
AWS runs at something like a 33% profit margin. None of their pricing is based on GA :p (like wtf)
It's more work to do it this way and forsake the convenience provided by Amazon & Co, but I prefer it. Plus I learn lots of things while doing it which I would've outsourced otherwise.
There's also the problem that Amazon might stay 100% truthful and trustworthy and never access my data. But their employees might not. The NSA's surveillance data was misused by employees to stalk romantic interests, so there's no reason to believe that an Amazon employee couldn't wrongfully do the same. They might be fired for it, but the damage is done, privacy has been compromised.
I’m not sure why anyone would want to waste time with this.
Another fun fact: since the tracking happens on the client side, there's potentially a ton of truncated data that GA simply misses. Backend server instrumentation doesn't suffer the same way.
Cloudflare provides all the basic analytics I need and I can parse the log files in the command line if I need more.
Can you tell me where and how you are parsing Cloudflare logs?
On several projects, we've had success with a custom tracker that records IP, URL, referrer, display resolution, OS, and user agent to a local db.
To filter out bot traffic, we used Crawler-Detect.
The whole thing is just a few lines of PHP and JS, doesn't even require a tracking pixel (we grab most of the data from the user session).
A cron job moves entries older than x from the production db to an archive db.
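For anyone curious, here's a rough Node sketch of the same idea (the original is a few lines of PHP/JS, so the names, bot regex, and schema here are purely illustrative):

```javascript
// track.js -- illustrative Node version of a tiny self-hosted tracker.
// The setup described above uses PHP; this just shows the shape of it.
const http = require("http");
const sqlite3 = require("sqlite3"); // assumes the sqlite3 npm package

const db = new sqlite3.Database("hits.db");
db.run(`CREATE TABLE IF NOT EXISTS hits
        (ts INTEGER, ip TEXT, url TEXT, referrer TEXT, resolution TEXT, ua TEXT)`);

// crude stand-in for Crawler-Detect: skip obvious bots by user agent
const BOT_RE = /bot|crawl|spider|slurp|preview/i;

http.createServer((req, res) => {
  const ua = req.headers["user-agent"] || "";
  if (!BOT_RE.test(ua)) {
    const q = new URL(req.url, "http://localhost");
    db.run("INSERT INTO hits VALUES (?,?,?,?,?,?)",
      Date.now(), req.socket.remoteAddress, q.searchParams.get("url") || "",
      req.headers.referer || "", q.searchParams.get("res") || "", ua);
  }
  res.writeHead(204); // nothing to return to the page
  res.end();
}).listen(8080);

// On the page, something like:
//   fetch("/t?url=" + encodeURIComponent(location.pathname) +
//         "&res=" + screen.width + "x" + screen.height);
```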
Nice post! It's always fun reading about people being creative and challenging the analytics status quo (aka GA). Besides the joy of doing it yourself, you've accomplished a couple other things worth mentioning:
1. You'll never be sampled. GA samples historical data pretty heavily, and you have to pay for 360 to retain unsampled event data (to the tune of $160k+ per year).
2. You have full access to all generated data.
1. Sessionization, which is consistent with Google Analytics' definition - effectively a 30-minute window of activity (see the sketch after this list).
2. User identification - the tracker drops a persistent cookie (just like GA), so you can see returning visitors.
3. Tools for splitting requests
4. A variety of event types, out of the box: https://github.com/snowplow/snowplow/wiki/2-Specific-event-t...
5. Ability to respect Do Not Track
6. Time on page, browser width/height, etc
7. Ability to make your event tracking 100% first-party
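To make point 1 concrete, a sessionization pass over pageview timestamps could look roughly like this (a sketch of the general 30-minute-window definition, not Snowplow's code):

```javascript
// sessionize.js -- sketch of GA-style sessionization: a new session starts
// whenever a user has been inactive for more than 30 minutes.
const SESSION_GAP_MS = 30 * 60 * 1000;

// events: [{ userId, ts }] with ts in milliseconds, assumed sorted by ts
function sessionize(events) {
  const lastSeen = new Map();   // userId -> timestamp of previous event
  const sessionOf = new Map();  // userId -> current session counter
  return events.map((e) => {
    const prev = lastSeen.get(e.userId);
    if (prev === undefined || e.ts - prev > SESSION_GAP_MS) {
      sessionOf.set(e.userId, (sessionOf.get(e.userId) || 0) + 1);
    }
    lastSeen.set(e.userId, e.ts);
    return { ...e, sessionId: `${e.userId}-${sessionOf.get(e.userId)}` };
  });
}
```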
(Disclaimer: I don't work for them, but I've seen the system work very well a number of times.)
I'm running a similar setup on my blog, and it costs well under $1 per month: https://bostata.com/client-side-instrumentation-for-under-on.... I'm doing the exact same thing with CloudFront log forwarding and have several Lambdas that process the files in S3. From there, I visualize traffic stats with AWS Athena (but retain a ton of flexibility, since they are all structured log files).
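For anyone wanting to try something similar, a stripped-down sketch of that kind of Lambda might look like this (my guess at the shape of it, not the author's code; bucket layout and field positions are illustrative):

```javascript
// Illustrative only: triggered by new CloudFront log objects in S3,
// unzips them and writes one JSON line per request under a "structured/" prefix.
const AWS = require("aws-sdk");
const zlib = require("zlib");
const s3 = new AWS.S3();

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    const lines = zlib.gunzipSync(obj.Body).toString("utf8").split("\n");

    // CloudFront logs are tab-separated; the "#" header lines list the fields
    const rows = lines
      .filter((l) => l && !l.startsWith("#"))
      .map((l) => {
        const f = l.split("\t");
        // field positions are illustrative -- check the #Fields header
        return JSON.stringify({ date: f[0], time: f[1], uri: f[7], ua: f[10] });
      });

    await s3.putObject({
      Bucket: bucket,
      Key: `structured/${key}.jsonl`,
      Body: rows.join("\n"),
    }).promise();
  }
};
```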
I wish Google would make search-keyword-hiding opt-in for users, and perhaps auto-opted-in when using incognito mode. I am sure most of my visitors would be glad to provide the search phrases, knowing that it helps us make more things and make them better. But Google does not let them opt in to sharing them; they are all basically opted out.
Give websites a way to say - thanks for visiting, we noticed you are using a browser that.. or search portal that strips info from us.. would you please click to enable sharing this small bit of info.. more about how we use it and what info here..
Something like this could help sites and users. I'd like to toggle it myself. I like how Startpage scrambles URL queries, but I would turn it off for some sites, whitelist them like some are with uBlock etc. I also don't like how p-hub and some others keep queries in the URL, and would like an option to scramble them, with the site, via browser settings, proxies, whatever it takes.. to give more options, more choice.
Way back in the day, awstats, Webalizer, and similar server-side stats would show which keywords were searched as totals, and show which pages were found by each set of keywords (with totals for each page / key phrase) - this info was valuable. I've not been aware of any way to get that info since the Google changes some time ago.
I could probably hack it and overload different HTTP status codes to mean different screen sizes or something, but I didn't consider device size to be important for me. GoAccess does break down the User-Agent into OS, so I can see mobile usage via the "iOS" and "Android" OS usage. Breakdown for my site: Windows 24%, iOS 22%, macOS 19%, Android 20%, Linux 11%, other 4%. So mobile usage is probably about 45%.
Can you give a screenshot? These stats look way off to what I'm used to.
Not sure how that holds up these days.
Google Analytics has too many layers of UI.
I have found that GA, Matomo, and Fathom all have image-based solutions that you can use.
The audience who disables it is incredibly small and doesn't want to be tracked anyway.
It shouldn't be a factor in your analytics solution unless you want to track bots too.
And then there's the whole "data analysis" thing.
I've researched building out a desktop app that pulls GA data over the API in the background so you can get key stats out much quicker, but it's quite an investment of time to be beholden to Google's platform.
Now doing some dogfooding on a web analytics service I've been evolving that tries to answer the "why" of changes in traffic/behaviour over time ("traffic's up today....not sure why?"). Google does this with their GA mobile app ("Insights"), but what they show you and when doesn't seem to be too predictable.
True, but easier to learn than custom programming and visualizations. A smart English major can still figure it out on their own. Giving product managers the tools to do in-depth analysis is a huge plus. Otherwise, they just ask devs to run reports all the time, burdening devs and slowing down analysis.