Hacker News
Ad Blockers Are Also Changing the Game for SaaS and Web Developers (snipcart.com)
157 points by plehoux on Oct 29, 2015 | 133 comments

Good! We need to ditch all the analytics software that relies on clients to make pointless HTTP requests. They are offloading their work onto the website visitors, wasting every user's bandwidth and slowing down internet connections worldwide.

If you want to know who is visiting your site, try reading your server logs.

Server logs are missing important information like screen resolution, navigator.language etc. Server logs also can't report on element events. Nor do they work with things like single page apps. How can developers make their apps better without having the proper analytics to do so?

We're dropping all third party domains at Userify [1] (plug: SSH key management software for EC2), but for reasons of both security and privacy.

What would happen to your website (or millions of websites) if one of the CDNs that you rely on started quietly issuing evil code to a few, targeted users? Would you notice? Would your users?

I don't think that it'd be too hard for us to add a simple API call that pokes data about screen resolution, browser agent string, language, etc upon load or login, and it'll be far more efficient and private than us sending random data off to GA or similar where they frequently don't even provide us IP addresses of our own site visitors so that we can correlate the data against our own logs.
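A minimal sketch of what the receiving side of such an API call might look like, assuming the page posts a small JSON body on load or login. The field names, the whitelist, and the JSON Lines file are all hypothetical choices for illustration, not anything Userify has described:

```python
import json
from datetime import datetime, timezone

# Hypothetical whitelist of client-reported fields; the names are illustrative only.
ALLOWED_FIELDS = {"screen_width", "screen_height", "language", "user_agent"}


def build_client_record(raw_body: bytes, remote_ip: str) -> dict:
    """Parse a JSON beacon sent by the page and keep only whitelisted fields."""
    payload = json.loads(raw_body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("beacon body must be a JSON object")
    record = {key: payload[key] for key in ALLOWED_FIELDS if key in payload}
    # The IP comes from the connection itself, so the record can be
    # correlated with the web server's own access log.
    record["remote_ip"] = remote_ip
    record["received_at"] = datetime.now(timezone.utc).isoformat()
    return record


def append_record(path: str, record: dict) -> None:
    """Append one record per line (JSON Lines) to a local file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
```

Whatever web framework handles the request would call build_client_record with the request body and the peer address, then append_record. Nothing leaves your own server.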

The data that GA gathers is highly valuable... to Google. They only provide you visibility of the tippy tip of the iceberg.. but ultimately it's your customers' and your data, not theirs.

Don't compromise your users with third-party includes, even Google Fonts (which is still our last holdout on the website.. hm, someone should make a simple web app that gathers names and styles of fonts and provides a zip w/ pre-generated CSS.)

CDNs sound great but they're a huge privacy hole. Ask yourself; what's the profit model? Are they really just an opportunity to gather valuable data on other people's websites and browsing habits? (yes).

Please don't leak your customers' data.

1. https://Userify.com

Same here. I never really used third party domain software. I used Google Fonts for a while, but I find it pathetic to have some font files and some CSS loaded from a different server than mine. Google Fonts (and every CDN) can be a lot slower than my own server. Sometimes, the site hangs while "waiting for google.com". Silly.

Also, Google Fonts might be Google Analytics in disguise. Who knows.

I use Piwik for tracking, but the self-hosted version. I don't even use newsletter services. I bought a cheap newsletter plugin for WordPress which I use as an autoresponder email course.

But delivery is all that matters they say. And yet, all Mailchimp and Aweber and whatnot goes 100% to my spam folder automatically. I believe the delivery argument is a myth.

The best part: Decision making is much easier. "So, your product can't be installed on my own server? Bad luck, I won't become your customer."

> I used Google Fonts for a while, but I find it pathetic to have some font files and some CSS loaded from a different server than mine

You can have the fonts and not have any requests leave your site by using something like this[1]. It downloads the Google font data so you can serve the font files and CSS from your own site.


In theory, the point of Google Fonts is that it does user-agent sniffing to adapt the font and css to the user's browser, to get the best rendering. You would lose that advantage by hosting the fonts yourself.

In practice, I'd set up the CSS such that modern browsers render it beautifully and not give a crap about older browsers. Or make a CSS for older browsers without that font.

That's great, thank you :)

> What would happen to your website (or millions of websites) if one of the CDNs that you rely on started quietly issuing evil code to a few, targeted users? Would you notice? Would your users?

Most of ours would. Subresource Integrity means that all of our Firefox and Chrome users would get mostly blank pages if our CDN tried to pull anything. It's really hard to justify dropping our CDN when they do so much for our load times for people outside of the US (where our servers are located).
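For anyone who hasn't used Subresource Integrity: the page carries a hash of the file it expects, and a supporting browser refuses to run the file if the CDN serves anything else. A rough sketch of how the integrity value can be computed, assuming you have the exact bytes you intend the CDN to serve (the function names and the example URL are made up for illustration):

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a script tag that pins the expected content of a CDN-hosted file."""
    return (
        f'<script src="{url}" integrity="{sri_hash(content)}" '
        'crossorigin="anonymous"></script>'
    )


if __name__ == "__main__":
    # Hypothetical file contents and URL, purely for demonstration.
    body = b"console.log('hello');"
    print(script_tag("https://cdn.example.com/app.js", body))
```

If the CDN changes even one byte, the digest no longer matches and the script is not executed, which is why users would see a mostly blank page rather than tampered code.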

> Ask yourself; what's the profit model?

Well, we pay them, so I kinda thought it was obvious. I suppose they could be selling data as well, but it doesn't seem like a great strategy to endanger so much of their userbase when there's already a clear and profitable monetization model.

> What would happen to your website (or millions of websites) if one of the CDNs that you rely on started quietly issuing evil code to a few, targeted users?

What would happen if EC2 started quietly issuing evil code to a few, targeted users? Would you notice? Would your users?

What would happen if Digital Ocean started quietly issuing evil code to a few, targeted users? Would you notice? Would your users?

Great questions. People should host themselves.

In their own garage too!

> hm, someone should make a simple web app that gathers names and styles of fonts and provides a zip w/ pre-generated CSS

I used http://www.localfont.com to retrieve the Open Sans font I was previously using from Google Fonts. Maybe that's the kind of webapp you are looking for.

What about application performance monitoring? Usual SaaS solutions use tracking solutions similar to Google Analytics (and are blocked by ad blockers).

All third party domains - wonderful :-). Please tell us more - would like to follow suit (am revamping my site. Where I have a different privacy protection strategy - no-one ever visits so no privacy loss)

What are you asking? How to host your own css/jquery etc?

(sorry if I'm an annoying old fart in the rest of this remark but I went back to desktop software 15 years ago (yes that sounds weird) and the last 2-4 years I have more and more trouble understanding discussions on modern web development as everything that was once considered Very Bad is now not only encouraged, but taken as the natural state of affairs - e.g. javascript for core functionality, the 'css is bad, do it in javascript' movement, 'semantic markup should not even be attempted', 'frontend frameworks', ...)

So yeah my question is not sarcastic, I'm just asking for some context.

Well partly I'm asking the same question - what am I missing? CSS and other CDN based stuff I get, but hosting my own analytics? Where does one start - what can be measured, what is worth measuring? What else is done on these 10MB JS downloads we all get these days?

I suppose there is a job of work to be done downloading the top 1 million web sites and seeing what crap comes through the door - but it would be nice to know what the OP is replacing and with what, so I know what is de rigueur these days

Oh OK then we're on the same page and I don't have much advice.

Maybe just that you can look at Piwik analytics - open source and you can host it yourself.

Html/dynamic pages, images, CSS, js, fonts, analytics all from the same server (or at least same domain), and as much squashed together to avoid requests, that was the 'best practice' when I was still 'current'. I don't really understand either what else there would be.

You can certainly obtain this information with your own javascript and record it on your own server. Combine it with your server logs and you can have most of the same information you would have had from GA or another hosted solution.

Agreed! but s/most/all/g FTFY :)

Actually, GA only provides you with a [meager] subset of the data that Google gathers. Now it's their data.. not yours, and not your customers'. They won't even give up IP addresses so that you can check it against your own logs.

Yes, and although that would remedy some of the tracking/privacy concerns with GA, the (grand)parent comment's concern with bandwidth would not be solved but transformed when taking the road you describe.

I have built myself a little rails gem to help with that: https://github.com/KaktusLab/sql_metrics

I've had pretty good experiences so far with Ahoy - https://ankane.github.io/ahoy/

AFAIK, you can even use google's javascript library.

> How can developers make their apps better without having the proper analytics to do so?

Of course. They can just ask people.

Let's not pretend that all this data is for developers. It's only for a) advertisers to shit more on your users, and b) sales to micromanage the site into getting more conversions, usually at the cost of utility.

That's good, you have no business knowing my screen resolution, browser language, or a lot of other information. You just feel entitled to have them, for some unfathomable reason.

Just prepare your web{site,app} and have the webserver serve it to my browser. If it is too large for my screen, my browser has these newfangled things called scrollbars to deal with it, and I will see that my screen is too small to comfortably view content you graciously share with me.

I couldn't agree more. Why can't web developers produce websites that just serve clean, standards-compliant html+css (+/- javascript where really needed -- no, hijacking scrolling is not a good use for javascript) and trust my browser to display it how I like it?

> How can developers make their apps better without having the proper analytics to do so?

The same way as desktop developers do. Test things yourself and get your mother (or some non-techy person) to try it out and see how much she/he swears when attempting to use it.

> The same way as desktop developers do.

Well, did - I regret to tell you that desktop developers are now using analytics as well, using stuff like DeskMetrics and Trackerbird.

As someone mentioned below, there are self-hosted alternatives to hosted analytics tools, like the free software project Piwik. Nothing in the app development process requires that you send your customer data to a third party for collection and processing.

Store that info and bundle it into the next request.

Why don't they proxy host client-side analytics on their side, and don't put the word analytics in the endpoint names?

Most ads would be unblockable if you made the ads come from your domain and have them indistinguishable from your normal content in the URIs as far as I know.

IMO they will do this out of necessity soon enough.

This will mean advertisers can't count exact hits to their ads (or at least would be foolish to do so) so they will probably have to employ some kind of web crawler or HIT services to randomly sample the sites who are supposed to be serving their ads to make sure they are.

But eventually the blocker technology will become much better at blocking page element ads more easily and automatically. Then I guess they will have to think of something else.

This could be a huge opportunity for someone like Cloudflare. They could proxy agreed upon URLs to advertising networks and everything would originate from the same domain. Advertisers would establish a business relationship w/ Cloudflare and could trust that the traffic they are receiving is more or less legitimate.

Analytics are different from ads. When you're running analytics on your own site, you can trust your own logs. But when you are displaying adverts, the ad buyers will likely want better proof that their ad was displayed 'n' times.

They should crawl the website x times a day (as a sample) to check if you are displaying their ads. Or release a binary blob so that you can host their ads on your server (which reports back to their ad network once per hour)

Do you really think this is a feasible solution?

What are your objections?

Honestly, I'm tired of pointing out to people how superficial their understanding of certain topics is and how ridiculous their "solutions" sound.

I suggested this the last time this issue came up [1]. In the case of ads, some people are saying that it becomes a problem for advertising networks to actually verify the traffic if the connection is proxied, though I think there can be solutions developed for this given the enormous amount of money being lost. As far as analytics, this shouldn't be an issue because the primary recipient of the data is the site itself, and cheating on their own analytics would only make sense for a couple of applications (trying to sell the site based upon fraudulent analytics etc).

[1] https://news.ycombinator.com/item?id=10185256

I may be misunderstanding this but if my users download a pixel/whatever from my site not from google analytics, then google does not know anything but my server IP

Or does client side mean the user downloads JS which does some investigation and reports back?

Maybe the people purchasing ad space wouldn't be able to verify impressions?

How about static ad-pictures served from the ad-network's server? The ad network can track the impression. That's how it all began back in the nineties, and it worked fine. Everyone would be happy.

You can't read their Google Analytics cookies in your server logs though.

For clarity, as some replies seem to have missed the point: because conforming clients will not send the cookie in the origin server request due to domain name differences.

Most web servers can log cookie data or even arbitrary HTTP headers, you just have to modify the log format.

The point is that third-party cookies will never reach your server (in this case Google Analytics' cookies), because the request they are piggy-backed on is going to Google, not to your server.

Unless you patch the web server of your choice.

Analytics JS is not pointless because it is part of the page/service that the visitor is requesting. It is a legitimate form of data logging just like Apache or Nginx logs and reading cookies. Logging is a necessity for troubleshooting and improving any software service.

It's no more pointless than using Adobe Typekit for fonts or pulling jquery from a CDN. Both of these use a lot more bandwidth than an analytics ping, BTW.

The point you are missing is that users don't need to send an analytics ping for the page to load. If you were to conflate your jquery load with analytics so that the page was broken, then you could prevent the use of ad-blockers (although you're just as likely to make people think your site is just broken).

The use of bandwidth is a moot point. If advertisers hadn't abused the user's good will we wouldn't be where we are today, but there's no putting the genie back in the bottle.

If we're talking about "need" then the conversation gets sticky pretty quickly.

Users don't really need webfonts; text just needs to be readable. And they don't need to grab jquery from a CDN; it can come off the server. Heck, they don't need to grab anything from a CDN--the app server can serve images just as well as Cloudflare.

Do users really need DDoS protection? No, that protects the server, not the user. And it banks their legitimate request--needlessly--through a proxy server, which certainly adds latency. Not to mention that Cloudflare can see everything they're browsing.

Do users need single-page JS apps at all? Why not just render HTML4 on the server like in the good old days?

So why do developers use these technologies? Because it makes the user experience better. And so do analytics. Without analytics, developers are flying blind. And no, Apache/Nginx logs don't capture the same data--especially with more advanced JS-heavy sites.

Users want websites that load fast and are easy to use. It is impossible to build or improve such a site without data upon which to base decisions. That's why analytics are a necessity.

It gives you a bit of extra data. I do use it on my site and all my clients' sites, but in reality it's actually pretty invasive privacy-wise.

That's not a valid comparison. If your site needs a custom font, the users have to download it from somewhere - the data needs to be sent. For analytics, the HTTP request can be dropped entirely, since the data can be extracted from the web server logs.

> For analytics, the HTTP request can be dropped entirely, since the data can be extracted from the web server logs.

No, it can't, at least not in general. That's what others here are trying to explain to you. It can be very useful, and in both the visitors' and the host's interests, for someone operating a site that has a lot of client-side interactivity to see what's really going on, for example.

> software that rely on clients to make pointless HTTP requests

Somebody please kill HTTP-DASH

Yes, but you can't then "connect" this information using a cookie from another domain, like e.g. Facebook is doing with their embedded like-buttons.

This is why I use server side analytics almost exclusively. You just can't rely on every browser running all your javascript.

With ad blockers, user tracking gets harder and harder, and this sucks when you want to understand the effects of changes on your own website, such as changing user menus, moving content blocks from one position to another, or even to a different page, etc. The value of partnerships with other websites gets harder to evaluate too.

Hosting my websites on Linux environments, I never stopped using server-side generated statistics (based on Apache's access logs) with tools like AWStats. Used together, I feel that client and server-side stats give a much better image of what really happens on my websites.
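For anyone curious what the server-side half boils down to, here is a rough sketch of counting page views from access log lines. It assumes Apache's "combined" log format and only counts successful GET requests; it is nowhere near what AWStats does, and the regex is my own simplification:

```python
import re
from collections import Counter

# Simplified pattern for the Apache "combined" log format (an assumption
# about how the server is configured; adjust it to your own LogFormat).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_page_views(lines):
    """Count successful GET requests per path, skipping unparseable lines."""
    views = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match and match["method"] == "GET" and match["status"] == "200":
            views[match["path"]] += 1
    return views
```

Feeding it the lines of an access log (for example via open(path) on a hypothetical /var/log/apache2/access.log) gives a Counter of paths, which is the raw material for the usual "top pages" report.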

But in the end, I feel awstats isn't enough, and I'll take a look at other solutions that have been pointed out in these comments. Thanks!

Is Apache reporting back to the apache foundation about how you are using your apache software? Maybe your expectations about user tracking are the problem.

I'm pretty sure they aren't, but I must admit, I have never checked. I suppose that would be known by now... One thing is sure, access logs are located on my own VM, and are processed locally by awstats. The only thing that binds requests to an actual user is the IP address, and one must contact ISPs to have more information. I don't have the authority to do that.

What interfaces akin to Google Analytics read web server logs?

Piwik can. Run a self-hosted instance or pay them to run it, and it handles server logs no problem.

You do give up the ability to get live stats, but you get better performance and the ability to track more visitors (read: people like me who have blocked GA for years...)

Added bonus: Referrer spam is automatically blocked by default.


EDIT: Almost forgot GoAccess, if you are okay with a terminal app and want live stats (can also be scripted to generate HTML reports) - http://goaccess.io/

Piwik is astonishingly good. It'll give you trouble if you have a lot of traffic, but for moderate sites it's pretty much a completely self hosted drop in replacement for Google Analytics. You can track conversions, set up campaign goals and view statistics overlays for example. Not all features are available when using the log import backend, as compared to the javascript thingy, but most are and it'll give you an overview on how your ad blocking customers are doing.

Quick note that with hourly, or more frequent, cron jobs you get quasi live data. Mentioned a little about logimport in a recent post on stripping external calls and pointless js/css/font loads from our site. Otherwise well put, same experience here.

Btw, tail -f /log.log is always fun for live data...

Yeah, I built a system in the late 1990s that did something like that. It required a dedicated server though: the web server farm would copy over their log files to this analytics machine with cron jobs, which would then dynamically aggregate that data and update the stats cache.

Charles, Snipcart co-founder here. As mentioned by the end of the post, we're actively looking to expand our analytics game a bit. Piwik seems like a good place to start, so thanks for sharing. Cheers!

Try running it alongside GA and comparing stats. I did this for a while and switched to Piwik as it consistently counted more visits (legitimate visits, obviously).

Also, please consider adding your Google Analytics use to your privacy policy. They require it, even though most people ignore this requirement and there is no enforcement.

Side note, I just tried @-replying to your company's post on Twitter and got a 'your account may not be allowed to perform this action' error: http://i.imgur.com/9uoEn9B.png Strange, never saw that before...

We received some mentions in the last minutes, and we just reviewed our account settings, nothing seems problematic on our end. Weird indeed. But thanks for the feedback!

> (legitimate visits, obviously)

That's an important point; I'm seeing referral spam in my GA reports which I never noticed before 2015. I manually discount it when compiling reports, and my understanding is that these spammers hit GA UAs at random without even loading your website.

It's incredibly easy to filter this out in GA. Just set a filter or segment that only counts traffic from your known hostnames.
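The same idea applies if you post-process exported data yourself. A tiny sketch, assuming each hit is a dict with a "hostname" key (that shape is hypothetical, not a GA export format):

```python
def filter_known_hostnames(hits, known_hostnames):
    """Keep only hits whose reported hostname is one of your own."""
    known = {name.lower() for name in known_hostnames}
    return [hit for hit in hits if hit.get("hostname", "").lower() in known]
```

Spam hits that never loaded your site tend to carry a missing or foreign hostname, so they fall out of the result.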

Aside from the ELK stack, you can also send views/events from your server to GA (this is what I do). Staccato[0] is my go-to gem for this

[0]: https://github.com/tpitale/staccato

Keep in mind that you can push stats into GA from the server. I actually made a PoC for pushing server stats to GA - https://github.com/hippich/server-analytics

The same way you can gather all available stats on server side and push these yourself. You can override IP too. Obviously not everything will be available, like screen resolution, but you still can capture a lot of stuff.
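To give an idea of what pushing a hit from the server involves, here is a sketch that only builds the form-encoded body. It assumes the Universal Analytics Measurement Protocol as I remember it (v, tid, cid, t, dp, plus uip and ua as the IP and user agent overrides); check the current documentation before relying on any of these names, and note the tracking ID below is a placeholder:

```python
from urllib.parse import urlencode

# Endpoint as I recall it for the Universal Analytics Measurement Protocol;
# treat it as an assumption and verify against the official docs.
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview(tracking_id, client_id, path, client_ip=None, user_agent=None):
    """Return the form-encoded body for a single pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. the placeholder "UA-XXXXX-Y"
        "cid": client_id,    # anonymous client identifier you assign
        "t": "pageview",     # hit type
        "dp": path,          # document path
    }
    if client_ip:
        params["uip"] = client_ip  # IP override, so the hit is not attributed to your server
    if user_agent:
        params["ua"] = user_agent  # user agent override
    return urlencode(params)
```

The body would then be POSTed to GA_COLLECT_URL with any HTTP client. As said above, client-only values such as screen resolution are simply not available unless the page reports them.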

How do you filter out crawlers, bots, pre-fetchers, etc.?

I've personally never even tried using server-side analytics because I assumed there would be so much noise it wouldn't be worth the trouble.

Every well-behaved bot (ie, almost all of them) has its own user agent string. You can just grep -v those out.
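A rough Python equivalent of that grep -v, assuming combined-format log lines where the user agent is the last quoted field. The keyword list is a guess at common bot markers, not an authoritative list:

```python
import re

# Hypothetical keyword list; real bot lists are much longer.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)
QUOTED_FIELD = re.compile(r'"([^"]*)"')


def drop_declared_bots(lines):
    """Drop log lines whose user agent openly identifies itself as a bot."""
    kept = []
    for line in lines:
        quoted = QUOTED_FIELD.findall(line)
        agent = quoted[-1] if quoted else ""
        if not BOT_PATTERN.search(agent):
            kept.append(line)
    return kept
```

This only catches bots that announce themselves; anything borrowing a browser's user agent string passes straight through.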

I wish that were true, but I'd estimate that less than half of the bot hits on my website properly identify themselves as bots. Most of the crawlers running on DigitalOcean/AWS/etc. seem to use a user-agent string lifted from one of the common browsers.

Check for the order of the headers and the TLS protocols it claims to support – those are useful for identification.

I must be getting old, never? There are a lot of techniques, like doing stats on blank pixels and such, that have been around for a long time. A lot of ad blocking techniques start at the DNS of the hosted resource, so having a single domain for everything (if practical) can present some reliable information.

> Needless to say, we were slightly irritated at the fact that a valuable feature for our merchants, totally unrelated to online privacy issues, was blocked by the software.

Why would the author think tracking/analytics is "totally unrelated to online privacy issues"? Baffling

The information blocked had nothing to do with tracking and privacy. It's a page that displayed the customer's sales performance within the application itself. So it IS "totally unrelated to online privacy issues"

I've a business that relies only on online advertising and directly loses revenue from ad blockers. Never felt irritated about that, it's just fair game.

As a publisher, if you want to be profitable, you have to load a bunch of crap from Google, Criteo and others. And I feel the quality of the JS loaded is slowly degrading. That's a shameful point. Advertising networks should buy the best JS talent and release top quality JS. Until then, people should use ad blockers.

It is not 100% true that publishers can only be profitable using ad networks that degrade quality.

Anecdotal evidence is myself, who recently removed AdSense and only direct-sells ads. The positives of direct selling are that I get 100% of the sale and I have better control over the ads on my site (static image graphics, no popovers or interstitials). The former makes me happy, the latter improves the experience for my readers.

That said, I do use ad blockers from time-to-time myself because some sites have gotten so ridiculous they are absolutely unusable - I'm talking to you Epicurious and Bon Appetit!

Yes, and you also have to manage a team (even if that is yourself) to monetize your ad traffic.

There is no way to scale running your own internal ad network unless you have scores of folks to manage the marketing of your property, the management of your ads, contracts, receipt of payment etc.

I think you meant to say "lose" not "loose".

Because if I understand it, the people who were blocking "api/analytics" were actually paying in order to see the page that was being blocked.

Just because you're handing over money, doesn't mean you are happy to relinquish all of your privacy. Entering a very grey area.

It was not about tracking the users, but about showing tracking data in the dashboard to the users themselves.

Note that the key issue in this story is not the blocking of analytics tools, but that they had a legitimate URL in their app of /api/analytics and it was getting blocked. That's quite a problem, especially as I see things like /js/dart.js also in that list which could destroy an app's functionality if you knew no better.
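To make the failure mode concrete, here is a deliberately simplified sketch of path-based blocking. Real filter lists use a richer syntax than plain substrings, and the rules below are hypothetical stand-ins, but the effect on a first-party endpoint is the same:

```python
from urllib.parse import urlparse

# Hypothetical substring rules; these are NOT copied from any real filter list.
RULES = ["/api/analytics", "/analytics.js"]


def is_blocked(url, rules=RULES):
    """Return True if the URL's path or query contains any rule substring."""
    parsed = urlparse(url)
    target = parsed.path + ("?" + parsed.query if parsed.query else "")
    return any(rule in target for rule in rules)
```

Under these rules a dashboard call to https://app.example.com/api/analytics/sales is blocked even though it never leaves the app's own domain, while the same data served from /api/reports/sales would load fine.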

Worth having a look at https://easylist-downloads.adblockplus.org/easyprivacy.txt

There are some possibly problematic blocks with the words 'analytics', 'log', 'event' that might be used in a log viewer, for example.

Also worth noting that I don't think EasyPrivacy is on by default in uBlock.

>Also worth noting that I don't think EasyPrivacy is on by default in uBlock.

It is. See the readme: https://github.com/chrisaljoudi/uBlock or https://github.com/gorhill/uBlock

Interesting list. Grep on `.cloudfront.net`, hopefully AWS don't reuse these.

They don't. Same origin policy, cookies etc.. You just don't.

Google recommends against hosting their tracking javascript locally, but that in combination with a server-side proxy (POSTING to https://example.com/a forwards to https://analytics.google.com/collect) might be the most resilient to ad blocking techniques.
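A sketch of the forwarding half of such a proxy, assuming the first-party endpoint simply relays the POST body unchanged. The upstream URL is the one given above; the helper name and the choice to pass along only the user agent are my own assumptions:

```python
import urllib.request

# Upstream URL as written in the comment above; not verified here.
UPSTREAM_URL = "https://analytics.google.com/collect"


def build_forward_request(body: bytes, user_agent: str) -> urllib.request.Request:
    """Prepare (but do not send) the upstream request for a relayed hit."""
    return urllib.request.Request(
        UPSTREAM_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": user_agent,
        },
    )
```

The handler behind https://example.com/a would pass the result to urllib.request.urlopen. From the browser's point of view only a same-origin POST happened, so domain-based filter rules have nothing to match.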

Wow. This would be great; once I have a few examples of this kind of behavior, I'm certain I can get a lot more people to turn off Javascript by default. (I know quite a few people that don't want Google logging their web browsing and who take active steps to prevent it)

This is another "tragedy of the commons" situation, just like what we've seen recently in advertising, where escalating behavior forced a backlash. If you want to prevent a future backlash against analytics, stop aggregating the data.

Serious question that I haven't googled, but for analytics logging what should a site administrator use that anti-tracking folks would be happy with?

Others here have suggested "processing the server logs", but is there some sort of locally hosted thing I can add that will help me get the same stats google analytics does? Or are there any third party hosted analytics services that provide similar services without being aggregated?

(my apologies for a late response)

> third party hosted

That is the exact thing that needs to be avoided. By using such a service, you are allowing that service to aggregate browsing logs.

It doesn't matter if any particular site logs its OWN requests; it is expected that if I ask you for a page, you (as the 2nd party) may choose to remember that transaction. Without aggregation, any service only knows about the people that choose to interact with that service. This mirrors fairly closely traditional expectations where e.g. a shopkeeper knows that you walked into their shop, but most people would find it more than a little creepy if that same shopkeeper allowed a 3rd party to keep detailed notes about their customers.

The problems start when you decide to let other people eavesdrop on what should be a two-party transaction, especially when they have access to a lot of these interactions. By aggregating logs, the knowledge about someone changes from knowing that they used a particular service, to knowing their pattern-of-life[1] (and more).

> what should a site administrator use

I'm truly sorry that there are not a lot of options (that I know of) for better server-log analysis. This area has suffered a lot of damage from the Service As A Software Substitute[2] monopolies.

I suggest pressuring vendors for better analytics software. There may be a market for better local-only, no-services-involved server-log analysis tools. Until such tools exist (or are found), you're in a hard place, because lack of tools is not justification for betraying the activities of your users to a snooping 3rd party.

[1] https://en.wikipedia.org/wiki/Pattern-of-life_analysis

[2] http://www.gnu.org/philosophy/who-does-that-server-really-se...

Nothing that both parties would be content with.

Yeah, this feels like the next step in the arms race. Just use some random URL to post this to, to avoid any regex based ad blockers.

We will then have rules based on file hashes. People will obfuscate them, so it'll now be based on function signatures and API calls (get locale, screen size, OS, set cookie? No network requests for you). When that's bypassed, maybe we will see DPI of network requests for patterns.

The arms race is only going to continue if trackers play the game. I think we'll probably see server side analytics instead, with Google and so on making Apache/nginx/express modules and middleware.

Good. I would like to encourage people to move to a whitelist based approach to filtering javascript...

Right. Time to dump Google Analytics, which is now heavily blocked, and start processing your server logs.

You can dump events at Google Analytics, too, and leverage their infrastructure (instead of expensive options around handling your own).

Perhaps the SaaS industry could use its cloudy-buzzword momentum to rage against the ad industry which necessitates the use of these plugins. I really have no problem with most ads. It's the one too-invasive full-page flyover or auto-playing video which ruins it for everyone else. If the ad industry was sensible and moderated itself, there would be far fewer ad blockers out there.

But it's not just ads. I don't want Google Analytics putting cookies on my machine and then tracking every other web site I go to. It's a privacy issue as much as an annoyance at ads.

It's instructive to play with NoScript, Lightbeam and Cookie Controller in a fresh VM, connecting through a fresh VPN exit.

I think it was definitely a factor in Google's early success that they were subtle and unobtrusive about the way they did ads.

Google ad servers also power a lot of the worst formats today, just not on Google properties.

A quick scan of the EasyPrivacy list reveals partial or full blocking of the JavaScript trackers and/or event collection endpoints of the following YC-backed SaaS analytics companies:

  - Mixpanel (17 matches)
  - Heap (heapanalytics.com^$third-party only)
  - Segment (11 matches)

Disclaimer: co-founder of Snowplow Analytics, a first-party event analytics platform (https://github.com/snowplow/snowplow). I see 2 entries in the list related to Snowplow, and 26 for Piwik (https://github.com/piwik/piwik), another first-party solution.

That EasyPrivacy apparently even blocks self-hosted first-party solutions is really bad. They are the good guys.

Well, any blocking of self-hosted first-party analytics is easily circumvented by the site owner: you just rename the JavaScript Tracker filename to something new (or even safer, just minify the code into your own JS bundle), and put a new CNAME on your event collector.

For SaaS analytics companies, take a lesson learned from online advertising: host your corporate site and dashboard on a different domain than your ads/analytics pixels are hosted on. That way if your domain inevitably ends up on ad block lists, your corporate website and dashboard still work.

I'm working on a collaboration tool that uses the Google Drive APIs, and we occasionally get e-mails from people claiming that the product is broken, but in 99% of the cases they have installed Disconnect or an overly zealous ad blocker that explicitly blocks access to the Drive APIs. I assume this is just a blanket block on anything being loaded from Google domains using Ajax.

Even though the error message suggests that they blocked it themselves, somehow that thought doesn't come across to users before they start complaining that our product is broken. Given that Google loads its APIs mostly in the background through several chains of dynamic JS, there isn't much to do about this, really.

I've always used both server-side analytics and JS-based analytics. Some external analytics tools have long been on EasyPrivacy's blacklist, and the constant addition and removal of those services from that list means sporadic stat charts.

The only problem with server-side analytics is that they're pretty limited in functionality unless you have your own internal analytics system which can track a lot more information than just page views and general traffic stats.
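
To make "just page views" concrete, here is a minimal Python sketch of what you get out of plain access logs. It assumes the common Apache/nginx "combined" log format; the `LOG_LINE` pattern, the `page_views` helper, and the sample lines are illustrative assumptions, not taken from any particular tool.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def page_views(lines):
    """Count successful GET requests per path; unparseable lines are skipped."""
    views = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("method") == "GET" and m.group("status") == "200":
            views[m.group("path")] += 1
    return views


# Hypothetical log lines using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [29/Oct/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [29/Oct/2015:10:00:03 +0000] "GET /pricing HTTP/1.1" 200 2048 "https://example.com/" "Mozilla/5.0"',
    '203.0.113.5 - - [29/Oct/2015:10:00:07 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [29/Oct/2015:10:00:09 +0000] "GET /missing HTTP/1.1" 404 312 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for path, count in page_views(SAMPLE_LOG).most_common():
        print(path, count)
```

Everything available here comes from the request itself (path, status, referer, user agent). Screen resolution or in-page events never reach the log unless the client sends them explicitly.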

This makes me wonder at the next generation of Web-based spyware.

Presently, surveillance and monitoring are accomplished through third-party requests. Including, yes, sites' own monitoring tools -- Google Analytics, New Relic, and related services.

If the monitoring can be brought in-house, and assembled back-end on the server side, so can the advertising. Which means that the present generation of site-and-domain based blocking will eventually become less effective.

Have fun storming the castle, kids.

They already go beyond site-and-domain based blocking. Most Adblock extensions support CSS-based rules that will block based on CSS IDs, class names, div-sizes (468x60 is a common ad size) or more complex rules.

For the sites that can't be blocked with those, there are even more advanced blockers via Greasemonkey/Tampermonkey JS scripts.
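
A toy Python sketch of the element-level matching described above, for anyone who hasn't looked at how those rules behave. Real extensions express this as CSS selectors evaluated by the browser; this is a plain-Python stand-in. The `Element` class and the rule sets are hypothetical, with 468x60 taken from the comment above and the other two sizes being other common banner sizes.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Minimal stand-in for a DOM element (hypothetical, for illustration only)."""
    tag: str
    id: str = ""
    classes: frozenset = frozenset()
    width: int = 0
    height: int = 0


# Hypothetical rule set; real filter lists are far larger and use CSS selectors.
BLOCKED_IDS = {"ad-banner"}
BLOCKED_CLASSES = {"sponsored", "ad-slot"}
BLOCKED_SIZES = {(468, 60), (728, 90), (300, 250)}


def should_hide(el: Element) -> bool:
    """Hide an element if its id, any of its classes, or its size matches a rule."""
    return (
        el.id in BLOCKED_IDS
        or bool(el.classes & BLOCKED_CLASSES)
        or (el.width, el.height) in BLOCKED_SIZES
    )


if __name__ == "__main__":
    page = [
        Element("div", id="ad-banner"),
        Element("div", classes=frozenset({"post", "sponsored"})),
        Element("img", width=468, height=60),
        Element("article", id="content", classes=frozenset({"post"})),
    ]
    for el in page:
        print(el.tag, el.id or "-", "hidden" if should_hide(el) else "shown")
```

None of these checks care which domain served the element, which is why moving ads to a first-party host does not defeat them on its own.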

I actually remember the _old_ regime of userContent.css with hacks for specific banner sizes and such.

I've found a set of rules which are quite helpful at removing / reducing online Web annoyances myself, including killing variants of interstitials, popups, and flyovers. To the extent I'd strongly recommend current Web devs avoid use of those and similar terms in their own CSS.

(ProTip: if you're calling an element "nag" or "tease", it probably shouldn't be there in the first place.)

Probably a good idea to use one of these on your SaaS portals for safety: https://github.com/nicjansma/adblock-detector.js , https://github.com/sitexw/BlockAdBlock

What is the best first-party web analytics product?

Piwik (self-hosted version); it is pretty similar to Google Analytics.

Depends what you mean by "best". As suggested, Piwik is good. There is also Snowplow.

Is one way around this to have analysis packages on my site? Is there now an "upload your raw stats and get a nice analysis" SaaS?

I run uBlock Origin and I block third party cookies. Because screw trackers and spammy ads.

Unfortunately you're blocking people self-hosting piwik.js too.

Why are first-party analytics like Piwik blocked? They are the good guys... it's like reading the server log, except you get also the screen resolution and a few more infos.

I don't mind if a website owner gets those stats for himself. What I (and a growing number of people) mind is that huge corporations collect those stats and then track me down and crunch the data to show me "personalised ads" (for products I already bought), categorise me as a member of interest group XY, and then sell that data, including my email address, to evil spammer YZ.

> except you get also the screen resolution and a few more infos.

It is quite the sense of entitlement to think you have a right to that information. You have a right to log what people ask you (the server logs); logging anything more just makes you a creepy peeping-tom.

> I don't mind

That's nice. That doesn't mean everybody else agrees.

You're right, in general - it is the aggregation at Google (et al) that is the real problem. My point is that it is highly presumptuous to assume everybody is ok with a particular type of logging.

Well, isn't that a bummer. I'd block it manually if it weren't blocked automatically since it uses my resources (including, but not limited to, simply requesting and downloading the file) to do things the server ought to be doing at its own end. And no, you don't need to be more granular than that.

That comes across as quite an entitled attitude. It's not really using your resources, that's the package deal for the site. And there are plenty of reasons for granular data - or do you claim to know every use case and rule over the internet?

There is no package deal for websites. I know website owners really wish there was, but it's just that - wishful thinking.

Maybe for sites which require registration (and hence explicitly accepting terms) that could be argued, but there's certainly no implicit agreement to download and process all the stuff the site is offering to the browser.

And "entitled" is just a lazy insult.

Shit, do you block Mustache/Handlebars too because it's using your resources to render a template? That's quite a reductive argument - you get to the point where you don't turn the computer on because it's using your electricity.

You know, quite a lot of us do use NoScript, so that reductio ad absurdum is not really that effective. As for using electricity, that's why stan_rogers added the caveat of "do things the server ought to be doing at its own end".

Yes, please! voltagex_, do yourself and webmasters a favor and stop using a computer!

It's good to be proactive about such things as a developer. But it's also worth considering that if you're using things that take such a zealous approach to blocking, you are trading some of the UX of well-behaved sites and apps to deal with the poorly-behaved ones.

Unfortunately the well-behaved sites are one in a thousand. To cater for those you need to expose yourself to a wide range of threats and spies. Not to mention making your eyeballs bleed and your sites perform so poorly it isn't funny.
