The scourge of web analytics (stavros.io)
160 points by stavros on May 29, 2017 | hide | past | favorite | 79 comments



Disclaimer: I'm a bit of a GA power user; although being on the technical side, I'm not really sympathetic towards the marketing uses for analytical data.

There's a tough line to draw between excessive and sufficient instrumentation for a given app, and a business will probably opt for more than less data. I'm not sure that's a moral argument against tools like GA, though.

Especially given one particularly compelling feature of Universal Analytics (GA's latest incarnation): you can definitely water down what you collect. (https://developers.google.com/analytics/devguides/collection...)

Want to opt out of collecting advertising info? Say so in the UI. Anonymize/obliterate the IP/geolocation info? Override the IP field to a static value. Override the intrusive collection of User-Agent values? Override the ua parameter. Prevent collection of granular page info? Override the dl, dp, dt fields. Stop collecting referral info? Blank out the dr, cs, cm, cn, and remove UTMs/GCLIDs from the URLs you report.
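
As a rough illustration of the last point, here is a minimal sketch of stripping campaign parameters from a URL before reporting it. The helper name `stripCampaignParams` is hypothetical, and it assumes "UTMs/GCLIDs" means any query parameter starting with `utm_` plus `gclid`:

```javascript
// Hypothetical helper: remove campaign identifiers from a URL before it is
// reported to an analytics service. Assumes "UTMs" are utm_* query parameters.
function stripCampaignParams(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first so that deleting does not disturb the iteration.
  for (const key of Array.from(url.searchParams.keys())) {
    const lower = key.toLowerCase();
    if (lower.startsWith("utm_") || lower === "gclid") {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

console.log(stripCampaignParams("https://example.com/bikes?utm_source=news&id=7&gclid=abc"));
```

The cleaned URL is what you would then pass in the dl field instead of the raw location.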

In fact, you can even implement measurement protocol serverside – GA's ancient forebear, Urchin Analytics, actually began as a server-log parsing utility! (https://urchin.biz/urchin-software-corp-89a1f5292999)

Reduce JavaScript use altogether but keep it clientside? Implement a static GA pixel:

  <img src="https://google-analytics.com/collect?v=1&tid=UA-XXXXX-Y&cid=somevalue&dp=%2Fsomepath&ua=YourUserAgent&...">
Now, mistrust of a third party with said data is a relevant issue, but slightly different from the technical bones of this piece's argument.
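
For what it's worth, the pixel URL above can be assembled programmatically. This is only a sketch: the function name and argument names are made up, the host is the one from the snippet above, and the `t=pageview` hit type is added on the assumption that the endpoint requires a hit type, which the truncated example does not show:

```javascript
// Hypothetical builder for the static pixel URL shown above.
// v, tid, cid, dp and ua come from the snippet; t=pageview is an assumption.
function buildPixelUrl({ trackingId, clientId, path, userAgent }) {
  const params = new URLSearchParams({
    v: "1",
    t: "pageview",
    tid: trackingId,
    cid: clientId,
    dp: path,
  });
  // Only override the User-Agent when the caller explicitly asks for it.
  if (userAgent !== undefined) params.set("ua", userAgent);
  return "https://google-analytics.com/collect?" + params.toString();
}

console.log(buildPixelUrl({ trackingId: "UA-XXXXX-Y", clientId: "somevalue", path: "/somepath" }));
```

URLSearchParams takes care of percent-encoding, so `/somepath` comes out as `%2Fsomepath`, matching the hand-written version.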


> Anonymize/obliterate the IP/geolocation info? Override the IP field to a static value.

This might affect the IP you see in the GA dashboard, but it's worth noting that Google still gets the user's IP as long as the client makes any connection to a Google server (via JS or pixel).

> you can even implement measurement protocol serverside

This is the only way for a site owner to truly get full control over what data you send to GA (while still using GA), but it robs the site visitor of the option to opt out using client-side tools like uMatrix, &c. This is the one small advantage of the rise of client-side tracking: as a visitor I can get a reasonably good overview of which third parties you're sending my data to.


Luckily, if you care about letting the user opt-out, you can respect DoNotTrack: http://donottrack.us/
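
A server-side check could look roughly like the sketch below. It assumes the browser signals the preference with a `DNT: 1` request header; the function name and the plain-object shape of `headers` are illustrative only:

```javascript
// Hypothetical check: has the visitor asked not to be tracked?
// Assumes request headers arrive as a plain object of name/value pairs.
function visitorOptedOut(headers) {
  for (const [name, value] of Object.entries(headers)) {
    // Header names are case-insensitive, so normalise before comparing.
    if (name.toLowerCase() === "dnt") {
      return String(value).trim() === "1";
    }
  }
  // No DNT header at all: the visitor expressed no preference.
  return false;
}

console.log(visitorOptedOut({ DNT: "1" }));
```

You would call this before forwarding anything to an analytics backend and skip the hit when it returns true.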


That's a very interesting notion, I didn't know you could use a GA pixel. That's a much better alternative to the full-blown JS tracker, thank you.


What about running your own service like piwik?


That would be best, I tried to run it for me but my needs are too simple for a full-blown solution like that. I'm currently trying to just add a tracking image to my own server so I can count visits.


Server logs?


I'm using netlify, which is a global CDN, and doesn't provide logs :/


Curious how that pixel works... do you have to mouse over it? If so, that seems like a bad approach/design: one pixel versus 1366x768 (at least). I realize why the code sample you posted isn't complete, but I wonder whether it has bound JS or how it works technically (it's all in the source)... do you still have to include a script? I should just Google it haha.


Think it's been answered, but the link below might provide more information on tracking pixels/beacons.

https://en.wikipedia.org/wiki/Web_beacon


Thanks for that, I've seen/heard of tracking pixels but haven't tried them yet.


Google Analytics just allows you to GET a URL to send an event, which that tracking pixel does. Interesting that it actually returns an image, though, I didn't know that.


So that runs once when the page loads? It's not a continuous/real time tracker? Feel free not to respond as I said I can just Google. Thanks though.


Nope it will fire again after 30 minutes but not continuously


I don't think the web pixel will, that just runs on page load, unless you have some fancy JS to do other things (but then you could just call the event directly).


Yeah, you are right, sorry. That's the session that finishes; I had it configured a different way on a certain site.
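
To make the session idea concrete, here is a small sketch that groups hit timestamps into sessions, using the 30 minutes of inactivity mentioned above as the cutoff. The function name and the choice to treat a gap of exactly 30 minutes as the same session are assumptions:

```javascript
// Inactivity window after which a new session starts (30 minutes, per the thread).
const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

// Hypothetical helper: count sessions in a list of hit timestamps (milliseconds).
function countSessions(hitTimestampsMs) {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...hitTimestampsMs].sort((a, b) => a - b);
  let sessions = 0;
  let lastHit = null;
  for (const ts of sorted) {
    // The first hit, or a hit after a long enough gap, opens a new session.
    if (lastHit === null || ts - lastHit > SESSION_TIMEOUT_MS) {
      sessions += 1;
    }
    lastHit = ts;
  }
  return sessions;
}

const minute = 60 * 1000;
console.log(countSessions([0, 10 * minute, 50 * minute]));
```

The pixel itself still fires only once per page load; the session is something the analytics side derives from those hits.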


Interesting for my own tracking I was thinking the next logical step might be to use websockets but I'm not sure.


I’ve been making web apps since 2003, which means that I’ve been doing this for fourteen years now, or it means that I can’t count. So, there are few people more qualified than me to tell you this:

The web is shit.

I've been doing this since 1996. I watched the transition from server-side analytics (one of my first jobs was implementing access log collection and analysis for a decently sized group of sites) to client-side.

I remember solutions like analog and on up to Urchin and then Urchin's transition to client-side and finally its acquisition by Google.

By 2000, anyone paying attention knew it would not end well.

So here we are, bloated web. I'll be mildly amused if sites are forced to return to server-side log collection because enough folks have started using tracking blockers.

Everything old is new again.

https://en.wikipedia.org/wiki/Analog_(program)


> I'll be mildly amused if sites are forced to return to server-side log collection because enough folks have started using tracking blockers.

That's already available as a service from Cloudflare.[1] Since so many sites go through Cloudflare, they know what many large sites are doing.

[1] https://www.cloudflare.com/analytics/


I hope to have time to work on a server-side ad + analytics framework, purely as an exercise. Might work as a business as well, as a "boutique" version of an ad network for blogs and businesses who care about privacy/not shitting in their users' browsers. I noticed this kind of small network with "premium"/"nice" ads a couple of years ago, but I can't remember any names. I really liked the idea tho.


That's an idea I've had for a long time actually, but never made anything of it. Running ads server-side, maybe especially that the server can preload ads so there is no overhead for the actual users in loading the ad. That seems like a good idea to me.


You can have a static button for "Share" on Facebook. I'm currently using static image buttons that allow you to Share on Facebook, Tweet on Twitter, Share on Google+, Share on LinkedIn and Share on StumbleUpon (should probably remove that one) on my personal site: http://johnhaller.com/

The code for Share on Facebook is (hoping HN doesn't mangle this):

<a href="https://www.facebook.com/" onclick="windowpop('https://www.facebook.com/sharer/sharer.php?u=', 500, 500); return false;" style="padding-right:15px;"><img src="/path/to/your/FacebookShareButton.png" width="57" height="20" alt="Share on Facebook" title="Share on Facebook"></a>

I've been using static social buttons for a while now on my personal site as well as PortableApps.com to improve page load speed and user privacy. I've been building web apps for 20 years since the original Active Server Pages 1.0 and 2.0, so there are few people more qualified than me to post this comment :)


Thank you, but you forgot to include the "windowpop" function :)

I'd love it if I can replace both the Facebook button and GA on my site with something static, thanks.

EDIT: Done, the URL is:

https://www.facebook.com/sharer/sharer.php?u=<url>
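
One detail worth sketching: the page URL should be percent-encoded before it is appended, otherwise its own query string can get mixed up with the sharer's. The helper name below is hypothetical; the endpoint is the one quoted above:

```javascript
// Hypothetical helper: build a static Facebook share link for a page.
function facebookShareUrl(pageUrl) {
  // encodeURIComponent escapes ":", "/", "?", "&" and "=" so the whole
  // page URL travels as a single value of the "u" parameter.
  return "https://www.facebook.com/sharer/sharer.php?u=" + encodeURIComponent(pageUrl);
}

console.log(facebookShareUrl("https://example.com/post?id=1&x=2"));
```

The result can go straight into a plain link's href, no third-party script needed.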


Ah, quite right, sorry...

  <script type='text/javascript'>
  function windowpop(url, width, height) {
      var leftPosition, topPosition;
      //Allow for borders.
      leftPosition = (window.screen.width / 2) - ((width / 2) + 10);
      //Allow for title and status bars.
      topPosition = (window.screen.height / 2) - ((height / 2) + 50);
      //Open the window.
      window.open(url+document.URL, "Window2", "status=no,height=" + height + ",width=" + width + ",resizable=yes,left=" + leftPosition + ",top=" + topPosition + ",screenX=" + leftPosition + ",screenY=" + topPosition + ",toolbar=no,menubar=no,scrollbars=no,location=no,directories=no");
  }</script>


You can post content verbatim by indenting it by 2 or more spaces (https://news.ycombinator.com/formatdoc)


Thanks, updated.


Thank you, unfortunately HN eats that. Here it is, better formatted: https://www.pastery.net/fwytsv/


Genuine question since you mentioned probably removing StumbleUpon. Does anyone share with Google+?


It depends on your audience. There are more tech communities on Google+ than any other in my experience.


Ah yeah for sure. I guess I was specifically asking for you. That's why I didn't want it sounding like I'm just dismissing Google+. If you don't want to share, that's fine.


Pop-ups? Don't be this guy.


I could never understand why any ecommerce company would use GA. They are collecting your users' data for ad targeting. Well, guess what? They target ads to your users for competitors' products.

A user browses a site that has GA on it looking to buy a bike, then that user gets hit with bike ads on other sites that host Google ads. Almost every site owner I've spoken to about this issue was unaware of it.

Server side analytics prevents this, or use an analytics package that has no advertising business.


99% of ecommerce folks are not at a scale where this would affect anything, and even then Google display ads are less than 5% of the usual online marketing mix. It's well worth the tradeoff for the power and flexibility of GA.


Why would scale matter?


As a small company, you are barely affecting Google's targeting algorithms on your own.


They target individual users based on their browsing history. Scale doesn't matter.


The benefit from using GA is enormous and the cost to implement is near-zero (for the company). The benefits you get vastly outweigh any potentially lost revenue.


Is that the main problem with GA or are there other problems? I have been thinking of a Google problems mailing list for a while now.


I fully agree on the social buttons but Google Analytics is so much more than just a statcounter. Goal tracking and segmentation are invaluable tools that I believe no other tool provides.

And before you complain that 'spying on your users isn't needed to make money', try working at a large corporation. Things just don't work the way you (ideally) want to. You can only say "no" so many times before someone else steps in and does it for you.


My point wasn't so much "never use Google analytics", it's more "don't use Google analytics if all you need is visitor counts and referrals", though.


If spying on your users is needed for a company to make money... maybe that company should shut down, as obviously its products/services are not needed by society anymore?

I mean, something else really wrong must have happened in core business for this to be an actual issue.


I did not imply that large corporations solely exist because of using Google Analytics to gain insight into the value of their marketing efforts.

My point is this. If your company spends a couple million on online marketing / year, your boss will want to know if you're spending it wisely. And if it's your job to spend it wisely, effectively and report the results, good luck doing so without using tools like Google Analytics.

But regarding your opinion: I happen to know a couple of SMB's that do rely on online marketing to sell things they would not sell otherwise. They are good businesses with great customer service and experts in their field, but without the reach of online marketing they would indeed cease to exist, as their local market is just too small and their margins are under pressure from large retailers like Amazon.

Their margins don't allow for ineffective marketing campaigns and without insight they would fly blind. Are you saying they should just give up and leave the market to behemoths like Amazon? That's just stupid.


I appreciate the fact that you clarified your point of view, I clearly misunderstood.

There are a few things I would disagree with regarding the SMB behaviour you describe (and I know a few of those myself).

I believe they shouldn't try to play in the same championship as Amazon-like behemoths. As you correctly said, they are usually short on margins and consequently on marketing budget. Investing and trusting in marketing, even though not blindly, may not be the best move, as they will probably never be able to compete with the behemoths' prices/services. Which means you're already targeting a different market anyway. Great customer service and being experts in their field is where they should probably bet first in order to differentiate: big corps don't have this. And they don't even care. This will most likely contribute to sustainable growth.

This said, I think it's pretty clear now what I think about behemoths ;) (I liked the term)

And I have nothing against GA either.

(Also, sorry for the delay on the reply)


Is GA 'spying on users'? If so.. you should never go anywhere in public, never go shopping at a grocery store, etc. They all record everything that happens and analyze it to get similar amounts of data. Your bank tracks you way more (and sells all your data!).


> Your bank tracks you way more (and sells all your data!).

In the US they might. It's not true everywhere.


Sorry but that's completely off topic.. I never said that.

And yes, I'm aware of all of those.


I fully agree with the sentiment (if not the tone) of this post; namely, consumer tracking across the web has run rampant without much consideration of its long-term cost to both consumer privacy and user experience.

However, I do find the blog post's use of Google Analytics and the Facebook like button both indicates and perpetuates the very problem the OP is posting about. The simplicity and convenience to the content producer of using services like Google Analytics and of offering a means of sharing their content through a platform like Facebook clearly still outweighs any objections, moral or otherwise, to the use of those services/platforms.


It turns out that there are easy alternatives to this, and some were posted in this thread (like Shariff). I have replaced all the social buttons with static alternatives, and soon I will be able to remove tracking as well in favor of GoAccess, so it's not as bad as it sounds!


In addition, the blog post uses Google Fonts and jquery from a Google CDN. Could people please consider using a non-Google CDN for things that affect webpage functionality? Those of us that block HTTP requests to Google servers will then also be able to use the website. (In this case the webpage content appears entirely usable even with jquery etc. blocked.)


The problem with web analytics is that it is a low barrier to entry. I have a saying - 'if you can't code then do SEO' - and far too many companies have some inexperienced, non-technical squeaky wheel insistent on the overbearing analytics. It is not just the tracking scripts, there is a whole universe of snakeoil built on top of that. In ecommerce there are tracking script things that promise to deliver you the ultimate newsletter - if you sign up for it - and if you don't then all your interactions will be recorded anyway.

This is all fantastic, however, there really just needs to be some compelling CTA for the newsletter signup, which is doable. Instead though, this stuff you could ask the customer has to be magically inferred.

Another favourite is some stalking module for ecommerce that does personalised recommendations. This adds to the bloat and does 'jumble sale recommendations'. To do it properly, with code, is probably a simple SQL query but you still need to build, test, deploy such a thing in a professional way. But a third party script that magically does it, well, sign me up...! (Says the non-technical guy).
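
To give a feel for how small the core of this is, here is the "also bought" logic sketched in JavaScript over in-memory orders, standing in for the SQL query. The function name and the shape of `orders` (an array of product-id arrays) are assumptions for illustration:

```javascript
// Hypothetical co-purchase recommender: which products most often appear
// in the same order as the given product?
function alsoBought(orders, productId, limit = 3) {
  const counts = new Map();
  for (const items of orders) {
    // De-duplicate so a product listed twice in one order counts once.
    const unique = new Set(items);
    if (!unique.has(productId)) continue;
    for (const other of unique) {
      if (other === productId) continue;
      counts.set(other, (counts.get(other) || 0) + 1);
    }
  }
  // Highest count first; ties are broken alphabetically for stable output.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || String(a[0]).localeCompare(String(b[0])))
    .slice(0, limit)
    .map(([id]) => id);
}

const orders = [
  ["bike", "helmet"],
  ["bike", "helmet", "lock"],
  ["bike", "lock"],
  ["bike", "helmet"],
  ["helmet", "gloves"],
];
console.log(alsoBought(orders, "bike"));
```

The build, test and deploy work around it is still real, but none of it requires following visitors around the site.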

On top of that is the instant help widget that someone in marketing expects. There is no need for it if the website actually works and information expected is provided. Again this widget needs to stalk 2000 people for the 1 person that uses it.

So how do I face the challenge? Sometimes, due to squeaky wheel noise, it is not possible. However, if you can do a better job of delivering the end information that is required, then you are good to go.

Some people use analytics for actual sales information, again because some report can be produced easily without any knowledge required. Again, to do a proper report probably needs some knowledge of SQL and coding. So the result is business decisions made on data that is not correct.

There is also a cost to some of these things. You can have a pop-up giving customers 10% off their first order, with yet more script going on. The script will charge an 8% cut, making it an 18% payout if anyone puts their email address in the pop-up box.
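
Worked through on a single order, assuming both percentages are taken from the list price (which is what the 18% figure implies) and using cents to avoid rounding surprises. The function name is made up:

```javascript
// Hypothetical breakdown of what a merchant gives up on one discounted order.
// Assumes the discount and the script's fee are both computed on the list price.
function merchantPayoutCents(listPriceCents, discountPct, scriptFeePct) {
  const discount = Math.round((listPriceCents * discountPct) / 100);
  const fee = Math.round((listPriceCents * scriptFeePct) / 100);
  return {
    discount,
    fee,
    totalGivenUp: discount + fee,
    received: listPriceCents - discount - fee,
  };
}

// A 100.00 order with a 10% discount and an 8% fee.
console.log(merchantPayoutCents(10000, 10, 8));
```

On that order the merchant parts with 18.00 in total and keeps 82.00.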

Then there is affiliate marketing, which is okay but another level where more script is added and more 'fee' paid out for the privilege.

So those scripts you don't see are costing money, real money. You as the consumer pay 10% of the product price to some mystery javascript bloaters every time you 'Buy Now' (which is always next day at the earliest).


For reporting, we use Django Explorer (there are definitely similar tools for many stacks). Someone who knows SQL prepares a query once, then the person who needs the report specifies the arguments and they get a well-formed CSV to download immediately.


It's odd to keep Google Analytics because you don't have access to logs while being able to provide self-hosted comments. Just add a self-hosted small image on each page.
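
A minimal sketch of the counting side, assuming each page embeds something like `<img src="/pixel.gif?p=/blog/post">`. The path `/pixel.gif`, the `p` parameter and the function name are all hypothetical:

```javascript
// Hypothetical path of the self-hosted tracking image.
const PIXEL_PATH = "/pixel.gif";

// Record one hit if the request is for the pixel; returns whether it counted.
function recordHit(counts, requestUrl) {
  // Request URLs are usually relative, so a dummy base is needed to parse them.
  const url = new URL(requestUrl, "http://localhost");
  if (url.pathname !== PIXEL_PATH) return false;
  // The page being viewed is passed in the (assumed) "p" query parameter.
  const page = url.searchParams.get("p") || "(unknown)";
  counts.set(page, (counts.get(page) || 0) + 1);
  return true;
}

const visits = new Map();
recordHit(visits, "/pixel.gif?p=%2Fblog%2Fpost");
recordHit(visits, "/favicon.ico");
console.log(visits);
```

Your server would call this from whatever handler serves the image, and nothing about the visitor needs to leave your own machine.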


Hmm, that's actually a great idea! Let me try that right now, thank you.


FWIW for non-tracking share buttons, there's also Shariff: https://github.com/heiseonline/shariff


That is fantastic, thank you! I'll add them to the article right now, and my site as soon as I can integrate them.


There are also many plugins for various CMS for Shariff, e.g. for WordPress I can highly recommend https://wordpress.org/plugins/shariff/


I've used Piwik for a few sites, I love that you don't need the JS crud to use it, makes the overhead almost unnoticeable.


I've worked in web analytics for the last 4 years. It's a bit annoying that OP doesn't differentiate "analytics tools" and "marketing tools"; on nearly every site I've worked on, the load time doubles due to some arcane synchronous-only loading iFrame. Server side seems like a good idea until you realise how many bots there are on the internet (which the business doesn't care about), how hard it is to implement (one of the best and worst things about client-side tracking is that on a lot of sites it can be implemented and modified outside the deployment cycle) and how much information you get through these pixel calls.
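
To show the kind of filtering server-side counting needs, here is a deliberately crude sketch that drops hits whose User-Agent looks automated. The pattern, the function names and the `{ userAgent }` shape of a hit are assumptions, and real bot detection takes far more than a substring match:

```javascript
// Crude heuristic only: many bots announce themselves, many others do not.
const BOT_PATTERN = /bot|crawler|spider|curl|wget/i;

// Treat a missing User-Agent as suspicious too.
function isLikelyBot(userAgent) {
  return !userAgent || BOT_PATTERN.test(userAgent);
}

// Keep only the hits that do not look automated.
function humanHits(hits) {
  return hits.filter((hit) => !isLikelyBot(hit.userAgent));
}

const hits = [
  { path: "/", userAgent: "Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0" },
  { path: "/", userAgent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" },
  { path: "/", userAgent: "" },
];
console.log(humanHits(hits).length);
```

Anything that spoofs a browser User-Agent sails straight through, which is part of why this is harder than it looks.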

There is a massive lack of technical talent in this field that companies are screaming out for. I'm very happy with it as a career. If you are interested in some advice on getting into it send me an email. Also happy to answer any questions.


Not to be rude, as I am genuinely interested in this area. But is there much more to it than adding a GA tracker and setting up the dashboard?


Depends on the role. The bad roles generally are just that (report monkey and a bit of project management around implementation). The tooling is either Adobe Analytics or Google Analytics at most companies. There are 3 specialisations as I see it: data, outbound marketing and inbound marketing.

Data is to do with the limitations of GA and Adobe. Businesses want to use hit or session level data for segmentation, retargeting, personalisation and all the fancy machine learning stuff. This isn't easy to achieve with out of the box analytics tools so it requires a bit more data processing.

Inbound marketing is to do with conversion rate optimisation, SEO and UX. Questions like "is our left hand navigation better or worse than it was before?" aren't super easy to answer in a commercial context.

Outbound marketing is to do with return on marketing investment for paid advertising. Attribution modelling is a huge space which is VERY hard to figure out for a business.


What's the difference between the two in how they affect load times? As far as I know, there's no clear delineation.


Between analytics tools and marketing? You are right, by name nothing. However, in general, if you look in the network tab on some content sites (especially crappy slow ones) you will see ads and other pixels being loaded onto the site as time goes on. Leave certain sites for an hour and they will make hundreds of extra requests.


Oh, by marketing you mean ads and things like that? I thought you meant optimization tools like A/B testing and similar services.


A/B testing tools slow down sites a lot but yeah I was referring to remarketing pixels or publisher side pixels


Your static images could do with high-DPI versions of them (via `srcset="… 2x"` and the likes), or using SVG, so that they’re not fuzzy.


Good call, they do look weird on my high DPI screen. I'll replace them with better-looking alternatives, thank you.


I was just looking for something to replace Google Analytics with the other day.

GoAccess looks great.


Here's a how-to I wrote recently, which includes automating GeoIP look ups plus 'Referer Spam' blocking with simple cronjobs: https://www.tombrossman.com/blog/2017/faster-and-more-accura...


So the author assumes that the reader doesn’t use ad blockers, nor incognito mode for porn. Who is this post’s audience? Fox News viewers? Yes, websites are bloated with ads and trackers, but we have tools to eliminate them.


This post's audience is the 3000 people that showed up reading it in my Google Analytics dashboard.


> You’ll notice my hypocrisy in having social buttons at the bottom of this post...the Facebook button likes the page directly, which can’t be substituted with a simple link, but I encourage you to use Privacy Badger...

This nukes the argument. At least for this page alone, trust your users to copy and paste a URL to content they like.


You're saying it doesn't matter if it's as easy as possible for people to share content?

EDIT: Reword.


I want to build and refine and improve and make useful and a lot of things, but not add credibility to such practices. And I want people to know what an URL is and how copy and paste works. Everything I write is secretly about wanting people to know what an URL is, anyway, so it would be kind of self defeating for me.

That Facebook ever offered these buttons without warning about the privacy implications in bold red letters put them clean outside any sort of human history I can be arsed to care about. Don't get me started on Twitter, they don't even need to snoop for that, haha. They're toys. Some people can't fit that together with reach or money or whatever, but all FB is proving is that toys can have success, too. We knew that, so in the end there is nothing there, except something with which we can diminish ourselves and those who visit our sites. This is my personal outlook and I know I'm deluded to think this holds any weight in the real world; but my inside weighs more than the world so that still doesn't move the needle for me.


Not right after claiming such buttons "do nothing for the user that a simple link wouldn’t" and assess their cost in "tracking and slowness". You apparently think they do something good. Otherwise you wouldn't include them.

My complaint isn't with your using social media buttons. (I hated them enough to block them; they no longer bother me.) It's in using them right after claiming they are useless and parasitic.


The argument was "social buttons are bad for privacy. The ones that don't do anything a link wouldn't are especially egregious". The Like button doesn't fall in the latter category, as it does something a link won't.


Simply use SocialSharePrivacy http://panzi.github.io/SocialSharePrivacy/

Buttons are all turned off by default and the user can turn them on if they care about them.

See https://www.schneier.com/ 's website for a demo.


Great project, thank you! Another alternative that was suggested above (that looks a bit better, in my opinion) is Shariff: https://github.com/heiseonline/shariff


They are different in the way that SocialSharePrivacy uses the actual badges while Shariff uses custom elements that link to the social network pages.


That's true, SocialSharePrivacy loads the script after they've been requested, which is the much better way to do this (if you're going to).
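
The load-on-request pattern itself is small. This is a generic sketch of it, not SocialSharePrivacy's actual code; `makeLazyLoader` and the `loadScript` callback are hypothetical names:

```javascript
// Generic "two-click" pattern: nothing third-party is loaded until the
// visitor explicitly opts in, and it is loaded at most once.
function makeLazyLoader(loadScript) {
  let loaded = false;
  return function onUserOptIn() {
    if (loaded) return false; // already enabled, nothing to do
    loaded = true;
    loadScript();
    return true;
  };
}

// In a browser, loadScript would append the network's script element;
// here a stand-in just records that it ran.
let loads = 0;
const enableButton = makeLazyLoader(() => { loads += 1; });
enableButton();
enableButton();
console.log(loads);
```

You would attach the returned function to the click handler of a static placeholder button.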



