It took us only a few weeks to write our home-brew analytics package. Nothing super fancy, but now we have an internal dashboard that shows the entire company much of what we used analytics for anyway - with some nice integration with some of our other systems too.
And to echo other posters: SpiderOak deserve thanks. If I find myself with any need for a service like theirs, I know I'll be looking at them.
Ah, the "not invented here" syndrome!
There are tons of things that you could do "in a couple of weeks" that more or less work. However, it doesn't mean you have to or even that it would be a good idea.
If all developers adopted the attitude that you have expressed, there would be thousands of sad, sad developers who need to maintain shitty in-house analytics systems because someone once said "I could do it in a week". There are tons of awful CMSes already because someone once said "I could do better than WordPress" / "I could create a better framework" / etc.
In a lot of cases, GA is just good enough. Sure, you might need to spend some time exploring its features (custom dimensions, etc.); there's more to GA than the number of pageviews for a given day. There are cases where GA is not enough. Fair enough. But they're definitely not the majority of cases.
Sure, it makes sense for SpiderOak given its target audience. However, there's no need to make such a generic statement about 'anyone working in tech'.
Then the question is: do you really want to maintain the infrastructure required to run the analytics smoothly? Especially if your company has tens of millions of pageviews a month and depends on real-time reporting (which needs extra infrastructure to support).
Are you familiar enough with the stack that you could have a high degree of confidence in fixing production issues, which are inevitable? Quite often, an honest answer here is 'no'. Then can you afford to lose a few hours/days/weeks (whatever it would take to fix the issue) of data? Again, often the answer here is 'no'.
Of course, you have hosted solutions. But they are no better than GA in terms of privacy.
Paid support exists too but the cost can skyrocket pretty quickly, on top of paying for the infrastructure and maintaining it.
The code for this is generic. An open source solution costs nothing beyond some CPU to process the logs and a database to store the analytics.
I don't have the study off-hand, but IIRC some guy, after finishing his master's at Stanford, wanted to assess how much information Google had re: an average user's browser history. The findings, based off Common Crawl data of the top 100k sites plus the presence of GA.js, were that something like ~75% of the web was tracked (not to be confused with how much of an end-user's traffic is tracked; that number will be far higher), based on sites with GA.js and factoring in Referer tags. Those were unweighted numbers, i.e., I bet more than one out of two 45-year-old women's traffic can be analyzed to a 95% degree of completeness based entirely off of Pinterest, Facebook, search history and the outbound links from their e-mail.
I've had "simple foss analytics" on my todo-list for quite some time. I'm hoping one can build on what piwik have collected wrt bot agent strings, ips etc - and combine with a simpler collector (adding php to the stack just for analytics isn't very appealing, never mind a php codebase of somewhat questionable quality).
Snowplow looks good, but I'm not sure if they have a supported "self-host" stack yet (they started out very AWS/S3 centric).
I actually think there's room for a new product that puts a little more thought into what questions it makes sense to ask, and how best to answer them (e.g.: does collecting metrics on every visitor even make sense if you can answer the same questions just as well by doing random sampling? You might want to quantify where your bandwidth goes - but simple log analysis might do that easily enough, and it might have very little to do with your human visitors, etc.).
Perhaps systems administration is somehow very cheap for you, but I'm willing to bet it is still not "nothing" - even if the cost is you personally not watching a TV show you like because you're patching the web server on your analytics box for your personal vanity domain, that's still a cost.
For most operations, sysadmins are somewhat expensive, and because of that, busy. This is why Urchin was such a good idea, and why Google bought them - the proposition is to trade your users' privacy for the admin time it takes to support another internal app. It's an absolute no-brainer, assuming you don't care about your users' privacy (IIRC, they were going to sell the service before Google ate them, but that's ancient and trivial history).
If your business is so small that an additional low-volume web server just to display your analytics (you don't need one for the actual tracking) is a big deal, then the same web server that serves your product can serve your analytics. Not a big deal.
I applaud SpiderOak, but they are much different from most other sites. They have privacy conscious customers to begin with, this is something that is good press for them and probably a net positive on their bottom line for doing it, not the case with most other sites. Also it's something they are doing after having a very mature product for many years, clearly not the first or most important thing they needed to tackle as a company.
GA is mostly used by people that don't need it, yet want to pretend they get actionable data out of it.
I bring this up because people had been slamming moot for using GA on 4chan instead of piwik without understanding why.
Look at the comments from sandfox and afterlastangel in this thread. afterlastangel is pushing a billion, sandfox is around 300 MM per month.
Piwik is still using (unsalted) MD5 for passwords in 2015, and probably will still be using unsalted MD5 in 2016.
I can't believe unsalted MD5 is "by design" (https://github.com/piwik/piwik/issues/8753).
Considering Piwik is used by the GCHQ, I find it hilarious.
I don't find its use of poorly implemented hashing in the administrative interface to be at all relevant to what they're doing with it, or a reason why they shouldn't be using it.
Given these known security flaws, it's not a stretch to assume anyone who can see the GCHQ's Piwik server can have that data too, regardless of whether they are authorized.
See below for a small preview of what an attacker could exfiltrate (dissident IPs redacted for a reason):
While we're talking about poor security practices: the privileged username in the screenshot is apparently still the default ("admin"), so I hope the password isn't still "changeMe" ... http://piwik.org/faq/how-to/faq_191/
Too bad the interface on the Azure Portal is terrible. They spent too much time making it look fancy, and not enough time getting the 101s of usability right (which is a criticism I'd lay at the feet of the new Azure portal in general).
Probably the vendors of the software concerned. Perhaps it started out as a list of three with a major bias towards a particular product. And then the competitors responded, moderators did their thing, and eventually an accurate list evolved.
Am looking to use this in lieu of Google Analytics.
EDIT: Sorry, I've been dealing with uBlock Matrix for too long, and forgot how advanced the other blockers pattern matching is. See the many responses to this for better information.
Every service does. Pingdom, GA, Olark, Github...
It took them a few weeks to write their own analytics. What features did they not implement? How many people worked on it?
Does your 1 or 2 person startup have 4 weeks to write their own analytics package or do you have more important stuff to do? (I'm betting you do. Like launching your product instead of re-inventing the wheel with analytics)
It's almost never "developers" who are deciding to use GA; it's middle managers or marketing departments.
So it's not hard to imagine marketing wanting it; presumably it provides them a lot of value that wouldn't be easy to recreate in-house.
That may not be an option...
GA is rather deep, with tons of integration and ways to slice and segment data.
Yeah, maybe in a few weeks you can get _something_ that'll make some manager not too unhappy. Seems like a terrible value prop for almost all companies since, unfortunately, approximately no one cares (or they run adblock anyway).
Not hard to track page hits, time on, time off, and arbitrary events.
Seriously? Folks, it's a table for analytics events, a few SQL queries to do basic reporting (at least in Postgres), a little bit of client-side JS to post the events, and a bit of server-side code to create the routes and maybe display the report page.
I guess if it doesn't include Kafka, Mesos, Kubernetes, Neo4j, and Docker, it isn't delivering business value.
People are prematurely optimizing if their fear is "but but but mah datamoose".
Also, it's not prohibitively costly if you do even slight batching of the events, say batch load between every five minutes or an hour.
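To make that concrete, here's a rough sketch of what I mean, assuming Postgres and psycopg2 (the table, column, and function names are just invented for illustration):

    import psycopg2

    conn = psycopg2.connect("dbname=analytics")  # assumed DSN

    # One table holds everything; add a jsonb properties column later if needed.
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS events (
                id       bigserial PRIMARY KEY,
                occurred timestamptz NOT NULL DEFAULT now(),
                name     text NOT NULL,   -- e.g. 'pageview'
                path     text
            )""")

    def flush(batch):
        """Write a buffered batch of (name, path) tuples in one round trip."""
        with conn, conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO events (name, path) VALUES (%s, %s)", batch)

    def pageviews_per_day():
        with conn, conn.cursor() as cur:
            cur.execute("""
                SELECT date_trunc('day', occurred) AS day, count(*)
                FROM events
                WHERE name = 'pageview'
                GROUP BY 1 ORDER BY 1""")
            return cur.fetchall()

The client side is just a beacon that POSTs to a route which buffers events and calls flush() every few minutes; as noted above, even slight batching keeps the write load negligible.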
I'd love to hear somebody with war stories chime in though!
You're missing the point of why "rolling your own shitting [sic] implementation" is worth it: it's not the speed, it's the privacy.
Frankly, most of what I read out of the tech world these days seems to be about pandering to developer laziness.
All manner of APIs and services seem to exist in their current form simply to extract rent from developers that don't want to do back end "dirty work".
The entire idea behind writing a Service as a Software Substitute is about extracting rent.
Unless I'm mistaken, one big difference is that not using Google Analytics means you don't know which Google search pages people used to access your website. That can be a really important difference for some websites.
Sure, the basics are easy. But marketers and business people want to drill into a lot of data which is non-trivial to gather.
Unless you have a compelling business case (which SpiderOak does), it's not worth it.
I've recently been faced with this problem, and a solution doesn't have to be too complex.
There are roughly two parts to an analytics solution: event logging and, well, the actual analytics.
What's left is defining and running your queries in Elasticsearch.
I realize it's not fit to be used in every situation, but it can do some pretty complex things this way without a huge amount of effort ...
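To give an idea of the analytics half: a daily-pageviews-per-URL report is a single aggregation query. This is only a sketch; the index and field names below are assumptions about a hypothetical mapping, not anything Elasticsearch defines for you:

    import requests

    # Assumes an 'events' index with a 'timestamp' date field, a 'path' keyword
    # field and an 'event' field; adjust to whatever your mapping actually is.
    # (Newer Elasticsearch versions want "calendar_interval" instead of "interval".)
    query = {
        "size": 0,
        "query": {"term": {"event": "pageview"}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "timestamp", "interval": "day"},
                "aggs": {"top_pages": {"terms": {"field": "path", "size": 10}}}
            }
        }
    }

    resp = requests.post("http://localhost:9200/events/_search", json=query)
    for day in resp.json()["aggregations"]["per_day"]["buckets"]:
        print(day["key_as_string"], day["doc_count"],
              [b["key"] for b in day["top_pages"]["buckets"]])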
I don't think anyone was saying that GA is always better, it's just more often than not it is. It takes some skill and quite a bit of experience to draw the line at a reasonable place and correctly recognize the trade offs.
But then I started to run short on disk space from storing too many events. This is a problem.
> much of what we used analytics for anyway
So, I think, as more and more people start using ad blockers, site owners will start getting less and less accurate stats from Google Analytics, forcing them to implement their own solutions. Hopefully, open source solutions will start providing the best features that Google does.
I have gone through their trial, but now I think I will register for the Solo account ($6/mo).
In February SpiderOak dropped its pricing to $12/month for 1TB of data. Having several hundred gigabytes of photos to backup I took advantage and bought a year long subscription ($129). I had access to a symmetric gigabit fibre connection so I connected, set up the SpiderOak client and started uploading.
However I noticed something odd. According to my Mac's activity monitor, SpiderOak was only uploading in short bursts of ~2MB/s. I did some test uploads to other services (Google Drive, Amazon) to verify that things were fine with my connection (they were) and then contacted support (Feb 10).
What followed was nearly __6 months__ of "support", first claiming that it might be a server side issue and moving me "to a new host" (Feb 17) then when that didn't resolve my issue, they ignored me for a couple of months then handed me over to an engineer (Apr 28) who told me:
"we may have your uploads running at the maximum speed we can offer you at the moment. Additional changes to storage network configuration will not improve the situation much. There is an overhead limitation when the client encrypts, deduplicates, and compresses the files you are uploading"
At this point I ran a basic test (cat /dev/urandom | gzip -c | openssl enc -aes-256-cbc -pass pass:spideroak | pv | shasum -a 256 > /dev/zero) that showed my laptop was easily capable of hashing and encrypting the data much faster than SpiderOak was handling it (Apr 30) after which I was simply ignored for a full month until I opened another ticket asking for a refund (Jul 9).
I really love the idea of secure, private storage but SpiderOak's client is barely functional and their customer support is rather bad.
I wonder if that is happening in this specific case? Although if it were the case the vendor should still be honest about it. Just saying they limit uploads to 2 Mbps is better than giving the run-around.
It's to reduce the maximum bandwidth capacity they require. I don't see it as a problem, considering their price points. They're selling you storage, not "slam 1TB of your data into our storage system in a day". If you're looking for that, ship a hard drive to Iron Mountain.
EDIT: Even AWS limits how fast you can upload to S3, and built an appliance for you to rent and ship back and forth if you need to move data faster. That station wagon full of tape is still alive and well.
I'm on gigabit fiber and back up hundreds of gigs per month to S3. I've never seen them limit upload speeds; it clearly saturates the connection for the entire duration of my upload. I would expect that because I am paying for the storage, they would be happy to let me write data to their machines as fast as I like. Is there a citation you can provide from their docs that supports your statement? Genuinely curious, because my experience has been different.
To the point that some of these sync or backup providers limit bandwidth, I have definitely experienced that. Tested SpiderOak and Dropbox and upload speed was horrid. Dropbox in particular was disappointing because they can't even claim to have the extra encryption overhead SpiderOak does, it was just shit speed every day. I'm paying a premium for gigabit fiber to the home and you really can tell who over-promises and under-delivers quickly. Fortunately my 'roll your own' backup + sync works well and is price competitive so I'll stick with that.
I don't understand why you'd think this. You're paying for storage, not an SLA as to how fast you can fill it.
> I'm paying a premium for gigabit fiber to the home and you really can tell who over-promises and under-delivers quickly. Fortunately my 'roll your own' backup + sync works well and is price competitive so I'll stick with that.
This is the preferred solution if a) commercial services are too slow for you and b) you're willing to spend the time to implement and manage it. It appears, based on commercial services out there, that there is no competition based on upload speeds.
They should be looking to partner with someone who has bandwidth problems in the other direction. By combining a backup service's upload bandwidth and a streaming video service's download bandwidth into one AS, you can get a more balanced stream, and qualify for free peering.
A great model would be to partner with CDNs; they pour content out to eyeball networks, but you could run a distributed network of your storage system across all of their POPs.
That ZOMG WHAT A DEAL! of a plan is kinda worthless...
"Slam" is a bit of a loaded word, since... if they are selling 1TB of storage, shouldn't we get 1TB of storage?
That's the same crap that ISPs tried to pull with UNLIMITED INTERNET!!! (as long as you stay under 30GB per month)
You do, they're just not allowing you to store it in 24 hours. Some services (Backblaze, if I recall) allow you to ship a drive to get around this limitation.
Notice that all services do this? If you can do better, build one! Prepare to go broke from the peak bandwidth requirements you'll need to build your networking architecture to support such transfer rates, but I always encourage experimentation and learning lessons over complaints.
The limitation is actually the pipe that connects you to Amazon, not an inherent limitation within S3 or other services within Amazon on connection speed. If you have a good enough connection, or peering with Amazon things go amazingly fast.
When I worked at an ISP, we slammed about 20 Gbit/sec into S3 without issues, but even then the data we were backing up -- about 300 TB a day -- took 1.4 days to upload to the cloud at that rate, so we ended up backing it up in-house instead. (We needed to store the data for 7 days; after that it went bye bye.)
Seems like the perfect usecase for S3; inbound transfer is free, and you're only paying for a rolling 7 day window of storage with lifecycle rules :/
I think it would be a good selling point for a service like this to allow higher upload speeds.
Hold on, this is hacker news. VCs, this is a great idea!
No, no of course it's not. Initial seeding is a competitive moat for the first mover. Moving a few hundred gigs to a new backup company just to save a few bucks? I don't think I could be bothered, because I KNOW how long it will take.
Backup services especially have low operational requirements for their hardware and network connection, since once the files are uploaded they only need to be verified periodically.
SpiderOak is definitely overselling the 1TB plan, as well as another one that pops up once in a while called the "unlimited" plan for $149 a year. This is clear from the disproportionate pricing structure - $79 a year for 30GB that jumps to $129 a year for 1TB and then to $279 a year for 5TB - which entices users to go for the higher amounts because they appear to be great deals. What people with residential broadband connections may not realize is that a) uploading even 1TB of data will take a long time and b) SpiderOak cannot, and does not, provide any minimum guarantees on the upload or download speeds (assuming everything else in between SpiderOak and the user looks fine).
BTW, why store photos and videos on encrypted storage? For that I use Office 365's OneDrive: everyone in my family gets a terabyte for $99/year and I really like the web versions of Office 365 because when I am on Linux and someone sends me an EXCEL or WORD file, no problem, and I don't use up local disk space (with SSD drives, something to consider).
As for OneDrive, I tried it for a while but it didn't work out. Their clients and web interface were terrible and their API was severely lacking. I expect more functionality when I'm sacrificing my privacy.
I ended up going with Google Drive in the end, as you can get 1TB for $9/month with an Apps for Work Unlimited account (I actually seem to have Unlimited under that plan, which isn't supposed to happen until 4 users). That of course means sacrificing encryption but I trust Google enough to make the privacy tradeoff in exchange for extra features (OCR, Google Photos etc.).
A little off topic, but Google really seems to be upping their consumer game lately with Google Music, Youtube Red, Google Movies + TV, etc. I am now less a user of other services like GMail and Search, but Google gets those monthly consumer app payments from me. I have the same kind of praise for Microsoft with Office 365.
But it's an ad hominem argument, that's for sure.
You are assuming that you are the only one using that uplink and that server.
> easily capable of hashing and encrypting the data much faster than SpiderOak was handling it
I can believe that there was upstream congestion somewhere outside my network (speeds to Google, Amazon indicated that there were no issues inside) or that their server was overloaded but the engineer who investigated seemed to attribute it to the client:
> Additional changes to storage network configuration will not improve the situation much. There is an overhead limitation when the client encrypts, deduplicates, and compresses the files you are uploading
Trivial to set up, immune to adblockers affecting the completeness of data, prevents the writing of tracking cookies, and leaves the data and utility of the GA dashboard mostly intact (you lose user client capabilities and some session-based metrics).
This is the route I'm preferring to take (being applied this Christmas via https://pypi.python.org/pypi/pyga ).
One may argue that Google will still be aware of page views, but the argument presented in the article is constructed around the use of the tracking cookie and that would no longer apply.
I'm shifting to server-push to restore completeness; I'm presently estimating that client-side GA captures barely 25% of my page views (according to a quick analysis of server logs for a 24hr period). I'm looking to get insight into how my site is used rather than the capabilities of the client, so this works for what I want.
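For anyone curious, the server-side push via pyga looks roughly like this (the tracking ID, domain and path below are placeholders; check pyga's docs for the exact API):

    from pyga.requests import Tracker, Page, Session, Visitor

    # Placeholders: substitute your own GA property ID and domain.
    tracker = Tracker('UA-XXXXXX-X', 'example.com')

    # A fresh, empty Visitor and Session per hit: no IP, no user agent, no
    # persistent client ID, so successive hits can't be correlated by GA.
    visitor = Visitor()
    session = Session()

    # "A page has been viewed, this one: /some/page"
    tracker.track_pageview(Page('/some/page'), session, visitor)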
Most people do notice and do care; this has come up in countless conversations. They just accept it as a necessary evil that they can't do anything about, and believe (wrongly) that they as individuals can't change the world.
You will have no GA cookie from any of my sites, I am not recording client identifying things or capabilities. It is a server-side push of GA and avoids all client-side interactions.
It is merely, "A page has been viewed, this one: /foo/bar?bash".
There's nothing in there that is tracking you. I'm not even embracing the session management aspect.
I get to use the tool that is best-in-class, in a way that lacks capability to track you.
If you are in fact anonymizing everything about a client as you claim you do, then it won't be able to. Unless, of course, you are feeding GA some opaque client ID that you then internally map to and from actual clients that hit your server. However something tells me that you aren't doing that, or you would've mentioned it already.
(edit) I re-read your comment. You apparently aren't interested in session counts. But what good is the GA summary then, if you can't tell 10 bounced visitors from one visitor with 10 hits? This makes no sense. If you want to look at just page hit numbers, there are dramatically simpler ways to do that.
But I do retain insight into what content has been viewed, how much, what is rising and falling, etc.
The question really is: what info are you actually reporting on? AdBlockers make us blind and tracking is horrible, but I get a far more complete view of the simple stuff Urchin used to be great at.
Incidentally, I ran a similar experiment with gaug.es a few years ago - called their tracking API from our server side. While it worked as expected, these sorts of shenanigans are good for only one thing: hiding the fact that you are using 3rd party analytics from your visitors.
On a more general note - the thing is that you either care about other people's privacy or you don't. It's not a grayscale, it's binary. And if you do, there's no place for GA in the picture.
I am not passing IP. I am not passing a client-id. I am not passing any kind of correlation identifier from which a session can be inferred or created. I am not passing user-agent information. I am not passing a cookie ID.
I am only passing a page view event. "Page /foo/bar?bash has been viewed".
Take a look here: https://code.google.com/p/serversidegoogleanalytics/
Tell me where in that example (mine is similar) you see any client identifying information.
There is none. If GA deduces anything, it will be a property of my origin server and not a client.
I do not agree that using GA in the way I have described allows Google to invade privacy at all. Please explain clearly how it does in your opinion.
GA has many uses: mainly to follow the user and see the funnel they go through, and secondly to monitor marketing campaigns. If you don't need this, then Apache logs + Webalizer is perfect for everyone.
Those partners frustrate me, in that they won't trust me to provide stats generated from server logs, but they all trust GA by default.
This technique allows me to use GA, produce the view of the content they need, export the PDF, and share that... and they trust it.
GA is the de facto store of trusted data when it comes to web site activity. For my sites that is tracking content page views.
This whole conversation started with you saying: why abandon GA when you can use it without compromising clients' privacy? The exchange that followed shows that, used that way, one can't actually get the same functionality out of GA; in fact, hardly any functionality at all. Yes, you can feed data in, but the usefulness of what you can get back out is next to zero. What am I missing?
From your opening comment:
> Why not move to push GA data server-side?
Because it renders GA largely useless if clients' privacy is actually observed.
I would like to say, as someone extremely hostile to tracking of any kind, that if this is all you're sending to Google, that sounds perfectly fine from a privacy perspective. (Google gets your information, but that's between you and Google.)
Thank you for choosing a method that respects the privacy of your readers.
I do not care to track users/sessions, page views are enough for me. I am tracking content and content views... and I get this big tool that is awesome at slicing data and presenting trend information... for free.
You never answered: what info do you send to the owner of the tracking library that you license? Or, if you send them no info, how do they get paid?
EDIT: I'd welcome discussion, in addition to your up/down votes
The opening line of this post is amusing. They ought to give thought to fixing their core product first.
It's not arbitrary - it requires a 64-bit CPU (of which Apple has now shipped 3 generations).
Yeah, I guess it's just time to get a friggin' new phone already, but this one ain't broke yet, ya know?
I don't think you are in a minority. I understand adblocker usage is around 20% now.
What I don't understand is people who use adblockers but still log in to their Google account in Chrome. It sort of defeats the purpose...
Is there anything out there in this realm? If not, why not?
There's an open ticket for it, but it looks like it hasn't been addressed in a while since they don't want to break all existing passwords.
I'll write an exploit for it (the general case, not just Piwik in particular) and drop it on OSS Sec some day, but here's a theoretical attack:
1. Guess a username somehow. Maybe "admin"? Whatever, we're interested in the security of the hash function. Let's assume we have the username for our target.
2. Calculate a bunch of guess passwords, such that we have one hash output for each possible value for the first N hexits.
substr(md5($string), 0, 2) === "00"
substr(md5($string), 0, 2) === "01"
substr(md5($string), 0, 2) === "02"
substr(md5($string), 0, 2) === "ff"
3. Submit each guess as the login password and time the responses; a non-constant-time comparison of the hashes leaks how far they match, so the guess whose hash shares the longest prefix with the stored hash takes measurably longer to reject.
4. Iterate steps 2 and 3 until you have the first N bytes of the MD5 hash for the password.
5. Use offline methods to generate password guesses against a partial hash.
The end result: a timing attack that then enables an optimized offline guessing attack. So even if their entire codebase is immune to SQL injection, you can still launch a semi-blind cracking attempt against them.
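To illustrate step 2, here's a toy sketch in Python of building one probe string per possible leading byte of the hash (illustrative only, not an exploit against Piwik):

    import hashlib
    from itertools import count

    def probes_for_prefix(n_hexits=2):
        """Find one guess string per possible value of the first n_hexits of
        md5(guess); these are the probes whose login timing you would measure."""
        needed = {format(v, '0%dx' % n_hexits) for v in range(16 ** n_hexits)}
        found = {}
        for i in count():
            guess = 'probe%d' % i
            prefix = hashlib.md5(guess.encode()).hexdigest()[:n_hexits]
            if prefix in needed:
                found[prefix] = guess
                needed.remove(prefix)
                if not needed:
                    return found

    probes = probes_for_prefix(2)  # 256 probes, one per possible first byte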
Any centralized solution, at any scale, can possibly violate someone's privacy. Period. If we want to really fix things, we should stop circle jerking ourselves and do something about it.
It's more than just the tracking cookie, though. It's also about Google aggregating all its website data into a unified profile. The data they have on everyone is frightening—all because of free services like GA.
It syncs fast too. Just thought I'd share my experience with people.
I'm a little curious why they decided to go this route instead of using one of the open-source solutions. Aren't there good solutions to this problem already?
Writing your own is easy for the basic stuff. When you want to move beyond the basics, as Spider Oak will find, it becomes much more difficult.
And for the sake of ducks, I'm eating less meat as well. No more chicken - too much antibiotics, and as little meat as possible, only when it's worth it, so great taste and good quality.
Here in Europe 'ping -c 5' gives an average of about 10ms for google.com and 30ms for duckduckgo.com. Since search is such a fundamental part of browsing, this is very noticeable.
The only thing "wrong" with using an analytics service to better understand your customers is that it places all knowledge of visits, including ones the visitor wished to keep private, in a centralized location. This can be useful in providing correlation data across all visitors in aggregate, such as which browser you should make sure your site supports most of the time.
In other words, there exists some data in aggregate that is valuable to all of us, but the cost is a loss of privacy for smaller sets of personal data.
If individuals don't want certain behaviors analyzed by others, then they shouldn't use centralized services which exist outside their realm of control. These individuals would be better off using a "website" that is hosted by themselves, inside their own four walls, running on their own equipment. A simple way for SpiderOak to address this is to put their website on IPFS or something similar.
I appreciate the fact that SpiderOak is thinking about these things. It's important!
Google is pretty clear about this. The only reason they track you is for advertising, and there isn't any evidence of them using the info for anything else. In fact there is a lot of evidence pointing the other way, such as their insistence on encrypting data flowing between their datacenters.
This is Google we are talking about, not Kazakhstan, China or Russia.
This inference doesn't even need to be intentional; machine learning is capable of accidentally picking up on latent variables. Even if your neighborhood (the basis of the original redlining) isn't a feature in the model, it could be inferred from the other variables.
TL;DR: Your surfing behavior could be used to deny you a home loan one day.
I feel so tempted to laugh in your face right now.
>screw their customers
Google's customers are advertisers, not common people.
I wouldn't recommend it.
>Google's customers are advertisers, not common people.
Their users are also their customers. Without users there are no advertisers.
It's interesting that this meta tag is still there, probably a leftover:
<meta name="google-site-verification" content="pPH9-SNGQ9Ne6q-h4StA3twBSknzvtP9kfEB88Qwl0w">
Unfortunately, there's no way to replicate what Google Analytics currently offers (for free!) within a couple of weeks (or even months). Not with big data sets. Yes, GA does enforce sampling if you don't pay for GA Premium, but the free edition is still one hell of a deal (if you don't care about privacy).
If you only use Google Analytics as a hit counter, sure, you can do that yourself within a couple of minutes. The advanced features are way more complicated, though (think segmentation and custom reports).
This also begs the question: why not use Piwik?
You also underestimate how ubiquitous GA is because it's free and extremely popular. I'd consider myself an intermediate to advanced user of GA, but for people less experienced, I can easily share stuff with them for complicated tasks or they know how to do a lot of the basics themselves.
In hiring digital marketing people, GA is pretty much on par with Word in terms of familiarity. It's something a lot of people have a basic competence with.
GA has become very, very capable in the last five years or so. Combined with their current APIs, you can do pretty much anything you want.
Piwik doesn't scale. At least it doesn't scale unless you spend lots of resources tinkering with it. Its Cloud Edition is even more expensive than GoSquared, which I consider to be a much better product.
What we basically need is a simple, effective, and cheap enough alternative to GA. And so far there are simply none.
How is tracking in house more private than GA? The user is still being tracked.
Exclude all hits from known bots and spiders
I've been using it for prob 8-10 years and it has never missed a beat. I use it on all my personal / business sites as well as some client websites that are super high traffic.
I won't be surprised if in the coming years we hear much more about Google Fonts being used as a basis to count site accesses when there is no analytics in place.
The source is at https://github.com/SpiderOak
We just entered private beta yesterday - https://cloudron.io/blog/2015-12-07-private-beta.html