
Ask HN: Who is using a self hosted analytics system? - mfrye0
I'm so used to SaaS-based solutions these days like Mixpanel, GA, KISSmetrics, etc. But with everything happening over the last few years with regard to privacy and security, I've been looking into self-hosted options.

Is anyone else considering the same? I know this is standard for companies like Amazon and Facebook, but what about everyone else?

If you are using a self-hosted setup, what is it? Custom built, open source, etc.?
======
mrgreenfur
If you're serious about self-hosting analytics there is only one serious place
to go: [http://snowplowanalytics.com/](http://snowplowanalytics.com/)

I don't use them, but they are building enterprise-grade self-hosting.

Disclaimer: I am working on a project in the marketing analytics space. I
don't use them and this isn't an endorsement, just a pointer to research more!

~~~
mfrye0
This looks awesome. Just what I was looking for.

Seems like they have some decent companies using them too.

~~~
mrgreenfur
Are you still working on thebigpicture?

~~~
mfrye0
Yeah. I'm actually asking the question as one of my friends mentioned we
should offer a full self hosted option for enterprise.

I don't know if we want to go that direction though. Plus I wasn't even sure
who is actually self hosting these days. Kind of reminds me of the Silicon
Valley show with the "box" in the data center dilemma.

~~~
mrgreenfur
Yeah, I hear you. Snowplow is the only commercial one I've heard of that
allows self-hosting. It makes sense, but you then need a new billing model
(e.g. services).

The bigger the data the more self-hosting will make sense for you, but the
less for your customers.

~~~
mfrye0
I'm not sure if I follow you there. What did you mean by, "The bigger the data
the more self-hosting will make sense for you, but the less for your
customers"?

~~~
mrgreenfur
I meant that if you're running an analytics company and each client has a huge
amount of data, it costs you less when the clients host it themselves (better
for you as the company owner). If the clients have to self-host, then the
clients will have to pay for it.

This works for Snowplow because they sell services, which only really works
for bigger companies who have the resources to self-host and to pay the
services fees.

~~~
mfrye0
Ah gotcha. Thanks for the input.

I've been looking at Snowplow and it seems really cool.

------
ohgh1ieD
I'm actually on the same road right now; currently I'm testing Piwik.

[https://piwik.org](https://piwik.org)

~~~
pesfandiar
What's your experience with it so far? I was playing around with it at a
startup a few years ago. It seemed to have a strong community and features
were popping up rather quickly. However, it didn't scale that well for us.

~~~
ohgh1ieD
I'm very pleased so far, but as I said, it's too early for me to say anything
definitive.

> However, it didn't scale that well for us.

What kind of scaling problems did you encounter?

~~~
pesfandiar
In fairness to Piwik, it was some of the custom metrics that required more
processing power. We ended up doing delayed batch processing for them.

Towards the end, IIRC, the calculation of unique visitors within a custom
range was also slow.
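
A rough sketch of why unique visitors over a custom range tends to be slow, and how a delayed batch step can help. It assumes a hypothetical list of `(day, visitor_id)` hits, not Piwik's actual schema. Unique counts can't be summed from daily totals, because the same visitor may show up on several days, so the batch step stores one visitor set per day and the range query unions them:

```python
from collections import defaultdict
from datetime import date

def build_daily_sets(hits):
    """Batch step: group visitor IDs into one set per day."""
    daily = defaultdict(set)
    for day, visitor_id in hits:
        daily[day].add(visitor_id)
    return dict(daily)

def unique_visitors(daily, start, end):
    """Query step: union the per-day sets that fall inside [start, end]."""
    seen = set()
    for day, visitors in daily.items():
        if start <= day <= end:
            seen |= visitors
    return len(seen)

# Made-up sample data: visitor "a" returns on the second day.
hits = [
    (date(2017, 1, 1), "a"),
    (date(2017, 1, 1), "b"),
    (date(2017, 1, 2), "a"),
    (date(2017, 1, 3), "c"),
]
daily = build_daily_sets(hits)
range_uniques = unique_visitors(daily, date(2017, 1, 1), date(2017, 1, 2))

# Summing daily uniques double-counts the returning visitor.
naive_sum = sum(len(daily[d]) for d in (date(2017, 1, 1), date(2017, 1, 2)))
```

The union still grows with the length of the range, which is probably why this kind of query stays expensive without approximate counting.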

------
gesman
I built my own analytics App for Splunk to offer business insights for my
wife's small business. Mostly how traffic correlates with purchases and where
buyers are coming from.

As a side effect, the same system also detects malware and cyber attacks on
other websites pretty well.

[https://splunkbase.splunk.com/app/2676/](https://splunkbase.splunk.com/app/2676/)

~~~
mfrye0
Whoa. That's badass.

Do you pull any info regarding the IP addresses, or is it only the raw logs
that you're going through?

~~~
gesman
Only raw logs. Splunk resolves IP to Country/Region/City (and geo coordinates
if you want to map them).

Mostly playing with raw logs and then even RAW-er logs using Splunk Stream
(the thing that switches the network interface into promiscuous mode and gives
me _all_ data for all protocols and any context I ever want).

For example, I can analyze anomalies in web hits and anomalies in web sessions
to discover new, previously unknown traffic sources and patterns.

It helped to discover 2 new classes of cyberattacks I didn't know were
targeting my server.
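
For anyone without Splunk, here is a minimal sketch of the "anomalies in web hits" idea on raw logs. This is not how Splunk does it. It assumes common/combined access log timestamps and uses a plain z-score on hourly hit counts as a stand-in; the function names and the threshold are made up for illustration:

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Matches a common/combined log format timestamp such as
# [10/Oct/2016:13:55:36 +0000], capturing the date and the hour.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [+-]\d{4}\]")

def hits_per_hour(lines):
    """Count log lines per (date, hour) bucket; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

def anomalous_hours(counts, threshold=2.0):
    """Return buckets whose hit count is more than `threshold` std devs from the mean."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every hour looks the same, nothing to flag
    return sorted(h for h, n in counts.items() if abs(n - mu) / sigma > threshold)
```

Calling `anomalous_hours(hits_per_hour(open("access.log")))` would list the hours worth a closer look, where `access.log` is a placeholder path.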

~~~
mfrye0
Sounds really useful. I'll have to check that out.

------
__d
I've used [https://piwik.org](https://piwik.org) successfully.

~~~
mfrye0
Thoughts on using it so far?

------
tixocloud
To give you some perspective, a startup I worked for began with a self-hosted
web analytics option.

In fact, most of the systems we had were homegrown, and while it initially
seemed like a great idea, the maintenance took a lot of time away from
generating revenue.

We spent a lot of time trying to figure out the structure of our self-hosted
web analytics platform, how it tracks data, how it stores data, etc. The
majority of our company were engineers but we still wasted a lot of time
fiddling with the self-hosted analytics.

There will likely be a subset of customers who would be interested in
self-hosted, but I'm willing to bet that they are more likely to be
engineering-oriented companies.

~~~
mfrye0
Good feedback.

Yeah I was thinking the same thing - that companies who are self hosting are
more likely to be engineering-oriented. Otherwise like you said it's just too
much of a pain to handle it yourself.

May I ask, why did you self-host vs use an external service? To save money,
for greater security and privacy?

~~~
tixocloud
I believe we self-hosted because of privacy worries and the engineering
culture (or myth that we can and should build everything).

Ultimately though, the business demanded a switch to something more robust
and we went with Omniture.

I do believe that self-hosted is an option if it's easy to maintain and it
also delivers on business features like reporting and data exploration.

~~~
mfrye0
Yeah I hear you. I run into the same issue - why pay if you can build it
yourself.

Thanks for the insight.

~~~
tixocloud
Yeah, but I believe successful companies are the ones that can figure out the
right balance between buying and building.

~~~
mfrye0
Yeah I agree. It's a tough call sometimes.

------
shakna
I don't yet have a product I've settled on, but I'm in the early stages of
developing a website for a government body.

With all the regulations and policies on data protection, using something not
self-hosted is just not going to happen. (I believe that if there was a
_possibility_ that their users were affected by a 3rd party breach, the fine
is around 1000x the project budget).

If this means a little less information about the audience, that is perfectly
acceptable.

~~~
mfrye0
Yeah that makes sense being a government org.

------
boyter
I have a custom built one. Its main feature is that it can email me the
details I want on a scheduled basis to save me logging in.

It tracks a few million records a month, so not high scale, and runs on a single
$5 DigitalOcean instance.

I did attempt to make it SaaS for hosting resellers as an upsell but never
made any progress. It's just running for myself these days.
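
The scheduled-email idea can be sketched roughly like this. It is only an illustration: the `hits` table, its columns, and the addresses are hypothetical, and it uses the standard library's `sqlite3` as a stand-in for whatever database is actually behind it. The function builds the message but does not send it:

```python
import sqlite3
from email.message import EmailMessage

def build_summary_email(conn, day, sender, recipient):
    """Summarise one day's page views into an email message (not sent here)."""
    rows = conn.execute(
        "SELECT path, COUNT(*) AS views FROM hits "
        "WHERE day = ? GROUP BY path ORDER BY views DESC, path",
        (day,),
    ).fetchall()
    body = "\n".join(f"{views:>6}  {path}" for path, views in rows) or "No hits recorded."
    msg = EmailMessage()
    msg["Subject"] = f"Analytics summary for {day}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

# Demo with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (day TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [("2017-01-01", "/"), ("2017-01-01", "/"), ("2017-01-01", "/about")],
)
msg = build_summary_email(conn, "2017-01-01", "stats@example.com", "me@example.com")
```

Running something like this from cron and handing `msg` to `smtplib.SMTP(...).send_message(msg)` would cover the "on a scheduled basis" part, assuming a mail server is available.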

~~~
mfrye0
Sounds pretty cool. What tech is it based on? Just a standard API and DB?

Also, can I ask how big your company is? A few million records is decent
volume.

~~~
boyter
Nginx, Django/Python and MySQL. Nothing fancy.

No company actually. Just side projects. The largest being searchcode.com

Feel free to email me if you want further details on either. Details in my
profile.

~~~
mfrye0
Sure thing. Thanks.

I've actually been looking at Snowplow and that seems pretty badass so far.
Might work for what I'm looking at.

------
zyzioziom
Piwik is pretty good but, like most open source, not as feature-packed as
commercial tools. It's resource-heavy, though. We had problems with large
amounts of data that couldn't be loaded before timing out. Caching helps, but
it definitely needs a stronger machine.

------
mfrye0
I've been talking to a friend about it and he mentioned using Elasticsearch
to build our own setup. Idk if it's the right use case though...

------
Nilef
Working for a bank processing millions of calls a minute; using WebTrends

~~~
mfrye0
I haven't heard of WebTrends before. I'm looking at their website now. Are you
self-hosting it? I can't find info on that option.

