The Way Server-Side Analytics Should Be (segment.io)
113 points by ivolo on Jan 22, 2013 | 34 comments

There are probably some value-adds you can do here, besides convenience for developers. For example, you could save your users money by doing some creative aggregation of data points (assuming these services charge by volume) before sending them off. Or you can try to detect low-value data points and let your users filter them out (like hits from bots). Or you can do sampling so that only 1 out of every x data points goes through. Yeah, in-house developers could code this up too, and it might completely kill the value of the analytics service by not sending all data, but who has the time to investigate these types of things?
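A minimal sketch of the bot-filtering and 1-in-x sampling idea above, in Python. The marker list and the names `is_probable_bot` and `should_forward` are made up for illustration; none of this is taken from segment.io or any analytics provider.

```python
import random

# Hypothetical substrings used to guess that a hit came from a bot.
# A real filter would need a much better signal than this.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_bot(user_agent):
    """Crude check: does the user agent contain a known bot marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def should_forward(event, sample_every, rng=random):
    """Return True if the event should be sent on to the analytics provider.

    Bot-like hits are always dropped; the rest are sampled so that roughly
    1 out of every `sample_every` events goes through.
    """
    if is_probable_bot(event.get("user_agent", "")):
        return False
    return rng.randrange(sample_every) == 0
```

With `sample_every=1` every non-bot event is forwarded. Larger values trade accuracy for volume, which is exactly the risk mentioned above.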

Also, since switching analytics providers becomes very easy for your users, you can try to leverage this fact and get a kickback from the analytics providers that you help your users convert to. You could, theoretically, reduce the friction of moving to another provider to near zero (no development / integration cost, no data lock-in if your users let you store historic logs that you could replay to another provider, etc). That would result in much more competitive pricing from the analytics providers.

edit: yeah, these random thoughts mostly apply to the server side stuff. just throwing them out there.

Is this really a problem crying out for a solution? The justification felt a bit forced to me. My only experience is integrating with Mixpanel from Scala. I wrote the following in an evening:


Seriously, it wasn't a big deal. Maybe I'm spoiled by using Scala/Play -- the web client is asynchronous so the analytics calls aren't blocking the controller.

I'm far more annoyed that I get very little useful analysis from my analytics. That is, to me, a much more interesting and useful problem to solve.

I talk to a lot of startups about analytics and this is definitely a problem that needs solving. Doing lots and lots of external API calls is nontrivial at scale for someone who is "just a Rails app". Lots of startups need analytics but are still at "just a Rails app" stage. If you're already async it's easy but so many people aren't.

You're 150% right on your second point - people have no fucking clue how to get business value out of analytics. I am trying to blog about it to help solve the problem, but it's a tricky thing to teach.


I've spent lots and lots of time on this. I've used nearly every off-the-shelf system as well as some really well-done in-house systems (Zynga, in particular).

All have major shortfalls, and many startups have piecemeal data in multiple providers (AppAnnie! Flurry! MixPanel! Google Analytics! Internal Databases!) that only solve one little sliver of the problem and don't talk to each other. There's plenty of room for innovation and improvement.

teej, you hit the nail on the head.

I'm a co-founder of SoundGecko.com and we have analytics but nothing anywhere near what we want.

Sticking points are:
1. It's going to take some time to get going.
2. There are other business/product feature issues that need development.
3. There is no decent reporting interface that works out of the box, is cheap, and is real time.

I think Google Analytics with Event Tracking is the way to go by the looks of it.

I can completely relate to this. It might be anecdotal (Java isn't as popular for web development these days as it used to be), but server-side analytics with Mixpanel in Java, for example, was pretty hard from a non-Android app. I ended up building something myself to work around the issue (https://github.com/eranation/mixpanel-java). I have to give credit to Mixpanel, though, for responding to emails quickly, picking it up, creating their own Java API, and kindly linking to mine and to another Scala-based Mixpanel API from their support pages. Still, I probably wouldn't have had to even think about doing it if I had known about segment.io. Nice job, will give it a go.

Impressive range of libraries provided and the documentation is quite good.

It seems a little strange that the benefits of using the libraries are touted over setting up a dedicated queue service when the libraries rely on in-memory queues. While losing the tracking data when these queues fail may not be worth the hassle of maintaining dedicated infrastructure, there is a critical difference between the two approaches: these libraries are much more likely to impact the performance of the application (or of the servers it runs on) than queuing a message on another server.

This caught my eye as well, because the article talks about how big a pain it is to set up & maintain queuing servers, but then provides a solution that simply isn't as robust.

But to be honest, the fact that they open sourced all these libraries and present them front-and-center for anybody interested makes up for that. There isn't any magic going on -- just some straightforward code that leverages language/runtime-appropriate mechanisms to queue up the messages. It's pretty easy to figure out if the solution meets your performance needs when you can see the actual code being run.

Sure, it's not as robust as a generalised queuing system, but it's web analytics - the driving reason for queuing is more about high performance than about guaranteed message delivery. Lightweight and _fast_ in-memory queues seem like a perfectly adequate solution for the problem domain, while being (as claimed) significantly less of a pain to set up than "traditional" queuing techniques.

the main idea is ease of integration. the developer doesn't have to write code to queue and dequeue, and the sys admin doesn't have to set up and support a queuing system. it's just a drop-in install.

regarding the impact on application performance, each of the libraries has a maxQueueSize feature which stops accepting messages into the internal queue if the flushing can't keep up. you can check out how maxQueueSize works in our docs: https://segment.io/docs
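a rough sketch of the general idea, in Python: a bounded in-memory queue that refuses new messages once it is full, so a slow flush can't make memory grow without limit. this is not the code from any of the libraries; `BoundedEventQueue`, `max_queue_size`, `enqueue`, and `drain` are hypothetical names used only for illustration.

```python
import queue


class BoundedEventQueue:
    """In-memory queue that drops new events once it is full (illustrative)."""

    def __init__(self, max_queue_size=10000):
        self._queue = queue.Queue(maxsize=max_queue_size)
        self.dropped = 0  # number of events rejected because the queue was full

    def enqueue(self, event):
        """Try to add an event without blocking; return False if it was dropped."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, batch_size=100):
        """Remove and return up to `batch_size` events for the flusher to send."""
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

the caller never blocks: when the flusher falls behind, the cost is lost events rather than a slower request.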

From a random "Show HN" a few weeks ago to what seems like a polished company that's been around for years, I just wanted to say you guys are absolutely killing it (and making a joke of all the dev power that probably went into Google's own inferior Tag Manager).

If you want to laugh (and cry if you're actually forced to use it), check out the Adobe tag manager. It makes Google's look like technology from the year 3023.

thank you so much :) that means a lot.

Do you have a PHP library? he asked sheepishly.

Haha, we've got one in the works! Shoot us an email (friends@segment.io) and I'll get you on the list and let you know when we release it.

+1 for the web's most popular server side language. For all that talk of increasing customer retention and conversion, nothing beats supporting what people need.

Awesome stuff guys! We're very excited about adding SnowPlow support into analytics.js - expect a pull request soon :-)

Does adding support for SnowPlow into analytics.js get us "server-side support" for free, or is there anything else we have to do too?

We handle adding server-side support to our REST API separately, but the delay is pretty short :p If you do submit a pull request, send us a logo file (friends@segment.io) and we'll get you in the interface real quick!

Great thanks Ian!

Very glad you guys are doing this. I love Mixpanel but maintaining analytics calls via JS was a royal pain. This way, we can use your APIs to do all the tracking and their front-end for all the slicing. Perfect!

I don't really need or want to use multiple analytics services, at the moment. I currently use frontend analytics with mixpanel to track funnels and whatnot. What is the benefit to instrumenting backend code over frontend?

If you're particularly paranoid about security, you may want to avoid importing 3rd party JS files into pages displaying sensitive information... which means you either have to manage your own analytics service entirely, or make server-side calls.

Or, depending on the complexity of your webapp, you may have events you'd like to track that aren't known on the client side -- e.g., in your UI a user can invite someone else to link them by providing an email, and there's just one flow; on the server-side, this may trigger an invitation sent to an email address unknown to you, or to an existing user of your service unrelated to the active user, or to someone who is already linked (directly, 2nd-level, etc.) to the active user.

This kind of thing can be important to capture, but the info is unavailable in the front end.

Again from the security perspective -- in the example above, it may be important that you don't reveal to the active user whether the invitee is already in the system, because that could be sensitive information (imagine a journalist joining a drug addiction support forum, then inviting lots of politicians' email addresses to see if they're members...).

One theoretical benefit I can think of is that you may be able to circumvent people's ad blockers.

It seems so strange to read these issues about blocking IO and the complicated ways to 'resolve' them (introduce an AMQP broker? really?) when you're used to asynchronicity.

I'm used to writing in Twisted. Okay, so I fire up something, and I get a deferred. I'm not waiting for it to complete before I can do other things. None of this queuing nonsense.

At the same time, I've heard tons of people complain about how async-everywhere complicates everything. Either they don't know what they're talking about, or it just complicates everything else (or the truth is somewhere in the middle).

Are the server side libraries hitting your servers and then being forwarded to the specific analytic service or hitting their service directly?

the server side libraries hit our servers and then forward to the specific analytics services. the client side library (free) does not hit our servers and just forwards directly to each service.

Segment.io guys, an aside from your blog post, which was actually quite good... Both your site and your blog are almost unusable on an iPhone. I know a lot of people (like me!) like browsing HN and blogs on their phone, and fixing layout CSS isn't too big of a project, so just a little heads up.

How does it compare to http://www.NewRelic.com?

Is there support for ecommerce tracking with Google Analytics?

hey we don't support it yet, but we've gotten a couple requests for this, so we'll be looking into it soon!

Will the ruby version work when using unicorn?

I haven't worked with unicorn specifically, but from my understanding - it should work fine given a reasonable number of child processes.

It's worth noting that when you initialize the client, the module spawns a new thread to consume messages from the queue. As long as each forked child process in unicorn initializes its own client, that process will be able to use its own in-memory queue. That means that they will all make their own web requests independently since there isn't any kind of syncing or shared state going on.
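To illustrate the per-process pattern, here is a small sketch in Python rather than Ruby. It is not the actual library code; `Client`, `get_client`, and the `send` callback are hypothetical names. The idea is that a client created before a fork is replaced in the child, since on POSIX systems the consumer thread does not carry over into a forked process.

```python
import os
import queue
import threading


class Client:
    """Hypothetical client: an in-memory queue plus one consumer thread."""

    def __init__(self, send):
        self.pid = os.getpid()  # remember which process created this client
        self._queue = queue.Queue()
        self._send = send
        self._thread = threading.Thread(target=self._consume, daemon=True)
        self._thread.start()

    def track(self, event):
        """Queue an event; the consumer thread sends it in the background."""
        self._queue.put(event)

    def _consume(self):
        while True:
            event = self._queue.get()
            self._send(event)
            self._queue.task_done()

    def flush(self):
        """Block until every queued event has been handed to `send`."""
        self._queue.join()


_client = None


def get_client(send):
    """Return this process's client, creating a fresh one after a fork."""
    global _client
    if _client is None or _client.pid != os.getpid():
        _client = Client(send)
    return _client
```

Each worker that calls `get_client` ends up with its own queue and its own consumer thread, with no shared state between processes.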

What server side trackers do you suggest?

great job segment.io team!
