

The Way Server-Side Analytics Should Be - ivolo
https://segment.io/blog/the-way-server-side-analytics-should-be/

======
noelwelsh
Is this really a problem crying out for a solution? The justification felt a
bit forced to me. My only experience is integrating with Mixpanel from Scala.
I wrote the following in an evening:

<https://gist.github.com/4598180.git>

Seriously, it wasn't a big deal. Maybe I'm spoiled by using Scala/Play -- the
web client is asynchronous so the analytics calls aren't blocking the
controller.

I'm far more annoyed that I get very little useful analysis from my analytics.
That is, to me, a much more interesting and useful problem to solve.

~~~
teej
I talk to a lot of startups about analytics and this is definitely a problem
that needs solving. Doing lots and lots of external API calls is nontrivial at
scale for someone who is "just a Rails app". Lots of startups need analytics
but are still at "just a Rails app" stage. If you're already async it's easy
but so many people aren't.

You're 150% right on your second point - people have no fucking clue how to
get business value out of analytics. I am trying to blog about it to help
solve the problem, but it's a tricky thing to teach.

~~~
far33d
+1.

I've spent lots and lots of time on this. I've used nearly every off-the-shelf
system as well as some really well-done in-house systems (Zynga, in
particular).

All have major shortfalls, and many startups have piecemeal data in multiple
providers (AppAnnie! Flurry! MixPanel! Google Analytics! Internal Databases!)
that only solve one little sliver of the problem and don't talk to each other.
There's plenty of room for innovation and improvement.

------
arscan
There are probably some value-adds you can do here, besides convenience for
developers. For example, you could save your users money by doing some
creative aggregation of data points (assuming these services charge by volume)
before sending them off. Or you can try to detect low-value data points and
let your users filter them out (like hits from bots). Or you can do sampling
so that only 1 out of every x data points goes through. Yeah, in-house
developers could code this up too, and it might completely kill the value of
the analytics service by not sending all data, but who has the time to
investigate these types of things?
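
A rough sketch of the bot filtering and 1-in-x sampling ideas above, written as an illustration only. The `BOT_MARKERS` list, the `user_agent` event field, and the function names are all assumptions made for this example, not part of any provider's API.

```python
import random

# Hypothetical, deliberately tiny marker list; real bot detection is harder.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_bot(user_agent):
    """Crude check: flag user agents that contain a known bot marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def should_forward(event, sample_every=10, rng=random):
    """Drop probable bot hits, then keep roughly 1 in `sample_every` events."""
    if is_probable_bot(event.get("user_agent")):
        return False
    # randrange(n) is uniform over 0..n-1, so this passes about 1/n of events.
    return rng.randrange(sample_every) == 0
```

Any counts computed downstream from sampled data would need to be scaled back up by `sample_every`, which is one way this could reduce the value of the analytics, as noted above.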

Also, since switching analytics providers becomes very easy for your users,
you can try to leverage this fact and get a kickback from analytics providers
that you help convert your users to. You could, theoretically, reduce the
friction of moving to another provider to near zero (no development /
integration cost, no data lock-in if your users let you store historic logs
that you could replay to another provider, etc). That would result in much
more competitive pricing from the analytics providers.

edit: yeah, these random thoughts mostly apply to the server side stuff. just
throwing them out there.

------
eranation
I can completely relate to this, it might be anecdotal, (Java isn't as popular
for web development these days as it used to be), but server side analytics
using Mixpanel in Java for example was pretty hard from a non Android app, I
ended up building something myself to circumvent the issue
(<https://github.com/eranation/mixpanel-java>) but I have to give credit to
mixpanel for responding to emails quickly, picking it up, creating their own
Java API and kindly linking to mine and another mixpanel Scala based API from
their support pages. Still, I probably wouldn't have had to even think about
doing it if I had known about segment.io. Nice job, will give it a go.

------
mountaineer
Impressive range of libraries provided and the documentation is quite good.

Seems a little strange that the benefits are touted of using the libraries
over setting up a dedicated queue service when the libraries rely on in memory
queues. While the loss of the tracking data, should these queues fail, may not
be worth the hassle of maintaining the dedicated infrastructure, there is a
critical difference between the two approaches. These libraries are much more
likely to impact application (or the servers the applications are running on)
performance than queuing a message on another server.

~~~
arscan
This caught my eye as well, because the article talks about how big a pain it
is to set up & maintain queuing servers, but then provides a solution that
simply isn't as robust.

But to be honest, the fact that they open sourced all these libraries and
present them front-and-center for anybody interested makes up for that. There
isn't any magic going on -- just some straightforward code that leverages
language/runtime-appropriate mechanisms to queue up the messages. It's pretty
easy to figure out if the solution meets your performance needs when you can
see the actual code being run.

~~~
bigiain
Sure, it's not as robust as a generalised queuing system, but it's web
analytics - the driving reason for queuing is more about high performance than
about guaranteed message delivery. Lightweight and _fast_ in-memory queues
seem like a perfectly adequate solution for the problem domain, while being (as
claimed) significantly less of a pain to set up than "traditional" queuing
techniques.
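
For readers who have not seen the pattern, here is a minimal sketch of a lightweight in-memory queue that trades delivery guarantees for speed. It is not the code from the libraries being discussed; `InMemoryTracker`, `send`, and `max_size` are hypothetical names chosen for the example.

```python
import queue
import threading


class InMemoryTracker:
    """Hypothetical tracker: track() never blocks, a worker thread sends."""

    def __init__(self, send, max_size=1000):
        self._send = send  # callable that performs the slow network call
        self._queue = queue.Queue(maxsize=max_size)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def track(self, event):
        """Return immediately; drop the event if the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False  # losing an event is accepted in this trade-off

    def _run(self):
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception:
                pass  # best-effort here; a real client might retry or log
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued event has been handed to `send`."""
        self._queue.join()
```

The bounded queue is what keeps a slow or unreachable analytics endpoint from consuming unbounded memory in the application process; anything still queued is lost if the process dies.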

------
ry0ohki
From a random "Show HN" a few weeks ago to what seems like a polished company
that's been around for years, I just wanted to say you guys are absolutely
killing it (and making a joke of all the dev power that probably went into
Google's own inferior Tag Manager).

~~~
anoncoward75
If you want to laugh (and cry if you're actually forced to use it), check out
the Adobe tag manager. It makes Google's look like technology from the year
3023.

------
qeorge
Do you have a PHP library?, he asked sheepishly.

~~~
ianstormtaylor
Haha, we've got one in the works! Shoot us an email (friends@segment.io) and
I'll get you on the list and let you know when we release it.

------
alexatkeplar
Awesome stuff guys! We're very excited about adding SnowPlow support into
analytics.js - expect a pull request soon :-)

Does adding support for SnowPlow into analytics.js get us "server-side
support" for free, or is there anything else we have to do too?

~~~
ianstormtaylor
We handle adding server-side support to our REST API separately, but the delay
is pretty short :p If you do submit a pull request, send us a logo file
(friends@segment.io) and we'll get you in the interface real quick!

~~~
alexatkeplar
Great thanks Ian!

------
kirillzubovsky
Very glad you guys are doing this. I love Mixpanel but maintaining analytics
calls via JS was a royal pain. This way, we can use your apis to do all the
tracking and their front-end for all the slicing. Perfect!

------
calpaterson
I don't really need or want to use multiple analytics services, at the moment.
I currently use frontend analytics with mixpanel to track funnels and whatnot.
What is the benefit to instrumenting backend code over frontend?

~~~
jtheory
If you're particularly paranoid about security, you may want to avoid
importing 3rd party JS files into pages displaying sensitive information...
which means you either have to manage your own analytics service entirely, or
make server-side calls.

Or, depending on the complexity of your webapp, you may have events you'd like
to track that aren't known on the client side -- e.g., in your UI a user can
invite someone else to link them by providing an email, and there's just one
flow; on the server-side, this may trigger an invitation sent to an email
address unknown to you, or to an existing user of your service unrelated to
the active user, or to someone who is already linked (directly, 2nd-level,
etc.) to the active user.

This kind of thing can be important to capture, but the info is unavailable in
the front end.

Again from the security perspective -- in the example above, it may be
important that you _don't_ reveal to the active user whether the invitee is
already in the system, because that could be sensitive information (imagine a
journalist joining a drug addiction support forum, then inviting lots of
politicians' email addresses to see if they're members...).

------
lvh
It seems so strange to read these issues about blocking IO and the complicated
ways to 'resolve' them (introduce an AMQP broker? really?) when you're used to
asynchronicity.

I'm used to writing in Twisted. Okay, so I fire up something, and I get a
deferred. I'm not waiting for it to complete before I can do other things.
None of this queuing nonsense.
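
To make the fire-and-continue idea concrete without depending on Twisted, here is a small sketch that uses the standard library's asyncio as a stand-in; a task plays the role of the deferred. `send_event` and `handle_request` are made-up names, and the sleep merely simulates a network round trip.

```python
import asyncio


async def send_event(event, sent):
    """Pretend analytics call; the sleep stands in for network latency."""
    await asyncio.sleep(0.01)
    sent.append(event)


async def handle_request(sent):
    # Start the call and get a handle back right away, without waiting.
    task = asyncio.create_task(send_event({"event": "page_view"}, sent))
    response = "200 OK"  # the response is built while the call is in flight
    pending_at_response = not task.done()
    await task  # awaited only so this demo can observe completion
    return response, pending_at_response
```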

At the same time, I've heard tons of people complain about how async-
everywhere complicates everything. Either they don't know what they're talking
about, or it just complicates everything else (or the truth is somewhere in
the middle).

------
giulianob
Are the server side libraries hitting your servers and then being forwarded to
the specific analytic service or hitting their service directly?

~~~
pkrein
the server side libraries hit our servers and then forward to the specific
analytics services. the client side library (free) does not hit our servers
and just forwards directly to each service.

------
ftwinnovations
Segment.io guys, an aside from your blog post, which was actually quite
good... Both your site and your blog are almost unusable on an iPhone. I know
a lot of people (like me!) like browsing HN and blogs on their phone, and
fixing layout CSS isn't too big of a project, so just a little heads up.

------
CCs
How does it compare to <http://www.NewRelic.com>?

------
namabile
Is there support for ecommerce tracking with Google Analytics?

~~~
pkrein
hey we don't support it yet, but we've gotten a couple requests for this, so
we'll be looking into it soon!

------
dickeytk
Will the ruby version work when using unicorn?

~~~
calvinfo
I haven't worked with unicorn specifically, but from my understanding - it
should work fine given a reasonable number of child processes.

It's worth noting that when you initialize the client, the module spawns a new
thread to consume messages from the queue. As long as each forked child
process in unicorn initializes its own client, that process will be able to
use its own in-memory queue. That means that they will all make their own web
requests independently since there isn't any kind of syncing or shared state
going on.
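
The per-process point can be sketched as follows. This is in Python rather than Ruby, and it is not the actual library code: `Client` and `get_client` are hypothetical. The underlying reason is that only the forking thread survives a fork, so a consumer thread started in the parent would not be running in the child.

```python
import os
import queue
import threading

_client = None
_client_pid = None


class Client:
    """Hypothetical client that owns a queue and a consumer thread."""

    def __init__(self):
        self.pid = os.getpid()
        self.queue = queue.Queue()
        self.thread = threading.Thread(target=self._consume, daemon=True)
        self.thread.start()

    def _consume(self):
        while True:
            self.queue.get()  # a real client would send the message here
            self.queue.task_done()


def get_client():
    """Return a client built in this process, rebuilding it after a fork."""
    global _client, _client_pid
    if _client is None or _client_pid != os.getpid():
        _client = Client()
        _client_pid = os.getpid()
    return _client
```

Calling `get_client()` lazily inside each worker, rather than once before forking, gives every child its own queue and its own live consumer thread.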

------
togasystems
What server side trackers do you suggest?

------
vj44
great job segment.io team!

