One example of something that bit us was that the Mixpanel servers modify the time stamp from a client if it is in the future. It turns out this happens quite regularly with mobile devices (especially Android). Consequently a batch of data coming in would all have the same time stamp, destroying the ability to see what happened over time.
Another example is that Mixpanel will add country information to incoming data, but refuses to add regional information, so the only thing you know about US users is that they were in the US.
I also got bitten by their handling of timestamps.
Example scenario:
Your users create bookings (which are normally done in the future).
How do you approach this? Well, you submit a date field in the document you send to Mixpanel. But then you figure out that the ONLY time axis supported by Mixpanel is their timestamp. Which could make sense for an off-the-shelf service, since you probably really don't want to build indexes against random user data.
OK, take 2: I only have one X axis available (the Mixpanel timestamp). However, you soon figure out that Mixpanel just silently prunes your future timestamp to @now.
The worst part for me was the Mixpanel team's total bewilderment as to why anybody would want either additional time axes OR future timestamps.
Guys, your service really shows promise for building fast and simple-to-use analytics. I really don't want to roll my own analytics service just to show simple activity timelines to my customers. I also don't want to maintain and support the infrastructure necessary, at least not at this time. Please realize that there are other cases for real time analytics besides funneling users towards "Buy now!" or "Signup now!" pages.
Are there any similar alternatives? (I searched quite a bit but found nothing.) If there are not, would anybody be interested in joining up with me to build a service?
I can point to mistakes I made such as the higher posting. In our case some extra numbers are sent with each event and the values of those numbers are interesting. (One example is the volume level of the device.)
With Google Analytics, where they term this custom variables, all you get to see is the average value, which is spectacularly useless. Mixpanel would show us the distribution, but only for a particular event type. We have about 20 different events, so working it out across all 20 would be too tedious.
Today we only use Mixpanel as a receiver of events. We export the analytics data from them and then work on that locally. Unfortunately they only provide an export every 24 hours. We do not use any of their other functionality, although we did try.
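To give an idea of what working on the export locally can look like, here is a rough sketch that builds the distribution of one numeric property across all event types at once. The event shape (a dict with `event` and `properties` keys) and the `volume` property name are assumptions for illustration, not the actual export format.

```python
from collections import Counter


def property_distribution(events, prop):
    """Count how often each value of `prop` occurs, across every event type."""
    return Counter(
        e["properties"][prop]
        for e in events
        if prop in e.get("properties", {})
    )


# Hypothetical exported events; only some of them carry the property.
exported = [
    {"event": "clicked", "properties": {"volume": 3}},
    {"event": "opened", "properties": {"volume": 3}},
    {"event": "closed", "properties": {"volume": 7}},
    {"event": "crashed", "properties": {}},
]
distribution = property_distribution(exported, "volume")
```

A `Counter` keeps the full spread of values, which is the part an average throws away.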
For mobile app analytics you need a client library in Java (Android) and Objective C (iOS). The client library needs to record analytics events into a SQLite database, and then periodically try to upload them to a server (you don't always have connectivity). Attention must also be paid to things like roaming (do you want to burn the user's data? It can be expensive in many parts of the world). You also need cleanup (e.g. if you can't send data for many days then you'll likely want to discard it). You'll want to make sure the client plays nice (e.g. not creating lots of threads and causing constant wakeups). It should also supply platform information, since you'll want to analyze versions, screen sizes etc. There are various other little details that matter on the client.
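A minimal sketch of the record / upload / cleanup cycle described above, written in Python for brevity rather than Java or Objective C. The `EventQueue` name, the table layout, the three-day `max_age_s` default and the `send` callback are all assumptions for illustration, not any vendor's SDK.

```python
import json
import sqlite3
import time


class EventQueue:
    """Durable queue: record events locally, upload them when possible."""

    def __init__(self, path=":memory:", max_age_s=3 * 24 * 3600):
        self.db = sqlite3.connect(path)
        self.max_age_s = max_age_s
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "created REAL NOT NULL, payload TEXT NOT NULL)"
        )

    def record(self, name, properties=None, now=None):
        now = time.time() if now is None else now
        payload = json.dumps({"event": name, "properties": properties or {}})
        self.db.execute(
            "INSERT INTO events (created, payload) VALUES (?, ?)", (now, payload)
        )
        self.db.commit()

    def cleanup(self, now=None):
        """Discard events that have been waiting longer than max_age_s."""
        now = time.time() if now is None else now
        self.db.execute(
            "DELETE FROM events WHERE created < ?", (now - self.max_age_s,)
        )
        self.db.commit()

    def flush(self, send, batch_size=50):
        """Try to upload one batch; rows are kept if `send` fails."""
        rows = self.db.execute(
            "SELECT id, payload FROM events ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        if not rows:
            return 0
        batch = [json.loads(payload) for _, payload in rows]
        try:
            ok = send(batch)
        except OSError:  # no connectivity: try again later
            ok = False
        if not ok:
            return 0
        # Rows were read in id order, so everything up to the last id was sent.
        self.db.execute("DELETE FROM events WHERE id <= ?", (rows[-1][0],))
        self.db.commit()
        return len(rows)
```

Rows are deleted only after `send` reports success, so a failed upload costs nothing but a retry. Roaming checks and thread scheduling are left out of the sketch.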
On the server side it needs to correctly cope with data arriving days late, with "incorrect" time stamps from clients. And you'll want to easily do pointy clicky through the data as you'll have some common questions such as what are the most prevalent platform versions and devices and how does that correlate by country.
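One way a server could cope with "incorrect" client clocks without flattening a whole batch onto a single time stamp is to have the client include its own clock reading at upload time and shift every event by the difference. This is a sketch of that idea under my own assumptions (the `client_sent_at` field and the event shape are hypothetical); it is not what Mixpanel does.

```python
def correct_timestamps(events, client_sent_at, server_received_at):
    """Shift client time stamps by the observed client/server clock offset.

    All times are seconds since the epoch. Network latency is ignored,
    so the result is only as accurate as the upload was fast.
    """
    offset = server_received_at - client_sent_at
    return [dict(e, time=e["time"] + offset) for e in events]


# A device whose clock runs one hour fast uploads two events.
batch = [
    {"event": "opened", "time": 5000.0},
    {"event": "clicked", "time": 5600.0},
]
fixed = correct_timestamps(batch, client_sent_at=6000.0, server_received_at=2400.0)
```

The spacing between events is preserved, and data arriving days late still lands at the time it happened rather than at the time it was received.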
Here is a list of things that mattered:
* The library needs to have a posture on how it is used by multiple different components in the same app. For example it can intend there to be one canonical source/package, or each component could make a private fork.
* If the canonical package is chosen then it must work with concurrent but different reporting ids and settings. (For example Google screw this up by having the tracker be a singleton.) It needs to be possible to find out the version number from tools so they can complain about being out of date.
* For the private fork posture it is easiest if the code is all one file (use nested classes). It should use a SQLite database name that differs per fork so they don't clash with each other.
* The library will have "slow" work that needs to be done. This includes updating the SQLite database with new events, clearing out too old events on startup, and sending event batches to the server. I updated the Mixpanel code so that it returns Runnables for that work, and then my library can use existing slow work threads. Most however will want the library to work out where to run the slow work.
* You'll want to grab some other stuff by default (e.g. carrier information, device model, OS version).
* Make debugging easy. For example the logcat mechanism on Android works nicely. Mixpanel were just logging their API call, not any detail. For example it would say "track" instead of "track: clicked" (where "clicked" is the event type).
* Sessions are what matters most. Mixpanel has no concept of sessions. For example when they purge unsent old events, they just delete the ones older than the time frame (was hard coded as 48 hours). However this means it could end up deleting the first half of a session but transmitting the rest. A better approach is to have a session id that is updated on each start, then delete all events belonging to old session ids.
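The session-based purge from the last point could look roughly like this. The schema and function names are made up for the sketch, and it assumes a session counts as "old" once its most recent event is past the cutoff; a stricter rule would swap `MAX` for `MIN`.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the (hypothetical) event store with a session id per event."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "session_id TEXT NOT NULL, created REAL NOT NULL, name TEXT NOT NULL)"
    )
    return db


def purge_old_sessions(db, now, max_age_s=48 * 3600):
    """Delete whole sessions whose latest event is older than the cutoff."""
    cutoff = now - max_age_s
    db.execute(
        "DELETE FROM events WHERE session_id IN ("
        "SELECT session_id FROM events GROUP BY session_id "
        "HAVING MAX(created) < ?)",
        (cutoff,),
    )
    db.commit()
```

Because the delete is keyed on the session id rather than on each event's own age, a session is either kept whole or dropped whole.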
In terms of implementation details, a comparison of Google Analytics to Mixpanel is useful.
Google only have one tracker instance, although you wouldn't know that from the API so multiple usage silently doesn't work. They have an extremely complicated custom variable scheme for adding extra data for each event. Ultimately their database stores a query string for each event. If there are 10 to send then they make 10 separate GET requests.
Mixpanel supports multiple instances, but almost everything was hard coded (eg dispatch intervals, expiry of old data). You supply events with arbitrary JSON data, including a list of "super properties" which are added to every event. This is a very good approach. The database stores the events. When submitting, a POST request is generated with a batch of events (up to 50, again it was hard coded as two different numbers in two different places).
If you use query strings (in the sense of a GET) then there is a danger of the data being logged by proxy servers, hitting URI length issues, and being unable to batch.
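For comparison, batching into POST bodies is only a few lines. This is a sketch, not either vendor's wire format; `make_batches` is a hypothetical helper and the limit of 50 just mirrors the number mentioned above.

```python
import json


def make_batches(events, batch_size=50):
    """Split events into JSON request bodies of at most batch_size events."""
    return [
        json.dumps(events[i:i + batch_size])
        for i in range(0, len(events), batch_size)
    ]


# 120 queued events become three request bodies instead of 120 GET requests.
queued = [{"event": "clicked", "properties": {"n": n}} for n in range(120)]
bodies = make_batches(queued)
```

Each body goes out as the payload of one POST, so the event data stays out of the URL and has no URI length limit to hit.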
Another lesson learned the hard way is that I will not touch any analytics SDK code unless it is open source. It needs to have an actual clear license attached.
We got bitten with Google Analytics, because it can only report against one ID. That means if a library in the app also uses GA, then you can't have the different parts using different ids. This turned out to be a showstopper and they had no code to examine.
We do actually do our import and custom processing in MongoDB, so that side is covered.
If you want to see the power of how useful this really is, you should look at :
I think that the "Sales Page" doesn't do this new service justice. It's quite staggering how useful this is. It basically allows you to segment your users and keep in touch with users whom you have identified as possible power users, or users who might fall into your "danger cohort". For example, you might notice a trend that users who haven't "done event X" within the last 7 days are most likely not going to return.
Tracking the events in your application and then identifying these events as originating from a particular user, allows you to then find these users.
This is of course just one example.
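As a rough sketch of the "danger cohort" idea (users with no event X in the last 7 days), assuming each tracked event carries a `user_id`, an `event` name and a `time` in seconds; these field names are mine, not Mixpanel's:

```python
def danger_cohort(events, event_name, now, window_s=7 * 24 * 3600):
    """Return the ids of users with no `event_name` inside the window."""
    all_users = {e["user_id"] for e in events}
    active = {
        e["user_id"]
        for e in events
        if e["event"] == event_name and e["time"] >= now - window_s
    }
    return all_users - active


DAY = 24 * 3600
history = [
    {"user_id": "alice", "event": "upload", "time": 9 * DAY},
    {"user_id": "bob", "event": "upload", "time": 1 * DAY},
    {"user_id": "carol", "event": "login", "time": 9 * DAY},
]
at_risk = danger_cohort(history, "upload", now=10 * DAY)
```

Here `at_risk` holds bob (whose last upload is too old) and carol (who never uploaded), which are the users you would then want to contact.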
Look at the documentation on People Analytics for more ideas...
Btw - We've avoided this in the past with Google Analytics, as they mention it in the TOS (http://www.google.com/analytics/tos.html).
Hadn't heard of https://www.intercom.io/ - thanks for sharing @reustle.
In most cases, companies are tracking all of this data, just in multiple different systems, and not bothering to pull it together and use it to make my life better (i.e. why do I get emails about product features I already use?).
On privacy, companies need to make sure they are being open with users. With most of these so-called "people analytics" services, companies can choose whether to include personally identifiable info. Companies need to be intentional, and should choose to anonymize customer data when they can (but should still treat people uniquely based on their past interactions, even if they can't put a name on someone).
Even so, all analytics services are basically privacy atrocities, and as such I don't think Mixpanel should receive a disproportionate amount of resentment.
User information utility and privacy are mutually exclusive.
1) More intelligent marketing spend. If you know you have a higher LTV for females for your app between the ages of 18-34 and you do a portion of your advertising on Facebook, it would be good to target just those users.
2) Insights into broad customer engagement. Let's say your 18-34 female users return to your application more often than male users in the same demographic, it'd be helpful to know what friction points cause these users to drop off.
3) Insights into spending users. Being able to segment all your user actions by those who are free and those who spend money would help you optimize your paying funnel.
4) Bug reporting. Knowing where your users are located can help illuminate whether you have server and localization problems.
I can't think of any reasons why you'd use this information to voluntarily contact users other than support-related issues. If Netflix sent me an email that said "We think you'd like these movies because other males liked these movies" I'd probably de-activate immediately.
If we were physically interacting with our users (i.e. we ran a shop or a community centre) we'd be using thousands of signals to determine who needed help, who was afraid, who was ready to purchase more and who was making trouble.
As developers we try to cultivate online social environments, socially-engaged shops, games which envelop the user and collaborative business tools but with absolutely none of these emotional cues. That's hard.
Imagine trying to design an amusement park if all you had was an anonymous ping each time someone went on a ride. The value to all of us (and to the users) in this new wave of analytics is to take us closer to the user and let us feel what they feel and service them where they need to be serviced.
Asking why one should do that is about the same as asking why you'd need to watch people queuing for the rides in an amusement park in order to improve the queues. Because if you don't you won't know what the user feels, wants or needs.
(Side note: what Mixpanel is doing is incredible and they really are a pleasure to use. There is a Zen quality to their product and the way it gives you great power from great simplicity (although a custom dash would be great, ty!))
My point is, it doesn't help you to measure something that isn't actionable. And my list above was just the four that I can think of where gender, age, and location details actually helped across many of the applications I've studied.
I agree with you that the goal of analytics is to help improve your products, but it's easy to be misled to the wrong conclusions. Sometimes, too much data may actually be harmful to your business.
However I've been investing a lot of time and code in Analytics and in Mixpanel recently, and while I would definitely agree that not all data is actionable, it doesn't stop it being useful or worth investing in as long as I take the time to examine (and prune) it afterwards.
What I find exciting is that these new types of Analytics open up value for product design in a way that just wasn't practical without a tonne of custom dev before.
However, just as a new boss taking time to talk to each employee in the company isn't predictably actionable but still has vast value, so these new Analytics tools allow us to understand users in a way that is not always actionable but is invariably valuable. The only real question surely is "is it valuable enough"?
On privacy, we designed in an inability to correlate user data across our customers. So, for example, we cannot know that an end user of Apptegic Customer A is also the same end user of Apptegic Customer B. With this in place, the data is used only for our customers to understand and better serve their customers.
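I can't speak for how Apptegic implements this, but one common way to get that property is to store only a keyed hash of the end user id, with a different secret per customer. A sketch under that assumption (`pseudonymize` and the key values are hypothetical):

```python
import hashlib
import hmac


def pseudonymize(customer_key: bytes, end_user_id: str) -> str:
    """Derive a stable per-customer identifier from an end user id."""
    return hmac.new(
        customer_key, end_user_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# The same person shows up under unrelated ids for two different customers.
id_for_a = pseudonymize(b"secret-for-customer-a", "jane@example.com")
id_for_b = pseudonymize(b"secret-for-customer-b", "jane@example.com")
```

Within one customer the id is stable, so segmentation still works. The stored ids only stay unlinkable as long as the raw user id is not kept alongside them.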
We were doing something similar with Mixpanel, except (in Mixpanel client-side parlance this is called super properties) we have to send all attributes such as "number of pages viewed", "amount of money paid" with all our events in order to segment by that data.
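Conceptually, super properties amount to a dictionary merged into every outgoing event. A minimal sketch of that merge; the function name is made up, and letting event-specific values win over super properties is my assumption rather than documented Mixpanel behaviour:

```python
def with_super_properties(super_props, name, props=None):
    """Build an event whose properties include every super property."""
    merged = dict(super_props)
    merged.update(props or {})  # assumption: event-specific values win
    return {"event": name, "properties": merged}


super_props = {"pages_viewed": 12, "amount_paid": 30}
event = with_super_properties(super_props, "clicked", {"button": "buy"})
```

Every event then carries the attributes you want to segment by, at the cost of repeating them on each one.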
And sending emails based on analytics is incredible! I've always wanted to build that for my app but didn't have the resources to. Is it horrible to gloss over the privacy concerns?
Also, if you're interested in automating emails based on analytics you should check out http://getvero.com (disclaimer: I'm a founder). We're working on a Mixpanel integration as I type so we should pick up right where Mixpanel leaves off :).
It seems like at the very least it should support server-side validation of user ids based on their cookie or something, so a user can only screw with their own stats.
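A sketch of what that validation could look like: your own server signs the user id into the cookie, and whatever receives the events only accepts an id whose signature checks out. The function names and token format are assumptions; this is not an existing Mixpanel feature.

```python
import hashlib
import hmac


def _mac(secret: bytes, user_id: str) -> str:
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_user_id(secret: bytes, user_id: str) -> str:
    """Produce a cookie value of the form '<user_id>.<signature>'."""
    return f"{user_id}.{_mac(secret, user_id)}"


def verify_user_token(secret: bytes, token: str):
    """Return the user id if the signature is valid, otherwise None."""
    user_id, sep, mac = token.rpartition(".")
    if not sep:
        return None
    expected = _mac(secret, user_id)
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return user_id
    return None


secret = b"server-side-secret"  # never shipped to the browser
token = sign_user_id(secret, "alice")
forged = "bob." + token.rpartition(".")[2]
```

A client can still send junk events for its own id, but it cannot write to someone else's stats without knowing the secret.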
That said, as a customer, I felt that Mixpanel could have been a little more transparent with their roadmap.
It would have been _A LOT_ better for customers like us to know that this was in the pipeline for rolling out, and it would have saved us a lot of unnecessary headaches and pains.