Mixpanel introduces People Analytics (mixpanel.com)
124 points by trefn 1904 days ago | 45 comments

If this associates analytics with personal data, it is a tremendous invasion of privacy, especially in the mobile space where it is not expected that interacting with a local application would send your usage data to a remote server.

Our experience in the mobile space with mixpanel was extremely disappointing. For anyone who wants to try it I strongly recommend you do a non-trivial trial run first, and verify everything.

One example of something that bit us was that the mixpanel servers modify the time stamp from a client if it is in the future. It turns out this happens quite regularly with mobile devices (especially Android). Consequently, all the events in an incoming batch would end up with the same time stamp, destroying the ability to see what happened over time.
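To make the effect concrete, here is a minimal sketch of what clamping does to a batch. It is not Mixpanel's code; `clamp_future`, the two-hour skew, and the dates are all hypothetical, and the only assumption is the behaviour described above (a future timestamp is replaced by the server's current time).

```python
from datetime import datetime, timedelta, timezone


def clamp_future(event_time: datetime, received_at: datetime) -> datetime:
    """Replace any timestamp later than the server's clock with the server's clock."""
    return min(event_time, received_at)


# Hypothetical batch: the device clock runs 2 hours fast, events are 5 minutes apart.
received_at = datetime(2012, 7, 1, 12, 0, tzinfo=timezone.utc)
device_skew = timedelta(hours=2)
batch = [received_at + device_skew - timedelta(minutes=5 * i) for i in range(3)]

clamped = [clamp_future(t, received_at) for t in batch]
distinct_before = len(set(batch))    # the client recorded three different times
distinct_after = len(set(clamped))   # after clamping they are indistinguishable
```

Every event in the batch is "in the future" from the server's point of view, so each one is clamped to the same instant and the ordering information is gone.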

Another example is that Mixpanel will add country information to incoming data, but refuses to add regional information, so the only thing you know about US users is that they were in the US.

Another dissatisfied Mixpanel lead here (who didn't get past the evaluation phase).

I also got bitten by their handling of timestamp.

e.g. - scenario:

Your users create bookings (which are normally done in the future).

How do you approach this? Well, you submit a date field in the document you send to Mixpanel. But then you figure out that the ONLY time axis supported by Mixpanel is their timestamp. Which could make sense for an off-the-shelf service, since you probably really don't want to build indexes against random user data.

Ok, take 2: I only have one X axis available (the Mixpanel timestamp). However, you soon figure out that Mixpanel just silently prunes your future timestamp to @now.

The worst part for me was the Mixpanel team's total bewilderment as to why anybody would want either additional time axes OR future timestamps.

Guys, your service really shows promise for building fast and simple-to-use analytics. I really don't want to roll my own analytics service just to show simple activity timelines to my customers. I also don't want to maintain and support the infrastructure necessary - at least not at this time. Please realize that there are other cases for real time analytics besides funneling users towards "Buy now!" or "Signup now!" pages.

Are there any similar alternatives (I searched quite a bit but found nothing)? Or, if there are not, would anybody be interested in joining up with me to build a service?

People's UI actually supports changing the x-axis so this problem is handled. We plan on bringing the same UI to other parts of Mixpanel soon. This problem will be fixed - sorry!

Really nice.

What are you using now? Do you like it?

We are still on mixpanel because it is hard to extricate ourselves, but obviously that isn't the long term plan.

I can point to mistakes I made such as the higher posting. In our case some extra numbers are sent with each event and the values of those numbers are interesting. (One example is the volume level of the device.)

With Google Analytics, where they term this custom variables, all you get to see is the average value, which is spectacularly useless. Mixpanel would show us the distribution, but only for a particular event type. We have about 20 different events, so working it out across all 20 would be too tedious.

Today we only use mixpanel as a receiver of events. We export the analytics data from them and then work on that locally. Unfortunately they only provide an export every 24 hours. We do not use any of their other functionality although we did try.
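As a rough illustration of the "export and work locally" approach, the sketch below computes the distribution of a numeric value across every event type at once. The event records and the `volume` field are hypothetical stand-ins for exported data, not Mixpanel's export format.

```python
from collections import Counter
from statistics import mean

# Hypothetical exported events; "volume" stands in for the number sent with each event.
events = [
    {"event": "play", "volume": 0},
    {"event": "play", "volume": 10},
    {"event": "pause", "volume": 0},
    {"event": "skip", "volume": 10},
]


def value_distribution(events, key):
    """Count how often each value of `key` occurs, regardless of event type."""
    return Counter(e[key] for e in events if key in e)


average = mean(e["volume"] for e in events)           # one number, hides the shape
distribution = value_distribution(events, "volume")   # how many events had each value
```

In this made-up data the average sits exactly between the only two values that actually occur, which is the sense in which an average alone can mislead.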

If you are hand-rolling your own analytics, it sounds like it might be simpler/cheaper for you to use SnowPlow (https://github.com/snowplow/snowplow) for your use case - especially as you can grab the data hourly from S3 rather than every 24 hours. Unfortunately we don't have mobile clients yet :-( (although we are working on them) - which platforms would you need? Feel free to reach out on alex@snowplowanalytics.com

Snowplow is only about web analytics as far as I can tell. There are a bazillion solutions out there for that.

For mobile app analytics you need a client library in Java (Android) and Objective C (iOS). The client library needs to record analytics events into a SQLite database, and then periodically try to upload them to a server (you don't always have connectivity). Attention must also be paid to things like roaming (do you want to burn the user's data? It can be expensive in many parts of the world). You also need cleanup (eg if you can't send data for many days then you'll likely want to discard it). You'll want to make sure the client plays nice (eg not creating lots of threads and causing constant wakeups). It should also supply platform information since you'll want to analyze versions, screen sizes etc. There are various other little details that matter on the client.
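A minimal sketch of the store-then-upload pattern described above, written in Python for brevity rather than Java or Objective C. `EventQueue` and its method names are hypothetical; the `send` callback is assumed to be where connectivity and roaming policy live, and it returns `True` only when the upload succeeded.

```python
import json
import sqlite3
import time


class EventQueue:
    """Durable queue of analytics events backed by SQLite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "created_at REAL NOT NULL, "
            "payload TEXT NOT NULL)"
        )

    def record(self, name, properties=None, now=None):
        """Store an event locally; nothing is sent yet."""
        now = time.time() if now is None else now
        payload = json.dumps({"event": name, "properties": properties or {}, "time": now})
        with self.db:
            self.db.execute(
                "INSERT INTO events (created_at, payload) VALUES (?, ?)", (now, payload)
            )

    def discard_older_than(self, max_age_seconds, now=None):
        """Drop events that have waited too long to be worth sending."""
        now = time.time() if now is None else now
        with self.db:
            cur = self.db.execute(
                "DELETE FROM events WHERE created_at < ?", (now - max_age_seconds,)
            )
        return cur.rowcount

    def flush(self, send, batch_size=50):
        """Try to upload the oldest batch; delete it only if `send` reports success."""
        rows = self.db.execute(
            "SELECT id, payload FROM events ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        if not rows:
            return 0
        batch = [json.loads(payload) for _, payload in rows]
        if not send(batch):
            return 0  # e.g. no connectivity: keep the events for the next attempt
        with self.db:
            self.db.execute("DELETE FROM events WHERE id <= ?", (rows[-1][0],))
        return len(rows)

    def pending(self):
        """Number of events still waiting to be uploaded."""
        return self.db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The host application would call `flush` from whatever background worker it already has, which keeps the library from creating threads of its own.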

On the server side it needs to correctly cope with data arriving days late, with "incorrect" time stamps from clients. And you'll want to easily do pointy clicky through the data as you'll have some common questions such as what are the most prevalent platform versions and devices and how does that correlate by country.
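One way a server could cope with skewed client clocks, sketched under the assumption that the client reports its own clock reading at upload time (`client_sent_at`, a hypothetical field). The server shifts each event by the observed offset and keeps the raw values, so relative spacing survives and nothing is lost. Network latency is ignored here.

```python
from datetime import datetime, timedelta, timezone


def correct_batch(events, client_sent_at, server_received_at):
    """Shift client timestamps by the clock offset observed at upload time.

    The original client time and the server receive time are kept alongside
    the corrected time, so the correction can be revisited later.
    """
    offset = server_received_at - client_sent_at
    return [
        {
            **event,
            "client_time": event["time"],
            "received_at": server_received_at,
            "time": event["time"] + offset,
        }
        for event in events
    ]


# Hypothetical upload from a device whose clock runs 2 hours fast.
received = datetime(2012, 7, 1, 12, 0, tzinfo=timezone.utc)
sent = received + timedelta(hours=2)
raw = [
    {"event": "open", "time": sent - timedelta(minutes=10)},
    {"event": "click", "time": sent - timedelta(minutes=5)},
]
fixed = correct_batch(raw, sent, received)
```

Because the correction is relative to the upload moment, it works the same whether the batch arrives seconds or days after the events happened.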

Thanks Roger, that's helpful input. I'm adding a "client-timestamp" to the SnowPlow querystring instead of just relying on the CloudFront timestamp ;-) Out of interest, is there anything about the MixPanel mobile client libraries (Android/iOS) that you would do differently if you were starting from scratch?

I've basically rewritten almost all of it now. One important note about our use case is that our product is a library that application developers add to their application. This means we do not control the application and we have to play very nice. Additionally any dependencies we include (such as an analytics library) reflect on us. This is also why it is hard to get apps rereleased on changes in our code - users punish app updates that have no visible functional differences.

Here is a list of things that mattered:

* The library needs to have a posture on how it is used by multiple different components in the same app. For example it can intend there to be one canonical source/package, or each component could make a private fork.

* If the canonical package is chosen then it must work with concurrent but different reporting ids and settings. (For example Google screw this up by having the tracker be a singleton.) It needs to be possible to find out the version number from tools so they can complain about being out of date.

* For the private fork posture it is easiest if the code is all one file (use nested classes). It should use a sqlite database name that differs per fork so they don't clash with each other.

* The library will have "slow" work that needs to be done. This includes updating the SQLite database with new events, clearing out too old events on startup, and sending event batches to the server. I updated the mixpanel code so that it returns Runnables for that work, and then my library can use existing slow work threads. Most however will want the library to work out where to run the slow work.

* I deleted the code that reads the unique device id (aka UDID). Some companies are happy grabbing that - our privacy policy is far stronger. We generate a random unique id string on first run. Even the device code being present but unused is enough to set off binary analyzers.

* You'll want to grab some other stuff by default (eg carrier information, device model, os version)

* Make debugging easy. For example the logcat mechanism on Android works nicely. Mixpanel were just logging their API call, not any detail. For example it would say "track" instead of "track: clicked" (where "clicked" is the event type)

* Sessions are what matters most. Mixpanel has no concept of sessions. For example when they purge unsent old events, they just delete the ones older than the time frame (was hard coded as 48 hours). However this means it could end up deleting the first half of a session but transmitting the rest. A better approach is to have a session id that is updated on each start, then delete all events belonging to old session ids.
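The session-based purge from the last point could look roughly like this. It is a sketch, not the rewritten library: the table layout and function names are hypothetical, and "old" is assumed to mean that the newest event in the session is older than the cutoff.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE events (session_id TEXT NOT NULL, name TEXT NOT NULL, created_at REAL NOT NULL)"
)


def start_session():
    """Generate a fresh random session id on each app start."""
    return uuid.uuid4().hex


def record(session_id, name, now):
    """Store one event tagged with the session it belongs to."""
    with db:
        db.execute("INSERT INTO events VALUES (?, ?, ?)", (session_id, name, now))


def purge_expired_sessions(now, max_age_seconds):
    """Delete whole sessions whose newest event is older than the cutoff."""
    cutoff = now - max_age_seconds
    with db:
        cur = db.execute(
            "DELETE FROM events WHERE session_id IN ("
            "SELECT session_id FROM events GROUP BY session_id "
            "HAVING MAX(created_at) < ?)",
            (cutoff,),
        )
    return cur.rowcount
```

A session that straddles the cutoff is kept in full, instead of losing its first half as it would with a purely time-based delete.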

In terms of implementation details, a comparison of Google Analytics to Mixpanel is useful.

Google only have one tracker instance, although you wouldn't know that from the API so multiple usage silently doesn't work. They have an extremely complicated custom variable scheme for adding extra data for each event. Ultimately their database stores a query string for each event. If there are 10 to send then they make 10 separate GET requests.

Mixpanel supports multiple instances, but almost everything was hard coded (eg dispatch intervals, expiry of old data). You supply events with arbitrary JSON data, including a list of "super properties" which are added to every event. This is a very good approach. The database stores the events. When submitting, a POST request is generated with a batch of events (up to 50, again it was hard coded as two different numbers in two different places).

If you use query strings (in the sense of a GET) then there is a danger of the data being logged by proxy servers, hitting URI length issues, and being unable to batch.
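A small sketch of the batched-POST approach: super properties are merged into every event and the events are grouped into JSON bodies of at most 50. The function name and the payload shape are hypothetical, and letting per-event properties override super properties is a choice made here, not something the thread specifies.

```python
import json


def build_post_bodies(events, super_properties, batch_size=50):
    """Merge super properties into each event and group events into JSON POST bodies."""
    merged = [
        {
            "event": e["event"],
            # Per-event properties win over super properties when keys collide.
            "properties": {**super_properties, **e.get("properties", {})},
        }
        for e in events
    ]
    return [
        json.dumps(merged[i:i + batch_size])
        for i in range(0, len(merged), batch_size)
    ]


# Hypothetical usage: 120 queued events become three request bodies.
queued = [{"event": "tap", "properties": {"n": i}} for i in range(120)]
bodies = build_post_bodies(queued, {"app_version": "1.2"})
```

Each body goes in the request payload rather than the URL, so it does not run into URI length limits and is less likely to show up in proxy access logs.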

Many thanks Roger, that's all super-helpful!

Have you tried http://www.kontagent.com?

All I find is marketing.

Another lesson learned the hard way is that I will not touch any analytics SDK code unless it is open source. It needs to have an actual clear license attached.

We got bitten with Google Analytics, because it can only report against one ID. That means if a library in the app also uses GA, then you can't have the different parts using different ids. This turned out to be a showstopper and they had no code to examine.

You can take a look at http://count.ly if you are looking for an open source solution. It has a Node.js + MongoDB server and Android & iOS SDKs. (Yes I'm the developer)

It only matters that the client is open source (note I didn't say free software). This is so you can read the code and be proactive about issues in it. For example the mixpanel client for Android caused one thread to be created whose sole purpose was to fire a timer event every 60 seconds, which was used in a second thread to see if there was any work to do. It is a lot easier to figure out stuff like that is happening from the source than to try and work out who is responsible for extra threads showing up in the debugger. If there is a suitable license then you can at least modify and redistribute the changed code.

We do actually do our import and custom processing in MongoDB, so that side is covered.

That's because it's not free. In general, you get what you pay for.

I had some trouble setting this up. If you are already a MixPanel customer, in case you were not aware, you need to update your API include. (Follow the links from your dashboard for the snippet.)

If you want to see the power of how useful this really is, you should look at :


I think that the "Sales Page" doesn't do this new service justice. It's quite staggering how useful this is. It basically allows you to segment your users and keep in touch with users of whom you have identified might be power users, or users who might fall into your "danger cohort". For example, you might notice a trend that users who don't "do event X" within the last 7 days, are most likely not going to return.

Tracking the events in your application and then identifying these events as originating from a particular user, allows you to then find these users.

This is of course just one example.
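For illustration, the "danger cohort" idea can be sketched as a simple set difference over tracked events. The event records and names are hypothetical and this is not how Mixpanel computes it.

```python
from datetime import datetime, timedelta


def danger_cohort(events, event_name, now, window=timedelta(days=7)):
    """Return users who have not performed `event_name` within `window` before `now`."""
    all_users = {e["user"] for e in events}
    recent = {
        e["user"]
        for e in events
        if e["event"] == event_name and now - e["time"] <= window
    }
    return all_users - recent


# Hypothetical tracked events, each attributed to a user.
now = datetime(2012, 7, 1)
tracked = [
    {"user": "ann", "event": "upload", "time": now - timedelta(days=1)},
    {"user": "bob", "event": "upload", "time": now - timedelta(days=10)},
    {"user": "cat", "event": "login", "time": now - timedelta(days=2)},
]
at_risk = danger_cohort(tracked, "upload", now)
```

Here `bob` last did the event ten days ago and `cat` never did it, so both land in the cohort.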

Look at the documentation on People Analytics for more ideas...


This is great to see, trying right now.

Btw - We've avoided this in the past when using Google Analytics, because of what their TOS says (http://www.google.com/analytics/tos.html).

7. PRIVACY . You will not (and will not allow any third party to) use the Service to track or collect personally identifiable information of Internet users, nor will You (or will You allow any third party to) associate any data gathered from Your website(s) (or such third parties' website(s)) with any personally identifying information from any source as part of Your use (or such third parties' use) of the Service. You will have and abide by an appropriate privacy policy and will comply with all applicable laws relating to the collection of information from visitors to Your websites. You must post a privacy policy and that policy must provide notice of your use of a cookie that collects anonymous traffic data.

Hadn't heard of https://www.intercom.io/ - thanks for sharing @reustle.

As stated, it's crucial that privacy be fully respected for users. The key thing may not necessarily be knowing exactly who someone is, but instead knowing what they've done (which features they use and how often, which marketing emails they open, which support tickets they file, etc) and using this to give users better experiences personalized around their history of interactions. Providing this type of experience from web companies is what we're working on at Klaviyo (http://www.klaviyo.com).

In most cases, companies are tracking all of this data, just in multiple different systems and not bothering to pull it together (i.e. why do I get emails about product features I already use?) to use to make my life better.

On privacy, companies need to make sure they are being open with users. With most of these so-called "people analytics" tools, companies can choose whether to include personally identifiable info. Companies need to be intentional, and should choose to anonymize customer data when they can (but should still treat people uniquely based on their past interactions, even if they can't put a name on someone).

I will reserve any judgement of the service, until I see it in action. I think an example dashboard - that doesn't require registration - would do more good than harm in alleviating users' privacy concerns.

Even so, all analytics services are basically privacy atrocities, and as such I don't think Mixpanel should receive a disproportionate amount of resentment.

User information utility and privacy are mutually exclusive.

Interesting, but the guys over at http://intercom.io have been doing this for some time now.

This is revolutionary and really closes the collect data -> analyze -> act feedback loop. Come to think of it, it really reminds me of a DMP. If I were building a consumer app today, MP would be a big part of my growth strategy.

Excuse my ignorance, what does the acronym DMP stand for?

User privacy concerns aside (yes, I realize that is a big aside), I feel like there are only a handful of use cases where knowing this user data is actually helpful in analytics scenarios, and anything beyond just feels creepy to know.

1) More intelligent marketing spend. If you know you have a higher LTV for females for your app between the ages of 18-34 and you do a portion of your advertising on Facebook, it would be good to target just those users.

2) Insights into broad customer engagement. Let's say your 18-34 female users return to your application more often than male users in the same demographic, it'd be helpful to know what friction points cause these users to drop off.

3) Insights into spending users. Being able to segment all your user actions by those who are free and those who spend money would help you optimize your paying funnel.

4) Bug reporting. Knowing where your users are located can help illuminate whether you have server and localization problems.

I can't think of any reasons why you'd use this information to voluntarily contact users other than support-related issues. If Netflix sent me an email that said "We think you'd like these movies because other males liked these movies" I'd probably de-activate immediately.

I'm often surprised when people ask "what's the point in knowing these metrics". The point is simply that anything that gives better insight into user behaviour and user needs is immensely valuable.

If we were physically interacting with our users (i.e. we ran a shop or a community centre) we'd be using thousands of signals to determine who needed help, who was afraid, who was ready to purchase more and who was making trouble.

As developers we try to cultivate online social environments, socially-engaged shops, games which envelop the user and collaborative business tools but with absolutely none of these emotional cues. That's hard.

Imagine trying to design an amusement park if all you had was an anonymous ping each time someone went on a ride. The value to all of us (and to the users) in this new wave of analytics is to take us closer to the user and let us feel what they feel and service them where they need to be serviced.

Asking why one should do that is about the same as asking why you'd need to watch people queuing for the rides in an amusement park in order to improve the queues. Because if you don't you won't know what the user feels, wants or needs.

(Side note: what mixpanel is doing is incredible and they really are a pleasure to use. There is a Zen quality to their product and the way it gives you great power from great simplicity (although a custom dash would be great, ty! ))

Oh, I assure you, I know the importance of measurement. I'm an analyst for a large social game company :)

My point is, it doesn't help you to measure something that isn't actionable. And my list above was just the four that I can think of where gender, age, and location details actually helped across many of the applications I've studied.

I agree with you that the goal of analytics is to help improve your products, but it's easy to be misled to the wrong conclusions. Sometimes, too much data may actually be harmful to your business.

I do appreciate what you mean about actionability and tbh it's hard for me to comment on this new mixpanel feature set as there's not too much information on the sales pages.

However I've been investing a lot of time and code in Analytics and in mixpanel recently and while I would definitely agree that not all data is actionable, it doesn't stop it being useful or worth investing in as long as I take the time to examine (and prune) it afterwards.

What I find exciting is that these new types of Analytics open up value for product design in a way that just wasn't practical without a tonne of custom dev before.

However, just as a new boss taking time to talk to each employee in the company isn't predictably actionable but still has vast value, so these new Analytics tools allow us to understand users in a way that is not always actionable but is invariably valuable. The only real question surely is "is it valuable enough"?

If you need greater customizability, have a look at http://www.instahero.com, it's what we're working on at the moment. It's still very young, but it aims to provide powerful yet easily customizable analytics (so you don't have to wait for MixPanel to give you people tracking, for example, you can just easily write a simple version of it yourself).

This is really cool! I'm doing something with a similar perspective, that it's better to simplify the programming model around it than to limit the feature set. I'll give instahero a try.

Oh, nice! What's your product called? Has it launched yet?

Something that many B2C developers may not be aware of: for businesses that interact with a small number of users (most of B2B), it can be incredibly important to track a specific user. The expectations of privacy are definitely altered and many customers would be delighted to have direct contact with a customer experience team that knows their exact behaviors.

I feel that this is one of those "great for data miners, terrifying for consumers" moments.

Apptegic, http://www.apptegic.com, has also been offering this for a while. We let mobile and web companies understand what each user is doing in their app on a per user and per account basis, and then respond directly to that user in Salesforce, by email, or real-time in-app.

On privacy, we designed in an inability to correlate user data across our customers. So, for example, we cannot know that an end user of Apptegic Customer A is also the same end user of Apptegic Customer B. With this in place, the data is used only for our customers to understand and better serve their customers.

Has anyone had a chance to try this yet? How is this different to what KISSMetrics does?

I'm curious if these people properties can be used on the segmentation screen?

We were doing something similar with mixpanel, except (in mixpanel client-side parlance this is called super properties) we have to send all attributes such as "number of pages viewed", "amount of money paid" with all our events in order to segment by that data.

And sending emails based on analytics is incredible! I've always wanted to build that for my app but didn't have the resources to. Is it horrible to gloss over the privacy concerns?

I'm pretty certain that sending emails based on analytics has been a part of setups like Omniture for a long time. I think it's their test and target product that allows for all of this sort of thing.

I will also be curious to see this. It seems logical to think you'll be able to use this people data to segment in the future, I think it's a huge part of the value of this new feature.

Also, if you're interested in automating emails based on analytics you should check out http://getvero.com (disclaimer: I'm a founder). We're working on a Mixpanel integration as I type so we should pick up right where Mixpanel leaves off :).

Is there anything stopping someone from using the JavaScript library to mess with your analytics by typing a little JavaScript in their console?

It seems like at the very least it should support server-side validation of user ids based on their cookie or something, so a user can only screw with their own stats.
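One common way to get this kind of server-side validation is to have your own backend hand the browser an HMAC-signed user id, which the receiving server checks before accepting events. This is a generic sketch and not a Mixpanel feature; `SECRET_KEY`, `sign_user_id` and `verify_token` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical secret shared by the signing and verifying servers; never sent to the browser.
SECRET_KEY = b"server-side-secret"


def sign_user_id(user_id: str) -> str:
    """Return 'user_id.signature' for the backend to place in the user's cookie."""
    mac = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{user_id}.{mac}"


def verify_token(token: str):
    """Return the user id if the signature matches, otherwise None."""
    user_id, sep, mac = token.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids timing leaks and non-ASCII input errors.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return user_id
    return None
```

With this in place, someone typing JavaScript into the console can still send junk events for their own id, but cannot forge a valid token for anyone else's.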

This is amazing. This is pretty much what we have seen lacking as far as using Mixpanel. Thanks for the awesome work.

That said, as a customer, I felt that Mixpanel could have been a little more transparent with their roadmap.

It would have been _A LOT_ better for customers like us to know that this was in the pipeline for rolling out, and it would have saved us a lot of unnecessary headaches and pains.

100% agree, this was my only real complaint when it came to recommending Mixpanel, great to see full user history elegantly laid out!

Thank you Ghostery and friends.

now i feel really stalked.

Interesting stuff. I know PipeWise has been innovating in this space as well which we're currently testing out.
