
Autotrack by Mixpanel: Collect Everything and track anything retroactively - nih825
https://mixpanel.com/autotrack/
======
_kyran
From what I understand, Heap Analytics pioneered this space.

Can anyone chime in on the differences/experiences between the two?

~~~
matm
(Founder of Heap here.)

At the most fundamental level: Heap's products - our SDKs, infrastructure,
interface, pricing - were built from the ground up and optimized for this
"capture everything" philosophy. Mixpanel tacked it onto an experience that's
still built around manual instrumentation.

This becomes obvious when you actually use both products. I encourage you to
do so and witness the differences yourself.

A few deficiencies of Mixpanel's approach relative to Heap:

* _Performance._ If you click around Mixpanel's site, you'll notice a tracking request being sent for each and every interaction. Click 10 times in a row, and Mixpanel's SDK will issue 10 separate requests in succession. Heap's SDK does the right thing: it batches events. This is because Heap's SDK is optimized for automatic event tracking, while Mixpanel's was built for legacy manual tracking.

* _Data trustworthiness._ Form submissions and link clicks can unload the page before an analytics request gets sent, which means Mixpanel will drop a large percentage of events (~30% in our experience). This gets exacerbated on mobile devices with poor internet connections. Heap does the right thing in each of these cases (it's actually a surprisingly tricky technical problem). This sort of data gap is dangerous for any real analysis, because you're basing decisions upon figures that are fundamentally wrong (especially on mobile!). The "best-effort" approach works for manual tracking, but not for automatic tracking.

* _Data completeness._ Mixpanel fails to capture some key client-side events, including pushstate and hashchange events. If you have a single-page webapp, pushstate/hashchange events are critical in understanding a user's flow through your product. Heap captures these (and other interactions) seamlessly. Mixpanel doesn't.

* _Scale._ Owler is the only customer cited as using Mixpanel's autotrack feature in production (at least as far as I can tell from their press release). Heap's automatic tracking, on the other hand, is live and battle-tested on some of the largest websites on the internet (heapanalytics.com/customers). It'll be interesting to see how Mixpanel's approach scales to their customer base. It's clearly not fully figured out: you'll note an "X-MP-CE-Backoff" response header on each of their analytics requests, presumably to pause data collection when their backend load is too high.

* _Pricing._ Mixpanel still applies traditional per-event pricing here. This causes a few issues: 1) costs can balloon unpredictably as you track more events, 2) you're disincentivized from exploring new data, 3) you have to do a cost/benefit analysis for each new event you're thinking of tracking. This is a major deterrent to ad-hoc, retroactive, exploratory analysis, which is the primary benefit of capturing everything.
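
To make the batching point in the _Performance_ item concrete, here is a minimal TypeScript sketch of the general technique. It is not Heap's or Mixpanel's actual SDK code: `TrackedEvent`, `Transport`, and `EventBatcher` are hypothetical names, and the size-based flush threshold is an assumption chosen for illustration.

```typescript
// Hypothetical event shape for illustration; not taken from any vendor SDK.
interface TrackedEvent {
  name: string;
  ts: number;
}

// Hypothetical transport callback: receives one batch per network request.
type Transport = (batch: TrackedEvent[]) => void;

class EventBatcher {
  private queue: TrackedEvent[] = [];
  private send: Transport;
  private maxBatch: number;

  constructor(send: Transport, maxBatch: number = 20) {
    this.send = send;
    this.maxBatch = maxBatch;
  }

  // Queue an event; send only once the batch is full.
  record(event: TrackedEvent): void {
    this.queue.push(event);
    if (this.queue.length >= this.maxBatch) {
      this.flush();
    }
  }

  // Send whatever is queued as a single request. No-op when empty.
  flush(): void {
    if (this.queue.length === 0) {
      return;
    }
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }

  // Number of events waiting to be sent.
  get pending(): number {
    return this.queue.length;
  }
}
```

With a threshold of 10, ten clicks in a row would produce one call to the transport rather than ten. In a browser, one plausible setup is a transport that wraps `navigator.sendBeacon` plus a `flush()` call from a `pagehide` listener, which may reduce (though not necessarily eliminate) events lost when a form submission or link click unloads the page.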

It's clear that automatic data collection is the way of the future, and
there are still so many more ways to evolve it (stay tuned!). I think we'll see
more and more tools adopt this approach over time. But dealing with a surplus
of data requires a fundamental rethinking of analytics practices, and it's not
as simple as shoehorning features into traditional experiences.

~~~
suhail
Thanks for pointing out some areas where we can improve. We will work on
making autotrack great over the year, specifically in those areas, but we
haven't seen any major problems with our large customers, from performance to
pricing. If we do, we are committed to changing things quickly.

------
hoogasian
How is this different from what Heap already does?

