
How we saved money by replacing Mixpanel with BigQuery and K8S - tzury
https://blog.doit-intl.com/replacing-mixpanel-with-bigquery-dataflow-and-kubernetes-b5f844710674
======
trjordan
Just to flesh out the vendor comparison...

Five weeks of, say, three engineers with $150k salaries ($200k fully loaded)
is about $60k. People say that the majority of software cost comes after initial
development. Let's say 20% is development [0]. That means this project costs
$300k total and lives for 5 years before the team decides to rewrite it. That
brings us to $60k / year.
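
The arithmetic above can be sketched as follows. This is only a rough illustration: every input is one of the comment's own assumed figures (three engineers, $200k fully loaded, five weeks, a 20% development share, a five-year lifetime), and the function names are made up for the example.

```python
# Rough build-vs-buy arithmetic using the comment's assumed figures.
# None of these numbers are measured data.

WEEKS_PER_YEAR = 52


def initial_build_cost(engineers: int, loaded_salary: float, weeks: float) -> float:
    """Cost of the initial development effort, in dollars."""
    return engineers * loaded_salary * weeks / WEEKS_PER_YEAR


def annualized_cost(build_cost: float, dev_share: float, lifetime_years: float) -> float:
    """Spread the total lifetime cost (build plus maintenance) over the system's life."""
    total_cost = build_cost / dev_share  # development is only dev_share of the total
    return total_cost / lifetime_years


build = initial_build_cost(engineers=3, loaded_salary=200_000, weeks=5)
rounded_build = 60_000  # the comment rounds the build cost up to $60k
per_year = annualized_cost(rounded_build, dev_share=0.20, lifetime_years=5)
print(f"build is about ${build:,.0f}; annualized is about ${per_year:,.0f} per year")
```

Changing `dev_share` or `lifetime_years` shows how sensitive the yearly figure is to those two guesses.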

This is good savings! My general guidance to folks thinking about this sort of
project is that you should implement a vendor and see how you like it. The
reason being, MixPanel does a LOT more than this tool, and buying a vendor can
help you figure out the subset of features that you actually need. Once you
have a clear requirement list, you have a clearly scoped project and can
hammer it out relatively easily to get that cost savings.

Beware of thinking that you can do this for $X,000 / year on your own, or get
it right on the first shot. There's a reason MixPanel (and any other vendor)
has so many engineers. You're paying them to have already made the mistakes
that you're about to make.

[0] [https://stackoverflow.com/questions/3477706/development-
cost...](https://stackoverflow.com/questions/3477706/development-cost-versus-
maintenance-cost)

~~~
ben_jones
The whole thing is a big ball of mud. If you commit to a vendor you risk
vendor-lock in. If you commit to hand-roll you risk runaway deadlines, feature
creep, and a slew of other threats. Once you pick one or the other and get it
to a business functional level you lose all momentum and willpower to pivot to
the best solution now that you have a "good enough" approach.

~~~
ProAm
Software is always 'Pay me now, or pay me later'. In the end I believe you
usually end up paying pretty close to the same, it just depends on if
you are happy with the solution in 18-36 months or hate it.

~~~
bonesss
The tricky bit is when opportunity costs come in... "Pay me now" means
sacrificing clear opportunities in the visible horizon, "pay me later" does
the same thing at a less-controlled future point sacrificing presently
unknown opportunities.

Personally I much prefer 'pay me now', because 9 times out of 10, IME, some
form of technical debt underlies the reasoning about why a team can't do
something. 'It always pays to take the pain up front', basically.

------
ktamura
This article epitomizes the old adage: software engineers are terrible at
estimating. I have absolutely no stake in Mixpanel, and it has its own flaws,
but seeing something as misleading as this on the front page of HN means I
have to write a clarifying, if not somewhat edifying, comment =/

1\. Right off the bat, there's the cost of building all of this + maintaining
it. Something like Mixpanel at scale requires at least two, if not three,
engineers: one for client libraries, one for infrastructure (which the OP
blogs about), and another for the web app (dashboard, real-time user stream,
etc.) To be sure, not everyone needs all features of Mixpanel. To be really
sure, nobody needs all features of any SaaS tools. But if you really want to
compare apples to apples, then you have to account for these. To hire
competent software engineers who can collaborate closely and maintain a
complex piece of analytics software requires at least $100k per year, if not
twice that based on locale. That's at least $300k and more like $600k right
there.

2\. The whole point of something like Mixpanel is to make analytics accessible
to non-engineers. In their case, it's primarily product people and secondarily
marketing/customer success. In any case, building an analytics/data product
consumable by non-technical people is hard and takes way more than assembling
a couple of cloud infrastructure together. If there's one reason Mixpanel is
still in business, that's because of this.

3\. Finally, the OP has a valid point which should have been highlighted more,
if not to make their own biases more clear: Mixpanel's diminishing
differentiation is dev shops/consultancies' opportunity. It is indeed
incredible that a dev shop can build even a third of Mixpanel's functionality
by leveraging GCP components. Mixpanel had to build a lot of its core backend
systems from the ground up, including its original key-value store. Just this
year, they fully migrated to Google Cloud Platform themselves, suggesting that
there's really little room for differentiation among analytics vendors at the
level of infrastructure components (Mixpanel's arch-nemesis, Amplitude,
leverages Apache Kafka and various AWS components, most notably Amazon
Redshift, heavily)

With all of this being said, one thing remains true: the most expensive cost
of any software is people running them and the dependencies created around
them. These may not show up as line items, but they sure are deeply embedded
in your total cost.

~~~
partycoder
I think you are not counting: QA, user experience, DevOps, technical writing,
security auditing, user training, integration support...

What happens if the system collecting all of your data has a vulnerability?
Suddenly all of your apps and data are owned.

What if it's inefficient? Then you have DDoSed yourself.

Non-functional requirements cannot be solved with a functional requirement
mindset.

What if you are losing data? What if you aggregate it incorrectly? What if
data is ingested in the wrong order? ... you get the idea.

It's not a 3 people problem. When you take on a hard problem like this you
have 10 different things to cover and you need to pick all of them.

~~~
brango
Having built something like this myself, the components you actually need to
write are so small and simple the scope for security issues is minimal. All
the infra scales automatically.

Also, this post only addresses data flowing in one direction into BQ.
Presumably they're then using something like Tableau or the Google/AWS
offerings to build dashboards based on this data. There aren't many places for
bugs to hide unless the BQ perms are wide open.

------
guelo
This did not mention the front end at all. In my opinion getting a front end
anywhere near as slick and powerful as Mixpanel's is an even bigger challenge
than the data pipeline. The Google Reference Architecture diagram shows
spreadsheets, which would require a bunch of hand-rolled queries, and BI tools
like Tableau which would be another cost and another big custom query
integration job. I've seen PMs spend hours in mixpanel quickly generating
custom report after custom report to really understand their data, none of
these other front ends would come close to that flexibility for non-technical
users.

------
ccorda
We've been doing something similar using keen.io.

Our goals were to get high volume web data (pageviews, clicks, etc.) alongside
application data already saved in our Firebase DB and synced to BigQuery.

We picked Keen because it has an open source web tracking lib
[https://github.com/keen/keen-tracking.js/](https://github.com/keen/keen-
tracking.js/) that easily plugs into our React/Redux stack.

They also have built-in streaming to BigQuery:
[https://keen.io/docs/integrations/google-
bigquery/](https://keen.io/docs/integrations/google-bigquery/)

Keen pricing is about 10% of mixpanel, so for our limited needs it has been
working well.

Long term if our volumes really grew the original post looks like a good
option, but figured we'd pass along this lower dev approach.

------
jdwyah
There's one other "big hammer" for saving money at MixPanel: "send fewer
events".

We saved 60% (which was no small amount of money) by realizing that sending
more than 1 of an event per user per hour was: A) Not necessary B) Not even
something you could query in MixPanel.

more thoughts in: [https://blog.ratelim.it/blog/how-to-save-money-on-event-
trac...](https://blog.ratelim.it/blog/how-to-save-money-on-event-tracking)
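
A minimal sketch of the "at most one of an event per user per hour" idea, assuming an in-memory throttle in a single process. The class and method names are hypothetical and are not taken from the linked post; a real deployment would presumably need a shared store so that several app servers agree on what has already been sent.

```python
import time
from typing import Dict, Optional, Tuple


class HourlyEventThrottle:
    """Lets through at most one (user, event) pair per time window."""

    def __init__(self, window_seconds: int = 3600) -> None:
        self.window_seconds = window_seconds
        # Maps (user_id, event) to the time the pair was last sent.
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_send(self, user_id: str, event: str, now: Optional[float] = None) -> bool:
        """Return True if this event should be forwarded to the analytics vendor."""
        now = time.time() if now is None else now
        key = (user_id, event)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # already sent within the current window
        self._last_sent[key] = now
        return True


throttle = HourlyEventThrottle()
if throttle.should_send("user-1", "page_view"):
    pass  # forward the event here
```

The dictionary grows without bound in this form, so a long-running process would also want to evict old keys.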

~~~
pc86
That's a great domain name and seems like a neat product. One of those things
you never think of then when you see it you think "of course that's a thing,
it makes perfect sense."

------
drej
For me, the issue is not cost. It's the data. You're paying for what's
essentially a data warehouse and BI tool... but you have little to no control
over it. Sure, it's fully managed, but you want to change X? You want to
evolve your schema? You need to change some data in history (I know)? You want
to build real-time systems based on these streams of data? You want Mixpanel
UI for other datasets in your company?

All of these are issues (that I have faced). From what I've seen, people just
keep downloading Mixpanel data and uploading to their DWH, sort of voiding the
reason to implement Mixpanel in the first place.

The tool is lovely, their export policy is great, but there's something about
actually owning your data.

------
mattbillenstein
Built a similar thing last year -- nginx (openresty) + lua -> NSQ -> python
streaming -> BQ

Using Metabase for visualization.

The other key benefit is you can warehouse the rest of your data in BQ as well
-- where it can be easily linked to your non-event data; marketing data from
other systems, etc.

~~~
mattbillenstein
Forgot to add -- analytics.js on the frontend...

------
Redsquare
Did you not consider snowplow analytics with something like periscope over the
top?

~~~
alexatkeplar
Thanks for the mention! We are working on our port of Snowplow to run cloud
natively on GCP currently - you can see our RFC here:

[https://discourse.snowplowanalytics.com/t/porting-
snowplow-t...](https://discourse.snowplowanalytics.com/t/porting-snowplow-to-
google-cloud-platform/1505)

------
partycoder
# Requirements

Unless you have restrictions about how your data moves and where it is stored,
or need to have a trusted computing base with no externally developed
software, or have very strict requirements that no available service
implements (unlikely), you are better off just using an open source solution
or paying for a service.

# Real cost

Even if you have to pay for a commercial service, in comparison, the TCO of
developing a similar solution in-house is very large.

When you create a system in-house, you are paying for: design, implementation,
testing, maintenance, deployment, infrastructure, security audits, training
for users, documentation, and sometimes costs go beyond engineering, e.g: UX
and graphic design and such.

After you are done spending all that money, you end up with a custom built
service that is far from the actual main activity that supports your business.

# Quality

If you are to authorize a team to do something like this, audit their code
constantly and impose a higher quality standard than you do for the rest of
your applications. This is because this system will be a dependency for all
your applications.

Even in large companies, it is unlikely that you have enough resources to have
a large dedicated team working on something like this. Because of this,
requirements will need to be deprioritized or just neglected.

Because of this, you can lose all hope of selling this solution externally.

# Users

Then, since you are committing significant resources to your internal tool, it
is likely that every single team will be forced to use it. That in itself is
also a problem. What is better?

a) Learning how to use MixPanel, and put it in your resume (a skill that has
market value and can be traded).

b) Learning how to use a proprietary system that is only used internally
within a company. A skill that cannot be traded in the job market.

If I am a user, it is against my own self interest to push for an internally
developed tool.

~~~
rixed
To be fair though, it seems to me that they moved from one service that was
providing 150% of what they need to another one, Google cloud, that is
providing 90% of what they need plus blueprint examples on how to implement
the 10% that is missing. If I'm not wrong this is totally reasonable,
especially if it gives them the flexibility to mine this data the way they
want, since it's really close to the core of their business.

------
Kiro
> Having said that, the event data is crucial and no data loss can be
> tolerated.

May I ask why? Seems like exactly the kind of data where some loss is fine.

~~~
TheReveller
Billing, perhaps.

------
code4tee
The challenge for offerings like MixPanel is that the market on other scalable
as-a-Service offerings (streaming data, big query, et al) has gotten to the
point where it’s far more trivial for companies to just roll their own. Trying
to build all this 3-4 years ago would have been much much harder. MixPanel’s
pricing doesn’t reflect this change in their value-add proposition for larger
companies.

The point at which a company gets big enough to offer Mixpanel a large revenue
stream is also the point at which it probably makes sense to ditch Mixpanel and
just run all this in house for collecting, storing and processing massive data
streams. I see a lot of this sort of decision making taking place now.

------
tzury
Smart move, sure enough in this case.

See, once you have the data pipeline in place, you can bring in a front-end/
reporting platform to generate any reports you wish, from free (e.g. Google
Data Studio) to self-hosted open source, up to SaaS solutions which are
available for a few hundred bucks per month. _Yet, you have full control of and
access to your data_. This is the main benefit IMHO.

In our region, Tel Aviv, where Jelly Button and DoIT are located, $240K per
year == 2x great s/w engineers annual salary.

Perhaps it is time for MixPanel's product/exec. people to reconsider their
pricing model at large scale. I assume there was a discussion between the
parties before they went with "let's build our own".

------
statim24
I do some of this but in a slightly different, maybe simpler way:

1\. App runs in App Engine (flex). It includes a simple /collect endpoint for
mobile events that just logs to /var/log/app_engine/custom_logs/track.json,
which automatically gets picked up by Google's Stackdriver. App code can also
log directly to that file to record server-side events (no network involved in
recording events).

2\. Stackdriver stores logs in Cloud Storage and also dispatches a Cloud PubSub
push to a configured topic.

3\. Another mini App Engine flex app receives the message, does the ETL, and
inserts the row into BigQuery.
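
The ETL step in the last item might look roughly like the sketch below. It assumes each line of track.json is one JSON object with hypothetical `event`, `ts` (Unix seconds), `user_id` and `properties` fields; the comment does not describe the actual log schema, and the final insert into BigQuery is deliberately left out.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def log_line_to_row(line: str) -> Optional[Dict[str, Any]]:
    """Turn one JSON log line into a flat row dict, or None if it is unusable.

    The field names are assumptions made for this example only.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "event" not in record or "ts" not in record:
        return None
    try:
        timestamp = datetime.fromtimestamp(record["ts"], tz=timezone.utc).isoformat()
    except (TypeError, ValueError, OverflowError, OSError):
        return None  # ts was not a usable Unix timestamp
    return {
        "event": str(record["event"]),
        "user_id": record.get("user_id"),
        "timestamp": timestamp,
        # Keep free-form properties as a JSON string so the row stays flat.
        "properties": json.dumps(record.get("properties", {}), sort_keys=True),
    }


row = log_line_to_row('{"event": "level_up", "ts": 0, "user_id": "u1"}')
```

Rows that come back as `None` are the ones worth counting separately, since silently dropped lines are one way such a pipeline loses data.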

~~~
boundlessdreamz
The downside of app engine over GCE/GKE is lack of sustained use discounts

------
rixed
"During July 2017, our new data analytics pipeline was processing about 500
events every second."

Is 500 events per second considered high throughput over here, or do other
people feel they really just needed fault tolerance from Google cloud?

~~~
avip
"500 events per second" is approximately zero. We go to third-party providers, not for
"tolerance" but because getting milk off-the-shelf in the supermarket is
easier than feeding and milking your own cow (unless you need raw milk or
something).

------
thegavsie
The "backend" would have been a perfect problem to be solved by Google's Cloud
Functions (i.e. serverless), rather than a Kubernetes cluster with Node and
nginx.

~~~
boundlessdreamz
If you are processing at high volume, it might be cheaper to run a cluster
rather than use Cloud Functions

------
georgewfraser
The biggest advantage of systems like this isn't even the money. You can use
the same data warehouse/SQL/BI tool stack to do sales reporting, analyze
marketing spend, track support KPIs---everything in your business. My company
(Fivetran) does the data-pipeline part of that, we even have a Mixpanel
connector for people who want to operate both systems in parallel or migrate
over time.

------
mherrmann
Building a self-hosted copy of MixPanel is surprisingly easy. I too did it for
a desktop app I'm working on [1]. Took a few days, not more, and is much more
flexible. Another advantage is that I retain control of the data.

[1]: [https://fman.io/docs/metrics](https://fman.io/docs/metrics)

------
jshen
Why not use App Engine instead of Container Engine? You could likely save more
that way.

~~~
brianwawok
What? Order of pricing cheap to expensive is GKE, GCE or Container Engine, App
Engine.

~~~
boundlessdreamz
How is GKE cheaper than GCE?

~~~
brianwawok
I think I messed up my branding.. I think GKE and GCE are now the same thing?

I was thinking GKE = Kubernetes, and GCE = was their "run container on a VM"
thing, but I think GKE and GCE are the same thing...

~~~
ianburrell
GCE is Google Compute Engine for running VMs. GKE was Google Container Engine
but makes more sense now as Google Kubernetes Engine. GKE runs on top of GCE
VMs.

------
100pctremote
I dunno. The way I look at it, whatever you could have been working on to
advance your company when Mixpanel took care of things, you weren't when you
were building this and can't when you have to maintain it.

~~~
notyourday
You are confused.

MixPanel costs $X per month. That cost only increases. A self-built solution
costs $Y to build and $Z per month to operate. Over n months:

delta = $X x n - ($Y + $Z x n)

Solve for that delta.
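
A small sketch of that calculation, taking the vendor's cumulative cost ($X x n) as the first term and the in-house cost ($Y + $Z x n) as the second. The $20k per month vendor figure corresponds to the $240K per year quoted elsewhere in the thread and the $60k build cost to the estimate in the top comment; the $5k per month running cost is purely hypothetical.

```python
from typing import Optional


def savings_after(months: int, vendor_monthly: float,
                  build_cost: float, run_monthly: float) -> float:
    """delta = X*n - (Y + Z*n): positive means in-house is cheaper so far."""
    return vendor_monthly * months - (build_cost + run_monthly * months)


def breakeven_months(vendor_monthly: float, build_cost: float,
                     run_monthly: float) -> Optional[float]:
    """Months until the build cost is recovered, or None if it never is."""
    monthly_gain = vendor_monthly - run_monthly
    if monthly_gain <= 0:
        return None  # running it yourself costs as much as the vendor, or more
    return build_cost / monthly_gain


# Illustrative inputs only; run_monthly in particular is a made-up figure.
first_year = savings_after(12, vendor_monthly=20_000, build_cost=60_000, run_monthly=5_000)
months_to_recover = breakeven_months(20_000, 60_000, 5_000)
```

Note that `run_monthly` should include the engineering time spent on maintenance, which is the cost the parent comment is pointing at.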

------
orliesaurus
240K - that's huge! I can't believe MixPanel gets that expensive - wow
considering their plans are like:

* FREE
* 999 a year
* Contact us

That number 3 must be really spooky!

~~~
geofft
I had the opposite reaction. If you're in a big city in the US, $240K is the
cost including overhead of one, _maybe_ two engineers. If running this in-
house takes up a total of a full-time engineering load (i.e., if the team
maintaining this wants to add headcount to run this and maintain prior
commitments), you haven't saved very much and you might be net negative.

(The math may be different in different job markets, in which case, yes, it
makes sense not to pay a startup that's paying SF wages to its own employees.)

~~~
orliesaurus
$240K for 1 Engineer? Wow-zie

~~~
geofft
It's not uncommon at all in the expensive markets (I'm thinking SFBA and NYC)
for an engineer who isn't straight out of college. But I'm counting overhead -
recruiting, benefits, employer-paid taxes, space, equipment, amortized cost of
just becoming a bigger company (helpdesk, possibly management, etc.).
Depending on how you count, that could add anywhere from 25% to over 100% to
the employee's take-home compensation. [http://web.mit.edu/e-club/hadzima/how-
much-does-an-employee-...](http://web.mit.edu/e-club/hadzima/how-much-does-an-
employee-cost.html)

$192K ($240k/1.25) is definitely common; $120K ($240k/2) is _low_ in these
markets for someone who knows what they're doing.

~~~
vonmoltke
You said "a big city". While those two are in that set, they are outliers.

------
balls187
off-topic.

One of my go-to non-coding interview questions is to have a candidate design a
mobile analytics solution.

From the article, I've learned Google has a reference architecture:
[https://cloud.google.com/solutions/mobile/mobile-gaming-
anal...](https://cloud.google.com/solutions/mobile/mobile-gaming-analysis-
telemetry)

------
igor_a
The easy and cheap way to keep mixpanel cost under control is to sample events
or send only unique events per session.
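
Both ideas can be sketched in a few lines. This is one possible approach, not something described in the comment: sampling is done per user with a hash, so a given user is either always tracked or never tracked, and the names `in_sample` and `SessionDeduper` are hypothetical.

```python
import hashlib
from typing import Set, Tuple


def in_sample(user_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of all users (0.0 to 1.0)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


class SessionDeduper:
    """Reports an event name only the first time it is seen in a session."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def is_first(self, session_id: str, event: str) -> bool:
        key = (session_id, event)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


dedupe = SessionDeduper()
if in_sample("user-42", rate=0.10) and dedupe.is_first("session-1", "open_menu"):
    pass  # send the event here
```

Sampling whole users rather than individual events keeps each sampled user's event stream complete, whereas per-session deduplication discards repeat counts by design.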

~~~
dfcowell
That doesn't help for all kinds of data analysis - frequency analysis of a
given interaction is one example.

------
xstartup
I am using appengine flex and google compute instances. Do containers make
your setup more cost-effective?

------
kayhi
Anyone else spending around this much?

------
alexandre_m
"The whole project took us about 5 weeks to complete."

A more valuable metric is person-hours.

------
marklit
Does anyone have any experience of negotiating your costs down with Mixpanel?
Years ago I remember my CTO ringing up our hosting provider and in the space
of a phone call bringing some of our costs down 90%.

~~~
jdwyah
Definitely can negotiate, but it’s not going to be 90%. MixPanel is pretty
solidly enterprise. At this level of spend they’re sending a team of a couple
implementation people onsite and price is def negotiated.

------
iamleppert
Honestly you don't even need some kind of fancy google architecture, query
engine, or columnar data store.

I recently implemented 99% of what mixpanel provides using S3 and lambda. I
stored the events on S3 (actually GET requests to an S3 bucket with logging
turned on, with what I wanted to track serialized as JSON in the query string
of the request).

From here it's as easy as writing lambda functions to process the logs and
output the results in some location where a static web app can visualize them.
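
The log-processing step could start from something like the sketch below. It assumes the request line has already been pulled out of the access log record (for example `GET /t.gif?data=... HTTP/1.1`) and that the event JSON is URL-encoded under a query parameter named `data`; both the parameter name and the helper function are assumptions made for this example, not details from the comment.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import parse_qs, urlsplit


def event_from_request_line(request_line: str, param: str = "data") -> Optional[Dict[str, Any]]:
    """Extract a JSON event from the query string of a logged request line."""
    parts = request_line.split(" ")
    if len(parts) < 2:
        return None
    query = urlsplit(parts[1]).query  # parts[1] is the request URI
    values = parse_qs(query).get(param)  # parse_qs also URL-decodes the value
    if not values:
        return None
    try:
        event = json.loads(values[0])
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None


line = "GET /t.gif?data=%7B%22event%22%3A%22signup%22%7D HTTP/1.1"
event = event_from_request_line(line)
```

A function like this would be called once per log record, with the aggregated results written to wherever the static web app reads from.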

