
Show HN: ClearBrain (YC W18) – Automated Causal Analytics - bmahmood
http://www.clearbrain.com
======
bmahmood
Hi, I’m Bilal, cofounder at
[https://www.clearbrain.com](https://www.clearbrain.com). ClearBrain is a new
analytics platform that helps you rank which product behaviors cause
conversion versus merely correlate with it. Think Google PageRank, but for
analytics.

Our founding team worked on this problem for quite a few years while at Google
and Optimizely. We contributed to Google Analytics, which can analyze
historical behaviors in seconds, but observing historical trends merely
produced noisy correlations. We built Optimizely to measure true cause and
effect through A/B testing, but tests took 4-6 weeks on average to reach
significance, so it would take years to measure the impact of every single
page or feature in an app.

So we asked ourselves, could we estimate which in-app behaviors cause
conversion, to complement (not replace) a traditional A/B test? We spent a
year in R&D, and built ClearBrain as a self-serve “causal analytics” platform.
All you have to do is specify a goal - signup, engagement, purchase - and
ClearBrain ranks which behaviors are most likely to cause conversion.

Building this required a mix of real-time processing + auto ML + algorithm
work. We connect to a company’s app data via Segment, and ingest their app
events in real-time via Cloud Dataflow into a BigQuery backend. When a
customer uses the ClearBrain UI to select a specific app event as their
conversion goal, our backend will automatically run multiple observational
studies to analyze how every other app event may cause that goal. This is done
in parallel using SparkML, to analyze thousands of different events in
minutes. (more on our algorithm here:
[https://blog.clearbrain.com/posts/introducing-causal-analyti...](https://blog.clearbrain.com/posts/introducing-causal-analytics))
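
To make the per-event analysis concrete, here's a minimal PySpark sketch of
one such observational study - a simplification under assumed table and
column names, not our production pipeline:

    # Minimal sketch of one per-event observational study in Spark ML.
    # Assumes a user-level table with a 0/1 column per tracked event plus
    # a 0/1 "converted" label; all names here are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("causal-sketch").getOrCreate()
    users = spark.read.parquet("user_event_matrix")  # illustrative path

    event_cols = [c for c in users.columns if c not in ("user_id", "converted")]

    def study(treatment):
        # Control for every other tracked event when estimating this one.
        controls = [c for c in event_cols if c != treatment]
        assembled = VectorAssembler(
            inputCols=[treatment] + controls, outputCol="features"
        ).transform(users)
        model = LogisticRegression(labelCol="converted", regParam=0.1).fit(assembled)
        return treatment, float(model.coefficients[0])  # treatment coefficient

    # The real system runs these studies in parallel; serial here for brevity.
    ranked = sorted((study(e) for e in event_cols), key=lambda r: -r[1])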

We’ve had beta customers like Chime Bank, InVision, and TravelBank use
ClearBrain to estimate which behaviors and landing pages cause their users to
convert, and in turn prioritize their actual growth and A/B testing efforts
there.

We’re now releasing the product into general availability in partnership with
Segment - available on a free self-serve basis today! We look forward to
feedback from the HN community. :)

~~~
aditiyaa1
Hi Bilal, thanks for the overview of the product. This is a really important
business problem to solve for many marketing teams. Just using this to
prioritize A/B tests is in itself pretty valuable. But one concern with this
approach is the unreliability of causal analysis for estimating true effects.
The link below refers to a study done at FB showing that observational
studies can be erroneous in estimating effect sizes and, in some cases, the
direction of the effects. Do you think ClearBrain's system is robust enough
to estimate the true effects?

[https://www.kellogg.northwestern.edu/faculty/gordon_b/files/...](https://www.kellogg.northwestern.edu/faculty/gordon_b/files/fb_comparison.pdf)

~~~
bmahmood
Thanks for the great feedback! Yes, some of the limitations expressed in the
study do apply to ClearBrain - namely, we are leveraging observational
studies at this time as a prioritized ranking algorithm for which behaviors
are most important, but the actual effect sizes themselves may be variable.
We're working on improvements, as well as incorporating actual experiment
data into our algorithm to make it more accurate over time.

~~~
aditiyaa1
Thanks! What is your strategy around incorporating actual experiment data? Not
sure I fully follow here.

------
newtothebay
Can you talk about how you infer causality without running an experiment?
From your description, "real-time processing + auto ML + algorithm" still
sounds very much observational to me.

I'm asking not as a knock against your service, but out of genuine curiosity
about how you manage to solve this incredibly hard problem.

EDIT: From your white paper, it looks like you're running a regression that
controls for a bunch of confounders. You also interact the treatment variable
with those confounders to get the heterogeneous treatment effect.
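
Roughly, the specification described there would be (my notation, not
theirs):

    % Y: conversion outcome, T: treatment event, X: observed confounders
    Y = \beta_0 + \beta_1 T + \gamma^\top X + \delta^\top (T \cdot X) + \varepsilon

with the heterogeneous treatment effect for a user with covariates X read
off as \beta_1 + \delta^\top X.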

My concern with that is that we're not controlling for unobservable
confounders, which make causal inference so difficult. If we assume that
controlling for observable confounders is enough (we shouldn't!), then
correlation and causation are the same.

White paper: [https://blog.clearbrain.com/posts/introducing-causal-analyti...](https://blog.clearbrain.com/posts/introducing-causal-analytics)

~~~
bmahmood
Yep, you're correct that we're using observational studies via a regression
to remove confounders and estimate treatment effects. Our confounders are
synthetically generated from the observable variables - of course, we can
only make projections based on the digital signals our customers send us (we
use only first-party data). We are also working to incorporate actual
experiment data into the algorithm over time, to get even closer to the true
causal treatment effect.
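
For illustration only - one simplified way confounder-style covariates can
be derived from a raw first-party event log (hypothetical schema, not our
exact feature pipeline):

    # Illustrative only: deriving confounder covariates from a raw
    # first-party event log (user_id, event, timestamp). Hypothetical
    # schema, not the exact production feature set.
    import pandas as pd

    events = pd.read_csv("events.csv", parse_dates=["timestamp"])
    now = events["timestamp"].max()

    confounders = events.groupby("user_id").agg(
        total_events=("event", "size"),        # overall activity level
        distinct_events=("event", "nunique"),  # breadth of usage
        days_since_last=("timestamp", lambda t: (now - t.max()).days),  # recency
        tenure_days=("timestamp", lambda t: (now - t.min()).days),      # account age
    )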

~~~
noobhacker
Awesome! Could you speak a bit about what you have in mind to incorporate
actual experiment data into the algorithm?

------
pfbtgom
I think you have an interesting product, but I'm having serious issues with
your marketing.

Extraordinary claims require extraordinary evidence. How many of your
estimated treatment effects have been supported by experiments? Do you have
experiments demonstrating that your model generalizes? How accurate are your
estimates compared to experimental results?

It's ironic that you're marketing a causal analytics product without any
data. Generating a narrative and basing it on observational data is the
typical trap that many causal claims fall into. Portraying yourselves as
statistical experts while pushing unsubstantiated claims is misleading,
bordering on unethical.

------
seanwilson
Given that regular A/B testing needs a certain amount of data to reach
statistically significant results, what level of traffic and conversions
would you need for this to work? Would it be useful for sites with low
traffic?

~~~
bmahmood
Hi Sean - great point! When I was on the data science team at Optimizely, we
found that on average a test needed 10K-20K unique visitors to reach
significance.

We find that this rule of thumb extends similarly to our causal analytics
platform. However, we have found that even low-traffic sites can get a boost
if they track more events on their website (it increases the opportunities
for signal). Also, our simulations run in minutes on all your historical
data, rather than waiting weeks for users to be exposed to a test, which
speeds up time to insight. If we cannot determine significance in our
simulation (due to either sample size or signal), we designate the
projection as a correlation.
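
For intuition on where that 10K-20K figure comes from, here's a standard
two-proportion power calculation (illustrative baseline rates, not customer
data):

    # Back-of-the-envelope check on the 10K-20K visitors rule of thumb:
    # visitors needed to detect a lift from 5% to 6% conversion at
    # alpha=0.05 with 80% power. Baseline rates are illustrative.
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    effect = proportion_effectsize(0.06, 0.05)  # Cohen's h for 5% -> 6%
    n_per_arm = NormalIndPower().solve_power(
        effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
    )
    print(int(n_per_arm) * 2)  # ~8,000 total visitors; bigger lifts need fewer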

------
gingerlime
Hi Bilal!

I think I reached out to you in early 2018. Any news about integrations
without Segment (e.g. Amplitude, in our case)? And what's the pricing model?
I couldn't find much on the site (maybe it's more limited on mobile?)

~~~
bmahmood
Thanks for reaching out again! We're prioritizing support for Segment at this
time, but hope to add other integrations next year. Our analytics product is
completely free, so getting set up on our joint solution with Segment
shouldn't be too expensive. :)

~~~
gingerlime
Thank you. What’s your monetization strategy then? (Were you acquired by
Segment or something? honest question)

------
polskibus
How is this related to process mining theory and tools like Celonis?

------
puranjay
Nothing about the product, but man, that's a fantastically beautiful and
informative website. Reminds me of Stripe.

~~~
bmahmood
Aww thank you! Our design team is pretty cool. :)

------
dtran
Congrats on the launch Bilal and team!

