
Launch HN: PostHog (YC W20) – open-source product analytics - james_impliu
James, Tim and Aaron here - we are building a self-hosted, open source Mixpanel/Amplitude style product. The repo is at https://github.com/posthog/posthog and our home page is https://posthog.com/.

After four years of working together, we originally quit our jobs to set up a company focused on tech debt. We didn't manage to solve that problem, but we learned how important product analytics was in finding users, getting them to try a product out, and understanding which features we needed to focus on to impact users.

However, when we installed product analytics, it bothered us that we needed to send our users' data to 3rd parties. Exporting data from these tools costs $manyK a month, and it felt wrong from a privacy perspective. We designed PostHog to solve these problems.

We made PostHog automatically capture every front-end click, removing the need to add track('event') calls - it has a toolbar to label important events after they're captured. That means you spend less time fixing your tracking. You can push custom events too.

You get API/SQL access to the underlying data, and it has analytics - funnels and event trends with segmentation based on event properties (like UTM tags). That means we've got the best parts of the 3rd party analytics providers but are more privacy and developer friendly.

We're thinking of adding features around paths/retention/pushing events to other tools (i.e. Slack/your CRM). We'd love to hear your feature requests.

We are platform and language agnostic, with a very simple setup. If you want Python/Ruby/Node, we give you a library. For anything else, there's an API. The repo has instructions for Heroku (1 click!), Docker or deploying from source.

We've launched this repo under an MIT license so any developer can use the tool. The goal is to not charge individual developers. We make money by charging a license fee for things like multiple users, user permissions, integrations with other databases, a hosted version and support.

Give it a spin: https://github.com/posthog/posthog. Let us know what you think!
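For readers curious what "pushing events" looks like in practice, here is a minimal sketch of the JSON body a server-side capture call might send. The field names and the `/capture/` endpoint path are assumptions based on how the client libraries generally work, not taken from this thread - check the PostHog docs for the authoritative schema:

```python
import json
import time

def build_capture_payload(api_key, event, distinct_id, properties=None):
    """Assemble the JSON body for a single event.

    Field names here mirror what client libraries typically send;
    treat this as a sketch, not the official schema.
    """
    return {
        "api_key": api_key,
        "event": event,
        "properties": {
            "distinct_id": distinct_id,
            **(properties or {}),
        },
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()),
    }

payload = build_capture_payload(
    api_key="phc_example",  # project API key (hypothetical value)
    event="signed_up",
    distinct_id="user_42",
    properties={"plan": "free", "utm_source": "hn"},
)

# A library would POST this (possibly batched) to your self-hosted
# instance, e.g. https://posthog.example.com/capture/ (assumed URL).
print(json.dumps(payload, indent=2))
```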
======
ignoramous
Dalton mentions in another comment on this thread that PostHog was originally
a different idea, so I'm super interested to know if you guys really
started building this on Jan 23rd [0] and got the backend, frontend,
integrations, docs and SDKs ready - not for a soft launch but for a _Launch
HN_ - in 4 weeks? That is nuts. Congratulations either way.

I am glad someone is tackling this problem.

A feature request (or perhaps an architectural direction): if you could put
the backend behind GraphQL instead of Django+Postgres, there's potential for
it to go fully serverless (frontend and backend) with JAMstack frameworks
like redwood.js [1] (backed by apollo-graphql) or using Cloudflare Workers
[2].

Edit: Another question I have: is posthog at 80% feature parity with
mixpanel / amplitude / heap already? If not, what do the timelines look like
(asking since you're OSS, though it is understandable if you can't reveal
just yet)? Maybe there needs to be a competitor-comparison page on the
website?

[0]
[https://github.com/PostHog/posthog/commits/master?after=9ae6...](https://github.com/PostHog/posthog/commits/master?after=9ae6854254085bbe10cc4f9c98820d9efed52424+306)

[1] [https://redwoodjs.com/](https://redwoodjs.com/)

[2] [https://blog.cloudflare.com/jamstack-at-the-edge-how-we-buil...](https://blog.cloudflare.com/jamstack-at-the-edge-how-we-built-built-with-workers-on-workers/)

~~~
james_impliu
\- We had a good view of what to do product-wise - the key thing that is
different for us is the open source model.

\- I'd stress how important it was feeling inspired by the idea. Ian from
Mattermost was really helpful, as were Dalton and the YC partners. Enjoying
what we were working on probably tripled our speed.

\- I'm meh technically, so we made sure Tim (CTO) could focus exclusively on
development. We split things up pretty clearly to create the right
environment: I did the design, product, website (Elementor/WP) and docs;
Aaron focussed on getting user feedback.

\- We spent $1k on marketing, to speed up user engagement early on, so that
helped get some bugs out.

Will do a blog post if there's more interest in the journey.

~~~
akuji1993
Would love to hear more about this and would definitely like to read a
post-launch blog post.

~~~
james_impliu
In case you missed it: [https://posthog.com/blog/pivot-to-posthog/](https://posthog.com/blog/pivot-to-posthog/)

------
buremba
It's cool to see an open-source product analytics tool in YC!

I'm the co-founder of a company that had a similar value proposition back in
2017. We got invited to the interview at the YC office but couldn't convince
people because of a number of reasons:

1\. GDPR was not huge back in 2017 so the idea of creating an open-source
alternative was not attractive enough.

2\. We were targeting companies that wanted to build their own data pipeline
in the cloud, and cloud providers such as AWS were claiming that their
products (specifically Kinesis & Redshift) made it dead-easy to create such
a pipeline. At first we thought we were doing something complementary to the
cloud providers, but we soon realized we were competing with them. Our
potential customers were trying to build such a data pipeline in AWS thinking
it would be simple, and AWS did make it easy to start. However, data
enrichment and cost optimization get really tricky as your data grows, and
our product was optimized for those workloads. AWS doesn't really need
partners like us: we save customers money, but that makes AWS lose money in
the long run because of these cost optimizations. And once you store all your
data in Redshift, the switching cost becomes more than just doubling your
Redshift capacity.

3\. We're not native speakers, so we probably couldn't express ourselves well
back then.

Time flies. We got into 500Startups Batch 21 that year but had to pivot last
year since we couldn't make money to create a sustainable business.

Shameless plug: Right now, we provide the same feature-set (segmentation,
funnel, retention, and SQL) for different CDPs such as Segment, Snowplow,
Firebase, and in-house solutions. You can think of it as Amplitude or
Mixpanel, but on top of your data warehouse. We generate SQL queries and run
them on your data warehouse just like a BI tool.

I would love to collaborate if you're open to partnerships, since we're now
complementary to each other. :) You can see what the product looks like
here: [https://rakam.io/product](https://rakam.io/product)

~~~
gavinray
This is really interesting, I might give this a try over the weekend!

------
salsakran
This is awesome and long overdue. We (I'm the Metabase founder) have been
working on the Business Intelligence side of things, but have always struggled
in our own usage with the ubiquity of closed source solutions for event
collection (read: Google Analytics) and the relative lack of attention on this
problem on the open source side.

~~~
orliesaurus
Is metabase similar to statsbot? I just googled metabase, am I looking at the
right one?

~~~
ignoramous
That is Sameer Al-Sakran, CEO of
[https://www.metabase.com](https://www.metabase.com).

And here is a recently show-hn'd OSS metabase alternative:
[https://news.ycombinator.com/item?id=22347516](https://news.ycombinator.com/item?id=22347516)

You might also like reading a recent news.yc discussion on BI tools:
[https://news.ycombinator.com/item?id=21513566](https://news.ycombinator.com/item?id=21513566)

------
eclipsetheworld
I love the idea of capturing all events and providing the user with an option
to label "useful" events. Similarly, I'd like to capture API calls. In a
typical modern SPA + REST api setup, calls to the REST api often correspond to
events. An analytics integrations that captures all api calls and provides
tools to label/transform these api calls as events would prove similarly
useful.

~~~
james_impliu
We were debating applying it at a framework or query level, but were nervous
about that being conceptually harder to "clean up" - what do you think the
equivalent of the "PostHog UX toolbar" would need to look like to make that
possible?

~~~
eclipsetheworld
API calls could be treated as events. Furthermore, match groups should be
more powerful, allowing any query of the form [field] [operator] [value]
(e.g. Text equals {value} or URL contains "/checkout"). However, the
challenges that come to my mind are:

\- (i) How to handle the request/response being essentially two events

\- Should they be merged into one event?

\- (ii) How to handle request/response bodies and HTTP headers?

\- Passing the bodies as event properties does not seem intuitive, and it
would be valuable to query them with XPath / JSONPath.

\- HTTP headers could be passed as event properties
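The [field] [operator] [value] idea above can be sketched as a tiny matcher that labels API calls as events. The operators, event shape, and names here are illustrative assumptions, not PostHog's actual action-matching API:

```python
# A minimal evaluator for match groups of the form [field] [operator] [value].
# Operators and the event dict shape are illustrative, not PostHog's API.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: expected in (actual or ""),
}

def matches(event, field, operator, value):
    """Return True if the event's field satisfies the operator/value pair."""
    return OPERATORS[operator](event.get(field), value)

# Label any API call whose URL contains "/checkout" as a 'checkout' event.
api_call = {"method": "POST", "url": "/api/v1/checkout/confirm", "status": 200}
if matches(api_call, "url", "contains", "/checkout"):
    labeled_event = {"event": "checkout", **api_call}
```

Request and response could then be two such dicts sharing a correlation id, which sidesteps challenge (i) by merging them at labeling time rather than at capture time.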

------
codegeek
It is encouraging to see YC accepting open source products. It is generally
difficult to monetize OSS, and it will be interesting to see how a
collaboration with YC helps a startup like this.

~~~
dalton
Hi, I am the partner at YC who funded PostHog, though originally for a
different idea.

I think this can be a _great_ business, we have funded startups following
similar models like Gitlab, Mattermost, etc. Excited to keep funding more :)

~~~
lapnitnelav
Given what you do, do you think open-source projects run "on premises" (i.e.
with your cloud provider of choice) under a paid licence have a future
beyond the enterprise scale?

------
dimensi0nal
maybe not the best choice of name

[https://www.google.com/search?q=post+hog](https://www.google.com/search?q=post+hog)

~~~
montenegrohugo
Actually, I love the name. Seems tongue in cheek - a small criticism of
"extreme" data-driven decision making that makes unfounded assumptions,
oftentimes misses the forest for the trees, or over-optimizes the short term
to the detriment of the long term.

~~~
pizza
Yes, but "post hog" is also internet slang that means "post a picture of
your penis".

------
scapecast
Just yesterday I discussed a blog post that I have on my mind, with somebody
from dbt, on the rise of the open source analytics stack.

There's a bunch of great hosted tools out there, across ETL, workflows,
dashboards, etc. - think Fivetran, Segment, Matillion, Periscope, etc. And
then of course the warehouses like Snowflake, Redshift, etc.

But I think there are three issues with that stack, roughly like this (I've
got to do some more thinking, would appreciate your input):

\- Privacy: you have your customer data flying around in all these different
tools. It's hard to impossible to track your compliance.

\- Cost: all these vendors charge in some way by data volume, MAUs, etc. -
you get taxed multiple times for the same data stream. It all adds up.

\- Control: your data is subject to pre-determined schemas, proprietary
formats, black boxes, etc. - mismatched metrics across different tools, less
flexibility to manipulate your data or pack up and go elsewhere.

I think there's a valid open source alternative for every layer of the stack:

Segment --> Rudder Labs, Snowplow

Matillion --> Airflow, dbt

Fivetran --> Stitch / Singer

Periscope, Looker, Tableau, etc. --> Metabase, Superset

Warehouses --> just yesterday I learned about materialize.io here on HN

And then add open source products like PostHog, that add additional value for
very specific use cases (in this case product analytics).

Not arguing the value of the hosted products. They are amazing to use if you
just get started. But there's a great open source "stack" available that long
term likely will be more transparent, more flexible, and cheaper.

Would love your thoughts!

~~~
soumyadeb
Founder of RudderLabs (now RudderStack) here. You nailed it - Control, Cost &
Privacy (Data Ownership) are the main drivers we have seen in people adopting
us. Related to Control is the broader drive to adopt open-source
technologies. We engineers have always wanted open source, but increasingly
large enterprises are interested too - maybe to avoid vendor lock-in, or
maybe just because engineers who grew up adopting open source are finally
getting into decision-making positions.

Shameless plug - we wrote a blog on setting up an Open Source Analytics Stack
with one of our deployments highlighting these issues.

[https://dev.to/rudderstack/how-to-build-an-open-source-analy...](https://dev.to/rudderstack/how-to-build-an-open-source-analytics-stack-using-rudderstack-and-apache-superset-14h2)

PostHog looks awesome!! Congrats on your launch. Would love to collaborate and
share notes.

~~~
dreamer7
It's really unfortunate how HN culture (awesome in most respects) forces you
to use defensive phrases like "shameless plug" for sharing work that you are
really proud of and the community is proud of as well

~~~
dang
That's not an HN thing so much as an internet thing:
[https://www.google.com/search?ei=95VPXs6rG8_4-gSVzZuIBw&q=%2...](https://www.google.com/search?ei=95VPXs6rG8_4-gSVzZuIBw&q=%22shameless+plug%22).

I wouldn't take the self-deprecating language literally. It's probably a
useful signal. It communicates "I'm a genuine community member sharing my work
with the community" in a way that is hard for spammers to replicate.

------
FanaHOVA
Congrats on launch!

What are some of the selling points compared to more mature OSS solutions
like Matomo? Also, isn't the enterprise version the opposite of your thesis?
I.e. "it bothered us how we needed to send our users' data to 3rd parties",
but then you provide a hosted version which would do the same thing? How do
you think about that?

~~~
james_impliu
Whilst Matomo and PostHog both provide analytics, Matomo are more focussed on
session-tracking, rather than user/event-tracking. That means they’re better
at things like analytics for traffic sources.

The things you can do with PostHog that you can’t easily do with Matomo, are
things like pulling up identifiable user event histories, or plotting trends
in product usage over time.

The enterprise version is just a private repo we'd give you access to that's
still self hosted. We can also provide hosted deployments of any version, but
that's really just for people that can't set it up themselves... hosting it
isn't our core focus.

------
veeralpatel979
Hey - congrats on launching! I'm adding Posthog to my list of codebases to
check out.

Can you talk more about your tech stack?

As an aside - it seems like most of the analytics companies I've heard of went
through YC: MixPanel, Heap, Amplitude!

~~~
timgl
Thanks so much! It's built on pretty simple/proven technologies: Postgres,
Django and React. We've already seen this scale to millions of daily events on
basic Heroku dynos. We can swap in other databases if you end up going beyond
that.

~~~
veeralpatel979
Got it! And how do you identify users of your open source version who you can
upsell your paid version to, since there's no sign up needed to start using
PostHog?

Or is this something you don't do?

~~~
timgl
We thought long and hard about this. We spoke to founders of other big OS
projects - we think if developers at big companies want to use it, they'll
want to use some of the enterprise features and reach out. We’ve already seen
that start happening :-)

------
darkhorse13
Congratulations, looks great. I'm building something similar, except on a
smaller scale: [https://reactric.netlify.com](https://reactric.netlify.com)

------
RIMR
Oh man, good luck with social media. The Chapo Trap House community uses "post
hog" as a memetic insult.

------
krmmalik
I would just like to say THANK YOU! I can't believe it has taken this long for
someone to solve this problem in this way

------
dodata
Very cool - congrats on the launch! I like the docker deploy command that you
posted on your landing page. Tried that out and it is super easy to get up and
running.

Do you have a sample dataset to feed into our local environment or demo
environment to test out the UI? I'd love to poke around a bit before
deploying to Heroku and setting it up on a site.

~~~
timgl
Thanks! There's a hidden 'demo' page [0] that you can click around that will
create some events. You can also add that URL when you create an action to
test out the editor.

[0] [http://127.0.0.1:8000/demo](http://127.0.0.1:8000/demo)

------
dreamer7
Great idea to be compatible with Mixpanel libraries! Makes switching over
really easy

------
dizzydiz
Looks cool :)

Curious as to how deep you plan to go on the peripherals to product analytics
- attaching additional attributes to users to group them (eg. Subscription
level), getting a view into attribution channels for marketing strategy etc.

~~~
james_impliu
We have the ability to do grouping in the backend at the moment, but the UI
isn't quite there yet - we definitely want "team"-level analytics as a good
starting point, as we've already had this question several times. We know
that's important for B2B SaaS, a world we came from.

We don't aim to go "data science" deep with analytics, as we suspect you'd
rather just integrate Metabase/Tableau/etc. We can see some cool ways to use
it for attribution though - as you can host it we don't need to charge you
enormous fees if your MTUs are very big... we see lots of B2C companies using
product analytics on the product, but not the website, and struggling with
tracking say UTM tags the whole way through.

There are two "out there" areas that we're really interested in right now...

1) We're thinking of focussing more on precisely what a developer (not
product, not marketing) needs, as we think there is an underserved and
enormous group here. Imagine when you're building something being able to run
a command in your CLI, then being able to open a browser with a good
understanding of which pages/features are being used as you work. The point
being - give developers user data so they know how to build for impact.

2) We also want to explore integrations with other platforms to push stuff to
them. I can't stop refreshing our own product, so I think pushing an Action to
Slack, for example, would be helpful and would get it into everyday workflows
a bit more easily. We don't want to do too much here and kind of hope the
community spot these kinds of things and run with them :)

What's your reaction to the above? I'd love to know if you had a specific pain
point in mind

------
malisper
I'm curious as to how you plan to scale PostHog to larger users. As the person
who scaled Heap, here is my honest opinion of this. I think there is going to
be a huge challenge ahead in scaling query performance. This was perpetually a
challenge at Heap and was for a long time the main limitation on Heap's
growth.

The challenge was tough enough for Heap, and PostHog is going to be at a
huge disadvantage due to the lack of multi-tenancy. When you use Heap, your
data is stored across Heap's entire cluster of machines. When you run a
query, that query runs simultaneously against every single machine in
Heap's cluster. Even though your data may take up something like 0.1% of the
total disk space, when you run a query, 100% of the disk throughput of
Heap's cluster goes to processing your query. It's not an overstatement to
say this alone results in a >50x improvement in query performance.
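The multi-tenancy argument above is easy to sanity-check with back-of-envelope numbers. All figures below are illustrative assumptions (not Heap's actual cluster specs), using the simple model scan time = data scanned / aggregate disk throughput:

```python
# Rough scan-time model: time = data_scanned / aggregate_disk_throughput.
# All numbers are illustrative assumptions, not real cluster specs.
customer_data_tb = 2.0              # one customer's dataset, in TB
node_throughput_gb_per_s = 1.0      # sequential read rate of one node

def scan_seconds(data_tb, nodes):
    """Seconds to scan `data_tb` terabytes spread across `nodes` machines."""
    return (data_tb * 1024) / (nodes * node_throughput_gb_per_s)

# A cluster sized for one customer vs. a shared 100-node cluster.
single_tenant = scan_seconds(customer_data_tb, nodes=2)
multi_tenant = scan_seconds(customer_data_tb, nodes=100)

print(f"single-tenant: {single_tenant:.0f}s, multi-tenant: {multi_tenant:.1f}s")
# Speedup is simply nodes_shared / nodes_own = 100 / 2 = 50x here.
```

Under these assumed numbers, the same query drops from minutes to seconds purely because the shared cluster fans it out across far more disks, which is the ">50x" effect described above.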

I honestly think Heap wouldn't be possible without multi-tenancy. It's hard
enough as is to get queries that process multiple terabytes of data to return
in seconds when you have a fleet of dozens of i3s available. I'm not sure how
you would do that with a fleet a tiny fraction of that size. If you're curious
about Heap's infrastructure, Heap's CTO, Dan Robinson, has given a number of
talks on how it works[0][1].

That's not to say that PostHog won't work for anyone. I previously tried (and
failed) to start a company based on optimizing people's Postgres instances.
One of the big takeaways I had was that no matter how you use it, Postgres
will work completely fine as long as you have <5GB of data. I think if you
have a modest amount of data, something like PostHog would work perfectly fine
for you. Since the Postgres optimization business didn't work out, I wound up
pivoting to freshpaint.io, which eliminates the need to set up event tracking
for your analytics and marketing tools by automatically instrumenting your
entire site. Since I started working on it, things have been going a lot
better.

[0]
[https://www.youtube.com/watch?v=NVl9_6J1G60](https://www.youtube.com/watch?v=NVl9_6J1G60)

[1]
[https://www.youtube.com/watch?v=iJLq3GV1Dyk](https://www.youtube.com/watch?v=iJLq3GV1Dyk)

~~~
timgl
For larger volumes of events, we wouldn't recommend using Postgres.

The nice thing about single-tenancy is that in reality lots of users have
small enough datasets that scaling isn't a problem. Heap et al have to scale
to all of their users combined (as you said, terabytes); we just have to
scale to the biggest user. Postgres also lets you get started very quickly
and run lots of queries yourself.

In our docs we explain our thinking more. Postgres is great for the vast
majority of use-cases, and we're working hard to optimise those queries. Once
users get beyond Postgres, we have integrations with databases that can scale
well across many hosts, and we provide services around this to help people
size their servers correctly.

~~~
malisper
> Heap et al have to scale to all of their users combined.

The hard part wasn't scaling the system to handle all users combined. The hard
part was designing the system such that when an individual user runs a query,
they would get their results back in a reasonable amount of time.

Having every user in a single cluster made this easier, because an
individual customer could make use of the compute power of a cluster that
was sized to fit the data of everyone in it. In other words, if Heap doubled
the number of customers, Heap would get twice as fast for everyone. That's
not true for PostHog.

> Heap et al have to scale to all of their users combined (as you said,
> terabytes), we just have to scale to the biggest user.

A decent sized Heap customer had multiple terabytes of data with the largest
being well beyond that. You're going to have to figure out how to scale
PostHog to that point without the benefits of multi-tenancy.

> Once users get beyond Postgres, we have integrations with databases that can
> scale well across many hosts, and we provide services around this to help
> people size their servers correctly.

I think a cluster of servers that could churn through terabytes of data in
seconds would be prohibitively expensive for any individual customer to
purchase.

------
Risse
You should mention in the README that the production Dockerfile (and
posthog/posthog:latest) is busted - it does not create any database. Spent
the last hour debugging it :) Otherwise, looking really good!

~~~
timgl
Apologies! We've made that clearer now.

------
elm_
Congrats on the launch!

I've been using PostHog with my app for about a week now, and so far the
results have been good. Pretty straightforward to integrate with a Swift iOS
app too!

------
leonardteo
This looks great, but was there a reason for going with Postgres over
something purpose-built for analytics like ClickHouse? I am seriously
considering building a similar tool for our platform, as Mixpanel/Amplitude
is cost-prohibitive at our scale. We had to move off Google Analytics and
run our own Matomo server, and it brings MariaDB to a grinding halt.

In any case, will be looking closer at this. Looks very interesting. Thanks.

~~~
james_impliu
We have actually integrated ClickHouse already, for this reason. We started
with Postgres as it works well for smaller volumes, but we have this
integration in a paid version.

~~~
leonardteo
Thanks James. I'll keep this in mind and might shoot you guys an email. Cheers
and good luck!

------
Coxa
Your sign-up page seems broken [1] on my Firefox.

[1] [https://imgur.com/a/wYxbKj4](https://imgur.com/a/wYxbKj4)

~~~
timgl
Fixed :)

------
flashman
Great concept! Just one small issue: to "post hog" is internet slang for
publishing pictures of one's penis. I am not kidding.

~~~
pcmaffey
Given the logo, I'm assuming this is intentional?

------
samblr
Congrats on launch - a quick demo video would be wonderful to go with this.

------
pedalpete
It would be interesting to see quick deploy to firebase, aws (lambda??) or
other services.

Any idea what a moderate-size website (10k users per day, 500k events) would
cost to run on Heroku?

~~~
timgl
We’d love to do more quick deploy buttons.

We’ve seen 500k events work with the hobby dev dyno and database which is
$14/month. Depending on how much data history you want to keep you could
upgrade to standard-0 on Heroku or spin up an RDS instance which is cheaper.

------
samblr
Could you please point out where the backend code is in your GitHub repo?

~~~
timgl
Hi! If you mean the PostHog code itself, most of it is in the PostHog folder.

If you want to send events from your own backend to PostHog, there are
instructions for Ruby/Python/Node/API in the docs[0].

[0]
[https://github.com/PostHog/posthog/wiki](https://github.com/PostHog/posthog/wiki)

~~~
samblr
Thank you.

actual link :
[https://github.com/PostHog/posthog/tree/master/posthog](https://github.com/PostHog/posthog/tree/master/posthog)

------
dkatri
Congrats on the launch!

Will have to see where I can fit this in to a project.

------
rupertdev
Looks pretty sweet. I like the UI for adding event captures.

------
shafyy
Amazing, going to give it a try. How are you going to monetize? Like Metabase?

Edit: Nevermind, just read your last paragraph :-)

------
phmagic
Thank you! This is much needed!

------
mk4p
Exactly what I need right now. Trying it out (and congrats!) :)

------
marcushyett
Very nice, I'll use this on my next project for sure.

------
zzeder
Looks great!

------
stranger___
Great!

------
chasers
How do you plan on monetizing?

edit: whoops didn’t read.

~~~
neonate
That's up there.

