Launch HN: PostHog (YC W20) – open-source product analytics
282 points by james_impliu on Feb 20, 2020 | 83 comments
James, Tim and Aaron here - we are building a self-hosted, open source Mixpanel/Amplitude style product. The repo is at https://github.com/posthog/posthog and our home page is https://posthog.com/.

After four years of working together, we originally quit our jobs to set up a company focused on tech debt. We didn’t manage to solve that problem, but we learned how important product analytics is for finding users, getting them to try the product, and understanding which features to focus on to have an impact on users.

However, when we installed product analytics, it bothered us how we needed to send our users’ data to 3rd parties. Exporting data from these tools costs $manyK a month, and it felt wrong from a privacy perspective. We designed PostHog to solve these problems.

We built PostHog to automatically capture every front-end click, removing the need to add track('event') calls; it has a toolbar to label important events after they’re captured. That means you spend less time fixing your tracking. You can also push events manually.
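To illustrate the "capture everything, label later" idea, here is a minimal sketch. It assumes a hypothetical in-memory event shape (`CapturedEvent` with `selector`/`url` properties) and a hypothetical `Action` type; it is not PostHog's actual schema or API. The point it shows is that a labeled action is just a predicate defined after the fact, so it applies retroactively to clicks that were already captured.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CapturedEvent:
    # Hypothetical shape of an autocaptured click (not PostHog's schema).
    distinct_id: str
    event: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Action:
    # A labeled action: a name plus a predicate applied to past events.
    name: str
    matches: Callable[[CapturedEvent], bool]


def count_action(action: Action, events: List[CapturedEvent]) -> int:
    """Count already-captured events that match an action defined later."""
    return sum(1 for e in events if action.matches(e))


# Every click is captured up front, with no track() calls in the app code.
events = [
    CapturedEvent("u1", "click", {"selector": "button.signup", "url": "/pricing"}),
    CapturedEvent("u2", "click", {"selector": "a.docs", "url": "/"}),
    CapturedEvent("u3", "click", {"selector": "button.signup", "url": "/"}),
]

# Later, someone labels "Clicked sign up"; it applies retroactively.
signup_clicked = Action(
    "Clicked sign up",
    lambda e: e.event == "click" and e.properties.get("selector") == "button.signup",
)

print(count_action(signup_clicked, events))  # 2
```

Because the action is only a filter, relabeling or fixing a definition never requires redeploying tracking code; the historical data is re-evaluated instead.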

You get API/SQL access to the underlying data, plus built-in analytics: funnels and event trends, with segmentation based on event properties (like UTM tags). That means we’ve got the best parts of the 3rd-party analytics providers while being more privacy- and developer-friendly.
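As a rough illustration of what a funnel computes, here is a sketch over a hypothetical list of `(distinct_id, event_name, timestamp)` tuples. It is a generic ordered-funnel count, not PostHog's implementation: a user counts toward step k only if they performed steps 0..k in that order.

```python
from typing import Dict, List, Tuple

# Hypothetical event tuple: (distinct_id, event_name, timestamp).
Event = Tuple[str, str, int]


def funnel(events: List[Event], steps: List[str]) -> List[int]:
    """Count how many users reached each step, doing the steps in order by time."""
    by_user: Dict[str, List[Tuple[int, str]]] = {}
    for user, name, ts in events:
        by_user.setdefault(user, []).append((ts, name))

    counts = [0] * len(steps)
    for history in by_user.values():
        history.sort()  # chronological order per user
        step = 0
        for _, name in history:
            # Greedily advance when the next expected step occurs.
            if step < len(steps) and name == steps[step]:
                counts[step] += 1
                step += 1
    return counts


events = [
    ("u1", "pageview", 1), ("u1", "signup", 2), ("u1", "purchase", 3),
    ("u2", "pageview", 1), ("u2", "purchase", 2),
    ("u3", "signup", 1), ("u3", "pageview", 2),
]
print(funnel(events, ["pageview", "signup", "purchase"]))  # [3, 1, 1]
```

Segmentation would amount to running the same function on the subset of users whose properties (for example a UTM source) match a filter.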

We’re thinking of adding features around paths, retention and pushing events to other tools (e.g. Slack or your CRM). We’d love to hear your feature requests.

We are platform and language agnostic, with a very simple setup. If you want Python/Ruby/Node, we give you a library. For anything else, there’s an API. The repo has instructions for Heroku (1 click!), Docker, or deploying from source.

We’ve launched this repo under the MIT license so any developer can use the tool. The goal is to not charge individual developers. We make money by charging a license fee for things like multiple users, user permissions, integrations with other databases, providing a hosted version and support.

Give it a spin: https://github.com/posthog/posthog. Let us know what you think!

Dalton mentions in another comment on this thread that PostHog was originally a different idea, so I'm super interested to know: did you really start building this on Jan 23rd [0] and get the backend, frontend, integrations, docs and SDKs ready, not for a soft launch but for a Launch HN, in 4 weeks? That is nuts. Congratulations, either way.

I am glad someone is tackling this problem.

A feature request (or perhaps an architectural direction): if you could put the backend behind GraphQL instead of Django+MySQL, it could potentially go fully serverless (frontend and backend) with JAMstack frameworks like redwood.js [1] (backed by Apollo GraphQL) or with Cloudflare Workers [2].

Edit: Another question: is PostHog already at 80% feature parity with Mixpanel / Amplitude / Heap? If not, what do the timelines look like (asking since you're OSS, though it's understandable if you can't reveal that just yet)? Maybe there should be a competitor-matrix page on the website?

[0] https://github.com/PostHog/posthog/commits/master?after=9ae6...

[1] https://redwoodjs.com/

[2] https://blog.cloudflare.com/jamstack-at-the-edge-how-we-buil...

- We had a good view of what to do product-wise - the key thing that is different for us is the open source model.

- I'd stress how important it was feeling inspired by the idea. Ian from Mattermost was really helpful, as were Dalton and the YC partners. Enjoying what we were working on probably tripled our speed.

- I'm meh technically, so we focussed on making sure Tim (CTO) could focus exclusively on development. We split the work up pretty clearly to create the right environment: I did the design, product, website (Elementor/WP) and docs, and Aaron focussed on getting user feedback.

- We spent $1k on marketing, to speed up user engagement early on, so that helped get some bugs out.

Will do a blog post if there's more interest in the journey.

Would love to hear more about this and would definitely like to read a post launch blog post.

Can you give more details about your marketing spend? Where you spent it, how it converted, etc., if you are open to it?

Thanks! We really did start committing code on Jan 23rd -- but we had strong views on what a solution should look like as we experienced these problems first hand.

We've already had requests from people to store events into different databases, but I hadn't considered doing it with graphql/JAM. That could be a really nice way of having the storage abstracted from the database.

In terms of feature parity, our goal is basically 100% parity. Anything you can do analytics wise in those tools you should be able to do in PostHog. We're going to try to keep up the same pace we've had for the last 4 weeks.

It's cool to see an open-source product analytics tool in YC!

I'm the co-founder of a company that had a similar value proposition back in 2017. We got invited to the interview at the YC office but couldn't convince people because of a number of reasons:

1. GDPR was not huge back in 2017 so the idea of creating an open-source alternative was not attractive enough.

2. We were targeting companies that want to build their own data pipeline in the cloud, and cloud providers such as AWS were claiming that their products (specifically Kinesis & Redshift) make it dead easy to create such a pipeline. At first, we thought we were complementary to the cloud providers, but we soon realized that we were competing with them. Our potential customers were trying to build such a pipeline on AWS, expecting it to be simple, and AWS did make it easy to get started. However, data enrichment and cost optimization get really tricky as your data grows, and our product was optimized for exactly these workloads. AWS doesn't really need partners like us: we save customers money, which makes AWS lose money in the long run. And once you store all your data in Redshift, switching costs more than just doubling your Redshift capacity.

3. We're not native speakers, so we probably couldn't express ourselves well back then.

Time flies. We got into 500Startups Batch 21 that year but had to pivot last year since we couldn't make enough money to build a sustainable business.

Shameless plug: right now, we provide the same feature set (segmentation, funnels, retention, and SQL) on top of different CDPs such as Segment, Snowplow, Firebase, and in-house solutions. You can think of it as Amplitude or Mixpanel, but on top of your data warehouse. We generate SQL queries and run them on your data warehouse, just like a BI tool.

I would love to collaborate if you're open to partnerships, since we're now complementary to each other. :) You can see what the product looks like here: https://rakam.io/product

This is really interesting, I might give this a try over the weekend!

This is awesome and long overdue. We (I'm the Metabase founder) have been working on the Business Intelligence side of things, but have always struggled in our own usage with the ubiquity of closed source solutions for event collection (read: Google Analytics) and the relative lack of attention on this problem on the open source side.

I'm actually using this with Metabase and the two complement each other really nicely!

Also, thanks for making something really cool :)

Awesome! We've been using Metabase for a while now, and it's amazing!

Nice one. Given all the genuine love for Metabase here in the comments, I will get it featured on SaaSHub. That should bring you some more love ;)

Metabase is great. I did a workshop on it last night and people were really impressed that it was free.

They also loved the X-ray feature.

Is metabase similar to statsbot? I just googled metabase, am I looking at the right one?

That is Sameer Al-Sakran, CEO of https://www.metabase.com.

And here is a recently show-hn'd OSS metabase alternative: https://news.ycombinator.com/item?id=22347516

You might also like reading a recent news.yc discussion on BI tools: https://news.ycombinator.com/item?id=21513566

Huge Metabase fan, use it on every project.

I love the idea of capturing all events and giving the user an option to label "useful" events. Similarly, I'd like to capture API calls. In a typical modern SPA + REST API setup, calls to the REST API often correspond to events. An analytics integration that captures all API calls and provides tools to label/transform them as events would be similarly useful.
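One way to capture every API call on a Python backend is a WSGI middleware. The sketch below is hypothetical (the `ApiCallCapture` class and the event fields are illustrative, not part of PostHog or any library); it records method, path, status and duration for each request into an in-memory list, where a real setup would forward them to an analytics backend.

```python
import time
from typing import Callable, Dict, Iterable, List


class ApiCallCapture:
    """Hypothetical WSGI middleware that records each API call as an event.

    Events go to an in-memory list here; a real setup would forward them
    to an analytics backend. Assumes the wrapped app calls start_response
    before returning (true for simple apps, not for lazy generators).
    """

    def __init__(self, app: Callable, sink: List[Dict]) -> None:
        self.app = app
        self.sink = sink

    def __call__(self, environ: Dict, start_response: Callable) -> Iterable[bytes]:
        captured: Dict = {}

        def recording_start_response(status, headers, exc_info=None):
            captured["status"] = int(status.split(" ", 1)[0])
            return start_response(status, headers, exc_info)

        started = time.monotonic()
        body = self.app(environ, recording_start_response)
        self.sink.append({
            "event": "api_call",
            "method": environ.get("REQUEST_METHOD"),
            "path": environ.get("PATH_INFO"),
            "status": captured.get("status"),
            "duration_ms": (time.monotonic() - started) * 1000,
        })
        return body


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


events: List[Dict] = []
app = ApiCallCapture(hello_app, events)
body = app({"REQUEST_METHOD": "GET", "PATH_INFO": "/api/items"},
           lambda status, headers, exc_info=None: None)
print(events[0]["method"], events[0]["path"], events[0]["status"])  # GET /api/items 200
```

Capturing at the WSGI layer keeps the application code untouched, mirroring how front-end autocapture avoids explicit track() calls; labeling which calls matter would then happen afterwards.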

We were debating applying it at the framework or query level but were nervous about this being harder to "clean up" conceptually. What do you think the equivalent of the "PostHog UX toolbar" would need to look like to make that possible?

API calls could be treated as events. Furthermore, match groups should be more powerful, allowing any query of the form [field] [operator] [value] (e.g. Text equals {value} or URL contains "/checkout"). However, the challenges that come to my mind are:

- (i) How to handle a request/response being essentially two events?
  - Should they be merged into one event?
- (ii) How to handle request/response bodies and HTTP headers?
  - Passing the bodies as event properties does not seem intuitive; it would be valuable to query them with XPath / JSONPath.
  - HTTP headers could be passed as event properties.
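A sketch of how the two challenges above might be handled together, under stated assumptions: the request and response dicts are a hypothetical shape, `merge_call`, `lookup` and `matches` are illustrative names, and the dotted-path lookup is a tiny JSONPath-like subset rather than real JSONPath. It merges the pair into one event, keeps headers as properties, keeps bodies as parsed JSON, and evaluates one [field] [operator] [value] condition.

```python
import json
from typing import Any, Dict


def merge_call(request: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, Any]:
    """Fold a request/response pair into a single event (one answer to point (i))."""
    return {
        "event": "api_call",
        "properties": {
            "method": request["method"],
            "url": request["url"],
            "status": response["status"],
            # Headers kept as properties under lower-cased names.
            "request_headers": {k.lower(): v for k, v in request.get("headers", {}).items()},
            # Bodies stay as parsed JSON so they can be queried by path.
            "request_body": json.loads(request.get("body") or "null"),
            "response_body": json.loads(response.get("body") or "null"),
        },
    }


def lookup(obj: Any, path: str) -> Any:
    """Resolve a dotted path like 'request_body.cart.0.sku' (a tiny JSONPath-like subset)."""
    for part in path.split("."):
        if isinstance(obj, list) and part.isdigit() and int(part) < len(obj):
            obj = obj[int(part)]
        elif isinstance(obj, dict) and part in obj:
            obj = obj[part]
        else:
            return None
    return obj


OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: isinstance(actual, str) and expected in actual,
}


def matches(event: Dict[str, Any], field: str, op: str, value: Any) -> bool:
    """Evaluate one [field] [operator] [value] condition against an event."""
    return OPERATORS[op](lookup(event["properties"], field), value)


event = merge_call(
    {"method": "POST", "url": "/checkout", "headers": {"Content-Type": "application/json"},
     "body": '{"cart": [{"sku": "A1"}]}'},
    {"status": 201, "body": '{"order_id": 7}'},
)
print(matches(event, "url", "contains", "/checkout"))                                # True
print(matches(event, "request_body.cart.0.sku", "equals", "A1"))                     # True
print(matches(event, "request_headers.content-type", "equals", "application/json"))  # True
```

Merging into one event keeps the status code next to the request that caused it, which makes conditions like "POST to /checkout that returned 201" a single match instead of a join.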

It is encouraging to see YC accepting open source products. It is generally difficult to monetize OSS, and it will be interesting to see how working with YC helps a startup like this.

Hi, I am the partner at YC who funded PostHog, though originally for a different idea.

I think this can be a great business; we have funded startups following similar models, like GitLab, Mattermost, etc. Excited to keep funding more :)

Given what you do, do you think open-source projects run "on premise" (i.e. on your cloud provider of choice) with a paid licence have a future beyond enterprise scale?

We fairly recently released Bullet Train as 100% open source (https://bullet-train.io/). We're beginning to make it work, and have been surprised at the amount of enterprise interest it has generated.

Just because the code is open source doesn't mean you can't make money out of it.

maybe not the best choice of name


Actually I love the name. It seems tongue in cheek, a small criticism of "extreme" data-driven decision making that makes unfounded assumptions, often can't see the forest for the trees, or over-optimizes the short term to the detriment of the long term.

Yes, but "post hog" is also internet slang meaning to post a picture of your penis.

Yeah, this name has some unfortunate connotations

Just yesterday I discussed with somebody from dbt a blog post I have in mind on the rise of the open source analytics stack.

There are a bunch of great hosted tools out there, across ETL, workflows, dashboards, etc. Think Fivetran, Segment, Matillion, Periscope, etc. And then of course the warehouses like Snowflake, Redshift, etc.

But I think there are three issues with that stack, roughly like this (I've got to do some more thinking, would appreciate your input):

- Privacy: you have your customer data flying around in all these different tools. It's hard, if not impossible, to track your compliance.

- Cost: all these vendors charge in some way by data volume, MAUs, etc., so you get taxed multiple times for the same data stream. It all adds up.

- Control: your data is subject to predetermined schemas, proprietary formats, black boxes, etc. That means mismatches for the same metric across different tools, and less flexibility to manipulate your data or pack up and go elsewhere.

I think there's a valid open source alternative for every layer of the stack:

Segment --> Rudder Labs, Snowplow

Matillion --> Airflow, dbt

Fivetran --> Stitch / Singer

Periscope, Looker, Tableau, etc. --> Metabase, Superset

Warehouses --> just yesterday I learned about materialize.io here on HN

And then add open source products like PostHog, that add additional value for very specific use cases (in this case product analytics).

Not arguing the value of the hosted products. They are amazing to use if you just get started. But there's a great open source "stack" available that long term likely will be more transparent, more flexible, and cheaper.

Would love your thoughts!

Founder of RudderLabs (RudderStack now) here. You nailed it: control, cost and privacy (data ownership) are the main drivers we have seen behind people adopting us. Related to control is the broader drive toward adopting open source technologies. We engineers always wanted open source, but increasingly larger enterprises are interested too, maybe to avoid vendor lock-in, or maybe just because cool engineers (who grew up adopting open source) are finally getting into decision-making positions.

Shameless plug: we wrote a blog post on setting up an open source analytics stack with one of our deployments, highlighting these issues.


PostHog looks awesome!! Congrats on your launch. Would love to collaborate and share notes.

It's really unfortunate how HN culture (awesome in most respects) forces you to use defensive phrases like "shameless plug" when sharing work that you are really proud of and that the community is proud of as well.

That's not an HN thing so much as an internet thing: https://www.google.com/search?ei=95VPXs6rG8_4-gSVzZuIBw&q=%2....

I wouldn't take the self-deprecating language literally. It's probably a useful signal. It communicates "I'm a genuine community member sharing my work with the community" in a way that is hard for spammers to replicate.

Completely agree with you there. I think it's especially true for an often overlooked yet massive subset of businesses: the smaller players.

They tend to have neither the resources nor the knowledge that there's a world beyond the usual big corps (FB, Google, ...) whose tools are free to use but paid for with everybody's data, consolidating their power.

I think the way forward would be something akin to CloudFormation with presets on your own server (DO / Scaleway / AWS / ...): something managed that you still own and can unplug. My focus is particularly on marketing, but I suppose you could expand.

PS: I would add Redash to the list of BI tools.

For many projects I've looked for easy solutions that would handle exposing data connectors to the end user of a project. Somewhat like an embedded Zapier with community-maintained connectors (e.g., for the end user to grab Salesforce data and sync it with your account).

I suppose Singer might be the closest thing from your list above, but still you'd have to build out a large amount of auth & end user tooling to get it to work.

Every B2B SaaS developer these days has to build in a ton of integrations. Even an affordable hosting service for this would work well (embedded integration where you don't require your customer to sign up or pay for a second product).

The interesting part to me here is that all the non-open source tools are very big, successful companies

The business model for OS has been really tough historically, which has probably made raising money harder, and therefore growing as rapidly as the SaaS companies really hard.

It's quickly becoming clear (GitLab/Mattermost/Sentry) that there are some great ways to build enormous companies like this though. And, that's of course assuming you want to build a huge company :)

At a personal level, we found that this kind of business is just more fun to build... making cool stuff in the open and if we do a good job, getting inbound interest from bigger companies that have developers who need to use our tech at scale.

I don't think it'd work for everything - if you are a tool that developers don't interact with much, then I'd imagine it's tougher to build a real community.

There are also some innate complications in the nature of a company adopting an open source tool. It's classic "build vs buy": if a company is investing in engineers building their own in-house solutions on top of open source tools, the additional cost of licensing support/additional tools doesn't really make sense. So there's immediate friction in monetizing open source users.

On the other hand, SaaS products are optimized to sell to business groups who want the hard parts taken care of for them and who perceive value in the time/money the SaaS product saves them. I've seen both situations (monetizing open source versus monetizing a SaaS product) first hand, and it's clear that open source companies can be at a bit of a disadvantage. When they DO monetize their users, it's usually via their own SaaS offering that augments the open source tool.

Congrats on launch!

What are some of the selling points compared to more mature OS solutions like Matomo? Also, isn't the enterprise version the opposite of your thesis? I.e. "it bothered us how we needed to send our users’ data to 3rd parties", but then provide a hosted version which would do the same thing? How do you think about that?

Whilst Matomo and PostHog both provide analytics, Matomo are more focussed on session-tracking, rather than user/event-tracking. That means they’re better at things like analytics for traffic sources.

The things you can do with PostHog that you can’t easily do with Matomo are things like pulling up identifiable user event histories, or plotting trends in product usage over time.

The enterprise version is just a private repo we'd give you access to that's still self hosted. We can also provide hosted deployments of any version, but that's really just for people that can't set it up themselves... hosting it isn't our core focus.

Oh man, good luck with social media. The Chapo Trap House community uses "post hog" as a memetic insult.

Hey - congrats on launching! I'm adding Posthog to my list of codebases to check out.

Can you talk more about your tech stack?

As an aside - it seems like most of the analytics companies I've heard of went through YC: Mixpanel, Heap, Amplitude!

Thanks so much! It's built on pretty simple/proven technologies: Postgres, Django and React. We've already seen this scale to millions of daily events on basic Heroku dynos. We can swap in other databases if you end up going beyond that.

Got it! And how do you identify users of your open source version who you can upsell your paid version to, since there's no sign up needed to start using PostHog?

Or is this something you don't do?

We thought long and hard about this. We spoke to founders of other big OS projects - we think if developers at big companies want to use it, they'll want to use some of the enterprise features and reach out. We’ve already seen that start happening :-)

Congratulations, looks great. I'm building something similar, except on a smaller scale: https://reactric.netlify.com

I would just like to say THANK YOU! I can't believe it has taken this long for someone to solve this problem in this way

Very cool - congrats on the launch! I like the docker deploy command that you posted on your landing page. Tried that out and it is super easy to get up and running.

Do you have a sample dataset to feed into a local or demo environment to test out the UI? I'd love to poke around a bit before deploying to Heroku and setting it up on a site.

Thanks! There's a hidden 'demo' page [0] that you can click around that will create some events. You can also add that URL when you create an action to test out the editor.


Great idea to be compatible with Mixpanel libraries! Makes switching over really easy

Looks cool :)

Curious how deep you plan to go on the peripherals of product analytics: attaching additional attributes to users to group them (e.g. subscription level), getting a view into attribution channels for marketing strategy, etc.

We have the ability to do grouping in the backend at the moment, but the UI isn't quite there yet. We definitely want "team"-level analytics as a good starting point, as we've already had this question several times. We know that's important for B2B SaaS, a world we have come from before.

We don't aim to go "data science" deep with analytics, as we suspect you'd rather just integrate Metabase/Tableau/etc. We can see some cool ways to use it for attribution though: as you host it yourself, we don't need to charge you enormous fees if your MTUs are very high. We see lots of B2C companies using product analytics on the product but not on the website, and struggling to track, say, UTM tags the whole way through.
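Tracking UTM tags "the whole way through" roughly means capturing them on the first website visit and keeping them on the user so later product events can be segmented by them. A minimal sketch using only the standard library; the function names and the first-touch policy (keep the earliest value seen) are illustrative assumptions, not PostHog behaviour.

```python
from typing import Dict
from urllib.parse import parse_qs, urlparse

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def utm_properties(landing_url: str) -> Dict[str, str]:
    """Pull the standard UTM tags out of a landing-page URL."""
    query = parse_qs(urlparse(landing_url).query)
    return {key: query[key][0] for key in UTM_KEYS if key in query}


def attach_first_touch(user_props: Dict[str, str], landing_url: str) -> Dict[str, str]:
    """Store UTM tags on the user, keeping the first values ever seen (first touch)."""
    merged = dict(user_props)
    for key, value in utm_properties(landing_url).items():
        merged.setdefault(key, value)
    return merged


user = attach_first_touch({}, "https://example.com/?utm_source=hn&utm_campaign=launch")
user = attach_first_touch(user, "https://example.com/?utm_source=twitter&utm_medium=social")
print(user)  # {'utm_source': 'hn', 'utm_campaign': 'launch', 'utm_medium': 'social'}
```

Swapping `setdefault` for plain assignment would give last-touch attribution instead; which one is right depends on the marketing question being asked.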

There are two "out there" areas that we're really interested in right now...

1) We're thinking of focussing more on precisely what a developer (not product, not marketing) needs, as we think there is an underserved and enormous group here. Imagine when you're building something being able to run a command in your CLI, then being able to open a browser with a good understanding of which pages/features are being used as you work. The point being - give developers user data so they know how to build for impact.

2) We also want to explore integrations with other platforms to push stuff to them. I can't stop refreshing our own product, so I think pushing an Action to Slack, for example, would be helpful and would get it into everyday workflows a bit more easily. We don't want to do too much here and kind of hope the community spot these kinds of things and run with them :)
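A rough sketch of pushing a matched action to Slack. It relies on Slack's incoming webhooks, which accept a POST with a JSON body containing a `text` field; the function names, the message format and the webhook URL are hypothetical placeholders, not an existing PostHog integration.

```python
import json
import urllib.request


def build_slack_payload(action_name: str, distinct_id: str, url: str) -> bytes:
    """Build an incoming-webhook body describing one matched action."""
    text = f"{distinct_id} performed '{action_name}' on {url}"
    return json.dumps({"text": text}).encode("utf-8")


def post_to_slack(webhook_url: str, payload: bytes) -> int:
    """POST the JSON payload to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_slack_payload("Signed up", "user_42", "/pricing")
print(payload.decode())  # {"text": "user_42 performed 'Signed up' on /pricing"}
# post_to_slack("https://hooks.slack.com/services/...", payload)  # placeholder URL
```

Keeping payload construction separate from the HTTP call makes it easy to test the message format without network access, and to reuse the same hook for other destinations.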

What's your reaction to the above? I'd love to know if you had a specific pain point in mind

I'm curious as to how you plan to scale PostHog to larger users. As the person who scaled Heap, here is my honest opinion of this. I think there is going to be a huge challenge ahead in scaling query performance. This was perpetually a challenge at Heap and was for a long time the main limitation on Heap's growth.

The challenge was tough enough for Heap, and PostHog is going to be at a huge disadvantage due to the lack of multi-tenancy. When you use Heap, your data is stored across Heap's entire cluster of machines. When you run a query, that query is run simultaneously against every single machine in Heap's cluster. Even though your data may take up something like 0.1% of the total disk space, when you run a query, 100% of the disk throughput of Heap's cluster goes to processing your query. It's not an overstatement to say this alone results in a >50x improvement in query performance.
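The arithmetic behind the multi-tenancy argument can be sketched as an idealized scan-time estimate. The numbers below (2 TB of data, 2 GB/s of disk throughput per machine, 60 machines) are illustrative assumptions, not Heap's real figures, and the model ignores network, coordination and skew overheads. Under those assumptions the speedup is simply the number of machines sharing the scan.

```python
def scan_seconds(data_gb: float, machines: int, gb_per_sec_per_machine: float) -> float:
    """Ideal time to scan one customer's data spread evenly across `machines` nodes,
    with every node's disk throughput working on the query in parallel.
    Ignores network, coordination and skew overheads."""
    return data_gb / (machines * gb_per_sec_per_machine)


# Illustrative numbers only (not Heap's real figures): 2 TB of events, 2 GB/s per node.
single_tenant = scan_seconds(2000, machines=1, gb_per_sec_per_machine=2)    # 1000.0 s
shared_cluster = scan_seconds(2000, machines=60, gb_per_sec_per_machine=2)  # ~16.7 s
print(round(single_tenant / shared_cluster))  # 60
```

A single-tenant deployment could match this only by buying all 60 machines for one customer, which is the cost concern raised further down the thread.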

I honestly think Heap wouldn't be possible without multi-tenancy. It's hard enough as is to get queries that process multiple terabytes of data to return in seconds when you have a fleet of dozens of i3s available. I'm not sure how you would do that with a fleet a tiny fraction of that size. If you're curious about Heap's infrastructure, Heap's CTO, Dan Robinson, has given a number of talks on how it works[0][1].

That's not to say that PostHog won't work for anyone. I previously tried (and failed) to start a company based on optimizing people's Postgres instances. One of the big takeaways I had was that no matter how you use it, Postgres will work completely fine as long as you have <5GB of data. I think if you have a modest amount of data, something like PostHog would work perfectly fine for you. Since the Postgres optimization business didn't work out, I wound up pivoting to freshpaint.io which eliminates the need to setup event tracking for your analytics and marketing tools by automatically instrumenting your entire site. Since I started working on it, things have been going a lot better.

[0] https://www.youtube.com/watch?v=NVl9_6J1G60

[1] https://www.youtube.com/watch?v=iJLq3GV1Dyk

For larger volumes of events, we wouldn't recommend using Postgres.

The nice thing about single-tenancy is that in reality lots of users have small enough datasets that scaling isn't a problem. Heap et al have to scale to all of their users combined (as you said, terabytes), we just have to scale to the biggest user. Postgres also allows you to get started very quickly and do lots of queries yourself.

In our docs we explain our thinking more. Postgres is great for the vast majority of use-cases, and we're working hard to optimise those queries. Once users get beyond Postgres, we have integrations with databases that can scale well across many hosts, and we provide services around this to help people size their servers correctly.

> Heap et al have to scale to all of their users combined.

The hard part wasn't scaling the system to handle all users combined. The hard part was designing the system such that when an individual user runs a query, they would get their results back in a reasonable amount of time.

Having every user in a single cluster made this easier because an individual customer could make use of the compute power of a cluster that was sized to fit the data for everyone in it. In other words, if Heap doubled the number of customers, Heap would get twice as fast for everyone. That's not true for PostHog.

> Heap et al have to scale to all of their users combined (as you said, terabytes), we just have to scale to the biggest user.

A decent sized Heap customer had multiple terabytes of data with the largest being well beyond that. You're going to have to figure out how to scale PostHog to that point without the benefits of multi-tenancy.

> Once users get beyond Postgres, we have integrations with databases that can scale well across many hosts, and we provide services around this to help people size their servers correctly.

I think a cluster of servers that could churn through terabytes of data in seconds would be prohibitively expensive for any individual customer to purchase.

" One of the big takeaways I had was that no matter how you use it, Postgres will work completely fine as long as you have <5GB of data. "

Surely you meant "5TB", not "5GB"?

> Surely you meant "5TB", not "5GB"?

I meant what I said. You can literally just set up a PG instance and it will work perfectly fine up to a few GB. At that point, you will probably start to see some slow queries due to bad query plans. All you need then is the basics of EXPLAIN ANALYZE and a few indexes. That will get you to ~100GB, at which point you will have to deploy more serious optimizations like partitioning, denormalization, etc. Once you get to multi-TB Postgres instances, you will have to look at ways to horizontally scale your DB. This can be done in the Postgres world with something like Citus, but you would probably also want to look at non-Postgres-based alternatives.

This is kind of shifty... is it 5GB or 100GB?

Yes, if you are dealing with large databases, you need to learn about... dealing with the large databases. 5GB is something that a small laptop will do.

You should mention in the README that the production Dockerfile (and posthog/posthog:latest) are busted: they do not create any database. Spent the last hour debugging it :) Otherwise, looking really good!

Apologies! We've made that clearer now.

Congrats on the launch!

I've been using PostHog with my app for about a week now, and so far the results have been good. Pretty straightforward to integrate with a Swift iOS app too!

This looks great, but was there a reason for going with Postgres over something purpose-built for analytics like ClickHouse? I am seriously considering building a similar tool for our platform, as Mixpanel/Amplitude are cost-prohibitive at our scale. We had to move off Google Analytics and run our own Matomo server, and it brings MariaDB to a grinding halt.

In any case, will be looking closer at this. Looks very interesting. Thanks.

We have actually integrated Clickhouse already for this reason. We started with Postgres as it works well for smaller volumes, but we have this integration in a paid version.

Thanks James. I'll keep this in mind and might shoot you guys an email. Cheers and good luck!

Have you tried Countly then? It's a serious contender to Mixpanel/Amplitude, self hosted and backend is MongoDB (if that is something you are interested in).

Your sign-up page seems broken [1] on my Firefox.

[1] https://imgur.com/a/wYxbKj4

Fixed :)

Great concept! Just one small issue: to "post hog" is internet slang for publishing pictures of one's penis. I am not kidding.

Given the logo, I'm assuming this is intentional?

Congrats on launch - a quick demo video would be wonderful to go with this.

It would be interesting to see quick deploy to firebase, aws (lambda??) or other services.

Any idea what a moderate-size website (10k users per day, 500k events) would cost to run on Heroku?

We’d love to do more quick deploy buttons.

We’ve seen 500k events work with the hobby dev dyno and database which is $14/month. Depending on how much data history you want to keep you could upgrade to standard-0 on Heroku or spin up an RDS instance which is cheaper.

Could you please point me to where the backend code is in your GitHub repo?

Hi! If you mean the PostHog code itself, most of it is in the PostHog folder.

If you want to send events from your own backend to PostHog, there's instructions for Ruby/Python/Node/API in the docs[0]

[0] https://github.com/PostHog/posthog/wiki

Congrats on the launch!

Will have to see where I can fit this in to a project.

Looks pretty sweet. I like the UI for adding event captures.

Amazing, going to give it a try. How are you going to monetize? Like Metabase?

Edit: Nevermind, just read your last paragraph :-)

Thank you! This is much needed!

Exactly what I need right now. Trying it out (and congrats!) :)

Very nice, I'll use this on my next project for sure.

Looks great!


How do you plan on monetizing?

edit: whoops didn’t read.

That's up there.
