Sai from PeerDB here. Temporal has been very impactful for us and a major factor in our ability to build a production-grade product that supports large-scale workloads.
At a high level, CDC is a complex state machine. Temporal helps us build that state machine, taking care of automatic retries and idempotency at the different failure points, and it also helps with managing and observing it. That visibility is very useful for identifying root causes when issues arise.
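To make the retry/idempotency point concrete, here is a minimal sketch in plain Python. It does not use the Temporal SDK and is not PeerDB's code; `Sink`, `sync_batch`, `run_with_retries`, and the batch-ID scheme are all hypothetical names chosen for illustration. It only shows why a retried step needs to be idempotent: the write succeeds, the step then fails, and the retry must not apply the same batch twice.

```python
import time


class TransientError(Exception):
    """Stand-in for a recoverable failure such as a dropped connection."""


def run_with_retries(step, max_attempts=3, backoff_seconds=0.0):
    """Call step() until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)


class Sink:
    """Hypothetical destination that applies each batch at most once."""

    def __init__(self):
        self.applied_batches = set()
        self.rows = []

    def apply(self, batch_id, rows):
        # Idempotency: a batch that was already applied is skipped on retry.
        if batch_id in self.applied_batches:
            return
        self.rows.extend(rows)
        self.applied_batches.add(batch_id)


def sync_batch(sink, batch_id, rows, failures):
    """Apply a batch, simulating failures['left'] crashes right after the write."""

    def step():
        sink.apply(batch_id, rows)
        if failures["left"] > 0:
            failures["left"] -= 1
            raise TransientError("lost connection after write")
        return len(sink.rows)

    return run_with_retries(step)


sink = Sink()
total = sync_batch(sink, batch_id=1, rows=["a", "b"], failures={"left": 2})
print(total)
```

Without the `applied_batches` check, the two retries would each append the rows again. A workflow engine handles the retry loop for you, but the step being retried still has to be safe to run more than once.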
Managing Temporal shouldn’t be complex. They offer a well-maintained, mature Docker container. From a user standpoint, the software is intuitive and easy to understand. We package the Temporal Docker container in our own Docker setup and have it integrated into our Helm charts. We have quite a few users smoothly running both Enterprise (which we open sourced recently) and standard OSS!
Thanks for reaching out! Just to be clear, from what I can tell, both PeerDB and Temporal are great (and I’ve been hoping to learn Temporal for a while). At some point I considered self-hosting PeerDB, but my impression was that it required multiple nodes to run properly, so it wasn’t budget-friendly. That impression is also based on your pricing plans, where $250 is the cheapest, which suggests that it’s not cheap to host (I’m trying to minimize costs until I have more customers). Please correct me if I’m wrong! Can you give me an example of a budget-friendly deployment, e.g. how many EC2 instances would I need for PeerDB with one of the smaller RDS instances?
Given the acquisition by ClickHouse (congrats!), what can we expect for the CDC for sinks other than CH? Do you plan to continue supporting different targets or should we expect only CH focus?
Edit: also, any plans for supporting e.g. SNS/SQS/NATS or similar?
Great question! PeerDB can run on just a single EC2 instance (using either Docker or Helm charts). A typical production-grade setup could use 4 vCores with 16 GB RAM. You can scale up or down based on the resource usage of your workload.
To elaborate more on the architecture (https://docs.peerdb.io/architecture), the flow-worker does most of the heavy lifting (actual data movement during initial load and CDC), while the other components are fairly lightweight. Allocating around 70-80% of provisioned resources to the flow-worker is a good estimate. For Temporal, you could allocate 20-25% of resources and distribute the rest to other components.
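As a rough worked example of that split on the 4 vCore / 16 GB instance mentioned above, here is a small Python sketch. The 70% / 20% / 10% shares are just one combination picked from the stated ranges, and the function and component labels are hypothetical, not PeerDB configuration keys.

```python
def split_resources(vcores, ram_gb, flow_worker=0.70, temporal=0.20):
    """Divide an instance's resources between flow-worker, Temporal, and the rest."""
    other = 1.0 - flow_worker - temporal
    if other < -1e-9:
        raise ValueError("shares add up to more than 100%")
    shares = {"flow-worker": flow_worker, "temporal": temporal, "other": other}
    return {
        name: {
            "vcores": round(vcores * share, 2),
            "ram_gb": round(ram_gb * share, 2),
        }
        for name, share in shares.items()
    }


plan = split_resources(vcores=4, ram_gb=16)
for name, alloc in plan.items():
    print(name, alloc)
```

With these shares the flow-worker gets about 2.8 vCores and 11.2 GB, Temporal about 0.8 vCores and 3.2 GB, and the remaining components share roughly 0.4 vCores and 1.6 GB. Note that the upper ends of both ranges (80% and 25%) cannot be used together, since they add up to more than 100%; the sketch rejects that combination.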
Our open-source offering (https://github.com/PeerDB-io/peerdb) supports multiple connectors (CH and non-CH). Currently, there aren’t any plans to make changes on that front!
1. I am not sure whether the Helm chart can be used for the OSS version.
2. If a Helm chart needs sh files, it’s already an absolute no-go, since it won’t work well with GitOps.
Hi, the Helm chart uses the OSS PeerDB images.
The sh files were created to bootstrap the values files for easier (and faster) POCs.
You can append a `template` argument when running the script files, which generates a set of values files that you can then modify as needed.
There is also a production guide for this, as we have customers in production using GitOps (ArgoCD) with the provided charts (https://github.com/PeerDB-io/peerdb-enterprise/blob/main/PRO...)
OP here. It depends on the workload/query - number of columns in the query, filters, presence of aggregates, etc. Overall, I've seen ClickHouse perform well with window functions. This is a common strategy customers use to deduplicate data.
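For readers unfamiliar with the pattern, here is a small Python sketch of what window-function deduplication typically computes. It assumes the usual shape, `ROW_NUMBER() OVER (PARTITION BY key ORDER BY version DESC)` keeping only row number 1; the exact query customers run is not given here, and the column names `id` and `version` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_key(rows, key="id", version="version"):
    """Keep only the highest-version row for each key."""
    # Sort by key, then by version descending, so the first row
    # in each group is the one a ROW_NUMBER() = 1 filter would keep.
    ordered = sorted(rows, key=lambda r: (r[key], -r[version]))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


rows = [
    {"id": 1, "version": 1, "name": "a"},
    {"id": 1, "version": 2, "name": "b"},
    {"id": 2, "version": 1, "name": "c"},
]
deduped = latest_per_key(rows)
print(deduped)
```

The sort is the expensive part, which is why the number of columns, filters, and aggregates in the real query affects how well this performs.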
Ack, true: VC-backed startups optimize for TAM and therefore go more generalized. At PeerDB, we were still focused on just Postgres and were expanding from CDC to other ETL/data-movement use cases for Postgres, such as Active<>Active, database migrations, etc. With the pace at which Postgres is growing, I believe that ETL for Postgres can meet that billion-dollar TAM. Anyway, we were recently acquired by ClickHouse and are now doubling down on providing a world-class CDC experience from Postgres to ClickHouse. :)
Excited about the work here. However, my 2 cents: for this to become a reality (serious production use at scale), I don’t think it depends only on the choice of analytical engine (here, DuckDB), but rather on how well the Postgres extension is built. The Postgres extension framework is complex, still maturing, and doesn’t offer full flexibility to implement features. We saw this closely at Citus. It was a deterrent to competing with native analytical databases like ClickHouse and Snowflake. A bunch of customers, including Cloudflare and Heap, switched from Citus to ClickHouse and SingleStore, respectively. This was one of the inspirations to start PeerDB: to make it magical for customers to move data from Postgres to native, purpose-built analytical databases like ClickHouse.
As a Postgres fan, good luck and best wishes with the effort here!
Super cool. Great work team! Love the deploy feature to deploy the entire playground to the cloud and get a connection string. Helps devs get started with Postgres projects very quickly.
Great question! I expect it to support parent/child tables too, since we implemented partitioned-table support by querying the pg_inherits catalog table - https://github.com/PeerDB-io/peerdb/blob/2d30e5fae887552f93c... However, inheritance (the old way of partitioning) isn't common with Postgres. Of the hundreds of workloads I've seen in the past decade, it has come up only a couple of times.
Thanks for posting this question! I previously worked at Citus for 8 years, where we tried to bring real-time analytical capabilities to Postgres. It was common to see POCs go sideways, and several customers (including Cloudflare and Heap) moved from Citus to an analytics-specialized database. For example, Cloudflare moved from Citus to ClickHouse. This was one of the inspirations for me to build a company (PeerDB) that brings specialized OLTP and OLAP databases together.
This is not to say that Postgres cannot support larger-scale analytical workloads, but it will take time. ClickHouse has taken 10 years of effort and development to get where it is now.
I would love to understand how tablespaces perform at scale in production workloads. Are there any references that you could share? :)
At a high level, CDC is a complex state machine. Temporal helps us build that state machine, taking care of automatic retries and idempotency at the different failure points, and it also helps with managing and observing it. That visibility is very useful for identifying root causes when issues arise.
Managing Temporal shouldn’t be complex. They offer a well-maintained, mature Docker container. From a user standpoint, the software is intuitive and easy to understand. We package the Temporal Docker container in our own Docker setup and have it integrated into our Helm charts. We have quite a few users smoothly running both Enterprise (which we open sourced recently) and standard OSS!
https://github.com/PeerDB-io/peerdb/blob/main/docker-compose...
https://github.com/PeerDB-io/peerdb-enterprise
Let me know if there are any questions!