We went with that infrastructure from the get-go for several reasons:
* Having a durable buffer in front means big spikes get absorbed by the buffer rather than by the OLAP database, which you want to keep responsive when it is powering your online dashboards. ClickHouse Cloud now has compute/compute separation that addresses this, but open-source users don't have it.
* When we first shipped this, ClickHouse did not yet have async inserts in place, so skipping some kind of buffered insert was frowned upon.
* As oatsandsugar mentioned, since then we have also shipped direct inserts, so you don't need a Kafka buffer if you don't want one.
* From an architecture standpoint, that design lets you have multiple consumers.
* Finally, having Kafka lets you write streaming functions in your favorite language instead of SQL. The performance-to-task ratio will definitely be lower, but depending on the task it can be faster to set up, and you can do things you couldn't do directly in the database.
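To make the buffering point concrete, here is a minimal, hypothetical sketch in Python of a size- and age-bounded insert buffer. This is not Moose's implementation; `flush_fn` stands in for the actual batched insert into the OLAP database:

```python
import time
from typing import Any, Callable, Optional

class InsertBuffer:
    """Accumulate rows and flush them in batches, so ingest spikes
    hit the buffer instead of the OLAP database. Illustrative only."""

    def __init__(self, flush_fn: Callable[[list], None],
                 max_rows: int = 1000, max_age_s: float = 1.0):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.rows: list = []
        self.first_row_at: Optional[float] = None

    def add(self, row: Any) -> None:
        if not self.rows:
            self.first_row_at = time.monotonic()
        self.rows.append(row)
        # Flush when the batch is full or too old.
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.first_row_at >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []
            self.first_row_at = None

# Usage: collect flushed batches in a list instead of inserting them.
batches: list = []
buf = InsertBuffer(batches.append, max_rows=3)
for i in range(7):
    buf.add(i)
buf.flush()  # drain the partial tail batch
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

A real version would also flush on a timer even when no new rows arrive, but the size/age trade-off above is the core idea.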
> ClickHouse Cloud now has compute/compute separation that addresses this, but open-source users don't have it.
Altinity is addressing this with Project Antalya builds. We have extended open source ClickHouse with stateless swarm clusters to scale queries on shared Iceberg tables.
The durability and transformation reasons are definitely more compelling, but the article doesn’t mention those reasons.
It’s mainly focused on the insert batching, which is why I was drawing attention to async_insert.
I think it’s worth highlighting the incremental transformations ClickHouse can do via materialised views too. Those can often replace the need for a full-blown streaming transformation pipeline.
IMO you can get a surprising distance with “just” a ClickHouse instance these days. I’d definitely be interested in articles that talk about where that threshold is no longer met!
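For anyone following along, both ideas can be sketched in plain ClickHouse SQL. The table, column, and view names here are made up for illustration:

```sql
-- Server-side batching instead of an external buffer: ClickHouse
-- accumulates small inserts and flushes them in batches.
INSERT INTO events
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 'page_view');

-- Incremental transformation: rows inserted into `events` are
-- rolled up into `events_per_minute` as they arrive.
CREATE MATERIALIZED VIEW events_per_minute_mv
TO events_per_minute AS
SELECT toStartOfMinute(ts) AS minute, event, count() AS c
FROM events
GROUP BY minute, event;
```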
MooseStack maintainer here. I helped author the post. Happy to answer any questions, but very curious to get feedback. We’ve been thinking a lot about developer experience for the OLAP stack.
What I'm saying is that child processes can write to stdout while the main process is shutting down. Also, if the child processes are not shut down properly and are left dangling, and they were set up with 'inherit' so they can write directly to stdout/stderr, then yes.
Not sure if this is what you are asking about, so feel free to correct me if I misread. You don’t have to install Moose first on the deployment machine; in the tutorial I go through that step to generate a dummy Moose application to be deployed.
It is the same idea as a Next.js application you deploy through Docker: you have your application, you build a Docker container that contains your code, and then you can deploy that.
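As a sketch, that pattern looks like the usual Node.js Dockerfile. The base image, paths, and commands here are illustrative, not the exact ones from the tutorial:

```dockerfile
# Illustrative only: generic "build the app into a container" pattern
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build        # whatever your framework's build step is
CMD ["npm", "run", "start"]
```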
I tried to limit the port bindings. We usually expose Moose itself, since one of the use cases is collecting data for product analytics from a web front end, which pushes data to Moose. And then usually people want to expose REST APIs on top of the data they have collected. The ClickHouse ports could be fully closed; this was an example of what to open if you want to connect Power BI to it.
We are built on top of them. Right now the techs above are what’s backing the implementation, but we want to add different compatibilities, so that you can eventually have, for example, Airflow backing your orchestration instead of Temporal.
You can think of Moose as the pre-built glue between those components, with the equivalent UX of a web framework (i.e. you get hot reloading, instant feedback, etc.).
I put this Docker Compose recipe together to make kicking the tires on Moose, our open-source data-backend framework, almost frictionless.
What you get:
• A single docker compose up that spins up ClickHouse, Redpanda, Redis and Temporal with health checks and log rotation already wired.
• Runs comfortably on an 8 GB / 4-core VPS; scale-out pointers are in the doc if you outgrow single-node.
• No root Docker needed; the stack follows the hardening tips ClickHouse & Temporal recommend.
Why bother?
Moose lets you model data pipelines in TypeScript/Python and auto-provisions the OLAP tables, streams and APIs, which cuts a lot of boilerplate. Happy to trade notes on the approach or hear where the defaults feel off.
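For reference, the health-check and log-rotation wiring can look roughly like this in Compose; the image tag and probe command below are assumptions, not the exact recipe:

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest  # tag is an assumption
    healthcheck:
      # ClickHouse answers "Ok." on its HTTP /ping endpoint (port 8123)
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider",
             "http://localhost:8123/ping"]
      interval: 5s
      timeout: 3s
      retries: 10
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate each log file at 10 MB
        max-file: "3"     # keep at most 3 rotated files
```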
I have a small open-source project that uses Docker Compose behind the scenes to help start up any service. You could look at adding Moose to it (or I am happy to add it), and then users are one command away from running it (insta moose). I recently added Lakekeeper and various data annotation tools.
Interesting. How do you handle dependencies between those pieces of infrastructure, if there are any? For example, in our Docker Compose file, Temporal depends on Postgres, and then Moose depends on Temporal. How is that expressed in Insta-Infra?
It leverages Docker Compose's 'depends_on' for dependencies (https://docs.docker.com/compose/how-tos/startup-order/). For example, airflow depends on the airflow-init container completing successfully, which in turn depends on postgres.
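Sketched in Compose terms, that chain uses the long form of depends_on with conditions (service names follow the airflow example above; image tags are assumptions):

```yaml
services:
  postgres:
    image: postgres:16              # version is an assumption
  airflow-init:
    image: apache/airflow:2.9.0     # version is an assumption
    depends_on:
      postgres:
        condition: service_started
  airflow:
    image: apache/airflow:2.9.0
    depends_on:
      airflow-init:
        # wait for the one-shot init container to exit with code 0
        condition: service_completed_successfully
```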
Founder here. Thanks for the interest! We built Moose because we were tired of the complexity involved in setting up and maintaining data pipelines.
What makes Moose different is how it simplifies the entire workflow - from ingestion to processing to serving data through APIs. We've found teams spend too much time wiring together different tools rather than focusing on the actual data insights.
The local development experience was a big focus for us. You can instantly test your changes with real data without waiting for deployments. And we've made sure the same code runs identically in production to eliminate those frustrating "works on my machine" moments.
Happy to answer any questions about our technical approach or how we're handling specific use cases. We're particularly interested in hearing about pain points you've experienced with existing data systems or any feedback you might have on Moose.
We are heading toward 1.0 from an API perspective; we just landed what we internally call DMV2, the latest iteration of the abstraction level for the API. Think SST / Terraform CDK, vertically integrated for data.
If you are looking to work with Moose in production we would love to chat with you :)
Hi Zephyr! I'm the Head of Engineering at F45 Training. We had early access to moose, and we've been using it in production since last year with thousands of our members. We use moose to manage the backend for LionHeart - our heart rate tracking system in studio. We also use Moose's paid hosting service called Boreal. It's a new product so still a bit rough around the edges - but it has scaled really well for us and the 514 Team has been terrific.
Disclaimer: I am the CTO at Fiveonefour.