I've been working on Gazette for a little while now, having used it as a proprietary solution at my last couple of companies, and have now open-sourced it. I'd appreciate the community's take on it.
One of Gazette's contributions is that it unifies "my real-time event stream" and "all of my historical data" into a single source-of-truth with a representation that's super easy to integrate (plain old files on S3).
Going forward, what excites me is its potential to unify 1) stream processing, 2) Data Lake build-out, and 3) ad-hoc analysis (via external tables in BigQuery/Athena/Hive/etc.) into a single system. If that's interesting, I'd love to talk with you about it.
This looks very neat. I think you have exactly the right idea about offloading the physical storage to object stores like S3 -- I had the same idea some time ago, and am using a version of it for an internal analytics streaming system.
Thanks for making this open source. I'm considering an application where I might be able to use this.