
Launch HN: Batch (YC S20) – Replays for event-driven systems - dsies
Hello HN!

We are Ustin and Daniel, co-founders of Batch (https://batch.sh) - an event replay platform. You can think of us as version control for data passing through your messaging systems. With Batch, a company can go back in time, see what its data looked like at a certain point and, if it makes sense, replay that data back into its systems.

This idea was born out of frustration with what an unwieldy black box Kafka is. While many folks use Kafka for streaming, just as many use it as a traditional messaging system. Historically, these systems have offered very poor visibility into what's going on inside them and (at best) a poor replay experience. This problem exists across pretty much every messaging system. And if the messages on the bus are serialized, it is almost guaranteed that you will end up writing custom, one-off scripts to work with them.

This "visibility" pain point is exacerbated tenfold if you are working with event-driven architectures and/or event sourcing - you must have a way to search and replay events, since you will need to rebuild state in order to bring up new data stores and services. That may sound straightforward, but it's actually quite involved. You have to figure out how and where to store your events, how to serialize them, search them, play them back, and how/when/if to prune, delete or archive them.

Rather than spending a ton of money building such a replay platform in-house, we decided to build a generic one and hopefully save everyone a bunch of time and money. We are 100% believers in "buy" (vs "build") - companies should focus on building their core product and not waste time on side quests. We've worked on these systems at our previous gigs and decided to put our combined experience into building Batch.

A friend of mine shared this bit of insight (which he heard from Dave Cheney, I think?): "Is this what you want to spend your innovation tokens on?" (referring to building something in-house) - and the answer is probably... no. So this is how we got here!

In practical terms, we give you a "connector" (in the form of a Docker image) that hooks into your messaging system as a consumer and begins copying all data it sees on a topic/exchange to Batch. Alternatively, you can pump data into our platform via a generic HTTP or gRPC API. Once the messages reach Batch, we index them and write them to a long-term store (we use https://www.elassandra.io). At that point, you can use either our UI or our HTTP API to search and replay a subset of the messages to an HTTP destination or into another messaging system.

Right now, our platform can ingest data from Kafka, RabbitMQ and GCP PubSub, and we've got SQS on the roadmap. Really, we're happy to add support for whatever messaging system you need, as long as it solves a problem for you.

One super cool thing: if you encode your events in protobuf, we can decode them when they arrive on our platform, so we can index them and let you search for data within them. In fact, we think this functionality is so cool that we really wanted to share it - surely other folks need to quickly read/write encoded data to various messaging systems. We wrote https://github.com/batchcorp/plumber for that purpose. It's like curl for messaging systems and currently supports Kafka, RabbitMQ and GCP PubSub. It's a port of an internal tool we used when interacting with our own Kafka and RabbitMQ instances.

In closing, we would love for you to check out https://batch.sh and tell us what you think. Our initial thinking is to let folks pump their data into us for free with 1-3 days of retention. If you need more retention, that'll require $ (we're leaning towards a usage-based pricing model).

We envision Batch becoming a foundational component of your system architecture, but right now our #1 goal is to lower the barrier to entry for event sourcing, and we think offering "out-of-the-box" replay functionality is the first step towards making that happen.

... And if event sourcing is not your cup of tea - you can still put us in your stack to gain visibility and peace of mind.

OK, that's it! Thank you for checking us out!

~Dan & Ustin

P.S. Forgot about our creds:

I (Dan) spent a large chunk of my career working at data centers doing systems integration work. I got exposed to all kinds of esoteric things, like how to integrate diesel generators into CMSs and how to automate VLAN provisioning for customers. I also learned that "move fast and break things" does not apply to data centers, haha. After data centers, I went to work for New Relic, followed by InVision, DigitalOcean and, most recently, Community (which is where I met Ustin). I work primarily in Go, consider myself a generalist, prefer light beers over IPAs and dabble in metal (music) production.

Ustin is a physicist turned computer scientist who worked towards a PhD on distributed storage over lossy networks. He has spent most of his career as a founding engineer at startups like Community. He has a lot of experience working in Elixir and Go and on large, complex systems.
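To make the "rebuild state by replaying events" bit concrete, here's a tiny, hypothetical sketch (none of these names come from Batch's API - it just shows the core idea that current state is a fold over an ordered event log):

```python
# Rebuilding state from an ordered event log - the heart of event sourcing.
# Each event is an immutable fact; current state is a fold over all of them.

def rebuild_balances(events):
    """Fold a stream of account events into current balances."""
    balances = {}
    for ev in events:
        acct = ev["account"]
        if ev["type"] == "deposited":
            balances[acct] = balances.get(acct, 0) + ev["amount"]
        elif ev["type"] == "withdrew":
            balances[acct] = balances.get(acct, 0) - ev["amount"]
    return balances

log = [
    {"type": "deposited", "account": "a1", "amount": 100},
    {"type": "withdrew", "account": "a1", "amount": 30},
    {"type": "deposited", "account": "a2", "amount": 50},
]

print(rebuild_balances(log))  # {'a1': 70, 'a2': 50}
```

Replaying the same log into a brand-new store always yields the same state - which is exactly why losing the ability to search and replay that log is so painful.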
======
worldsoup
I've worked on a very similar product in the past and can affirm that there is
definitely enterprise interest in a good solution for event replay among orgs
that are already doing event sourcing... I'm curious whether offering
out-of-the-box replay will actually lower the bar and drive more orgs to
pursue event sourcing? The CLI search functionality is really cool and useful
as well.

~~~
dsies
Hey there!

Re: lowering the bar - we hope so. What we've noticed is that the papers that
talk about event sourcing mention replays but don't talk at all about the
implementation (or give any pointers). We're hoping that if at least that part
is done for you, you've got one less thing to worry about.

As for the CLI tool - thanks! We found it super useful ourselves and figured
others would too. I like to think of it as a sort of intelligent `netcat` for
messaging systems :D

------
pratio
Congrats on the release. We've made a ragtag solution in-house that is
complicated but works for those few unfortunate occasions when we need it.
There's a demo on request, but it would be helpful if we had a better way to
test the product. Maybe an endpoint where we could stream, say, 10,000 events
and watch them replay? What sort of pricing tiers are we talking about?

~~~
dsies
Thank you!

Re: ragtag in-house solution that is complicated

^ That's _exactly_ what we're talking about. These systems get complex pretty
quick and you end up with duct tape in more than a few places.

As for demo - yeah, our plan is to open up registrations for accounts soon
which will allow you to pump data into us for free with a low retention
period.

We've still got some pieces to tighten before we can open the gates fully, but
we'll try to make it happen soon (within the next few weeks?). In the
meantime, if you want a demo, ping us and we'll make it happen.

------
logicx24
Wow, this is a great idea. I recently worked on a team building streaming data
pipelines, and we built a bespoke system to do exactly this: end-to-end test
our software. We had past messages written to a >300TB sharded file, and wrote
a microservice to read each shard and publish it to the message queue for our
staging instance, and then run data validation/anomaly detection on the
output. It was useful but incredibly painful to use and maintain, and Batch
would have been a fantastic solution for accomplishing this.

------
kanobo
Congrats, looks useful! Just an opinion, but I think you should skip the cool
large animation on your homepage and lead with the "Our platform is essential
in scaling and maintaining your business" line. I had no idea what Batch was
until I scrolled way below the fold.

~~~
dsies
Yeah... we've heard this before. But the wavy stuff makes me feel so ...
_caaaaalm_ :)

~~~
kanobo
It is nice and calming. Btw, the twitter link at the bottom of your site is
broken (it links to
[https://batch.sh/www.twitter.com/batchsh](https://batch.sh/www.twitter.com/batchsh)).

~~~
uzarubin
Fixed!

------
cflyingdutchman
How does bookmarking work/How do I keep track of how far I've read while
replaying from Batch? Will you also index by date? It can take a long time to
replay a lot of data; do you have any numbers on the read rates you support
per topic?

~~~
dsies
Great questions!

> How does bookmarking work/How do I keep track of how far I've read while
> replaying from Batch?

We do not have any bookmarking functionality built (yet), as we currently
expect folks to just tweak their search query. Each event gets an id attached
to it on ingest that you can query and reference during search.

> Will you also index by date?

We do! Every event has a microsecond timestamp attached to it.

> It can take a long time to replay a lot of data; do you have any numbers on
> the read rates you support per topic?

We've done some initial replay throughput tests and have been able to reach
~10k/s outbound via HTTP - of course, this is all _highly_ dependent on where
you're located. We expect that for folks who need super high throughput, we'll
probably need to be closer to them - we fully expect to have to peer with some
of our customers and optimize for throughput by doing gRPC and ... batching :)

So far, we've done _most_ of our testing on inbound and we are currently able
to sustain ~50k/s (with ~5KB event size). Our inbound is able to scale
horizontally and so can go waaaaay beyond 50k/s if needed.

We have _a ton_ of service instrumentation, so we've got good visibility into
throughput (and thus should know well in advance when we're starting to hit
limits).

------
yamrzou
Congrats on the launch!

Two questions:

- If I have some data in Kafka, why would I want to pump it into your
platform instead of spawning an Elasticsearch instance and using something
like Kafka Connect to write to it and gain visibility?

- If I use Kafka as a permanent data store (with infinite retention), I can
easily replay all events with existing clients (or with plumber). What
additional functionality does the "replay" feature offer compared to that?

~~~
dsies
Hey there!

> - If I have some data in Kafka, why would I want to pump it into your
> platform instead of spawning an Elasticsearch instance and using something
> like Kafka Connect to write to it and gain visibility?

To avoid having to build, own and maintain the infra you just mentioned. As
the number of events in your system increases, you will have to scale ES and
other pieces of the system as well.

Our point is just that - if you know what's involved in collecting and
indexing the events - that is awesome but maybe you shouldn't have to spend
time building the infra around that stuff.

> If I use Kafka as a permanent data store (with infinite retention), I can
> easily replay all events with existing clients (or with plumber). What
> additional functionality does the "replay" feature offer compared to that?

I think it depends on your definition of "easily replay" - a Kafka replay for
a topic that's being consumed by a consumer group would require you to
disconnect that consumer group and then run a shell script to move the
offsets. You also would have no way to replay specific messages - your only
point of reference would be an offset (and key name, if you use it) - not
terribly flexible.
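For reference, the stock-Kafka version of that offset dance looks roughly like this (topic/group names are made up; the consumer group must be inactive for the reset to succeed):

```shell
# Stop the consumer group first, then rewind its committed offsets
# with Kafka's bundled kafka-consumer-groups tool.
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --group my-consumer-group \
  --topic my-topic \
  --reset-offsets \
  --to-datetime 2020-08-01T00:00:00.000 \
  --execute   # without --execute it only prints the plan (dry run)
```

And everything after that point gets re-delivered to every consumer in the group - there's no way to say "just these five messages."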

With Batch, you get to drill in and replay the _exact_ messages you want (and
avoid having to pump and dump potentially millions of messages your consumer
doesn't care about).

~~~
yamrzou
Makes sense, thanks for the clarification!

------
treis
If I'm writing all my messages to durable storage why not work off the durable
storage? I'm definitely not an expert in this area so perhaps I'm missing
something. My logic is that if you're paying the resource cost to write all
your messages why not pay the resource cost to read/write back there?

~~~
dsies
I think you're asking why don't we just be the hosted kafka/rabbitmq/etc and
offer all of this stuff in one place. (let me know if that's wrong).

That's a totally legit point - we've talked about offering it all in-house
before but it would require us to split our efforts into two - operating a
PaaS (for a bunch of different messaging tech) and running the event
collection platform.

Operating the PaaS part would be a full-time effort and there's a lot of
competition out there. We've decided to focus on the observability/replay part
first (since there is a lot less competition) and then later maybe explore the
hosted bus option.

LMK if that's not what you meant :)

~~~
treis
>I think you're asking why don't we just be the hosted kafka/rabbitmq/etc and
>offer all of this stuff in one place.

The other way around. If I'm not storing my messages today it's probably
because it is too expensive in terms of storage or compute to do so. But,
presumably, you can't do that any cheaper than I can. And now we are
duplicating the work so even more resources are being consumed making it that
much more expensive than just doing it myself.

It seems like your service is something I'd want to run pointed towards my
Kafka/RabbitMQ/whatever servers. I don't see how duplicating that stream is
cost effective.

~~~
dsies
Ahh, gotcha. If you need event introspection, doing it in-house is extremely
likely to be more expensive (and definitely more time-consuming) than
offloading it.

For example: if you are sending serialized data on your bus, you will need to
write something that deserializes it before inserting it into your
Elasticsearch cluster - and now you're managing even more infra (messaging
systems, decoders, document storage).

There is definitely a price attached to the luxury - but we're betting that
it'll be _significantly_ less than doing it yourself.

------
danenania
This looks interesting! A couple questions (that may also apply to event
sourcing more generally):

- How do you handle events with side effects (sending emails, for example),
and ensuring they aren't triggered on replay when they shouldn't be?

- How do you handle randomness, like uuid generation?

~~~
dsies
> How do you handle events with side effects (sending emails, for example),
> and ensuring they aren't triggered on replay when they shouldn't be?

Someone else already addressed this, but to paraphrase: your application
should be able to deal with duplicate events (and gracefully handle side
effects).
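A minimal sketch of what "deal with duplicate events" can look like on the consumer side (illustrative only - the event shape here is an assumption, not Batch's schema):

```python
# Idempotent consumer: remember which event ids have been processed
# so a replayed event doesn't fire its side effect twice.

class EmailSender:
    def __init__(self):
        self.sent = []           # stand-in for the real side effect
        self._processed = set()  # in production: a durable store, not memory

    def handle(self, event):
        if event["id"] in self._processed:
            return  # duplicate (e.g. from a replay) - skip the side effect
        self._processed.add(event["id"])
        self.sent.append(event["to"])

sender = EmailSender()
for ev in [{"id": "e1", "to": "a@x.com"},
           {"id": "e2", "to": "b@x.com"},
           {"id": "e1", "to": "a@x.com"}]:  # "e1" arrives again via replay
    sender.handle(ev)

print(sender.sent)  # ['a@x.com', 'b@x.com'] - no duplicate email
```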

> How do you handle randomness, like uuid generation?

Are you referring to id generation and tagging in events (i.e. aggregate
ids)? If so, that'd be an application responsibility - you'd have to determine
how to properly attach ids.

Hmm, that does bring up an interesting idea, though - what if we provided a
way to "group" events and generate aggregate ids on your behalf? Maybe that's
what you meant.

We currently don't do anything "extra" in regards to grouping events - we tag
each individual event but that's about it.

------
james_s_tayler
I feel like a tagline like "event sourcing made easy" would hook me more and
get me interested in _attempting_ to decipher your marketing page to
understand the USP.

Pretty cool idea though. Hope it pans out for you guys.

~~~
dsies
Understood. Yeah, the issue here is that the entire space is pretty complex -
it feels like no matter which angle you approach it from, it'll still be
complex.

Will try to figure out a way to better communicate what we do.

~~~
mark-ruwt
As you're brainstorming, try to imagine what you would shout in a loud bar
(remember those?) if someone asked you what your company did. It can be a
helpful mental tool to strip things down to the essentials.

Two-sentence pitches are much harder than two-minute pitches.

------
kbyatnal
One of my previous companies used Kafka and hacked something similar together
with an internal Retool dashboard + DynamoDB. This definitely makes a lot of
sense!

Will this work with Celery (python) configured with RabbitMQ as the broker?

~~~
newtoyou
It will work on any Rabbit queue as long as you are not using the default
exchange for the queue.

------
shay_ker
Hi Dan/Ustin,

Congrats on the launch. The pain-point makes sense to me. I'm just curious -
what's the big picture for you all? I imagine it must be larger than just
replay.

~~~
uzarubin
Batch is betting that more companies are going to utilize event sourcing in
order to scale. We want to be a foundational piece of their data
infrastructure and support their transition into event sourcing by initially
offering replays. We want to be a "one-stop shop" for all event sourcing
needs.

~~~
shay_ker
Cool! I don't have much data on how many companies are using events for key
workflows, but I do know that many, many companies would _love_ to replay HTTP
requests!

~~~
uzarubin
That's awesome! We support HTTP and gRPC collection as well. Let us know what
you have in mind.

------
ZephyrBlu
It seems like you're solving quite a complex problem!

I'm curious how long it took you to build this initial product given the
complexity.

YC has a bias for shipping quickly, but my gut instinct is that it would have
taken you a while to build this initial version.

Did it only take a few months, or closer to 8-12+?

~~~
dsies
Ha, great observation!

Yes and no :)

We've been exceptionally lucky to have several of our close friends help us
out with building an MVP (it also helps that our friends have serious
experience!). There are six of us in total - three people focusing on infra,
frontend and Java connector bits, which allowed me, Ustin and another dev to
put 100% of our attention on backend services + architecture.

That enabled us to knock this out in a few months. Without the assist, it
would probably be closer to your estimate.

Something that may be of interest to some folks: we saved a _significant_
amount of time by not having to run our own k8s - we use EKS, it's very nice.
Also, MSK - not having to run/manage ZK clusters and Kafka nodes is a (costly)
privilege, haha.

~~~
ZephyrBlu
Nice, I'm jealous of the extra help you're getting haha (I'm trying to build
something solo). It sounds like you're already making a lot of progress.

Good luck building out the rest of the product!

------
pepelotas
I've solved the replaying bit before with a brute approach and AWS Athena. It
would ingest all the events from S3, filter the unwanted ones out, and put the
rest in SQS ready for consumption. It was definitely expensive though, not
something you would run often.

~~~
uzarubin
This is definitely a valid approach, but it is an expensive path - Athena
charges a hefty amount per query if you have a large dataset. In addition,
this approach won't work if your data is serialized with something like
protobuf.

------
benoittwake
Very interesting tool. You're absolutely right that I wouldn't spend my
innovation tokens on it. Congratulations on your work!

------
Monotonic
Does this have support for Rabbit pub/sub? There's a bit of confusing wording
on the page that makes it unclear.

~~~
dsies
100% - we use Rabbit internally for our own systems so it has first-class
support.

I think maybe we should just list out the messaging systems we support on the
front page, so you don't have to dig through stuff... Good point. Let me know
if you've got any other suggestions.

------
ponker
I hear so much about Kafka, could someone give the two-sentence description of
what it is and who uses it and for what?

~~~
uzarubin
From their website: Kafka is an open-source distributed event streaming
platform.

There are many use cases, from website activity tracking and metrics to log
aggregation and stream processing. For us, it's a communication layer utilized
by our microservices. An event goes into the stream and any service that cares
about that data will consume it. In other words, it's like an ultra-resilient,
scalable Redis pub/sub with history that runs on the JVM. You can read more
about the use cases here:
[https://kafka.apache.org/uses](https://kafka.apache.org/uses)
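If "pub/sub with history" sounds abstract, here's a toy model (plain Python, nothing Kafka-specific) of the one property that matters: consumers track their own offset into an append-only log, so reading never removes anything:

```python
# Toy model of a Kafka-style log: an append-only list plus per-consumer
# offsets. Unlike a classic queue, consuming doesn't delete messages.

log = []                                  # the "topic"
offsets = {"billing": 0, "shipping": 0}   # each consumer group's position

def publish(msg):
    log.append(msg)

def consume(group):
    """Return all messages this group hasn't seen yet and advance it."""
    start = offsets[group]
    offsets[group] = len(log)
    return log[start:]

publish("user_signed_up")
publish("order_placed")

print(consume("billing"))   # ['user_signed_up', 'order_placed']
publish("order_shipped")
print(consume("billing"))   # ['order_shipped']
print(consume("shipping"))  # all three - it started from offset 0
```

Replay, in Kafka terms, is just "move a group's offset backwards" - which is exactly the knob that's awkward to operate in practice.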

edit: Sidenote, Kafka is often waaaaaay overkill - if you need messaging, use
something simpler like Rabbit or NATS or Redis and only use Kafka if you know
why you need it.

~~~
ponker
Thanks. So an event that goes in would be something like, “user logged in,”
and services that care about that data would be...? Sorry still having some
trouble understanding it.

~~~
newtoyou
Pretty much. A good example might be an online store. Let's say one of your
internal services deals with notifying UPS, FedEx or DHL to pick up a package
from your warehouse and deliver it to a customer. You could use something like
Kafka to store messages about deliveries, which your internal service picks up
and processes before notifying the delivery company's API.

Something like Batch could be helpful in this situation. For example, let's
say a dev makes a deploy that breaks only the FedEx delivery notification, or
the FedEx API breaks in a way you were not expecting. Once the issue is fixed
on the dev side or the FedEx side, you could use Batch to search for all FedEx
deliveries that were handled improperly during the time frame of the issue.
This way you are not randomly resending messages to all your delivery
companies for an issue that was only related to one vendor.
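That "search for all FedEx deliveries during the bad window" step is essentially a filtered selection over the stored events - something like this sketch (field names are made up for illustration):

```python
# Select only the events worth replaying: one vendor, one time window.

def select_for_replay(events, vendor, start_ts, end_ts):
    return [ev for ev in events
            if ev["vendor"] == vendor and start_ts <= ev["ts"] <= end_ts]

events = [
    {"id": 1, "vendor": "fedex", "ts": 90},   # before the incident
    {"id": 2, "vendor": "fedex", "ts": 120},  # handled improperly
    {"id": 3, "vendor": "ups",   "ts": 130},  # unaffected vendor
    {"id": 4, "vendor": "fedex", "ts": 140},  # handled improperly
]

to_replay = select_for_replay(events, "fedex", 100, 150)
print([ev["id"] for ev in to_replay])  # [2, 4]
```

Only those selected events get republished - UPS and DHL never see the retry.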

~~~
ponker
makes sense, thanks. How would this be better than the logs you'd get from
each service?

------
randtrain34
Is Pulsar support on the roadmap?

~~~
uzarubin
We are planning to support as many messaging systems as we can. We will
definitely investigate Pulsar. Going to add it to our feature list and make an
issue on plumber to support introspection on Pulsar. Cheers!

~~~
zok3102
Good to know that Pulsar is on your roadmap. Also, kudos for building
user-land tooling around a common pain point for teams doing any event
processing at scale.

~~~
uzarubin
Thank you! We felt the pain point while actively trying to build observability
tools in order to debug our messaging systems. We built plumber to standardize
some of our internal tools and then decided to open source it to help others
who are feeling the pain.

------
rswail
Nice concept and interesting, expect a demo request incoming :)

------
Nikhil833032
Congratulations on this release... That is really useful!

~~~
dsies
Thank you very much!

------
LukeEF
'Light beers over IPA' sorry, I'm out.

;)

------
pdubs1
Has anyone ever told you that you're "batch it crazy"?

~~~
dsies
No, but this is now definitely going on a sticker :D

