Hacker News

Kafka wasn't a problem at all. Actually, I was shocked how well Kafka just worked. It did everything I expected it to do, exactly as advertised in the documentation. Well, except for how annoying it was to get it installed and working on Ubuntu. I will definitely be using Kafka in future projects.

I think all my technology choices worked out fine. I dumped server-sent events halfway through in favour of websockets because websockets support binary frames. But that was a pretty easy change, affecting at most 50 lines of code.

I still wish we had an efficient (native) solution for broadcasting an event sourcing log out to 100k+ browser clients, with catchup ('subscribe from kafka offset X'). Node.js is handling the load better than I expected it to, but a native code solution would be screaming fast. It should be relatively simple to implement, too. Just, unless my google-fu is failing me, I don't think anyone has done it yet.




"an efficient (native) solution for broadcasting an event sourcing log out to 100k+ browser clients, with catchup"

Seems to me like you just described long-polling, which you dismissed in the article as "so 2005".


So for context I wrote a websocket-like TCP implementation on top of long polling a few years ago[1], before websockets were well supported in browsers. I'm quite aware of what long polling can and cannot do.

Yes, I did dismiss it out of hand in the article. The longer response is this:

In this instance long polling would require every request to be terminated at the origin server; I need to terminate there because every connection will start at a different version. The origin server in this case is running JS, and I don't want to send 100k messages from javascript every second. Performance is good enough, but barely. And with that many objects floating around, the garbage collector starts causing mischief.

The logic for that endpoint is really simple - it just subscribes to a kafka topic from a client-requested offset and sends all messages to the client. It would be easy to write in native code, and it would perform great. After the per-client prefix, each message is just broadcast to all clients, so you could probably implement it using some really efficient zero-copy broadcast code.
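To make the shape of that endpoint concrete, here's a minimal sketch of the fan-out logic in plain Node. An in-memory array stands in for the Kafka topic (no real consumer), and `client.send` is assumed to be a ws-style send - this is just an illustration of the subscribe-from-offset-then-broadcast pattern, not the actual implementation:

```javascript
// In-memory stand-in for a Kafka topic: messages indexed by offset.
const log = [];
const clients = new Set(); // each client: { send(message) { ... } }

// A new client subscribes from offset X: replay the backlog
// (the per-client prefix), then join the broadcast set.
function subscribe(client, fromOffset) {
  for (let i = fromOffset; i < log.length; i++) client.send(log[i]);
  clients.add(client);
}

// A new message arrives from the topic: append it, then fan it out.
// A native implementation could encode the outgoing frame once and
// write the same buffer to every socket - the zero-copy broadcast.
function publish(message) {
  log.push(message);
  for (const client of clients) client.send(message);
}
```

After catchup, every connected client receives identical bytes, which is what makes the broadcast path so amenable to a shared, encode-once buffer.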

The other approach is to bunch edits behind constantly-created URLs, and use long-hanging GETs to fetch them. I mentioned that in the blog post, but it's not long-polling - there's no poll. It's just old-school long-hanging GETs. I think that would work, but it requires an HTTP request/response for each client, for each update. A native solution using websockets would be better. (From memory WS have a per-frame overhead of only about 4 bytes.)
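That overhead figure is easy to check against RFC 6455: server-to-client frames are unmasked, so the header is 2, 4, or 10 bytes depending on payload size - 4 bytes for anything between 126 bytes and 64 KB. A tiny helper (hypothetical, just for the arithmetic):

```javascript
// Header size of an unmasked (server-to-client) WebSocket data frame,
// per RFC 6455 section 5.2: 1 byte FIN/opcode + 1 byte length, plus an
// extended length field when the payload doesn't fit in 7 bits.
function wsServerFrameOverhead(payloadLength) {
  if (payloadLength <= 125) return 2;       // length fits in the 7-bit field
  if (payloadLength <= 65535) return 2 + 2; // 16-bit extended length
  return 2 + 8;                             // 64-bit extended length
}
```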

[1] https://github.com/josephg/node-browserchannel


Btw, there's a Docker-based Kafka image that's great for spinning up Kafka and testing quickly.

Nice work on all this!


I always had crazy problems setting up Kafka until I discovered https://github.com/Landoop/fast-data-dev



