
Why we wrote our Kafka Client in Pony - hackmanytrades
https://blog.wallaroolabs.com/2018/01/why-we-wrote-our-kafka-client-in-pony/
======
krylon
Man, I really hope Pony grows a decent library ecosystem _fast_! The language
itself is sooo nice, the type system is pure bliss (at least this side of
Haskell), and I even managed to get beyond the first peak of "I have no clue
what is going on here" to using the type system sensibly.

I do not care about scalability so much (I do not complain, either), but the
language and especially the type system were an eye-opener for me. Having a
guarantee - proof! - that entire classes of bugs that haunt software written
in C, C++, Java will be impossible to smuggle past the compiler sounds like a
realistic approximation to the four-dimensional compiler I once envisioned,
that would retroactively turn any and all runtime errors into compile time
errors (without creating a paradox, of course!).

~~~
reilly3000
4D compiler? That sounds amazing. I imagine that could be extended with a
Kafka stream of errors from staging and production. I guess that would require
a ‘system aware’ compiler.

------
royjacobs
What was the reason for not contributing a "reactive" API to the existing C
client?

Especially because now you have a maintenance burden on your own client which,
ironically, performs quite a lot slower than the C client although that was
the major concern for not using it in the first place!

~~~
hackmanytrades
That's a great question.

Yes, Pony Kafka is currently slower than the C client. But it is also almost
completely untuned as of right now. We expect there is a lot of low-hanging
fruit on that front that will give us significant gains.

There is also the secondary concern regarding the thread pools internal to
Pony and librdkafka. We've seen firsthand how CPU cache invalidation can
impact performance so we are very aware of the potential negatives if the Pony
and librdkafka threads ever end up fighting with each other over the same CPU
resources.
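
To make the contention concern concrete, here is a minimal Python sketch. The pool sizes are hypothetical: it simply assumes each runtime independently starts one thread per core, which is an illustration and not a statement about how Pony or librdkafka actually size their pools.

```python
import os


def runnable_threads_per_core(cores, pool_sizes):
    """Ratio of threads that may want a CPU to the cores available."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return sum(pool_sizes) / cores


cores = os.cpu_count() or 1

# Hypothetical sizing: each runtime starts one thread per core on its own,
# unaware of the other pool.
pony_pool = cores
librdkafka_pool = cores

ratio = runnable_threads_per_core(cores, [pony_pool, librdkafka_pool])
print(f"{pony_pool + librdkafka_pool} threads on {cores} cores "
      f"-> {ratio:.1f} threads per core")
```

Anything above 1.0 means the OS scheduler has to time-slice threads on the same core, which is where the cache invalidation mentioned above would come from.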

~~~
royjacobs
Sorry, that was my question-- was there a way to avoid using librdkafka's
threadpool and essentially use it as a 'dumb' client, moving all the async
stuff to your Pony actor layer?

~~~
hackmanytrades
From what I understand of librdkafka, there's no easy way to disable the
internal thread pool it uses.

I'd imagine that the internal thread pool for sending/receiving data from
Kafka is as core to librdkafka as the internal thread pool for running actors
is to Pony and trying to remove or disable either of them would be a large
undertaking.

------
manigandham
> Pony Kafka sends data to Kafka about 5% - 10% slower than librdkafka but
> reads data from Kafka about 75% slower than librdkafka.

What? So what's the point? Wouldn't it be better to just contribute and
optimize the existing clients then?

The Kafka/Confluent team specifically chose to implement everything in
librdkafka (because Kafka is heavy on client-side logic) and then make thin
wrappers for every language so performance, bugs and stability can all be
worked on in a single place.
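
For scale, "N% slower" can be read two ways, and the quote does not say which one is meant. A small Python sketch of both readings, using a made-up baseline of 100 units/s for librdkafka (the baseline and function names are purely illustrative):

```python
def throughput_if_rate_drops(baseline, pct_slower):
    """Reading 1: 'N% slower' means throughput is N% lower."""
    return baseline * (1 - pct_slower / 100)


def throughput_if_time_grows(baseline, pct_slower):
    """Reading 2: 'N% slower' means the same work takes N% longer."""
    return baseline / (1 + pct_slower / 100)


BASELINE = 100.0  # hypothetical librdkafka throughput, arbitrary units

for label, pct in [("send, best case", 5), ("send, worst case", 10), ("read", 75)]:
    rate = throughput_if_rate_drops(BASELINE, pct)
    time = throughput_if_time_grows(BASELINE, pct)
    print(f"{label:17s} reading 1: {rate:5.1f}   reading 2: {time:5.1f}")
```

Under either reading the send path stays close to the baseline, while the read path lands somewhere between a quarter and a bit over half of it.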

~~~
dmix
That was for what I'd imagine is v0.1 quality code; they said there were
plenty of optimizations available to bring it at least to parity with the C
client. Secondly, the issue was the system performance between Wallaroo and
the Kafka client, not necessarily the raw performance of the library.

~~~
hackmanytrades
Yes, that's correct on both counts.

Pony Kafka is almost completely untuned as of right now. We expect there is a
lot of low-hanging fruit on that front that will give us significant gains.

And yes, we're concerned about the potential thread pool contention between
Pony and librdkafka.

------
littlestymaar
Pony is such a cool project; it's what Go could have been if it weren't stuck
in the middle of the eighties: it has a super expressive type system and, a
bit like Rust, it gives you data-race freedom.

I really hope it gains traction.

~~~
Avshalom
(This has nothing to do with Wallaroo or Kafka, but:) Go isn't stuck in the
80s; it's just stuck on Rob Pike's belief that him personally being lazy and
jury-rigging in general is "simple" and a virtue.

~~~
acdha
I don’t think that making criticisms personal like that accomplishes anything
useful. The fact that a large number of other people agree with his decisions
suggests that it’s not “being lazy” and more balancing different goals. It
would be far more productive to understand those pressures rather than simply
assuming the worst.

~~~
Avshalom
Any criticism of Pike himself, or of any project he has been attached to over
the last 40 years, has accomplished nothing, no matter whether it was
technical or personal.

------
hackmanytrades
Hi, I'm the author of the blog post and the primary author of Pony Kafka. I'll
be around to answer questions.

~~~
cs702
_"Why we wrote our Kafka Client in Pony (wallaroolabs.com)"_

I love the fact that the words "Kafka," "Pony," and "Wallaroo" are all
together in one sentence, and not as part of an elaborate joke. I love even
more the fact that Serious Business Executives will have to use these words to
discuss useful technology. Awesome names.

~~~
hackmanytrades
Just one of the many perks of working here at Wallaroo Labs. 8*P

------
lmm
Kafka is written in Scala, though it exposes a Java interface for ease of use.

~~~
hackmanytrades
Thanks! Blog post corrected.

------
princess-aslaug
Ehm, pick a niche programming language, build your client for a complex
system. All cool, but it's not the straight line to solve problems in a
company in the simplest and most effective way. Good thing if you have some
spare R&D time maybe but hardly a sane approach for most.

~~~
fnord77
The investors must be thrilled.

------
mac01021
As someone who has worked on building a custom consumer API for Kafka (in the
distant past, before the 0.9 Java consumer came out), I am curious about how
much engineering effort this required.

I can see from GitHub that the project is about 16k lines of code. I wonder
how many developers worked on it, how much of each of their time it required,
how many false starts in the architecture of the library had to be
abandoned...

~~~
hackmanytrades
99% of Pony Kafka is written by me (for better or worse). I've been working on
it since about May 2017 off and on with it being my primary focus for the
majority of that time. However, due to working arrangements and other
commitments, I've only spent about 12 or so weeks of time on it (where 1 week
is equal to 5 days and 1 day is equal to 8 hours).
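
Spelled out as arithmetic, using only the figures in the paragraph above (the variable names are just for illustration, and "about 12 or so weeks" is taken as exactly 12):

```python
WEEKS = 12          # "about 12 or so weeks"
DAYS_PER_WEEK = 5   # "1 week is equal to 5 days"
HOURS_PER_DAY = 8   # "1 day is equal to 8 hours"

total_days = WEEKS * DAYS_PER_WEEK
total_hours = total_days * HOURS_PER_DAY

print(f"{total_days} working days, roughly {total_hours} hours")
```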

There have been a few iterations on the abstractions and API of the library
but the majority of the architecture has been the same from the initial design
sketch. I started by envisioning the features the library needed for end users
and also internally in order to fully take advantage of Pony's actor
concurrency model. From there I worked out the data and functionality
ownership of the various bits (i.e. which actor does what and why). Lastly, I
ran it by a couple of folks here at Wallaroo Labs to make sure I wasn't making
any obvious mistakes.

The biggest change so far has been caused by building in the leader failover
handling in relation to the data/responsibility ownership transferring from
one actor to another. That's not entirely completed yet but it has mostly been
an internal change. The end user API has also changed, but that has mostly
been about fixing abstractions and/or data ownership issues.

I'm sure there will be additional changes as I have time to go through to fix
abstractions and add in other features. Dynamic configuration changes, exactly
once semantics, and group consumer functionality are all likely to impact the
end user API along with requiring internal changes.

------
ramchip
Out of curiosity, what pushed your team towards Pony rather than Erlang? It
seems your team (or at least part of it) has experience with both languages
which is interesting.

~~~
spooneybarger
Short answer is in another thread here.

After discussing our performance goals with folks we knew at Basho, they
expressed a lot of skepticism that we could meet them with Erlang.

How to plug in potentially heavy user computations in non-BEAM-based languages
is an incredibly tricky problem as well.

If you'd be interested in chatting more, I'm happy to do that. See:
[https://news.ycombinator.com/item?id=16266220](https://news.ycombinator.com/item?id=16266220)
for more information.

------
rurban
Dipon, why not rewrite Kafka itself in Pony? The biggest problem is the
synchronous API, and as a Pony service it would be much better, being
controlled by async actors, without polling. Scala is very similar to Pony.

------
lobster_johnson
I'm not overly familiar with Pony, but I'm curious, and the code looks nice
and clean. One oddity though; so many of the identifiers have "Kafka" in them.
Does Pony not have module namespacing?

~~~
spooneybarger
It does.

But the point you raise...

"Should this be HTTPLogger or Logger given that it is in the HTTP package"

and variations thereof is something that has been a point of contention at
almost every job I've been at.

By default with Pony, if you use a package, you'll have the classes imported
directly into your namespace, so...

HTTPLogger is more clear in that case, but you could use a qualified import
and then have something like http.Logger.

It's a matter of preference.
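
For anyone who hasn't used both styles, here is a rough illustration in Python rather than Pony, so it is only an analogy and not Pony syntax. The standard library's `http.client` module happens to show the same trade-off: its class carries an `HTTP` prefix, so it reads fine unqualified and a bit redundantly when qualified. The alias `http_client` is just a name picked for this sketch.

```python
# Qualified import: every name stays behind a prefix chosen at the import site.
import http.client as http_client

# Unqualified import: the name lands directly in this namespace, so the
# "HTTP" prefix baked into the class name is what keeps it unambiguous.
from http.client import HTTPConnection

# Both spellings refer to the very same class object.
same_class = http_client.HTTPConnection is HTTPConnection

print("qualified  :", "http_client." + http_client.HTTPConnection.__name__)
print("unqualified:", HTTPConnection.__name__)
print("same class :", same_class)
```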

~~~
lobster_johnson
I understand, but the sheer amount of duplication is rather overwhelming.
Also, a lot of it seems like implementation details related to the
API/protocol and so on that don't need this kind of naming uniqueness.

Go solves this by never dumping namespaces into another namespace: You have
http.Request, and that's it, which is both unambiguous and self-explanatory.
Name clashes can occur (e.g. packages have the same name, or a local variable
has the same name), but that's rare.

------
StreamBright
I really like "why we wrote X in Y" sort of blog posts. The answer is almost
always this: because we could, or because it works for our use case.

~~~
sidlls
It's actually almost always a post-hoc justification for scratching an itch.

~~~
StreamBright
Exactly. Pony looks interesting though. I wonder how it compares to Erlang.
Seems like they are trying to address the same problem (actor model with
memory safety) with different approaches.

------
DerBesserWisser
TL;DR

Because it's more fun to learn a new programming language than deliver
features to users. We're VC financed so no need to get money, also spend it
before people talk about profitability, then the good times are over (see
Etsy). We also can add Pony to our CV and move on in one year to the next
company where we will introduce the next big thing to add to our CVs. Plus 10%
more salary! Kaching! #LivingTheLife

~~~
klibertp
You jealous?

Also, picking the right tool for the job can be a real advantage, which makes
it easier to deliver features.

Getting so close to C implementation (in terms of speed) with Pony is actually
insane if you look at the number of guarantees Pony gives you. Next time you
dereference a NULL pointer please remember that it's impossible in Pony. Oh,
and next time you spend a week debugging some hairy locking issue, consider
that issue wouldn't happen in Pony at all. EDIT3: removed EDIT1 from here.

Currently, Pony is in direct competition with Go (but uses the other
concurrency model) and Erlang/Elixir (but is natively compiled). People and
companies frequently choose Go or Erlang, so I don't really understand why
they shouldn't choose Pony if their use-case fits.

EDIT2: And here I am getting downvoted... I wonder, is anything I wrote not
true?

~~~
jetti
> so I don't really understand why they shouldn't choose Pony if their
> use-case fits.

The difference between Pony and Go or Erlang (or even Elixir) is that the Pony
team is still making breaking changes to the language. That means that your
dev team may need to spend time to update features due to breaking changes in
the language. Also, the ecosystem isn't there like it is for Go or Erlang.

~~~
klibertp
Yeah, but both Go and Elixir (Erlang less so - commercial and internal PLs
work a bit differently) were in the same situation at some point: very small
ecosystem, small community, lots of changes to the language. Adopting a
language at this stage of evolution has _a set of very well-known risks_, but
it has to be done by someone for the language to ever reach maturity. Trying
to use it seriously is one of the best ways to contribute to the language.

In any case, if you are aware of the risks and plan to mitigate them - by, for
example, employing people capable of debugging and fixing the language's
implementation - you're left with some risk and a lot of advantage (if you're
lucky and your domain is indeed the one your language is best suited for).
It's a gamble, of course, but then nearly every decision (other than buying
IBM) is one.

