
Apache Pulsar is an open-source distributed pub-sub messaging system - LinuxBender
https://pulsar.apache.org/
======
addisonj
I just finished rolling out Pulsar to 8 AWS regions with geo-replication.
Message rates are currently at about 50k msgs/sec, but we're still in the
process of migrating many more applications. We run on top of Kubernetes (EKS).

It took about 5 months for our implementation, with a chunk of that work spent
figuring out how to integrate our internal auth, as well as using HashiCorp
Vault as a clean, automated way to get auth tokens for an AWS IAM role.

Overall, we are very pleased and the rest of the engineering org is very
excited about it and planning to migrate most of our SQS and Kinesis apps.

Ask me anything in the thread and I'll try to answer questions. At some point
we will do a blog post on our experience.

~~~
bubbleRefuge
Isn't using Kubernetes kind of an anti-pattern due to failover and rebalancing
logic clashing? If Kubernetes is killing and re-starting nodes and the
cluster's brokers are detecting dead brokers and rebalancing partitions as a
result, it seems counterproductive.

~~~
addisonj
This is one of the main benefits of Pulsar: because state is split between
brokers and BookKeeper, and BookKeeper doesn't need rebalancing (due to its
segment-based architecture, where you choose new bookies with each new
segment), we really don't have to worry about rebalancing storage (in general,
not just in the case of failover). It is true that topics map to a single
broker, but generally Pulsar has _really_ good limits on memory, so we don't
see nodes getting killed by limits, and we only really see rescheduling for
real issues.

While there certainly are some aspects you need to be aware of, generally,
Pulsar is much more "cloud native" and maps quite well to k8s primitives.

------
mattboyle
We tried to adopt this but found the documentation very lacking, and there was
a severe lack of quality client libraries for our language of choice (Go). The
"official" one had race conditions in the code, as well as "TODO"s for key
pieces littered throughout. There is another from Comcast, which is abandoned.
We had a serious discussion about picking up ownership of the library or
writing our own, but as a small startup we didn't feel we could do it and still
develop the product. I'll continue to keep an eye on Pulsar, but for now Kafka
is the clear go-to, IMO. It's well documented, has great SaaS offerings
(Confluent), and there are tons of books and training courses for it.

~~~
cbartholomew
We provide a SaaS offering of Apache Pulsar in AWS, Azure, and GCP:
[https://kafkaesque.io/](https://kafkaesque.io/)

~~~
jjeaff
Cool name. That's one of those company names where it seems like someone
thought of the name first and found it so fitting that they decided to build a
company around it.

~~~
cbartholomew
Thanks!

------
bsaul
Sidenote question:

Are we heading toward a split between Apache/Java/ZooKeeper stacks on one side
and Go/etcd on the other? I've seen an issue related to that question on
Pulsar, and it got me investigating the distributed KV part of the stack.

Looking at some benchmarks, it seems etcd is much more performant than
ZooKeeper, and to some people, operating two stacks seems like too high an
operational maintenance cost. Is that a valid concern?

Also, I've seen that Kafka is working on removing its dependency on ZooKeeper;
is Pulsar going to take the same road?

~~~
nvarsj
It's an interesting observation.

I think that the modern approach to distributed systems is moving towards
Go-style microservices and lightweight, simple system design with RPC
communication, reconcile-type loops for state reconciliation, and backing CP
databases. I think this is the influence of k8s (and maybe Google's approach
to distributed systems).

I will almost certainly get downvoted for this (as I always seem to when I
criticize the JVM), but Apache/JVM-style architecture feels REALLY long in the
tooth to me. I think you are committing to an outdated and very expensive
approach to building software if you use anything running on the JVM,
especially anything Apache-based. Cassandra is a great example of this: out
of the box it's a terribly performing database that is extremely expensive to
run and tune. Throw enough resources and time at it and you can get it to
acceptable scalability, but running on the JVM, which is a huge memory hog,
will always make it expensive to run (and even then, you will always get
terrible latency distributions from the JVM's awful GC).

If I were building a business, I would run far, far away from any JVM-based
solution. The only thing it has going for it is momentum. If you need to hire
hundreds of engineers off the street for a large project, then a JVM-based
stack is about your only option, unfortunately.

~~~
james-mcelwain
> the JVM's awful GC

This just makes it seem like you are trolling. JVM devs have done more to
advance the state of the art in this area than any other language community.
The problem is that most JVM apps just produce too much garbage, not
necessarily that the algorithm itself is awful.

Either way, there's no such thing as an optimal GC algorithm, just different
trade-offs depending on your use case. Not everyone cares about latency.

~~~
geodel
> The problem is that most JVM apps just produce too much garbage,

This is a strange way of saying that Java the language, and most frameworks
around it, force apps to generate this much garbage.

~~~
james-mcelwain
No, it's a way of saying that Java != the JVM. The JVM's GC is not "awful"; it
only looks that way precisely because so many Java frameworks _are_ awful.

------
alfalfasprout
I keep seeing new message queue solutions pop up over the years, and my
impression, at least, is that this is one area where Silicon Valley really is
way behind the trading industry.

Reliable pub/sub that supports message rates over 100k/sec (even up to the
millions) has been available for a while now, and with a great deal of
efficiency (e.g., the Aeron project). The incredible amount of effort to
support complex partitioning, extreme fault tolerance (instead of more clever
recovery logic), etc., adds a lot of overhead, to the point of calling
overhead on the order of 5 ms "low latency", instead of the microseconds or
even nanoseconds expected in trading.

Worse, many startups try to adopt these technologies when their message rates
are minuscule. To give you some context, even two beefy machines with an older
messaging solution like ZeroMQ can sustain throughput in excess of what most
companies produce.

This is not to discredit the authors of Pulsar or Kafka at all... but it's
just a concerning trend that easy-to-use, horizontally scalable message queues
are being deployed everywhere, similar to how everyone was running Hadoop a
few years back even when the data fit in memory.

~~~
tylertreat
ZeroMQ is not a message queue, it's a networking library.

~~~
injinj
I'll bet he is aware. The problem with the trading industry is that it has
hundreds of users with bespoke solutions catering to extreme performance
criteria, rather than hundreds of thousands like NATS. They will keep
reinventing the wheel every time a nanosecond can be saved by the latest
hardware stack.

------
zackmorris
This looks promising. Is there such a thing as a generalized SQL query engine
that runs over any key-value store that provides certain minimal core
operations?

For example, say you have a KV Store with basic mathematical Set operations
like GET, SET, UNION, INTERSECT, EXCEPT, etc. The Engine would parse the SQL
and then call the low-level KV Store Set operations, returning the result or
updating KV pairs. This explains how Join relates to Set operations:

[https://blog.jooq.org/2015/10/06/you-probably-dont-use-
sql-i...](https://blog.jooq.org/2015/10/06/you-probably-dont-use-sql-
intersect-or-except-often-enough/)
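As a rough illustration of what such a layering could look like, here is a minimal Python sketch; the `SetKV` store and its operation names are hypothetical, not any real Pulsar or Redis API:

```python
# Hypothetical minimal KV store exposing mathematical set operations.
class SetKV:
    def __init__(self):
        self._data = {}

    def set(self, key, values):
        self._data[key] = set(values)

    def get(self, key):
        return self._data.get(key, set())

    def union(self, a, b):
        return self.get(a) | self.get(b)

    def intersect(self, a, b):
        return self.get(a) & self.get(b)

    def except_(self, a, b):
        return self.get(a) - self.get(b)


def run(store, op, left, right):
    """The 'engine': dispatch a (trivially parsed) SQL set operation
    to the store's native low-level set ops."""
    ops = {"UNION": store.union,
           "INTERSECT": store.intersect,
           "EXCEPT": store.except_}
    return ops[op](left, right)
```

A real engine would parse full SQL and translate joins into these primitives, but the division of labor (engine parses, store executes set operations) is the same.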

Another thing I'd like is for KV stores to expose a general-purpose functional
programming language (maybe a Lisp, or a minimal stack-based language like
PostScript) for running the same SQL Set operations without ugly syntax. I
don't know the exact name for this. But if we had that, then we could build
our own distributed databases, similar to Firebase but with a SQL interface as
well, from KV stores like Pulsar. I'm thinking of something similar to
RethinkDB but with a more distinct/open separation of layers.

The hard part would be around transactions and row locking. A slightly related
question is if anyone has ever made a lock-free KV store with Set operations
using something like atomic compare-and-swap (CAS) operations. There might be
a way to leave requests "open" until the CAS has been fully committed. Not
sure if this applies to ledger/log based databases since the transaction might
already be deterministic as long as the servers have exact copies of the same
query log.

Edit: I wrote this thinking of something like Redis, but maybe Pulsar is only
the message component and not a store. So the layering might look like:
[Pulsar][KV Store (like Redis)][minimal Set operations][SQL query engine].

~~~
atombender
One of the challenges with layering SQL on top of a KV store is query
performance.

The most obvious way to model a secondary index on top of a pure KV store is
to map indexed values to keys. For example, given the (rowID, name) tuples
(123, "Bob"), (345, "Jane"), (234, "Zack"), you can store these as keys:

      name:Bob:123
      name:Jane:345
      name:Zack:234

At this point you don't need or even want values, so this is effectively a
sorted set.

Now you can easily find the rowID of Jane by doing a key scan for
"name:Jane:", which should be efficient in a KV store that supports key range
scans. You can do prefix searches this way ("name:Jane" finds all keys for
names starting with "Jane"), as well as ordinal constraints ("age > 32", which
requires that the age index is encoded so that it sorts numerically, something
like):

      age:\x00\x00\x00\x20:123

To perform a union ("name = 'Bob' OR name = 'Jane'"), you simply do multiple
range scans, performing a merge-sort-ish union operation as you go. To perform
an intersection ("name = 'Bob' AND age > 10"), you find the starting point for
all the terms, use that as the key range, and then do the merge sort.
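A toy sketch of this scheme, using a sorted key list with `bisect` standing in for the KV store's range scans (the key layout and fixed-width age encoding here are illustrative assumptions, not any particular store's format):

```python
import bisect

# Sorted keys standing in for the KV store; ages are fixed-width big-endian
# bytes so byte order matches numeric order (\x19 = 25, \x20 = 32).
keys = sorted([
    "name:Bob:123", "name:Jane:345", "name:Zack:234",
    "age:\x00\x00\x00\x20:123",  # Bob, age 32
    "age:\x00\x00\x00\x19:345",  # Jane, age 25
])

def range_scan(prefix):
    """All keys starting with `prefix`, via two binary searches (a key range scan)."""
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_right(keys, prefix + "\xff")
    return keys[lo:hi]

def row_ids(ks):
    """Extract the trailing rowID component from each index key."""
    return {k.rsplit(":", 1)[1] for k in ks}

# name = 'Jane'
jane = row_ids(range_scan("name:Jane:"))
# name = 'Bob' OR name = 'Jane': union of two range scans
either = row_ids(range_scan("name:Bob:")) | row_ids(range_scan("name:Jane:"))
# name = 'Bob' AND age > 30: intersect a name scan with an age-bounded scan
over_30 = row_ids([k for k in range_scan("age:") if k > "age:\x00\x00\x00\x1e"])
bob_over_30 = row_ids(range_scan("name:Bob:")) & over_30
```

Real systems use order-preserving tuple encodings rather than ad hoc string concatenation, but the scan-and-merge shape is the same.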

This is what TiDB and FoundationDB's record layer do; both have a strict
separation between the stateless database layer and the stateful KV layer.

The performance bottleneck will be the network layer. Your range scan
operations will be streaming a lot of data from the KV store to the SQL layer,
and potentially you'll be reading a lot of data that is discarded by higher-
level query layers. This is why TiKV has "coprocessor" logic in the KV store
that knows how to do things like filtering; when TiDB plans your query, it
pushes some query operators down to TiKV itself for performance.

Unfortunately, this is not possible with FoundationDB. This is why
FoundationDB's authors recommend you co-locate FDB with your application on
the same machine. But since FDB key ranges are distributed, there's no way to
actually bring the query code close to the data (as far as I know!).

I'm sure you could do something similar with Redis and Lua scripting, i.e.
building query operators as Lua scripts that work on sorted sets. I wouldn't
trust Redis as a primary data store, but it can be a fast secondary index.

~~~
ryanworl
FoundationDB's client bindings have a locality API which allows you to query
the client's metadata cache of which key ranges are on which storage
processes. This would allow you to build that feature of routing a query to
the data.

~~~
atombender
I didn't know that. Very cool, thanks!

------
samzer
For a moment I thought Bajaj and TVS came together.

~~~
lonesword
Captain here.

TVS and Bajaj are major motorbike manufacturers in India, and TVS had a model
named "Apache" and Bajaj had a model named "Pulsar".

Flies away

~~~
opendomain
Thank you for the explanation.

------
mbostleman
This might be entirely off-topic, but I'm having issues with RabbitMQ whereby
durability suffers because messages are sent to remote hosts, exposing them to
both network and remote-host availability. On a previous platform I used an
MSMQ-based system which didn't have this problem, since it uses a local
store-and-forward service: all sends are to localhost and are not affected by
the network or receiver availability. The MSMQ system was my first and only
experience with messaging up to now, so I was surprised that any system would
not work that way. How is this dealt with in other systems? Is it just a
feature that exists or not, and you decide whether it's important? And, to
shoehorn this into being on-topic, does Pulsar use a local service?

~~~
manigandham
That's an inherent issue with distributed solutions, and it is impossible to
solve completely. The only way to deal with it is with techniques like
acknowledgements, retries, local storage, idempotency, etc. MSMQ handles that
stuff behind the scenes, but the problem itself will always exist if there's a
network boundary.

These other systems are designed to be remote, with a network interface. You
can use the client drivers to handle acknowledgements/retries/local buffering
in your own app, or use something like Logstash [1], Fluentd [2], or Vector
[3] for message forwarding if you want a local agent to send to. You might
have to wire up several connectors, since none of them forward directly to
Pulsar today.

Also RabbitMQ is absolute crap. There are better options for every scenario so
I advise using something else like Redis, NATS, Kafka, or Pulsar.

1\.
[https://www.elastic.co/products/logstash](https://www.elastic.co/products/logstash)

2\. [https://www.fluentd.org/](https://www.fluentd.org/)

3\. [https://vector.dev/](https://vector.dev/)

~~~
GordonS
> Also RabbitMQ is absolute crap. There are better options for every scenario
> so I advise using something else like Redis, NATS, Kafka, or Pulsar

Yikes, that's a bit harsh! I've been using RabbitMQ on multiple projects for
several years, and I think it's a great pub/sub system. It also has a large
userbase and community around it, as well as lots of plugins available.

I've never heard of using Redis for pub/sub - were you suggesting rolling your
own on top of Redis? I'm not familiar with NATS (but it's been mentioned
several times here, so I will definitely learn more!), but Kafka is a
streaming log/event system, not a pub/sub system like RabbitMQ (different use
cases).

I will say though that if something is wrong in your RabbitMQ config, the
stack traces that Erlang produces when it crashes are a _nightmare_ to
decipher!

~~~
manigandham
Just because it's widely used doesn't mean it's good (see PHP). It's slow,
with single-threaded topics that usually max out around 25k msgs/sec; fragile,
with dropped connections, stalled queues, and corrupted data; has terrible
clustering that breaks often and doesn't support sharding; has a silly HiPE
mode where you can choose to compile the Erlang code for more performance,
which turns startup time into minutes; etc.

It's poor architecture and implementation. One of the worst products built
with Erlang. It's hard to work on (for new contributors) while not really
using any of the natural advantages of the language.

Redis supports pub/sub channels. It's very fast and if you already use it then
it saves running another system.

Streaming log vs pub/sub is a fuzzy distinction, and at this point there's
basically no difference. Publishers send to a topic and consumers listen. You
can do ephemeral pub/sub in Kafka by reducing retention to seconds, using a
single partition per pub/sub topic, and having consumers listen from latest
without a consumer group. Or use Pulsar, which does both much more naturally.

Also, I didn't mention it, but there are great commercial messaging products
like AMPS [1] or Solace [2] if you need much more advanced features and
support.

1\. [https://www.crankuptheamps.com/](https://www.crankuptheamps.com/)

2\. [https://solace.com/](https://solace.com/)

~~~
GordonS
I didn't know about Redis pub/sub, I'll check it out! At a quick glance, I'm
not sure it supports message durability (persisting messages to storage)?

IME, _any_ Erlang system is difficult for noobs to contribute to; Erlang's
syntax alone is... frightening :)

I really haven't hit any of the issues you mentioned with rabbit. Several
clusters in production for years have been rock solid, connections are stable
(and when they do drop, the client libraries, of which there are a multitude,
can handle reconnection), throughput is excellent, and I've never once seen
data corruption (and combined we must have processed many billions of
messages).

The whole HiPE debacle I can at least agree on, though! (I seem to recall it's
deprecated now, but I might have dreamed that.)

~~~
manigandham
Redis pub/sub is ephemeral. Send to a topic and any active listeners will
receive.

If you need persistence, then Redis v5 has a new data type called Streams,
which is similar to Kafka/Pulsar. Push messages to a stream and read with
multiple consumers (using consumer groups) that can ack each message to track
read state.
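The flow described (push to a stream, read via consumer groups, ack to advance read state) can be modeled with a toy in-memory stand-in; this illustrates the semantics only and is not the Redis client API:

```python
class MiniStream:
    """Toy in-memory model of Redis Streams semantics (XADD/XREADGROUP/XACK)."""

    def __init__(self):
        self.entries = []   # (entry_id, payload); ids are monotonically increasing
        self.groups = {}    # group name -> {"next": read cursor, "pending": unacked ids}

    def xadd(self, payload):
        entry_id = len(self.entries)
        self.entries.append((entry_id, payload))
        return entry_id

    def xgroup_create(self, group):
        # Each group tracks its own read position independently.
        self.groups[group] = {"next": 0, "pending": set()}

    def xreadgroup(self, group, count=1):
        g = self.groups[group]
        batch = self.entries[g["next"]:g["next"] + count]
        g["next"] += len(batch)
        g["pending"].update(i for i, _ in batch)  # delivered but not yet acked
        return batch

    def xack(self, group, entry_id):
        # Acking removes the entry from the group's pending set.
        self.groups[group]["pending"].discard(entry_id)
```

The key difference from plain pub/sub is visible here: messages persist in `entries`, and each group's cursor plus pending set tracks read state per consumer group.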

------
adkafka
I really enjoyed the talk given by Quentin Adam & Steven Le Roux at Devoxx
2019. It gave a great overview of Apache Pulsar. Hope someone else can find it
useful too!

[0]
[https://www.youtube.com/watch?v=De6avNyQUMw](https://www.youtube.com/watch?v=De6avNyQUMw)

------
firstposterone
Splunk just acquired Streamlio, and most of the core devs got scooped up.
While Pulsar is a great product, are you not concerned that these folks are
getting paid big money to do something else now?

~~~
matteomerli
spoiler: we're still working on Pulsar

~~~
firstposterone
That’s good news! I guess it would be very helpful to formally address this
concern. Is there something that has been written / published to that effect?

------
eerrt
How does this compare to Redis Pub-Sub or RabbitMQ?

~~~
sz4kerto
Very different. Pulsar is primarily a Kafka competitor.

\- it is much more performant than RabbitMQ

\- it's a commit log as well, not just a pub-sub system, i.e. it is a good
candidate as the storage backend for event sourcing

\- it supports geo-distributed and tiered storage (e.g. some data on NVMe
drives, some on coldline storage)

\- it's persistent, not (primarily) in-memory

.. and so on.

~~~
EGreg
What about ZeroMQ?

Why use RabbitMQ and Kafka if you can use ZeroMQ? Meaning, isn’t it far more
performant and distributed?

Maybe I am missing something here.

~~~
sz4kerto
> Why use RabbitMQ and Kafka if you can use ZeroMQ?

They are totally different, you're comparing apples with oranges.

ZeroMQ gives you basic, very fast tooling to communicate between distributed
processes. ZeroMQ does not provide tooling for e.g. maintaining a strictly
ordered, multi-terabyte event log. And so on.

~~~
EGreg
Yes but isn’t this a bit like comparing git / bitkeeper vs subversion /
perforce?

Basically, one is decentralized and you can set up a massively parallel
architecture, with eg each topic or subthread having its own pubsub.

The other is a monolithic centralized pubsub architecture.

You could argue that git in large institutional projects converges to a
monolithic repo so at that point it’s less efficient even than svn.

But for most use cases, ZeroMQ would allow far more flexible distributed
systems topologies and solutions. No?

Edit: HN and Google are both awesome:
[https://news.ycombinator.com/item?id=9634925](https://news.ycombinator.com/item?id=9634925)

~~~
ryeguy
ZeroMQ is just a bit of sugar on top of TCP sockets. It isn't a message queue
or anything close. You would waste a ton of time reimplementing basic features
like retries, persistence, service discovery, dead-letter queues, priorities,
and a ton of other stuff.

------
reggieband
I'm still on the fence about these distributed log/queue hybrids. From a
theoretical perspective they seem excellent. I just have this nagging
suspicion that architectures based on these systems will harbor some
even-worse problem. This kind of ambivalence is something I find myself having
to battle more and more in my career as I age. Most of the time the hype
around new design/development patterns leads to a worse situation; very rarely
does it lead to a significant improvement. I dislike that my first impression
looking at a system like this is risk aversion.

~~~
zomglings
Your risk aversion seems justified. It seems reasonable to estimate that very
few teams actually need the kind of scale/scalability that something like
Apache Pulsar offers. They are much more likely to be in a state where they
will never put Pulsar through its paces, or to already have a solution in
place that serves their scale/scalability needs.

When a team you are on starts discussing switching to a technology like Pulsar
because of its amazing benefits, then unless your pants are on fire, it is
much more likely than not that you do not stand to gain much from the benefits
such software brings, while you are accepting the maintenance burden it
represents.

------
throwawaysea
A lot is said or referenced in this conversation about why people chose Pulsar
over Kafka. I'm not an expert in this area but are there use cases where Kafka
is still better?

~~~
EdwardDiego
As someone with a few years of Kafka and its ecosystem under my belt, but no
experience using Pulsar in anger, the areas where I can see Pulsar being
behind are mainly ancillary, and the community will likely catch up within a
year or two.

Kafka Streams - Pulsar Functions don't intend (by the looks of it) to provide
all of the functionality available in Kafka Streams, e.g. joining streams,
treating a stream as a table (a log with a key can be treated as a table), or
aggregating over a window. They seem to be more focused on doing X or Y to a
given record. That said, you don't need Kafka Streams for that; other
streaming products like Spark Streaming can do it too (although last I
checked, Spark Structured Streaming still had some limitations compared to
Kafka Streams, e.g. it couldn't do multiple aggregations on a stream). A use
case I have and love Kafka Streams' KTables for is enriching/denormalising
transaction records on their way through the pipeline.
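The KTable enrichment pattern mentioned here (a latest-value table materialized from a changelog, joined against each stream record) can be sketched roughly like this; plain Python stands in for Kafka Streams, and all names are illustrative:

```python
# A "table" materialized from a changelog: last write per key wins,
# much like a log-compacted topic viewed as a KTable.
customers = {}

def apply_changelog(key, value):
    """Consume one changelog record, keeping only the latest value per key."""
    customers[key] = value

def enrich(txn):
    """Stream-table join: denormalise a transaction with current customer state."""
    return {**txn, "customer": customers.get(txn["customer_id"])}
```

The point of the pattern is that enrichment reads local materialized state rather than calling out to a database per record.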

Kafka Connect - Pulsar IO will get there with time, but currently KC has a lot
more connectors. For example, Pulsar IO's Debezium source is nice (Debezium
was built to use Kafka Connect, but can run embedded), but you may not want to
publish an entire change-event stream onto a topic; you might just want a view
of a given database table available. KC's JDBC connector is a lot more
flexible in that regard, and Pulsar IO currently doesn't have a JDBC source.
It also looks like its JDBC sink only supports MySQL and SQLite (according to
the docs), whereas KC's JDBC connector as a sink supports a wider range of DBs
and can take advantage of functionality like Postgres 9.5+'s UPSERT. Likewise,
there are no S3 sources or sinks; the tiered storage Pulsar offers is really
nice, but you may only want to persist a particular topic to S3.

KSQL - KSQL lets your BAs etc. write SQL queries against streams. That said, I
do like Pulsar SQL's ability to query your stored log data. When I've needed
to do this with Kafka, I've had to consume the data with a Spark job, which
adds overhead for troubleshooting.

So yeah, those are the main areas I can see, but it's really a function of
time until Pulsar or community code develops similar features.

The only other major difference I can see is that, at the current time, it's
composed of three distributed systems (Pulsar brokers, ZooKeeper, BookKeeper),
which is one more distributed system to maintain, with all the fun that
entails.

That said, I'll be keeping my eye on this, and trialling it when I get some
spare time, as I've found that people will inevitably use Kafka like a
messaging queue, and that is a bit clunky. Plus I'm a little over having
people ask me how many partitions they need :D

~~~
mac01021
Do they usually provide similar throughput on the same hardware?

~~~
EdwardDiego
I couldn't say, tbh. I'm keen on running a trial with Pulsar alongside Kafka
in production; I might write it up when I've done so.

------
microcolonel
I'm sure Pulsar is worth it if you use most of what it offers, but the Java
client library is crusty and throws exceptions for control flow. I'm looking
at a persistence mechanism built on top of NATS to replace it. The NATS layer
would make it simpler to decouple the gateways from the persistence layer, and
support our bulk computing needs.

------
dikei
From reading their documents, I really like the design of Pulsar. However,
Kafka has been working so well for us, and has much better integration with
other components of our stack (Flink, Spark, NiFi, etc.), that there's no
compelling reason to switch.

I think Pulsar should really focus on integration with the rest of the Apache
stack if it wants to gain traction.

------
breckcs
Being able to scale the durable-storage layer independently has a lot of
advantages. More thoughts here:
[https://twitter.com/breckcs/status/1203736751681896449](https://twitter.com/breckcs/status/1203736751681896449).

------
eatonphil
Most of the comments are just pro-Pulsar, but what's the architectural trade-
off? (A non-architectural trade-off is that Pulsar is a new system to learn
for folks familiar with maintaining and using Kafka.)

~~~
manigandham
Pulsar is better designed than Kafka in every way with the main trade-off
being more moving pieces. That's why the recommended deployment is Kubernetes
which can manage all that complexity for you.

Pulsar also lacks in the size of the community and ecosystem where Kafka has
much more available.

------
bovermyer
How does this compare with NATS?

~~~
sqreept
NATS is a simpler PUB/SUB system that delivers in the UNIX spirit of small
composable parts. Apache Pulsar or Apache Kafka deliver the banana, the ape
holding it and the rest of the jungle.

~~~
tylertreat
Check out Liftbridge ([https://liftbridge.io](https://liftbridge.io)) as a way
to add these capabilities to NATS.

~~~
omegabravo
Your FAQ still says it's not production-ready. Is this still the case? I've
been keeping my eye on this project.

~~~
tylertreat
It's getting very close. I had wanted to make a production-ready 1.0 release
before the end of the year, but we're in the process of switching from
protobuf to flatbuffers. Once that is complete, a stable release will be made.

------
barbarbar
How does it compare to Kafka?

~~~
SkyRocknRoll
Most of the flaws of Kafka were carefully studied and fixed in Apache Pulsar.
I have written a blog post about why we went ahead with Pulsar:
[https://medium.com/@yuvarajl/why-nutanix-beam-went-ahead-
wit...](https://medium.com/@yuvarajl/why-nutanix-beam-went-ahead-with-apache-
pulsar-instead-of-apache-kafka-1415f592dbbb)

~~~
progval
> when consumers are lagging behind, producer throughput falls off a cliff
> because lagging consumers introduce random reads

I am confused by this. The format of Kafka's log files is designed to allow
reading and sending to clients directly using sendfile, in sequential reads of
batches of messages.
[http://kafka.apache.org/documentation/#maximizingefficiency](http://kafka.apache.org/documentation/#maximizingefficiency)

~~~
manigandham
Kafka brokers handle both connections to consumers and data storage. This
creates contention, as the primaries for each partition have to service the
traffic and handle IO. Consumers that aren't tailing the stream cause
slowdowns, because Kafka has to seek to their offset in files that aren't
cached in RAM.

Pulsar separates storage into a different layer (powered by Apache
BookKeeper), which allows consumers to read directly from multiple nodes.
There's much more IO throughput available to handle consumers picking up
anywhere in the stream.

------
manigandham
This is one of the best overviews on Pulsar with comparisons to Kafka:
[https://jack-vanlightly.com/blog/2018/10/2/understanding-
how...](https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-
pulsar-works)

------
mickster99
Is there a reason you went with Pulsar over Kafka? How is the pulsar
community? Where are you turning when you have support issues?

------
srameshc
This is great! What would be the easiest way to run a 3-node cluster?

~~~
ckdarby
The standalone mode will let you get started as a developer: you grab the
tar.gz, uncompress it, and run standalone.sh.

There are Helm charts for running an actual cluster.

------
jsilence
Would Pulsar be suitable for IoT messaging? An alternative to MQTT?

------
js4ever
"high-level APIs for Java, C++, Python and GO", no love for Node.js? :(

~~~
tyingq
It exists: [https://pulsar.apache.org/docs/en/next/client-libraries-
node...](https://pulsar.apache.org/docs/en/next/client-libraries-node/)

~~~
c0brac0bra
However, it currently lacks the ability to listen for messages and run an
event handler when one comes in: [https://github.com/apache/pulsar-client-
node/pull/56](https://github.com/apache/pulsar-client-node/pull/56)

You have to manually call ".receive()" to attempt to receive a message.

~~~
gperinazzo
Using `.receive()` will occupy a worker thread from Node until it returns.
Having multiple consumers waiting on receive will clog up the worker
threadpool, preventing anything that uses it from running. If you want to use
the consumer right now, I would suggest always using a timeout on the receive
call, and waiting between timed-out calls to receive. This is extremely
important if you have multiple consumers.
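The suggested pattern (always receive with a timeout, and back off between empty polls) looks roughly like this sketch, with a plain in-process queue standing in for the client's consumer; the function and parameter names are illustrative:

```python
import queue
import time

def poll_loop(consumer_queue, handle, *, timeout_s=0.1, idle_backoff_s=0.05,
              max_polls=10):
    """Poll with a timeout instead of blocking forever, so one consumer can't
    pin a worker thread indefinitely; back off briefly after empty polls."""
    for _ in range(max_polls):
        try:
            # Stand-in for a real client's receive-with-timeout call.
            msg = consumer_queue.get(timeout=timeout_s)
        except queue.Empty:
            time.sleep(idle_backoff_s)  # wait between timed-out calls
            continue
        handle(msg)
```

With many consumers, the timeout bounds how long any one of them can hold a worker thread, which is the point of the advice above.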

------
buboard
Why is Apache developing all these servers that are only useful to a handful
of companies rich enough to build them themselves? How about building
something that individuals can use, like, I dunno, the Apache web server
itself?

~~~
Eikon
If you had bothered to read the linked page, you would have understood that
it's a Yahoo project handed over to Apache for stewardship, like many of
Apache's projects.

~~~
buboard
Yeah, I'm talking more generally about their full list here:
[https://www.apache.org/](https://www.apache.org/)

~~~
mindw0rk
There are a lot of projects that were handed to Apache to manage. Kafka, for
example, was initially created by LinkedIn. So yeah, you're right: big corps
are actually creating these tools and, on top of that, giving them away to the
public as open source.

~~~
eronwright
The best part is that the tools are put into production before being
open-sourced. In other words, they actually work.

