
Distributed Named Pipes - mhausenblas
http://dnpip.es
======
techdragon
This is an excellent idea... but I'm a little put off by the reference
implementation using DC/OS and Kafka... It's a little bit 'heavy' for a
reference implementation, at least that's my personal preference.

I'd like to see a more 'low tech' version, or at the very least a plan to
transition from a 'heavy' reference implementation to a 'lighter' one once one
becomes available, since having a concise reference implementation makes
porting and compatibility significantly easier.

~~~
bjt
Yes.

> To try it out yourself, you first need to install a DC/OS cluster and then
> Apache Kafka...

When I see that, I know the cost of satisfying my curiosity is going to be a
lot higher than I'm willing to pay right now.

~~~
mhausenblas
Fair point, I suppose. Is
[https://dcos.io/docs/1.8/administration/installing/local/](https://dcos.io/docs/1.8/administration/installing/local/)
an option?

------
dln_eintr
The name doesn't really work in this project's favor.

UNIX pipes are stream interfaces, whereas this looks to be message-based -
that's a fundamental difference.

Named pipes are uniquely identified in a well-defined local namespace, i.e the
filesystem, whereas this seems to be an abstraction on top of Kafka with
service discovery TBD.

This confused me while reading up on the project.
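To make the stream-vs-message point concrete, here is a minimal POSIX-only sketch (Python; not dnpipes code) showing that a local named pipe is just a byte stream with a filesystem name - message boundaries do not survive:

```python
import os
import tempfile
import threading

# A local named pipe is a byte stream with a name in the filesystem:
# separate writes are concatenated; no message boundaries survive.
path = os.path.join(tempfile.mkdtemp(), "pipe")
os.mkfifo(path)  # POSIX-only

def writer():
    with open(path, "wb") as w:
        w.write(b"hello ")   # two distinct writes...
        w.write(b"world")

t = threading.Thread(target=writer)
t.start()
with open(path, "rb") as r:
    data = r.read()          # ...arrive as one undifferentiated byte stream
t.join()
print(data)  # b'hello world' -- no framing, just bytes until EOF
```

A message-based system would have delivered two distinct items here; the stream delivers one run of bytes.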

~~~
cle
How is it a fundamental difference? Unix streams are character messages, so
this is basically a superset of that functionality.

(I don't know much about Unix streams, so correct me if I'm wrong.)

~~~
michaelmior
Generally, when a protocol is referred to as message-based, it's because there's
some framing around the payload. This can make it impractical compared to a
streaming protocol that just passes data back and forth. In some real-time
cases, you can't tolerate the latency of waiting to bundle multiple items into a
single payload, but you also can't tolerate the overhead of framing many small
messages.
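That framing overhead is easy to quantify. A quick sketch (the `frame`/`unframe` helpers are hypothetical, invented just for this demo) with a 4-byte length prefix per message:

```python
import struct

def frame(messages):
    """Length-prefix each message with a 4-byte big-endian header,
    as simple message protocols commonly do."""
    return b"".join(struct.pack(">I", len(m)) + m for m in messages)

def unframe(buf):
    """Recover the individual messages from a framed buffer."""
    msgs, off = [], 0
    while off < len(buf):
        (n,) = struct.unpack_from(">I", buf, off)
        msgs.append(buf[off + 4 : off + 4 + n])
        off += 4 + n
    return msgs

payload = [b"x"] * 1000                      # many tiny one-byte messages
framed = frame(payload)
print(len(framed), len(b"".join(payload)))   # 5000 vs 1000: 4x overhead
assert unframe(framed) == payload            # but boundaries are preserved
```

For one-byte payloads the headers dominate; a raw stream would ship the same 1000 bytes with zero overhead, but lose the boundaries.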

------
linsomniac
The most interesting thing I got from this was the DC/OS link. I hadn't heard
of it before, but I think I'll look at it to see how it might fit with our
future direction with/instead of Kubernetes.
[https://dcos.io/](https://dcos.io/)

------
tlrobinson
So... a message queue?

~~~
rohan_
I'm just as confused. What does this have to do with named pipes?

~~~
pdkl95
[http://beej.us/guide/bgipc/output/html/singlepage/bgipc.html...](http://beej.us/guide/bgipc/output/html/singlepage/bgipc.html#mq)

"A message queue works kind of like a FIFO but supports some additional
functionality."

~~~
rohan_
So it's a bare-bones message queue? What's the advantage then? Performance?

------
jsjohnst
Apologies, but what does this provide that Kafka doesn't already?

~~~
Tepix
It uses the file paradigm just like local named pipes. That makes this tech
accessible to pretty much every UNIX command.
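That accessibility is the appeal of the file paradigm: any stock tool can read a named pipe because it's just a path. A POSIX-only sketch (the path is made up; driven from Python only so it's self-contained):

```python
import os
import subprocess
import tempfile
import threading

# Because a named pipe lives in the filesystem, ordinary UNIX commands
# can consume it like any file -- here, plain `wc`.
path = os.path.join(tempfile.mkdtemp(), "pipe")
os.mkfifo(path)  # POSIX-only

def writer():
    with open(path, "wb") as w:
        w.write(b"hello\n")  # 6 bytes

t = threading.Thread(target=writer)
t.start()
out = subprocess.run(["wc", "-c", path], capture_output=True, text=True)
t.join()
print(out.stdout.strip())  # first field is the byte count: 6
```

The equivalent shell one-liner would be `mkfifo pipe; echo hello > pipe & wc -c pipe`.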

~~~
Animats
Not really. Each message has to be an atomic item, not a stream of bytes,
since there can be multiple readers consuming a single queue.

The interface specification has problems. "A pull does not remove a message
from a dnpipes, it merely delivers its content to the consumer." So if the
same consumer pulls the same dnpipe again, does it get the same message? Do
messages ever get removed from dnpipes without a reset? Unclear.

Does pull block, support async completions, or just return an error when no
data is available?

Reset, rather than sending an EOF which passes through the queue, implies that
shutdown is either drastic or requires external coordination to empty the
queue and stop the sending end before the reset.

~~~
mhausenblas
> ... if the same consumer pulls the same dnpipe again, does it get the same
> message

No. Each message is delivered at most once. But good point, need to make that
clearer!

> Do messages ever get removed from dnpipes without a reset?

No. Again, something I need to clarify, it seems.

> Does pull block, support async completions, or just return an error when no
> data is available?

It blocks.

> Reset, rather than sending an EOF which passes through the queue, implies
> that shutdown is either drastic or requires external coordination to empty
> the queue and stop the sending end before the reset.

I don't follow. Reset empties the underlying queue.

In general, since consumers start to consume not from the beginning of time
(as in Kafka's `--from-beginning`) but from wherever they happen to be (that
is, in Kafka terminology, from the `latest` offset), this shouldn't be a
problem.

I tried to model this as closely as possible (and as far as it makes sense)
after the semantics of (local) named pipes. I might have fudged up here, but
I'm not 100% clear on where :)
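As I read the semantics described in this thread - messages are retained, each consumer gets a message at most once, late joiners start at `latest`, and reset empties everything - a toy in-memory model might look like this. It is my own sketch, not the project's API; all names are invented:

```python
class DnPipeSketch:
    """Toy model of the semantics described above (not real dnpipes)."""

    def __init__(self):
        self.log = []       # retained messages (never removed until reset)
        self.offsets = {}   # per-consumer read position

    def push(self, msg):
        self.log.append(msg)

    def attach(self, consumer):
        # A new consumer starts at 'latest', not from the beginning of time.
        self.offsets[consumer] = len(self.log)

    def pull(self, consumer):
        off = self.offsets[consumer]
        if off >= len(self.log):
            return None     # the real pull would block here instead
        self.offsets[consumer] = off + 1   # at-most-once per consumer
        return self.log[off]

    def reset(self):
        self.log.clear()
        self.offsets = {c: 0 for c in self.offsets}

p = DnPipeSketch()
p.push(b"old")
p.attach("c1")              # c1 joins after 'old' was pushed
p.push(b"new")
print(p.pull("c1"))         # b'new' -- 'old' is invisible to late joiners
print(p.pull("c1"))         # None  -- the same message is not redelivered
```

Note that in this model the log grows without bound until a reset, which is exactly the retention question raised below.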

~~~
Animats
> > Do messages ever get removed from dnpipes without a reset?
>
> No. Again, something I need to clarify, it seems.

So what happens after the system has been running for a while? Why doesn't the
dnpipe system fill up with old messages that will never be read again? If new
subscribers don't see them, and old subscribers have read them, why are they
not removed? It would seem that once every subscriber who subscribed before a
message was sent has received that message, the message is dead and can be
removed. Why keep the history? Did you really mean that?

Also, what happens if one of many subscribers stops making PULL requests?
Maybe it's blocked on something, or hung. Do the queues start to build up?
Does PUSH eventually block?

> > Reset, rather than sending an EOF which passes through the queue, implies
> > that shutdown is either drastic or requires external coordination to empty
> > the queue and stop the sending end before the reset.
>
> I don't follow. Reset empties the underlying queue.

On most queuing systems, when a publisher wants to shut down, it closes the
channel's sending end or sends an EOF message. When all subscribers have read
up to the EOF, they close their receiving ends. The last messages get
processed, and then the subscribers stop. If the only shutdown mechanism is a
reset, that can lose messages not yet received. That's OK if the intent is
just "kill everything and terminate", but not if you need a clean shutdown.

(ROS, the Robot Operating System, has a publish/subscribe system something
like this. It's a soft real time system, so old data is discarded and you
never want to block a publisher. They chose to lose messages if a subscriber
isn't reading often enough. That's appropriate to a robotics use case. It
probably wouldn't be for a containerized web backend. See [1].)

[1]
[https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_patt...](https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern#Message_Delivery_Issues)

~~~
jsjohnst
Exactly my thought when reading his reply!

