

Internal Design of Onyx - nickik
https://github.com/MichaelDrogalis/onyx/blob/master/doc/user-guide/internal-design.md

======
sixdimensional
I'm interested in this general approach being implemented in so many newer
engines (those which Onyx apparently competes with) - the distributed pipeline
processing/directed acyclic graph (DAG) approach. So many of these engines
have related components - message queues, central coordinator, etc.

This seems to be the currently popular (and still rising) technique for
solving some long-standing, hard data processing problems and their
performance concerns - really an extension of the progress we made with
distributed MapReduce algorithms and the like in Hadoop.

There are certainly a lot of implementations coming out - why would one choose
Onyx over, say, something like Spark? Is it just the compatibility with the
Clojure programming language and the associated, implied benefits of
functional programming paradigms (among others)?

~~~
XPherior
Hello! Michael Drogalis, the developer, here.

I talked a little bit about why I wrote Onyx on the Clojure mailing list:
[https://groups.google.com/d/msg/clojure/OmHzAEfYe9U/33e0a0l3...](https://groups.google.com/d/msg/clojure/OmHzAEfYe9U/33e0a0l3qkQJ)

It's mostly driven by the need to use data structures in places where I didn't
have them - I only had things like macros and functional composition.
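The following minimal Python sketch illustrates the difference between wiring a pipeline with functional composition and describing it as a data structure. It is hypothetical and is not Onyx's API. The names `workflow`, `tasks`, `linear_order`, and `run` are invented for illustration. The sketch also assumes a linear chain of tasks; a real DAG would need a topological sort.

```python
# Hypothetical illustration (not Onyx's API): one pipeline expressed two ways.
from functools import reduce


def inc(x):
    return x + 1


def double(x):
    return x * 2


# Composition: the pipeline's shape lives inside code and is opaque at runtime.
composed = lambda x: double(inc(x))

# Data: the pipeline is a list of (from, to) edges plus a table of task
# functions, so it can be inspected, rewritten, or sent to other machines.
workflow = [("in", "inc"), ("inc", "double"), ("double", "out")]
tasks = {"inc": inc, "double": double}


def linear_order(edges, start="in", end="out"):
    """Walk a linear chain of edges (assumed: one outgoing edge per node)."""
    nxt = dict(edges)
    order, node = [], nxt[start]
    while node != end:
        order.append(node)
        node = nxt[node]
    return order


def run(edges, fns, value):
    """Apply each task function in the order the edges describe."""
    return reduce(lambda acc, name: fns[name](acc), linear_order(edges), value)
```

Both forms compute the same result. Because the data form is just a list, a program can change the pipeline, for example by dropping the `double` step, without touching any function definitions.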

~~~
sixdimensional
Thanks for the info, and awesome work on Onyx!

------
bkirwi
Very interesting! I'm working on a similar project at the moment, but I
haven't come across Onyx at all before -- I'll start digging through.

In particular, I'm interested to know if this supports exactly-once messaging
semantics, and if so, what the implementation looks like. Anyone here have
experience with this?

~~~
XPherior
I'm not sure if exactly-once messaging is possible in a distributed system.
Onyx gets close in that it uses transactions to move data across queues
atomically. But code can fail _just before_ a transaction is committed. The
transaction will only be committed once, but the code leading up to the
transaction will be run twice. Is that really exactly-once? IMO, it's not.

~~~
bkirwi
Right: I guess I distinguish between exactly-once _messaging_ and exactly-once
_processing_. It doesn't seem possible to guarantee that the code will be run
exactly once, but you can promise that the outputs are made visible exactly
once... as long as your system can capture the relevant outputs, of course.

It seems like transactions ought to be enough for the latter -- I'll have a
look. Thanks!

------
angersock
I've sent a link to this out to my team as an example of the sort of
documentation and technical writing I enjoy. Go one level up in the directory
and you'll see quite good explanations of deployment practices,
implementation, and high-level architecture stuff.

~~~
XPherior
Thank you. :)

