
Go Python, Go: Stream Processing for Python - spooneybarger
https://blog.wallaroolabs.com/2017/10/go-python-go-stream-processing-for-python/
======
dajonker
What about the use of Pony? I did not hear about it before but I do like some
of the ideas of the language I read on their website. However, they also state
"Pony is pre-1.0. We regularly have releases that involve breaking changes.
This lack of stability is plenty of reason for many projects to avoid using
Pony." How are you going to deal with that?

~~~
spooneybarger
I both work at Wallaroo Labs and am on the Pony core team. A number
of engineers at Wallaroo Labs are committers to Pony. We are active members of
the Pony community and a large driver of many of those breaking changes to
Pony. At least 3/4 of the breaking changes to Pony over the last 9 months came
from us at Wallaroo Labs. From our side it's not particularly problematic to
make the occasional update.

~~~
fermigier
"At least 3/4 of the breaking changes to Pony over the last 9 months came from
us at Wallaroo Labs." -> And you're proud of yourselves ?!?

(Just kidding of course).

~~~
spooneybarger
I'm going to be writing a blog post on this, but you inspired me to touch on
it a bit.

When we went looking at what we should use to implement Wallaroo, one of the
things that appealed to us about Pony was that it was a high-performance actor
based runtime that we could help mold. We gave consideration to writing our
own implementation in C, but we'd still be working on that rather than talking
here on HN now if we had.

Pony has gotten us quite a bit. The runtime was used for a number of
production projects at one of the major banks so we knew that while there
might be some bugs (what code doesn't have them) that we could use it to
jumpstart our process.

I gave a talk about Pony and its relation to Wallaroo where I put out the
figure of 18 months. I think that as a wild ass guess, using the Pony runtime
saved us about 18 months of work to build the foundation we want/need for
Wallaroo.

Speaking as a member of the Wallaroo Labs team, it's really nice to have a
community based project that you can help mold and grow with. It's been a boon
to us as a small development shop in a way that either writing from scratch
ourselves or using an existing more widely used runtime wouldn't be.

Speaking as a Pony core team member, I encourage other companies who think
they could benefit from a high-performance actor based runtime to have a look
at Pony. You could have a large hand in shaping it into the runtime that you
need.

------
harel
While the comments about being pythonic are valid, I think this looks
fantastic and I'll take it for a spin around the block. Are you using this in
your production environment already?

Regarding the 'name' being a function: a class attribute might indeed be
more correct, but a function allows for dynamically computed names where it's
applicable.

~~~
iterati
So use @property then?

~~~
pryelluw

    class Example(object):  # hypothetical enclosing class so the snippet runs
        __SOME_VAR = "foo"

        @property
        def some_var(self):
            return self.__SOME_VAR

?

~~~
TremendousJudge
Yes, just like that. Check out this fantastic talk on being pythonic:
[https://www.youtube.com/watch?v=wf-BqAjZb8M](https://www.youtube.com/watch?v=wf-BqAjZb8M)
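
To make the trade-off in this sub-thread concrete, here is a minimal sketch in
plain Python. The class names (`StaticName`, `DynamicName`), the `prefix`
parameter and the `describe` helper are hypothetical and not part of the
Wallaroo API. It only shows that callers read a class attribute and a
`@property` the same way, while the property can still compute its value.

```python
class StaticName(object):
    # A fixed name is just a class attribute.
    name = "count words"


class DynamicName(object):
    def __init__(self, prefix):
        # `prefix` is a hypothetical input used to build the name.
        self._prefix = prefix

    @property
    def name(self):
        # Computed on every access, but callers still write `obj.name`.
        return "{} count words".format(self._prefix)


def describe(step):
    # Works for either class: no parentheses needed at the call site.
    return "step: " + step.name
```

Swapping a class attribute for a property later does not change any calling
code, which is the usual argument for starting with the plain attribute.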

------
wdroz
I like the absence of Java-based tools. But IMO the word count example is
really verbose:
[https://github.com/WallarooLabs/wallaroo/blob/0.1.2/examples...](https://github.com/WallarooLabs/wallaroo/blob/0.1.2/examples/python/word_count/word_count.py)

With Apache Spark, the Word count example is a lot shorter:
[https://github.com/apache/spark/blob/master/examples/src/mai...](https://github.com/apache/spark/blob/master/examples/src/main/python/wordcount.py)

~~~
jasonrhaas
The word count example has... 8 classes. To count words.

------
majewsky
Proof that you can write Java code in any language.
[https://steve-yegge.blogspot.de/2006/03/execution-in-kingdom...](https://steve-yegge.blogspot.de/2006/03/execution-in-kingdom-of-nouns.html)
comes to mind when reading the snippets.

~~~
spooneybarger
We are in the process of working with folks on a more "pythonic" API. Our
first goal was to get something that would be easy to use out the door then
get feedback on ideas we have for something more idiomatic to the language.
It's an approach we plan to take when we add support for other languages such
as Go and JavaScript over the next few months.

We are looking for folks to help drive the next version of the API. The first
one was done with feedback from a couple of clients who were interested in
using Python and in many ways reflects their tastes. Feedback from a wider
range of Python users is something we are actively soliciting at this point.

 _disclosure: I work at Wallaroo Labs, creators of Wallaroo._

~~~
RHSman2
I'll be in touch. Best email please?

~~~
spooneybarger
Sean@wallaroolabs.com

------
ericfrederich
Question for spooneybarger or anyone else at Wallaroo Labs.

How do you see this comparing to something like Dask? Would it compete with
Dask, or be able to somehow work together with it?

Dask seems to let you write idiomatic Python code and not even think about
splitting, joining, etc... and it builds the pipeline automagically by
introspecting the AST.

~~~
spooneybarger
We are actively looking at how we can better handle use cases like the ones
Dask handles. A couple of folks we are working with use Dask heavily and are
looking to switch. They are far more expert in Dask than we are, so I don't
want to shoot my mouth off. The issues they've generally discussed with us are
around performance and scaling problems they have had. We are still actively
learning from them and hope to have a first version to start addressing those
use cases around the end of this year. Ideally the middle of December.

Dask is very "batch" oriented, which, as I said above, is something we are in
the process of adding to Wallaroo. Wallaroo is very stream processing
oriented. Wallaroo's strength is working with stateful, event-by-event
applications.

If you were to take word count as an example, Dask would be great if you had a
body of text in files or whatnot that you needed to count. There's a beginning
and end to that task: count the words in this text. Wallaroo would shine if
you had a stream of never-ending text, like Twitter's trending topics.
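
A rough way to picture that difference in plain Python (this uses neither the
Dask nor the Wallaroo API; `batch_word_count` and `streaming_word_count` are
hypothetical names for illustration only): the batch version returns once,
after the input ends, while the streaming version keeps state and emits an
updated count for every event.

```python
from collections import Counter


def batch_word_count(lines):
    # Batch: the input has a beginning and an end; one result at the end.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(events):
    # Streaming: `events` may never end; state is kept between events and
    # an updated (word, count) pair is emitted for each word as it arrives.
    counts = Counter()
    for line in events:
        for word in line.split():
            counts[word] += 1
            yield word, counts[word]
```

With an unbounded generator as `events`, `streaming_word_count` never returns;
the consumer simply reads updates as they arrive.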

That's a very coarse outline of a couple of differences. While we are working
with clients to help them move off of Dask by adding that functionality to
Wallaroo, I also think that if you wanted to, you could use Wallaroo alongside
a more batch oriented system like Dask. Stream processing and batch processing
are complementary. A number of technologies (us included) are looking to unify
them. Why? Well, there's a lot of operational overhead to running a batch
system and a streaming system. A lot of folks would like to run a single
system that works well for both.

I hope that answers your question.

~~~
happy-go-lucky
> Stream processing and batch processing are complimentary

Must be _complementary_ , I think.

~~~
spooneybarger
Indeed. I regularly mix the two up. Thanks. Editing to fix.

------
jimktrains2
I've used the similar Apache Beam project before.
[https://beam.apache.org](https://beam.apache.org)

I mainly used their Java libraries, but the Python bindings have been coming
along.

~~~
ccygoogle
We recently checked in streaming support in Beam Python here:
[https://lists.apache.org/thread.html/82f5b5d2ab2ddd3849584f6...](https://lists.apache.org/thread.html/82f5b5d2ab2ddd3849584f6dfe4128cbd551b10cc0864cae6701d3fc@<dev.beam.apache.org>)

You can take a look at and run the streaming wordcount example in Beam:
[https://github.com/apache/beam/blob/master/sdks/python/apach...](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/streaming_wordcount.py)

In addition to the available local execution, Google also offers running Beam
pipelines as a managed service in Cloud Dataflow
([https://cloud.google.com/dataflow/](https://cloud.google.com/dataflow/)).
Python streaming is in private alpha--contact us at
dataflow-python-feedback@google.com if you'd like to try it out.

Note: I work for Google on Apache Beam and Cloud Dataflow.

~~~
jimktrains2
I used Google Cloud Dataflow at a previous job. I really did like it and feel
like Beam is set up pretty well. Thanks!

------
Dowwie
No support for Python 3

~~~
spooneybarger
Python 3 support is on the roadmap for later this year.

In general, our roadmap is determined by what we think is important but also
is heavily influenced by the needs of folks we are working closely with.

There's an almost infinite number of things we could work on so we like to
drive our direction based on the needs of folks we are working with. In the
case of Python, the early users were all Python 2.7 and thus we focused there.
We've recently started working with folks who are looking for Python 3 support
(in particular, 3.6) so we are going to be adding it.

If anyone is interested in adding features, language support etc to Wallaroo,
we'd love to help. You can find us on freenode in the #wallaroo channel or
stop by our user mailing list
([https://groups.io/g/wallaroo](https://groups.io/g/wallaroo)) and we can help
you out.

~~~
lima
Enterprise user here.

Can confirm that we moved all the things to Python 3 (and it was easier than
expected). Especially all the data processing pipelines.

No Python 3 is a deal breaker in 2017.

~~~
spooneybarger
We'll have you covered by the end of the year if you are interested in
checking us out then.

------
buriama8
Could anyone recommend any resources comparing the various stream processing
frameworks? Apache Storm (which I am familiar with), the below mentioned
Apache Beam (of which I just heard for the first time), this new Wallaroo and
any others? Beside their homepages, that is.

~~~
spooneybarger
Stepping outside of my "Wallaroo Labs employee" role for a moment.

Comparisons can be really hard. What's right for one application or project
isn't right for another. I'd be happy to chat over email with anyone
interested in stream processing about the types of applications they are
looking to build, the requirements they have etc.

I get nice use cases and information we can use at Wallaroo Labs to help drive
our product. In return, I will give unbiased feedback on what you should be
looking for to solve a given problem.

My personal email is in my HN profile.

Disclosure:

In case it isn't obvious, I work at Wallaroo Labs, the makers of Wallaroo. I'm
also one of the authors of Storm Applied, Manning's book on Apache Storm.

~~~
fnl
I'd be interested in understanding which design is closest to yours, however.
Flink? Akka Streams? Another?

~~~
spooneybarger
That's complicated and nuanced. I'd be happy to have that conversation over
email.

Sean@wallaroolabs.com

------
w_t_payne
Hahaha.... that's so so so similar to something I've developed for my machine
vision pipelines...

~~~
spooneybarger
Would you be interested in chatting more about that? We are always looking for
use cases we can learn from. My personal email address is in my profile if you
are interested.

------
pknerd
Looks good. Guess this is the first library I've heard of dealing with
streaming in Python.

~~~
spooneybarger
Thanks!

------
vog
Very interesting library, haven't heard of it yet.

~~~
spooneybarger
Thanks. We've been working hard on Wallaroo. Plenty more improvements to come
but we felt that it was time to get it out there and get feedback to help us
drive the work we do over the next few months.

~~~
ryanschneider
Can I suggest having a Docker image of all the pre-compiled bits available? I
was really interested in trying it out but don't know if I want to invest the
time trying to compile everything for a tool that I don't know a lot about
yet. If setup was just a `docker pull` away I would probably already be going
through the tutorial.

If the 4 terminals required were instead just:

`docker run -d --name wallaroo -v ~/wallaroo-tutorial/celsius:/srv/application-module -p 0.0.0.0:4000:4000 wallaroolabs/wallaroo-quickstart`

Or something similar I think a lot more people on this thread would be trying
it out right now.

It looks really promising! If I get the spare time I'm definitely interested
enough to give it a whirl.

~~~
spooneybarger
First: thanks!

Re docker:

It's one of the options we are looking at to make it easier to get up and
running.

Any particular reason that docker appeals to you?

We're not sure at this point in time what the best means is.

If there was a docker based QuickStart, what would you expect to be able to do
with it?

Run the first example app? Something more?

------
chrisgd
Very cool, thanks for sharing!

~~~
spooneybarger
Thanks! It's really nice to get positive feedback on something that you've
spent months working on and are still pouring yourself into every day.

