
Stream Processing with Apache Flink - brakmic
http://blog.brakmic.com/stream-processing-with-apache-flink/
======
jamesblonde
Here's a good intro on Flink streaming:
[https://dato.com/videos/data-science-summit-seif-haridi.html](https://dato.com/videos/data-science-summit-seif-haridi.html)

Summary: the advantages are real-time processing latency (tens of
milliseconds), exactly-once semantics, a failure-recovery model based on the
Chandy-Lamport snapshot algorithm, and backpressure support.
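To make the "exactly-once + snapshot-based recovery" point concrete, here is a
toy, single-channel model in plain Python. It is only loosely in the spirit of
Flink's barrier snapshots (a Chandy-Lamport variant); it is not Flink's API, and
the names `CountingJob`, `Snapshot`, `checkpoint_every` and `fail_at` are made
up for illustration. The real system also aligns barriers across multiple input
channels, which this sketch skips.

```python
# Toy model of checkpoint-and-replay recovery (hypothetical names, not Flink code).
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    offset: int               # source position to resume from after a failure
    counts: Dict[str, int]    # copy of the operator state taken at the barrier


@dataclass
class CountingJob:
    log: List[str]                          # replayable source, e.g. a log of keys
    checkpoint_every: int = 3               # "inject a barrier" every N records
    counts: Dict[str, int] = field(default_factory=dict)
    last_snapshot: Optional[Snapshot] = None

    def run(self, start: int = 0, fail_at: Optional[int] = None) -> None:
        for offset in range(start, len(self.log)):
            if fail_at is not None and offset == fail_at:
                raise RuntimeError(f"simulated crash at offset {offset}")
            key = self.log[offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            # Barrier after this record: store the state together with the
            # source position so both describe the same consistent cut.
            if (offset + 1) % self.checkpoint_every == 0:
                self.last_snapshot = Snapshot(offset + 1, dict(self.counts))

    def recover(self) -> None:
        # Roll state back to the last snapshot, then replay the source from it.
        if self.last_snapshot is not None:
            snap = self.last_snapshot
        else:
            snap = Snapshot(0, {})
        self.counts = dict(snap.counts)
        self.run(start=snap.offset)
```

For example, with the log `["a", "b", "a", "c", "a", "b", "c"]` and a crash at
offset 5, records 3 and 4 get processed twice, but because the state is rolled
back to the snapshot taken at offset 3 first, each record affects the final
counts exactly once. Side effects on external systems are outside what this toy
covers.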

------
DannoHung
Hey, I know you're saying you're not an expert on Flink or Spark, but do you
know whether Flink and/or Spark let you merge a stream in real time into a
data set that you can run queries against?

Generating analytics directly off a stream has never seemed that useful to me.
Being able to take all the data in the stream and merge it with previously
collected data, then query that merged data is the sort of thing I need to be
able to do.
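A framework-agnostic sketch of what "merge the stream into previously
collected data, then query it" can look like, in plain Python rather than
Flink or Spark. The `MergedStore` class and the `(customer_id, amount)` event
shape are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: fold live events into aggregates seeded from batch data,
# then query the combined view. Plain Python, not Flink/Spark APIs.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float]   # (customer_id, amount), an illustrative schema


class MergedStore:
    def __init__(self, historical: Iterable[Event]):
        # Per-key aggregates, initialized from the previously collected data.
        self.total: Dict[str, float] = defaultdict(float)
        self.count: Dict[str, int] = defaultdict(int)
        for key, amount in historical:
            self._add(key, amount)

    def _add(self, key: str, amount: float) -> None:
        self.total[key] += amount
        self.count[key] += 1

    def ingest(self, stream: Iterable[Event]) -> None:
        # Merge each live event into the same aggregates as it arrives.
        for key, amount in stream:
            self._add(key, amount)

    def average(self, key: str) -> float:
        # Query over historical and streamed data together.
        if key not in self.count:
            raise KeyError(key)
        return self.total[key] / self.count[key]

    def top(self, n: int) -> List[Tuple[str, float]]:
        # Keys with the largest combined totals, highest first.
        return sorted(self.total.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Seeding with `[("alice", 10.0), ("bob", 5.0)]` and then ingesting
`[("alice", 20.0), ("carol", 7.0)]` gives `average("alice") == 15.0`.
Maintaining aggregates incrementally keeps queries cheap but limits you to the
queries you planned for; keeping the raw rows instead would allow ad-hoc
queries at the cost of memory.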

~~~
capkutay
"Being able to take all the data in the stream and merge it with previously
collected data, then query that merged data is the sort of thing I need to be
able to do."

Sorry for the shameless plug, but Striim does that with a SQL-like interface.
Joining streams and previously collected, cached data is as easy as writing a
join.

You can try it out here: [http://www.striim.com](http://www.striim.com)
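For anyone wondering what a "join between a stream and cached data" does
mechanically, here is a generic stream-to-table lookup join in plain Python.
It is not Striim's (or anyone's) SQL interface; `enrich` and the customer-tier
table are hypothetical names used only to show the idea.

```python
# Hypothetical sketch of a stream-to-table ("lookup") join: each streaming
# event is enriched with an attribute from a cached reference table.
from typing import Dict, Iterable, Iterator, Optional, Tuple


def enrich(
    events: Iterable[Tuple[str, float]],
    customers: Dict[str, str],
) -> Iterator[Tuple[str, float, Optional[str]]]:
    for customer_id, amount in events:
        # Left outer join semantics: unknown keys get None instead of being dropped.
        yield customer_id, amount, customers.get(customer_id)
```

Because the generator is lazy, each lookup sees the cache as it is at the
moment that event is processed, so refreshing the dict while the stream runs
changes the enrichment of later events only.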

~~~
brakmic
If I'm not mistaken, Striim is neither an open-source project nor an
open-sourced product.

------
vonnik
Really impressed with Flink and the Data Artisans team. If I had to bet, I'd
say they're the future of streaming.

------
LgWoodenBadger
Curious that there's no comparison with the other stream processing platform
of choice, Apache Storm.

~~~
brakmic
Hi,

Some time ago I wrote an article about Apache Spark:
[http://blog.brakmic.com/data-science-for-losers-part-3-scala...](http://blog.brakmic.com/data-science-for-losers-part-3-scala-apache-spark/)

However, I don't think that a comparison of such complex technologies would be
of much value to readers of an introductory article. Of course, it would be
easy to put together a table of pros and cons for Flink and Spark, but that
would be way too simplistic, imho.

To be honest, I don't consider myself an expert on Flink and/or Spark, so it
would be unfair to my readers to embed such a comparison in the article.

But a properly written comparison could be valuable for those who have to
decide what kind of processing (batch, streaming, machine learning, etc.) they
need for specific (that is, "business") tasks.

Kind Regards, Harris

