
Confluent, a company for Apache Kafka and realtime data - sbilstein
https://www.linkedin.com/pulse/article/20141106180403-2945786-announcing-confluent-a-company-for-apache-kafka-and-realtime-data?trk=hp-feed-article-title
======
bkirwi
Kafka is a Big Deal, and it's great to see it get first-class support.

Hadoop is starting to feel outdated, yet HDFS doesn't seem to be going
anywhere. In the same way, I think we'll see a lot of innovation in stream
processing frameworks in the next few years, but Kafka will just keep on
going.

------
beagle3
Good luck to the Confluent guys!

There's a missing piece in the realtime puzzle, at least one that I haven't
been able to find, for which Kafka is overkill. Perhaps someone here knows
of a solution:

I have tens of system endpoints connected through unreliable links
(line-of-sometimes-occluded-sight, 2G and 3G WWAN; some are in vehicles, so
connections are intermittent).

I just want to consistently tail their logs in a bandwidth-efficient,
connection-drop resistant way; and I can't find any standard thing that does
this.

Kafka would fit the bill in general, but would require a lot of work (reading
textual logs into Kafka, querying Kafka for new data across the connection,
reading from Kafka and writing to text files), and I'm not sure how well it
deals with dropped connections.

My existing solution is to rsync the log directories (--append, --inplace) as
infrequently as I can from an operational view, which is 1 minute. It is
relatively bandwidth efficient (although could be much better), robust with
respect to connection issues, and generally works.
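
To illustrate the append-only idea behind `rsync --append`: the receiver's
copy is assumed to be a prefix of the sender's file, so only the bytes beyond
the copy's current length need to move. A minimal local sketch in Python,
assuming both paths are reachable as files (no network transport) and ignoring
log rotation; `read_new_bytes` and `sync_once` are hypothetical names, not
part of rsync or any library:

```python
import os


def read_new_bytes(path, offset):
    """Return the bytes of `path` that lie at or beyond `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read()  # empty if nothing was appended since `offset`


def sync_once(src_path, dst_path):
    """Append to `dst_path` whatever `src_path` has beyond the copy's length.

    Assumes `dst_path` is always a prefix of `src_path` (append-only logs,
    no rotation or truncation). Returns the number of bytes copied.
    """
    have = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    data = read_new_bytes(src_path, have)
    with open(dst_path, "ab") as out:
        out.write(data)
    return len(data)
```

Because the only state is the length of the copy, an interrupted run loses
nothing: the next call resumes from wherever the copy ends.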

However, it is less efficient than it could be: if directories have a lot of
files, like /var/log often does, there's a lot of sync overhead. The delay is
1 minute instead of a couple of seconds (which is what you would get with a
simple "tail -f" through a TCP connection), and it doesn't play well with
common log rotation schemes (though that's relatively easy to work around).

Does anyone have a better solution, kafkaesque or otherwise?

~~~
mwarkentin
Sounds like a job for logstash: [http://logstash.net](http://logstash.net)

If you don't mind using a 3rd party service, you could look into using
Papertrail, Loggly, etc.

~~~
beagle3
3rd party is not an option. I'll have to look at logstash, thanks.

------
marknadal
It is surprising how long it has taken for distributed databases to become
common practice. I'm glad that people are starting to move in that direction
and open source their work. This is a win for our industry overall.

We're working on a database/cache/messaging system too:
[http://github.com/amark/gun](http://github.com/amark/gun). It is dedicated to
removing the pain that I and other JavaScript/Node.js developers had when it
came to managing and debugging databases (devops and sysadmin work is
frustrating).

------
capkutay
Good luck to these guys. It's certainly an up-and-coming area with some
overlap with an existing, mature market. Tibco (the information bus company)
is a $4b company with relatively dated technology, and Informatica is another
legacy solution in this area with tons of revenue. Either way, the market is
certainly poised for some disruption.

------
joshmn
In case anyone else is wondering why the article is linked to LinkedIn: Jay
Kreps, co-founder/CEO, was previously with LinkedIn. LinkedIn is also an
investor in Confluence.

~~~
lern_too_spel
I doubt anybody is wondering that. The very first section is titled "Origin at
LinkedIn."

------
davidjgraph
Sounds somewhat like Confluence; this will get confusing. I can already see
that joshmn has made the mistake in his comment.

------
mlhamel
I'm a bit confused. The more I read the doc, the more I feel it looks like
Just Another Queue system. I don't really see the difference between Kafka
and, let's say, RabbitMQ, Celery or even statsd...

~~~
possibilistic
There are differences, the biggest of which is throughput. Kafka can handle
incredible load. The messaging semantics are also a bit different. Here's a
pretty good comparison:

[http://www.quora.com/RabbitMQ-vs-Kafka-which-one-for-
durable...](http://www.quora.com/RabbitMQ-vs-Kafka-which-one-for-durable-
messaging-with-good-query-features)
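
One commonly cited semantic difference: a Kafka topic behaves like an
append-only log in which each consumer tracks its own read offset, whereas a
traditional queue typically removes a message once it has been delivered and
acknowledged. A toy in-memory sketch of the log model in Python (this is not
Kafka's API; `Log` and `Consumer` are hypothetical names used only for
illustration):

```python
class Log:
    """Toy append-only log: records stay put after being read."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=10):
        # Reading never removes anything; it just returns a slice.
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer owns its position, so readers do not affect each other."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)  # advance only this consumer's position
        return batch
```

Two consumers polling the same `Log` each see every record, and a consumer can
replay history by resetting its own `offset` to an earlier value.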

------
gfodor
Congratulations guys and good luck on this next adventure! How does Samza fit
into this new venture?

~~~
mountaineer
Samza's role in this is a question I had as well. But, certainly Hadoop needs
some polish first.

------
mountaineer
I love the true engineers' launch, with no logo (twitter account is just the
egg).

------
mintplant
Does anyone have a mirror for those of us without LinkedIn accounts?

~~~
bcantoni
Same article is on the new company blog:
[http://blog.confluent.io/2014/11/06/announcing-confluent-
a-c...](http://blog.confluent.io/2014/11/06/announcing-confluent-a-company-
for-apache-kafka-and-real-time-data/)

~~~
mintplant
Thank you!

