
Fluentd: a high performance unified logging layer - edsiper2
http://www.linux.com/news/enterprise/high-performance/147-high-performance/847237-fluentd-a-high-performance-unified-logging-layer
======
marcusmartins
Heka by Mozilla is another alternative -
[http://hekad.readthedocs.org/en/latest/](http://hekad.readthedocs.org/en/latest/).
I have been running it in production to ship Docker logs to our Elasticsearch
cluster.

~~~
jedisct1
Heka's really nice, and the Lua sandbox makes it easy to write new codecs.

We (OVH) recently tried Heka to push syslog data into Kafka, but it eventually
kept crashing under load. Other alternatives were way too slow for our needs.

So we ended up writing a simple tool, not as flexible as Heka but about 10
times faster
[https://github.com/jedisct1/flowgger](https://github.com/jedisct1/flowgger)

------
cwyers
> We will go through the installation process, basic setup, listen to events
> through the HTTP interface, and look at a simple use case of storing HTTP
> events into a MongoDB database.

...and I'm out.

~~~
kiyoto
One of the maintainers of Fluentd here. Care to elaborate?

~~~
wyaeld
step1: build useful tool for serious people

step2: mention mongo

step3: enjoy having the room to yourself

:-P

~~~
kiyoto
I'm not really here to defend MongoDB, but the amazing thing about it is that,
despite all the vitriol against it, it continues to be used at companies and
projects far larger than anything many naysayers ever touch: Stripe and
Wish.com immediately come to mind.

Also, I am giving MongoDB the benefit of the doubt per Curt Monash's law of
databases:

Rule 1. It takes at least 7 years to build a database

Rule 2. You are not an exception to Rule 1.

~~~
ploxiln
I worked at bitly for a couple of years, we used mongodb for one of our
secondary datastores. We had the expertise to keep it working, but we really
hated it.

It's this funny situation where some engineer starts off using it because it's
very convenient to run, and maybe the data model is convenient. You scale it
up a bit, debug it, tune it, stuff you'd have to do with any system. At the
point when you're sure mongo really sucks, it's too late. The business and
even technical management side always prioritizes some other project over
replacing it, even though you spend a surprising amount of time each month
maintaining the mongodb cluster (performance degradation that compaction
doesn't completely fix, and other stuff...).

I get the sense that with 3.0.3 or so, mongodb is a real database now. But
it's been years of pain and false advertising. I'd still always vote against
it. (Even though at my current place some people have started using it for a
service... :( )

------
riquito
I'm just starting to adopt fluentd but I'm scared by the fact that the
"official" drivers have different interfaces and unclear leadership.

e.g.

[https://github.com/fluent/fluent-logger-php](https://github.com/fluent/fluent-logger-php)

[https://github.com/fluent/fluent-logger-python](https://github.com/fluent/fluent-logger-python)

and a bit of drama

[https://github.com/fluent/fluent-logger-php/issues/36](https://github.com/fluent/fluent-logger-php/issues/36)

~~~
edsiper2
note: I am one of Fluentd maintainers.

There is nothing to be scared of:

- Fluentd is an enterprise solution already adopted by thousands of users.

- Fluentd is sponsored and made by Treasure Data[0], where we collect around
800k events per second.

- You have to distinguish between what is official and what is third party.
Fluentd has more than 300 plugins, and you will likely find some differences
in how each extension is used, but in the end everything is compatible. We
make sure to maintain a clean list[1] of functional and maintained
extensions.

- We lead the project and invite you to get in touch through our mailing list
or other communication channels[2].

reach us anytime :)

[0] [http://www.treasuredata.com](http://www.treasuredata.com)

[1] [http://www.fluentd.org/plugins](http://www.fluentd.org/plugins)

[2] [http://www.fluentd.org/community](http://www.fluentd.org/community)

------
annnnd
> Its built-in reliability through memory and file-based buffering to prevent
> inter-node data loss have...

I wonder if we can expect "Call me maybe - fluentd" from Aphyr soon. ;)

------
Sanddancer
This feels like they are trying very hard to avoid calling their syslog a
syslog, probably with good reason. A lot of distributions provide a very
minimalist build of whichever syslog daemon they've decided to use, and as
such, people get the idea that syslogs can't write to databases, parse JSON,
listen on pipes, or do any number of other things modern syslogs can do.

~~~
kiyoto
>This feels like they are trying very hard to avoid calling their syslog a
syslog

I _wish_ rsyslog and its friends were really that easy to extend, and I say
this as a maintainer of Fluentd. Fluentd came about precisely because the
syslog family of data collectors falls short in certain ways:

1. Tag-based data routing: as you get more and more data sources, it becomes
very important to keep track of what goes where. In my view, this is one of
the key reasons Fluentd is used at many companies, and why Kafka has become
popular on the message queue side (topic-based stream modeling).

2. Extensibility: afaik, it's not all that intuitive for most programmers and
sysadmins to extend and add new inputs, outputs, filters, etc. for rsyslog
and/or syslog-ng. Admittedly, this is a subjective point, but looking at both
Logstash's and Fluentd's vast lists of plugins [1][2], I feel justified in
making this claim.

3. Configurable transport logic: I've never met anyone who is happy with
syslog's buffering and/or failover. Because we've heard so much about this
particular problem, when we were building Fluentd (...4 years ago), we took
extra care to make buffering and failover easy to configure and extensible.
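To make point 1 concrete, here is a minimal sketch of tag-based routing in
Fluentd's config syntax; the tag names, paths, and destinations are made-up
examples (and `type elasticsearch` assumes the third-party
fluent-plugin-elasticsearch gem is installed):

```
# Tail an Apache log and tag each event "apache.access"
<source>
  type tail
  path /var/log/apache2/access.log
  format apache2
  tag apache.access
</source>

# Events whose tag matches apache.* are routed to Elasticsearch...
<match apache.*>
  type elasticsearch
  host localhost
  port 9200
</match>

# ...and everything else falls through to stdout
<match **>
  type stdout
</match>
```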

Happy to answer more questions =D

[1] [https://www.fluentd.org/plugins/all](https://www.fluentd.org/plugins/all)

[2] [https://github.com/logstash-plugins](https://github.com/logstash-plugins)

------
vpeters25
For years I've been considering coding a tamper-proof logger, something where
each entry has a hash that depends on the entry's log line and the hash of the
previous entry. This could help detect a potential system compromise.

I haven't really taken the time to look for a logger with such a feature, it
would be nice to know if fluentd has something like this.
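As far as I know this isn't built into Fluentd, but the idea itself is easy to
sketch. Here is a minimal, hypothetical hash-chain logger in Python (note
that without a keyed secret, like the sealing keys journald uses, an attacker
who can rewrite the whole file can also recompute the whole chain):

```python
import hashlib

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def append_entry(chain, message):
    """Append an entry whose hash covers the message and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"message": message, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        (entry["message"] + entry["prev"]).encode()
    ).hexdigest()
    chain.append(entry)
    return entry

def verify_chain(chain):
    """Recompute every hash; a tampered entry breaks the link to its successor."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        expected = hashlib.sha256(
            (entry["message"] + entry["prev"]).encode()
        ).hexdigest()
        if entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, "user alice logged in")
append_entry(log, "user alice ran sudo")
assert verify_chain(log)

log[0]["message"] = "nothing happened"  # tamper with history
assert not verify_chain(log)            # verification now fails
```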

~~~
ploxiln
systemd's "journald" does this, it's called "Sealing", and it's enabled by
default.

I turn it off on my stand-alone systems; it's a waste of processing on them
IMHO (particularly if you don't back up the sealing key).

------
jedisct1
I've been using Fluentd for years, and it's a super useful tool.

It initially had some memory leaks that prevented us from using it in production,
but it's now very stable. Writing new input/output plugins is extremely
simple. And yes, it's written in Ruby, but give it a spin before judging; it's
fast enough for most needs.

------
IMTDb
Are there any benefits/disadvantages between Fluentd and Logstash? I am not
using either one, but I'll need to centralize my logs soon. My understanding
is that these are two very similar projects, but I might be wrong.

~~~
jamespo
They are similar, but Fluentd has cleaner conf IMO.

Interesting that graylog (an endpoint for fluentd) is providing its own log
collector now.

------
kolev
Isn't Fluentd in Ruby though? It's 2015 and we need something like this in Go
[0] [1] or Rust [2].

[0] Heka: [https://hekad.readthedocs.org/](https://hekad.readthedocs.org/)

[1] Chainsd:
[https://github.com/mikeszltd/chainsd](https://github.com/mikeszltd/chainsd)

[2] Flowgger:
[https://github.com/jedisct1/flowgger](https://github.com/jedisct1/flowgger)

~~~
allengeorge
Why does the language it's written in matter? If it has the features you want,
the reliability you want, and performs well given the load you're applying -
that should be enough, no? It's not like we're embedding it into an app we're
writing.

~~~
102030485868
Well, in terms of maintenance it's a little bit more work.

Sure, it has the features and reliability. But does that really justify having
to maintain a completely new environment? Maybe it does, maybe it doesn't; it
depends.

No, you may not be embedding it in your app, but it's now part of your stack.
You'll need to keep an eye on updates, etc. for a completely different
environment.

Plus there are other factors, like approved languages. Certain companies only
allow using languages X and Y. Don't even think about language Z. I had to
rewrite an 80-line Python script as a much larger Perl one because I just
didn't grok Perl all that well at the time. It didn't matter that Python was
installed on the system. All that mattered was that Perl was approved and
Python was not.

~~~
allengeorge
Valid points.

I'd like to address the "completely new environment" issue though. That should
only be considered if you're thinking of owning and modifying the component.
There are many software packages that you're going to use as-is, and interact
with only via an API or CLI. In those cases the depth of the API/CLI, the
quality of its documentation and the strength of its community matter way more
than the implementation language. (For example: who cares that nginx is
written in C when your entire stack is Python or Java?)

If you're thinking of continually making changes in a package however - that's
a different matter.

------
elcct
I started building something with a similar concept a while ago, but had to
pause development for some time. It is written in Go, so it's much easier to
install, etc. It has plugins for Hadoop, Mongo, RabbitMQ, file, and stdout :)
It can take data from tail, HTTP, RabbitMQ, heartbeat, and other chains.

[https://github.com/mikeszltd/chainsd](https://github.com/mikeszltd/chainsd)

~~~
jedisct1
"much easier to install etc."

Fluentd is extremely easy to install. They provide packages with everything
you need; you don't even have to install Ruby beforehand.

~~~
elcct
Sure, that is not the best point, but I like to have as little stuff as
possible installed, and polluting servers with Ruby doesn't feel right to me.
Of course, this is just personal taste.

------
core0
We'd love to use a ruby-based solution like this, but the docs say it will
lose data whenever the receiving end crashes. Any plans to fix that?

The way it was described in the docs gave me the impression that there is no
acknowledgement of network writes - if that's true, won't even clean shutdowns
lose data sometimes?

~~~
frsyuki
ACK of network transfers is available (the "require_ack_response" option).
This option comes down to a choice between at-most-once and at-least-once
semantics; you need to choose, and you can.

Fluentd provides "buffer_type file" to buffer records on disk, so shutting
down won't lose data. If you need to use the memory buffer for performance
reasons, Fluentd enables the "flush_at_shutdown" option by default.

You would also want to use the <secondary> feature. This lets you write a
buffer chunk to another storage if the primary destination is still
unavailable after "retry_limit" retries.

These concerns are addressed in the documentation:
[http://docs.fluentd.org/articles/out_forward#buffered-output-parameters](http://docs.fluentd.org/articles/out_forward#buffered-output-parameters)
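Putting those options together, a sketch of an out_forward configuration might
look like this (the host name, paths, and the `app.**` tag are placeholders,
not from this thread):

```
<match app.**>
  type forward
  require_ack_response true    # wait for the receiver's ACK: at-least-once
  buffer_type file             # chunks persist on disk across restarts
  buffer_path /var/log/fluent/forward-buffer
  retry_limit 10
  <server>
    host log-aggregator.example.com
    port 24224
  </server>
  <secondary>
    type file                  # spill chunks locally once retry_limit is hit
    path /var/log/fluent/forward-failed
  </secondary>
</match>
```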

~~~
core0
Ah, thanks - require_ack_response sounds like what I was missing. Some blog
posts are from before this was added in 0.12, so I didn't know about it.

I am still interested in forwarder failure cases - I have replied to kiyoto's
comment talking about the HA docs, which still talk about some other cases
that can lose messages.

In this case:

* The process dies immediately after receiving the events, but before writing them into the buffer.

Is it possible to require acknowledgement that the log event has been written
to the buffer? Is that separate from what require_ack_response does?

