
Show HN: Vector – A High-Performance Log and Metric Router Written in Rust - zhs
https://github.com/timberio/vector
======
reacharavindh
I was dragging my feet to build a log shipper solution. I was going to use
Filebeat -> ElasticSearch -> Kibana.

This looks great. My primary attraction is the potentially lower memory
footprint of this program compared to Filebeat. A secondary attraction is how
easy it appears to be to enable transformations.

Now, if I can make a suggestion for your next/additional project... a neat
system metric collector in Rust that can export to Prometheus with the same
principles:

\- Low memory footprint

\- Rust

\- Single binary

\- Customizable with a single config file, without spending hours in manuals

\- Stdin/stderr -> transform -> Prometheus
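
To make that concrete, here's a hypothetical config for the tool I have in
mind (the component types, names, and fields below are all made up for
illustration, not real syntax from any existing project):

```toml
# Hypothetical single-file config: read stdin, parse, expose to Prometheus.
[sources.app]
type = "stdin"

[transforms.parsed]
type = "json_parser"   # or some regex/grok-style parser
inputs = ["app"]

[sinks.prom]
type = "prometheus"
inputs = ["parsed"]
address = "0.0.0.0:9090"
```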

I’m learning Rust and eventually plan to build such a solution, but I think a
lot of this project could be repurposed for what I described much faster than
building a new one.

Cheers on this open source project. I will contribute whatever I can. Thanks!!

~~~
lukes386
Thank you! Very glad it looks useful to you.

It's still slightly rough around the edges, but Vector can actually ingest
metrics today in addition to deriving metrics from log events. We have a
source component that speaks the statsd protocol which can then feed into our
prometheus sink. We're planning to add more metrics-focused sources and sinks
in the future (e.g. graphite, datadog, etc), so check back soon!

~~~
wikibob
Just a question: are you familiar with the work that's been done on the
OpenCensus Collector and Agent [0]?

There was discussion earlier this year about creating a design doc for
OpenCensus to handle logs. I'm not sure whether that got finished or was
sidelined while the OpenCensus and OpenTracing merger was worked on. The two
projects will combine under the OpenTelemetry name.

I've been quite happy with the OpenCensus instrumentation SDKs.

I think the logs and metrics space is interesting, especially because there is
so much overlap: both are just ways of representing data about an event that
occurred in the software.

OpenCensus is fairly widely backed: Google, Microsoft, Etsy, Scalyr...[1]

[0] [https://github.com/census-instrumentation/opencensus-service...](https://github.com/census-instrumentation/opencensus-service/blob/master/DESIGN.md)

[1] [https://opencensus.io/community/users/](https://opencensus.io/community/users/)

~~~
lukes386
I've looked into OpenCensus/OpenTracing/OpenTelemetry (and the apparently
unaffiliated OpenMetrics?) a bit, but I'm not as familiar as I'd like to be.
It does seem like they're focused primarily on application-level
instrumentation and the ability to ship metrics and tracing data to different
backends.

Vector's perspective is that your applications and infrastructure are already
emitting all kinds of interesting data via logs, metrics, etc, and the primary
challenge is to collect, enrich, and manage the storage of that data. We have
no plans to integrate Vector into your application or introduce some kind of
Vector-specific method of exporting data.

We'll definitely be watching OpenTelemetry as it moves forward and would very
much like to be a compatible part of that ecosystem. To the degree that they
use common open standards for their communication protocols, that could just
fall out naturally.

------
kalkin
Seems similar to Veneur (like many other projects mentioned in comments here;
didn't realize this space was so crowded!) - down to the first two letters of
the name: [https://github.com/stripe/veneur](https://github.com/stripe/veneur)

Veneur is more metrics-focused, but might offer inspiration as you work on
metrics support in Vector - in particular the SSF source, internal
aggregation, and Datadog and SignalFX sinks.

~~~
lukes386
Absolutely, Veneur is something we looked at quite a bit when it popped up.
It's clear Stripe was feeling a lot of the same pain points we were when we
started building Vector and they've come up with something really impressive.

As you mentioned, it seems they've focused more on metrics out of the gate,
while we've spent more of our time on the logging side of things (for now).
We're working to catch up on metrics functionality, but interoperability via
SSF is an interesting idea!

------
jhgg
We use a rather bespoke syslog -> clickhouse log sink
([https://github.com/discordapp/punt/tree/clickhouse](https://github.com/discordapp/punt/tree/clickhouse))
that we wrote in house because Logstash (and subsequently Elasticsearch) was
too slow. Would love to switch off of it and onto this! Hopefully a clickhouse
sink comes soon! Maybe we'll contribute one upstream!

~~~
reacharavindh
Out of curiosity, could you tell us a little more about your log analysis
workflow? Once they are in Clickhouse, how do you visualise/search/analyse
your logs? What is your equivalent of Kibana?

~~~
jhgg
We do rollups into BigQuery, where we have a bunch of dashboards to look at
stuff historically.

I did really like Kibana; ultimately, we had to ditch it (because we ditched
ES). In a way this was a good thing, as more than once I degraded ingest on
the ES cluster just by using Kibana to do some aggressive filtering.
Clickhouse handles these queries without a problem.

I think a more complete setup may be to pipe logs into Kafka, and ingest
them into Clickhouse/Druid for different types of analysis/rollups.

Our logging volume now exceeds ~10b log lines per day. Clickhouse handles this
ingest almost too well (we have three 16-core nodes that sit at 5% CPU). This
is down from a 20ish-node ES cluster that was basically pegged on CPU... and
our log volume then was ~1b/day.

For more ad-hoc analysis, we just use clickhouse-cli to query the dataset
directly. We are also tangentially investigating using Superset with it.

~~~
reacharavindh
Thanks for the response. Lots of tips for me to go research.

I was mentally debating between trying to find a schema for our logs and
storing them in a database where they can be queried efficiently,

vs.

throwing logs into Elasticsearch in a lazy way and letting it index the whole
thing so we can do full-text search on the logs, with the limitation of only
keeping a few days' worth of data in the ES indexes.

Kibana’s visualisation is what is keeping ES in the running for me. I will
look into Superset + Clickhouse to see if I can come up with a good analysis
frontend for our log data.

------
SwellJoe
Just a heads up: there are several figures in your docs where the entirety of
the useful information on the page is in the image, and they don't have alt
text or any accessible way to get at the information (that I can find,
anyway). E.g.
[https://docs.vector.dev/use-cases/security-and-compliance](https://docs.vector.dev/use-cases/security-and-compliance)

------
kevsim
I really appreciate the comparison table. Very rarely are things totally new
and novel, so it's very nice to know what other things it's _like_.

------
nickserv
Could this replace a simple fluentd setup right now or are there still major
functionalities missing?

Specifically, I'm ingesting nginx logs in JSON format, cleaning up invalid
UTF-8 bytes (usually sent in the forwarded-for header to exploit security
vulnerabilities), and sending to Elasticsearch with an automated 90-day
retention policy (daily indexes).

Seems like a fairly common use case for webservers.
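
For reference, here's a sketch of the pipeline I mean in Vector-style TOML
(the component types and option names are my guesses from skimming the docs,
the endpoint is hypothetical, and the 90-day retention would still be enforced
on the Elasticsearch side, e.g. by Curator, not by the shipper):

```toml
[sources.nginx]
type = "file"
include = ["/var/log/nginx/access.log"]

[transforms.parsed]
type = "json_parser"     # parse each log line as JSON
inputs = ["nginx"]

[sinks.es]
type = "elasticsearch"
inputs = ["parsed"]
host = "http://localhost:9200"   # hypothetical endpoint
index = "nginx-%Y-%m-%d"         # daily indexes; deletion handled outside Vector
```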

------
rishiloyola
Well, Logstash now supports a good persistent queue:
[https://www.elastic.co/guide/en/logstash/current/persistent-...](https://www.elastic.co/guide/en/logstash/current/persistent-queues.html#durability-persistent-queues)

I don't know why the author didn't put a correctness tick mark on it.

~~~
binarylogic
It's very likely we're doing something wrong with this test, but after many
hours of trying we couldn't get our simple test to pass for Logstash, even
though it passed for others:

[https://github.com/timberio/vector-test-harness/tree/master/...](https://github.com/timberio/vector-test-harness/tree/master/cases/disk_buffer_persistence_correctness#results)

Definitely open to feedback on what we're doing wrong.

~~~
butteroverflow
This is totally off topic, but holy crap, first message in 11 years. I would
have lost my password about a dozen times by now.

~~~
itamarhaber
That answer was definitely worth the wait.

(there's a "Forgot Password" link somewhere here).

[do you always look at poster's activity? sounds time consuming.]

------
dandigangi
Freaking A... Rust performance is nuts when done right. Very cool.

------
dm03514
How does this compare to telegraf?

[https://github.com/influxdata/telegraf](https://github.com/influxdata/telegraf)

The biggest thing that pops out to me is the Lua engine (seems amazing :) )

~~~
binarylogic
Telegraf is nicely done. We spent a lot of time testing solutions in our test
harness
([https://github.com/timberio/vector-test-harness](https://github.com/timberio/vector-test-harness))
and Telegraf was the most impressive of the tools we tested, so kudos to the
Influx team on that.

But to answer your question, Telegraf is very heavily metrics-focused, and its
logging support appears to be limited (reducing logs to metrics only). Vector
is _currently_ focused on logging with an eye towards metrics, but still has
work to do on the metrics front.

For example, we opened the door with the `log_to_metric` transform
([https://docs.vector.dev/usage/configuration/transforms/log_t...](https://docs.vector.dev/usage/configuration/transforms/log_to_metric))
to ensure our data model supports metrics, but we still have a lot of work to
do on metrics as a whole. Our end goal is to eventually replace Telegraf and
be a single, open, vendor-neutral solution for both logs and metrics.
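
For a rough idea of the shape of that transform (loosely adapted from the
docs, so exact option names may have drifted, and the upstream input name
here is hypothetical), counting requests per status code might look like:

```toml
[transforms.status_counts]
type = "log_to_metric"
inputs = ["parsed_logs"]     # hypothetical upstream component

# Emit a counter increment for each log event that has a `status` field.
[[transforms.status_counts.metrics]]
type = "counter"
field = "status"
name = "http_responses_total"
```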

Happy to clarify further :)

------
envolt
Off Topic -

Do either Filebeat or Logstash support config hot reload, as mentioned in
Vector's docs?
[https://docs.vector.dev/usage/administration/reloading](https://docs.vector.dev/usage/administration/reloading)

Edit: found it:
[https://www.elastic.co/guide/en/logstash/current/reloading-...](https://www.elastic.co/guide/en/logstash/current/reloading-config.html)

~~~
binarylogic
They do, and we actually put together a correctness test for this behavior:

[https://github.com/timberio/vector-test-harness/tree/master/...](https://github.com/timberio/vector-test-harness/tree/master/cases/sighup_correctness)

Logstash's reload is not graceful, though: our testing shows that it basically
shuts the pipeline down and starts it again.

------
nh2
For clarification, does "mib/s" mean "Mbit/s"? (Lowercase b usually stands for
bits, and uppercase B usually for bytes.)

If yes, how come log processing runs at such low throughput in general?

That is not to talk down your achievements (as per your benchmark page, you do
better than similar projects in terms of throughput), but I'm genuinely
curious why modern machines with 40 Gbit/s of memory bandwidth are capped at
(in your case) 76.7 Mbit/s. What's the bottleneck?

~~~
amanzi
The capitalisation is confusing, but "Mi" means "mebi": either Mib for
mebibits or MiB for mebibytes. The correct term for 1024 × 1024 bits is a Mib,
and 1024 × 1024 × 8 bits is a MiB.
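
In concrete numbers (just a quick sanity check of the 8x factor, nothing
Vector-specific):

```python
# Binary-prefix units: Mib = mebibit, MiB = mebibyte.
MIB_BITS = 1024 * 1024             # bits in 1 Mib
MIB_BYTES_BITS = 1024 * 1024 * 8   # bits in 1 MiB

# Misreading one unit for the other is an 8x error in any throughput figure.
print(MIB_BYTES_BITS // MIB_BITS)  # 8
```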

~~~
nh2
That is correct, but it doesn't answer the question of whether it's bytes or
bits (which makes an 8x difference).

Given that the reported values don't care about the casing of the "m"
(lowercase "m" would mean "milli", which clearly doesn't make sense for
bytes), I don't think we can rely on the casing of the "b" to tell us the
answer.

------
zackkitzmiller
Congrats to the Timber team on this one. Ben and Zach are a couple of former
colleagues of mine.

------
asprouse
What is a high level example of why I'd use this?

~~~
lukes386
Hi! I work on Vector. For a motivating example, let's say you have an
application fronted by nginx. Using Vector would allow you to ingest your
nginx logs off disk, parse them, expose status code and response time
distributions to prometheus, and store the parsed logs as JSON on S3.

There are obviously plenty of ways to accomplish that same thing today, but we
believe Vector is somewhat unique in allowing you to do it with one tool,
without touching your application code or nginx config, and with enough
performance to handle serious workloads. And Vector is far from done! There's
a ton more we're working to add moving forward (thinking about observability
data from an ETL and stream processing perspective should give you a rough
idea).
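
As a sketch of that pipeline in Vector's TOML config (the component and option
names below are illustrative rather than copied from the docs, and the regex
and bucket name are hypothetical, so treat this as pseudo-config):

```toml
[sources.nginx_logs]
type = "file"
include = ["/var/log/nginx/access.log"]

# Pull structured fields (method, path, status, ...) out of each line.
[transforms.parsed]
type = "regex_parser"
inputs = ["nginx_logs"]
regex = '"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'

# Derive Prometheus-friendly metrics from the parsed events.
[transforms.status_metrics]
type = "log_to_metric"
inputs = ["parsed"]

[[transforms.status_metrics.metrics]]
type = "counter"
field = "status"
name = "responses_total"

[sinks.prom]
type = "prometheus"
inputs = ["status_metrics"]
address = "0.0.0.0:9598"

[sinks.archive]
type = "aws_s3"
inputs = ["parsed"]
bucket = "my-parsed-logs"    # hypothetical bucket
encoding = "ndjson"
```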

~~~
geodel
Our company uses Splunk. I am not on the admin/ops side, so I'm possibly
missing details. The way I understand it is that there is a Splunk forwarder
running on our app servers, and then there is a Splunk server URL from which I
get consolidated logs in the browser, where I can search and run many other
statistical functions.

So is Vector like the Splunk forwarder, or more than that?

~~~
lukes386
Vector can act as a Splunk forwarder, but is designed to be much more
flexible.

In addition to forwarding to more storage systems (S3, Elasticsearch, syslog,
etc), Vector can do things like sampling logs, parsing them, and aggregating
them into metrics. Depending on your needs, this makes it easier to reduce
your Splunk volume and reduce costs, transition to something like an ELK
stack, etc.

We're also working to build up the metrics side of Vector's capabilities. In a
way, you can think of Vector as a stream processing system for observability
data, capable of feeding into a variety of storage backends.

~~~
geodel
Thanks. This is all very interesting. I should try it on our app servers.

~~~
lukes386
Thanks for your interest! And please feel free to get in touch if you have any
questions or feel there are things we could do to better support your use
case: [https://vector.dev/community/](https://vector.dev/community/)

------
scurvy
Heka, but in Rust instead of Golang?

~~~
tveita
There is also a Heka but in C:

[https://github.com/mozilla-services/hindsight](https://github.com/mozilla-services/hindsight)

Unfortunately, deploying Hindsight isn't as nice as Heka, since you need to
compile it yourself with all the Lua extensions you need, and the
documentation is very disorganized.

Vector looks great on those counts, will be excited to try it if they get
features like reliable Vector-Vector transport and more flexible file
delimiters.

~~~
sciurus
Hindsight isn't exactly Heka in C; it's useful for data processing but not as
a general-purpose log shipper. Currently we're using fluentd for the latter.

(Disclosure: I work for Mozilla on the team that runs services used by Firefox
users and developers.)

------
Dowwie
This is so exciting! An enterprise-grade solution for log workflows. For those
unfamiliar with the Rust ecosystem: this project (Vector) addresses the 'L'
within the ELK stack, and probably more.

------
the_duke
There already are a lot of projects in this space.

While better performance is always great, most are already plenty fast for the
majority of use cases.

The main power comes from the multitude of inputs and outputs. Vector has a
lot of catching up to do there. But if they manage to offer a noteworthy
performance gain... one more is always a good thing.

PS: the Logstash numbers seem suspiciously low. I'd bet it's some JVM config
issue; Logstash can slow to a crawl if it does not have enough memory.

~~~
leetbulb
Yep, I push about 50-100MB/s through a single instance of Logstash (Redis
(list) -> S3). That configuration is not in the benchmark table, but surely
it's more demanding than TCP -> Blackhole, TCP -> TCP, etc.

Regardless, Vector looks very nice and I'll be testing it out :)

~~~
the_duke
They are using the default config with 1GB of memory. Sadly, that's absolutely
nothing for Logstash.

I reported an issue in their test harness.

~~~
otterley
IMO that's totally fair -- 1GB is more than adequate to run a log forwarder.
The fact that Logstash can't perform well under such conditions is worthy of
consideration when selecting a solution.

~~~
the_duke
It's perfectly valid to mention the higher memory requirements as a
considerable drawback.

But this is far from the possible performance that Logstash offers, so the
comparison is at least misleading.

Also, there is no information about the memory usage of the other contestants,
which probably don't have memory limits if they don't run on the JVM.

I say all this as someone who stopped using Logstash, in part due to the high
requirements - but benchmarks should strive for a fair comparison.

------
peter_l_downs
Congratulations to the Timber team! I've had the pleasure of doing a small
amount of contract work for them, have nothing but praise for them all.

------
gnufx
When people say "high performance" about these things, I wonder how they
compare, for instance, with the Sandia tools.* One thing that matters there is
avoiding system noise (jitter) on the monitored systems, with transport over
RDMA.

* [http://ovis.ca.sandia.gov/](http://ovis.ca.sandia.gov/)

------
heliostatic
Love Timber.io, so it's great to see some of your innovations coming back out
as open source projects. Thanks!

------
seruman
Flume, but in Rust?

[https://flume.apache.org/](https://flume.apache.org/)

~~~
the_duke
And about 50 other projects in all kinds of languages...

------
warpspin
A pity it does not support at-least-once or exactly-once delivery for the
Vector sink.

Also, the documentation seems to be missing information about which sinks
support TLS.

We're currently looking for a distributed-over-the-internet logging setup and
are interested in evaluating Rsyslog/RELP alternatives.

------
rishiloyola
How do you handle backpressure?

Do you have a specific module for it?

------
eeZah7Ux
All this bombast and hubris is very unprofessional.

------
li4ick
I wonder how popular posts on Hacker News would be if people didn't mention
"written in Rust" in the title.

~~~
thenewwazoo
Probably significantly less, but I suspect "written in Rust" is part of what
makes these posts interesting. People have an interest in what you can _do_
with this language they're constantly hearing about (and hearing people rave
about). There are lots of incredible languages that few people are _using_ for
real products. "Written in Rust" is therefore interesting on multiple axes.

