
Grafana Loki – Like Prometheus, but for logs - bovermyer
https://github.com/grafana/loki
======
kureikain
I still remember when Grafana first came out, because of InfluxDB. It was
amazing at the time. I tried to set up Graphite and eventually found
Grafana+InfluxDB and went with it from there.

Grafana moves really fast. Very practical. If I remember correctly, it first
used Elasticsearch/InfluxDB to store its own config. Then it moved to its own
SQLite, then MySQL/Postgres. Eventually it added alerting, and multiple data
sources.

Nowadays my team even uses it with SQL as a cheap version of Periscope/Mode.

And now they've figured out this log thing. Elasticsearch is great, but try to
run it yourself. On a 50-node k8s cluster, I bet your first ES config will go
down within 10 minutes, the first time Fluentd comes up and sends every log
under the sun since the cluster started.

Given that the Grafana folks know how to deal with visualization, I trust them
to deliver a great experience again for logs.

Looking at their history, I'm going to bet hard on this.

~~~
outworlder
> And now they've figured out this log thing. Elasticsearch is great, but try
> to run it yourself. On a 50-node k8s cluster, I bet your first ES config
> will go down within 10 minutes, the first time Fluentd comes up and sends
> every log under the sun since the cluster started.

Tell me about it. I have not one, but multiple k8s clusters (some over 50
nodes), sending data to a single Elasticsearch cluster. A lot of tuning was
done, and it's not yet perfect.

Elasticsearch is amazing, but it requires external support, and the tools
available to provide that are not on par. Curator specifically – it's dumb as
a rock and very limited in what it can do. Even a hot/warm architecture is a
challenge with Curator alone; you should ideally have custom scripts to manage
it. Which is a shame, as this is one of the "reference" architectures from
Elastic. Whatever Curator does should really be part of ES itself.

The Kibana + ELK combo won't give you things like log tailing (logtrail is
hackish and hard on the servers). The log forwarders are horrible to work with
(be it logstash, filebeat – or worse: fluentd).

Some days it just works and you are happy. Some other days, you wonder why we
moved away from syslog senders and text files...

~~~
kureikain
I actually came to the same conclusion as you. ES isn't easy to tune and run
well.

A thing many overlook is restarting it. You cannot just go and `systemctl
restart` it; once that happens, constant rebalancing kicks in, which puts load
on the rest of the system and eventually brings the other nodes down, all
around :(. That makes it harder to operate in K8s, where it also needs time to
detach/attach volumes, plus the overhead of the overlay network.
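
For what it's worth, the usual workaround is to disable shard allocation
before the restart and re-enable it afterwards. A minimal sketch, assuming the
ES REST API is reachable on localhost:9200 (this is the standard
rolling-restart procedure, not anything Loki-specific):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// setAllocation PUTs a transient cluster setting: "none" before the
// restart, "all" once the node has rejoined the cluster.
func setAllocation(mode string) error {
	body := fmt.Sprintf(`{"transient":{"cluster.routing.allocation.enable":%q}}`, mode)
	req, err := http.NewRequest(http.MethodPut,
		"http://localhost:9200/_cluster/settings", bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	if err := setAllocation("none"); err != nil { // before `systemctl restart`
		panic(err)
	}
	// ... restart the node, wait for it to rejoin ...
	if err := setAllocation("all"); err != nil {
		panic(err)
	}
}
```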

So I never found a perfect tuning myself. But the log search in Kibana is
super helpful.

That's why I'm kind of sold on this Loki thing. I have a good feeling that
software written in Go tends to require less config/tuning compared with Java;
e.g., InfluxDB.

> Some other days, you wonder why we moved away from syslog senders and text
> files...

I wonder the same. I actually prefer grepping and tailing logs over a crappy
web UI like Kibana, Sumologic, etc.

Tailing logs in a web UI is clunky.

~~~
styfle
> once [a reboot] happens, constant rebalancing kicks in, which puts load on
> the rest of the system and eventually brings the other nodes down, all
> around

I have this same problem with a 3-node cluster, no Kubernetes. This was using
Elasticsearch 2.3; it worked for a year or two, and then all of a sudden any
network interruption would cause the thing to get split-brain and all indices
would go red. It would take 8+ hours to go back to green, and it usually
required a reboot of all servers or else it would never recover.

It was happening often enough that the decision was to just run a single ES
node, since the total data was only 25GB :(

------
akavel
I'd be interested in replacing an in-house log filtering/aggregation solution
with Loki, but there's a lot that is not clear to me from the initial
announcement & materials. Could you please help me with the answers to the
following questions?

- Can I use Loki without Prometheus? I'd like to feed raw logs to it, with
custom-generated metadata. I don't want to have to use Prometheus, nor
InfluxDB.

- Can I edit (modify) the metadata for an old log line after it was already
inserted? Specifically, I need to be able to rebuild metadata later if I add
some new "filters" to my logs (I want to be able to apply them retroactively).

- Can I run aggregate queries on the metadata (sum, avg, min/max)? If not,
what can I do with the metadata? Can I graph the metadata on normal Grafana
graphs?

- Is the text of the logs compressed? If yes, what compression algorithm is
used? If not, why?

- Where can I find the API docs (or at least the API source code) for Loki?

~~~
gouthamve
1. Yes, you can. We initially targeted Kubernetes because that's what we use
and packaging for it is simple. We'll soon release packages for all distros,
support syslog/journald, etc.

2. Hmm, could you please open an issue describing the use-case that prompted
this? It's not a use-case we have, but if it's important for more people,
we'd be happy to support it!

3. You can only select based on the metadata; the metadata is just a set of
kv pairs, like {app=cassandra, namespace=prod, instance=cassandra-minion-0}
(see the sketch after the links below).

4. Yes, we use gzip. Please see the sheet referenced at the end of the design
doc [0] for what we compared it against.

5. It's mainly protobufs over HTTP right now [1], but we'll be adding more
docs soon. Mind opening an issue for this so that we don't forget?

[0]
[https://docs.google.com/document/d/11tjK_lvp1-SVsFZjgOTr1vV3...](https://docs.google.com/document/d/11tjK_lvp1-SVsFZjgOTr1vV3-q6vBAsZYIQ5ZeYBkyM/edit)
[1]
[https://github.com/grafana/loki/tree/master/pkg/logproto](https://github.com/grafana/loki/tree/master/pkg/logproto)
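
To make answer 3 concrete, here is a minimal sketch of what "selecting on
metadata" means – matching a stream's key/value label set and returning its
lines. The types here are hypothetical, not Loki's actual API:

```go
package main

import "fmt"

// Stream pairs a label set such as
// {app=cassandra, namespace=prod, instance=cassandra-minion-0}
// with the log lines it produced.
type Stream struct {
	Labels map[string]string
	Lines  []string
}

// matches reports whether every selector pair is present in the labels.
func matches(labels, selector map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	streams := []Stream{
		{Labels: map[string]string{"app": "cassandra", "namespace": "prod"}, Lines: []string{"compaction done"}},
		{Labels: map[string]string{"app": "nginx", "namespace": "prod"}, Lines: []string{"GET / 200"}},
	}
	selector := map[string]string{"app": "cassandra", "namespace": "prod"}
	for _, s := range streams {
		if matches(s.Labels, selector) {
			fmt.Println(s.Lines)
		}
	}
}
```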

~~~
akx
Hm, why is zstd marked as "BROKEN"? zstd would've been my first choice for
something like this.

------
netingle
Author here! Really excited to release this at KubeCon; happy to answer any
questions you might have.

~~~
zellyn
I'm curious about this, because at Square we maintain our own homegrown log
aggregation system, and it's not really a core competency.

While a _lot_ of our logging needs seem like they would be fulfilled by this
system — because we attach trace IDs to log messages, and because (at least in
Payments) you can usually find the appropriate trace ID by searching for a
Payment ID, which could be annotated too — there are definitely many times
I've copy/pasted the text in quotes from a log-generating line of code in a
Java or Go file, to find out if it's being executed, or as a handle into a
subsection of code/logging.

In the linked design doc, you include a motivating tweet near the top, saying,
“just give me log files and grep, I am dying”. But unless I'm misreading
things, there's no `grep` here. Right?

I'm guessing you could narrow down (using metadata) and _then_ grep, but if
the narrowest metadata you have is app name and time range, you're still going
to be grepping over a lot of data…

~~~
netingle
It's definitely something that's missing from the readme, and perhaps not that
obvious in the Grafana Explore view either - but it is there! You can push a
regexp match server-side and have it distributed to each Loki node, giving you
distributed grep.

We'll make it more obvious. Davkal has an iteration of the UI that makes it a
separate field, which will also help.

~~~
GordonS
Maybe I'm not understanding this - the docs say Loki is all about storing
compressed log data with metadata, such that only the metadata is indexed. Are
you saying you can search the compressed, unindexed data using regex? If so,
wouldn't that potentially be incredibly slow?

~~~
gouthamve
The good thing about this is that the grepping can be parallelised and
distributed onto several nodes. Having said that, once you select the right
metadata, you should be able to narrow things down enough for the queries to
be snappy.

While this will definitely be slower than something that indexes the contents,
you'll be able to store much more in Loki at much lower costs.

~~~
zacmps
What are you using to run the regex? ripgrep could make up for some of the
loss from not having it indexed.

~~~
ecnahc515
[https://github.com/grafana/loki/blob/master/pkg/iter/iterato...](https://github.com/grafana/loki/blob/master/pkg/iter/iterator.go#L252:6)

Looks like the Go regex lib, which isn't super performant, so it could
potentially be improved if it ends up being an issue.
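
For reference, a minimal sketch of that kind of per-line regexp filtering in
Go – roughly the idea behind the linked iterator, though the data and
surrounding code here are made up:

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

func main() {
	logs := `level=info msg="starting"
level=error msg="timeout talking to cassandra"
level=info msg="done"`

	re := regexp.MustCompile(`level=error`)
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		// Each querier applies the regexp to its own chunks, which is
		// what makes the grep "distributed".
		if line := scanner.Text(); re.MatchString(line) {
			fmt.Println(line)
		}
	}
}
```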

------
gnur
Looks like a good light-weight alternative to setting up a full Elasticsearch
cluster for some light log insights.

I've been kind of building something similar myself, but this looks much
better.

The free cloud demo does look like it is getting hammered right now...

~~~
netingle
>Looks like a good light-weight alternative to setting up a full
Elasticsearch cluster for some light log insights.

Exactly! Elastic is a really powerful system, but I think there is growing
sentiment that it's overkill for container logs. This is exactly where we see
Loki really helping - almost complementing Elastic, even.

> I've been kind of building something similar myself, but this looks much
> better.

:blush: thanks! We'd love your input on Loki too...

> The free cloud demo does look like it is getting hammered right now...

Yeah, looks like I'm going to be scaling that all day... Our motivation for
offering the free service for the next few months is to really battle-harden
the system - anyone sending us data is really helping us iron out the kinks
and improve the open source project. It's early days, but expect it to get
much better over the coming weeks and months...

~~~
skrebbel
> Elastic is a really powerful system, but I think there is growing sentiment
> that it's overkill for container logs

I don't know much about kubernetes, but Loki looks super interesting for our
application logs too.

Could someone maybe ELI5 what the difference is between "container logs" and,
well, any other kind of logs? Don't most dockerized applications send their
stdout to Docker, and aren't those, then, the container logs? What kind of
logs _aren't_ "container logs" and therefore a better fit for Elasticsearch
than Loki?

Thanks :-)

~~~
gouthamve
I work with Tom on Loki. When we said we wanted to focus on Kubernetes, we
meant we wanted to correlate metrics with logs and focusing on Kubernetes lets
us do that well.

Having said that, this will work for any logs, as long as you can tag them
meaningfully. We'll soon be releasing packages for all major distros, plus
journald support.

------
retzkek
So this is something like mtail [1], with the ability to also drill down into
the raw logs, and integrated into Grafana? Sounds great!

We're heavy users of Grafana, Graphite, Prometheus, and Elasticsearch, and
while the latter started in a BI role, it's expanded to take on pretty much
anything we can throw at it. However, there are still tons of system and
service logs we're not gathering yet, because the effort to get them to
Elasticsearch and store/maintain them is not worth it, especially since the
value is not always clear until well _after_ the pipeline is set up.

I'm definitely excited to see what Loki can do for us, and I just upgraded one
of our test Grafanas to nightly to start playing with Explore. Looks great!

[1] [https://github.com/google/mtail](https://github.com/google/mtail)

~~~
davkal
Grafana Explore author here. It's still in beta and we'd love feedback. Simply
open an issue at [0].

What we tried to get right is the seamless switch from a Prometheus query to a
Loki query: it retains the labels of the query to find the logs that come from
the same, e.g., "job". The assumption is that you keep your relabelling rules
consistent between Prometheus and Loki.

[0]
[https://github.com/grafana/grafana/issues/new](https://github.com/grafana/grafana/issues/new)

------
arianvanp
Serious question. Why not just put logs in postgres? Rich query language.
Indexes. Has support for time-series based indexes. Optionally can support
full text search if needed.

~~~
ohthehugemanate
Apart from the query support, which others have mentioned, there's the sheer
scale of data. The most recent time series implementation I worked on had to
ingest terabytes per minute from a wide variety of sources in a single
factory... And the goal was a single system combining every factory worldwide.
Easily measured in exabytes per hour, with thousands of sources. A traditional
relational database - even a really good one like postgres - just isn't built
with that kind of use case in mind. There really is something to a database
specifically designed for a particular kind of query on ridiculous quantities
of data.

~~~
hnaccy
According to my napkin math that's more than the LHC's raw output, do they
really need all that for factories? Seems nuts.

~~~
TylerE
Need? Probably not. But often you don't know what metrics you care about until
something goes wrong - so you over-engineer it and log all the things.

------
kbenson
We have way too many technologies in our industry with names borrowed from
culture.

Before clicking to look at the comments, I totally thought this was about some
interesting creation mythos with a character that taught humanity how to
harness wood from nature to build the first wood dwellings.

I mean, that's totally something that could feasibly get posted to HN and do
well.

~~~
netingle
There are only two hard problems in computer science: cache invalidation and
naming things.

~~~
cjslep
> There are only two hard problems in computer science: cache invalidation,
> naming things, and off by one errors.

That's how I learned the phrase.

~~~
netingle
And Loki has all three ;-)

------
burtonator
We built a system similar to this at Datastreamer to store the logs from our
web crawl.

We used Grafana + KairosDB and turned the logs into tags essentially.

Our entire production system had easy 'taps' that you could annotate to
monitor everything about our crawler.

For example, number of HTTP requests, their status codes, the language of the
content.

We also recorded intersections of the tags, like lang+domain.

The downside of a system like this is that you have to know all your metrics
a priori... if you don't, and you need them at runtime, you're out of luck.

The UPSIDE is that you use like 1/100th of the total size you would originally
need for raw logs.

What we found is that you quickly converge on the tags you need and then you
don't end up adding many more.

Everyone says disk is cheap, but in our situation the logs outpace the data we
actually collect. We'd have thousands of petabytes of logs by now.
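
As a minimal hypothetical sketch (not Datastreamer's actual code), the idea is
that each log event increments tagged counters, including precomputed
intersections like lang+domain:

```go
package main

import "fmt"

func main() {
	counters := map[string]int{}
	// tap increments one counter per tag for a single event.
	tap := func(tags ...string) {
		for _, t := range tags {
			counters[t]++
		}
	}

	// One crawled page yields several tagged increments.
	tap("http_requests", "status:200", "lang:en", "domain:example.com",
		"lang+domain:en/example.com") // recorded intersection
	tap("http_requests", "status:404", "lang:de", "domain:example.org",
		"lang+domain:de/example.org")

	fmt.Println(counters)
}
```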

~~~
aarbor989
Is it just me, or can no one else find how to actually send log data to Loki
either?

------
GordonS
From the design doc:

> We will be able to pass on these savings and offer the system at a price a
> few orders of magnitude lower than competitors. For example, GCS cost
> $0.026/GB/month, whereas Loggly costs ~$100/GB/month[0]

Maybe I'm misunderstanding, but I don't see how Loggly costs anything like
$100/GB/month. It seems clear to me that the $99 plan costs $100/GB/ _day_
(i.e., ~$100 for 30GB/month). Not exactly cheap, but nowhere near as expensive
as the design doc reckons.

[0] [https://www.loggly.com/plans-and-pricing/](https://www.loggly.com/plans-
and-pricing/)

~~~
gouthamve
Yes, you're correct. We'll correct the mistake in the doc.

~~~
GordonS
This premise was very far off, and cost-effectiveness is key to the design
rationale - do you think this impacts the utility and attractiveness of Loki?

~~~
gouthamve
Yes, we have an internal doc where we compared with more providers (including
Loggly) and got the pricing right. The line currently in the design doc is a
typo.

The rationale still holds, and one of the primary reasons we built Loki is to
have an easy-to-manage, scalable, open-source solution.

I personally have been told to log less at a previous company because of the
associated costs, and I don't think I utilised anything beyond grep (with a
few exceptions). I feel the trade-offs are right, and a simple
greppable/tailable solution that is cheap is missing from the ecosystem.

~~~
GordonS
A good response, thanks for confirming that you've built this with the correct
pricing in mind.

------
ryeguy_24
Where do people recommend logging infrastructure changes (e.g. database
changes, server configuration) for all things potentially not in code?

~~~
bmurphy1976
We need more context. Server configuration should definitely be in code/source
control these days. Database changes, well, what kind of changes are you
talking about?

~~~
ryeguy_24
These days, as we have multiple departments and services, it's hard to track
down any issue we identify without knowing what kinds of changes might have
caused it. As we don't have a single development platform (multiple
acquisitions), this is hard for us, at least until we have unified systems
(which will take time). I'm referring to any infrastructure or operational
change to the stack: for example, deployments/rollouts, scaling servers up or
down, port changes, updates to software on key servers, changes to database
table structure, manual changes to database data. Any changes at all to the
production stack.

------
rusbus
If you're just tailing logs on the server (or with kubectl) but need a bit
more power, angle-grinder [1] is pretty good (and of course doesn't need to be
hosted or anything, it's just a CLI app)

[1] [https://github.com/rcoh/angle-grinder](https://github.com/rcoh/angle-
grinder)

------
webo
Is this the same technology the kausal.co guys had? It's a shame it was just
shut down after the acquisition.

~~~
netingle
This is a continuation of our ideas from Kausal, yes! Really glad we joined
Grafana as they have given us the time and resources to pursue this.

And it's not shut down at all! Everything we did at Kausal is now part of
Grafana and Grafana Cloud - David's PromQL-completion UI is in Grafana v6 as
the Explore view, and Cortex is the backend for Grafana Cloud's hosted
Prometheus...

~~~
webo
Thanks! What about the distributed tracing feature?

~~~
netingle
It's coming :-)

------
baq
if this is like splunk that doesn't make me cry unicorn blood tears every time
an invoice arrives then i'm all in. unfortunately it doesn't look like it and
honestly i'm not sure what it is. 'prometheus for logs' doesn't quite explain
it.

~~~
gouthamve
Hi, please see the design doc for more on why we built it:
[https://docs.google.com/document/d/11tjK_lvp1-SVsFZjgOTr1vV3...](https://docs.google.com/document/d/11tjK_lvp1-SVsFZjgOTr1vV3-q6vBAsZYIQ5ZeYBkyM/edit)

We're soon coming out with a blog post explaining the motivations and
architecture in detail. But yes, the pricing is one of the motivations for
building this, and as logs will be stored in S3 or another object store, the
cost will be several orders of magnitude less.

------
alexk
For the Grafana folks: I got all the way through signing up, sending logs from
a k8s cluster, and setting up the logging data source for the hosted Loki
instance, but I could not find a way to actually explore the logs in the
hosted Grafana instance :)

~~~
alexk
I figured out that you have to click "Explore" and then you see the Logging
tab. It was not easy for me to find, though; maybe highlight it as a
first-class link in the left panel?

------
vdm
[https://www.youtube.com/watch?v=xJSgf835YRE](https://www.youtube.com/watch?v=xJSgf835YRE)

------
davejohnclark
Looks cool, is the source code for promtail available? I can see there's an
image available
[https://hub.docker.com/r/grafana/promtail](https://hub.docker.com/r/grafana/promtail)
(which is used in the readme) but I can't find the source anywhere. Anyone
know where it is?

~~~
netingle
Of course! Everything is open source, all Apache licensed:

[https://github.com/grafana/loki/tree/master/pkg/promtail](https://github.com/grafana/loki/tree/master/pkg/promtail)

~~~
davejohnclark
Cool, thanks. No idea how I failed to find it...

------
victorhooi
This looks similar to Graylog
([https://www.graylog.org/](https://www.graylog.org/))?

From Loki's readme, it still stores the full log (but compressed) and indexes
values in each log line.

~~~
gouthamve
Hi, we don't index the values; rather, we index metadata about the log
stream - for example, service name, instance IP, and things like that - not
the actual contents of the log lines.

Inside Kubernetes, the metadata would be pod name, namespace, deployment name,
container name, etc...

------
hsnewman
What is the purpose of log aggregation without indexing?

~~~
netingle
That's a really good question. I feel like it's pretty well covered in the
design doc[0], and you should also read about OK Log[1], which really
championed this idea.

To be clear: Loki indexes metadata about the streams, and the streams
themselves are indexed by time. We don't full-text index the streams, though.
We think that, when combined with metrics, this represents a nice trade-off in
terms of complexity vs. features. This simplification not only makes Loki
easier to understand, but also easier to scale and operate.

[0]
[https://docs.google.com/document/d/11tjK_lvp1-SVsFZjgOTr1vV3...](https://docs.google.com/document/d/11tjK_lvp1-SVsFZjgOTr1vV3-q6vBAsZYIQ5ZeYBkyM/view)
[1] [https://github.com/oklog/oklog](https://github.com/oklog/oklog)

------
matryer
Awesome, can't wait to dig into this. Nice work.

------
rane
I ran the `curl ... | kubectl` command on my cluster.

What would be the reason that promtail is not tailing log files for all of the
running pods?

------
bogomipz
Does this integrate with Prometheus's Alertmanager then for sending alerts
based on certain logging events?

------
monstrado
Is there a way to send logs to Loki without using the tail agent? For example,
a REST API for storing messages?

~~~
netingle
There is a REST API, yes [0]. Currently you send snappy-compressed
protobufs[1] containing your logs to an HTTP endpoint.

We initially started with fluentd as the agent, but we found its metadata
"enrichment" facilities weren't reliable enough - we'd get log lines without
the pod tags, for instance. For something like Loki, which depends really
heavily on the metadata for its index, this was super important. So we wrote
promtail.

[0]
[https://github.com/grafana/loki/blob/master/docs/api.md](https://github.com/grafana/loki/blob/master/docs/api.md)
[1]
[https://github.com/grafana/loki/blob/master/pkg/logproto/log...](https://github.com/grafana/loki/blob/master/pkg/logproto/logproto.proto)
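
For anyone wanting a concrete starting point, here is a minimal sketch of such
a push in Go. The endpoint path, port, and message layout follow my reading of
the linked api.md and logproto at the time of writing, so treat them as
assumptions rather than a stable contract:

```go
package main

import (
	"bytes"
	"net/http"
	"time"

	"github.com/gogo/protobuf/proto"
	"github.com/golang/snappy"
	"github.com/grafana/loki/pkg/logproto"
)

func main() {
	// One stream, identified by its label set, carrying one log entry.
	push := &logproto.PushRequest{
		Streams: []*logproto.Stream{{
			Labels: `{app="myapp", env="prod"}`,
			Entries: []logproto.Entry{{
				Timestamp: time.Now(),
				Line:      `level=info msg="hello loki"`,
			}},
		}},
	}
	raw, err := proto.Marshal(push)
	if err != nil {
		panic(err)
	}
	// Loki expects the protobuf payload to be snappy-compressed.
	resp, err := http.Post(
		"http://localhost:3100/api/prom/push", // assumed endpoint/port
		"application/x-protobuf",
		bytes.NewReader(snappy.Encode(nil, raw)),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
}
```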

~~~
monstrado
Awesome, thanks!

------
nerdbaggy
InfluxDB has this as well: [https://www.influxdata.com/blog/writing-logs-
directly-to-inf...](https://www.influxdata.com/blog/writing-logs-directly-to-
influxdb/)

------
nwmcsween
Why not a simple unix-ish tool to handle metrics? Assuming $prog outputs log
info to stdout: $prog | collect | action. 'collect | action' could be done
somewhat simply with awk, and would be 1000x more portable.

~~~
kryptk
And store it where? And query it how? What will render my NOC dashboards?

With the graphite protocol, the metric stream is already plain text with just
three fields per line (path value time), pumped over a socket to a
collector... but that's where the hard stuff starts.
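
For a concrete picture, a minimal sketch of that plaintext protocol, assuming
a carbon collector listening on its default port, localhost:2003:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "localhost:2003")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	// One metric per line: "path value timestamp".
	fmt.Fprintf(conn, "noc.web01.requests 42 %d\n", time.Now().Unix())
}
```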

~~~
nwmcsween
First, I'm not trashing the project; I'm just wondering why simple unix-like
solutions aren't used.

Store it wherever you want; this isn't a magical datastore that makes things
faster. Use clickhouse-client, whatever, it doesn't matter.

There is a widening disconnect between the unix way and how new projects are
created.

------
xer0x
But I just set up Elasticsearch and Kibana yesterday!

~~~
hathawsh
A few years ago, I set up centralized logging with Elasticsearch and Kibana on
a site with very little traffic (less than 100 requests per minute). After a
few months, it apparently fell behind on indexing and worked the logging
server's hard drives 24x7. After a few more months, the server died. (It
refused to power up and I wasn't interested in going deeper.) I freely admit
that I don't know how to configure Elasticsearch correctly, but I feel like it
should have been able to handle the indexing load in its default
configuration.

I think I'll try out this new logging solution. I only want basic indexing.

~~~
sandstrom
I try to stay away from long-lived processes written in Java. Maybe it's just
me, but I think they consume insane amounts of memory.

------
ecoqba11
Very interesting, will check it out!

------
endian_ogino
Immature as I am, my first thought was that the logo looks like a dick to
me. :|

~~~
jitix
You cannot unsee it lol

But on a serious note, the developers should indeed look into this. I recall
this comment from HN [0], which highlights why neutral branding and naming are
important.

0:
[https://news.ycombinator.com/item?id=14702513](https://news.ycombinator.com/item?id=14702513)

~~~
davejohnclark
I could never unsee it in this old easymock logo
[https://www.javacodegeeks.com/wp-
content/uploads/2012/10/eas...](https://www.javacodegeeks.com/wp-
content/uploads/2012/10/easymock-logo.jpg). They have since changed it to
something much more neutral, thankfully!

------
yamann
can it just replace ELK with a single golang binary? big if it does

~~~
netingle
If all you use ELK for is simple process/container logs, then yes! That's the
whole idea.

But Loki is not a replacement for Elastic - we don’t have any complex query
support, and we don’t do full text search. Elastic is great for analytics or
BI, but we think Loki is the way forward for your container/pod logs.

------
zahreeley
Lauki

------
dkarl
Grafana, are you sure you want to name a piece of software after a guy famous
for lying, betrayal, and other forms of unreliability? Besides, I already know
two cats, a dog, two children, and a Saturn hatchback named Loki. The Saturn
is named appropriately, by the way; let's hope your software isn't.

~~~
netingle
I'm just a really big Marvel fan though - he did redeem himself in the last
film, no?

~~~
buster
In the actual Norse myths, he is a rather bad guy, and in the end he fights
against the gods when Ragnarök (armageddon) comes.

