
Working with the ELK stack - greenonion
http://engineering.skroutz.gr/blog/elk-at-skroutz/
======
lkrubner
I am working at a company that has 75,000 customers, and we keep track of a
fairly large set of personal data about them. The data is typically kept in
PostgreSQL, but for reports we assumed we could dump it out in a denormalized
form to Elasticsearch. We would not dump all the data, of course; we took only
the 18 fields considered most important. We had never done much analysis of
who our customers were or what their level of engagement was. We had a new
person come in, focused on business intelligence, and they were desperate to
get some data about our customers. So I wrote a short Python script that
pulled the data we wanted out of PostgreSQL and stored it in Elasticsearch. I
then made it available to the team via Kibana. I assumed everyone would be
fascinated to look at the data and perhaps see various trends.
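
The script itself isn't shown, but the Elasticsearch side of such a dump usually goes through the bulk API. A minimal sketch of building a bulk payload, assuming one JSON document per customer (the index name, field names, and the `rows_to_bulk_actions` helper are made up for illustration):

```python
import json

def rows_to_bulk_actions(rows, index, doc_type="customer"):
    """Turn (id, field_dict) rows into an Elasticsearch bulk-API body.

    Each document becomes two newline-delimited JSON lines: an action
    header and the document source, as the _bulk endpoint expects.
    """
    lines = []
    for doc_id, fields in rows:
        lines.append(json.dumps({"index": {"_index": index,
                                           "_type": doc_type,
                                           "_id": doc_id}}))
        lines.append(json.dumps(fields))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

# In the real script the rows would come out of PostgreSQL (e.g. via a
# driver like psycopg2) and the body would be POSTed to the cluster's
# /_bulk endpoint; here we only build the payload.
body = rows_to_bulk_actions(
    [(1, {"name": "Alice", "plan": "pro"}),
     (2, {"name": "Bob", "plan": "free"})],
    index="customers")
```

Batching the 75,000 rows into a handful of bulk requests like this is far faster than indexing documents one HTTP call at a time.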

But that didn't work. Kibana was unusable. With 75,000 records it never
loaded, in anyone's browser. So I cut that roughly in half, to 36,000
records, and still it never loaded. So I kept cutting the amount, and
eventually I got down to 10,000 records. Then it loaded, but it was so slow
that no one could use it. So finally I cut it down to 7,000 records. Now it
loaded, and it was fast enough that we could use it.

To do the real analysis, I ended up writing another script that dumped out
the 75,000 records as a CSV file, then uploaded it to a spreadsheet on Google
Docs. This worked fine.
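
That export script isn't shown either; with the standard library it might look something like this (the field names are illustrative, and the in-memory buffer stands in for the actual output file):

```python
import csv
import io

def rows_to_csv(fieldnames, rows):
    """Serialize row dicts to CSV text: a header line, then one line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = rows_to_csv(
    ["name", "plan"],
    [{"name": "Alice", "plan": "pro"},
     {"name": "Bob", "plan": "free"}])
```

In the real script you would pass an open file to `csv.DictWriter` instead of a `StringIO` and stream the PostgreSQL result set through it.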

I am curious why Google Sheets can render 75,000 records but Kibana cannot.
I am also curious what the real use case for Kibana is. If it can't handle
large datasets, then its ability to make pretty charts seems useless -- we
could never get the data in there to make the chart. I assume that other
people will do what I did and use a spreadsheet instead.

~~~
spencera
Kibana dev here, also surprised by your experience. I have 1 million web log
documents stored in Elasticsearch right now, running on my MacBook, and
Kibana runs quite fast IMO. I'd love to debug this further if you wouldn't
mind filing an issue on GitHub.

~~~
lkrubner
Thank you, I will.

------
Karunamon
I would be seriously worried were I working over at Splunk. I know the ES
guys have said, over and over, that they're not gunning for Splunk customers,
but as the ELK stack matures, the reasons to use Splunk and pay their
_downright extortionate_ prices keep shrinking.

I'm assisting with a ~500GB/day cluster right now (with that number expected
to quadruple in the next year or so), and ELK has proven to be an amazingly
resilient and flexible tool.

~~~
Iaks
I've worked with both systems quite a bit and really like them both. The
licensing obviously makes ELK a no-brainer for my personal use. But, to be
fair to Splunk, the knowledge required of its average end user is
significantly lower. For the crowd poking around here, ELK seems like the
obvious choice, but when I'm unleashing a team of helpdesk techs on a set of
data, I would weigh the tools much more carefully.

I'm kind of pointing out the obvious, but it's my two cents for whatever
they're worth.

------
nissimk
Is there a lightweight solution for centralized logging with full-text search
and a nice UI? ELK seems to require some serious RAM and cores even if you
are just talking about Elasticsearch. Is there a similar solution that works
well in just a couple of gigs of RAM at a slower message flow rate?

Is anybody here successfully using ELK in a low ram environment with low
message flow?

~~~
snewman
If you're open to commercial solutions, there are several SaaS choices that
are quite cheap at low log volume. For instance, Scalyr (500MB/day for
$19/month), LogEntries (free up to 167MB/day), or Papertrail (33MB/day for
$7/month). All will provide centralized logging and full text search, and some
form of dashboards and alerting.

Scalyr provides especially powerful features for log parsing and analysis, as
well as integrating system and application metrics, and it's wicked fast --
most searches run in well under 1 second. (Disclosure: I am the founder of
Scalyr.)

If you're interested, the respective web sites are easy to find, or drop me a
line (my email address is in my profile).

~~~
23david
For most teams, I definitely recommend starting with a hosted Elasticsearch
option, and only considering a move to self-managed after validating a POC.
Check out [https://bonsai.io](https://bonsai.io) as a hosted option...

I've helped a bunch of financial and security companies implement and deploy
ELK as a critical infrastructure component. Generally I'm seeing it used to
complement Splunk, AlienVault, and other log/network/security monitoring
solutions.

Happy to help with any questions about deploying and managing ELK in
production.

------
makmanalp
Slightly off topic, but wow -- thank you for the Turkish stemmer! There have
been a few attempts at this, but I'm not sure any worked particularly well.
For those who don't know, Turkish is agglutinative
([http://en.wikipedia.org/wiki/Agglutination](http://en.wikipedia.org/wiki/Agglutination)),
meaning most of a word's meaning is added as suffixes, so stemming is both
more crucial and more difficult -- removing a suffix can completely change
the meaning. Example: "Kitap" is "Book" but "Kitaplik" is "Bookshelf" (or
more literally "Book-container").

Here it is: [https://github.com/skroutz/elasticsearch-analysis-turkishstemmer](https://github.com/skroutz/elasticsearch-analysis-turkishstemmer)
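
Stemmer plugins like this are typically wired into an index as a token filter inside a custom analyzer. A sketch of the index settings, built as a Python dict; the filter type name `turkish_stemmer`, the analyzer name, and the rest of the wiring are assumptions for illustration -- check the plugin's README for the names it actually registers:

```python
import json

# Hypothetical index settings: register the plugin's token filter and
# use it in a custom analyzer. The "turkish_stemmer" type string is an
# assumption, not taken from the plugin's documentation.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_turkish_stemmer": {"type": "turkish_stemmer"}
            },
            "analyzer": {
                "turkish_text": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_turkish_stemmer"]
                }
            }
        }
    }
}

# This JSON body would be sent when creating the index.
body = json.dumps(settings)
```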

~~~
giorgos_tsif
Thank you, maybe we will write a blog post describing this experience! (We
are Greek; we do not know Turkish!)

There is also a Ruby version here:
[https://github.com/skroutz/turkish_stemmer](https://github.com/skroutz/turkish_stemmer)

------
octotoad
I'm currently working on deploying an ELK box for visualising a mix of
financial transaction data and corresponding network application backend
syslog records. So far so good, but frontend Kibana 3 performance can be
really painful at times given the large number of "documents" that can be
involved (anywhere up to 5,000 events per minute).

It seems like displaying multiple histogram panels may be quicker in real
time, given the shorter timeframe, but it would be nice to be able to work
with something like a month's worth of data without major performance hits.

Not sure if this is something specific to Kibana design, Elasticsearch
indexing/search configuration or certain JavaScript engine behaviour.

~~~
charkost
At skroutz.gr we have Kibana 3 dashboards covering 61,269,264 documents with
the last month as the time window, and still no performance hit. Check the
cluster resources in our blog post to get an idea.

------
greenonion
Hi, OP here, we're really excited that this made the HN front page! Feel free
to ask us anything.

~~~
kiyoto
For the logging layer, have you considered/looked at Fluentd? Disclaimer: I am
a Fluentd maintainer and would love to hear feedback from folks.

------
Thaxll
The problem I have with ELK is Kibana: it's not a tool designed for watching
logs but for reporting.

~~~
mdaniel
That depends on how one defines "watch logs". Looking at the text, no; looking
at the metrics, yes.

We had a dashboard that graphed the number of errors, grouped by component,
over a given span of time. If we saw the chart grow beyond our tolerance,
that's when we would break out elasticsearch-head or grep or $tool and dig
into the details.

Do you have a tool in mind that does what you are describing and thus would
supplement Kibana?
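
A dashboard like that boils down to a date histogram bucketed by a field. A sketch of the underlying Elasticsearch query, built as a Python dict; the `level` and `component` field names are made up, and `component` is assumed to be a non-analyzed field:

```python
import json

# Count error-level log documents per hour, sub-bucketed by component.
# The "level" and "component" field names are illustrative; "size": 0
# suppresses the raw hits so only aggregation buckets come back.
query = {
    "query": {"term": {"level": "error"}},
    "aggs": {
        "over_time": {
            "date_histogram": {"field": "@timestamp", "interval": "1h"},
            "aggs": {
                "by_component": {"terms": {"field": "component"}}
            }
        }
    },
    "size": 0
}

# This JSON body would be POSTed to the index's _search endpoint.
body = json.dumps(query)
```

Kibana generates queries of roughly this shape behind its histogram panels, which is why it works well for the "metrics view" of logs even when reading raw log lines through it is painful.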

------
helfire
Just started playing with ELK last weekend; this guide got me up and running
pretty quickly:
[http://evanhazlett.com/2014/11/Logging-with-ELK-and-Docker/](http://evanhazlett.com/2014/11/Logging-with-ELK-and-Docker/)

------
Dowwie
Hey, thank you for sharing your experiences. I am a little confused about
this architecture decision, though, if the goal is to offer business insight.

I don't understand why usage metrics that are used to calculate business
analytics would be logged in a data store that is separate from business
transactions and reporting related data. How would you run cohort studies or
track funnels?

------
ed_blackburn
I'm in an environment that has dismissed the idea of ELK in favour of SCOM.
I'll be interested to see how the SCOM world progresses. I'm currently
underwhelmed by the SCOM story, both for pushing data into it and, more
alarmingly, for getting web-based dashboards published without great expense.
Anyone have, or know of, any success stories?

(Sulks off, pining for ELK.)

------
ehurrell
The ELK stack is great: powerful, relatively easy to set up and use, and very
configurable. I'd definitely consider it for any dashboard prototyping, given
that I've worked places where the 'MVP' dashboard could have been mocked up
in days with ELK rather than the months it took otherwise.

~~~
giorgos_tsif
so true about the 'MVP' dashboard

------
elementai
We've just finished migrating our ELK stack from cloud VPSes to bare metal to
cut costs. I was, and still am, impressed by how smoothly that went,
including the upgrade from 0.9x to 1.4.

------
Epicism
Has anyone been able to do any sort of correlation or alerting with ELK?

