
Using ElasticSearch and Logstash to Serve Billions of Searchable Events - twakefield
http://www.elasticsearch.org/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers/
======
justinsb
I've used ElasticSearch & Logstash (& Kibana, also shown here), and I've been
nothing but impressed.

My primary concern was how it would cope when data exceeded RAM (I like to
call this the MongoDB fallacy). However, based on this post it looks like the
ElasticSearch setup works just fine.

~~~
ralphm
Yeah, it takes a bit of tuning, depending on what you want to do with
Elasticsearch. There are so many different use cases. In my case, I found that
capping the field data cache at 40% of the heap helped quite a bit.

~~~
justinsb
I'd love to see a blog post that goes into detail about those issues: what to
measure, what to tune, etc.

~~~
ralphm
What I found most important is to monitor Elasticsearch while doing that
tuning. That's when I set up Graphite and StatsD and
[https://github.com/ralphm/vor](https://github.com/ralphm/vor).

First off, you need to make sure Elasticsearch can lock a chunk of memory
(using mlock). About half of the available RAM is a good size, since other
system processes need some memory too, and not everything is on the heap.

You want to watch how many items you are indexing and how much of the heap the
field data cache is using while doing queries. By default, ES tries to keep
the total heap usage at about 3/4 of the allocated memory. The types of
queries matter, too: e.g. if you facet or sort on fields that have many
different values, this will fill up the field data cache in no time.
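As a minimal sketch of the settings described above, assuming a 0.90-era Elasticsearch configuration (the percentage is the figure mentioned here, not a general recommendation):

```yaml
# elasticsearch.yml — memory-related settings
# Lock the JVM heap into RAM so it can never be swapped out.
bootstrap.mlockall: true

# Cap the field data cache so faceting/sorting on high-cardinality fields
# can't consume the whole heap.
indices.fielddata.cache.size: 40%

# The heap itself (about half of RAM) is set outside this file,
# e.g. via the ES_HEAP_SIZE environment variable.
```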

Is that the kind of information you're after? I can go into more detail if you
have a more specific question.

~~~
justinsb
That's great stuff - thanks. Just trying to collect tips from those who have
been there, so that when I get there I have something to start from!

~~~
ralphm
You're most welcome. Drop me a line any time.

------
demmonoid
Have you considered any Graphite alternatives? I found it to be very slow at
rendering data, especially when multiple series are used as sources. It also
lacks some features that graph-rendering software is generally supposed to
have, at least as of the half-year-old version I used.

~~~
ralphm
I did look at the greater landscape. What I found great about Graphite is the
number of functions you have for massaging the metrics. Not all metrics come
in the same form (total number of bytes received since boot, versus number of
bytes received per second), and functions like `hitcount` and `perSecond` (in
the upcoming 0.10) really help.
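As a sketch of that kind of massaging (the metric names below are hypothetical), two target expressions Graphite already ships:

```
# counter-since-boot → change between consecutive datapoints
derivative(stats.bytes_received_total)

# per-second rate → total number of hits in each one-minute bucket
hitcount(stats.requests, "1min")
```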

It can indeed be slow, and this is usually an I/O issue. The Whisper storage
backend does a lot of seeks, and people recommend using SSDs to deal with
that. Also, you can have graphite-web just give you the (calculated) metric
data for a particular query in JSON, and have it rendered client-side.
[http://graphite.readthedocs.org/en/latest/tools.html](http://graphite.readthedocs.org/en/latest/tools.html)
lists a few.
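For client-side rendering, the request is just graphite-web's render endpoint with `format=json` (hostname and metric are placeholders; quote characters need URL-encoding in practice):

```
http://graphite.example.org/render?target=summarize(stats.bytes_received,"1min")&from=-1h&format=json
```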

Finally, we are investigating other storage backends for more fault tolerance.
Probably we'll settle on something based on Cassandra, like
[http://blueflood.io/](http://blueflood.io/).

~~~
brasetvik
Have you looked into using Elasticsearch's date histograms as sources for
Graphite-functions?

Jordan Sissel wrote a short note about it over here:
[https://gist.github.com/jordansissel/3760225](https://gist.github.com/jordansissel/3760225)

That'll obviously be very expensive when the resolution is very high, though.
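For reference, a date histogram facet in the 0.90-era facet syntax looks roughly like this (the facet name and field are placeholders):

```json
{
  "query": { "match_all": {} },
  "facets": {
    "events_per_minute": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "minute"
      }
    }
  }
}
```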

~~~
ralphm
I missed that, but that looks pretty awesome. Thanks for sharing!

------
dougk7
Elasticsearch is really awesome for keeping large amounts of searchable data.
I used it in a previous application where we stored millions of items a week.

For data retention I had different indexes with different TTLs, depending on
the type of queries that hit them (queries that only dealt with frequent items
were sent to an index with a very short TTL).
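The per-index TTL setup above can be sketched with the `_ttl` mapping field from that era of Elasticsearch (the type name and duration here are placeholders):

```json
{
  "mappings": {
    "event": {
      "_ttl": { "enabled": true, "default": "7d" }
    }
  }
}
```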

For graphing I also used Graphite, with metrics
([http://metrics.codahale.com/](http://metrics.codahale.com/)) for sending
data from Java programs and scales
([https://github.com/Cue/scales](https://github.com/Cue/scales)) for sending
data from Python applications.

The only problem I had was tuning it for faceting (faceting consumed lots of
RAM).

------
nodesocket
Here is a blog post we ([https://commando.io](https://commando.io)) wrote on
shipping nginx access logs using LogStash and ElasticSearch:

[https://medium.com/devops-programming/b01bd0876e82](https://medium.com/devops-programming/b01bd0876e82)

------
zerop
After reading this article, I checked out logstash.net today for the first
time in nearly a year. I am happy to see they have added so many connectors
and the project has grown so much. This open source software could be serious
competition for Splunk.

------
dreamdu5t
How much time was involved in building this setup? Just curious.

~~~
alexk
Two devs and 2.5 months, including a multi-tenant client service with
authorization.

------
jimmcslim
Is this a compelling OSS replacement for Splunk?

------
demmonoid
Also, are you using local statsd installations on each server that is sending
stats, or do you keep statsd separate?

~~~
ralphm
In our case, most of the metrics come from the events flowing through
Logstash, and statsd is on the machines running the Logstash server. Note that
we are not using Logstash for shipping log events.
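A setup like that can be sketched with Logstash's statsd output pointed at the local daemon (the metric name is a placeholder):

```
output {
  statsd {
    host      => "localhost"              # statsd runs alongside the Logstash server
    increment => "logstash.events.%{type}"  # count events by their type field
  }
}
```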

~~~
imperialWicket
I'll bite: what are you using to ship logs?

EDIT: or I can read it in the post...

~~~
dreamdu5t
Probably just syslog or its recent variants.
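If it is plain (r)syslog, shipping can be as little as one forwarding rule on each host (collector hostname and port are hypothetical; `@@` means forward over TCP):

```
# /etc/rsyslog.conf — forward all facilities/severities to a central collector
*.* @@logs.example.org:5544
```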

------
ddorian43
Has anyone used Elasticsearch as a database, like MongoDB?

~~~
techscruggs
It won't have a decent disaster recovery story until 1.0, so I would suggest
holding off on making it the authoritative source for any of your data.

