My primary concern was how it would cope when data exceeded RAM (I like to call this the MongoDB fallacy). However, based on this post it looks like the ElasticSearch setup works just fine.
First off, you need to make sure Elasticsearch can lock its heap memory (via mlockall) so it doesn't get swapped out. About half of the available RAM is a good heap size, as other system processes need some memory too, and not everything Elasticsearch uses is on the heap.
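As a rough sketch, the relevant settings looked something like this in the 1.x-era versions (setting names have changed in later releases, so check the docs for your version):

```yaml
# elasticsearch.yml (1.x-era setting names)
bootstrap.mlockall: true
```

The heap size itself was set via the ES_HEAP_SIZE environment variable (e.g. ES_HEAP_SIZE=8g on a 16 GB box), and you can verify locking worked by checking that the process didn't log an mlockall warning at startup.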
You want to look at how many items you are indexing and how much of the heap the field data cache is using while doing queries. By default, ES tries to keep the total heap usage at about 3/4 of the allocated memory. The types of queries matter too: e.g. if you facet or sort on fields that have many distinct values, that will fill up the field data cache in no time.
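You can watch this via the node stats API. Here's a sketch that parses an (abbreviated, hypothetical) stats response and reports field data usage relative to the heap ceiling; the field names match what 1.x-era node stats returned, but treat it as an illustration rather than a reference:

```python
import json

# Hypothetical excerpt of a GET /_nodes/stats response; the node id
# and numbers are made up.
sample_stats = json.loads("""
{
  "nodes": {
    "abc123": {
      "jvm": {"mem": {"heap_used_in_bytes": 3221225472,
                      "heap_max_in_bytes": 8589934592}},
      "indices": {"fielddata": {"memory_size_in_bytes": 1073741824}}
    }
  }
}
""")

for node_id, node in sample_stats["nodes"].items():
    heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
    fielddata = node["indices"]["fielddata"]["memory_size_in_bytes"]
    print("%s: fielddata uses %.1f%% of max heap"
          % (node_id, 100.0 * fielddata / heap_max))
```

If that percentage keeps growing as you add facets and sorts, that's your early warning before things fall over.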
Is that the kind of information you're after? I can go into more detail if you have a more specific question.
After that, you'll have to tune merging. The underlying Lucene storage engine can choke when it tries to merge very large segments, so you have to tune the maximum segment size, etc.
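For example, a settings body for PUT /{index}/_settings might look like this. The names below are from the 1.x-era tiered merge policy (the values are just plausible examples, not recommendations), so double-check against your version's docs:

```python
# Sketch of merge-policy tuning for an index settings update.
merge_settings = {
    "index": {
        "merge": {
            "policy": {
                # Cap on how large a merged segment may grow.
                "max_merged_segment": "5gb",
                # How many segments get merged in one go.
                "max_merge_at_once": 10,
            }
        }
    }
}
```

Capping the merged segment size keeps the biggest merges from monopolizing I/O for long stretches.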
Then, you'll have to tune your queries -- it's not feasible to do a full index scan on a large index so you have to get clever about how you pull data in chunks. If your data is timestamped and that timestamp is indexed, you can pull data for smaller time ranges which will be faster than pulling all data for an index at once.
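The chunking idea can be sketched like this: split the overall interval into windows and issue one range-filtered query per window. The field name "@timestamp" and the window size are assumptions for the example:

```python
from datetime import datetime, timedelta

def time_windows(start, end, step):
    """Split [start, end) into consecutive windows of length `step`."""
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows

def range_queries(field, start, end, step):
    # One range query per window, instead of a single scan over the
    # whole index.
    return [
        {"query": {"range": {field: {"gte": lo.isoformat(),
                                     "lt": hi.isoformat()}}}}
        for lo, hi in time_windows(start, end, step)
    ]

queries = range_queries("@timestamp",
                        datetime(2014, 1, 1), datetime(2014, 1, 2),
                        timedelta(hours=6))
```

Each of these queries hits a small, index-supported slice, which is much kinder to the cluster than pulling everything at once.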
It's a bit introductory on the memory side. A post that goes into more detail on memory and JVM tuning is planned.
(Full disclosure: I work for Found)
It can indeed be slow, and this is usually an I/O issue. The Whisper storage backend does a lot of seeks, and people recommend using SSDs to deal with that. Also, you can have graphite-web just give you the (calculated) metric data for a particular query as JSON, and have it rendered client-side. http://graphite.readthedocs.org/en/latest/tools.html lists a few tools that do this.
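The JSON route goes through the render endpoint with format=json. A sketch (the host and metric path are made up, and the sample response below is hand-written in the shape the API returns: a list of series with [value, timestamp] pairs, null where no data was recorded):

```python
import json
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

# Ask for raw datapoints as JSON instead of a rendered PNG.
params = [("target", "servers.web1.loadavg.01"),
          ("from", "-1h"),
          ("format", "json")]
url = "http://graphite.example.com/render?" + urlencode(params)

# Hand-written sample of the response shape.
sample_response = json.loads(
    '[{"target": "servers.web1.loadavg.01",'
    ' "datapoints": [[0.4, 1400000000], [null, 1400000060]]}]')

for series in sample_response:
    # Drop the null gaps; keep (timestamp, value) pairs.
    points = [(ts, v) for v, ts in series["datapoints"] if v is not None]
```

From there any client-side charting library can draw the series without graphite-web ever touching its image renderer.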
Finally, we are investigating other storage backends for more fault tolerance. Probably we'll settle on something based on Cassandra, like http://blueflood.io/.
Jordan Sissel wrote a short note about it over here: https://gist.github.com/jordansissel/3760225
That'll obviously be very expensive when the resolution is very high, though.
The upside is we have a dashboard that is tailored to our use case; the downside is it took a couple of days to get right.
For data retention I had different indexes with different TTLs, depending on the type of queries that hit them (queries that only dealt with frequent items were sent to an index with a very short TTL).
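For illustration, per-index TTL defaults looked roughly like this via the _ttl mapping in 1.x-era Elasticsearch (_ttl was removed in later versions in favour of time-based indices; the index and type names here are made up):

```python
# One mapping per index, each with its own default TTL. Documents in
# "frequent-items" expire after an hour; "long-term" keeps a month.
mappings = {
    "frequent-items": {"event": {"_ttl": {"enabled": True,
                                          "default": "1h"}}},
    "long-term":      {"event": {"_ttl": {"enabled": True,
                                          "default": "30d"}}},
}
```

Routing each query type to the index whose retention matches it keeps the hot index small and fast.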
For graphing I also used Graphite, with metrics (http://metrics.codahale.com/) for sending data from Java programs and scales (https://github.com/Cue/scales) for sending data from Python applications.
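If you'd rather not pull in a client library, Carbon's plaintext protocol is simple enough to speak directly: one "path value timestamp" line per metric over TCP (port 2003 by default). The hostname and metric name below are made up:

```python
import socket
import time

def carbon_line(metric, value, timestamp=None):
    """Format one metric in Carbon's plaintext protocol:
    '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (metric, value, timestamp)

def send_metric(host, metric, value, port=2003):
    # Carbon's plaintext listener defaults to TCP port 2003.
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(carbon_line(metric, value).encode("ascii"))
    finally:
        sock.close()

# e.g. send_metric("graphite.example.com", "app.requests.count", 42)
```

Libraries like metrics and scales add batching, in-process aggregation, and registries on top, but the wire format underneath is just those lines.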
The only problem I had was tuning it for faceting, which consumed lots of RAM.
EDIT: or I can read it in the post...