

First Monthly Challenge: Elasticsearch - fheisler
http://engineroom.trackmaven.com/blog/first-monthly-challenge-elasticsearch/

======
arafalov
> Provide real-time text search over a large corpus (i.e., some subset of
> Project Gutenberg, a bunch of product reviews, etc.)

That would be nice. But have you looked at the PG export yet? It's a million
individual files in semantic web notation, with the author information
duplicated across them. I would LOVE for somebody to convert that into a
full-fidelity form suitable for importing into a search engine. So please do
it for real, not just as a 'possible'.
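A minimal sketch of the flattening step in Python, assuming each per-book RDF file has a `pgterms:ebook` element with `dcterms:title` and `dcterms:creator/pgterms:agent/pgterms:name` children under the namespace URIs below. Check those names against the real export; the sample record is made up. The sketch copies the author names into each book record, since a search engine wants one self-contained document per book.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for the PG RDF export; verify against the actual files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]


def flatten_ebook(rdf_xml):
    """Turn one per-book RDF/XML document into a flat, search-ready dict.

    Author names are copied (denormalized) into the book record, so the
    search engine never has to resolve agent URIs at query time.
    """
    root = ET.fromstring(rdf_xml)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        raise ValueError("no pgterms:ebook element found")
    title = ebook.findtext("dcterms:title", default="", namespaces=NS).strip()
    authors = [
        name.text.strip()
        for name in ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
        if name.text
    ]
    return {
        # "ebooks/12345" -> "12345"
        "id": ebook.get(RDF_ABOUT, "").rsplit("/", 1)[-1],
        "title": title,
        "authors": authors,
    }


# Hypothetical, heavily trimmed sample record (not real PG data).
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/12345">
    <dcterms:title>Example Title</dcterms:title>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/99">
        <pgterms:name>Doe, Jane</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
  </pgterms:ebook>
</rdf:RDF>"""

print(flatten_ebook(SAMPLE))
```

Real records may carry several titles, subjects, or file formats; each would become another field in the same flat dict.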

Another dataset with a similar problem is the Google Mail Takeout export. It is
theoretically in mbox format, but there are apparently enough quirks in it that
standalone third-party libraries can't parse it. Somebody said Python might be
able to do it, but I haven't seen confirmation for Google Mail specifically.
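Python's standard-library `mailbox` module is one place to start. A minimal sketch, not confirmed against real Takeout quirks: it reads an mbox file, decodes RFC 2047 headers, keeps the first `text/plain` part, and emits newline-delimited JSON for the Elasticsearch `_bulk` endpoint. The index name `mail` is arbitrary, and the `X-Gmail-Labels` header is assumed to be present in Takeout exports. ES 1.x-era clusters also expect a `_type` in each action line.

```python
import json
import mailbox
from email.header import decode_header, make_header


def header_text(value):
    """Decode RFC 2047 encoded-words (=?UTF-8?...?=) into a plain str."""
    if value is None:
        return ""
    return str(make_header(decode_header(str(value))))


def plain_body(msg):
    """Return the first text/plain part, decoding leniently."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in a quirky message
            return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_bulk(path, index="mail"):
    """Convert an mbox file into an NDJSON body for POST /_bulk."""
    lines = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            doc = {
                "message_id": header_text(msg.get("Message-ID")),
                "from": header_text(msg.get("From")),
                "subject": header_text(msg.get("Subject")),
                "date": header_text(msg.get("Date")),
                "labels": header_text(msg.get("X-Gmail-Labels")),
                "body": plain_body(msg),
            }
            # Each document is preceded by an action line, as _bulk requires.
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(doc))
    finally:
        box.close()
    return "".join(line + "\n" for line in lines)
```

Malformed messages that still break `mailbox` would need handling before this point, which is exactly the part nobody seems to have documented yet.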

If you manage to do either and document the steps or share the code, I'll
personally sing your praises at the search engine workshops, meetups, and
lectures I do (usually Solr rather than ES).

~~~
YousefED
I actually just loaded emails from both the Gmail API and an mbox file into
ES. I haven't gotten around to doing advanced analytics yet; mainly I'm using
it to load data into Mixpanel / Intercom to track events such as "customer has
replied to one of our emails". We then use this info to do funnel analysis in
Mixpanel and to exclude users we're currently in touch with via email from
Intercom onboarding emails.
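One way to derive a "customer has replied to one of our emails" event from messages already pulled into ES is to match each inbound message's `In-Reply-To` header against the `Message-ID`s of mail we sent. A minimal sketch under that assumption; the dict keys, the `example.com` domain, and the event shape are hypothetical, and this is not the Mixpanel or Intercom API.

```python
from email.utils import parseaddr


def sender_address(msg):
    """Lower-cased bare address from a From header, e.g. 'a@b.com'."""
    return parseaddr(msg["from"])[1].lower()


def reply_events(messages, our_domain):
    """Return an event for every inbound message that replies to one of ours.

    `messages` is a list of dicts with hypothetical keys 'message_id',
    'in_reply_to', 'from' and 'date' (e.g. documents fetched from ES).
    """
    suffix = "@" + our_domain.lower()
    sent_ids = {m["message_id"] for m in messages if sender_address(m).endswith(suffix)}
    events = []
    for m in messages:
        sender = sender_address(m)
        if sender.endswith(suffix):
            continue  # our own outgoing mail
        if m.get("in_reply_to") in sent_ids:
            events.append({"event": "Customer replied", "customer": sender, "date": m["date"]})
    return events


msgs = [
    {"message_id": "<a1@example.com>", "in_reply_to": None,
     "from": "Support <support@example.com>", "date": "2014-10-01"},
    {"message_id": "<c1@customer.org>", "in_reply_to": "<a1@example.com>",
     "from": "Pat <pat@customer.org>", "date": "2014-10-02"},
    {"message_id": "<c2@customer.org>", "in_reply_to": None,
     "from": "Pat <pat@customer.org>", "date": "2014-10-03"},
]
print(reply_events(msgs, "example.com"))
```

The resulting list could then be forwarded to whatever analytics tool you use; that forwarding step is left out here.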

~~~
arafalov
Well, don't just brag. Write a blog post, publish a GitHub repo. Share it
somehow.

------
linux_devil
I use Elasticsearch + Python scripts (running as bolts on Storm) + Kibana (with
a better map) for analytics, and it's uber cool. The only things Kibana misses
are 'unique counts', multiple dashboards across different document types,
etc. I hope these get addressed in the near future.

~~~
divideby0
Kibana 4 should have some of that:

[http://www.elasticsearch.org/blog/kibana-4-beta-1-released/](http://www.elasticsearch.org/blog/kibana-4-beta-1-released/)

It exposes quite a bit of the new aggregation functionality added to ES 1.x.
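For the 'unique counts' gap, the 1.x aggregations include a `cardinality` aggregation, which returns an approximate distinct count. A minimal sketch of a request body you could POST to `/<index>/_search` yourself while waiting on Kibana; the field name `user_id` is hypothetical.

```python
import json


def unique_count_body(field, precision_threshold=None):
    """Search body returning only an approximate distinct count of `field`."""
    agg = {"field": field}
    if precision_threshold is not None:
        # Counts below this threshold are expected to be close to exact;
        # higher values cost more memory.
        agg["precision_threshold"] = precision_threshold
    return {
        "size": 0,  # skip the hits, we only want the aggregation
        "aggs": {"unique_" + field: {"cardinality": agg}},
    }


print(json.dumps(unique_count_body("user_id"), indent=2))
```

The count is an estimate rather than an exact `COUNT(DISTINCT ...)`, which is the trade-off that lets it scale across shards.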

~~~
Karunamon
Unfortunately Kibana 4 is missing a lot of functionality from 3.x. One feature
that we really want back is the global search scope per dashboard.

Example: Say you've got a dashboard showing hits on a web service. You've got
a pie chart showing HTTP return codes, a bar chart showing response times, and
another few graphs and charts detailing various data out of the requests
themselves.

You could click on, say, the "500" in your return code pie chart, and then
every visualization on the page would redraw and show you stats for just the
requests that were 500s. (What's unique about the requests that return
server errors?)

Or turn it around: click on the section of a chart that denotes requests that
took longer than 100ms to process, and now you see info about only those
requests. (What makes these long-running requests so special?)
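That click-to-filter behavior amounts to adding the clicked value as a filter that every panel's aggregation shares. A minimal sketch of building such a request body, assuming hypothetical fields `status` and `response_ms`. It uses the `bool` query's `filter` clause from Elasticsearch 2.x and later (the 1.x equivalent was the `filtered` query), and it is not a description of Kibana's internals.

```python
import json

# Hypothetical dashboard panels, each expressed as an aggregation.
PANELS = {
    "status_codes": {"terms": {"field": "status"}},
    "response_times": {"histogram": {"field": "response_ms", "interval": 50}},
}


def dashboard_body(panels, filters=()):
    """One search request whose aggregations all see the same filters."""
    return {
        "size": 0,
        "query": {"bool": {"filter": list(filters)}},
        "aggs": dict(panels),
    }


clicked_500 = {"term": {"status": 500}}         # click on "500" in the pie chart
slow = {"range": {"response_ms": {"gt": 100}}}  # click on the >100ms section

print(json.dumps(dashboard_body(PANELS, [clicked_500]), indent=2))
print(json.dumps(dashboard_body(PANELS, [slow]), indent=2))
```

Because the filter lives in the query rather than in any single aggregation, every panel is recomputed over the same narrowed set of requests.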

This was a _jaw-droppingly awesome_ troubleshooting tool, and now it's gone. I
hope they bring it back before Kibana 4 gets out of beta!

------
troygoode
Great idea, Fletcher. We <3 Elasticsearch here at Lanetix as well, and I think
some of our developers who haven't worked with it as much will give your
challenge a shot.

~~~
fheisler
Awesome, send along links to anything they come up with! We're thinking of
live streaming the meetup, so that might be a good way to share them around
as well.

~~~
troygoode
Will do. And we'd love to watch if you live stream it.

------
mercnet
Great article; I now feel the need to research Elasticsearch vs. Postgres
full-text search to see which one would benefit my side project.

~~~
chishaku
I had the same question and went with Elasticsearch. Also check this re:
Postgres: [http://blog.lostpropertyhq.com/postgres-full-text-search-is-...](http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/#1)

