Elasticsearch Raises $70 Million (elasticsearch.com)
232 points by asm89 on June 5, 2014 | 64 comments

I'm sitting in Elasticsearch training right now (during a caffeine break). These are some really great guys that know their stuff, and they're committed to contributing back to the OSS version everything that makes sense. They contribute a ton of code back to Lucene and employ a lot of the brightest minds in this space.

On the product side, I'm sitting here being amazed at some of the problems they've solved very elegantly. Elasticsearch has a bright future.

I'm starting a new job in August where I will focus on building search-based applications with Elasticsearch. I honestly can't wait. This news makes me happy as I know I'll be learning relevant technology.

I'm also in Elasticsearch training (Victoria, London). Impressed with the trainers, both core developers with a passion for the product. Bodes well.

How is it better than Solr?

I haven't looked at Elasticsearch in a long time, so I really do want to know. Not trying to pick a fight. ;)

I've been using it since the earlier versions, for almost 3 years, I guess. It makes config easier and offers a well-designed, easy API. Especially for the Rails, PHP, and Django crowd, it's an easy choice for creating a sensible text search. The main benefit was an easier Lucene with a JSON/HTTP API. Yes, a lot more advanced features are there too.
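To make the "JSON/HTTP API" point concrete, here is a minimal Python sketch that builds (but does not send) a basic full-text search request. The index name `articles`, the field `title`, and the host `http://localhost:9200` are assumptions for illustration, not anything stated in the thread.

```python
import json
import urllib.request


def build_search_request(index, field, text, host="http://localhost:9200"):
    """Build a POST request for a basic `match` query against the _search endpoint.

    The index, field, and host are hypothetical placeholders.
    """
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("articles", "title", "open source search")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a running node, the request could be passed to `urllib.request.urlopen(request)`; the response comes back as JSON as well.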

One advantage may be that you can extend ElasticSearch with JavaScript, whereas extending Solr is, I think, only as easy for someone who understands the Solr codebase. I wanted to try that once, but I had to resort to doing the necessary computation in the language calling the Solr service.

* Kibana on ElasticSearch. This is huge. You get a polished Search & Graphing UI with very little effort.

* Much more approachable config.

* Its clustering is easier to set up.

* Even though logstash is a bit heavy for my taste, the whole ELK stack is really nice for aggregating server logs.

Agreed about logstash being too heavy for your app servers. It would be nice to have the functionality in a trim, native binary.

As things are, one can always direct an app server's syslog to a logging fleet running logstash (or elasticsearch running embedded logstash): http://cookbook.logstash.net/recipes/rsyslog-agent/

I think that's what Heka is supposed to be.

It looks pretty promising, but I have yet to meet anyone who uses it.


Oh, that looks very nice. Neat to embed Lua as a sandboxed plugin environment.

Solr has an ELK equivalent, called SILK. Actually uses most of the same components (Banana=Kibana for Solr, etc): http://www.lucidworks.com/lucidworks-silk/

Re: logstash too heavy

That's what logstash-forwarder[1] (formerly lumberjack) is for. It's in Go, not Java.

[1] https://github.com/elasticsearch/logstash-forwarder

I use beaver, which is a python variant of the logstash agent

An overview presentation video was just released: http://www.berlinbuzzwords.de/session/side-side-elasticsearc...

If you want to know more details, vote for my talk in November, I'll be digging into much more technical comparison: http://lucenerevolution.uservoice.com/forums/254257-open-sou...

Honestly I'm not sure, I haven't used Solr. I was just comparing it to the relational databases I've used, key-value stores like Cassandra, and other documents stores like Mongo. It definitely doesn't replace relational stores, but IMO blows all of the other NoSQL data stores out of the water. The Lucene indexing behind the scenes just really delivers some impressive functionality.

I've used both, although not Solr 4.

Elasticsearch has easier config, especially for clustering. It is designed to be schemaless so you can push almost any JSON data into it.

Performance is adequate in both.

It's not a big advantage either way, unless you need clustering.

Elasticsearch isn't really schemaless, but you can have it guess and extend the schema as fields are encountered (or you can have it treat specific field as blobs of unindexed json).
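The "guess and extend the schema as fields are encountered" behaviour can be pictured with a toy model in Python. This is only a sketch of the idea, not Elasticsearch's actual type-detection rules; the type names and the helpers `guess_type` and `extend_mapping` are made up for illustration, and arrays and dates are ignored.

```python
def guess_type(value):
    """Guess a field type from a single JSON value (toy rules, not Elasticsearch's)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    return "string"


def extend_mapping(mapping, document):
    """Add a type for every field not seen before; existing fields keep their type."""
    for field, value in document.items():
        mapping.setdefault(field, guess_type(value))
    return mapping


mapping = {}
extend_mapping(mapping, {"title": "Elasticsearch raises $70M", "points": 232})
extend_mapping(mapping, {"points": "n/a", "comments": 64})
print(mapping)  # prints {'title': 'string', 'points': 'long', 'comments': 'long'}
```

Note that in this toy the second document's `"points": "n/a"` does not change the recorded type: the first value seen wins, which is why a guessed schema can surprise you later.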

If people are curious as to why VCs are interested in open-source businesses:


I also recommend the JBoss Business Case Study: "Lessons from Leaders: How JBoss did it" [1]

[1] http://www.forentrepreneurs.com/lessons-from-leaders/jboss-e...

Along those same lines, here's another good read about the recent npm, Inc. investment. Feel free to skip down to the 'Open Source and $$$' section. http://words.steveklabnik.com/is-npm-worth-26mm

In short, it's debatable whether an investment into npm, Inc. will directly pay off for the investors. What the investment is more likely about is creating the infrastructure for new, billion dollar companies to pop up, giving those investors an inside track to the new companies.

In the case of ElasticSearch, investing in this infrastructure project absolutely makes sense. "Big Data" is becoming huge, but it's still relatively dumb. Up until now, we've primarily been focused on tools and technologies to source, aggregate, and analyze the data. But lots of companies are now popping up who are built on the idea of making intelligent use of all this data, far beyond what humans are naturally capable of. ElasticSearch of course isn't the whole solution, but it's part of it.

(I work for one of those companies, and we use ElasticSearch)

Open-source is becoming the only way to develop (and, especially, maintain) complex software. The fact that the engineers can gain employer-independent reputations gives them an incentive that's astronomically expensive (as hedge fund compensation goes) to replicate otherwise.

The world is finally figuring out that it's impossible to employ top talent, but that it can be quite lucrative to sponsor it.

It's not impossible to employ top talent. That is a very ignorant statement.

So is, "Open-source is becoming the only way to develop (and, especially, maintain) complex software". This is hyperbole. Complex software has been developed and maintained for over half a century, in myriad ways.

Top talent has access to thousands of other well-paying roles. You cannot treat them the same way companies treat regular employees (badly). If you give them pointless work, or work without career growth, they can up and leave in a heartbeat. If you try to bully them, pressure them to work long hours, or place pointless restrictions (dress code, start times, vacation policy, etc.) on them, they will leave.

That is a statement I think most people on here would agree with. The original statement wasn't. A company can be a joy to work for. You can also make some really good cash.

Better to say it's impossible to keep top talent around for more than a couple of years. Once they start to get bored they leave.

In technical fact, not impossible but, in fact, common. In spirit, you don't really employ top talent, in the sense that subordinating it wastes it. You sponsor it and become a beneficiary of it.

Perhaps the OP meant to say it's tough to retain top talent without being on the top of your own game in terms of providing benefits and culture to them.

What I find interesting about the Elasticsearch story is the success it has had, given that the core product (the search server) is completely open source.

My fear is that with money comes an 'Enterprise' edition.

I don't think the "open core" model is in their best interests. Look at a company like Zenoss, where their OSS version is their biggest competitor. It doesn't make sense for ES to do that.

Now marvel[1] is where they've started monetizing, and I'm sure they make a good bit of money from "professional services" teaching companies how to deploy ES at scale. Hopefully they pull it off, as the world needs a good competitor to splunk. ES has the backend tech, but kibana has a loooooong way to go before it rivals splunk's search interface. Here's to hoping!

[1] http://www.elasticsearch.org/overview/marvel/download/

Marvel is quite interesting. It is a product everyone _could_ build on their own (basically, it is Kibana over their own metrics data), but at that price point, there is no reasonable reason to do so.

I was in a class with the founders a few months ago - for whatever it's worth, they're vehemently opposed to segregating functionality like this.

Their plans are to offer support, and make their bones that way. Elasticsearch is a complex enough product at scale that this would probably be quite lucrative.

ElasticSearch is a sufficiently complex system that can be implemented in a ton of different ways to solve a ton of different problems. I'm sure there are more than enough consulting gigs floating around for them to make a good bit of money both on actual consulting and selling additional software packages.

What would be so bad about an enterprise edition? Many companies require extra support and would pay for the enterprise edition and wouldn't even consider the product without this option.

Elasticsearch makes its money from enterprise support licenses. There are so many ways of installing, scaling, and optimizing it, depending on the application. Don't forget the training around it. You have investment banks like Goldman Sachs using it for log analytics, so the services around it are needed and real. I don't think they will need an enterprise version.

We love Elasticsearch. It's fast and accurate for huge amounts of data, super easy to scale, and incredibly easy to get started. That cash can only make the product better. Good for them!

Hopefully this means that the documentation gets some help, especially some of the official libraries. So far working with ES has been a mixed bag… seems good in theory but I've had a hard time getting over the learning curve.

Want to highlight some of your specific pain points?

I guess I'm in the minority based on the other comments here…

After spending a few hours with the documentation, I felt like I had a generally good feel for how you would interact with ES through curl, but then jumping into using the Ruby library there seemed to be a big leap and I felt like I needed to have a much more intimate knowledge of how ES worked to "get" it. A lot of guess and check before I figured out how to query properly. I've also been unsuccessful in figuring out how to implement accent folding. At this point I assume that this has something to do with "mapping", but I couldn't figure out where that was supposed to be set up (again, through the Ruby lib, but I was also unsure where to start to just accomplish that via curl…).

Who knows, maybe I just need to spend more time reading the docs, and maybe the information that I need is in there somewhere, but to me it felt incomplete or disorganized. I'd love to stop using Solr for document search, but making ES do things that I know how to do in Solr ended up being too time consuming.

I ended up feeling like I probably had to go to a training session to really get it, which is a shame…
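For the accent-folding question above, one commonly documented route is a custom analyzer that uses the built-in `asciifolding` token filter, attached to a field through the mapping. A hedged sketch in Python: the analyzer name `folding` and the field `title` are made up, the body uses the typeless mapping layout of recent versions (older releases nest mappings under a type name and use `string` rather than `text`), and the helper `fold_accents` only approximates in plain Python what the filter does to tokens.

```python
import json
import unicodedata

# Index-creation body: a custom analyzer that lowercases and folds accents.
# "folding" and "title" are hypothetical names chosen for this sketch.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "folding"}
        }
    },
}


def fold_accents(text):
    """Approximate accent folding: decompose characters, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(json.dumps(index_body, indent=2))
print(fold_accents("Crème Brûlée"))  # prints "Creme Brulee"
```

The body would be sent when the index is created (for example via curl or a client library's index-creation call), so both indexed text and query text pass through the same folding step.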

I found the elasticsearch tutorials on youtube by Clinton Gormley pretty helpful in understanding the concepts. Also came across this book (which I haven't really read so I don't know how good it is, just posting it here) http://exploringelasticsearch.com/

I don't know OP's pain points, but mine were quite simply that the query DSL for ElasticSearch seems to be a thin wrapper over Apache Lucene. If you have experience working with Lucene, picking up ElasticSearch isn't that difficult. If you've never worked with Lucene before, a lot of the concepts aren't necessarily obvious, and it's incredibly easy to write queries that won't return what you expect but won't fail either. My experience trying to configure analyzers and adjust search parameters involved a ton of trial and error.
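A classic instance of a query that "won't fail but won't return what you expect" is an exact term lookup against text that was lowercased at index time. The toy inverted index below is a simplified Python model of that pitfall, not Lucene or Elasticsearch code; `ToyIndex`, `term_query`, and `match_query` are hypothetical names that only mirror the idea of un-analyzed versus analyzed queries.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: split on whitespace and lowercase (real analyzers do more)."""
    return [token.lower() for token in text.split()]


class ToyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, text):
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def term_query(self, term):
        """Exact lookup: the query term is NOT analyzed."""
        return sorted(self.postings.get(term, set()))

    def match_query(self, text):
        """Analyzed lookup: the query text goes through the same analyzer."""
        hits = set()
        for term in analyze(text):
            hits |= self.postings.get(term, set())
        return sorted(hits)


index = ToyIndex()
index.add(1, "Quick brown fox")
print(index.term_query("Quick"))   # prints []
print(index.match_query("Quick"))  # prints [1]
```

The index only ever stored `quick`, so the un-analyzed lookup for `Quick` silently returns nothing, while the analyzed lookup finds the document.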

Agreed completely.

Elasticsearch is great. I have spent the last 6 months working on a project using it as the primary search technology, and it has been nothing but great. I have not had any training and tried to figure it out through the online docs (which in the last overhaul have gotten much better), so forgive me if any of what I'm about to say is wrong, or better yet, please correct me so I can learn :)

Here are the things I noticed when using it:

- Since it's schemaless by default, it will guess the data type of a field based on the first value it sees. As far as I'm aware you're not able to change a data type later on, so I found it best to create a schema (aka mapping) that is explicit about the fields' data types up front. I think explicit is better anyway (Zen of Python ;) )

- It's blazingly fast. Like crazy quick.

- Use the geo data type if you're going to be doing radius queries. I've got 50M documents in the index and it queries insanely fast. It's been just as fast as PostGIS (which I also love)

- Use this as the GUI: http://mobz.github.io/elasticsearch-head/

- Do some proper research on filtering before you start, start here: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

The correctly configured snowball filter will make sure things like "rückwärts" will match "ruckwarts".
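Two of the tips above (explicit mappings and the geo data type for radius queries) can be sketched as request bodies built in Python. The field names `name`, `visits`, and `location` and the coordinates are hypothetical, and the layout assumes the typeless mappings and `bool`/`filter` syntax of recent versions; releases from the time of this thread used different field types and query wrappers, so treat this as an illustration rather than something to paste in.

```python
import json

# Explicit mapping, declared before any documents are indexed.
# Field names ("name", "visits", "location") are hypothetical.
explicit_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "visits": {"type": "long"},
            "location": {"type": "geo_point"},
        }
    }
}


def radius_query(lat, lon, distance):
    """Build a search body returning documents within `distance` of a point."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


print(json.dumps(explicit_mapping, indent=2))
print(json.dumps(radius_query(51.4964, -0.1439, "5km"), indent=2))
```

Declaring `location` as `geo_point` up front matters here: a guessed mapping would not treat a plain lat/lon object as a geo field, and the radius filter would have nothing to work with.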

Could someone explain in a few lines why and how Elasticsearch is a revolutionary thing (technically)?

Do you need something beyond "highly scalable search solution"? A lot of enterprises (and I'd guess startups too!) need good search solutions and your two players in the Java space (that I know of) are Solr and Elasticsearch. We are using Solr in our product, but the more I hear about Elasticsearch the more I want to give it a serious try.

My reason for not using it originally was that we had prototyped our solution using Compass (which was the project before Elasticsearch) only to have it abandoned to work on Elasticsearch. So I was concerned about the "one dev" model and losing something that was a key component of what we needed.

Since then I've not been impressed with SolrCloud (would love to hear good experiences, but it seems their distributed model isn't right) and have been giving Elasticsearch serious thought again.

"In the Java space" meaning deployed in a Java servlet container. Both are stand-alone search servers, exposing their API for search and indexing through HTTP using XML and/or JSON.

> "In the Java space" meaning deployed in a Java servlet container.

I didn't say nor mean to imply that. Technically, Solr is based on servlet technology whether stand-alone or not, you can choose to deploy it in a web container of your choice or use the Jetty instance it comes with for the "stand-alone" experience. I don't know much about Elasticsearch's architecture personally.

Elasticsearch is very component-based and can be deployed as a servlet if you really wish.

Just use a different transport implementation:


We have actually been using elasticsearch as our main data storage at backstitch for almost a year now. How I usually explain its awesome power is (1) take the schemaless nature of MongoDB, (2) add the indexing power of Lucene, (3) and give it the flexible scaling of Riak.

The documentation is also fantastic and the plugin availability pretty good.

It's what Solr would look like if you designed it today.

More specifically, it is what Solr would look like if you designed it today but also did not care about dealing with all the use cases Solr deals with TODAY.

The best proof is that Elasticsearch is slowly hitting the same problems Solr hit at some point in the past and has to deal with them (scripting is now disabled by default, the analysis/query language is slowly getting more complex, etc.).

Not trying to downgrade ElasticSearch, they have done great things and have features Solr hasn't matched (yet). Percolation is one (though check Luwak for a comparison: http://www.berlinbuzzwords.de/session/turning-search-upside-... )

The clustering Just Works, with autodiscovery via unicast, multicast, Azure API, EC2 API, or Google Compute Engine API.

I am just starting to learn elasticsearch so am not a good source. But I didn't want something that was technically revolutionary. I wanted a tool that would be easy to get started with and provide something that is completely trivial as far as the end user is concerned (search).

It's an "it just works" technology that also goes very deep and solves a number of problems where the competing solutions are an order of magnitude more complex and/or expensive.

Something like Google Alerts could be designed easily with elasticsearch. There's probably no other technology (off the shelf) that could be used for a use case like that.
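The alerting idea is what the thread calls percolation: queries are stored up front, and each incoming document is checked against them, the reverse of normal search. The Python toy below only sketches that inversion; it is not the Elasticsearch percolator API, and `AlertRegistry`, the alert ids, and the "all terms must appear" matching rule are assumptions made for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return set(text.lower().split())


class AlertRegistry:
    def __init__(self):
        self.alerts = {}  # alert id -> set of terms that must all appear

    def register(self, alert_id, query_text):
        self.alerts[alert_id] = tokenize(query_text)

    def match(self, document_text):
        """Return ids of stored queries whose terms all occur in the document."""
        terms = tokenize(document_text)
        return sorted(
            alert_id for alert_id, required in self.alerts.items()
            if required <= terms
        )


registry = AlertRegistry()
registry.register("funding-news", "elasticsearch raises")
registry.register("solr-news", "solr release")
print(registry.match("Elasticsearch raises $70 million"))  # prints ['funding-news']
```

Each matching id would then trigger a notification to whoever registered that alert.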

I have been prototyping with Elasticsearch for the last couple of months. I have nothing but great things to say about the software and documentation. Several other partner teams have taken notice of my work and will likely incorporate the software as well. Very exciting!

I saw these folks at gluecon. Haven't looked at a ton of dashboard solutions, but I found Kibana to be pretty compelling, simply because it was trivial to get the elk stack up and running and input arbitrary data. I am not as interested in log data, more in business metrics.

Awesome! I make heavy use of elasticsearch and am very happy with the performance.

It's typically the rest of the pipeline now that causes most of the latency whereas search used to be the bulk of the duration of a request.

It has also spurred other entities to improve their search performance :)

I always thought that Elasticsearch was something provided by Amazon, and hence was not really interested in it. So I was rather surprised to see in the title that they raised money.

Apparently what Amazon provides is called Elastic MapReduce, not Elastic Search.

Great product, and I wish them the best of luck. I've had great success using it with Hive, via their Apache Hadoop plugin, for doing quick analysis.

Maybe they can finally write some docs

The docs are pretty good and have been for a while:


Admittedly they were pretty poor early on, but they've matured with the product.

