One of the big things we're working on at the moment is improving the release process. In addition to semantic versioning of the APIs we have to think through how it applies to the binary artifacts created. We want Go modules to be supported and be a part of the solution, but we are also mindful not to break things for existing users.
If you don't need heavy lifting, then "sonic", implemented in Rust, is a really nice lean alternative too:
FWIW, that graph is from a blog post I published earlier today:
I've added a mention of Xapian/Xapiand as well now and generated new graphs that include data for those two projects.
I'm not associated with the team, but I take every opportunity to promote it, as I think it is a very underrated project.
To give a feel for a similar use case, and the ability to run a simple benchmark...
There are some issues, though. For example, I think it's currently not possible to use the built-in query language to search Boolean values. So you might run into some minor issues even in smaller projects.
Other than that, I was pleasantly surprised by how well it fit into our static doc server, which we shipped around as a binary.
However, many of those same users complained about having to operate another cluster, especially ones that weren't already using the JVM (since it was a skill set they didn't have).
So, the appeal was to offer a service that runs as a part of the Couchbase cluster. It wouldn't have to match every feature of ES, just shoot for 80/20 and customers would likely find it beneficial.
It was fortunate that Go was still growing in popularity within Couchbase at that time, and we were able to position Bleve as a true open-source component, on top of which some money-making value add could be layered.
Any thoughts on how Bleve compares with Xapian or Trinity (C++ libraries for full-text indexing)?
1. Make Bleve a distributed index, or make Bleve into something that is a more direct ES competitor.
We have no plans to do this because we think that is better built at a different layer. We have hooks we introduce in certain places where we need to plug in code that would otherwise violate the boundaries. That arrangement has worked well so far. There are multiple projects built on top of bleve that allow you to index/search across nodes.
2. Make an adapter for the XYZ key/value store.
This request goes back to the original bleve index, which is serialized into a key/value abstraction layer. When users run into size/speed issues with bleve, many assume that just plugging in a faster key/value store will help. (Hey, we thought that too when we built it this way.)
But we've now replaced that index scheme with a new implementation called scorch. Scorch is considerably smaller and faster, and manages its own index on disk, without using any key/value store.
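For anyone wondering what "an index serialized into a key/value abstraction layer" means in practice, here is a toy sketch. To be clear, this is not bleve's actual key layout or API: `KVStore`, `memKV`, `IndexDoc`, `Lookup` and the `t/<term>/<docID>` key scheme are all made up for illustration. It only shows the general idea of writing one key per term/document pair and answering a term query with a prefix scan.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KVStore is a hypothetical minimal key/value abstraction (not bleve's interface).
type KVStore interface {
	Put(key, value string)
	PrefixKeys(prefix string) []string
}

// memKV is an in-memory KVStore used only for this sketch.
type memKV map[string]string

func (m memKV) Put(key, value string) { m[key] = value }

// PrefixKeys returns all keys starting with prefix, in sorted order.
func (m memKV) PrefixKeys(prefix string) []string {
	var keys []string
	for k := range m {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// IndexDoc writes one key per (term, docID) pair using an assumed
// "t/<term>/<docID>" layout. Terms are lowercased, whitespace-split words.
func IndexDoc(kv KVStore, docID, text string) {
	for _, term := range strings.Fields(strings.ToLower(text)) {
		kv.Put("t/"+term+"/"+docID, "")
	}
}

// Lookup returns the IDs of documents containing term via a prefix scan.
func Lookup(kv KVStore, term string) []string {
	prefix := "t/" + strings.ToLower(term) + "/"
	var ids []string
	for _, k := range kv.PrefixKeys(prefix) {
		ids = append(ids, strings.TrimPrefix(k, prefix))
	}
	return ids
}

func main() {
	kv := memKV{}
	IndexDoc(kv, "doc1", "Fast search in Go")
	IndexDoc(kv, "doc2", "Go search library")
	fmt.Println(Lookup(kv, "search")) // [doc1 doc2]
	fmt.Println(Lookup(kv, "fast"))   // [doc1]
}
```

Every term in every document becomes its own key, so the key count grows with the total number of term occurrences regardless of which store sits underneath. That is one way to see why swapping in a faster store does not, by itself, make the index smaller.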
As for things that we DO plan to implement:
1. Size of the index still comes up a lot. Couchbase is a very performance-sensitive user of Bleve, so I expect they'll lead the way on this front.
2. Better (pluggable) scoring. Today our search result scoring is broken for several types of queries, and the stuff that does score right is too tightly coupled to the searching logic.
3. Overhaul index mapping. Today bleve uses a mapping object to describe how source objects/documents are indexed. One of the best ways we can simplify the mapping is to make things more explicit. I think we tried to embrace the concept of reasonable defaults, but we ended up with inheritance hierarchies that are difficult to reason about.
There are lots of miscellaneous things like adding a data type that supports IPv6, or more advanced queries (lots of variations on span queries).
Just teasing you.
You should contrast it with Xapian and Phaistos Trinity on your README.
(pronounced "to believe")
How would you deploy Bleve in a 12-factor app environment?
(Does Bleve directly support any persistence? Does it support distributed workloads? Could a "trained" model get passed to read-only nodes?)
Bleve does (optionally) support persistence, so reading/writing files is one place it does directly interact with the environment. The environment must support mmap.
There are several projects which support distributed index/search workloads with Bleve. The exact approaches vary, but they all use Bleve to perform node-local operations, and the coordination is done at a higher level by the application.
I suspect I don't understand the terminology you're using in the last question, as Bleve has no training, models or nodes.
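To make "node-local operations, coordinated at a higher level" a bit more concrete, here is a rough scatter-gather sketch using only the Go standard library. None of these names come from Bleve: `Hit`, `Searcher`, `staticShard` and `ScatterGather` are hypothetical, and in a real setup each `Searcher` would presumably wrap a local Bleve index or an RPC to the node holding it. It also assumes scores from different shards are directly comparable, which is not generally true when each shard computes its own term statistics.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Hit is a hypothetical search result: a document ID and a relevance score.
type Hit struct {
	ID    string
	Score float64
}

// Searcher is a hypothetical stand-in for one node-local index.
type Searcher interface {
	Search(query string, limit int) []Hit
}

// staticShard is a toy Searcher that returns canned hits per query.
type staticShard map[string][]Hit

func (s staticShard) Search(query string, limit int) []Hit {
	hits := append([]Hit(nil), s[query]...) // copy so the shard data is not reordered
	sortHits(hits)
	if len(hits) > limit {
		hits = hits[:limit]
	}
	return hits
}

// sortHits orders hits by descending score, breaking ties by ID.
func sortHits(hits []Hit) {
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].ID < hits[j].ID
	})
}

// ScatterGather queries every shard concurrently, then merges the
// per-shard results and keeps the top `limit` hits overall.
func ScatterGather(shards []Searcher, query string, limit int) []Hit {
	results := make([][]Hit, len(shards))
	var wg sync.WaitGroup
	for i, s := range shards {
		wg.Add(1)
		go func(i int, s Searcher) {
			defer wg.Done()
			results[i] = s.Search(query, limit) // each goroutine writes its own slot
		}(i, s)
	}
	wg.Wait()

	var merged []Hit
	for _, r := range results {
		merged = append(merged, r...)
	}
	sortHits(merged)
	if len(merged) > limit {
		merged = merged[:limit]
	}
	return merged
}

func main() {
	shards := []Searcher{
		staticShard{"go": {{ID: "a", Score: 0.9}, {ID: "b", Score: 0.4}}},
		staticShard{"go": {{ID: "c", Score: 0.7}}},
	}
	for _, h := range ScatterGather(shards, "go", 2) {
		fmt.Printf("%s %.1f\n", h.ID, h.Score)
	}
}
```

Asking each shard for `limit` hits is enough to get a correct global top `limit`, since no shard can contribute more than that many to the final list. The application owns the fan-out and the merge; the index on each node only ever answers local queries.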
holy smokes: https://www.youtube.com/watch?v=K-tUQTw_Vtk