
Bleve: Full-text search and indexing for Go - homarp
http://blevesearch.com/
======
mschoch
Core contributor here, happy to try and answer any questions.

One of the big things we're working on at the moment is improving the release
process. In addition to semantic versioning of the APIs we have to think
through how it applies to the binary artifacts created. We want Go modules to
be supported and be a part of the solution, but we are also mindful not to
break things for existing users.

~~~
taf2
That is awesome. One of the biggest pains I have with Elasticsearch is how
rapidly they bump the major version number as an excuse for breaking
compatibility. IMO the most important feature of a database is stability, which
we get by maintaining compatibility with previous releases.

------
molsson
To get a feel for the size of Bleve, check out this graph that shows commit
rates going into Bleve versus Elasticsearch and Vespa:
[http://blog.minimum.se/assets/elasticsearch-open-source-
comm...](http://blog.minimum.se/assets/elasticsearch-open-source-commit-
rates.png)

If you don't need heavy lifting, then "sonic", implemented in Rust, is a really
nice lean alternative too:
[https://github.com/valeriansaliou/sonic](https://github.com/valeriansaliou/sonic)

FWIW, that graph is from a blog post I published earlier today:
[http://blog.minimum.se/2019/04/08/elastic-search-
introductio...](http://blog.minimum.se/2019/04/08/elastic-search-
introduction.html)

~~~
manigandham
Great blog post! First one I've seen with good comparisons to the other
options. I would recommend adding Xapiand too:
[https://github.com/Kronuz/Xapiand](https://github.com/Kronuz/Xapiand)

~~~
molsson
Thanks (you can upvote it here if you want:
[https://news.ycombinator.com/item?id=19605334](https://news.ycombinator.com/item?id=19605334)
)

I've added a mention of Xapian/Xapiand as well now and generated new graphs
that include data for those two projects.

------
dmitryminkovsky
Really nice to see the Go ecosystem developing. The other day I was searching
for a link/URL extraction library and the best one I could find was in Go:
[https://github.com/mvdan/xurls](https://github.com/mvdan/xurls) ("best"
because it actually uses a list of TLDs, for example:
[https://github.com/mvdan/xurls/blob/master/tlds.go](https://github.com/mvdan/xurls/blob/master/tlds.go)).
Was an unusual experience not finding something as good for Java.

~~~
ronnier
I’ve had the same experience with video encoding and image libraries. All the
momentum is in Go.

~~~
dmitryminkovsky
Any libraries in particular?

~~~
ronnier
Any of the libvips bindings. FFmpeg bindings. HLS and DASH libraries. Any
other wrappers around binaries in general. I found nice ports for syntax
highlighting that I couldn’t find in Java.

~~~
dmitryminkovsky
Thanks, I was wondering whether people had video libs written in Go or if
these were bindings.

------
dcu
Blast is a server on top of it:
[https://github.com/mosuka/blast](https://github.com/mosuka/blast)

~~~
totalperspectiv
I got excited thinking that somehow someone was doing bioinformatics in Go.
[https://en.wikipedia.org/wiki/BLAST](https://en.wikipedia.org/wiki/BLAST)

~~~
cure
We don't do bioinformatics workflows in Go (yet?) but we sure as hell run the
(CWL) workflows with Go on Arvados:
[https://arvados.org](https://arvados.org).

~~~
totalperspectiv
Arvados looks really cool! I haven't seen that before. It might actually be
exactly the workflow tool I didn't know I needed!

------
daschl1
Indeed, bleve can be considered on the same level as lucene. It is used to
power full-text search in Couchbase, which adds the distributed indexing and
querying portion on top (similar to what elasticsearch does to lucene)

------
lux
Bleve is awesome! In a page worth of code, I was able to make a really simple
search for our app's documentation that takes almost no resources. Elastic
would have been massive overkill.

------
bitt
Looks very interesting. How does it compare against sonic?
[https://github.com/valeriansaliou/sonic](https://github.com/valeriansaliou/sonic)

~~~
sroussey
Curious on that comparison as well

------
jchw
I actually used this recently in a small personal project, it's pretty good.
It's not like Elastic or Solr, more like Lucene - which may very well be good
enough for your use case. The index structures are stored in BoltDB (which
stores in a flat file.)

There are some issues, though. For example I think it's currently not possible
to use the built-in query language to search Boolean values. So you might run
into some minor issues even in smaller projects.

~~~
kitten_mittens_
Our biggest pain point using bleve is umlaut handling (our documents are in
German).

Other than that, I was pleasantly surprised by how well it fit into our static
doc server, which we ship around as a single binary.

~~~
mschoch
We recently added an ASCII folding filter, which may help:
[https://github.com/blevesearch/bleve/pull/1070](https://github.com/blevesearch/bleve/pull/1070)
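
For anyone unfamiliar with the idea, here is a minimal sketch of what ASCII
folding does to German terms. It is not the code from the pull request above:
the `foldGerman` function and its replacement table are hypothetical, and
whether "ä" should fold to "a" or "ae" is a choice each application has to
make for itself.

```go
package main

import (
	"fmt"
	"strings"
)

// germanFolds is an illustrative replacement table (an assumption, not
// Bleve's): each umlaut maps to its base letter and "ß" maps to "ss".
var germanFolds = strings.NewReplacer(
	"ä", "a", "ö", "o", "ü", "u",
	"Ä", "A", "Ö", "O", "Ü", "U",
	"ß", "ss",
)

// foldGerman returns s with the characters above replaced by ASCII.
// Applying it to both indexed terms and query terms lets "Muller"
// match "Müller".
func foldGerman(s string) string {
	return germanFolds.Replace(s)
}

func main() {
	for _, term := range []string{"Müller", "Straße", "Äpfel"} {
		fmt.Printf("%s -> %s\n", term, foldGerman(term))
	}
}
```

The important part is applying the same folding at index time and at query
time; folding only one side makes matching worse, not better.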

------
JshWright
Is the name a reference to
[https://en.wikipedia.org/wiki/Boiling_liquid_expanding_vapor...](https://en.wikipedia.org/wiki/Boiling_liquid_expanding_vapor_explosion)
?

~~~
dmux
I'd think so given their icon. Interesting name choice, nonetheless.

~~~
JshWright
Oh, yep. I clicked right through to the code and didn't notice that.

~~~
nkozyra
It's also noted in the documentation.

~~~
JshWright
Who reads docs?

~~~
nkozyra
It was an accident, I promise!

------
tschellenbach
Go fanboy here, we write almost everything in Go over here at Stream. I'm
curious though, what made you build your own instead of using Elastic?

~~~
mschoch
Bleve originated as part of a solution to a problem customers faced when using
Couchbase. Almost all customers have at least some sort of search use case,
but often times that use case isn't particularly complicated. Many of them
were running an ES cluster, moving the data from Couchbase to ES with an
adapter, and using that to solve their search use case.

However, many of those same users complained about having to operate another
cluster, especially ones that weren't already using the JVM (since it was a
skill set they didn't have).

So, the appeal was to offer a service that runs as a part of the Couchbase
cluster. It wouldn't have to match every feature of ES, just shoot for 80/20
and customers would likely find it beneficial.

It was fortunate that Go was still growing in popularity within Couchbase at
that time, and we were able to position Bleve as a true open-source component,
on top of which some money-making value add could be layered.

------
mikece
Forgive my ignorance on the topic of full-text search but is this supposed to
compete with Elastic or is this more of an alternative to Lucene? If the
latter, then wouldn't you be limited to indexing text datasets small enough to
stay in memory?

~~~
jxub
Elastic is a distributed index which can't be said of Bleve afaik. Therefore,
it's more like Lucene.

~~~
keshavmr
Correct. Couchbase uses Bleve to provide the distributed search index/query.
See: [https://docs.couchbase.com/server/6.0/fts/full-text-
intro.ht...](https://docs.couchbase.com/server/6.0/fts/full-text-intro.html)

------
networkimprov
To the maintainers: what are the top 3-5 (or so) requested features, and do
you plan to implement them?

Any thoughts on how Bleve compares with Xapian or Trinity (C++ libraries for
full-text indexing)?

[https://xapian.org/](https://xapian.org/)

[https://github.com/phaistos-networks/Trinity](https://github.com/phaistos-
networks/Trinity)

~~~
mschoch
First, here are the top two requests that we do NOT plan to implement.

1\. Make Bleve a distributed index, or make Bleve into something that is a
more direct ES competitor.

We have no plans to do this because we think that is better built at a
different layer. We introduce hooks in the places where we need to plug in
code that would otherwise violate the boundaries, and that arrangement has
worked well so far. There are multiple projects built on top of bleve that
allow you to index/search across nodes.

2\. Make an adapter for the XYZ key/value store.

This request goes back to the original bleve index, which is serialized into a
key/value abstraction layer. When users run into size/speed issues with bleve,
many assume that just plugging in a faster key/value store will help. (Hey, we
thought that too when we built it this way.)

But we've now replaced that index scheme with a new implementation called
scorch. Scorch is considerably smaller and faster, and manages its own index
on disk, without using any key/value store.

As for things that we DO plan to implement:

1\. Size of the index still comes up a lot. Couchbase is a very
performance-sensitive user of Bleve, so I expect they'll lead the way on this
front.

2\. Better (pluggable) scoring. Today our search result scoring is broken for
several types of queries, and the stuff that does score right is too tightly
coupled to the searching logic.

3\. Overhaul index mapping. Today bleve uses a mapping object to describe how
source objects/documents are indexed. One of the best ways we can simplify the
mapping is to make things more explicit. I think we tried to embrace the
concept of reasonable defaults, but we ended up with inheritance hierarchies
that are difficult to reason about.

There are lots of miscellaneous things like adding a data type that supports
IPv6, or more advanced queries (lots of variations on span queries).

------
karterk
I've also been working on a simple, fast, typo-tolerant search engine.
Shameless plug:
[https://github.com/typesense/typesense](https://github.com/typesense/typesense)

~~~
networkimprov
But, but... it's not in Go! :'(

Just teasing you.

You should contrast it with Xapian and Phaistos Trinity on your README.

------
cellularmitosis
Oh, different bleve :)

[https://www.youtube.com/watch?v=LrJxIraQelc](https://www.youtube.com/watch?v=LrJxIraQelc)

[https://www.youtube.com/watch?v=cm5VB7JaJWA](https://www.youtube.com/watch?v=cm5VB7JaJWA)

[http://www.geeked.info/tag/2bleve/](http://www.geeked.info/tag/2bleve/)

(pronounced "to believe")

------
philip1209
I'm looking at adding better search to our app soon, and I honestly don't
really have knowledge of any systems out there. Answer to this basic question
might help both me and others:

How would you deploy Bleve in a 12-factor app environment?

(Does Bleve directly support any persistence? Does it support distributed
workloads? Could a "trained" model get passed to read-only nodes?)

~~~
mschoch
Bleve is just a library that provides this functionality; you first have to
build an application that uses Bleve, and deploy that.

Bleve does (optionally) support persistence, so reading/writing files is one
place it does directly interact with the environment. The environment must
support mmap.

There are several projects which support distributed index/search workloads
with Bleve. The exact approaches vary, but they all use Bleve to perform node
local operations, and coordinating this is done at a higher level by the
application.

I suspect I don't understand the terminology you're using in the last
question, as Bleve has no training, models or nodes.

------
networkimprov
Are the maintainers aware that Bleve is unsafe to use on macOS with Go 1.11
and earlier?

[https://github.com/golang/go/issues/26650](https://github.com/golang/go/issues/26650)

~~~
mschoch
We were made aware of this in September:
[https://github.com/blevesearch/bleve/issues/783#issuecomment...](https://github.com/blevesearch/bleve/issues/783#issuecomment-420592339)

~~~
networkimprov
That was me :-) It puzzled me that there was no public comment about it.

------
_adamb
I am left confused, amused, and amazed by their choice of logo. I love it and
it makes me reconsider my, apparently conventional, logo choices...

~~~
cellularmitosis
[https://www.youtube.com/watch?v=UM0jtD_OWLU](https://www.youtube.com/watch?v=UM0jtD_OWLU)

holy smokes:
[https://www.youtube.com/watch?v=K-tUQTw_Vtk](https://www.youtube.com/watch?v=K-tUQTw_Vtk)

------
ausjke
It's used in Gitea too.

