
Bayard: a full-text search and indexing server written in Rust - jinqueeny
https://github.com/mosuka/bayard
======
xvilka
It would be nice to integrate all Rust alternatives to ELK stack:

1\. Toshi[1] - alternative to Elasticsearch

2\. Sonic[2] - alternative to Elasticsearch

3\. Vector[3][4] - alternative to Logstash

4\. native_spark[5] - alternative to Apache Spark

[1] [https://github.com/toshi-search/Toshi](https://github.com/toshi-search/Toshi)

[2] [https://github.com/valeriansaliou/sonic](https://github.com/valeriansaliou/sonic)

[3] [https://vector.dev/](https://vector.dev/)

[4] [https://github.com/timberio/vector](https://github.com/timberio/vector)

[5] [https://github.com/rajasekarv/native_spark](https://github.com/rajasekarv/native_spark)

~~~
CameronNemo
One of logstash's main draws is as a data transformation pipeline. You can do
lookups via dns or a json or csv file, for example. From what I can tell
vector is just a simple log shipper.

~~~
manigandham
Vector is a source -> transform -> sink pipeline as well. There are no
transforms that do lookups or joins available now but the functionality is
supported if someone writes a custom transformer middleware.
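
To make the source -> transform -> sink idea concrete, here is a minimal sketch of a lookup transform in plain Rust. This is not Vector's actual API: the `Event` type, the `Transform` trait, `LookupTransform`, and `run_pipeline` are all hypothetical names invented for illustration, and the lookup table is assumed to have been loaded already (for example from a CSV or JSON file).

```rust
use std::collections::HashMap;

/// Hypothetical event type: a flat map of string fields.
type Event = HashMap<String, String>;

/// Hypothetical transform stage: returns the (possibly modified) event,
/// or `None` to drop it from the pipeline.
trait Transform {
    fn apply(&self, event: Event) -> Option<Event>;
}

/// Enriches events by looking up one field in an in-memory table.
struct LookupTransform {
    key_field: String,
    target_field: String,
    table: HashMap<String, String>,
}

impl Transform for LookupTransform {
    fn apply(&self, mut event: Event) -> Option<Event> {
        // Clone the looked-up value so the borrow of `event` ends before the insert.
        let found = event
            .get(&self.key_field)
            .and_then(|key| self.table.get(key))
            .cloned();
        if let Some(value) = found {
            event.insert(self.target_field.clone(), value);
        }
        Some(event) // events without a match pass through unchanged
    }
}

/// Pushes every source event through all transforms and collects the
/// survivors into a `Vec` that stands in for the sink.
fn run_pipeline(
    source: impl IntoIterator<Item = Event>,
    transforms: &[Box<dyn Transform>],
) -> Vec<Event> {
    source
        .into_iter()
        .filter_map(|event| transforms.iter().try_fold(event, |e, t| t.apply(e)))
        .collect()
}

fn main() {
    let mut table = HashMap::new();
    table.insert("10.0.0.1".to_string(), "db-1".to_string());
    let transforms: Vec<Box<dyn Transform>> = vec![Box::new(LookupTransform {
        key_field: "ip".to_string(),
        target_field: "hostname".to_string(),
        table,
    })];

    let mut event = Event::new();
    event.insert("ip".to_string(), "10.0.0.1".to_string());

    let sink = run_pipeline(vec![event], &transforms);
    println!("{:?}", sink);
}
```

A join would follow the same shape, with the table replaced by whatever state the transform needs to keep between events.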

------
jinqueeny
It's built on top of Tantivy
([https://github.com/tantivy-search/tantivy](https://github.com/tantivy-search/tantivy))
and implements the Raft consensus algorithm
([https://raft.github.io/](https://raft.github.io/)) via raft-rs
([https://github.com/tikv/raft-rs](https://github.com/tikv/raft-rs)) and
gRPC (HTTP/2 + Protocol Buffers) via grpc-rs
([https://github.com/tikv/grpc-rs](https://github.com/tikv/grpc-rs)) and
rust-protobuf
([https://github.com/stepancheg/rust-protobuf](https://github.com/stepancheg/rust-protobuf)).

~~~
mrec
So would it be roughly accurate to say that Bayard is to Tantivy what
Elasticsearch is to Lucene?

~~~
g4nt1
There is toshi-search
[https://github.com/toshi-search/Toshi](https://github.com/toshi-search/Toshi),
which is trying to be a drop-in replacement for Elasticsearch. My understanding
is that Bayard is trying to cover the same use cases as Elasticsearch but with
a different API.

------
karterk
Elasticsearch is notoriously hard to roll out and develop against (for smaller
companies especially), and so I am happy to see smaller projects in this
space.

I've also been working on a light, fast, typo-tolerant search engine:
[https://github.com/typesense/typesense](https://github.com/typesense/typesense)

It's been around for a couple of years now, and we have a few happy customers who
have had great success in replacing $X0,000/year popular hosted search with
Typesense!

~~~
elephantum
PostgreSQL is quite a good, easy-to-integrate alternative to Elastic.

It has built-in fulltext search:
[https://www.postgresql.org/docs/12/textsearch.html](https://www.postgresql.org/docs/12/textsearch.html)

~~~
mfrye0
Agreed. We've been using this successfully for a while now.

I'm curious though at what point something like this or ES itself would make
sense for primarily text search. Is speed the biggest thing, or is it more
flexibility to tweak and get better search results?

~~~
rpedela
Postgres is fine if your search problem is mostly a recall problem. If N is
large enough or you have small N with enough overlapping keywords (long
documents) then precision becomes important. That is when you need things like
BM25, PageRank, machine learning, etc and Postgres just doesn't cut it
anymore. Additionally spell check, high-quality autocomplete, multiple
languages are better supported and much easier to implement in ES/Solr.
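
For anyone who hasn't met BM25, here is a minimal sketch of the textbook Okapi BM25 formula in Rust, using the common IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)). It is an illustration only, not how ES/Solr or Postgres implement ranking: the whitespace tokenizer, the function names, and the parameter values k1 = 1.2 and b = 0.75 are assumptions picked for the example.

```rust
// Commonly used default parameters (an assumption for this sketch).
const K1: f64 = 1.2; // term-frequency saturation
const B: f64 = 0.75; // strength of document-length normalization

/// Naive tokenizer: lowercase and split on whitespace.
fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Returns one BM25 score per document for the given query.
fn bm25_scores(docs: &[&str], query: &str) -> Vec<f64> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let mut scores = vec![0.0; tokenized.len()];

    let total_len: usize = tokenized.iter().map(|d| d.len()).sum();
    if total_len == 0 {
        return scores; // no documents or only empty ones: nothing to rank
    }
    let n = tokenized.len() as f64;
    let avg_len = total_len as f64 / n;

    for term in tokenize(query) {
        // Number of documents containing the term at least once.
        let doc_freq = tokenized.iter().filter(|d| d.contains(&term)).count() as f64;
        let idf = (1.0 + (n - doc_freq + 0.5) / (doc_freq + 0.5)).ln();

        for (i, doc) in tokenized.iter().enumerate() {
            let tf = doc.iter().filter(|w| **w == term).count() as f64;
            // Long documents are penalized relative to the average length.
            let norm = tf + K1 * (1.0 - B + B * doc.len() as f64 / avg_len);
            scores[i] += idf * tf * (K1 + 1.0) / norm;
        }
    }
    scores
}

fn main() {
    let docs = [
        "rust search engine",
        "a long document about cooking that mentions rust once among many other words",
        "gardening tips",
    ];
    for (doc, score) in docs.iter().zip(bm25_scores(&docs, "rust")) {
        println!("{:.3}  {}", score, doc);
    }
}
```

Both of the first two documents match "rust" exactly once, so a pure recall view treats them the same; the length normalization is what ranks the short, focused document above the long one.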

------
atheiste
I see the author built the same search engine in Go a while ago. So I suppose
this project is a side project to learn a new language. Or is there a
different reason?

~~~
devy
> So I suppose the project being a side project to learn a new language.

Perhaps. But the author Minoru Osuka ain't nobody[1]. He is

\- Engineer at Mercari, Inc.

\- Committer at Apache Software Foundation

\- Co-author of an Apache Solr book in Japanese

\- Ex-Yahoo! JAPAN

\- Ex-Rakuten

So yeah, I think he knows what he's doing.

[1] [https://twitter.com/minoru_osuka/](https://twitter.com/minoru_osuka/)

------
NPMaxwell
FYI: Who was Bayard Rustin?
[https://en.wikipedia.org/wiki/Bayard_Rustin](https://en.wikipedia.org/wiki/Bayard_Rustin)
It's a silly play on words celebrating one of the very great heroes of 20th
Century America

~~~
MS90
I wonder if he was named after Le Bon Chevalier?

[https://en.wikipedia.org/wiki/Pierre_Terrail,_seigneur_de_Ba...](https://en.wikipedia.org/wiki/Pierre_Terrail,_seigneur_de_Bayard)

~~~
fulmicoton
[https://twitter.com/minoru_osuka/status/1190236816277790720?...](https://twitter.com/minoru_osuka/status/1190236816277790720?s=19)

------
mmoez
"X written in Rust" is becoming a tiring clickbait pattern on tech boards.

Why not simply announcing "X" in the title?

~~~
reacharavindh
It’s not useless. I for one associate languages with their runtime
properties.

Written in Python - easy to understand, but lacks performance. Probably can't
use more than 1 CPU core. Needs a lot of memory.

Written in Go - fast enough for most cases, all CPU cores, but possibly high
mem usage because of GC. I need to plan for it.

Written in Rust - possibly new and maturing, uses memory effectively, likely
to use all cores. Easy to deploy (single binary)

Written in JS - probably not for me - personal taste and hate of npm
ecosystem.

Written in C - probably the best performing, but less robust, no memory
safety.

So, “written in” helps in judging whether to care for that project or not to
some extent.

~~~
michaelcampbell
At least it's not Rust-this or that-4-Rust. The bundling of the language in
the title of the app is a fad I hope I see the end of.

Yes, Python, we get that you wrote WhateverPy in Python.

~~~
weberc2
Rust has its own -rs thing going on.

~~~
CameronNemo
Mostly for bindings though.

~~~
weberc2
Ah, that would make sense. I’ll have to take your word for it.

------
jinqueeny
FYI: this is just a PoC and is very early in the stage :)

------
wiradikusuma
Similar space, Vespa from Yahoo: [https://vespa.ai/](https://vespa.ai/)

~~~
atombender
Vespa is great, but is _much_ harder to use than Elasticsearch. It's also very
much geared towards ranking and not filtering.

For example, you can't do exact string matches (!). All string matches are
case insensitive. You also cannot index nested fields (e.g. a map or array of
maps) for search. In the end, you have to munge your data considerably to make
it fit Vespa's data model.

It also feels odd and antiquated in many ways, with XML configurations all
over the place.

But it's fast!

------
fnord123
1 commit.

Is this an opening of a mature project that has been coded in private
somewhere? Is this just a code drop on the community?

Note: this comes from a developer in Japan. Tantivy's main developer is also
based in Japan. @fulmicoton, is there any interaction between the projects?

~~~
wil421
I was looking at Raspberry Pi projects for Rust. There were similar complaints
on the Pi forum. Looks like I’ll be using Python for my projects.

~~~
wil421
Lots of downvotes but no replies. Rust is the language du jour on HN at the
moment. Lots of comments about it being a young language so you’ll just have
to wait.

I invite a healthy debate.

~~~
dnautics
If you are looking for a platform for the RPi, check out the nerves project.

------
atombender
Speaking of, has anyone tried working with Rucene [1], the Lucene port to
Rust?

[1] [https://github.com/zhihu/rucene](https://github.com/zhihu/rucene)

------
jhancock
I'm looking for an easy to use typeahead/autocomplete search solution.
javascript lib for frontend paired with easy to manage, lightweight server.
something modern.

The dataset isn't huge. e.g. 1 million strings of no more than 512 utf-8 chars
each and not reindexed more than once a day or week. clusters, sharding etc
unnecessary.

I keep hoping to stumble on a fully baked solution...any ideas?

~~~
FridgeSeal
Would this meet your needs?

[https://github.com/valeriansaliou/sonic](https://github.com/valeriansaliou/sonic)

------
snitch182
Interesting. Since the underlying engine (Tantivy) is faster than lucene - at
least in their benchmarks - it should be faster than solr. Seems like the
author is exploring a faster alternative to solr. I never got around to
exploring elasticsearch since our solr instances are running so smoothly.

------
manigandham
Awesome to see more competitors for elasticsearch. Added to the list:
[https://gist.github.com/manigandham/58320ddb24fed654b57b4ba2...](https://gist.github.com/manigandham/58320ddb24fed654b57b4ba22aceae25)

------
rapsey
Raft storage in-memory only. Not exactly safe replication.

------
lolive
Is the query language less obscure than the query language of ElasticSearch?

------
Kinnard
Named after Bayard Rustin??

~~~
fulmicoton
No.

------
coleifer
The proof that rust is a meme language is evidenced by the need to include
"written in rust" every time a rust project is mentioned.

~~~
coldtea
That makes no sense.

Devs write "written in rust" because:

\- it's interesting to other developers (which are HN's main audience) to see.

They don't sell some shrink wrapped software, where the language doesn't
matter. Nor some already established package you just download and use as is
like Postgres or Bash, or whatever.

\- it matters for those looking for compatible stuff for their own projects
(for libraries, reusable packages, etc.)

\- it offers certain guarantees other languages do not (e.g. memory safety,
native binaries) which can be an important criterion for those looking for a
project

\- it's important for possible collaborators to know the language (the project
being Open Source and everything).

\- in a field where a Java based project (Lucene/Elastic Search) dominates, it
is important to advertise that you offer a non-Java alternative for people who
want to avoid Java/Oracle/etc.

\- Rust is also currently on the rise (!= meme), and thus gets new
programmers, and new greenfield projects. And since those people are trying
the language, they want to advertise their involvement to the community, talk
about how they found the experience, etc.

~~~
Gene_Parmesan
> it offers certain guarantees...

Does it really, though? Unsafe rust exists, and while the language is
certainly built to strongly encourage certain safer programming practices, I
don't really see it as offering any _guarantees_ at all. If a project is open
source I can go and investigate for myself, but who has the time for that?

The guarantees that safe rust provides are very good for me as a developer,
because it kills a large class of potential errors and will therefore
theoretically make the dev process easier. But I don't really feel any trust
in these 'guarantees' when I switch roles to a user of someone else's
libraries or products.

Most of the rest of your points I agree with, but I also agree with the
original comment that it starts to make me feel just a little bit eye-rolly
every time I see "...written in Rust." (And I do like the language.)

~~~
wilsonthewhale
everything needs unsafe code to run, since it has to interact with the
OS/CPU/outside world. Rust's main contribution is that it provides ways for you
to clearly section off and declare that code is "unsafe" and needs extra
examination to uphold its invariants.

as an example, std::vec::Vec is implemented with quite a bit of unsafe code,
but all Rust consumers can be confident that it is vetted and the abstraction
presented around it is safe.

of course, this isn't a perfect solution, but it's much better than e.g.
C/C++, where you basically treat every line as "unsafe".
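
a small sketch of what that sectioning-off looks like in practice. the function below is a made-up example (`first_and_last_mut` is not a standard library API): the borrow checker won't let safe code take two mutable references into one slice, so the body uses a raw pointer inside an `unsafe` block, while the bounds check keeps the public signature safe for every caller.

```rust
/// Returns mutable references to the first and last elements of a slice,
/// or `None` if the slice has fewer than two elements.
///
/// Callers never write `unsafe`; the invariant is checked in one place.
fn first_and_last_mut<T>(slice: &mut [T]) -> Option<(&mut T, &mut T)> {
    let len = slice.len();
    if len < 2 {
        return None;
    }
    let ptr = slice.as_mut_ptr();
    // SAFETY: `len >= 2`, so offsets 0 and `len - 1` are both in bounds and
    // refer to different elements, which means the two `&mut` never alias.
    // Both references borrow from `slice`, so they cannot outlive it.
    unsafe { Some((&mut *ptr, &mut *ptr.add(len - 1))) }
}

fn main() {
    let mut values = vec![1, 2, 3];
    if let Some((first, last)) = first_and_last_mut(&mut values) {
        std::mem::swap(first, last);
    }
    println!("{:?}", values);
}
```

if the bounds check were wrong, the bug would sit inside that one `unsafe` block rather than being spread across every call site, which is the part a reviewer has to audit.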

