Bayard: a full-text search and indexing server written in Rust (github.com)
283 points by jinqueeny 16 days ago | 106 comments



It would be nice to integrate all Rust alternatives to ELK stack:

1. Toshi[1] - alternative to Elasticsearch

2. Sonic[2] - alternative to Elasticsearch

3. Vector[3][4] - alternative to Logstash

4. native_spark[5] - alternative to Apache Spark

[1] https://github.com/toshi-search/Toshi

[2] https://github.com/valeriansaliou/sonic

[3] https://vector.dev/

[4] https://github.com/timberio/vector

[5] https://github.com/rajasekarv/native_spark


none of these are actual drop in replacements as far as I can tell for Elasticsearch

aka you can't point your Kibana instance to these "nodes" and have it speak the Elasticsearch API


+ Manticore Search - alternative to Elasticsearch (https://github.com/manticoresoftware/manticoresearch)

One of logstash's main draws is as a data transformation pipeline. You can do lookups via dns or a json or csv file, for example. From what I can tell vector is just a simple log shipper.


Vector is a source -> transform -> sink pipeline as well. There are no transforms that do lookups or joins available now but the functionality is supported if someone writes a custom transformer middleware.
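To make the source -> transform -> sink shape concrete, here is a minimal sketch in plain Rust. It does not use Vector's actual API: the `Transform` trait, the `HostLookup` struct, and `run_pipeline` are hypothetical names invented for illustration, and the lookup table stands in for the kind of file-based lookup Logstash offers.

```rust
use std::collections::HashMap;

/// A transform receives one event and either emits a (possibly rewritten)
/// event or drops it by returning `None`. Hypothetical trait, not Vector's.
trait Transform {
    fn apply(&self, event: String) -> Option<String>;
}

/// Hypothetical lookup transform: if the first whitespace-separated field
/// is a known IP address, append the matching host name to the event.
struct HostLookup {
    table: HashMap<String, String>,
}

impl Transform for HostLookup {
    fn apply(&self, event: String) -> Option<String> {
        // Events with no fields at all (blank lines) are dropped.
        let ip = event.split_whitespace().next()?;
        match self.table.get(ip) {
            Some(host) => Some(format!("{} host={}", event, host)),
            None => Some(event),
        }
    }
}

/// Source -> transforms -> sink: feed every event through each transform
/// in order and collect whatever survives.
fn run_pipeline(source: Vec<String>, transforms: &[Box<dyn Transform>]) -> Vec<String> {
    source
        .into_iter()
        .filter_map(|event| transforms.iter().try_fold(event, |e, t| t.apply(e)))
        .collect()
}
```

A real pipeline would stream events and handle back-pressure; this only shows where a lookup step would slot in.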


It's built on top of Tantivy (https://github.com/tantivy-search/tantivy) and implements the Raft Consensus Algorithm (https://raft.github.io/) via raft-rs (https://github.com/tikv/grpc-rs) and gRPC (HTTP/2 + Protocol Buffers) via grpc-rs (https://github.com/tikv/grpc-rs) and rust-protobuf (https://github.com/stepancheg/rust-protobuf).


So would it be roughly accurate to say that Bayard is to Tantivy what Elasticsearch is to Lucene?


There is toshi-search https://github.com/toshi-search/Toshi who is trying to be a drop-in replacement for Elasticsearch. My understanding is that Bayard is trying to achieve the same use cases as Elasticsearch but with a different API


Precisely yes.


typo: raft-rs is at https://github.com/tikv/raft-rs


thanks


Elasticsearch is notoriously hard to roll out and develop against (for smaller companies especially), and so I am happy to see smaller projects in this space.

I've also been working on a light, fast, typo-tolerant search engine: https://github.com/typesense/typesense

It's been around for a couple of years now, and we have a few happy customers who have had great success in replacing $X0,000/year popular hosted search with Typesense!


PostgreSQL is quite a good, easy-to-integrate alternative to Elastic.

It has built-in fulltext search: https://www.postgresql.org/docs/12/textsearch.html


Agreed. We've been using this successfully for a while now.

I'm curious though at what point something like this or ES itself would make sense for primarily text search. Is speed the biggest thing, or is it more flexibility to tweak and get better search results?


Postgres is fine if your search problem is mostly a recall problem. If N is large enough or you have small N with enough overlapping keywords (long documents) then precision becomes important. That is when you need things like BM25, PageRank, machine learning, etc and Postgres just doesn't cut it anymore. Additionally spell check, high-quality autocomplete, multiple languages are better supported and much easier to implement in ES/Solr.
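For readers who have not met BM25, a minimal sketch of the Okapi BM25 scoring formula follows. The function name `bm25_score` and its parameters are made up for this example, the constants `k1 = 1.2` and `b = 0.75` are commonly used defaults rather than anything prescribed here, and real engines compute the same quantities from an inverted index instead of scanning token slices.

```rust
use std::collections::HashMap;

/// Score one tokenized document against a tokenized query using Okapi BM25.
/// `doc_freq` maps a term to the number of documents that contain it,
/// `num_docs` is the collection size, `avg_doc_len` the mean document length.
fn bm25_score(
    query: &[&str],
    doc: &[&str],
    doc_freq: &HashMap<&str, usize>,
    num_docs: usize,
    avg_doc_len: f64,
) -> f64 {
    let k1 = 1.2; // term-frequency saturation (assumed default)
    let b = 0.75; // document-length normalization (assumed default)
    let doc_len = doc.len() as f64;
    let mut score = 0.0;
    for term in query {
        // How often the query term occurs in this document.
        let tf = doc.iter().filter(|t| *t == term).count() as f64;
        if tf == 0.0 {
            continue;
        }
        // Rare terms get a higher inverse document frequency.
        let n = *doc_freq.get(term).unwrap_or(&0) as f64;
        let idf = ((num_docs as f64 - n + 0.5) / (n + 0.5) + 1.0).ln();
        // Term frequency saturates and is penalized in long documents.
        let norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len);
        score += idf * tf * (k1 + 1.0) / norm;
    }
    score
}
```

The point relative to plain keyword matching is that a rare term in a short document outranks a common term in a long one, which is where precision starts to come from.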


Postgres FTS is very basic and requires separate methods and extensions to do things like fuzzy matching. It also doesn't support modern relevance and ranking algorithms.


Search is somewhat embarrassingly parallel right? So Postgres is great until you want to shard all of your queries. Which is (of course) possible, but then you're using attributes of a tool that aren't specifically tailored to your problem space?

Postgres full text search is fast and easy, but it doesn't seem much more scaleable/reliable/resilient than a clustered search solution that is shaped to the problem at hand.


Is it fast though !?


Multi-faceted search that requires special indexing setup.


I see the author did the same search engine in Go a while ago. So I suppose the project is a side project to learn a new language. Or is there a different reason?


> So I suppose the project is a side project to learn a new language.

Perhaps. But the author Minoru Osuka ain't nobody[1]. He is

- Engineer at Mercari, Inc.

- Committer at Apache Software Foundation

- Co-author of an Apache Solr book in Japanese

- Ex-Yahoo! JAPAN

- Ex-Rakuten

So yeah, I think he knows what he's doing.

[1] https://twitter.com/minoru_osuka/


That is a good observation. The author might also need flexible search options at work. In any case, I have some interest in Rust but don’t actively use it. I found reading through the main server.rs file interesting as example code.


FYI: Who was Bayard Rustin? https://en.wikipedia.org/wiki/Bayard_Rustin It's a silly play on words celebrating one of the very great heroes of 20th Century America


I wonder if he was named after Le Bon Chevalier?

https://en.wikipedia.org/wiki/Pierre_Terrail,_seigneur_de_Ba...




I assumed the same thing. Quite the coincidence.

And yes, Bayard Rustin is absolutely one of the great heroes of the 20th Century.


"X written in Rust" is becoming a tiring clickbait pattern on tech boards.

Why not simply announce "X" in the title?


It’s not useless. I for one associate languages with their runtime properties.

Written in Python - easy to understand, but lacks performance. Probably can't use more than 1 CPU core. Needs a lot of memory.

Written in Go - fast enough for most cases, all CPU cores, but possibly high mem usage because of GC. I need to plan for it.

Written in Rust - possibly new and maturing, uses memory effectively, likely to use all cores. Easy to deploy (single binary)

Written in JS - probably not for me - personal taste and hate of npm ecosystem.

Written in C - probably the best performing, but less robust, no memory safety.

So, “written in” helps in judging whether to care for that project or not to some extent.


At least it's not Rust-this or that-4-Rust. The bundling of the language in the title of the app is a fad I hope I see the end of.

Yes, Python, we get that you wrote WhateverPy in Python.


This, for me, is due to lack of package namespacing in these languages. Someone took 'xyz' ? Well I guess I'm going to be 'xyz-rs'


Rust has its own -rs thing going on.


Mostly for bindings though.


Ah, that would make sense. I’ll have to take your word for it.


For me, I primarily think about stability:

Python/Go/JS/C -> runtime crashes due to type errors and/or data races

Rust -> no runtime crashes due to type errors/data races
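As a rough illustration of the data-race half of that claim (a sketch, not a statement about crashes in general): safe Rust will not compile code that mutates a plain shared counter from several threads, so the shared state has to be wrapped in something like `Arc<Mutex<_>>`. The function name `parallel_count` is invented for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment one shared counter from several threads and return the total.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex serializes access.
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock guard is the only way to reach the value.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

Replacing the `Arc<Mutex<usize>>` with a bare `usize` captured mutably by each thread is rejected at compile time, which is the guarantee being described. Panics (for example from `unwrap`) are still possible at runtime.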


Why would Go and C, which have static typing, suffer from type errors? Why would Go, which has a data race checker and quite good concurrency primitives, be more prone to data races than Rust?

Rust seems to have its merits but I find the parent post more level headed in that it tries to characterize language runtime, admittedly subjectively but not in a rust==good, rest==bad way.


You can easily cause a C/Go program to crash/abort due to runtime type errors. If you've written C, you've miscast something at some point. Go in particular relies extremely heavily on runtime reflection. Both languages have poor type systems.

Go's race detector is useful and we've caught bugs using it, but it's nicer when the compiler prevents you from having those bugs in the first place.

>quite good concurrency primitives

That's gonna be a "yikes" from me, dog: https://www.jtolio.com/2016/03/go-channels-are-bad-and-you-s...

I'm not a Rust fanboy, I simply like languages where it's more difficult to represent invalid state (Haskell, Ocaml, Rust, etc.) than in the mainstream languages I suffer with in my job on a daily basis. Rust happens to have great tooling and is the most likely to make inroads on these issues I care about (runtime stability + code correctness via good type system).


Go doesn’t rely heavily on runtime reflection. Putting it in the same category as C or JS is disingenuous. Rust has a great static type system, but I’ve never seen a Go program fail on a runtime/reflection type error before.


I'm not sure what to say. Go code (stdlib, popular libraries, etc.) uses runtime reflection _everywhere_. I've been using Go since 1.2 and I've seen too many runtime crashes due to type errors to count.


It is used for things like printf and JSON marshaling. I’ve been using Go regularly since 2012 and this isn’t a real problem. The reflection in the standard lib and popular third party libs (and most unpopular third party libs for that matter) is rock solid. Moreover, reflection probably accounts for less than 1% of Go code—not sure where you’re getting “everywhere” from. So I guess I’m calling your bluff.


Even if your claim of "probably ... less than 1% of Go code" is accurate, appearing somewhat less often than 1 out of every 100 lines of code is _everywhere_ to me...


To be clear, I didn't say "less than 1 out of every 100 lines", but in any case I don't consider yours to be a reasonable definition of "everywhere". Especially since such a low frequency doesn't support your claim that "you see runtime type errors too many times to count". Perhaps you're using libraries that are far, far below the ecosystem's average quality?


Gosh, Rust evangelists did a really good job brainwashing the community. Take a look at big Rust projects that do anything useful, like Tokio. You'll find dozens of unsafe blocks everywhere, so much for static analysis. True, in the most simple cases linear types will slap you on the wrist, but most bugs happen in the complex parts of code anyways.


I'm not part of the Rust "community" and don't participate in any of its fora, I just like a lot of the ideas it's pushing forward (see sibling comment). "unsafe" doesn't dIsAbLe AlL pRoTeCtIoNs like the anti-Rust crowd likes to say. I've written a few small (5,000+ LOC) programs in Rust over the last five years or so and I've only personally had to dip into unsafe once in my own code. I didn't claim that Rust prevents all bugs, I pointed out that Rust programs are more stable in the case of _runtime type errors and data races_ (which they are).


You're talking about a library that is forming the base of multithreaded applications. 99% of other libraries don't have nearly as much unsafe

Sorry, but by and large that's "red goes faster" type of reasoning.


Red is a property of a car totally unrelated to its performance characteristics. The language software is implemented in absolutely has an impact on the performance and behavior of the software.

This is more like "Carbon is faster than aluminum." Yeah, it's a generalization, but it's a useful one.


I don't know what you people get out of pretending you can't derive much from being told what language something is written in.


As someone writing software in Rust myself I am always interested in knowing about projects using Rust for multiple reasons.

1) In the case of libraries (crates), it might be something I can make use of in the future.

2) I can look at how they solved the problem they are solving and compare that with how I'd do it and maybe learn something new that can be useful to me in my future projects.

3) I want Rust to thrive and I want people to be aware of projects using Rust because the more people that are aware of Rust the bigger is the probability that I can work for more companies in the future writing software for them in Rust.


Your 2) is by far the most important to me. Not only does it allow me to learn about solutions I could repurpose and the patterns they use; it can also give me that last missing piece I was looking for that blocked me from building something.

That said, I'm especially looking for software "written in Rust" because I know the build process is standardized. I may need some dependencies, but I know how the build will work (cargo build -> if at all, all build instructions are in build.rs). I compile all rust-based open projects myself, and I have yet to stumble over a non-binding-specific package that won't compile.


Fast rewind/forward n years, and you could replace "Rust" in your post with the name of the language /du jour/.

My original comment was because as an old guy who saw hype around so many languages come and go, I am getting tired of those projects who try to sell themselves only because they are written in Rust.


I've seen many languages with mix and match syntax differences and feature sets that didn't bring anything significant to the table. Rust is not that. It has memory safety while having memory efficiency, high performance, good package management, easy interface with C FFI, and excellent support for parallelism. The list of things it gets "right" from my point of view dwarfs other languages-du-jour.

The biggest downside IMO is you have to get people past some conceptual hurdles before they can be productive with it. (e.g. the borrow checker) Despite how often you see it come up on HN, it doesn't have a powerful marketing force behind it. It also doesn't fit the trend of turning developers into commodities by being as easy as possible.


The excitement around Java existed for two reasons: many people were new to the ideas in Java, and many people were learning it for the first time. Now, every programmer knows about Java's ideas (with the possible exception of interfaces which aren't in Python) and the number of people learning Java is capped at the number of kids being born. If Rust is successful then eventually every programmer who wants to learn it will already know it, and every programmer will know how to use a borrow checker. Then nobody will post "made in rust" stuff on HN, but that will be an indication of Rust's victory.


And those projects and posts will also be interesting, for people building their own projects with similar needs in that language du jour.

If you're just an end user, or the project isn't open source, then sure, language might not matter so much.


Rust is probably here to stay seeing as it was picked up by Firefox. I still think it could have been done better, but perfect is the enemy of good enough.


Rust wasn't just "picked up" by Firefox - it was developed by Mozilla for the very purpose of using it in the "next generation" of Firefox technologies.

(The original Rust project as designed by Graydon Hoare was even quite different from the Rust of today, e.g. the C/C++-like focus, with little or no runtime, is something that only came about around the Rust 1.0 release.)


Note that Rust intentionally lacks some features that would be useful for Firefox development (e.g. developing for the DOM involves object inheritance which isn’t present in Rust), so it’s not like it’s designed to solve exactly their problems and nobody else’s.


Rust has inheritance and polymorphism.


Rust has trait inheritance, but not object inheritance. Servo uses code generation and macros to get around this. (I’m literally writing an operating system in Rust, I know what I’m talking about.)


Could you elaborate on how trait inheritance is different from object inheritance?


"Trait" inheritance is just interface inheritance, which is indistinguishable from composition+delegation. It doesn't have the pitfalls of actual implementation inheritance, which essentially involves an extra "trick" of dispatching on the actual type of your object even when calling base-class methods. It's not that Rust cannot do this - heck, people do it all the time in C. But it has to be done by writing things out explicitly, it's not automatic in any sense. And for good reason - letting base-class code access methods that have been redefined in a derived class can create all sorts of pitfalls when you're not very careful about what that implies.
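A small sketch of the distinction, with hypothetical types (`Node`, `Element`, `BaseNode`, `Button`) that are not taken from Servo or any real DOM binding: the supertrait relation only says "every `Element` is also a `Node`", while reuse of the base's data and behavior is written out by hand as composition plus delegation.

```rust
/// Interface ("trait") inheritance: anything that is an `Element`
/// must also implement `Node`. No data or method bodies are inherited
/// from a base struct.
trait Node {
    fn name(&self) -> String;
}

trait Element: Node {
    /// Default method; it can call `name` because of the supertrait bound.
    fn describe(&self) -> String {
        format!("<{}>", self.name())
    }
}

/// The "base class" is just an ordinary struct.
struct BaseNode {
    tag: String,
}

impl Node for BaseNode {
    fn name(&self) -> String {
        self.tag.clone()
    }
}

/// The "derived class" embeds the base (composition) ...
struct Button {
    base: BaseNode,
    label: String,
}

impl Node for Button {
    /// ... and forwards to it explicitly (delegation).
    fn name(&self) -> String {
        self.base.name()
    }
}

impl Element for Button {
    fn describe(&self) -> String {
        format!("<{}>{}</{}>", self.name(), self.label, self.name())
    }
}
```

Nothing in `BaseNode` can call back into `Button` unless that is wired up explicitly, which is the "not automatic in any sense" part of the comment above.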


So was said about Java 20 years ago... Yet no one dares today post a "X in Java" anymore on HN.


And was probably said about Fortran and C. Our tools are getting better. I can only hope Rust becomes so widely used that I can recommend big companies use it since they'll have a trillion developer hiring pool.


"In Rust" signals something very important to me, especially with the context of "full text search".

ES is the current free text search engine out there, and it's famously painful to manage. Resource consumption and GC pain can be really significant.

I see 'rust' and I know immediately that at least some pains I've experienced will be eliminated.


I like knowing what language something is coded in. It makes me more likely to look into the project. If it's written in something I'm not interested in I may click through, but not be as thorough, and some languages I save the link for later because I have no interest in them professionally or on my time off. I like looking at all projects eventually because some people come up with amazing pieces of software in all types of languages, but others might not care to look at a Ruby, PHP, NodeJS, Python, C, C++, Rust etc project.


So per your rationale, why not list other pertinent information about the project in the title?

I am saying that because the programming language is not what defines a project. It could be a pile of junk even if it is written in the greatest language ever made.


Sometimes people do, they say "Django" instead of "in Python using Django" which I think is fine. People who know Python will take the hint.


It can be a pile of junk because of the language it is written in.


> It can be a pile of junk because of the language it is written in.

So many wonderful things were written in assembly or PHP (assuming you rate PHP and assembly on the other end of the spectrum of awesomeness.)


The ratio of garbage to awesome is relevant not individual examples


I see this just as often for Python, C, JavaScript, etc.

The pattern is more like "X written in Y", which is totally fine imo.


Rust is an interesting language both for its technical characteristics, which is a direct appeal as other commenters have noted, but it can also be worth noting because Rust interoperates almost as well as C. If I announce a cool Python module, someone who primarily uses Ruby is probably going to ignore it because the level of effort to use it would be more than it's worth. If I announce a cool Rust module, they might think “you know, it's pretty easy to build a wrapper…”.


Rust is young enough that you can read this as an ad for the language, not the project. "Rust is a language in which people write full-text search and indexing"


I think it's relevant for open source projects because people might want to contribute to them or just read through the code to see how things work. And of course people will be more interested in doing those things with languages they have experience using.

The same developer also happens to have written a similar server in Python a while back: https://github.com/mosuka/cockatrice


What codetrotter and ccccc0 said, also Rust as a language and a community has a strong focus on correctness, which makes me more interested in actually using the project.

In general I think the title of a project on a news aggregator should basically be an 80-character sales pitch; " in rust" is 8 characters that signal a lot more than most 8 characters could (to me).


Tell the mods that. I've seen multiple titles where they edited the original title to include "in X".


was about to comment the same before seeing your comment.


This.


FYI: this is just a PoC and is at a very early stage :)


Similar space, Vespa from Yahoo: https://vespa.ai/


Vespa is great, but is much harder to use than Elasticsearch. It's also very much geared towards ranking and not filtering.

For example, you can't do exact string matches (!). All string matches are case insensitive. You also cannot index nested fields (e.g. a map or array of maps) for search. In the end, you have to munge your data considerably to make it fit Vespa's data model.

It also feels odd and antiquated in many ways, with XML configurations all over the place.

But it's fast!


1 commit.

Is this an opening of a mature project that has been coded in private somewhere? Is this just a code drop on the community?

Note: this comes from a developer in Japan. Tantivy's main developer is also based in Japan. @fulmicoton, is there any interaction between the projects?


Not all projects are birthed in public. It may have been extracted from a larger private project which may have issues sharing its exact history of development.


The creator of Bayard is apparently the co-creator of Solr.

Bayard looks like a search-in-rust PoC.


I met the author of Bayard a couple of times and had beers with him. Does that count as interactions?


Neat. :)


I was looking at Raspberry Pi projects for Rust. There were similar complaints on the Pi forum. Looks like I’ll be using Python for my projects.


Lots of downvotes but no replies. Rust is the language du jour on HN at the moment. Lots of comments about it being a young language so you’ll just have to wait.

I invite a healthy debate.


If you are looking for a platform for the RPi, check out the nerves project.


Because...


Libraries weren’t maintained or in a state of development for months without regular commits. Statements like “sometimes this doesn’t work” or commits that totally redesigned an api. Meaning you have to rewrite your code.

I need stability and some maturity in personal projects due to time constraints. Not a jab at Rust, I’m still in wait and watch mode. Looking to learn a low level language in the next year and Rust is my top choice.


I'm looking for an easy to use typeahead/autocomplete search solution. javascript lib for frontend paired with easy to manage, lightweight server. something modern.

The dataset isn't huge. e.g. 1 million strings of no more than 512 utf-8 chars each and not reindexed more than once a day or week. clusters, sharding etc unnecessary.

I keep hoping to stumble on a fully baked solution...any ideas?


Would this meet your needs?

https://github.com/valeriansaliou/sonic


Interesting. Since the underlying engine (Tantivy) is faster than Lucene - at least in their benchmarks - it should be faster than Solr. Seems like the author is exploring a faster alternative to Solr. I never got around to exploring Elasticsearch since our Solr instances are running so smoothly.


Awesome to see more competitors for elasticsearch. Added to the list: https://gist.github.com/manigandham/58320ddb24fed654b57b4ba2...


Speaking of, has anyone tried working with Rucene [1], the Lucene port to Rust?

[1] https://github.com/zhihu/rucene


Raft storage in-memory only. Not exactly safe replication.


Is the query language less obscure than the query language of ElasticSearch?


Named after Bayard Rustin??


No.


The proof that rust is a meme language is evidenced by the need to include "written in rust" every time a rust project is mentioned.


That makes no sense.

Devs write "written in rust" because:

- it's interesting to other developers (which are HN's main audience) to see.

They don't sell some shrink wrapped software, where the language doesn't matter. Nor some already established package you just download and use as is like Postgres or Bash, or whatever.

- it matters for those looking for compatible stuff for their own projects (for libraries, reusable packages, etc.)

- it offers certain guarantees other languages do not (e.g. memory safety, native binaries) which can be an important criterion for those looking for a project

- it's important for possible collaborators to know the language (the project being Open Source and everything).

- in a field where a Java based project (Lucene/Elastic Search) dominates, it is important to advertise that you offer a non-Java alternative for people who want to avoid Java/Oracle/etc.

- Rust is also currently on the rise (!= meme), and thus gets new programmers, and new greenfield projects. And since those people are trying the language, they want to advertise their involvement to the community, talk about how they found the experience, etc.


> it offers certain guarantees...

Does it really, though? Unsafe rust exists, and while the language is certainly built to strongly encourage certain safer programming practices, I don't really see it as offering any guarantees at all. If a project is open source I can go and investigate for myself, but who has the time for that?

The guarantees that safe rust provides are very good for me as a developer, because it kills a large class of potential errors and will therefore theoretically make the dev process easier. But I don't really feel any trust in these 'guarantees' when I switch roles to a user of someone else's libraries or products.

Most of the rest of your points I agree with, but I also agree with the original comment that it starts to make me feel just a little bit eye-rolly every time I see "...written in Rust." (And I do like the language.)


everything needs unsafe code to run, since it has to interact with the OS/CPU/outside world. Rust's main contribution is that it provides ways for you to clearly section off and declare that code is "unsafe" and needs extra examination to uphold its invariants.

as an example, std::vec::Vec is implemented with quite a bit of unsafe code, but all Rust consumers can be confident that it is vetted and the abstraction presented around it is safe.

of course, this isn't a perfect solution, but it's much better than e.g. C/C++, where you basically treat every line as "unsafe".
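A tiny sketch of that pattern, using a made-up helper `first_and_last` rather than anything from `Vec`'s real implementation: the `unsafe` block is confined to one function, the invariant it relies on is checked just above it, and callers only ever see a safe signature.

```rust
/// Return the first and last elements of a slice, or `None` if it is empty.
/// The public signature is entirely safe; callers cannot trigger
/// undefined behavior through it.
fn first_and_last(items: &[i32]) -> Option<(i32, i32)> {
    if items.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty (checked above), so index 0 and
    // index `len - 1` are both in bounds for the unchecked reads below.
    unsafe {
        Some((
            *items.get_unchecked(0),
            *items.get_unchecked(items.len() - 1),
        ))
    }
}
```

If the emptiness check were removed, the bug would sit in this one clearly marked function, which is the "extra examination" the comment refers to.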


How do you define a "meme language" and why does that follow for rust?


"meme language". lol that's hilarious.



