1. Toshi - alternative to Elasticsearch
2. Sonic - alternative to Elasticsearch
3. Vector - alternative to Logstash
4. native_spark - alternative to Apache Spark
aka you can't point your Kibana instance to these "nodes" and have it speak the Elasticsearch API
I've also been working on a light, fast, typo-tolerant search engine: https://github.com/typesense/typesense
It's been around for a couple of years now, and we have a few happy customers who have had great success replacing $X0,000/year popular hosted search services with Typesense!
It has built-in fulltext search: https://www.postgresql.org/docs/12/textsearch.html
I'm curious though at what point something like this or ES itself would make sense for primarily text search. Is speed the biggest thing, or is it more flexibility to tweak and get better search results?
Postgres full-text search is fast and easy, but it doesn't seem much more scalable/reliable/resilient than a clustered search solution that is shaped to the problem at hand.
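For context on what a dedicated engine does: most full-text engines (Lucene underneath Elasticsearch, or Postgres with a GIN index over a tsvector) are built around an inverted index that maps each term to the documents containing it. Below is a toy sketch of that idea, not how Postgres or any engine in this thread implements it. The `InvertedIndex` type and its naive tokenizer are hypothetical. Real engines add stemming, relevance ranking, compression and on-disk storage, and those layers are where the speed and tuning differences between engines tend to show up.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical toy inverted index: lowercase token -> ids of documents containing it.
#[derive(Default)]
pub struct InvertedIndex {
    postings: HashMap<String, BTreeSet<usize>>,
}

impl InvertedIndex {
    /// Naive tokenizer (an assumption): split on non-alphanumeric chars, lowercase.
    fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
        text.split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(|t| t.to_lowercase())
    }

    /// Records every token of `text` as occurring in document `doc_id`.
    pub fn add(&mut self, doc_id: usize, text: &str) {
        for token in Self::tokenize(text) {
            self.postings.entry(token).or_default().insert(doc_id);
        }
    }

    /// Returns ids of documents containing every query term (AND semantics), sorted.
    pub fn search(&self, query: &str) -> Vec<usize> {
        let mut result: Option<BTreeSet<usize>> = None;
        for token in Self::tokenize(query) {
            let ids = self.postings.get(&token).cloned().unwrap_or_default();
            // Intersect this term's posting list with the ones seen so far.
            result = Some(match result {
                None => ids,
                Some(acc) => acc.intersection(&ids).copied().collect(),
            });
        }
        result.map(|s| s.into_iter().collect()).unwrap_or_default()
    }
}
```

A query only touches the posting lists of its own terms instead of scanning every document, which is the basic reason a search engine beats `LIKE '%term%'`.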
Perhaps. But the author Minoru Osuka ain't nobody. He is
- Engineer at Mercari, Inc.
- Committer at Apache Software Foundation
- Co-author of an Apache Solr book in Japanese
- Ex-Yahoo! JAPAN
So yeah, I think he knows what he's doing.
And yes, Bayard Rustin is absolutely one of the great heroes of the 20th Century.
Why not simply announce "X" in the title?
Written in Python - easy to understand, but lacks performance. Probably can't use more than 1 CPU core. Needs a lot of memory.
Written in Go - fast enough for most cases, uses all CPU cores, but possibly high memory usage because of GC. I need to plan for it.
Written in Rust - possibly new and maturing, uses memory effectively, likely to use all cores. Easy to deploy (single binary)
Written in JS - probably not for me - personal taste and a dislike of the npm ecosystem.
Written in C - probably the best performing, but less robust, no memory safety.
So, “written in” helps in judging whether to care for that project or not to some extent.
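On the "likely to use all cores" point: Rust spreads work across cores with plain OS threads, no interpreter lock involved. A minimal sketch, assuming Rust 1.63+ for `std::thread::scope`; `parallel_sum` is a hypothetical name. Scoped threads let each worker borrow its own chunk of the input, and the compiler checks that those borrows end before the scope returns.

```rust
use std::thread;

/// Sums `data` using one scoped thread per available core.
pub fn parallel_sum(data: &[u64]) -> u64 {
    // Fall back to a single worker if the core count can't be queried.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division so every element lands in a chunk; max(1) because chunks(0) panics.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Collect first so all threads are spawned before we start joining them.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```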
Yes, Python, we get that you wrote WhateverPy in Python.
Python/Go/JS/C -> runtime crashes due to type errors and/or data races
Rust -> no runtime crashes due to type errors/data races
Rust seems to have its merits, but I find the parent post more level-headed in that it tries to characterize the language runtime, admittedly subjectively, but not in a rust==good, rest==bad way.
Go's race detector is useful and we've caught bugs using it, but it's nicer when the compiler prevents you from having those bugs in the first place.
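To illustrate what "the compiler prevents you from having those bugs" means here: a minimal sketch (the function name is hypothetical) in which several threads bump a shared counter. The shared state has to be wrapped in something thread-safe such as `Arc<Mutex<_>>`; trying to mutate a plain `u64` from several spawned threads is rejected at compile time instead of being caught by a race detector at run time. This covers data races only; higher-level race conditions and deadlocks are still possible in Rust.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from `threads` threads, `per_thread` times each.
/// A plain `u64` captured by `&mut` would not compile here: `thread::spawn`
/// needs `Send + 'static` closures, and the borrow checker forbids aliased `&mut`.
pub fn count_concurrently(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The guard gives exclusive access and is dropped at the end of the statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Bind first so the lock guard is dropped before `counter` goes out of scope.
    let total = *counter.lock().unwrap();
    total
}
```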
>quite good concurrency primitives
That's gonna be a "yikes" from me, dog: https://www.jtolio.com/2016/03/go-channels-are-bad-and-you-s...
I'm not a Rust fanboy, I simply like languages where it's more difficult to represent invalid state (Haskell, Ocaml, Rust, etc.) than in the mainstream languages I suffer with in my job on a daily basis. Rust happens to have great tooling and is the most likely to make inroads on these issues I care about (runtime stability + code correctness via good type system).
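A small example of making invalid state hard to represent, using a hypothetical `IndexState` type. With separate fields like `is_ready: bool` and `doc_count: Option<usize>` you could express "ready but no document count"; as an enum, a ready index always carries a count, a building one never does, and every `match` has to handle every variant.

```rust
/// Hypothetical lifecycle of a search index; each state carries only the data valid for it.
#[derive(Debug, PartialEq)]
pub enum IndexState {
    Building { progress_percent: u8 },
    Ready { doc_count: usize },
    Failed { reason: String },
}

pub fn describe(state: &IndexState) -> String {
    // Exhaustive match: adding a new variant makes this a compile error until it is handled.
    match state {
        IndexState::Building { progress_percent } => format!("building ({progress_percent}%)"),
        IndexState::Ready { doc_count } => format!("ready: {doc_count} docs"),
        IndexState::Failed { reason } => format!("failed: {reason}"),
    }
}
```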
This is more like "Carbon is faster than aluminum." Yeah, it's a generalization, but it's a useful one.
1) In the case of libraries (crates), it might be something I can make use of in the future.
2) I can look at how they solved the problem they are solving and compare that with how I'd do it and maybe learn something new that can be useful to me in my future projects.
3) I want Rust to thrive, and I want people to be aware of projects using Rust, because the more people are aware of Rust, the higher the probability that I can work for more companies in the future writing software for them in Rust.
That said, I'm especially looking for software "written in Rust" because I know the build process is standardized. I may need some dependencies, but I know how the build will work (cargo build; any extra build steps live in build.rs). I compile all Rust-based open source projects myself, and I have yet to stumble over a non-binding-specific package that won't compile.
My original comment was because, as an old guy who has seen the hype around so many languages come and go, I am getting tired of projects that try to sell themselves only because they are written in Rust.
The biggest downside IMO is you have to get people past some conceptual hurdles before they can be productive with it. (e.g. the borrow checker) Despite how often you see it come up on HN, it doesn't have a powerful marketing force behind it. It also doesn't fit the trend of turning developers into commodities by being as easy as possible.
If you're just an end user, or the project isn't open source, then sure, language might not matter so much.
(The original Rust project as designed by Graydon Hoare was even quite different from the Rust of today, e.g. the C/C++-like focus, with little or no runtime, is something that only came about around the Rust 1.0 release.)
ES is the dominant free-text search engine out there, and it's famously painful to manage. Resource consumption and GC pain can be really significant.
I see 'rust' and I know immediately that at least some pains I've experienced will be eliminated.
I am saying that because the programming language is not what defines a project. It could be a pile of junk even if it's written in the greatest language ever made.
So many wonderful things were written in assembly or PHP (assuming you rate PHP and assembly on the other end of the spectrum of awesomeness.)
The pattern is more like "X written in Y", which is totally fine imo.
The same developer also happens to have written a similar server in Python a while back: https://github.com/mosuka/cockatrice
In general I think the title of a project on a news aggregator should basically be an 80-character sales pitch, and " in rust" is 8 characters that signal a lot more than most 8 characters could (to me).
For example, you can't do exact string matches (!). All string matches are case insensitive. You also cannot index nested fields (e.g. a map or array of maps) for search. In the end, you have to munge your data considerably to make it fit Vespa's data model.
It also feels odd and antiquated in many ways, with XML configurations all over the place.
But it's fast!
Is this an opening of a mature project that has been coded in private somewhere? Is this just a code drop on the community?
Note: this comes from a developer in Japan. Tantivy's main developer is also based in Japan. @fulmicoton, is there any interaction between the projects?
Bayard looks like a search-in-rust PoC.
I invite a healthy debate.
I need stability and some maturity in personal projects due to time constraints. Not a jab at Rust; I'm still in wait-and-watch mode. Looking to learn a low-level language in the next year, and Rust is my top choice.
The dataset isn't huge, e.g. 1 million strings of no more than 512 UTF-8 chars each, and not reindexed more than once a day or week. Clusters, sharding, etc. are unnecessary.
I keep hoping to stumble on a fully baked solution...any ideas?
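Not a fully baked solution, but the usual brute-force baseline for typo tolerance is edit (Levenshtein) distance. This sketch uses hypothetical `levenshtein` and `fuzzy_find` helpers, compares query and words case-insensitively, and scans every word of every entry, so its cost grows with entries × words × query length × word length. At a million entries you would probably want an index in front of it (the kind of thing the typo-tolerant engines mentioned in this thread handle for you), but it works as a correctness reference.

```rust
/// Levenshtein distance over chars, keeping one row of the DP table at a time.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = edit distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1; b_chars.len() + 1];
        for (j, &cb) in b_chars.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Indices of entries containing at least one word within `max_edits` of `query`.
pub fn fuzzy_find(entries: &[&str], query: &str, max_edits: usize) -> Vec<usize> {
    let q = query.to_lowercase();
    entries
        .iter()
        .enumerate()
        .filter(|(_, text)| {
            text.split_whitespace()
                .any(|word| levenshtein(&word.to_lowercase(), &q) <= max_edits)
        })
        .map(|(i, _)| i)
        .collect()
}
```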
Devs write "written in rust" because:
- it's interesting to other developers (which are HN's main audience) to see.
They don't sell some shrink-wrapped software, where the language doesn't matter. Nor some already established package you just download and use as is, like Postgres or Bash or whatever.
- it matters for those looking for compatible stuff for their own projects (for libraries, reusable packages, etc.)
- it offers certain guarantees other languages do not (e.g. memory safety, native binaries), which can be an important criterion for those looking for a project
- it's important for possible collaborators to know the language (the project being Open Source and everything).
- in a field where a Java-based project (Lucene/Elasticsearch) dominates, it is important to advertise that you offer a non-Java alternative for people who want to avoid Java/Oracle/etc.
- Rust is also currently on the rise (!= meme), and thus gets new programmers, and new greenfield projects. And since those people are trying the language, they want to advertise their involvement to the community, talk about how they found the experience, etc.
Does it really, though? Unsafe rust exists, and while the language is certainly built to strongly encourage certain safer programming practices, I don't really see it as offering any guarantees at all. If a project is open source I can go and investigate for myself, but who has the time for that?
The guarantees that safe rust provides are very good for me as a developer, because it kills a large class of potential errors and will therefore theoretically make the dev process easier. But I don't really feel any trust in these 'guarantees' when I switch roles to a user of someone else's libraries or products.
Most of the rest of your points I agree with, but I also agree with the original comment that it starts to make me feel just a little bit eye-rolly every time I see "...written in Rust." (And I do like the language.)
as an example, std::vec::Vec is implemented with quite a bit of unsafe code, but all Rust consumers can be confident that it is vetted and the abstraction presented around it is safe.
of course, this isn't a perfect solution, but it's much better than e.g. C/C++, where you basically treat every line as "unsafe".
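The pattern described above, `unsafe` inside and a safe API outside, in miniature. `first_and_last` is a hypothetical helper that skips bounds checks with the standard slice method `get_unchecked`; the emptiness check and the `// SAFETY:` comment are what a reviewer would audit. Callers only ever see a safe signature, so auditing the unsafe part means reading one block rather than every call site.

```rust
/// Returns the first and last elements of a slice, or `None` if it is empty.
pub fn first_and_last<T>(items: &[T]) -> Option<(&T, &T)> {
    if items.is_empty() {
        return None;
    }
    let last = items.len() - 1;
    // SAFETY: `items` is non-empty (checked above), so indices 0 and `last` are in bounds.
    unsafe { Some((items.get_unchecked(0), items.get_unchecked(last))) }
}
```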