Manticore Search: Elasticsearch Alternative (github.com/manticoresoftware)
118 points by wener on July 29, 2022 | 69 comments



Whenever I see quotes like “182x faster than MySQL! 29x faster than Elasticsearch!” as the intro to a project, I’m immediately sceptical of the quality of the entire project.


Me too! That's why, when you say things like that, it's very important to provide enough evidence. In this case there is https://db-benchmarks.com/ where you can see all the details, and it's 100% open source - https://github.com/db-benchmarks/db-benchmarks including the UI https://github.com/db-benchmarks/ui . The results themselves are open source too - https://github.com/db-benchmarks/db-benchmarks/tree/main/res...

So anyone can reproduce the results or at least look carefully into them, understand the testing methodology etc.


Full-text search engines are of course slow on queries that aren't full-text search.

It would be interesting to see queries benchmarked for their intended workload (logs/sorting/full-text etc).

Fastest Avg: https://db-benchmarks.com/?cache=fast_avg&engines=elasticsea...

Slowest: https://db-benchmarks.com/?cache=slowest&engines=elasticsear...


Elasticsearch is used not only for full-text search. It's widely used for analytics (aggregations) and filtering too, e.g. when you do log analytics.

Comparing Elasticsearch / Manticore with MySQL may not be the fairest thing since they are too different, but comparing them with each other, and not only on full-text queries, seems fine to me.


It all seems too good to be true, so I'd like to understand more about the limitations.


Well, for one, it's written in C++, which means it is more likely to have memory safety bugs, which could potentially be security vulnerabilities.


While generally true, I would argue that for the use cases where full-text search is mostly used (e.g. either search through a public database, or, quite the opposite, an internal system that does search through logs collected from various sources), in practice security vulnerabilities are less of a concern because usually even if you can expose some data stored in the full text index using that vulnerability, it would still only expose data you could already find in that search engine and that's already accessible to you :).


That might be true in some cases.

But for the public data case, you probably still need to worry about DoS or data corruption.

In the logs case, a malicious actor can probably control at least part of the logs, so if a bug leads to arbitrary code execution, a bad actor could possibly get all kinds of valuable data.

Also, just to be clear, the language doesn't necessarily mean there are significant security bugs. A well written C++ app is probably better than a poorly written Java app. It's just harder to avoid memory bugs in C++ than in Java.


I'd say (fulltext) search is one of the least interesting features of ES, and the aggregations are its USP. E.g. moving averages on (biggish) datasets can be calculated on very cheap hardware.
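
For anyone unsure what a moving-average aggregation computes, here is a minimal Python sketch. It is not how ES implements it; the function name and the sample numbers are made up for illustration.

```python
from collections import deque

def moving_average(values, window):
    """Yield the mean of the last `window` values once the window is full."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the oldest value so the running sum covers `window` items.
            total -= buf.popleft()
        if len(buf) == window:
            yield total / window

# Hypothetical daily counts, 3-day window.
daily_hits = [10, 20, 30, 40, 50]
print(list(moving_average(daily_hits, 3)))  # -> [20.0, 30.0, 40.0]
```

The running sum means each new value costs constant work, which is part of why this kind of aggregation stays cheap on modest hardware.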


If you're not interested in fast full text search, then you're wasting a ton of resources on something that more specialized analytics databases handle far more easily. The entire storage and retrieval design of these engines is built largely around doing Lucene-style searches at extreme speeds.


What would be a good self-hostable solution to replace ES aggregations? I'm also quite fond of Clickhouse, which is a lot faster still, but the sheer number of products that have popped up in the last decade always makes me wonder if there are even faster solutions out there.


We use Clickhouse a lot, but there's also InfluxDB (they apparently have a SaaS version now too), and you can look at Druid or even the Hadoop family of products. The latter two are probably more in line with building entire analytics workflows instead of just a storage tier. If you're looking for a more workflow-managed solution, you can also look into Apache Flink, which has a lot of similar uses.


Thank you. I think I'll try to get the most out of Clickhouse, also because of how extremely easy it is to maintain and keep running.


I'm not even entirely sure what it means for one DBMS to be faster than another.

In practice, such claims boil down to "this query is 182x faster than that query in MySQL".

Using the same type of comparison, you can come to the conclusion that MySQL is 10x faster than MySQL because one query was constructed one way and the other another. (Applying induction, we can thus prove that MySQL is infinitely fast ;-)
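
A small sketch of that point, using Python's built-in sqlite3 as a stand-in for MySQL (an assumption made purely so the example is self-contained; the table and index names are invented). Both queries return the same answer from the same engine, but one can use the index and the other cannot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user{i}",) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_users_name ON users (name)")

def plan(sql, params=()):
    """Return SQLite's query plan as one string (detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Same question, constructed two ways.
indexed_sql = "SELECT count(*) FROM users WHERE name = ?"
wrapped_sql = "SELECT count(*) FROM users WHERE lower(name) = ?"

indexed_plan = plan(indexed_sql, ("user42",))
wrapped_plan = plan(wrapped_sql, ("user42",))
indexed_count = conn.execute(indexed_sql, ("user42",)).fetchone()[0]
wrapped_count = conn.execute(wrapped_sql, ("user42",)).fetchone()[0]

print(indexed_plan)  # an index SEARCH
print(wrapped_plan)  # a full SCAN, because lower() hides the column from the index
print(indexed_count == wrapped_count)
```

Benchmark the first form against the second and the same database "beats itself", which is why the exact queries matter more than the headline multiplier.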


Interesting but - as usual - benchmarks are a bit twisted: Manticore knows how to optimize its products, and compares them to "out of box" ElasticSearch. However, ES is a really complex stack that REQUIRES a lot of tuning to get decent performance... So I guess that a "real life" benchmark, with every product tuned by experts, would be more significant.

Manticore is full C++ so it surely is better than a full stack Java ES for memory (and maybe performance) on a single server... but ES has a proven history of horizontal scaling (throw more servers to scale). And I didn't see any architectured benchmark. Real ES use cases are often based on multiple servers, specialized nodes...

So, all in all... I would be interested in more "real life" use cases before making up my mind.


> Manticore knows how to optimize its products ... compare to "out of box" ElasticSearch

I don't like that kind of benchmark either: you tune your product to the maximum and don't touch your competitor at all.

That's why the benchmarks mentioned involve NO TUNING, or only very light tuning such as enabling mlock in Elasticsearch / enabling secondary indexes in Manticore. Otherwise everything runs as shipped. If you read the articles here https://db-benchmarks.com/posts/ you'll understand that the authors really wanted to do fair benchmarks and to use as little tuning as possible.

> ES has a proven history of horizontal scaling (throw more servers to scale). And I didn't see any architectured benchmark.

This is true. ES is better in this respect. Manticore has replication and distributed indexes, but doesn't have automatic sharding and shard orchestration. This is a work in progress.


This makes it a toy comparatively, at least at scale. Once they work their way through that though I'm sure it will be faster. Just depends on how long it takes to get there.


How far off is automatic sharding, do you think?


Hopefully about half a year.


As a former Sphinx user: awesome, looking forward to trying this out.

Elastic just gives me grief all the time; Sphinx always just worked perfectly for me. Perhaps ES can do more, but I've never found Sphinx lacking for the use cases I run into.

I'm seeing a lot of pushback about the simplistic benchmarks, and sure, they aren't super meaningful perhaps. But they do give you a good idea of where the strengths of this might be. Benchmarks are pretty meaningless anyway imo.

Also, I am just very happy to see there is ANY alternative engine to Lucene. A bit of competing ideas is very healthy for the ecosystem. Lucene has been pretty much the only solution for years and I don't think that is sustainable.


I think LogQL[1] is a pretty good alternative to Lucene search syntax. It feels more natural and human friendly for searching logs.

[1]: https://grafana.com/docs/loki/latest/logql/


I worked with Elasticsearch at scale for many years, and it's a good thing I am not working with it anymore. The main issue with Elasticsearch was upgrading versions and maintaining the cluster. The cluster is a live creature that can get mad at your data and start hogging the CPU at any time, not to mention GC pauses. We used some geo operations on Elasticsearch, and from one minor version to the next they broke them. What a hell on earth. I hope OP's alternative will bring some relief for those in pain, or spare the pain for those who will eventually need an FTS engine.


What are you using instead now?


I changed jobs, I have not needed a similar technology for the moment.


It's worth noting that Manticore Search isn't a new development: it's a fork of Sphinx, a pretty popular search engine in the 2010s written by Andrei Aksenov in C++. It got superseded by Elasticsearch because the original engine never fully embraced multicore processing and working as a cluster, but I personally used Sphinx and it was very simple and very robust, a stark contrast to Elasticsearch, especially for tasks where you do need to ingest a lot of data, like log collection.



Far more important to me than the speed of the searches is the quality of them. Are there any benchmarks that attempt to test this? I'm not even sure how to determine what the criteria would be, but there must be a few well-definable domains that could be used.


Here's a pull request to BEIR to compare Manticore with Elasticsearch in terms of relevance https://github.com/beir-cellar/beir/pull/92. In this test Manticore provides better relevance than Elasticsearch on average. In general I would say for most users the quality of results in terms of full-text relevance is about the same in Elasticsearch and Manticore.


Seems like it is focused so much on how fast it responds to queries that it forgot how extremely important the quality of its responses is. Lucene's main benefit is the huge library of token filters to improve the understanding of the text in the index. Manticore only has an ability to plug your own single filter in. And while that doesn't make it any less possible to make search results good, it's definitely a lot harder to do. Producing good results is the main goal of search; speed is only necessary for it to not be too slow. A well working search will offset any extra infrastructure that it might need to run fast.


> Manticore only has an ability to plug your own single filter in

It's true that Lucene has more token filters and is perhaps more flexible. That's partly because another focus of Manticore is simplicity and ease of use, so there are:

* just "charset_table=non_cjk", which is the default and should work for most languages (it already does case folding and accent folding)
* on top of that you can apply one or multiple morphologies: stemmers (available for many languages via the libstemmer library, and some stemmers are built-in), lemmatizers (available for English, German, Ukrainian and Russian)
* "charset_table=cjk" + "morphology=icu_chinese" to segment Chinese text
* you can combine all of that if you wish
* built-in stopwords for most languages

and of course prefix search, infix search etc.

But what we really want is that the default settings work fine in most cases.
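
To make the folding / stopword / morphology steps above concrete, here is a toy Python sketch of such an analysis chain. It is not Manticore's implementation: the stopword set and the suffix stripper are invented stand-ins for the built-in stopwords and a real stemmer like libstemmer.

```python
import unicodedata

# Illustrative stopword set, not Manticore's built-in list.
STOPWORDS = {"the", "a", "an", "of", "and"}

def fold(text):
    """Case folding plus accent folding (e.g. 'Café' -> 'cafe')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def naive_stem(token):
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Fold, split on non-alphanumerics, drop stopwords, then stem."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in fold(text))
    return [naive_stem(t) for t in cleaned.split() if t not in STOPWORDS]

print(analyze("The Cafés and Indexing Engines"))  # -> ['cafe', 'index', 'engine']
print(analyze("café index engine"))               # -> ['cafe', 'index', 'engine']
```

Because the document text and the query go through the same chain, "Cafés" and "café" end up as the same term, which is the whole point of these filters.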



From the benchmarks it seems that columnar storage shines when there isn't enough RAM available. Lower cost full text search seems like quite a good value proposition.

And row-wise storage for lowest latency when cost is not a problem.

Also, I took a quick look at ranking. With things like BM25 and geo-spatial search it seems to be flexible and feature rich enough. I think it's time I take a serious look at Manticore.
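
For readers who haven't met BM25, a rough Python sketch of the textbook formula follows. This is the generic Okapi BM25 with the commonly used "+1 inside the log" IDF, not necessarily the exact ranker Manticore ships; the k1 and b defaults and the sample documents are assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "fast full text search engine",
    "columnar storage for analytics",
    "search search search",
]
print(bm25_scores("search", docs))
```

The second document scores zero because it never mentions the term, and the third outranks the first because it is shorter and repeats the term more often.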


If you already use Postgres, consider PG full text search first.


Why?


Fewer moving parts means you can concentrate on delivering impact. Fewer moving parts also means you can have a deeper understanding of those parts, not only a stackoverflow-copy-paste understanding.


Plus, it's actually pretty good!


I switched from Lucene to Postgres for a wiki fulltext search, and I'm thrilled with it. Insanely fast, and great ranking, for my use case.


Are there any comparisons of how quickly this indexes documents compared to Elasticsearch or Lucene? I work with some static snapshots and we kind of need those to have a short lead time, most of which is indexing them.


Yes. Please read this section in the blog post https://manticoresearch.com/blog/manticore-alternative-to-el...


So in this example both can fulltext search description?


Yes. "name" and "description"


I would totally use this for really small hobbyist level stuff if it had something that "plugged" right into it as "pretty" and "feature full" and "easy to use" as Kibana.

From what I could tell clicking through its links on GitHub... it has no UI offering? It's just an API?

I guess technically that does compare 1:1 with Elasticsearch but...

ELK is what most people think of when they think of Elasticsearch, right? aka... a full blown stack complete with UI

This just gets the backend piece done from what I can tell. Any frontends that plug into it?


Manticore is going to be a drop-in replacement for Elasticsearch in the ELK stack. The demo of Kibana visualizing Manticore can be seen here https://manticoresearch.com/blog/manticore-alternative-to-el... . This functionality is available in beta stage.


Looks promising, keen to try it out.

Related question though: why do so many FTS offerings in the space also try and offer column/analytics features? In my experience, I've already got a database for that, I'd really just love to see a FTS engine that _just_ searches text and filters responses, as fast and accurately as possible. Leave the analytics to my column database, just search the text as fast as possible.


It's a very good question. The answer is that when people do data analytics they often also need full-text search, otherwise it may just be too slow to scan through all your terabytes of strings using LIKE or REGEXP. You need some indexes to make it faster. Indexes for strings are partly what full-text search is.

That's why Clickhouse can be integrated with Quickwit, for example.

Manticore aims to be a good full-text search engine that can also process large data volumes that don't fit into RAM, with the help of columnar storage.
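
To illustrate the "indexes for strings" point, here is a minimal Python sketch contrasting a LIKE-style scan with an inverted index lookup. It is a toy under stated assumptions: the sample log lines are invented, matching is simplified to whole words, and real engines add compression, positions, ranking and much more.

```python
from collections import defaultdict

# Hypothetical log lines keyed by document id.
docs = {
    1: "error connecting to database",
    2: "user logged in",
    3: "database timeout error",
}

def scan(term):
    """LIKE-style approach: touch every document on every query."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())

def build_index(documents):
    """Inverted index: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[token].add(doc_id)
    return index

index = build_index(docs)

def lookup(*terms):
    """AND query: intersect the posting sets of all terms."""
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(scan("error"))                 # -> [1, 3]
print(lookup("error", "database"))   # -> [1, 3]
print(lookup("timeout"))             # -> [3]
```

The scan cost grows with the total amount of text, while the lookup only touches the posting sets for the query terms, which is what makes it viable on terabytes of strings.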


Has anyone used it in production? Any informed opinions?


Yes. I use it as a replacement for Sphinx since it went closed. Very impressed with the documentation and the support on Slack. It’s much faster than the last version of Sphinx and more stable for me. This is searching over 300 million documents, about 1TB in size.

Is it a replacement for elasticsearch? Maybe. Depends on your search requirements. I prefer elastic when you have power users needing to search over specific fields, and all of the syntax that it gives you. For a plain text box you want people to type terms into though manticore is much easier to configure.

Scaling is not as good as elastic currently, but you don’t need it as soon due to its lower footprint.

That said I’m about to replace it with my own custom index, but that’s not because manticore is a problem. Just that I can do better for my specific use case. I’m abusing manticore in ways it’s not happy about. For other projects I’d happily use it again.


We use it in Freepik, Flaticon and Slidesgo. It has a good performance and I find the source code easy to read / understand. It lacks some features from Elasticsearch, like CJK tokenizers, but we were able to work around that.

Very stable, fast, and easy to connect to mysql.


I use it at work. Sphinx search had been powering our text search since its inception in 2008 and scaled well. In 2021 we moved from it to Manticore without any hiccups.

It's now handling an index of 280M+ documents and has been drama-free - just works.


It's used by Craigslist as said on the site https://manticoresearch.com/clients/


I never trust clients listed on a site unless I've personally spoken to the client. You never know if it's "we use it for all our traffic" or "we use it for one tiny feature that's barely used and was made as a hackathon project a couple years back."


We've been happy users of it for years. :-)


They released an official C# client a few weeks ago. Nice.

https://github.com/manticoresoftware/manticoresearch-net


Also, Sonic: https://github.com/valeriansaliou/sonic (written in Rust)


It only supports full-word hits, no partial matches, and certainly no NLP engine.


but it is written in rust!! OMG RUST!!!111! It must clearly be superior to any other software in existence. RUUUUST!


Speed is important, but so is quality of results, especially when it comes to search. You can make something search blazingly fast, if it doesn't need to be accurate.


True! Here's a pull request to BEIR to compare Manticore with Elasticsearch in terms of relevance https://github.com/beir-cellar/beir/pull/92. Spoiler: in this test Manticore provides better relevance than Elasticsearch on average. Of course you can tune both further, and Elasticsearch now has KNN which, when combined with BM25, can give even better relevance. In general I would say for most users the quality of results in terms of full-text relevance is about the same in Elasticsearch and Manticore.


I was falling in love... Until I discovered authentication is not yet supported!!

It's a critical feature for adoption, please include it in your roadmap :)


Thank you for the feedback. It's currently possible if you put Manticore behind an http proxy, e.g. like this https://github.com/db-benchmarks/ui/blob/main/docker-compose...

But I agree implementation of a proper auth should be prioritized. I'll discuss it with the team.


Yes it's a solution for the http api but I was interested in the MySQL interface. Thanks for pushing it to the team :)


I’ve been very happy with MeiliSearch. It powers internal search services for customer support at my ecom site.


Meilisearch is nice for tiny indexes, but it’s extremely slow at indexing and it’s not really comparable to elastic or manticore when it comes to larger datasets and requirements beyond the most basic use cases.


We're not, because the index has to fit in RAM, and if it doesn't, it slows to a crawl or even fails.


If you are interested in how Manticore (and Elasticsearch) compares with Meilisearch in terms of full-text search and RAM consumption here's my take https://www.reddit.com/r/selfhosted/comments/w89tgh/comment/...


Good comparison. MeiliSearch's BucketSort and RAM consumption are the reasons I decided not to use MeiliSearch after testing it.

I just couldn't tweak ranking with BucketSort to get a desired relevancy. And memory usage was extremely high.

I didn't bother testing Manticore since there was no good open source client for C#, but seeing they released an official client I might test it to see how it compares to Solr.


Solr needs better documentation and community, but it's crazy fast, stable, and works well for what I do with search. It could also stand to have an improved UI to enable a better getting started experience and an easy means to index webpages out of the gate.

If anyone were to search for how to index a webpage with it, they'd get frustrated pretty quickly. I started a new version of Grub to address this, but I've found my motivation for working on it lacking: https://github.com/kordless/grub-2.0

A friend and I assembled a module for that repo called Aperture, which implements a scalable screenshot crawler that sends documents into Solr. If anyone is interested, hit me up and I'll provide some instructions for using it!


Awesome answer, we just installed Manticore on our dev env. We were using Meili and somehow didn't find Manticore during our research.


Sweet. Glad it's not running on the JVM (Looking at you ELK.).



