It's like the SQLite of search. A single library that provides the basic set of features you would expect in a quality search experience: facets, ranked search, boolean operators, stemming, etc.
If you’re using the JVM already and want the SQLite of search, you can instantiate an index in your own JVM, load documents, and get compatible query features.
You can also use Lucene directly, which is simpler, but does have some nuances that differ if you want perfect query compatibility between a deployed ElasticSearch cluster and an in-memory search.
Elastic the company “supports embedded” the same way it “supports self hosted”. It’s left as an exercise to the user.
I was previously looking into embedding TypeSense somehow (but not portable), MeiliSearch (but their API triggered a panic response), or SQLite's full-text search extension (but limited features e.g. language support). Using a dedicated library would be much better and this seems to be packed with features.
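For reference, a minimal sketch of what the SQLite full-text route looks like from Python, assuming your SQLite build includes the FTS5 extension (most do, but not all). The table name, sample rows, and the `search_titles` helper are made up for illustration. The `porter` tokenizer only stems English, which is one example of the limited language support mentioned above.

```python
import sqlite3

# In-memory database; FTS5 must be compiled into the SQLite build in use.
conn = sqlite3.connect(":memory:")

# 'porter unicode61' wraps the default tokenizer with English-only Porter stemming.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(title, body, tokenize = 'porter unicode61')"
)

# Hypothetical sample documents.
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Zinc", "A lightweight search engine for logs"),
        ("Lucene", "A library for indexing and searching text"),
        ("SQLite", "An embedded relational database"),
    ],
)


def search_titles(query):
    """Return titles of matching documents, best match first."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [title for (title,) in rows]
```

Because of stemming, a query for `searching` also matches documents that only contain `search`.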
Could you expand on that with examples? I only played around with it lightly but it seemed ok via the Python SDK.
I've been recently leaning towards Meilisearch for some client facing search features. This was over Typesense because, I believe, Typesense is totally in-memory, but in some cases my dataset might not fit in RAM (at least not at a reasonable cost). I was fine with taking a performance hit for disk access, provided I could have thousands of separate indexes with separate permissions etc. and directly call them from the browser app.
* If you have more than 65,535 words in a document, words after that limit will be SILENTLY IGNORED (not indexed). This was 1,000 words when I tried it, which was very limiting
* Documents where your query word happens sooner are ranked above documents where your query words happen later (in the document string). You can't turn that off
* You can't get the "match" information (i.e. the position where your query matched in the document) unless you retrieve that attribute (i.e. tell the database to send you the entire document content)
* The API uses HTTP verbs wrong, e.g. POST is used to "Add or replace documents" and PUT is used to update documents (it's exactly backwards from the intended use, per RFC 7231)
This is discussed more at
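One way to work around the word limit in the first bullet is to split long documents before indexing. This is only a sketch: the 65,535 figure is taken from the comment above and should be checked against the current docs, whitespace splitting only approximates how the real tokenizer counts words, and `split_by_word_limit` is a made-up helper, not part of any SDK.

```python
# Limit quoted in the comment above; verify against the current documentation.
MAX_WORDS = 65_535


def split_by_word_limit(text, max_words=MAX_WORDS):
    """Split text into chunks of at most max_words whitespace-separated words.

    Whitespace splitting is an approximation; the engine's tokenizer may
    count words differently, so leave some headroom in practice.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk would then be indexed as its own document, with a shared field pointing back to the parent so results can be grouped.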
To be clear, I don't hate the GPL license, although I think it is probably not the best choice for a library. But from a licensing perspective, xapian is pretty different from SQLite
So there shouldn't be any compatibility issues in practice.
Typesense holds the entire index in memory, in order to enable fast search where every keystroke returns results in 50-150ms. So it’s built primarily to enable user-facing search.
It’s not a good fit for log search since putting your entire log dataset in memory can become expensive depending on the size of your logs, and you typically don’t need search-as-you-type for log search.
Zinc on the other hand seems to be designed specifically for log search.
Alternative doesn't always mean they are API compatible, it's an alternative and not a drop-in replacement. My question was about how comparable it is apart from the API compatibility.
Is this really the main thing ElasticSearch is used for now? In 2016, I perhaps naively chose ElasticSearch (via AWS's managed service) for an actual user-facing search feature. That service is still running, and now I'm looking to move it to self-hosted, but didn't want to actually run ElasticSearch (or more likely now, OpenSearch). So I looked at the ZincSearch home page, and the quoted heading seems odd.
As for hardware cost needed to run Elasticsearch/Opensearch, it scales down as well as it scales up these days. I have a couple of ~70$/month clusters with millions of documents in each. Not a problem that I need to fix. I can scale them up to cost a lot more and handle billions of documents easily. That's in Elastic Cloud, which would be the expensive option. If you go bare metal, you can get way more bang for your buck obviously. A couple of Xeon machines with some decent amount of CPU/Memory/SSD go a long way.
As for complexity, they make it pretty easy to get started. What you do beyond that is kind of a luxury problem. You get all of these features out of the box but that doesn't mean you have to use them. But it's nice to have the option when you do need them as opposed to running into a brick wall because you are missing key features in these so-called drop in replacements. Depends on your needs. If you don't know the first thing about your search needs, keep it simple and go with mainstream solutions. Sign up with some managed opensearch/elasticsearch provider, get your cluster ready in a few minutes and start throwing some data at it. It's that simple. You can get it running in docker as well if you need something to develop against.
It seems a bit of a rite of passage to be implementing your own search engine these days without being hindered too much by not being completely up to speed with the field of information retrieval and all the wonderful algorithms and solutions. Guilty of that myself actually; I built a tf/idf implementation at some point. It's not that hard. From there to something that gets close to Elasticsearch/Opensearch/Solr/etc. takes a lot of work and skills, however. There are a few nice solutions out there but also lots that aren't so hot.
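To give a sense of how small the tf/idf starting point is, here is a rough sketch. It is not the implementation mentioned above, just one common variant (length-normalized term frequency, log inverse document frequency with +1 smoothing); all function names are made up. Everything a real engine adds (analyzers, an inverted index on disk, BM25, phrase queries, sharding) is missing.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and per-term document frequencies."""
    term_counts = [Counter(tokenize(doc)) for doc in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())  # each document counts once per term
    return term_counts, doc_freq


def search(query, docs):
    """Rank documents by summed tf-idf of the query terms.

    Returns (doc_index, score) pairs, best first, omitting non-matches.
    """
    term_counts, doc_freq = build_index(docs)
    n_docs = len(docs)
    scores = []
    for counts in term_counts:
        total_terms = sum(counts.values()) or 1
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                tf = counts[term] / total_terms
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += tf * idf
        scores.append(score)
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked if scores[i] > 0]
```

Rebuilding the index on every query is obviously wasteful; it keeps the sketch short.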
I've not used Zincsearch so I'll reserve my judgement. However, their main pitch seems to be that they are cheaper than a thing that is free and partially compatible with it (for ingesting data only) but a bit lightweight (not so scalable, way less features). So maybe pay the difference of 0$ and get something without the limitations that come with being "lightweight"? Just saying. Not a great pitch if you have to name your main competitor in it. Kind of emphasizes that you are trying to catch up rather than disrupt. Does it have any actual things it does better? That should normally be the center piece of a pitch.
Kibana was killer, though.
I personally wouldn't recommend self-hosting ElasticSearch for logs; it's such a commodity now that it's cheaper to get something like Datadog or another managed service (but do check the pricing; the last time I checked, the managed logging service from Azure was very expensive for what it did).
I work at Meilisearch so maybe a biased answer.
We do support languages other than English. It depends on which languages are supported by our tokenizer, https://github.com/meilisearch/charabia: any language that uses whitespace to separate words (including English), plus Chinese, Japanese, Hebrew and Thai (Japanese and Thai might work a little less well).
Such a comment on HN can be very damaging, as I generally trust the posters here more than in other places. I know that I'll have very little time to properly assess technologies so I'd like to shorten the short-list as much as possible. Again, thank you.
So contains more useful information than saying "easy to deploy".
In fact, the type of project that sees something that works exceptionally well (like ES) and then decides to partially reimplement it (to the point of maintaining API compatibility) is a pretty big red flag that it is NOT going to be easier to use in the long run.
Also... One of the things I didn't like about ES was the JSON DSL. It would be nice to create a friendlier query language, maybe something like SQL to query data. I find ES queries ugly to look at and not intuitive.
It's not open source but old skool '90s-style freeware. The current single-node version will remain free forever, but the next multi-node (distributed) version will be for-pay.
Any specific issues you are referring to? At work, we've been using ES heavily for many years and it isn't really something taking up time apart from updating the software every once in a while.
If they can provide a stable drop-in replacement for ES, this would really be a game changer and offer massive value.
(But I guess a lot of the bloat with ES is also the 3+ node cluster which includes lots of redundancy, and resulting redundant use of RAM as well?)
I would suspect there to be some room for improvement.
(edited to add vector search category and add manticore)
When looking at OSS search projects, it's nice to have a rough idea of the different use cases. Here is the list I have in mind:
- user facing search (think algolia): elasticsearch/opensearch, meilisearch, typesense
- enterprise search: elasticsearch/opensearch
- log search: elasticsearch/opensearch, loki, quickwit, zincsearch
- vector search: Milvus, Qdrant, Vald, Weaviate
Vespa.ai is great but I don't really know in which category to put it; user facing and enterprise search both seem to work well with it.
It's interesting to note that Elasticsearch and Opensearch are general purpose search engines, Solr as well. They are all powered by Lucene, the popular and performant search engine library.
Another general purpose search engine is Manticore search.
I would love to see some benchmarks by category :)
Note: I don't know ZincSearch well and put it in the log search category, as stated on their front page.
Another search engine that can be considered general purpose, is not based on Lucene, and is no less powerful than Elasticsearch/Solr is Manticore Search.
> I would love to see some benchmarks by category :)
I'd love that too. We started this work on db-benchmarks; hopefully we'll have the resources to continue it. Contributions are very welcome. It's 100% open source.
We're also actively working on adding Vector search to Typesense: https://news.ycombinator.com/item?id=32851894
Vector search infrastructure is overall still fairly new (https://milvus.io/docs/text_search_engine.md), but up and coming for sure. Disclaimer: I'm a part of the Milvus community.
> Can ZincSearch be deployed in HA(Highly Available / Cluster) mode? Currently, No. We are working towards making ZincSearch Highly Available.
Dealbreakers, for now.
Zinc Search engine. A lightweight alternative to Elasticsearch written in Go - https://news.ycombinator.com/item?id=29434097 - Dec 2021 (64 comments)
It sort of annoys me to see people reimplementing storage engines over and over. If it’s embedded, use rocks/leveldb/SQLite, and if it’s distributed, use foundationDB. The end.