ZincSearch – lightweight alternative to Elasticsearch written in Go (zincsearch.com)
189 points by drakerossman on Sept 22, 2022 | 82 comments

Tangentially related: if you need search without the clustering and high-availability story of Elasticsearch and friends, I highly recommend Xapian.

It's like the SQLite of search: a single library that provides the basic set of features you would expect in a quality search experience, such as facets, ranked search, boolean operators, stemming, etc.


FWIW, at the end of the day ElasticSearch is just a set of Java classes.

If you’re using the JVM already and want the SQLite of search, you can instantiate an index in your own JVM, load documents, and get compatible query features.

You can also use Lucene directly, which is simpler, but does have some nuances that differ if you want perfect query compatibility between a deployed ElasticSearch cluster and an in-memory search.

Does ES support the embedded use case?

ElasticSearch the software certainly “supports embedded”. It’s relatively easy to do. Although the documentation could be better.

Elastic the company “supports embedded” the same way it “supports self hosted”. It’s left as an exercise to the user.

A great example of Xapian in action is the email client Mu or Mu4e (for terminal or Emacs, respectively), which queries massive amounts of emails blazingly fast.



That looks great, I could use that right now, thank you!

I was previously looking into embedding TypeSense somehow (but not portable), MeiliSearch (but their API triggered a panic response), or SQLite's full-text search extension (but limited features e.g. language support). Using a dedicated library would be much better and this seems to be packed with features.

>MeiliSearch (but their API triggered a panic response)

Could you expand on that with examples? I only played around with it lightly but it seemed ok via the Python SDK.

I've recently been leaning towards Meilisearch for some client-facing search features. I chose it over Typesense because, I believe, Typesense is totally in-memory, and in some cases my dataset might not fit in RAM (at least not at a reasonable cost). I was fine with taking a performance hit for disk access, provided I could have thousands of separate indexes with separate permissions etc. and call them directly from the browser app.

Admittedly it's been almost a year since I tried it, but:

* If you have more than 65,535 words in a document, words after that limit will be SILENTLY IGNORED (not indexed). This was 1,000 words when I tried it, which was very limiting

* Documents where your query word happens sooner are ranked above documents where your query words happen later (in the document string). You can't turn that off

* You can't get the "match" information (i.e. the position where your query matched in the document) unless you retrieve that attribute (i.e. tell the database to send you the entire document content)

* The API uses HTTP verbs wrong, e.g. POST is used to "Add or replace documents" and PUT is used to update documents (it's exactly backwards from the intended use, per RFC 7231)
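
To make the RFC 7231 point in the last bullet concrete: PUT is defined as idempotent (create or fully replace the resource at a URI the client names), while POST is not. Below is a minimal in-memory sketch of that difference. `DocStore`, `put` and `post` are hypothetical names for illustration only and are not Meilisearch's API.

```python
import itertools


class DocStore:
    """Toy resource collection illustrating PUT vs POST semantics (RFC 7231)."""

    def __init__(self):
        self.docs = {}
        self._ids = itertools.count(1)

    def put(self, doc_id, doc):
        # PUT /docs/{doc_id}: create or fully replace the resource at a
        # client-chosen URI. Repeating the call leaves the same state.
        self.docs[doc_id] = dict(doc)
        return doc_id

    def post(self, doc):
        # POST /docs: the server processes the payload and picks the id.
        # Repeating the call creates another resource each time.
        doc_id = next(self._ids)
        self.docs[doc_id] = dict(doc)
        return doc_id


store = DocStore()
store.put("a", {"title": "hello"})
store.put("a", {"title": "hello"})  # idempotent: still one document under "a"
store.post({"title": "hello"})
store.post({"title": "hello"})      # not idempotent: two more documents
print(len(store.docs))
```

Under these semantics a client can safely retry a PUT after a timeout, whereas a retried POST may create a duplicate.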

Actually it is GPL, which makes it really hard to use as a library for anything. That's a bummer.

The GPLv2 license is potentially problematic. Even if your product is open source, it can cause problems with libraries that use licenses that aren't compatible with the GPL, such as openssl prior to v3.

This is discussed more at https://trac.xapian.org/wiki/Licensing

To be clear, I don't hate the GPL license, although I think it is probably not the best choice for a library. But from a licensing perspective, Xapian is pretty different from SQLite.

Thankfully it is licensed under GPLv2 or later, meaning also GPLv3.

So there shouldn't be any compatibility issues in practice.

As far as I know, GPLv2 is still incompatible with the license for Openssl 2.

How would it compare to something like https://github.com/meilisearch/meilisearch? The main differentiator for me seems to be "Compatibility with Elasticsearch API".

I work on Typesense and I can speak to it from Typesense’s perspective.

Typesense holds the entire index in memory, in order to enable fast search where every keystroke returns results in 50-150ms. So it’s built primarily to enable user-facing search.
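
As a rough illustration of why an in-memory index suits search-as-you-type, here is a sketch of prefix lookup over a sorted in-memory term list using binary search. This is not Typesense's actual data structure or API; `prefix_matches` and the sample terms are made up for the example.

```python
import bisect


def prefix_matches(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix` from a sorted list."""
    # Binary search finds the first term >= prefix in O(log n).
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:start + limit]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        matches.append(term)
    return matches


terms = sorted(["search", "seal", "season", "zinc", "zone", "sea"])
for typed in ["s", "se", "sea", "seas"]:
    # One lookup per keystroke, all served from memory.
    print(typed, prefix_matches(terms, typed))
```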

It’s not a good fit for log search since putting your entire log dataset in memory can become expensive depending on the size of your logs, and you typically don’t need search-as-you-type for log search.

Zinc on the other hand seems to be designed specifically for log search.

As it seems, it has near full compatibility with the Elasticsearch API. https://github.com/zinclabs/zinc/issues?q=is%3Aissue+elastic...

@drakerossman. Thanks. Yeah, indeed we have pretty decent compatibility with ES.

Another alternative to ES and Lucene respectively - QuickWit (https://github.com/quickwit-oss/quickwit) and Tantivy (https://github.com/quickwit-oss/tantivy).

Meilisearch does not have compatibility with the ES API, so why ask? The title is "lightweight alternative to Elasticsearch".

There are many "lightweight alternatives" to Elasticsearch (see this thread, which lists at least two already).

Alternative doesn't always mean they are API compatible, it's an alternative and not a drop-in replacement. My question was about how comparable it is apart from the API compatibility.

> Tired Of High Cost And Complexity Of Your Log Systems

Is this really the main thing ElasticSearch is used for now? In 2016, I perhaps naively chose ElasticSearch (via AWS's managed service) for an actual user-facing search feature. That service is still running, and now I'm looking to move it to self-hosted, but didn't want to actually run ElasticSearch (or more likely now, OpenSearch). So I looked at the ZincSearch home page, and the quoted heading seems odd.

Elasticsearch and ZincSearch both are general purpose search engines based on inverted indexes that can be used for app search as well as log search. In my stint at AWS I found that most companies used elasticsearch and anecdotally 90%+ users used it for log search. Log search was also the use case that I had in my mind when I built ZincSearch for my own needs and I found ES to be lacking. Here is a blog that I wrote on my motivation around building ZincSearch - https://prabhatsharma.in/blog/in-search-of-a-search-engine-b...

ElasticSearch is definitely used for user facing search too! In the WordPress community there are several integrations such as SearchPress (https://github.com/alleyinteractive/searchpress) and ElasticPress (https://github.com/10up/ElasticPress)

Elasticsearch is free as in beer and you can use the properly OSS Opensearch as an alternative. So, I don't really get the cost argument.

As for hardware cost needed to run Elasticsearch/Opensearch, it scales down as well as it scales up these days. I have a couple of ~70$/month clusters with millions of documents in each. Not a problem that I need to fix. I can scale them up to cost a lot more and handle billions of documents easily. That's in Elastic Cloud, which would be the expensive option. If you go bare metal, you can get way more bang for your buck obviously. A couple of Xeon machines with some decent amount of CPU/Memory/SSD go a long way.

As for complexity, they make it pretty easy to get started. What you do beyond that is kind of a luxury problem. You get all of these features out of the box but that doesn't mean you have to use them. But it's nice to have the option when you do need them as opposed to running into a brick wall because you are missing key features in these so-called drop in replacements. Depends on your needs. If you don't know the first thing about your search needs, keep it simple and go with mainstream solutions. Sign up with some managed opensearch/elasticsearch provider, get your cluster ready in a few minutes and start throwing some data at it. It's that simple. You can get it running in docker as well if you need something to develop against.

It seems a bit of a rite of passage to be implementing your own search engine these days without being hindered too much by not being completely up to speed with the field of information retrieval and all the wonderful algorithms and solutions. Guilty of that myself actually; I built a tf/idf implementation at some point. It's not that hard. From there to something that gets close to Elasticsearch/Opensearch/Solr/etc. takes a lot of work and skills, however. There are a few nice solutions out there but also lots that aren't so hot.
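
For a sense of how small a basic tf/idf scorer is, here is a minimal sketch. It uses whitespace tokenization, length-normalized term frequency and `log(N / df)` as the inverse document frequency, which is one common variant among several. It is not the commenter's implementation, and `tf_idf_scores` is a name invented for this example.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query):
    """Score each document against the query with plain tf-idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


docs = [
    "zinc is a search engine",
    "logs logs and more logs",
    "search the logs",
]
print(tf_idf_scores(docs, "logs"))
```

Everything this sketch leaves out (analyzers, stemming, BM25-style length normalization, phrase queries, on-disk segments, distribution) is where the real work starts.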

I've not used Zincsearch so I'll reserve my judgement. However, their main pitch seems to be that they are cheaper than a thing that is free and partially compatible with it (for ingesting data only) but a bit lightweight (not so scalable, way less features). So maybe pay the difference of 0$ and get something without the limitations that come with being "lightweight"? Just saying. Not a great pitch if you have to name your main competitor in it. Kind of emphasizes that you are trying to catch up rather than disrupt. Does it have any actual things it does better? That should normally be the center piece of a pitch.

I haven't used it for nearly a year now, but in my last role we used ES for user-facing search. We had to turn to documentation, forums and support for a lot of problems we faced and I got the impression ES was mostly being used for logs while user-facing search was the less common use case. Support didn't have answers for us on a number of occasions and online documentation was equally sparse.

Kibana was killer, though.

Well everyone has logs, but not everyone has user-facing search features.

I personally wouldn't recommend self-hosting ElasticSearch for logs; it's such a commodity now that it's cheaper to get something like Datadog or another managed service (but do check the pricing; the last time I checked, the managed logging service from Azure was very expensive for what it did).

If you're on the JVM, Lucene is actually pretty trivial to embed. The complexity comes from representing the data to be searched as "documents", ingesting the documents to be searched, reimporting when they change, and Lucene's goofy query syntax.
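
The bookkeeping described above (records become "documents", and a changed record has to be re-imported) can be sketched with a toy inverted index. This is Python rather than the JVM and deliberately not Lucene's API; `TinyIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def upsert(self, doc_id, fields):
        # Re-importing a changed record means removing its old postings first.
        self.delete(doc_id)
        self.docs[doc_id] = fields
        for term in self._terms(fields):
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for term in self._terms(old):
                self.postings[term].discard(doc_id)

    def search(self, query):
        # AND semantics: every query term must appear in the document.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    @staticmethod
    def _terms(fields):
        # Flatten a record's fields into one bag of lowercase terms.
        return {t for value in fields.values() for t in str(value).lower().split()}


index = TinyIndex()
index.upsert(1, {"title": "Zinc search", "body": "written in Go"})
index.upsert(2, {"title": "Lucene", "body": "search library written in Java"})
index.upsert(1, {"title": "Zinc search", "body": "now with a UI"})  # record changed
print(index.search("search written"))
```

After the second `upsert` of document 1, its stale term "written" no longer matches, which is the part that is easy to get wrong when the source data lives elsewhere.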

And Solr wraps it reasonably nicely over HTTP if you’re not on the JVM.

As does Elastic Search.

Running ES for a simple site is trivial. If you're not pushing hundreds of GB or hundreds of searches per second, ES/OpenSearch is pretty much "just run it".

It sure didn't look easy when I looked at the OpenSearch docs a few months ago. Any pointers to an easy way of configuring OpenSearch (or ElasticSearch) when I only want to run it on one machine, and it will only be accessed by other processes on that same machine (so I can just access it via localhost)?

I guess I'll ask the same question that I ask every time a search product appears here: does it handle languages other than English? Last I checked, Meilisearch was still English-only; for that, the built-in Postgres search is honestly good enough for most use cases.

Hi @tommoor

I work at Meilisearch so maybe a biased answer.

We do support languages other than English. It depends on which languages are supported by our tokenizer (https://github.com/meilisearch/charabia): any language that uses whitespace to separate words (including English), plus Chinese, Japanese, Hebrew and Thai (Japanese and Thai might work a little less well).

Thank you for posting back. I have a list of technologies to try for an upcoming project, and Meili is one of them. Our language is Hebrew, and I had just written in my notes file that Meili is a no-go because of the GP post. Good thing that I thought to refresh the page before moving on.

Such a comment on HN can be very damaging, as I generally trust the posters here more than in other places. I know that I'll have very little time to properly assess technologies so I'd like to shorten the short-list as much as possible. Again, thank you.

(Am from ZincSearch team) Here is a list of supported languages - https://docs.zincsearch.com/api-es-compatible/index/analyze/...

The sales pitch should be that it's faster (if indeed it's faster), or easier to configure, or smaller to deploy... not that it's written in go.

Being written in Go already implies that it is much easier to deploy, and it also tells you how and why it is easier to deploy.

So it contains more useful information than saying "easy to deploy".

being written in go does not imply anything about ease of deployment. kubernetes is written in go.

in fact, the type of project that sees something that works exceptionally well (like ES) and then decides to partially reimplement it (to the point of maintaining API compatibility) is a pretty big red flag that it is NOT going to be easier to use in the long run.

ES is not hard to deploy. Maybe to use, but not to deploy.

It is definitely super easy to deploy and run compared to ES. You can get up and running in less than 2 min. And you are right, messaging needs to be better.

it's almost certainly their "drop in ES replacement" story (https://news.ycombinator.com/item?id=32938630) since almost every other submission like this is "just" a search with a REST API. But there's _so much_ tooling out there which claims to blast text at ES/OpenSearch that it makes trying to replace ES hard since it requires changing every producer in that story, too

It says "lightweight". And this is HN, if a clone of a tool is written in a different language, it is customary to list it.

Interesting. I'm not sure why I would choose this over Elasticsearch though. How does performance stack up against ElasticSearch? If the API is compatible with ES, does that mean I can use the ELK stack, including Logstash for ingestion and Kibana for analysis?

Also... One of the things I didn't like about ES was the JSON DSL. It would be nice to create a friendlier query language, maybe something like SQL to query data. I find ES queries written to be ugly to look at, and not intuitive.

Did you know that OpenSearch has JDBC/ODBC drivers? https://github.com/opensearch-project/sql

Manticore Search [1] is a good alternative to Elasticsearch if you prefer SQL

[1]: https://manticoresearch.com/blog/manticore-alternative-to-el...

I've been working on something similar as I dislike ES's JSON DSL or Kibana's interface: https://log-store.com/

It's not open source but old skool '90s-style freeware. The current single-node version will remain free forever, but the next multi-node (distributed) version will be for-pay.

Elasticsearch supports a subset of SQL for querying:


Quite limited subset: only SELECTs and even that is limited, e.g. you can't do "SELECT id"
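
For reference, Elasticsearch exposes SQL over REST at the `_sql` endpoint, which takes a JSON body with a `query` field. The sketch below only builds such a request and does not send it. The base URL, the index name `logs` and the field `level` are assumptions made for the example.

```python
import json
import urllib.request


def build_sql_request(base_url, sql, fmt="txt"):
    """Build (but do not send) a request for Elasticsearch's SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_sql?format={fmt}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sql_request(
    "http://localhost:9200",  # assumed local single-node cluster
    "SELECT level, COUNT(*) FROM logs GROUP BY level",  # hypothetical index "logs"
)
print(req.get_method(), req.full_url)
# To actually run it against a live cluster:
#   urllib.request.urlopen(req).read().decode()
```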

Zinc is a little too early for performance benchmarks (still not at 1.0). Fluent Bit, Logstash and Filebeat can be used as is with ZincSearch. Kibana is not yet supported, but you could use the Zinc UI (still in a very early phase), which is embedded, so you don't have to install it separately.
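
Shippers like these typically write to the Elasticsearch bulk API, which takes newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end. Here is a sketch of building such a payload. The index name and fields are made up, and which path ZincSearch serves this on is left out on purpose, so check its docs.

```python
import json


def build_bulk_payload(index, docs):
    """Render documents as an Elasticsearch-style _bulk NDJSON body."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # source line
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


payload = build_bulk_payload("app-logs", [
    {"level": "info", "message": "service started"},
    {"level": "error", "message": "disk full"},
])
print(payload, end="")
```

The body is sent with `Content-Type: application/x-ndjson`. Any server that accepts this format can sit behind existing log shippers without changing the producers.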

It looks like the selling point is easier setup and operation. Elasticsearch is kind of a pain in the ass to take care of. Maybe they've made that better? That'd be valuable to me.

> Elasticsearch is kind of a pain in the ass to take care of.

Any specific issues you are referring to? At work, we've been using ES heavily for many years and it isn't really something taking up time apart from updating the software every once in a while.

Are you constantly adding new users and having to update retention policies as users try to fill up your machines?

I guess you are referring to the "log search" use case? We don't use it for that so that isn't really an issue, and it's mainly about beefing up memory / cpu.

And yeah, we definitely need a better query language. We are working on it.

perhaps a lisp DSL?

Also worth examining: a recently introduced FOSS library, Pagefind (https://pagefind.app).

Thanks for posting this. Just what I was looking for to add to my personal website.

You’re welcome. I’ve found it an outstanding addition to mine, too, and have written about it here:


ES is a real memory hog. You either need to spin up some real beefy servers or it will regularly crash.

If they can provide a stable drop-in replacement for ES, this would really be a game changer and offer massive value.

I’m not sure it’s possible or desirable to make a low memory search index. Like, fundamentally you’re going to want the indexes in RAM for performance, no?

Depends on the search volume and performance requirements. For lots of (most?) applications, you don't want all the indices in RAM.

(But I guess a lot of the bloat with ES is also the 3+ node cluster, which includes lots of redundancy, and the resulting redundant use of RAM as well?)

If you don't want/need HA you can run a 1 node ES.

Yeah, but for that you want to make the most efficient use of the available RAM, no?

I would suspect there to be some room for improvement.

There is tons of room for improvement, did you improve it? As in, did you tune ES to your specific hardware and search needs?

(disclaimer: I'm one of the cofounders of Quickwit.io, we are doing log search)

(edited to add vector search category and add manticore)

When looking at OSS search projects, it's nice to have a rough idea of the different use cases. Here is the list I have in mind:

- user facing search (think algolia): elasticsearch[1]/opensearch[2], meilisearch[3], typesense[4]

- enterprise search: elasticsearch/opensearch

- log search: elasticsearch/opensearch, loki[5], quickwit[6], zincsearch[7]

- vector search: Milvus[8], Qdrant[9], Vald[10], Weaviate[11]

Vespa.ai[12] is great but I don't really know which category to put it in; user-facing and enterprise search both seem to work well with it.

It's interesting to note that Elasticsearch and Opensearch are general purpose search engines, Solr as well. They are all powered by Lucene, the popular and performant search engine library.

Another general purpose search engine is Manticore search[13].

I would love to see some benchmarks by category :)

Note: I don't know ZincSearch well and put it in the log search category, as stated on their front page.

[1]: https://github.com/elastic/elasticsearch

[2]: https://github.com/opensearch-project/OpenSearch

[3]: https://github.com/meilisearch/meilisearch/

[4]: https://github.com/typesense/typesense

[5]: https://github.com/grafana/loki

[6]: https://github.com/quickwit-oss/quickwit

[7]: https://github.com/zinclabs/zinc

[8]: https://github.com/milvus-io/milvus

[9]: https://github.com/qdrant/qdrant

[10]: https://github.com/vdaas/vald

[11]: https://github.com/semi-technologies/weaviate

[12]: https://github.com/vespa-engine/vespa

[13]: https://github.com/manticoresoftware/manticoresearch

> It's interesting to note that Elasticsearch and Opensearch are general purpose search engine, Solr as well. They are all powered by Lucene, the popular and performant search engine library.

Another search engine which can be considered general, is not based on Lucene and is not less powerful than Elasticsearch/Solr is Manticore Search [1]

[1]: https://github.com/manticoresoftware/manticoresearch

> I would love to see some benchmarks by category :)

I'd love that too. We started this work on db-benchmarks [2], and hopefully we'll have the resources to continue it. Contributions are very welcome. It's 100% open source [3].

[2]: https://db-benchmarks.com/ [3]: https://github.com/db-benchmarks/db-benchmarks

I would put Vespa and also Weaviate in a new category called "Vector Search".

We're also actively working on adding Vector search to Typesense: https://news.ycombinator.com/item?id=32851894

As a Yahoo internal project, Vespa actually started out very similar to Elastic in that it was meant for fast text search. The team then added ANN search capabilities later on.

Vector search infrastructure is overall still fairly new (https://milvus.io/docs/text_search_engine.md), but up and coming for sure. Disclaimer: I'm a part of the Milvus community.

Right, it's a separate category. There are also Vald, Milvus and Qdrant https://github.com/qdrant/qdrant

I've worked with MeiliSearch a little and found it to be a great experience - the exact opposite of working with Elasticsearch.

No Splunk[1]? They're the de facto enterprise log platform.

[1]: https://www.splunk.com/

Yes Splunk is well known of course, I restricted myself to OSS projects.

> ZincSearch currently does not provide an Elasticsearch compatible query API, however we are working towards making it compatible.

> Can ZincSearch be deployed in HA(Highly Available / Cluster) mode? Currently, No. We are working towards making ZincSearch Highly Available.

Dealbreakers, for now.

We need to update the website. A lot of ES APIs are now compatible.


Zinc Search engine. A lightweight alternative to Elasticsearch written in Go - https://news.ycombinator.com/item?id=29434097 - Dec 2021 (64 comments)

This will be interesting to kick the tires on. The last gig I was on that used ES - ES would eat tons of memory and quite often crash due to OOM. Wasn't a fan of the query language, either.

Seems like wrong tuning. ES has a few gotchas when it comes to index structure and size (not too many but also not too big) when running big

Looks similar to Sonic https://github.com/valeriansaliou/sonic (written in Rust)

Is there a good full text search engine with pluggable storage? I’d like to implement a simple set of apis and use search with other backends.

It sort of annoys me to see people reimplementing storage engines over and over. If it’s embedded, use rocks/leveldb/SQLite, and if it’s distributed, use foundationDB. The end.

I'm immediately interested in that this seems to be aiming at being API compatible, which means it could in theory be used as an ES replacement for self-hosting services that hard-depend on ES?

I was literally looking for a lightweight Elasticsearch alternative for a couple of days now... This is great, will set it up asap.

It would be interesting to see how it performs on restart when used with a large amount of data.
