I work on an open source alternative to Algolia called Typesense.
Algolia is a great product but can get quite expensive at even moderate scale. If I had a dollar for every time I’ve heard this from Algolia users switching over…
Oh yes, speed is an important point. Elasticsearch & Solr use disk-first indexing (with RAM as just a cache), whereas Algolia and Typesense use a RAM-first approach where the entire index is stored in memory. This is what lets Algolia/Typesense return results much, much faster than ES/Solr, and lets you build search-as-you-type experiences that query on each keystroke.
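To make the RAM-first idea concrete, here's a minimal sketch (not Typesense's actual implementation) of an in-memory inverted index with prefix lookup — the kind of structure a RAM-first engine can consult on every keystroke without touching disk:

```python
# Minimal in-memory inverted index with prefix search (illustrative only).
from bisect import bisect_left, insort

class InMemoryIndex:
    def __init__(self):
        self.postings = {}       # token -> set of doc ids
        self.sorted_tokens = []  # kept sorted for prefix scans

    def add(self, doc_id, text):
        for token in text.lower().split():
            if token not in self.postings:
                self.postings[token] = set()
                insort(self.sorted_tokens, token)
            self.postings[token].add(doc_id)

    def prefix_search(self, prefix):
        # Binary-search to the first token >= prefix, then scan while it matches.
        results = set()
        i = bisect_left(self.sorted_tokens, prefix)
        while i < len(self.sorted_tokens) and self.sorted_tokens[i].startswith(prefix):
            results |= self.postings[self.sorted_tokens[i]]
            i += 1
        return results

idx = InMemoryIndex()
idx.add(1, "wireless keyboard")
idx.add(2, "wired keyboard")
print(idx.prefix_search("wir"))  # both docs match the "wir" prefix
```

Every lookup here is a couple of in-memory binary searches, which is why per-keystroke latency stays in microseconds rather than the milliseconds a disk seek costs.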
I was thinking about adding a row about speed to the comparison matrix, but couldn't find a way to express the comparison clearly... Imagine a row that said:
What index sizes are we talking about? If it's a few hundred gigs, there's always the possibility of putting the entire Elasticsearch index on a ramdisk, or even just leaving lots of "free" RAM so the underlying OS uses it to speed up I/O transparently. Bare-metal machines with enormous amounts of RAM are a thing, and at massive scale they could make sense.
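The ramdisk approach would look something like the following — paths and sizes are purely illustrative, and note that tmpfs is volatile, so the index is gone after a reboot and you'd need snapshots or replicas to recover:

```shell
# Mount a tmpfs (RAM-backed) filesystem sized for the index.
sudo mkdir -p /mnt/es-ramdisk
sudo mount -t tmpfs -o size=256G tmpfs /mnt/es-ramdisk

# Then point Elasticsearch's data directory at it in elasticsearch.yml:
# path.data: /mnt/es-ramdisk
```

Leaving RAM "free" instead is the lazier variant of the same trick: the Linux page cache will keep hot index segments in memory without any configuration.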
I've had great success at a client where simply upgrading a DB to an instance with enough RAM to fit 80% of the entire data set fixed all performance problems and significantly reduced I/O "pressure", at least for reads (writes were never a problem).
I haven’t tried to do this myself so I can’t speak to it.
But one thing I would add is ElasticSearch is quite versatile and flexible, so I wouldn’t be surprised if you can contort it to get it to work for a wide variety of use cases. This is a blessing and a curse - blessing because it’s so flexible, curse because the flexibility breeds complexity and brings with it a steep learning curve and operational complexity.
Where I think Algolia / Typesense help is that things work out of the box without the learning curve or operational overhead.
That table seems... fine? Creating multiple data sets and comparing the various aspects of the products' speed is a lot of additional work that you may not have signed up for, and is far from succinct (or easy). It might feel more empirical, but "feels faster" is fine. You're providing a free service (a review of available products) and can use whatever metrics you choose.
Correct, the OS's OOM killer will kick in to try and protect core OS operations. So you don't want to let it get to that stage - you'd typically want to keep at least 15% of memory free for the OS to do its thing.
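On Linux you can check how close a box is to that ~15% cushion by reading /proc/meminfo directly — a quick sketch (MemAvailable counts reclaimable page cache too, not just truly-free pages):

```python
# Report the fraction of memory still available to the OS (Linux only).
def memory_headroom():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.split()[0])  # values are in kB
    return fields["MemAvailable"] / fields["MemTotal"]

if memory_headroom() < 0.15:
    print("warning: less than 15% memory headroom left for the OS")
```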
Commercially available servers go all the way up to 24TB of RAM these days, which should be sufficient for a good number of search use cases. Beyond that you'd have to shard data across multiple clusters.
Looks pretty interesting. There never really seemed to be any good alternatives to ES for a long time. Apart from building feature set, how do you target quality of search results? Do you have any test bed for measuring this and do you benchmark against other solutions to try and understand how everyone fares?
We have search relevancy tests baked into the automated test suite that runs on every commit. We keep adding to it as we get feedback about edge cases and new cases.
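A hedged sketch of the idea (not Typesense's actual test suite): relevancy regressions get caught by asserting the expected top hit for known queries, with each assertion encoding a past bug report or edge case. The `search` function here is a stand-in scorer that just ranks docs by matched-token count:

```python
# Toy corpus and scorer standing in for the real engine.
DOCS = {
    "doc1": "apple iphone 13 pro case",
    "doc2": "apple macbook pro charger",
    "doc3": "samsung galaxy phone case",
}

def search(query):
    q = set(query.lower().split())
    # Rank docs by how many query tokens they contain (descending).
    return sorted(DOCS, key=lambda d: -len(q & set(DOCS[d].split())))

def test_relevancy_edge_cases():
    # Each entry encodes a known case: query -> expected top result.
    assert search("iphone case")[0] == "doc1"
    assert search("macbook charger")[0] == "doc2"

test_relevancy_edge_cases()
```

Running a suite like this on every commit turns "search feels worse" reports into permanent, executable regression checks.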
Just a question of priority, based on number of asks for it. We do plan to support it.
In the meantime, we introduced a way to turn off typo tolerance and prefix-search on a per-field basis. This has helped some users search fields containing model numbers, for example.
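The shape of that is roughly the following — illustrative only, so check the current Typesense docs for the exact parameter names. The idea is that search parameters like `num_typos` and `prefix` take per-field, comma-separated values aligned with the `query_by` field list, so typo tolerance and prefix matching can be disabled just for the model-number field while staying on for the title:

```python
# Hypothetical search parameters; field names are made up for illustration.
search_params = {
    "q": "SM-G991B",
    "query_by": "title,model_number",
    "num_typos": "2,0",      # typos allowed on title, none on model_number
    "prefix": "true,false",  # prefix match on title, exact on model_number
}
```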
I recently put together this comparison page comparing a few search engines, including Algolia, that you might find interesting: https://typesense.org/typesense-vs-algolia-vs-elasticsearch-...