
Search Benchmarking: RediSearch vs. Elasticsearch - sidi
https://redislabs.com/blog/search-benchmarking-redisearch-vs-elasticsearch/
======
showerst
I've seen a lot of ES competitor posts pop up on HN lately, and I think
they're missing the point of Elastic.

If you only need very basic word search, ES is probably not worth the
complexity in your stack, especially if you're already running a SQL database
with decent plaintext search.

Where elasticsearch shines is in complex queries: "Show me every match where
this field contains 'extinction' within 10 words of 'impact crater' but NOT
containing 'oceanic' and the publish date is > last month and one of the
subjects is anthropology"
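For the curious, a query like that might look roughly like the following in the Elasticsearch query DSL, built here as a Python dict. This is a sketch and has not been run against a live cluster. The field names `body`, `publish_date` and `subjects` are hypothetical. "Within 10 words" is approximated with `span_near` and `slop: 10`, and "> last month" is read as "published within the last month" (`now-1M`). Because `span_term` is not analyzed, the terms are written in lowercase, on the assumption that the field's analyzer lowercases.

```python
import json


def build_query(since: str = "now-1M") -> dict:
    """Bool query sketch; field names are hypothetical."""
    # "impact crater" as an exact, in-order two-term span
    impact_crater = {
        "span_near": {
            "clauses": [
                {"span_term": {"body": "impact"}},
                {"span_term": {"body": "crater"}},
            ],
            "slop": 0,
            "in_order": True,
        }
    }
    return {
        "query": {
            "bool": {
                # "extinction" within 10 positions of "impact crater", either order
                "must": [
                    {
                        "span_near": {
                            "clauses": [
                                {"span_term": {"body": "extinction"}},
                                impact_crater,
                            ],
                            "slop": 10,
                            "in_order": False,
                        }
                    }
                ],
                # ...but NOT containing "oceanic"
                "must_not": [{"match": {"body": "oceanic"}}],
                # non-scoring constraints: recent, and one subject is anthropology
                "filter": [
                    {"range": {"publish_date": {"gt": since}}},
                    {"term": {"subjects": "anthropology"}},
                ],
            }
        }
    }


print(json.dumps(build_query(), indent=2))
```

The printed JSON would be sent as the body of a `_search` request.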

~~~
wyldfire
> Where elasticsearch shines is in complex queries ...

If the "Multi-tenant indexing benchmark" is accurate, it seems like it might be
a robustness concern for ES. "Elasticsearch crashed after 921 indices and just
couldn’t cope with this load." Does that mean memory exhaustion or some
other crash? If it's the latter, it seems like a quality problem more than a
performance one.

~~~
AznHisoka
Very, very few customers actually have 921 indices in production. That is an
insane amount, by a large factor.

~~~
Xylakant
Judging from what I see on irc and when I get called for “our ES cluster is on
fire, can you put it out?”, 921 indices is not much. I sometimes joke that I
could replace myself with a bot that answers “less indices, less shards” to
each and every question about performance and that bot could solve 90% of the
problems at a fraction of my cost. But alas, nobody wants to pay for a visit
from my bot.

------
panarky
I love plain old Redis, but I'm not thrilled with the extension modules from
Redis Labs.

I experimented with RediSearch using 20 GB of Reddit posts and I was very
underwhelmed.

First, 20 GB of raw data explodes into 75 GB once it's in RediSearch with zero
fault tolerance. While I'd expect some expansion with inverted indexes and
word frequencies by document, a 3.75 multiple seems high.

And since this is Redis, it's all in RAM, including indexes and raw documents,
all uncompressed. That's not cheap. Add replicas for fault tolerance and the
RAM needed for a decent sized cluster could be 10x the size of the raw data.
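The arithmetic behind that estimate, as a quick sketch. It assumes every replica is a full in-memory copy of the 75 GB indexed dataset and ignores any other overhead, so treat the results as a lower bound:

```python
def ram_multiple(raw_gb: float, indexed_gb: float, replicas: int) -> float:
    """RAM needed per GB of raw data if every copy of the index lives in memory.

    Assumes each replica is a full copy of the indexed data; ignores other
    overhead such as fork headroom for persistence or fragmentation.
    """
    copies = 1 + replicas  # the primary plus its replicas
    return indexed_gb * copies / raw_gb


# 20 GB of raw Reddit posts became 75 GB inside RediSearch
for replicas in (0, 1, 2):
    print(f"{replicas} replica(s): {ram_multiple(20, 75, replicas):.2f}x")
```

With those figures this gives 3.75x with no replicas, 7.50x with one and 11.25x with two. "Could be 10x" therefore sits between one and two replicas, before any extra headroom is counted.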

Then the tooling and documentation are very limited. Redis Labs provides a
Python client, but it doesn't support basic features like returning the score
with each document, even though RediSearch provides this capability if you
query it directly.
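One workaround until the client exposes scores is to send the raw command yourself and parse the reply. The sketch below assumes the RESP2 reply layout for `FT.SEARCH ... WITHSCORES` with no `NOCONTENT`, `WITHPAYLOADS` or `WITHSORTKEYS` flags: the total count, followed by (id, score, field list) triples. The sample reply is made up for illustration.

```python
from typing import Any


def parse_withscores(reply: list) -> tuple[int, list[dict[str, Any]]]:
    """Parse a raw `FT.SEARCH ... WITHSCORES` reply (RESP2 layout assumed).

    Assumed layout: [total, id1, score1, [f1, v1, ...], id2, score2, [...], ...]
    """
    total = int(reply[0])
    docs = []
    rest = reply[1:]
    # Each hit occupies three consecutive slots: id, score, flat field list
    for i in range(0, len(rest), 3):
        doc_id, score, flat = rest[i], rest[i + 1], rest[i + 2]
        fields = dict(zip(flat[0::2], flat[1::2]))  # [k, v, k, v] -> {k: v}
        docs.append({"id": doc_id, "score": float(score), "fields": fields})
    return total, docs


sample = [2, "doc:1", "2.5", ["title", "impact crater"], "doc:2", "1", ["title", "extinction"]]
total, docs = parse_withscores(sample)
print(total, [(d["id"], d["score"]) for d in docs])
```

With redis-py, something like `r.execute_command("FT.SEARCH", "myidx", "hello world", "WITHSCORES")` should return a raw list of that shape (`myidx` is a hypothetical index name). Without `decode_responses=True` the elements come back as bytes, so the ids and field names will stay bytes.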

Finally, I found stability issues with Redis when the RediSearch module is
installed. Using the Python client provided by RedisLabs, certain queries
would predictably crash every node in the cluster.

Redis itself is rock solid, but Redis with the RediSearch module feels
fragile.

Overall, interesting concept but not ready for production use by any means.

------
softwaredoug
In order for me to trust a benchmark, it needs to be a lot more transparent
than this:

- Show the code that runs the benchmark

- Give everyone the opportunity to recreate the benchmark

- Give every technology the opportunity to 'respond' and point out where
the benchmark/tech configuration is wrong (i.e. "PRs welcome")

Otherwise, this just looks like cherry-picked data points, and even those I
won't trust. Nor would I show this to any of my clients (whom I help select
search engine technology). I dearly hope nobody makes real decisions based on
this blog post until the code and everything else is opened up.

------
jchw
>Component: Search Engine

>RediSearch: Dedicated engine based on modern and optimized data-structures

>ElasticSearch: 20 years old Lucene engine

The implications made here make me actually angry.

~~~
softwaredoug
Lucene: battle tested, optimized, and improved over 20 years to the
point where it’s running search almost everywhere

RediSearch: new shiny thing built on top of Redis that is used in a couple of
niche places.

I’ll take Lucene please

------
bigodines
If 2-word queries are all you need, why would you even consider Elasticsearch?
This benchmark is pure marketing IMHO.

~~~
onlyrealcuzzo
RedisLabs seems to really be abusing Redis's popularity on HN.

I've seen a lot of posts like this easily make it to the front page only
because a lot of HN-ers are Redis fanboys (rightfully so: Redis is great). But
then you read the post and it _appears_ to be marketing garbage.

~~~
_Codemonkeyism
From the license fiasco to this. RedisLabs tries hard to be the new Microsoft.

------
free652
The article is a mess of misspellings and misquotes. Also, why were two distributed
search engines tested on a single node? That's a meaningless test.

------
simpsond
"Elasticsearch crashed after 921 indices" ... Shards: "20 for the multi-tenant
benchmark".. 921 * 20 = 18420. Shards have state; they have overhead. Why
wouldn't they pick one shard for that benchmark? It's either intentional
misconfiguration, or poor understanding of sharding.
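The shard count in that comment, spelled out. This sketch counts shard copies as primaries times (1 + replicas), with replicas defaulting to 0 because the benchmark's replica setting isn't stated here:

```python
def total_shards(indices: int, shards_per_index: int, replicas: int = 0) -> int:
    """Shard copies the cluster has to track: primaries times (1 + replicas)."""
    return indices * shards_per_index * (1 + replicas)


print(total_shards(921, 20))  # 18420 shards by the time ES crashed
print(total_shards(921, 1))   # 921 with a single shard per index
```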

------
speedplane
From the article: "Here, we simulated a multi-tenant e-commerce application
where each tenant represented a product category and maintained its own index.
For this benchmark, we built 50K indices (or products), which each stored up
to 500 documents (or items), for a total of 25 million indices. RediSearch
built the indices in just 201 seconds, while running an average of 125K
indices/sec. However, Elasticsearch crashed after 921 indices and clearly was
not designed to cope with this load."

No sane elasticsearch engineer would make a new index for each product. They
would just have a single index with a product_id field for each sub-item. If
you needed product level information, you would create a second index for
that. You'd use two indexes, not O(#Product) indexes.
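A rough sketch of that layout, written as Python dicts for the Elasticsearch REST API. The shared index, the `product_id` and `title` field names, and the 7.x-style typeless mapping are all assumptions for illustration. Each tenant becomes a distinct `product_id` value rather than its own index:

```python
# Hypothetical shared index: one index for all tenants, keyed by product_id.
# Typeless (7.x-style) mapping; PUT this as the body when creating the index.
ITEMS_MAPPING = {
    "mappings": {
        "properties": {
            "product_id": {"type": "keyword"},  # exact-match tenant key
            "title": {"type": "text"},          # analyzed full-text field
        }
    }
}


def tenant_search(product_id: str, text: str) -> dict:
    """Search body for one tenant's items inside the shared index."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                # filter context: restricts matches, does not affect scoring
                "filter": [{"term": {"product_id": product_id}}],
            }
        }
    }


print(tenant_search("category-42", "blue shoes"))
```

Putting the tenant key in a `filter` clause restricts matches to one tenant without changing relevance scoring.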

They just created a botched benchmark by using ES incorrectly. It's like
driving a car backwards and then complaining it has poor max speed. ES could
easily handle this type of problem if done correctly.

------
sidi
The article now says: "The more advanced multi-tenant use case – where RediSearch
was able to complete 25 million indices in just 201 seconds or ~125K
indices/sec, while Elasticsearch crashed after it indexed 921 documents,
showing that it was not designed to cope with this level of load." It
previously stated that "Elasticsearch crashed after 921 indices and just
couldn’t cope with this load."

It's hard to mistake documents for indices. Both the original and the currently
edited statements sound strongly suspect and make me question the benchmarking
methodology used. What caused ES to crash after indexing 921 documents?
Why is comparing indexing speeds on a 1-node setup even a legit benchmarking
test?

------
alkz
I fail to see how the creation of 50K indices on Elasticsearch is a meaningful
benchmark; that's just not how it's supposed to be used. Also, as others said,
testing a distributed system on a single node makes little sense. And the
benchmark is not reproducible, since we don't know how the data was queried
and indexed.

------
makkesk8
This benchmark is pretty misleading. Not to mention that Elasticsearch is
free for multi-node deployments while RediSearch is not.

------
overgard
The intent is nice, but the weird Clippy-style avatar in the bottom right is
kinda annoying. I'm just trying to read the article, not engage in a
conversation.

------
Scaevolus
> Dataset source: wikidump Date: Feb 7, 2019 docs: 5.6M size: 5.3 GB

"wikidump" links to
[https://dumps.wikimedia.org/enwiki/latest/](https://dumps.wikimedia.org/enwiki/latest/)
, which has thousands of files, none of which are 5GB and make sense. That's a
_very_ poor corpus link!

It says "Feb 7, 2019", so it probably means
[https://dumps.wikimedia.org/enwiki/20190120/](https://dumps.wikimedia.org/enwiki/20190120/)
or
[https://dumps.wikimedia.org/enwiki/20190201/](https://dumps.wikimedia.org/enwiki/20190201/)
... maybe. They don't have any obvious 5.3GB files.

------
g1mp
If anyone is looking for real benchmarks of ES, check out this page and leave
the BS benchmarks aside :-)
[https://elasticsearch-benchmarks.elastic.co/](https://elasticsearch-benchmarks.elastic.co/)

------
nathanaldensr
Someone needs to edit this article. There are misspellings and typos all over
the place.

~~~
rooam-dev
It's Friday, the task was to deliver the article by the weekend :)

------
ademup
I'm curious if this scales down well. The test was done on "One AWS c4.8xlarge
with 36vCPU and 60GiB Memory". But could I run this on a tiny vps to index,
search, catalog my million-odd documents?

~~~
showerst
I'd guess Redis performs _better_ in that case, since the minimum overhead
for Redis is much lower than for Elasticsearch.

~~~
weavie
Wouldn't Redis need to keep the whole dataset in memory?

~~~
showerst
In my experience to get decent "show off in benchmarks" performance with ES
you want your index to fit in RAM as well, but I haven't done much work with
ES outside of high RAM boxes.

------
siffland
A problem with RediSearch, at least for me, is:

Note: clustering is only available in RediSearch’s Enterprise version

[https://redislabs.com/redis-enterprise/technology/redis-sear...](https://redislabs.com/redis-enterprise/technology/redis-search/)

At least with ES I can build and play with the clustering of the nodes. This
is probably why they only made a 1-node ES: they would have had to push
their Enterprise software to make a cluster of RediSearch. Maybe I am
wrong.

------
manigandham
RedisLabs has done great work in developing Redis but these extensions to
retrofit Redis into a multi-model database have issues.

Raw latency is usually not the primary concern, and having everything in RAM
can be a major cost problem, further compounded by the lack of the compression
that other persistent stores offer. The RESP protocol is also overloaded and
hard to work with when dealing with JSON and search queries.
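To make the RESP point concrete: on the wire, a client sends every command as an array of bulk strings, so a whole search query (or a JSON document) travels as one opaque string argument. Below is a minimal encoder sketch that follows the RESP framing: `*<count>`, then `$<byte length>` for each argument, with every line CRLF-terminated. The index name `idx` in the example is hypothetical.

```python
def encode_resp_command(*args: str) -> bytes:
    """Frame a command as a RESP array of bulk strings, as clients send it."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        # bulk string: $<length in bytes>\r\n<data>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


print(encode_resp_command("FT.SEARCH", "idx", "@title:hello"))
# b'*3\r\n$9\r\nFT.SEARCH\r\n$3\r\nidx\r\n$12\r\n@title:hello\r\n'
```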

------
DmitryOlshansky
It does not tell us the text analysis settings used by the two engines. Secondly,
on the query side, again, the scoring settings of RediSearch vs Elastic are not
discussed.

As it is, it's just 2 points in space, which gives us little information from
which to deduce "58% faster at X" or whatever.
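Here is a toy illustration of why the analysis settings matter. The same text yields different index terms under two analyzers, which changes both index size and the work done per query. Both analyzers and the stopword list are made up for this sketch and are not meant to mirror either engine's defaults:

```python
import re

# Illustrative stopword list; not the default of either engine.
STOPWORDS = {"a", "and", "in", "of", "the"}


def analyze_plain(text: str) -> list[str]:
    """Lowercase, then split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def analyze_with_stopwords(text: str) -> list[str]:
    """Same tokens as analyze_plain, minus stopwords."""
    return [t for t in analyze_plain(text) if t not in STOPWORDS]


doc = "The Impact of the Crater"
print(analyze_plain(doc))           # ['the', 'impact', 'of', 'the', 'crater']
print(analyze_with_stopwords(doc))  # ['impact', 'crater']
```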

------
1024core
In the fine print: the number of shards for the multi-tenant benchmark for
RediSearch was increased from 5 to 20, but kept the same (5) for Elasticsearch.

This is why the only reliable benchmark is the one _you_ do on _your_ data.
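Here is a bare-bones harness for doing exactly that. `search` stands in for whatever client call you're evaluating (hypothetical here), and `queries` should come from your own data or query logs. Latency is reported as the median and a nearest-rank p95, in milliseconds. The untimed warm-up pass keeps first-query effects such as cold caches from dominating the numbers.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def bench(search: Callable[[str], object], queries: Sequence[str],
          repeats: int = 5, warmup: int = 1) -> dict:
    """Time `search` over `queries`; report median and nearest-rank p95 in ms."""
    if not queries:
        raise ValueError("need at least one query")
    for _ in range(warmup):  # untimed passes to warm caches before measuring
        for q in queries:
            search(q)
    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            search(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]  # nearest-rank percentile
    return {"n": len(samples), "median_ms": statistics.median(samples), "p95_ms": p95}


# Stand-in for a real client call; replace with your own search function.
print(bench(lambda q: q.lower(), ["Impact Crater", "extinction"]))
```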

------
dumbfounder
Anyone can make a search engine fast. It's much harder to make it good.

------
m3kw9
So according to HN, they’ve proved RediSearch is actually inferior

~~~
manigandham
No, they just haven't proven anything.

------
rooam-dev
Is RediSearch's aggregation comparable with ES's? Speed has lower priority
when there are missing features.

PS: Crashes are never good though...

------
gt565k
WOW. Hahahaha.

This is a massive misconfiguration of an Elasticsearch cluster. 50k indices?
500 documents per index?

500 records per index at 5 shards/index is 100 records per shard.

Yeah, let's shard our data so much that we introduce tremendous amounts of
disk i/o overhead!!!
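Here are the numbers from the article's setup, multiplied out. `docs_per_index` uses the article's stated maximum of 500, so the document totals are upper bounds. The value of 5 shards per index is taken from the comment above (it was also Elasticsearch's default number of primary shards before 7.0). The multiplication also shows that the article's "25 million" is a document count.

```python
def layout(indices: int, docs_per_index: int, shards_per_index: int) -> tuple[int, int, int]:
    """Total documents, primary shards, and documents per shard."""
    total_docs = indices * docs_per_index
    primary_shards = indices * shards_per_index
    docs_per_shard = docs_per_index // shards_per_index
    return total_docs, primary_shards, docs_per_shard


print(layout(50_000, 500, 5))  # (25000000, 250000, 100)
```

That is 250,000 primary shards holding about 100 documents each.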

Author should learn how to properly configure an ES cluster before posting
ridiculous benchmarks like this.

What an utter pile of garbage benchmark this is.

~~~
jstarfish
Isn't that exactly what they're trying to demonstrate, though? That a load which
needs all this arcana to run on a stable ES cluster barely breaks a sweat on
RediSearch?

The specific test deployment was multi-tenant anyway; you can't account for or
optimize for what tenants are going to index.

~~~
gt565k
I'm not familiar with RediSearch, but I'm just trying to point out that you
can't misconfigure ES and then benchmark against a misconfigured cluster. This
is comparing apples to oranges. Not to mention I'm not sure of the feature
difference between the 2 search engines, but I'd bet ES is much more feature
rich, thus its use cases are vastly different. If you are just comparing text
search, sure, maybe Redis is faster. But at that point, so is a simple SQL
database, when compared to a misconfigured ES cluster.

~~~
jstarfish
I'm not familiar with RediSearch either, but I agree there's definitely a bent
to this, in that they made something that isn't ES, then compared it to ES via
a benchmark that shows how poorly ES performs at... being something other than
ES.

The impression I got was that they were trying to demonstrate for two specific
workloads, how much more a single node of RS can do than a single node of ES,
and that we should extrapolate the savings and performance if scaled out from
there.

A properly deployed ES cluster versus a single RS node isn't a fair comparison
either. It's a strained comparison in any case.

------
coleifer
How silly to emphasize things like "built as a C extension" and "uses modern
data-structures" as if these were useful criteria for choosing a search
engine.

It's about minimizing the effort needed to find what you're looking for. Speed
of index construction time, unless we're talking orders of magnitude, isn't
really meaningful. I don't know if this is just a really clumsy attempt at
"marketing" or what, but I can't imagine this is going to convince anyone to
drop ES for this thing.

~~~
softwaredoug
Also complaining that "Lucene is 20 years old" is about the same as saying
"Linux is ~30 years old"

Lucene is a pretty rock-solid open source project that has been battle tested
over those 20 years and had some of the best engineers in the world improve
over a long time frame. That's an asset for Lucene!

~~~
papito
That's a testament to how hard search engines are. Armies of engineers spent
years getting to this point. This project just started as a POC for Redis
modules, but this is really forced and futile. Trying to do this is like
attaching wings to a motorcycle - it's not how search works. This falls apart
beyond a primitive word lookup, which you can do with any SQL database. And it
has been done, MANY times.

