

Autocomplete of entire Linux kernel using Redis's new ZRANGEBYLEX - artursapek
http://autocomplete.redis.io/

======
Jabbles
So? I mean it's probably nice to have, but the performance of Redis isn't
important enough to be a factor in the scalability of this.

I just tested a binary search of 8 million random 100 byte DNA strings written
in Go. 10,000 requests for autocompletions of random 5 byte prefixes yielded
78 million results in 38 milliseconds. And this is with the strings dotted
around in memory after sorting them. Further optimizations are possible, but
if they were hoping to prove some kind of performance benchmark with this,
they have failed.

~~~
antirez
Algorithmically this is nothing exceptional or that you can't reproduce
easily, but this is even more true for like linked lists, that are the base of
Redis List type. The point is to have this capabilities with a network
interface, persistence, replication, and so forth.

------
danbruc
I first believed it uses the entire source but it only performs a prefix
search on a per line basis. So we have a sorted list of 8M strings, around 600
MiB assuming 80 characters per line which is probably way to much, requiring
23 comparisons to find a match using binary search. That seems not really
impressive to me.

~~~
antirez
This is not impressive as any balanced tree will do the trick, but your
example with binary search is not ideal, because binary search uses an array
that would require an O(N) insert operation.

The point of Redis sorted sets is that inserting is cheap and as soon as you
insert the query operations see the new data modified as needed without any
slowdown, and this requires a tree-like structure for this exact use case (it
is actually more complex than this and sorted sets are implemented with a skip
list + hash table).

What is interesting of having this in Redis is that you have the same
operation into a networked fast in-memory DB, so your clients can mount
something requiring range queries now.

For instance you can use a sorted set where you increment the score of every
search query you see, and another sorted set where the N-top items are
inserted to autocomplete searches using the new ZRANGEBYLEX command. However
because you have multiple keys and memory-wise this is pretty efficient, you
can segment your users in different categories and do this in a per-category
basis so that different categories will have different autocompletion
dictionaries, that may better fit what they are looking for.

------
corford
Nice to have but I'd stick with Solr/ES/Lucene for any serious string
searching/autocomplete work.

------
dave1010uk
What's the performance of a MySQL LIKE query on a similar sized table (with an
index)?

~~~
gfv
It's not really possible to compare LIKE to ZRANGEBYLEX. LIKE in a general
case is an infix search in all text in a column, while ZRANGEBYLEX is a
special case of search of all values within a given range. ZRANGEBYLEX uses a
lexicographic comparison, so one can only search for strings with a prefix
between certain values.

~~~
mappu
MySQL can search by prefix by using only a trailing wildcard (LIKE "string%"),
and then it will use an index if one is available.

