

Fuzzysearch: Tiny, fast fuzzy searching for JavaScript - bevacqua
https://github.com/bevacqua/fuzzysearch

======
lorenzhs
"fuzzy" is a bit of an overstatement, all it does is match substrings with
missing letters. If you type "orrange", it won't match. If you type "otange",
it won't match. All it does is match the "oange" kind of typo (including "oe",
which doesn't make any sense). This means that you'd get all kinds of weird
suggestions that don't make sense as soon as your haystack has non-trivial
size (like matching movie titles or artists), and might not find what you're
looking for (no prioritization).
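The behavior being described boils down to a subsequence check. A minimal sketch (not the library's actual code, just an illustration of the matching rule) reproduces the examples above:

```javascript
// True if every character of `needle` appears in `haystack` in order,
// possibly with gaps in between - i.e. needle is a subsequence of haystack.
function isSubsequence(needle, haystack) {
  var j = 0;
  for (var i = 0; i < needle.length; i++) {
    j = haystack.indexOf(needle[i], j);
    if (j === -1) return false;
    j++;
  }
  return true;
}

isSubsequence('oange', 'orange');  // true: dropped-letter typo matches
isSubsequence('oe', 'orange');     // true, though arguably meaningless
isSubsequence('otange', 'orange'); // false: substitutions don't match
```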

~~~
espadrine
This is the usual meaning of "fuzzy search". It is most useful when accessing
files on a file system; instead of writing out the full path, you can type
pieces of the path that you know will get you the file you seek.

For instance, finding an "ok" image from this location only requires typing
"ok": [https://thefiletree.com/lib/](https://thefiletree.com/lib/).

(Note that it has scoring optimized for paths, which the library doesn't have;
slashes have special meanings in paths.)

~~~
lorenzhs
What you are describing is substring matching, which can be implemented much
more efficiently using algorithms such as Knuth-Morris-Pratt [1]. Fuzzy
matching is a different concept, and although the definition is a bit fuzzy
(haha), it usually refers to Levenshtein distance (which the author
specifically states he didn't implement) or Hamming distance. Note that
Levenshtein distance on substrings is easy to implement: you only need to a)
remember the maximum value encountered and where it matched (to match prefixes)
and b) not penalize gaps at the beginning (to match suffixes).

Thus:

    
    
        M[0,j] = M[i,0] = 0
        M[i,j] = max(0,
                     M[i-1,j-1] + (needle[i] == haystack[j] ? 1 : -1),
                     M[i-1,j] - 1,
                     M[i,j-1] - 1)
    

Additionally, store the highest value x and its position (i,j). When you've
got the full matrix, return (x,i,j). In Bioinformatics, this is known as the
Smith-Waterman algorithm [2] and the result would satisfy the requirements of
fuzzy substring matching.
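Sketched in JavaScript, with toy scores of +1 for a match and -1 for a mismatch or gap (real uses tune these), and returning the best score together with where the best local match ends:

```javascript
// Smith-Waterman local alignment score between needle and haystack,
// using two rolling rows of the DP matrix. Scores: +1 match,
// -1 mismatch, -1 gap. Cells are clamped at 0 so a bad region
// never drags down a later good one.
function smithWaterman(needle, haystack) {
  var n = needle.length, h = haystack.length;
  var prev = new Array(h + 1).fill(0);
  var best = { score: 0, endI: 0, endJ: 0 };

  for (var i = 1; i <= n; i++) {
    var curr = [0];
    for (var j = 1; j <= h; j++) {
      var diag = prev[j - 1] + (needle[i - 1] === haystack[j - 1] ? 1 : -1);
      var score = Math.max(0, diag, prev[j] - 1, curr[j - 1] - 1);
      curr[j] = score;
      if (score > best.score) best = { score: score, endI: i, endJ: j };
    }
    prev = curr;
  }
  return best;
}

// "orrange" aligns well against "orange" despite the doubled letter:
// 6 matching characters minus 1 gap gives a score of 5.
smithWaterman('orrange', 'orange');
```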

[1]
[https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93P...](https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm)
[2]
[https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorit...](https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm)

------
doctorpangloss
    
    
        outer: for (var i = 0, j = 0; i < nlen; i++) {
            var nch = needle.charCodeAt(i);
            while (j < hlen) {
                if (haystack.charCodeAt(j++) === nch) {
                    continue outer;
                }
            }
        }
    

Oh my god, you can label loops?
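Yes: a label can be attached to any loop in JavaScript, and `break`/`continue` can name it to escape nested loops in one jump. A standalone toy example (names are made up for illustration):

```javascript
// Find the first pair of indices (i, j) whose values sum to 10,
// bailing out of both loops at once with a labeled break.
function findPair(xs) {
  var result = null;
  search: for (var i = 0; i < xs.length; i++) {
    for (var j = i + 1; j < xs.length; j++) {
      if (xs[i] + xs[j] === 10) {
        result = [i, j];
        break search; // exits the outer loop, not just the inner one
      }
    }
  }
  return result;
}

findPair([2, 3, 7, 9]); // [1, 2], since 3 + 7 === 10
```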

~~~
nightcracker
I don't know why the author coded it this way; it can be done much more simply:

    
    
        function fuzzysearch(needle, haystack) {
            var hlen = haystack.length;
            var nlen = needle.length;
            
            if (nlen > hlen) return false;
            if (nlen === hlen) return needle === haystack;
            
            for (var i = 0, j = 0; i < nlen; ++i, ++j) {
                while (j < hlen && needle.charCodeAt(i) !== haystack.charCodeAt(j)) ++j;
                if (j === hlen) return false;
            }
    
            return true;
        }

~~~
joshstrange
He says the algorithm was suggested by "Mr. Aleph, a crazy russian compiler
engineer working at V8.". It's possible it was suggested to be written this
way because V8 (and possibly other interpreters) can optimise it to be faster.

------
GaiusCoffee
This JS "library" contains one function:

> function fuzzysearch(r,e){var
> n=e.length,t=r.length;if(t>n)return!1;if(t===n)return r===e;r:for(var
> f=0,u=0;t>f;f++){for(var
> a=r.charCodeAt(f);n>u;)if(e.charCodeAt(u++)===a)continue
> r;return!1}return!0}

I mean.. come on, guys. This is getting absurd..

~~~
kristopolous
Libraries that can be explained quickly and have good marketing material gain
traction.

The popularity of my projects appears to be nearly inversely proportional to
their complexity and sophistication. Things I've been refining for over 5
years, like EvDa
([https://github.com/kristopolous/EvDa](https://github.com/kristopolous/EvDa)),
have a userbase of 1, while my occasional evening hacks have significantly
more traction.

At least for me, working long and hard on things I believe are of value, and
writing tests, dogfooding, and documentation, basically means it's just going
to be used by me ... bizarre but true.

------
to3m
I've always found searching by space-separated substrings, matched in any
order, to work vastly better for matching identifiers or file names. No
confusion about how best to match a particular sequence of letters: either
that sequence is an exact substring, and the string matches, or it isn't, and
it doesn't. (If you want multiple substrings, separate the substrings with
spaces.)

This is simplicity itself to code up. Here's a quick and dirty Python one, for
example, that would get you started. It's so simple that I'm pretty sure it
works, even though I haven't tested it.

    
    
        def get_suggestions(xs, s):
            """Return the elements of XS that contain every space-separated part of S."""
            suggestions = xs[:]
            for part in s.split():
                suggestions = [x for x in suggestions if part in x]
            return suggestions
    

This is also (or so I think) better to use. As a user you get a lot more
control over what you're finding, and you don't have to think very hard about
what chars to add in to eliminate items you don't want. So it's very likely
you'll be able to quickly winnow your list down to 1 item.

And because it doesn't have any complicated workings inside it, it can be
explained even to non-technical users, who can make good use of it.

------
vkjv
lunr.js is pretty small (5k gzip/minified) and does a pretty good job of
providing client-side full-text search. Definitely larger than this, but it
provides a few more (IMHO, necessary) features.

[http://lunrjs.com/](http://lunrjs.com/)

~~~
mewwts
For a durable, offline-ready search engine written entirely in JS, check out
[https://github.com/fergiemcdowall/search-index](https://github.com/fergiemcdowall/search-index)
(I contribute to this)

------
dangoor
The big thing missing from this, in my opinion, is scoring. If you have a
large list of items, having no idea which of the 500 matches is the likeliest
match, given what users would expect, makes it kind of useless.

[http://www.blueskyonmars.com/2013/03/26/brackets-quick-open-thats-no-regex/](http://www.blueskyonmars.com/2013/03/26/brackets-quick-open-thats-no-regex/)

~~~
ne0phyte
You could implement something like the Levenshtein distance [1], which is easy
and still very fast. It gives you the difference between two strings as the
number of steps it takes to transform one into the other, using operations
like insertion, deletion, and substitution (the Damerau-Levenshtein variant
also counts swapping two adjacent chars as one step).

[1]
[https://en.wikipedia.org/wiki/Levenshtein_distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
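For reference, the classic dynamic-programming formulation, sketched in JavaScript with two rolling rows:

```javascript
// Levenshtein distance: minimum number of single-character
// insertions, deletions and substitutions turning a into b.
function levenshtein(a, b) {
  // prev[j] holds the distance between a prefix of a and the first j chars of b.
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;

  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1,         // deletion
                         curr[j - 1] + 1,     // insertion
                         prev[j - 1] + cost); // substitution
    }
    prev = curr;
  }
  return prev[b.length];
}

levenshtein('orrange', 'orange'); // 1: one extra letter
levenshtein('otange', 'orange');  // 1: one substituted letter
```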

~~~
edc117
Another for your consideration is Jaro-Winkler [1], which I've used with a
fair amount of success in previous projects, and is still fairly performant.

[1]
[https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)
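For the curious, a sketch of the standard formulation in JavaScript: the Jaro similarity counts characters that match within a sliding window, minus half-credit for transpositions, and Winkler adds a boost for a shared prefix of up to 4 chars:

```javascript
// Jaro-Winkler similarity in [0, 1]; higher means more similar.
function jaroWinkler(s1, s2) {
  if (s1 === s2) return 1;
  if (!s1.length || !s2.length) return 0;

  // Characters only "match" if equal and within this window of each other.
  var matchWindow = Math.max(0, Math.floor(Math.max(s1.length, s2.length) / 2) - 1);
  var m1 = new Array(s1.length).fill(false);
  var m2 = new Array(s2.length).fill(false);
  var matches = 0;

  for (var i = 0; i < s1.length; i++) {
    var lo = Math.max(0, i - matchWindow);
    var hi = Math.min(i + matchWindow + 1, s2.length);
    for (var j = lo; j < hi; j++) {
      if (!m2[j] && s1[i] === s2[j]) {
        m1[i] = m2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Transpositions: matched characters that appear in a different order.
  var k = 0, transpositions = 0;
  for (var i = 0; i < s1.length; i++) {
    if (!m1[i]) continue;
    while (!m2[k]) k++;
    if (s1[i] !== s2[k]) transpositions++;
    k++;
  }
  transpositions /= 2;

  var jaro = (matches / s1.length +
              matches / s2.length +
              (matches - transpositions) / matches) / 3;

  // Winkler boost: favor strings sharing a prefix (capped at 4 chars).
  var prefix = 0;
  while (prefix < 4 && prefix < s1.length && prefix < s2.length &&
         s1[prefix] === s2[prefix]) prefix++;

  return jaro + prefix * 0.1 * (1 - jaro);
}

jaroWinkler('MARTHA', 'MARHTA'); // ~0.961, the textbook example
```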

