
Show HN: A fast, hopefully accurate, fuzzy matching library written in Go - sahilmuthoo
https://github.com/sahilm/fuzzy
======
ericpauley
Made a PR to switch to runes for iteration. Runes are the canonical way to
represent Unicode code points in Go, and ranging over a string by rune needs no
memory allocation, so it's wicked fast! More importantly, runes make the code
compatible with Unicode names.

You can also save on a ton of allocation if you reuse unleaked position slices
on each match. It may also be nice to have a maxMatches argument that lets
users set a limit, which would save on unnecessary allocation.
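
A rough sketch of both ideas, assuming a simple in-order matcher. `findPositions`, `firstMatches` and the `maxMatches` parameter are hypothetical names for illustration, not the library's API.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// findPositions is a hypothetical helper. It writes the byte offsets of the
// matched runes into buf (reusing its backing array) and returns the slice.
// The caller must copy the result if it needs to keep it, because the next
// call overwrites the same memory.
func findPositions(pattern, s string, buf []int) ([]int, bool) {
	buf = buf[:0]
	if pattern == "" {
		return buf, true
	}
	want, size := utf8.DecodeRuneInString(pattern)
	for i, r := range s {
		if r != want {
			continue
		}
		buf = append(buf, i)
		pattern = pattern[size:]
		if pattern == "" {
			return buf, true
		}
		want, size = utf8.DecodeRuneInString(pattern)
	}
	return buf[:0], false
}

// firstMatches stops as soon as maxMatches candidates have matched, so the
// result slice never grows beyond the requested limit.
func firstMatches(pattern string, candidates []string, maxMatches int) []string {
	if maxMatches <= 0 {
		return nil
	}
	out := make([]string, 0, maxMatches)
	var buf []int // one position buffer shared by every candidate
	for _, c := range candidates {
		if len(out) == maxMatches {
			break
		}
		var ok bool
		if buf, ok = findPositions(pattern, c, buf); ok {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	files := []string{"main.go", "moon.txt", "readme.md", "mountain"}
	fmt.Println(firstMatches("mn", files, 2))
}
```

Note that stopping early returns the first matches in input order, not the best-scoring ones, so a real limit would have to be combined with ranking.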

~~~
sahilmuthoo
/xpost from the PR :)

Wow! mind = blown. Please let me digest this code. I will merge later today.
I'm probably going to come back and ask a few questions. I wouldn't want to
pass up the opportunity to learn from you.

Thank you very much!

~~~
ericpauley
Thanks for the cool project!

On a side note, working on this made me realize that case-insensitive
comparison in Go is pretty inefficient, so I made a CL to improve it:
[https://go-review.googlesource.com/c/go/+/110018](https://go-review.googlesource.com/c/go/+/110018)
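
The CL itself is not reproduced here. As a loose illustration of where the cost can come from, the sketch below compares strings rune by rune with an ASCII fast path instead of lowercasing both strings first. `equalFoldRune` and `equalFoldString` are hypothetical helpers, and comparing `unicode.ToLower` results is only an approximation of full Unicode case folding; the standard library's `strings.EqualFold` is the thing to use in real code.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// lowerASCII lowercases A-Z and leaves every other rune untouched.
func lowerASCII(r rune) rune {
	if 'A' <= r && r <= 'Z' {
		return r + ('a' - 'A')
	}
	return r
}

// equalFoldRune is a hypothetical helper: cheap arithmetic for ASCII,
// falling back to unicode.ToLower only when a non-ASCII rune is involved.
func equalFoldRune(a, b rune) bool {
	if a == b {
		return true
	}
	if a < utf8.RuneSelf && b < utf8.RuneSelf {
		return lowerASCII(a) == lowerASCII(b)
	}
	return unicode.ToLower(a) == unicode.ToLower(b)
}

// equalFoldString walks both strings in lockstep without allocating.
func equalFoldString(a, b string) bool {
	for a != "" && b != "" {
		ra, sa := utf8.DecodeRuneInString(a)
		rb, sb := utf8.DecodeRuneInString(b)
		if !equalFoldRune(ra, rb) {
			return false
		}
		a, b = a[sa:], b[sb:]
	}
	return a == "" && b == ""
}

func main() {
	fmt.Println(equalFoldString("README.md", "readme.MD"))
	fmt.Println(equalFoldString("Go", "Gopher"))
}
```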

------
wincent
If you want to make it really fast you could steal some ideas from:
[https://wincent.com/blog/optimization](https://wincent.com/blog/optimization)
(tales of many years of optimizing a fuzzy search implementation).

~~~
sahilmuthoo
Thanks for writing command-T :)

Let me pull up some embarrassing numbers of fuzzy matching on all of Chromium.

------
glangdale
Have you tried out one of the standard 'distance' metrics like Hamming
distance or Levenshtein distance? This would at least give you an objective
measure of success. Whether that's something that actually _works_ for what
you're trying to achieve is of course an open question, but both Hamming
distance matching and Levenshtein distance matching (the latter is harder but
probably better for your purposes) are very well understood.
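
For reference, a minimal sketch of the two metrics, written from the textbook definitions rather than taken from any library. Hamming distance is only defined for strings of equal length, which is one reason Levenshtein distance tends to fit search input better.

```go
package main

import "fmt"

// hamming counts positions at which two equal-length strings differ.
// The bool is false when the lengths (in runes) differ.
func hamming(a, b string) (int, bool) {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return 0, false
	}
	d := 0
	for i := range ra {
		if ra[i] != rb[i] {
			d++
		}
	}
	return d, true
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions and substitutions needed to turn a into b. It keeps only
// two rows of the dynamic-programming table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // turning "" into rb[:j] takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i // turning ra[:i] into "" takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	fmt.Println(hamming("karolin", "kathrin"))    // 3 true
}
```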

~~~
purple-again
I made a similar tool and believe that if the author is interested in this
route he may prefer the Jaro-Winkler algorithm, as it is more highly tailored
to names.

Likely he rolled his own solution for the same reason that I did: he is
targeting a specific type of name (file names for him), which comes with its
own subset of rules that can be incorporated for better matching.
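
A rough sketch of Jaro-Winkler for comparison, following the standard definition (prefix scale 0.1, prefix capped at 4 runes). It is an illustration only and not taken from either tool.

```go
package main

import "fmt"

// jaro returns the Jaro similarity of a and b in the range [0, 1].
func jaro(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	if len(ra) == 0 && len(rb) == 0 {
		return 1
	}
	if len(ra) == 0 || len(rb) == 0 {
		return 0
	}
	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	window := longest/2 - 1 // how far apart two matching runes may be
	if window < 0 {
		window = 0
	}
	matchedA := make([]bool, len(ra))
	matchedB := make([]bool, len(rb))
	matches := 0
	for i := range ra {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(rb) {
			hi = len(rb)
		}
		for j := lo; j < hi; j++ {
			if matchedB[j] || ra[i] != rb[j] {
				continue
			}
			matchedA[i], matchedB[j] = true, true
			matches++
			break
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched runes that appear in a different order in b.
	outOfOrder, j := 0, 0
	for i := range ra {
		if !matchedA[i] {
			continue
		}
		for !matchedB[j] {
			j++
		}
		if ra[i] != rb[j] {
			outOfOrder++
		}
		j++
	}
	m := float64(matches)
	t := float64(outOfOrder) / 2 // transpositions
	return (m/float64(len(ra)) + m/float64(len(rb)) + (m-t)/m) / 3
}

// jaroWinkler boosts the Jaro score for strings sharing a common prefix.
func jaroWinkler(a, b string) float64 {
	score := jaro(a, b)
	ra, rb := []rune(a), []rune(b)
	prefix := 0
	for prefix < 4 && prefix < len(ra) && prefix < len(rb) && ra[prefix] == rb[prefix] {
		prefix++
	}
	return score + float64(prefix)*0.1*(1-score)
}

func main() {
	fmt.Printf("%.4f\n", jaro("MARTHA", "MARHTA"))
	fmt.Printf("%.4f\n", jaroWinkler("MARTHA", "MARHTA"))
}
```

The prefix bonus is what makes it attractive for names, where the first few characters are usually typed correctly.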

------
_-david-_
How does this compare with fzf
([https://github.com/junegunn/fzf](https://github.com/junegunn/fzf))?

~~~
sahilmuthoo
fzf is a tool with a much larger feature set. fuzzy is just a humble library
that you can add to your code.

~~~
Rafert
The logical follow-up question would be: how does it compare to fzy? :)
[https://github.com/jhawthorn/fzy](https://github.com/jhawthorn/fzy)

------
forrestthewoods
I approve of any and all fuzzy matchers. The world would be a better place if
every search field was a fuzzy search field. Thanks for the shout out!

------
amjith
I did a similar library for Python (~3yrs ago).

blog: [http://blog.amjith.com/fuzzyfinder-in-10-lines-of-
python](http://blog.amjith.com/fuzzyfinder-in-10-lines-of-python)

library:
[https://github.com/amjith/fuzzyfinder](https://github.com/amjith/fuzzyfinder)

I haven't done any benchmarking to check the performance. Feedback welcome.

------
jokh
Nice! How do you rank the file names when returning the search results? It'd
be cool if you could rank them by factoring in git commit frequency for files,
because more frequently modified files are more likely being searched for.
Cool project!
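
One way the suggestion could look, purely as a sketch: the `candidate` type, its `MatchScore` and `Commits` fields, and the `weight` parameter are all hypothetical, and the commit counts would have to come from somewhere like `git log`. The log scaling keeps a handful of very hot files from drowning out match quality.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// candidate is a hypothetical record: a path, a score from whatever fuzzy
// matcher is in use, and how many commits touched the file.
type candidate struct {
	Path       string
	MatchScore float64
	Commits    int
}

// combined adds a log-scaled commit bonus to the match score.
func combined(c candidate, weight float64) float64 {
	return c.MatchScore + weight*math.Log1p(float64(c.Commits))
}

// rank sorts candidates in place, best combined score first.
func rank(cands []candidate, weight float64) {
	sort.SliceStable(cands, func(i, j int) bool {
		return combined(cands[i], weight) > combined(cands[j], weight)
	})
}

func main() {
	cands := []candidate{
		{Path: "docs/old_notes.md", MatchScore: 10, Commits: 1},
		{Path: "cmd/server/main.go", MatchScore: 10, Commits: 250},
	}
	rank(cands, 1.0)
	for _, c := range cands {
		fmt.Println(c.Path)
	}
}
```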

------
igammarays
Now if only this could be baked into VSCode for workspace-global symbol
search. I would then finally ditch WebStorm.

------
arghwhat
Hopefully accurate? It's not _supposed_ to be accurate! It's fuzzy search!

~~~
mseepgood
That's the joke. Glad you explained it.

~~~
arghwhat
I'm glad I could be of service.

------
anirudh83
cool stuff!!

------
tdurden
Why do projects written in Go always advertise they are written in Go?

~~~
gkya
In this case this is a library and the information that it is written in Go is
indeed useful: you can easily use it from Go code, not so much from other
languages.

------
bitwize
_Hopefully_ accurate?

You should probably write some tests to ascertain that.

~~~
sahilmuthoo
I've found it hard to write tests (though there are a few here:
[https://github.com/sahilm/fuzzy/blob/master/fuzzy_test.go](https://github.com/sahilm/fuzzy/blob/master/fuzzy_test.go)).
Match quality is often subjective. Tweaking one parameter messes with
others.

This library came out of a project I started. The project never saw the light
of day. If I do see real world use, it'll be easier to find bugs and fix them
:)

~~~
yani
No tests makes it impossible to add it to real world projects IMO

~~~
sahilmuthoo
There are tests.

