Hacker News new | comments | show | ask | jobs | submit login
Show HN: A fast, hopefully accurate, fuzzy matching library written in Go (github.com)
186 points by sahilmuthoo 7 months ago | hide | past | web | favorite | 30 comments

Made a PR to switch to runes for iteration. Runes are canonical in Go for unicode codepoints and also have no memory allocation, so they're wicked fast! More importantly they make the code compatible with unicode names.

You can also save on a ton of allocation if you reuse unleaked position slices on each match. It may also be nice to have a maxMatches argument that lets users set a limit, which would save on unnecessary allocation.

/xpost from the PR :)

Wow! mind = blown. Please let me digest this code. I will merge later today. I'm probably going to come back and ask a few questions. I wouldn't want to pass up the opportunity to learn from you.

Thank you very much!

Thanks for the cool project!

On a side note, working on this made me realize that case-insensitive comparison in Go is pretty inefficient, made a CL to improve it: https://go-review.googlesource.com/c/go/+/110018

If you want to make it really fast you could steal some ideas from: https://wincent.com/blog/optimization (tales of many years of optimizing a fuzzy search implementation).

Thanks for writing command-T :)

Let me pull up some embarrassing numbers of fuzzy matching on all of Chromium.

I still use and recommend Command-T :-)

Have you tried out one of the standard 'distance' metrics like Hamming distance or Levenstein distance? This would at least give you an objective measure of success. Whether that's something that actually works for what you're trying to achieve is of course an open question, but both Hamming distance matching and Levenstein distance matching (the latter is harder but probably better for your purposes) are very well understood.

I made a similar tool and believe that if the author is interested in this route he may prefer the Jaro-Winkler algorithm as it is more highly tailored to names.

Likely he rolled his own solution for the same reason that I did, he is targeting a specific type of names (file names for him) which comes with its own subset of rules that can be incorporated for better matching.

How does this compare with fzf (https://github.com/junegunn/fzf)?

fzf is a tool with a much larger feature set. fuzzy is just a humble library that you can add to your code.

The logical follow-up question would be; how does it compare to fzy? :) https://github.com/jhawthorn/fzy

I approve of any and all fuzzy matchers. The world would be a better place if every search field was a fuzzy search field. Thanks for the shout out!

I did a similar library for Python (~3yrs ago).

blog: http://blog.amjith.com/fuzzyfinder-in-10-lines-of-python

library: https://github.com/amjith/fuzzyfinder

I haven't done any benchmarking to check the performance. Feedback welcome.

Nice! How do you rank the file names when returning the search results? It'd be cool if you could rank them by factoring in git commit frequency for files, because more frequently modified files are more likely being searched for. Cool project!

Now if only this could be baked into VSCode for workspace-global symbol search. I would then finally ditch WebStorm.

Hopefully accurate? It's not supposed to be accurate! It's fuzzy search!

That's the joke. Glad you explained it.

I'm glad I could be of service.

cool stuff!!

Why do projects written in Go always advertise they are written in Go?

In this case this is a library and the information that it is written in Go is indeed useful: you can easily use it from Go code, not so much from other langauges.

Novelty and notoriety. I’ll throw in kindred spirits too. Go is still trying to get traction. It’s not Java, nor C#, not PHP. It’s still trying to prove itself outside of well know, but niche projects like Docker.

/shrug, language matters to me. I like knowing what code i can interface with. Generally speaking, written in Go is a boon to me (as a Go dev obv), written in Python/Node/Ruby is a negative with runtime requirements for me.

Hopefully accurate?

You should probably write some tests to ascertain that.

I've found it hard to write tests (though there are a few here - https://github.com/sahilm/fuzzy/blob/master/fuzzy_test.go ). Match quality is often subjective. Tweaking one parameter messes with others.

This library came out of a project I started. The project never saw the light of day. If I do see real world use, it'll be easier to find bugs and fix them :)

No tests makes it impossible to add it to real world projects IMO

There are tests.

That sounds to me like a great starting point for others to contribute code. :P

Can't we all just be happy that someone decided to write a fun project and made it freely available for others to learn from?

People are supposed to be gentle in Show HN postings, but often they’re a bit mean.

I suspect "hopefully" is in reference to the fact that it's doing fuzzy matching.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact