
Show HN: Google Sheets add-on to compare text, fuzzy-match, highlight duplicates - chiscript
I created an add-on for Google Sheets called Flookup, and it comes both as a free version and a VERY AFFORDABLE paid version.

At its core, Flookup is a fuzzy matching add-on that helps you manage text that is less than a 100% match. Beyond that, it can be used to:

1. Search for and match data regardless of whether it contains typos.

2. Highlight and delete duplicates duplicates even if the data has mismatched text.

3. Calculate the percentage similarity between strings.

4. Extract unique values from any column based on percentage similarity.

5. Sum and find the average of numbers based on corresponding partial matches.

Because of its versatility, Flookup can be used to return the best match, the next best match and so on, until the minimum percentage similarity is reached. This feature avoids weaknesses found in other fuzzy matching algorithms because it safely hands power to the user, and I believe the user is the best judge of which data is a match.

Another great feature is that Flookup can combine lookup values. This is particularly helpful when your data has many similar strings and you want to add extra information to your lookup value to increase the specificity of your query.

Finally, Flookup is good for more than just fuzzy matching; it is the improved replacement for VLOOKUP and INDEX/MATCH that you have been looking for.

Find out more by heading to https://www.getflookup.com.

Subscription information is available at https://www.getflookup.com/pricing
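To make the "best match, next best match" idea concrete, here is a rough sketch in Python. This is purely illustrative: it uses the standard library's difflib ratio as the similarity measure and a made-up threshold, not Flookup's actual (unpublished) algorithm.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Percentage similarity between two strings (0-100)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100

def ranked_matches(query: str, candidates: list[str],
                   threshold: float = 70.0) -> list[tuple[str, float]]:
    """Return every candidate scoring at or above the threshold,
    best match first, so the caller can walk down to the next
    best match until the minimum similarity is reached."""
    scored = [(c, similarity(query, c)) for c in candidates]
    return sorted((s for s in scored if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)

names = ["Acme Corp", "ACME Corporation", "Acme Crop", "Beta LLC"]
print(ranked_matches("Acme Corp", names, threshold=60))
```

The key point is that the cutoff and the ranking are exposed to the user rather than baked into the algorithm.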
======
throw_14JAS
I've had a similar idea in the back of my mind for a few years now. Congrats
on launching!

My use case is a bit different -- I was doing a lot of database cleanups,
particularly CRMs. I rewrote/reused code to build a duplicate detector a
number of times; I always wished there were a service I could send data to
that would flag my dupes. I was even using human labelers to train
domain-specific models.

~~~
chiscript
I see... There are other solutions that claim to use AI to train or generate
models for their apps, but I'm not really sure how effective they are.
That said, Flookup can help you flag dupes quite well. I have tried in many
ways to shore up the fact that no algorithm is a one-size-fits-all solution.
For example, Flookup allows you to dictate which stop words to remove,
combine lookup variables for more specificity, or even return the next best
match in case the first one isn't to your liking. All this makes it quite
malleable and usable for a case like yours.

~~~
throw_14JAS
Thanks for replying! I don't tend to do that type of work anymore, but I'm
still stoked to see a solution to the problem I had frequently. I think
there's a great service to be built (and maybe it's yours!) that deduplicates
data.

Specific models might be an interesting add-on. Address parsing, normalization,
and deduplication (with potential covariates like phone number, email address,
etc.) is a massive pain in the ass for any data engineer who works with sales
or marketing folks. Their databases (CRMs) are awful -- it was always a chore
to clean these up, but doing so measurably saved money (imagine you mail
physical cards and only want one per customer... but you have 5 different
contacts at that company for 3 unique individuals).

I would have paid for a deduplication service -- say, quarterly batches at
somewhere >$500/quarter for e.g. 20-50k contacts.

The one-size-fits-all angle isn't really a value add for me; that wasn't so
much my issue. For other target users, I can see the appeal -- for them, the
interface is the value add, especially if you can read/write Excel files
directly.

Stop words aren't something I used in my deduplication efforts. How many of
your users request or use this? What kinds of stop words do you want to
exclude when comparing two entries? I would be worried that stop words still
carry information: "The Store" versus "Store" might be significant.

~~~
chiscript
I hope that service is mine. I'm really pushing to get more users on board
this year.

You have interesting insights, especially about the value deduplication
offers. In fact, apps similar to mine charge in the thousands for an annual
subscription. I didn't know how much of a pain it was until I released my
first paid version last year.

As for the stop words: a significant number of my users either asked for the
feature or presented me with issues that could be solved by removing certain
irrelevant text. I don't track usage patterns and such, so I added the feature
based on this feedback, fully trusting that the users could identify the stop
words correctly.

Some of these words might be anything from TLDs to definite articles like
"the"; it all depends on the user. Sometimes it's a simple word like "very" or
"best", etc., and, based on the data they've shared with me, it looks like the
feature was a welcome addition to Flookup.
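As a rough illustration of the idea (hypothetical Python, not Flookup's actual implementation), stripping user-chosen stop words before scoring similarity could look like this:

```python
import re
from difflib import SequenceMatcher

def strip_stop_words(text: str, stop_words: set[str]) -> str:
    """Drop user-chosen stop words (e.g. 'the', 'very', TLDs)
    before the strings are compared."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in stop_words)

def similarity_without_stop_words(a: str, b: str,
                                  stop_words: set[str]) -> float:
    """Percentage similarity (0-100) after stop-word removal."""
    a2 = strip_stop_words(a, stop_words)
    b2 = strip_stop_words(b, stop_words)
    return SequenceMatcher(None, a2, b2).ratio() * 100

print(similarity_without_stop_words("The Store", "Store", {"the"}))
```

Note that, per the concern raised above, "The Store" and "Store" become a 100% match once "the" is removed, which is exactly why the user, not the algorithm, picks the stop words.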

------
tehabe
Why is it so hard to introduce the people who made this tool on the website?
If I spend money on something, I want to know to whom I am sending it. It
feels really weird to use an anonymous tool for something that might be
important.

~~~
gnud
Hard disagree. A prominent 'meet the team', complete with smiling, rounded,
portrait photos will usually make me close the tab.

The fact that a tutorial and pricing are prominent in the menu easily makes
this better than 80% of the landing pages I run across. That appeals more to
me than any number of hipster beards.

I liked the landing page. It gave the what and the why of the tool in a clear
way, and still had some actual technical details.

~~~
rosstex
Hard disagree to your disagree. I don't want it at the top of the screen, but
I don't mind scrolling down and seeing those smiling faces.

------
superbrane
Congrats on launching a useful tool and already gathering a nice install
base. I wonder how much of it is paid :)). The base plan looks a bit too
restrictive -- people can process 50 rows manually in xls. I suggest you
offer more rows in the free plan to encourage adoption.

~~~
chiscript
Thanks for your suggestion.

It used to be way more "restrictive" than that, until yesterday when my
unscientific analysis prompted me to make the change to 50. At this level, I
felt that many users would be able to get some work done and properly test
Flookup at the same time.

Of course, there might be some more room for expansion, and I'm willing to
make that change depending on user feedback and more research.

------
dandare
This reminded me of [https://openrefine.org/](https://openrefine.org/).

Good luck with your project!

~~~
chiscript
Ssshhhhhh! Don't tell them about it!

Hehehe! #JustKidding ;)

------
samdung
Congrats on the launch, and I hope you make money quickly enough before
Google launches this as a built-in feature.

~~~
chiscript
Hahaha! The horror!

------
jacklewis
> 2. Highlight and delete duplicates duplicates even if the data has
> mismatched text.

I see what you did there

~~~
chiscript
Hehehe! I try my best

------
marapuru
Why did you choose a subscription model over a fixed price one?

~~~
chiscript
There is a fixed price for those who want it.

The subscription model is especially for those who might not want to own the
product forever, like those who want to use it for a couple of months to a
year and then quit. Since launch, I have seen the majority of new
subscriptions shift from monthly to annual, and as of the start of this year,
I have seen a bump in lifetime subscriptions.

The other great thing about subscriptions is that you, the user, get lifetime
updates/support/upgrades.

