
Implementation of various string similarity and distance algorithms in Java - based2
https://github.com/tdebatty/java-string-similarity
======
0x54MUR41
This library is implemented for a specific set of use cases. It calculates how
many characters differ, or have been changed, between two strings.

String similarity and distance algorithms are different from semantic
(knowledge-based) similarity, which calculates similarity based on the
relationship of one word to another (hierarchy/tree). So, if you need a Java
library that calculates semantic similarity instead of string similarity and
distance, I would recommend this [1].

[1]: [https://github.com/sharispe/slib/](https://github.com/sharispe/slib/)

------
_RPM
So, I just started working in Java at an internship this summer. With mvn, the
package manager is fantastic. I was naive and thought things like `npm` were
unique in their philosophy, but realized it's very similar. I hate debugging
Java in an enterprise environment, but I have to say that it's really elegant
to code in Java and use the standard library.

~~~
jjoonathan
What's wrong with debugging java? Between reliable remote debugger attachment,
mvn + IntelliJ's support for automatically building and distributing sources,
decent decompilation in case that fails, JMX, and tools like VisualVM, it
seems to me that Java is head and shoulders above most "trendy" languages in
the debugging department. Is there a cool new tool that I'm not aware of that
hasn't made it to Java yet?

~~~
specialist
Yes, Java is a superior dev env. And then someone says something like "We need
DI aspects with our annotated IoC!"

Spring (and others) is an exception (stack trace) obfuscation framework.

~~~
_RPM
Yeah. And with the bean configuration XML files, the error messages suck if
there is anything wrong with them.

------
nchammas
Similar library for Python:
[https://github.com/jamesturk/jellyfish](https://github.com/jamesturk/jellyfish)

------
thwaw
Good job! One question though. In cases like this when there is generally no
state, isn't it better to make the methods static?

~~~
nugator
Static method calls in Java code are a nightmare when testing.

~~~
hellrich
Am I missing something? I regularly write static methods and test them with
JUnit.

~~~
gopalv
> I regularly write static methods and test them with JUnit

Implementing mock objects which return "unexpected" results is impossible with
static methods.

The standard 1 interface + 1 impl pattern in Java is just so that the
Proxy.newProxyInstance can create a decorated or mocked object for testing.

So you can test the methods directly, but you can't write failure-inducing
methods (like a connect exception throwing one) which test the methods which
use it.
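
A minimal sketch of that point, using only the JDK. The `Connector` interface and `Client` class are hypothetical names made up for illustration; the idea is that because `Client` depends on an interface rather than a static call, `Proxy.newProxyInstance` can hand it a stand-in that fails on purpose:

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyMockSketch {

    // Hypothetical collaborator interface (the "1 interface" half of the pattern).
    public interface Connector {
        String connect(String host) throws IOException;
    }

    // Hypothetical code under test: it depends on the interface, not on a static call.
    public static class Client {
        private final Connector connector;

        public Client(Connector connector) {
            this.connector = connector;
        }

        public String fetchGreeting(String host) {
            try {
                return "ok: " + connector.connect(host);
            } catch (IOException e) {
                return "failed: " + e.getMessage();
            }
        }
    }

    // Builds a Connector whose method calls all throw IOException.
    public static Connector failingConnector() {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            throw new IOException("connection refused");
        };
        return (Connector) Proxy.newProxyInstance(
                Connector.class.getClassLoader(),
                new Class<?>[] { Connector.class },
                handler);
    }

    public static void main(String[] args) {
        // The failure path of Client is exercised without any real network.
        Client client = new Client(failingConnector());
        System.out.println(client.fetchGreeting("example.invalid"));
        // prints "failed: connection refused"
    }
}
```

If `connect` were a static method that `Client` called directly, there would be no seam to swap in the failing version with plain JDK tools.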

------
azuajef
A nice variety of approaches. Any plans for implementing others, e.g.,
Smith-Waterman and mutual info. similarities?

------
SNvD7vEJ
Thanks, really useful.

Any plans on including this in Apache Commons?

How does it compare to StringUtils in Apache Commons?

~~~
kinow
Apache Commons Lang has some string distance algorithms. But we started
another project in the sandbox for text/strings, as [lang] was getting a bit
overcrowded with so many things.

You can take a look at the current project here:
[https://commons.apache.org/sandbox/commons-text/](https://commons.apache.org/sandbox/commons-text/)

Source:
[https://github.com/apache/commons-text](https://github.com/apache/commons-text)

------
amelius
Slightly offtopic, I'm still looking for a good 3-way merge algorithm for
strings and structured data in Javascript.

------
haddr
What is the meaning of the dot in the big O notation there, like O(m.n)?
Shouldn't it rather be O(m*n)?

~~~
barrkel
Dot is a multiplication symbol (in mathematics, that is; one of several). See
[https://en.wikipedia.org/wiki/Interpunct](https://en.wikipedia.org/wiki/Interpunct)
; the choice between dot and cross as multiplication symbols is further taken
advantage of in vector multiplication (dot product vs cross product). Etc.

~~~
haddr
That's actually a middle dot, not a regular dot.

------
djKianoosh
the examples might show better if they were in the form of assertions

very nice variety and good readme! good job
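
Roughly what I mean by assertions, as a sketch. The `levenshtein` method below is a self-contained stand-in written for this comment, not the library's own class, and `check` is a made-up helper so the example runs without JUnit or the `-ea` flag:

```java
public class LevenshteinAssertions {

    // Textbook dynamic-programming Levenshtein distance.
    // The nested loops visit m*n cells, where m and n are the string lengths.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: fails loudly instead of printing a number to eyeball.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        check(levenshtein("kitten", "sitting") == 3, "kitten/sitting should be 3");
        check(levenshtein("flaw", "lawn") == 2, "flaw/lawn should be 2");
        check(levenshtein("", "abc") == 3, "empty/abc should be 3");
        check(levenshtein("same", "same") == 0, "identical strings should be 0");
        System.out.println("all assertions passed");
    }
}
```

That way the expected value sits right next to the call, and the readme example doubles as a test.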

