
A cool way to use natural language in JavaScript - oori
https://github.com/nlp-compromise/nlp_compromise
======
oori

      nlp.statement('She sells seashells').negate().text()
      // She doesn't sell seashells
    
      nlp.sentence('I fed the dog').replace('the [Noun]', 'the cat').text()
      // I fed the cat
    
      nlp.text("Tony Hawk did a kickflip").people();
      // [ Person { text: 'Tony Hawk' ..} ]

~~~
gcr
Thanks for writing this comment. I can't think of a better way to describe
what this library does than reading these super simple examples.

~~~
geon
Check out the docs. They are chock-full of them.

[https://github.com/nlp-compromise/nlp_compromise/blob/master...](https://github.com/nlp-compromise/nlp_compromise/blob/master/docs/api.md)

------
hegivor
Perhaps I am misunderstanding the example, but isn't the date parsing result
from the API documentation incorrect?

    
    
      nlp.value("I married April for the 2nd time on June 5th 1998 ").date()
      // [Date object]   d.toLocaleString() -> "04/2/1998"
    

[https://github.com/nlp-compromise/nlp_compromise/blob/master...](https://github.com/nlp-compromise/nlp_compromise/blob/master/docs/api.md#date-parsing)
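
For comparison, the result a reader would expect from that sentence is 5 June 1998. The following is a minimal sketch of a deliberately strict extractor that only accepts a month name followed by a day and a four-digit year. The names `MONTHS`, `DATE_PATTERN`, `findExplicitDate` and `toIsoString` are hypothetical, and this is not how the library's parser works; it only illustrates why "April" and "2nd" in the sentence should not be read as a date.

```javascript
// Month names in calendar order; index + 1 is the month number.
const MONTHS = [
  'january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december',
];

// Matches "<Month> <day>[st|nd|rd|th][,] <4-digit year>", case-insensitively.
const DATE_PATTERN = new RegExp(
  `\\b(${MONTHS.join('|')})\\s+(\\d{1,2})(?:st|nd|rd|th)?,?\\s+(\\d{4})\\b`,
  'i'
);

// Returns { year, month, day } for the first explicit date in the sentence,
// or null when no month name is directly followed by a day and a year.
function findExplicitDate(sentence) {
  const match = DATE_PATTERN.exec(sentence);
  if (!match) return null;
  return {
    year: Number(match[3]),
    month: MONTHS.indexOf(match[1].toLowerCase()) + 1,
    day: Number(match[2]),
  };
}

// Formats the parts as an ISO-8601 calendar date (yyyy-mm-dd).
function toIsoString({ year, month, day }) {
  const pad = (n) => String(n).padStart(2, '0');
  return `${year}-${pad(month)}-${pad(day)}`;
}

// "April for" and "the 2nd time" are skipped: neither forms month + day + year.
const found = findExplicitDate('I married April for the 2nd time on June 5th 1998 ');
console.log(found && toIsoString(found));
```

A pattern this strict misses many valid phrasings ("the 5th of June", "last Tuesday"), which is the trade-off a general-purpose parser has to make in the other direction.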

~~~
userbinator
No 4th or February in the string either... yet another example of how
unambiguous date parsing in natural languages is very, _very_ difficult. Also
an example of why ISO-8601 is awesome.

~~~
themartorana
Everyone always looks at me funny when I write my dates as `yyyy-mm-dd` -
until, that is, I explain that they sort perfectly in alphabetical order.
That's the moment I can turn people into ISO-8601 converts.

~~~
gengkev
I write dates that way too, but because mm/dd/yyyy is ambiguous in an
international context. Sorting correctly is great as well!
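
The sorting property mentioned in this subthread is easy to check. A minimal sketch, using made-up sample dates: a plain string sort of `yyyy-mm-dd` values gives the same order as a chronological sort, while the same dates written as `mm/dd/yyyy` do not.

```javascript
// Dates written as ISO-8601 calendar strings (yyyy-mm-dd).
const isoDates = ['2016-05-14', '1998-06-05', '2016-01-30', '1998-12-01'];

// Plain lexicographic string sort; no date parsing involved.
const sortedAsText = [...isoDates].sort();

// The same dates sorted chronologically by their numeric timestamps.
const sortedByTime = [...isoDates].sort((a, b) => Date.parse(a) - Date.parse(b));

// The same four dates as mm/dd/yyyy strings, sorted as text.
const usDates = ['05/14/2016', '06/05/1998', '01/30/2016', '12/01/1998'];
const usSortedAsText = [...usDates].sort();

console.log(sortedAsText);   // text order matches chronological order
console.log(sortedByTime);
console.log(usSortedAsText); // grouped by month, not by year
```

This works because the ISO format puts the most significant field first and pads every field to a fixed width.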

------
guptaneil
Hmm this example is interesting:

    
    
        nlp.person("Tony Hawk").pronoun();
        // 'he'
    

I was curious how it handled gender neutral names, since being able to
identify gender from a name would be an awesome UX win. I tried a variety of
different names, and for gender neutral names (ie "Alex" or "Taylor"), it
always picked "he". If it doesn't recognize the name (ie "Alexa"), it returns
"they". Unfortunately, it only recognizes very standard American names.
Anything remotely ethnic (ie "Anjali") or slightly uncommon (ie "Nate")
results in a "they".
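
One alternative to the behaviour described above is to treat gender-neutral names the same way as unknown ones and fall back to "they". A minimal sketch, assuming a tiny hand-written table: `KNOWN_NAMES` and `guessPronoun` are hypothetical names, the entries are made up for illustration, and this is not how the library implements `.pronoun()`.

```javascript
// Hypothetical lookup table, for illustration only; not the library's data.
// Names that are commonly used for more than one gender are marked 'ambiguous'.
const KNOWN_NAMES = new Map([
  ['tony', 'he'],
  ['sarah', 'she'],
  ['alex', 'ambiguous'],
  ['taylor', 'ambiguous'],
]);

// Returns 'he' or 'she' only when the first name is known and unambiguous;
// unknown and ambiguous names both fall back to the neutral 'they'.
function guessPronoun(fullName) {
  const firstName = fullName.trim().split(/\s+/)[0].toLowerCase();
  const entry = KNOWN_NAMES.get(firstName);
  return entry === 'he' || entry === 'she' ? entry : 'they';
}

console.log(guessPronoun('Tony Hawk'));
console.log(guessPronoun('Alex'));
console.log(guessPronoun('Anjali'));
```

A table like this is still only a guess about the person behind the name, so the neutral fallback is the safe default whenever the lookup is not certain.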

Not picking on the library, since this would be an impossible task even for a
human, but it seems odd to have used pronoun identification as one of the
headline examples.

Anyway, awesome library overall. This could pair well with a dedicated date
parsing library like Sherlock[1] to create some pretty cool conversational UI
elements.

1:
[https://github.com/neilgupta/sherlock](https://github.com/neilgupta/sherlock)

~~~
daleharvey
"since being able to identify gender from a name would be an awesome "

Way to misgender and insult a number of your users. As you said, it is an
impossible task, and the result is telling: "for gender neutral names (ie
"Alex" or "Taylor")".

The best way is to not care. It almost never matters, and when it does, ask.

~~~
revelation
This is a natural language library, and _languages_ still care very much about
gender.

~~~
dack
But can a function which is basically "name -> gender" really be honest with
its users? It's claiming something that would require much more information to
predict with any reasonable chance of success (even excluding LGBT folks). If
you look at the files it is based on, it looks like there are only a tiny
number of names defined anyway - what hope does the user have of accomplishing
the goal they want?

Now, I might accept an API in which machine learning was involved and tons of
historical data was included as input before predicting the value. Even then,
it would of course have a decent potential of being wrong, but at least the
result would be a more realistic claim.

The current API feels a lot like this sort of thing:

    boolean isHappySentence(String sentence) {
        return sentence.contains(":)");
    }

Sure, it might get it right some of the time, but it's not effective enough to
be useful.

~~~
komali2
Every single aspect of this API has a degree of uncertainty.

If you'd like to improve the capability of the algorithm, put in some PRs to
help out the accuracy.

------
rolfvandekrol
Why is NLP these days equivalent to Natural English Language Processing? There
are many more languages in the world.

~~~
jnky
On a related note: why are we putting up with other languages anyway? I've
been asking myself for a long time why we (as in humanity) don't try to
standardize on a language.

The language of the Internet is English for the most part, and since the
Internet is the greatest facilitator of communication in human history, that
standard language might as well be English.

I firmly believe the world would be a much better place if people from
everywhere could communicate with each other without being inhibited by the
language barrier.

I understand that this is not something that can be done overnight, but a lot
of people learn English as a second language today, so maybe a switch could be
made in a generation or two, as people who don't speak English as a first or
second language slowly die out.

~~~
coldtea
> _On a related note: why are we putting up with other languages anyway? I've
> been asking myself for a long time why we (as in humanity) don't try to
> standardize on a language._

Because we value our languages and cultures, thank you very much?

Especially if you happened to be born a native English speaker, and you
propose this from that vantage point, it comes across as borderline racist,
and reminds me of this excerpt from Bernard Shaw's Caesar and Cleopatra:

"Pardon him, Theodotus: he is a barbarian, and thinks that the customs of his
tribe and island are the laws of nature."

English the language of the internet? It is perhaps the language of YOUR
internet. The Chinese internet is as big -- and there's the Indian internet,
the Spanish speaking internet, and tons more besides.

Why not just get rid of other cultures too, and just keep anglo-saxon culture?

Imagine how much easier things could be then: not only would we have one
language, but we could all relate to the same stuff...

Oh, and about English -- its dominance in the last centuries is just a
historical accident, caused by British and then US power. Even the very term
used to describe such a phenomenon, "lingua franca", is not English. The main
language used to be Greek, then Latin, then French and English, etc. And
that's mostly in the Western part of the world (Europe, US etc) -- which
westerners tend to think is the whole world, but it's hardly 10-20% of the
global population. The "lingua franca" could again be very different in 1 or 2
centuries...

~~~
dandare
Call me a racist too (I am mocking your misuse of the term here) but I never
saw the value in having multiple languages. It is not enriching our culture in
the same way different cuisines do; you cannot easily savour a poem in a
different language. For the absolute majority of people their native language
is just a necessity, because their adult brains are mostly unable to learn a
new language at the same level as their mother tongue (unless you spend many
years in that culture). IMHO different languages are as annoying, useless and
impossible to deprecate as different calendars or metric systems. On the other
hand it is a little tragedy that English, with its ambiguous spelling and
pronunciation, became the lingua franca of the day.

~~~
scrollaway
So how many languages do you speak? And no, English + some Latin language
doesn't count as "many".

There are many ideas which are impossible to express in English because it's
not only lacking the words but the culture itself. Calling for standardization
on a language is calling for the death of cultures you do not understand.

And while that's not "racist" as it's not about race, it's quite literally
xenophobic. You're afraid of something because it's alien to you.

~~~
debaserab2
> There are many ideas which are impossible to express in english because it's
> not only lacking the words but the culture itself.

This makes me really curious: do you have an example of this or would the
explanation of an example in English lose the essence of the explanation?

~~~
coldtea
It will mostly lose it, yes (like explaining a joke).

Because it's not just that a word exists in some language X that doesn't exist
in language Y (you can always explain the word's general meaning with a
complete phrase or two).

The most important part is rather having that notion in the language X as a
handy thing to use -- and thus being able to shape concepts and phrases around
it, and sharing an instant common recognition for that notion with others.

That said, there are several very good books on the subtleties of translation
and language concepts in general, but one I suggest for the HN crowd would be
"Le Ton Beau De Marot: In Praise Of The Music Of Language" by Douglas R.
Hofstadter (of "Godel, Escher, Bach" fame).

[http://www.amazon.com/Ton-Beau-Marot-Praise-Language/dp/0465...](http://www.amazon.com/Ton-Beau-Marot-Praise-Language/dp/0465086454)

------
fiatjaf
How would I proceed to turn

"this library is great." into "all other libraries aren't great." ?

[https://tonicdev.com/5716bedc1dd0391100f67570/57372d9625c9be...](https://tonicdev.com/5716bedc1dd0391100f67570/57372d9625c9be1100f23823)

------
BinaryIdiot
This library is fantastic. I've used it in some side projects and while it's
far from perfect it hits that "good enough" space pretty well.

------
pknerd
Is there something similar available for PHP || Python?

------
gcr
This could be great for foreign language learners!

Imagine integrating this into Anki: "Please negate this sentence", "Please
turn this sentence into past tense", "What's the direct object", and so on.

Perhaps "this sentence" could refer to some interesting sentence from an
article that you read in Pocket last week, for example!

------
iamgopal
What a day it would be, when NLP is natively supported in browsers and OSes.

~~~
0x0
NSLinguisticTagger has been a part of iOS since iOS5 :)

Link:
[http://nshipster.com/nslinguistictagger/](http://nshipster.com/nslinguistictagger/)

~~~
kawera
This looks fantastic, thanks!

Do you know if there's something similar for OS X?

~~~
escap
[https://developer.apple.com/library/mac/documentation/Cocoa/...](https://developer.apple.com/library/mac/documentation/Cocoa/Reference/NSLinguisticTagger_Class/)

Available in OS X v10.7 and later

------
fizzbatter
Anyone know of any good Rust libraries in the NLP landscape? It seems quite
dead in Rust. Only partial libraries, and non-compilable libraries.

------
marknadal
Wow, this is an excellent README if I have ever seen one. And the project is
most excellent as well. Great work.

------
rattray
Can anyone explain how this works?

------
0b01
Title is misleading clickbait. Should be 'to process natural language.' or a
cool way to use NLP.

------
moioci
Somebody has to point out that the past participle of swim is swum, not swam.
(/pedantic-mode)

------
MrBra
What's cool about this that wasn't already cool before?

------
58028641
Instead of trying to train computers to parse our languages, why don't we
improve our languages so that they are unambiguous and each word has one
meaning and each meaning only has one word? If only there was one universal
language that had no exceptions.

~~~
kybernetikos
There are constructed languages like that. Lojban is supposed to aim for no
ambiguity; I believe that Ithkuil does too.

However, it's very hard to allow abstraction without ambiguity. It might be
best to think of ambiguity as a form of lossy compression, since it allows us
to skip a bunch of thinking and words to say what we want to. Supposedly the
creator of Ithkuil has to think for half an hour to get exactly the right
word.

On top of that, we also use ambiguity for humor and colour. A new humor
culture had to grow in Lojban, because typical puns and the like didn't
translate.

I've grown more interested recently in languages that embrace ambiguity, such
as toki pona. With a language like that, you can shrink the necessary
vocabulary down to a tiny number of words meaning that nonspecialists can
learn the language in a short period of time. Aiming for ease of acquisition
seems like the right approach for a constructed language that aims to be
practically useful.

------
MrBra
Wow, and why exactly did this make it onto the HN front page? Because it's
JavaScript?

~~~
MrBra
Explain the downvotes, anyone?

Any good reason this is on the front page, other than:

Google made JS faster > Google made their NLP library SyntaxNet open-source >
NLP_compromise is an NLP JS library so it has to be top news

?

How can this library even appear to be anything new?

Have you never explored the NLP field? I guess downvoters never did; I have no
other way to understand it.

~~~
cskau
From the HN guidelines:

    
    
      Please don't submit comments complaining that a submission is inappropriate for the site. If you think a story is spam or off-topic, flag it by clicking on its "flag" link. (Not all users will see this; there is a karma threshold.) If you think a comment is egregious, click on its timestamp to go to its page, then click "flag" at the top.
    

Also:

    
    
      Please resist commenting about being downvoted. It never does any good, and it makes boring reading.
    

Please read:
[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

~~~
MrBra
> Please don't submit comments complaining that a submission is inappropriate
> for the site.

I never did. I was just questioning _the popularity_ it got on HN, not whether
it is appropriate.

> Please resist commenting about being downvoted.

Ok, I apologize for that. My mistake. But hey, may I know why you think my
opinion is wrong, anyone, and specifically you @cskau, if you do? Eagerly
waiting for it, but I fear I'll receive no on-topic answers
:(

~~~
mcbits
I will speculate:

1. JavaScript is widely known and used. Thus, anything JavaScript attracts
more attention than similar projects in other languages, especially if it's
outside the box of the usual webby stuff.

2. The API on this library looks very intuitive and utilitarian. It doesn't
get much easier than .people(), .topics(), etc. People can see themselves
actually using this.

~~~
MrBra
Thanks, you confirmed my theory! :)

