

Show HN: Knwl.js - Scan through text for data that may be of interest - Bambo
https://github.com/loadfive/Knwl.js

======
lordlarm
Cool, but each of the properties that knwl.js finds has been the subject of
extensive research, and none of them is easy to do well. Sure, in the demo you
find some interesting properties, but its content was chosen by the author.

E.g. in the emotion detection, instead of using a data set such as
[http://sentiwordnet.isti.cnr.it/](http://sentiwordnet.isti.cnr.it/) the
author defines the following data structures:

    
    
        this.emotion.negativeWords = ['terrible','horrible','evil','die','dick','bitch','fucked','stupid','idiot','dumb','noob','shit','vain','n00b','dickhead','cocksucker','disgusting','slut'];
        this.emotion.negativeWordsB = ['fuck','shit','kill','rape','hate','hating'];
        this.emotion.positiveWords = ['happy','good','great','amazing','awesome','wonderful','brilliant','smart'];
        this.emotion.positiveWordsB = ['love','like','want',"<3",'kiss'];
    

This perhaps works well, but only in a very limited domain. The same can be
said for the detection of spam, phone numbers (it only finds American numbers
in a certain format atm), etc.
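
To make the "limited domain" point concrete, here is a minimal sketch of lexicon-based scoring. It is not Knwl.js's actual implementation; `scoreEmotion`, `LEXICON`, and the tiny word lists are made up for illustration.

```javascript
// Hypothetical lexicon for illustration; a real one would hold far more entries.
const LEXICON = {
  positive: new Set(["happy", "good", "great", "amazing", "love"]),
  negative: new Set(["terrible", "horrible", "stupid", "hate"]),
};

// Returns (number of positive words) minus (number of negative words).
function scoreEmotion(text, lexicon = LEXICON) {
  // Lowercase, then split on anything that is not a letter, digit, or apostrophe.
  const words = text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  let score = 0;
  for (const word of words) {
    if (lexicon.positive.has(word)) score += 1;
    if (lexicon.negative.has(word)) score -= 1;
  }
  return score;
}

console.log(scoreEmotion("I love this great library")); // 2
console.log(scoreEmotion("That talk was sick!"));       // 0: slang is not in the lexicon
```

Anything outside the lists scores as neutral, which is the gap a larger data set such as SentiWordNet is meant to fill.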

~~~
guptaneil
As with most projects that deal with parsing natural language, it works best
when optimized for your use case. For example, I created a natural language
date parsing library
[[https://github.com/Tabule/Sherlock](https://github.com/Tabule/Sherlock)]
that is great at parsing events for entering into a calendar, but would fail
if tested against an entire news article. The cool thing about knwl is that it
is open-source and the code is very clean, meaning you can easily optimize it
for your use case. Knwl lays the groundwork for somebody to build a smarter
parser into their app.

~~~
c16
+1. I fully agree here, parsing for events data, parsing a news article or
parsing tweets are all very different tasks and each one requires different
optimizations.

An interesting project regardless, I'll likely be using this in the near
future.

------
silentrob
I love the concept but also agree with some of the comments below. Text
extraction and parsing are better done with classification and training. I
spent several months working on the problem with Natural[1].

[1] -
[https://github.com/NaturalNode/natural](https://github.com/NaturalNode/natural)

~~~
samsnelling
I cannot encourage people to check out Natural enough. One of the best node
packages out there.

------
esamek
Things that didn't work:

(xxx) xxx-xxxx phone number

mm-dd-yyyy date
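
For reference, both formats can be matched with plain regular expressions. This is only a sketch, not how Knwl.js does it; `findPhonesAndDates` is a hypothetical helper, and the date pattern checks digit ranges only (it would accept 02-31-2014).

```javascript
// Hypothetical helper: pull "(xxx) xxx-xxxx" phone numbers and
// "mm-dd-yyyy" dates out of free text.
function findPhonesAndDates(text) {
  // Three digits in parentheses, optional space, then xxx-xxxx.
  const phonePattern = /\(\d{3}\)\s?\d{3}-\d{4}/g;
  // Month 01-12, day 01-31, four-digit year; no calendar validation.
  const datePattern = /\b(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-(\d{4})\b/g;
  return {
    phones: text.match(phonePattern) || [],
    dates: text.match(datePattern) || [],
  };
}

console.log(findPhonesAndDates("Call (555) 123-4567 before 12-25-2014."));
// { phones: [ '(555) 123-4567' ], dates: [ '12-25-2014' ] }
```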

------
jbrooksuk
Clever, but not quite good enough for us to parse our notes with. If you type
a time such as "14:30PM" it can't detect it, but if you put a space before the
"PM" then it works fine. Also, if the time is in 24-hour format, then you don't
need the "AM/PM", but it can't detect this.

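A rough sketch of a pattern that would cover both cases above: the space before "AM/PM" is optional, and so is the "AM/PM" itself. `findTimes` is a hypothetical helper, not part of Knwl.js, and it reports what it sees without checking whether "14:30PM" makes sense.

```javascript
// Hypothetical helper: find clock times such as "14:30PM", "14:30 PM" or "9:05".
function findTimes(text) {
  // Hours 0-23, minutes 00-59, then an optional AM/PM with or without a
  // leading space. (?!\d) rejects inputs like "14:305".
  const pattern = /\b([01]?\d|2[0-3]):([0-5]\d)(?:\s*([ap]m)\b)?(?!\d)/gi;
  const results = [];
  for (const match of text.matchAll(pattern)) {
    results.push({
      hour: Number(match[1]),
      minute: Number(match[2]),
      meridiem: match[3] ? match[3].toUpperCase() : null, // null for 24-hour times
      index: match.index,
    });
  }
  return results;
}

console.log(findTimes("Meet at 14:30PM or 14:30 PM, maybe 9:05 tomorrow"));
```
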
------
97-109-107
Has anyone heard of solutions that just guess whether something looks like a
timestamp within a longer string (e.g. generic logs, chats, nothing formalized
like mail headers)?

------
yconst
A little bit more info about the internals of the recognition engine would be
welcome. It seems that one needs to manually supply sets of keywords and key
phrases?

------
michaelmcmillan
The execution could have been better, as others have mentioned before me.
Hopefully others will contribute to make it more intelligent. Nevertheless:
Great idea!

------
dlsym
Yeah. If you could get me the indices of the extracted data, too.

That would be great.
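
A sketch of what that could look like with plain regex matching, where each hit carries its start and end offsets. `findWithIndices` is a hypothetical name, not an existing Knwl.js function.

```javascript
// Hypothetical helper: return each match together with its position in the text.
function findWithIndices(text, pattern) {
  // matchAll requires a global regex, so add the "g" flag if it is missing.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const globalPattern = new RegExp(pattern.source, flags);
  const results = [];
  for (const match of text.matchAll(globalPattern)) {
    results.push({
      value: match[0],
      start: match.index,                 // offset of the first character
      end: match.index + match[0].length, // offset just past the last character
    });
  }
  return results;
}

console.log(findWithIndices("Released in 2014, updated 2015", /\b\d{4}\b/));
// [ { value: '2014', start: 12, end: 16 }, { value: '2015', start: 26, end: 30 } ]
```

With `start` and `end` available, the caller can highlight or replace the exact span with `text.slice(start, end)`.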

------
kine
Very cool. Needs some work but a very good start

------
imdsm
Would be nice to see it take a URL.

