
NLUlite: Natural language parser and database
https://nlulite.com
======
syllogism
Let me know if you're interested in licensing better syntactic/semantic
parsing technologies.

I'm the developer of the Redshift NLP library (
[http://github.com/syllog1sm/redshift](http://github.com/syllog1sm/redshift)
). Currently the software lacks documentation, but it offers a good
speed/accuracy trade-off. Documentation and a good tokenizer are coming. You
can read a tutorial for a simplified version of the algorithm on my blog:
[http://honnibal.wordpress.com](http://honnibal.wordpress.com)

------
unsane1
Installer crapped out for me, giving a gzip error. Also, I'm not wild about
self-extracting and executing archives. Would you mind perhaps just posting a
.tgz that can be examined before installing?

~~~
NLUlite
Hi, can you post the error?

------
garblegarble
This appearing at the same time as a post about designing a personal
knowledgebase (
[https://news.ycombinator.com/item?id=8270759](https://news.ycombinator.com/item?id=8270759)
) makes me wonder how reasonable it is to link this sort of natural language
fact extraction and querying system to such a knowledgebase.

I suspect it might be expecting too much, but I'd love to integrate my browser
history, the people I've contacted, where I've been, etc. in order to produce
an easier way to search for and find a webpage (e.g. when I remember some
contextual information about the day or the place where I visited a page but
am unable to find it by re-googling).

------
NLUlite
Thank you all for your comments :-)

@unsane: This is of course not supposed to happen (we tested it on many
different machines). Apologies. Please write to contact@nlulite.com and we'll
look into the problem.

@garblegarble: The system is still under development. Please write to
contact@nlulite.com to suggest features you would like to see.

@Syllogism: 93.6% accuracy is impressive. At this stage, however, we prefer
to use proprietary algorithms. We feel we can reach similar accuracy for
version 0.2.0 (out in January).

@CGamesPlay: The server is supposed to be installed in the $HOME directory. If
you wish to use a different path, you can use the option -d <YOUR_NEW_PATH>
when starting the server.

@Rhapso: You are right, the non-commercial download is somewhat byzantine. The
problem with wget is that you don't get to sign a non-commercial agreement.
Let us think about it for a few nights.

@toblender: We are working on that ;-)

~~~
Rhapso
You can refer to the statement in comments at the top (hell, put the whole
thing in the script, print it out when run, and make the user type yes)

~~~
NLUlite
Thank you for your suggestion. We are going to implement the wget option next
week.

------
markburns
Whenever I see "Natural Language Parser" mentioned anywhere I get excited then
a little disappointed because it implies something much more profound.

Not to belittle the tremendous effort, but most projects I have seen are
"English Language Parser"s.

Are there any actual generic language parsing projects out there?

Projects that don't try to overfit to English, but actually attempt to do a
job of whatever quality in whatever language?

Like I'm a native English speaker, I can understand English say 100%, Japanese
80-90%, I can understand a bit of a few European languages and I can identify
a bunch of other languages.

It would be wonderful if there were software with this design in mind.

~~~
avmich
> Are there any actual generic language parsing projects out there?

Chalmers University has impressive results on this -
[http://www.grammaticalframework.org/](http://www.grammaticalframework.org/)

------
hnriot
For some URLs the client throws an exception, for example:

[http://en.wikipedia.org/wiki/Horse](http://en.wikipedia.org/wiki/Horse) (I
don't like snakes)

    
    
      File "/home/drace/dev/NLUlite/client_python/NLUlite.py", line 375, in add_url
        parser.feed(page)
      File "/usr/lib/python2.7/HTMLParser.py", line 114, in feed
        self.goahead(0)
      File "/usr/lib/python2.7/HTMLParser.py", line 158, in goahead
        k = self.parse_starttag(i)
      File "/usr/lib/python2.7/HTMLParser.py", line 305, in parse_starttag
        attrvalue = self.unescape(attrvalue)
      File "/usr/lib/python2.7/HTMLParser.py", line 472, in unescape
        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
      File "/usr/lib/python2.7/re.py", line 151, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 8:
    ordinal not in range(128)
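For what it's worth, the usual fix for this class of Python 2 error is to decode the fetched bytes to text once, up front, instead of feeding byte strings to the parser; a minimal sketch (shown with Python 3's html.parser, since the failing HTMLParser module is Python 2 only):

```python
# Sketch of the common fix: decode the raw page bytes before feed().
# In the Python 2 traceback above, unescape() mixes byte strings with
# unicode, which triggers the ascii-codec UnicodeDecodeError on any
# non-ASCII byte such as 0xe2 (first byte of a UTF-8 en dash).
from html.parser import HTMLParser

raw = b'<p title="caf\xc3\xa9 \xe2\x80\x93 horses">hi</p>'  # UTF-8 bytes off the wire
page = raw.decode('utf-8')   # decode once, up front
parser = HTMLParser()
parser.feed(page)            # feed text, never bytes
```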

It's also really slow at learning. I have a ton of everything (cores, memory,
etc.) and it takes minutes to process web pages. I guess you do say on the
website that the free version is slow.

~~~
NLUlite
By the way, the commercial version's parser scales almost linearly with the
number of (independent) threads. The Wisdom.ask() method is also faster with
the multithreaded version.

------
Rhapso
Please set up a better way to distribute the non-commercial version. If you
insist on using the self-extracting archive, please make it accessible via
wget. If this works half as well as it claims, I am willing to pay you for a
single commercial license for personal use.

------
CGamesPlay
When I start the server and attempt to instantiate a ServerProxy, I get
"Connection refused". The server produces no output. Running Ubuntu 14.04.

[append] Turns out the server will silently do nothing if you do not extract
the archive to $HOME.

~~~
hnriot
You just have to use -d to specify where the data files are; the server can be
anywhere. I moved it out of $HOME to my dev environment without problems.

------
toblender
Unfortunately it's only for Linux at this time.

~~~
hnriot
It's really simple to get an x64 Linux VM up and running; VirtualBox takes
just a few minutes to spin up. I think Linux makes the most sense: it doesn't
really make sense for the developer to waste cycles on porting when people can
just use a VM to run it very easily.

------
praveenster
How does this compare to NLTK, OpenNLP or Mahout?

~~~
NLUlite
You need to try all four :-) More seriously: NLUlite is supposed to work "out
of the box", without any need for additional training datasets for the parser
(these datasets can often be quite expensive). Mahout is a different type of
machine learning, as it does not look into the grammar of sentences.

------
gtani
Can you dump out the index and examine whether the most important noun phrases
and named entities are being extracted, à la Lucene?

~~~
NLUlite
Exactly as hnriot said: the saved Wisdoms are in plain XML, where the
sentences are represented in DRT
([http://bit.ly/1AhuVas](http://bit.ly/1AhuVas)).

