

Ask HN: How do I start with implementing a probabilistic parser? - sunil68

I got the inspiration for this from wit.ai's probabilistic parser, which is available on GitHub as wit-ai/duckling[1].

It parses natural language containing date/time components into a structured timestamp object.

For example, the phrase "this christmas morning" gets transformed into

"From Friday, 25 December 2015 at 4:00:00 +0000 (UTC) to Friday, 25 December 2015 at 12:00:00 +0000 (UTC)"

I have a similar idea in mind and want to try it out. It would do the same kind of thing: parse a subset of natural-language syntax into structured data objects.

Where should I start? Any good resources will work for me: blogs, books, papers, etc.

[1] https://github.com/wit-ai/duckling
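To make the target concrete: the structured result is just an interval computed relative to a reference time. Below is a minimal Ruby sketch that hard-codes this one phrase. It assumes "this christmas" means the next 25 December that hasn't ended yet, and it copies the 04:00 to 12:00 UTC "morning" window from the output quoted above. `resolve_this_christmas_morning` is a hypothetical name, not Duckling's API; a real parser would build the same interval by composing separate rules for "this", "christmas" and "morning".

```ruby
require 'time'

# Hypothetical resolver for the single phrase "this christmas morning".
# The 04:00-12:00 UTC "morning" window is copied from the Duckling output
# quoted above; a real parser would get it from a grammar rule instead.
MORNING_START_HOUR = 4
MORNING_END_HOUR = 12

def resolve_this_christmas_morning(now)
  year = now.year
  # Assumption: "this christmas" means the next 25 December that has not
  # ended yet, so once 25 December is over we roll to the following year.
  year += 1 if now >= Time.utc(year, 12, 26)
  {
    from: Time.utc(year, 12, 25, MORNING_START_HOUR),
    to: Time.utc(year, 12, 25, MORNING_END_HOUR)
  }
end

interval = resolve_this_christmas_morning(Time.utc(2015, 11, 3))
puts "From #{interval[:from].iso8601} to #{interval[:to].iso8601}"
# prints "From 2015-12-25T04:00:00Z to 2015-12-25T12:00:00Z"
```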
======
selbyk
Bayesian theory would probably be a good place to start.

[http://en.m.wikipedia.org/wiki/Bayesian_inference](http://en.m.wikipedia.org/wiki/Bayesian_inference)
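As a small worked example of Bayes' rule in this setting: a date parser has to decide whether an ambiguous token like "may" is a month or a verb. The sketch below is a toy naive Bayes classifier with add-one (Laplace) smoothing that scores both labels from the neighbouring words. The class name, the `prev:`/`next:` feature format and the training examples are all made up for illustration.

```ruby
# Toy naive Bayes classifier with add-one (Laplace) smoothing. All names and
# training data are made up for illustration.
class NaiveBayes
  def initialize
    @class_counts = Hash.new(0)
    @feature_counts = Hash.new { |h, label| h[label] = Hash.new(0) }
    @vocabulary = {} # every feature seen in training (hash used as a set)
  end

  def train(features, label)
    @class_counts[label] += 1
    features.each do |f|
      @feature_counts[label][f] += 1
      @vocabulary[f] = true
    end
  end

  # Bayes' rule: P(label | features) is proportional to
  # P(label) * product of P(feature | label), normalised over all labels.
  def posteriors(features)
    logs = @class_counts.keys.map { |label| [label, log_score(label, features)] }.to_h
    max = logs.values.max # subtract the max before exp() to avoid underflow
    exps = logs.transform_values { |s| Math.exp(s - max) }
    total = exps.values.sum
    exps.transform_values { |e| e / total }
  end

  private

  def log_score(label, features)
    counts = @feature_counts[label]
    denominator = counts.values.sum + @vocabulary.size
    prior = @class_counts[label].to_f / @class_counts.values.sum
    features.sum(Math.log(prior)) { |f| Math.log((counts[f] + 1.0) / denominator) }
  end
end

nb = NaiveBayes.new
# Features are the words immediately before and after "may".
nb.train(%w[prev:on next:27th], :month)
nb.train(%w[prev:in next:2015], :month)
nb.train(%w[prev:on next:1st], :month)
nb.train(%w[prev:you next:need], :verb)
nb.train(%w[prev:it next:rain], :verb)
nb.train(%w[prev:we next:need], :verb)

p nb.posteriors(%w[prev:on next:5th]) # month about 0.75, verb about 0.25
```

With equal priors, the unseen `next:5th` feature contributes the same factor to both labels, so the result is decided by "on" having appeared twice before a month and never before a verb.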

Also, word2vec, an open-source Google project written in C, comes with some
interesting shell scripts that detect common phrases surprisingly well given
how simple the code seems to be.
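The phrase detection rests on a simple bigram score. The word2vec phrase paper describes it as score(a, b) = (count(a b) - delta) / (count(a) * count(b)), where delta discounts pairs that are too rare to trust; pairs above a threshold are merged into one token. The following is a minimal Ruby sketch of that idea, not a port of the C tool, which differs in details such as scaling and default thresholds. `find_phrases`, `merge_phrases`, the toy corpus, and the `delta`/`threshold` values are illustrative assumptions.

```ruby
require 'set'

# Score adjacent word pairs as in the word2vec phrase paper:
#   score(a, b) = (count(a b) - delta) / (count(a) * count(b))
# delta discounts rare pairs; delta and threshold are illustration values.
def find_phrases(tokens, delta: 1, threshold: 0.05)
  unigrams = tokens.tally
  bigrams = tokens.each_cons(2).to_a.tally
  bigrams.select { |(a, b), count|
    (count - delta).to_f / (unigrams[a] * unigrams[b]) > threshold
  }.keys
end

# Rewrite the token stream, joining each detected pair into one token.
def merge_phrases(tokens, phrases)
  pairs = Set.new(phrases)
  merged = []
  i = 0
  while i < tokens.size
    if i + 1 < tokens.size && pairs.include?([tokens[i], tokens[i + 1]])
      merged << "#{tokens[i]}_#{tokens[i + 1]}"
      i += 2
    else
      merged << tokens[i]
      i += 1
    end
  end
  merged
end

tokens = "new york is big . i love new york . the city is old . new ideas are good .".split
phrases = find_phrases(tokens) # => [["new", "york"]]
puts merge_phrases(tokens, phrases).join(" ")
# prints "new_york is big . i love new_york . the city is old . new ideas are good ."
```

Every pair seen only once scores zero after the discount, so only "new york" survives; "new ideas" stays two tokens.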

------
arh68
I don't understand. The project you linked is open source. Can't you just
start with their code, and fork?

~~~
sunil68
First off, that's written in Clojure.

Even if we agree that the language it's written in doesn't matter, I'd still
have to learn one extra thing: Clojure. And the language choice does matter.

I think I'll develop it in a language of my choice, so I'm asking about the
core mechanism of the parser.
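As a rough picture of that core mechanism, one common design (a sketch under assumptions, not necessarily how Duckling works internally) separates two steps: rules propose every reading they can, and a scoring model picks among them. Below, hypothetical regex rules for hours keep both readings of an ambiguous "at 3" and rank them with fixed weights. The rule names and weights are invented; they stand in for probabilities a real system would estimate from annotated examples.

```ruby
# Hypothetical two-stage sketch: every matching rule emits a candidate
# (ambiguity is kept, not resolved early), then candidates are ranked.
# Rule names and weights are invented; a probabilistic parser would learn
# these scores from annotated examples instead of hard-coding them.
Candidate = Struct.new(:rule, :hour, :score)

RULES = [
  # "at 3" is ambiguous: it could mean 03:00 ...
  { name: :at_hour_am, pattern: /\bat (\d{1,2})\b/, weight: 0.3,
    build: ->(m) { m[1].to_i % 12 } },
  # ... or 15:00, which this sketch assumes is the likelier reading.
  { name: :at_hour_pm, pattern: /\bat (\d{1,2})\b/, weight: 0.7,
    build: ->(m) { m[1].to_i % 12 + 12 } },
  # An explicit "pm" is strong evidence, so it gets the highest weight.
  { name: :hour_pm, pattern: /\b(\d{1,2}) ?pm\b/, weight: 0.95,
    build: ->(m) { m[1].to_i % 12 + 12 } }
].freeze

# Stage 1: collect every rule's reading. Stage 2: sort by score, best first.
def parse(text)
  RULES.filter_map { |rule|
    m = rule[:pattern].match(text)
    Candidate.new(rule[:name], rule[:build].call(m), rule[:weight]) if m
  }.sort_by { |c| -c.score }
end

parse("meet at 3").each { |c| puts "#{c.rule}: #{c.hour}:00 (#{c.score})" }
# at_hour_pm: 15:00 (0.7)
# at_hour_am: 3:00 (0.3)
```

Keeping all candidates until the ranking step is the part a purely rule-based parser skips when it commits to the first match.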

~~~
arh68
Ok, I understand better now. I don't blame you for not learning Clojure for
one project. Duckling points to similar projects, for example this Ruby
project they list, Chronic:

    require 'chronic'

    # :guess => false returns the whole span instead of a single point in time
    Chronic.parse('may 27th', :guess => false)
    #=> Sun May 27 00:00:00 PDT 2007..Mon May 28 00:00:00 PDT 2007

~~~
sunil68
Yeah! Since people haven't responded on the thread, I'll have to dig around
in Chronic.

By the way, Chronic is not a probabilistic parser, strictly speaking.

------
selbyk
What are your language preferences? I've been wanting to do a similar project
for a while now, but have been hesitant because of how daunting it seems.

~~~
sunil68
I'm thinking about implementing in either CoffeeScript or Python.

