
Open Sourcing our new Duckling – Probabilistic Parser Rewritten in Haskell - jimarcey
https://wit.ai/blog/2017/05/01/new-duckling
======
ghughes
> Duckling is now used at scale internally by Facebook.

I'm thrilled to hear that Facebook has a new toy for extracting structured
data from vast quantities of text.

~~~
saurik
:(

------
nl
_Duckling, our open-sourced probabilistic parser to detect entities like dates
and times, numbers, and durations_

Are there any benchmarks comparing this with something like HeidelTime[1] or
SUTime[2]?

[1] [https://github.com/HeidelTime](https://github.com/HeidelTime)

[2] [https://nlp.stanford.edu/software/sutime.html](https://nlp.stanford.edu/software/sutime.html)

~~~
blandinw
(disclaimer: I work on Duckling)

I don't know of any benchmarks. HeidelTime and SUTime are two solid
projects... although the rule files [0][1] are a bit scary if you ask me :-).

Quick thoughts:

\- SUTime relies on TokensRegex [2], which is similar to how Duckling parses
sentences at a high level

\- SUTime seems to only provide English rules

\- I don't know of any production use cases of either

[0] [https://github.com/stanfordnlp/CoreNLP/tree/master/src/edu/s...](https://github.com/stanfordnlp/CoreNLP/tree/master/src/edu/stanford/nlp/time)

[1] [https://github.com/HeidelTime/heideltime/blob/master/resourc...](https://github.com/HeidelTime/heideltime/blob/master/resources/english/rules/resources_rules_daterules.txt)

[2] [https://nlp.stanford.edu/pubs/tokensregex-tr-2014.pdf](https://nlp.stanford.edu/pubs/tokensregex-tr-2014.pdf)

~~~
nl
Yes, the rules are horrible.

Yes, SUTime only has English rules.

_I don't know of any production use cases of either_

Hi... We're using SUTime for English and HeidelTime for non-English text. Not
at FB scale, but running against millions of messages per day.

------
patapizza
Hey, main developer here. Happy to answer any questions folks might have!

~~~
flaie
I'm interested to know why you dropped Clojure. How was Clojure not meeting
your needs?

~~~
Ixiaus
Not the OP nor on their team, but I do work at a company using Haskell for
gRPC services, internal CLI tools, cloud web services, and heavy-duty (and
high-performance) parallel parsing.

I'm a polyglot programmer and appreciate the features of many different
languages, but I couldn't imagine forgoing the benefits of Haskell's type
system, reasoning about code algebraically, and leveraging the rock-solid GHC
RTS for some of the mission-critical things we're doing. Rust would probably
be my only other consideration for performance reasons, but its ecosystem is
less mature than Haskell's, and some things that are very clear in Haskell are
very clunky to express in Rust (even with its enlightened type system).

Haskell may not be the right choice for every team or project but it certainly
has been for me for many years now on many (though not all) projects.

------
rodionos
Does anyone know a probabilistic CSV parser that can map fields to a
domain-specific structure?

------
ganfortran
So something written in Haskell is news now?

