
Show HN: Duckling – Open-source datetime expression parsing - blandinw
http://duckling-lib.org/
======
muxxa
> Intervals (like June 12-13)

I wrote a Python library to do things like
    
    
        >>> r = Recurrence('June-July 2014').intersect(Recurrence('Monday to Friday'))
        >>> datetime.date(2014, 6, 13) in r
        True
        >>> datetime.date(2014, 6, 14) in r
        False
    

If anyone is interested in hacking on it with me, send me an email and I'll
try to get it up on GitHub.
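muxxa's library isn't public, so how `Recurrence` works internally is unknown. Here is a minimal sketch, assuming a recurrence can be modelled as a membership predicate over dates, so that `intersect` simply ANDs two predicates. The class is hypothetical and skips natural-language parsing entirely: the two recurrences are built by hand from lambdas instead of from strings like `'June-July 2014'`.

```python
import datetime
from typing import Callable


class Recurrence:
    """Hypothetical sketch: a recurrence is a predicate over dates."""

    def __init__(self, contains: Callable[[datetime.date], bool]):
        self._contains = contains

    def __contains__(self, day: datetime.date) -> bool:
        return self._contains(day)

    def intersect(self, other: "Recurrence") -> "Recurrence":
        # A date is in the intersection only if both recurrences contain it.
        return Recurrence(lambda d: d in self and d in other)


# Hand-built stand-ins for the parsed phrases (no string parsing here).
june_july_2014 = Recurrence(lambda d: d.year == 2014 and d.month in (6, 7))
monday_to_friday = Recurrence(lambda d: d.weekday() < 5)  # Monday=0 .. Friday=4

r = june_july_2014.intersect(monday_to_friday)
print(datetime.date(2014, 6, 13) in r)  # True (a Friday in June)
print(datetime.date(2014, 6, 14) in r)  # False (a Saturday)
```

Predicates make intersection trivial, but they can't enumerate occurrences without scanning dates one by one; a library that stores rules instead could do both.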

~~~
crdoconnor
That sounds really useful.

------
aembleton
I'm impressed. I thought I'd try it with `next wednesday at 20 to 3 in the
afternoon`. It understood!

It didn't understand if I used `afternon`. Perhaps as an improvement it could
try selecting the most likely word from misspellings.
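One way to try the misspelling idea is to snap each unknown token to the closest word in a keyword list before parsing. The sketch below uses the standard library's `difflib.get_close_matches`, which is a swapped-in technique and not something Duckling is known to do. The `VOCABULARY` list and the 0.8 cutoff are illustrative assumptions.

```python
import difflib

# Hypothetical keyword list; a real parser would use its own vocabulary.
VOCABULARY = ["morning", "afternoon", "evening", "tonight", "tomorrow",
              "yesterday", "wednesday", "next", "last"]


def correct_tokens(text: str, cutoff: float = 0.8) -> str:
    """Replace each unknown token with its closest vocabulary word, if any."""
    corrected = []
    for token in text.lower().split():
        if token in VOCABULARY:
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


print(correct_tokens("next wednesday at 20 to 3 in the afternon"))
# next wednesday at 20 to 3 in the afternoon
```

A high cutoff keeps short tokens like "to" or "at" from being rewritten into keywords; a real parser would still need a way to leave legitimately unknown words alone.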

~~~
chc
I don't even understand that. What time period is that meant to specify?

~~~
radnor
20 to 3 in the afternoon = 2:40 PM
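The arithmetic behind it: put the hour on a 24-hour clock (3 in the afternoon is 15:00) and subtract the minutes. A small sketch follows. The function name and the `afternoon` flag are hypothetical stand-ins for whatever a parser extracted from the phrase.

```python
import datetime


def minutes_to_hour(minutes: int, hour: int, afternoon: bool) -> datetime.time:
    """Resolve "<minutes> to <hour>" (e.g. "20 to 3") to a clock time."""
    hour24 = hour % 12 + (12 if afternoon else 0)
    # "20 to 3" means 20 minutes before 3:00, so subtract from the full hour.
    total = (hour24 * 60 - minutes) % (24 * 60)  # wraps "10 to 12" (midnight) to 23:50
    return datetime.time(total // 60, total % 60)


print(minutes_to_hour(20, 3, afternoon=True))  # 14:40:00
```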

~~~
infogulch
Oooh, "twenty minutes _until_ three". I typically use _until_ or shorten it to
_'til_. But I've seen/heard _to_ as well.

~~~
samdb
'To' is the most common version in British English.

------
aleem
One suggestion from a use-case standpoint.

On iOS, "tomorrow at 6" is parsed as 6PM instead of 6AM. This makes sense
because people usually mean 6PM. This is context-dependent: in chat
logs etc. this is desirable.

Semantically speaking, the Duckling library does the right thing by parsing it
at 6AM, but if the goal is ultimately to parse human expressions, then the iOS
approach is probably better.
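The two behaviours differ only in how a bare hour from 1 to 11 is resolved. A minimal sketch, assuming a caller-supplied `prefer_pm` flag (hypothetical, not a Duckling option) that an application could turn on for chat-like contexts. In this sketch 12 stays noon, and 0 or 13-23 are already unambiguous.

```python
import datetime


def resolve_bare_hour(hour: int, day: datetime.date,
                      prefer_pm: bool) -> datetime.datetime:
    """Resolve an hour given without AM/PM, e.g. the 6 in "tomorrow at 6"."""
    if not 0 <= hour <= 23:
        raise ValueError(f"not an hour: {hour}")
    if prefer_pm and 1 <= hour <= 11:
        hour += 12  # chat-style reading: 6 -> 18:00
    return datetime.datetime.combine(day, datetime.time(hour))


tomorrow = datetime.date(2014, 10, 3)
print(resolve_bare_hour(6, tomorrow, prefer_pm=False))  # 2014-10-03 06:00:00
print(resolve_bare_hour(6, tomorrow, prefer_pm=True))   # 2014-10-03 18:00:00
```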

EDIT:

Another issue I ran into is that it correctly parses:

"tomorrow evening at 6"

but fails with:

"see you tomorrow evening at 6"

It would be nice to pass it the entire sentence since that's how most people
will intend to use it.

~~~
infogulch
I think the best way to handle ambiguities like this is to present the user
with an unambiguous result that highlights the choice that it made, but then
to also provide a list of possible ambiguities to the application itself so it
knows what the most likely corrections would be.

From the Limitations section:

> ... we only display the closest upcoming time, if any, or the closest past
> time otherwise. It can result in surprising outcomes, like “one year after
> Christmas” will be actually analyzed as “one year after last Christmas”

So this could be the interaction:

> User: "one year after Christmas"

> Computer: "OK, one year after _last_ Christmas" // putting emphasis on what
> could be ambiguous

> User: "no, after _next_ Christmas" // the application expected that next vs
> last could be ambiguous, so this is understood correctly

~~~
ar7hur
Absolutely. We are working on something called "assumptions" where Duckling
informs you about what assumptions it made to produce the result (like: time
was ambiguous, I chose PM), and then you can change these and get a new
result. Coming soon.
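The "assumptions" feature is only announced here, so its actual shape is unknown. The following is a hypothetical sketch of the idea using infogulch's Christmas example. The resolver returns its value together with the choice it made and the alternatives, and the caller re-runs it with an override. `Assumption`, `ParseResult` and `one_year_after_christmas` are invented names, and the "last" default only mirrors the behaviour quoted from the Limitations section.

```python
from __future__ import annotations

import datetime
from dataclasses import dataclass, field


@dataclass
class Assumption:
    name: str                # which ambiguity was resolved
    chosen: str              # the reading the resolver picked
    alternatives: list[str]  # readings the user could switch to


@dataclass
class ParseResult:
    value: datetime.date
    assumptions: list[Assumption] = field(default_factory=list)


def one_year_after_christmas(today: datetime.date,
                             overrides: dict[str, str] | None = None) -> ParseResult:
    """Hypothetical resolver for "one year after Christmas" that reports its guess."""
    overrides = overrides or {}
    which = overrides.get("christmas", "last")  # default mirrors the quoted behaviour
    christmas = datetime.date(today.year, 12, 25)
    if which == "next" and christmas < today:
        christmas = christmas.replace(year=today.year + 1)
    elif which == "last" and christmas >= today:
        christmas = christmas.replace(year=today.year - 1)
    other = "next" if which == "last" else "last"
    return ParseResult(christmas.replace(year=christmas.year + 1),
                       [Assumption("christmas", which, [other])])


today = datetime.date(2014, 10, 2)
guess = one_year_after_christmas(today)
print(guess.value, guess.assumptions[0].chosen)  # 2014-12-25 last
fixed = one_year_after_christmas(today, {"christmas": "next"})  # "no, after next Christmas"
print(fixed.value)                               # 2015-12-25
```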

------
hardwaresofton
This project looks great. I think it's a great example of finding one thing to
do well, and doing it well (though of course, there are the other competitors
you will have to catch up to, like SUTime, etc).

I also like that this project was attempted by the layman (no offense
intended). I feel that a lot of academic projects have this "if you haven't
been studying ngrams for 20 years don't bother" feel to them, so people don't
try to understand them deeply and instead just handwave with "somebody smart
thought of this". That kind of thinking reduces new thought in a given field.

Will be using the library in my personal projects for sure, extra points for
using Clojure (in my book), as I've been recently learning about it and
getting into it.

------
teh_klev
Unfortunately it doesn't parse a Scottish colloquial expression such as "the
next again day at 4pm", which means in two days' time at 4pm :)

~~~
ar7hur
I'm very excited about supporting Scottish colloquial expressions! Working on
it now, thanks :)

~~~
AbsoluteDestiny
Colloquialisms will get tricky when they conflict. For example, "3pm next
friday" is deciphered by Duckling as Friday, 3 October 2014 at 15:00:00 +0000
(UTC), which would not be correct colloquially where I grew up, where
"next" means not this one but the one after or, more generally, "the one next
week, not the one this week".

This is probably not the case everywhere, which is why Duckling uses next and
this interchangeably.
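The two readings of "next friday" can be written down explicitly. The sketch below assumes weeks run Monday to Sunday, which is itself an assumption because some locales start the week on Sunday. It uses Thursday, 2 October 2014 as the reference date, since that is the date the demo outputs in this thread point to.

```python
import datetime

FRIDAY = 4  # date.weekday(): Monday=0 ... Sunday=6


def closest_upcoming(today: datetime.date, weekday: int) -> datetime.date:
    """'next friday' as the first Friday strictly after today."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + datetime.timedelta(days=days_ahead)


def in_following_week(today: datetime.date, weekday: int) -> datetime.date:
    """'next friday' as the Friday of next week (weeks assumed to start Monday)."""
    next_monday = today + datetime.timedelta(days=7 - today.weekday())
    return next_monday + datetime.timedelta(days=weekday)


today = datetime.date(2014, 10, 2)  # a Thursday
print(closest_upcoming(today, FRIDAY))   # 2014-10-03
print(in_following_week(today, FRIDAY))  # 2014-10-10
```

Early in the week the two readings diverge further: from Monday 6 October they give 10 October and 17 October respectively.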

------
peterwwillis
Perl modules to implement this would be Date::Manip and
DateTime::Format::Natural. To see some funnier modules (like calculating
Discordian dates or Japanese eras) look here:
[http://www.perl.com/pub/2003/03/13/datetime.html](http://www.perl.com/pub/2003/03/13/datetime.html)

I actually bumped up against legacy time/date issues while working on SSL cert
parsing. An old Perl interpreter's 32-bit limits kept resetting my dates!
Rather than upgrade Perl or my architecture, I wrote my own Perl methods to
calculate infinite time (sorta?) on 32-bit systems with old Perls.

For those that haven't worked with date parsing before: timezones are
surprisingly complex, leap years are stupid, daylight savings is _really_
stupid, and leap seconds are impossible without a regularly updated leap
second database (similar to timezones, but worse). (The math to calculate
dates correctly is rather simple, but you need to be pretty good at math to
optimize it) [https://github.com/psypete/public-bin/blob/public-bin/src/ne...](https://github.com/psypete/public-bin/blob/public-bin/src/networking/check-ssl/PortableTime.pm)

------
ominous
Nice tool!

Feedback: I shared it with a friend and his reaction was "bah, it doesn't even
work with the example suggested".

Meaning, he saw the placeholder and pressed enter.

~~~
Arnavion
Yes, the placeholder is misleading in that it looks like it's been entered
into the textbox already and I only need to click "Try me!", but actually I
need to type it out first.

It could at least detect that the input is empty instead of saying it failed
to parse the input!

~~~
ar7hur
You're right, fixing it now.

------
jblz
Really cool project. Thanks for sharing the source.

As an aside, I noticed it was renamed from "Picsou" ([https://github.com/wit-ai/duckling/commit/0d9f666ae4da114803...](https://github.com/wit-ai/duckling/commit/0d9f666ae4da1148031e65b291d09f96a4b96073))

Were you worried about getting scrooged by Disney? :)

~~~
ar7hur
Ha! I was waiting for somebody to ask :)

The original name was Picsou (Uncle Scrooge's name in French) because the
parsing strategy is super greedy. We liked the name, but when we decided to
open source it we thought it may be hard to pronounce, so we switched to
Duckling (keeping the duck link...).

------
pit
If you're having a big party

    
    
       tomorrow at three thirty people are coming over
    

you may be a half hour late.

~~~
Ygg2
As a human, without a colon that thing is ambiguous.

It can be:

    
    
        tomorrow at three thirty, people are coming over
    

or:

        tomorrow at three, thirty people are coming over

~~~
pit
Totally. I guess that was my point, that it's completely ambiguous, so I was
curious to see which interpretation the algorithm picked.

------
Thrymr
Looking forward to time zones support in something like this. Parsing phrases
like "4 o'clock tomorrow my time" or "8am on the East Coast" would be useful.
Few libraries that try to guess time zones do this well (they just assume you
mean whatever your device is localized to at best).
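Resolving such phrases mostly comes down to mapping them onto IANA time zones and letting a tz database handle daylight saving. The sketch below uses Python's `zoneinfo`, which needs Python 3.9+ plus the system tz database or the `tzdata` package. The phrase table is a hypothetical stand-in for the curated mapping a parser would need, and assigning a single zone to "East Coast" is itself an assumption.

```python
import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Hypothetical phrase table; one zone per phrase is a simplification.
ZONE_PHRASES = {
    "on the east coast": ZoneInfo("America/New_York"),
    "german time": ZoneInfo("Europe/Berlin"),
}


def to_utc(day: datetime.date, hour: int, minute: int, phrase: str) -> datetime.datetime:
    """Interpret a wall-clock time in the zone named by `phrase`; return it in UTC."""
    local = datetime.datetime(day.year, day.month, day.day, hour, minute,
                              tzinfo=ZONE_PHRASES[phrase])
    return local.astimezone(datetime.timezone.utc)


# "8am on the East Coast" on 2 October 2014 (US daylight time, UTC-4)
print(to_utc(datetime.date(2014, 10, 2), 8, 0, "on the east coast"))  # 2014-10-02 12:00:00+00:00
```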

------
trab
I have been frantically searching for a temporal tagging library that can be
used in an Android application, with no good results.

I have looked at SUTime, HeidelTime, natty and some others. I am trying to
parse (among others) expressions of the type "the first week of the previous
month", "The last week of September". The only library that can parse this
type of query is SUTime.

Can you comment on why you implemented a home-grown solution instead of using
SUTime or some other readily available library? Have you measured the
performance of Duckling against the state of the art in temporal tagging?

Duckling seems very well made with good docs but unfortunately for me will be
hard to make work on Android.

~~~
ar7hur
SUTime is very good, like all the StanfordNLP stuff. We chose to do Duckling
because:

\- To my knowledge SUTime only supports English

\- We wanted something that's easy to extend. SUTime is somewhat hard to
extend, especially if you are not into Java

\- We needed not just temporal expressions, but also monetary data,
temperatures, quantities...

That being said, Duckling is still young and certainly not as proven as SUTime
yet.

------
taternuts
This is cool! I too was pretty impressed that it nailed "the second friday of
october 2017". I was kind of hoping it would get "the second friday of october
in 4 years", but still cool.
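Computing "the second friday of october" takes only a short calculation: find the first matching weekday in the month, then step forward whole weeks. A sketch follows. Reading "in 4 years" as "the same phrase, 4 calendar years after the 2014 reference date" is an assumption.

```python
import datetime

FRIDAY = 4  # date.weekday(): Monday=0 ... Sunday=6


def nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> datetime.date:
    """Return the n-th (1-based) given weekday in a month, e.g. the 2nd Friday."""
    first = datetime.date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first such weekday
    day = first + datetime.timedelta(days=offset + 7 * (n - 1))
    if day.month != month:
        raise ValueError(f"{year}-{month:02d} has fewer than {n} such weekdays")
    return day


print(nth_weekday_of_month(2017, 10, FRIDAY, 2))      # 2017-10-13
# "in 4 years", read as 4 calendar years after a 2014 reference date:
print(nth_weekday_of_month(2014 + 4, 10, FRIDAY, 2))  # 2018-10-12
```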

------
lbarrett
This looks amazing! I wish it would parse ISO8601 times, though.
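A cheap way to cover ISO 8601 alongside a natural-language parser is to try a strict ISO parse first and fall back only if it fails. Here is a sketch with Python's `datetime.fromisoformat`. Before Python 3.11 it accepts only the formats `isoformat()` produces (no `Z` suffix, for example), so it is not a complete ISO 8601 parser. The natural-language fallback itself is left out.

```python
import datetime


def parse_iso_first(text: str):
    """Try ISO 8601 first; return None so the caller can fall back to an NL parser."""
    try:
        return datetime.datetime.fromisoformat(text.strip())
    except ValueError:
        return None


print(parse_iso_first("2014-10-02T17:45:00-07:00"))  # 2014-10-02 17:45:00-07:00
print(parse_iso_first("tomorrow at 6am"))            # None
```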

------
sleepyhead
If you want to do this in Ruby try Chronic.
[https://github.com/mojombo/chronic](https://github.com/mojombo/chronic)

------
primo44
Your "try me" text box should say "e.g. tomorrow at 6am".

"i.e." means "that is" (as in "restating...")

"e.g." means "for example".

~~~
blandinw
Fixed, thanks.

------
hamburglar
Suggestion: if you want people using the demo page to be able to spot errors
easily, it would be useful to give a plain English description of when the
time they specified is. For example, if I enter "quarter of six", you could
parse it in my local timezone and spit back a piece of text like, "that is a
little more than 2 hours from now, or: Thursday, 2 October 2014 at 17:45:00
-0700 (PDT)".
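A sketch of the suggested readout. The wording rules (rounding to minutes, "a little more/less than") and the output format are assumptions modelled on the example text. `describe` is a hypothetical helper, and the fixed -07:00 offset stands in for a real local time zone.

```python
import datetime


def _hours(n: int) -> str:
    return f"{n} hour" if n == 1 else f"{n} hours"


def describe(target: datetime.datetime, now: datetime.datetime) -> str:
    """Hypothetical helper: a rough relative phrase plus the absolute time."""
    minutes = round((target - now).total_seconds() / 60)
    direction = "from now" if minutes >= 0 else "ago"
    hours, rem = divmod(abs(minutes), 60)
    if hours == 0:
        rough = f"{rem} minute" + ("" if rem == 1 else "s")
    elif rem == 0:
        rough = f"exactly {_hours(hours)}"
    elif rem < 30:
        rough = f"a little more than {_hours(hours)}"
    else:
        rough = f"a little less than {_hours(hours + 1)}"
    stamp = f"{target:%A}, {target.day} {target:%B %Y at %H:%M:%S %z (%Z)}"
    return f"that is {rough} {direction}, or: {stamp}"


pdt = datetime.timezone(datetime.timedelta(hours=-7), "PDT")
now = datetime.datetime(2014, 10, 2, 15, 30, tzinfo=pdt)
quarter_of_six = datetime.datetime(2014, 10, 2, 17, 45, tzinfo=pdt)
print(describe(quarter_of_six, now))
# that is a little more than 2 hours from now, or: Thursday, 2 October 2014 at 17:45:00 -0700 (PDT)
```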

------
EyeballKid
Along somewhat similar lines, my own date/time-parsing library for golang:
[https://github.com/bcampbell/fuzzytime](https://github.com/bcampbell/fuzzytime)

I wrote it to parse dates and times in news articles and blog posts. Still a
work-in-progress, but someone might find it useful!

------
krammer
Wow, we've tried to solve this problem in-house and our results are much worse
than this. One question: how hard is it to detect that kind of expression
inside random text? Like Gmail does when suggesting a calendar appointment
within an email.

~~~
blandinw
Even though the demo website expects the whole input to be a time expression,
Duckling actually detects substrings inside a larger block of text.

We've mostly used it on short sentences, but it should work on larger inputs,
like articles. I'd recommend splitting very large inputs into sentences
though.
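For a rough picture of the split-into-sentences-then-scan approach, here is a toy sketch. It relies on a single hypothetical regular expression, which is far narrower than Duckling's rule-based parsing and is not how Duckling works. It only illustrates the shape of detecting an expression such as "see you tomorrow evening at 6" inside longer text.

```python
import re

# Toy pattern for illustration only; it recognises a tiny slice of phrasings.
TIME_PATTERN = re.compile(
    r"\b(?:today|tomorrow)(?:\s+(?:morning|afternoon|evening))?"
    r"\s+at\s+\d{1,2}(?::\d{2})?(?:\s*(?:am|pm))?\b",
    re.IGNORECASE,
)


def find_time_expressions(text: str):
    """Split text into sentences, then report (sentence_index, match) pairs."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    hits = []
    for i, sentence in enumerate(sentences):
        for m in TIME_PATTERN.finditer(sentence):
            hits.append((i, m.group(0)))
    return hits


print(find_time_expressions("Great meeting today. See you tomorrow evening at 6!"))
# [(1, 'tomorrow evening at 6')]
```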

~~~
ar7hur
Demo site updated, now shows partial parses.

~~~
krammer
Great! I'm sure we will use it soon. Thanks!

------
heeen
It seems to have problems with places as time zones, e.g. "next sunday noon,
german time"

~~~
ar7hur
Indeed, that's not supported yet, good catch. Will do.

------
huu
Was hoping it would parse "tomorrow's yesterday" as today.

~~~
francis88
"the day before tomorrow" works though

------
zongitsrinzler
This is off topic, but how much does Wit use Clojure?

~~~
ar7hur
Probably too much :) All our backend is Clojure, all our new web developments
are in ClojureScript (with React and Om). The only places we're not using
Clojure are iOS, Android and Raspberry/embedded linux. We're using Rust more
and more for the latter.

~~~
zongitsrinzler
Wow this is very interesting. I have been following Wit's progress since I am
quite a geek for all kinds of automation and AI stuff.

Would it be possible to port this into JavaScript using ClojureScript and use
it on the client side?

~~~
ar7hur
Actually we are planning to port it to a language suitable for embedded use.
Maybe Rust?

------
medell
Who wants to use this to make an Alfred Workflow that creates Google Calendar
events? :) QuickCal hasn't been supported in years.

------
dangerlibrary
hah! 1/2/2014 is January 2nd. Take that, Europeans.

Unless you are checking my IP address to guess the best convention...

~~~
Someone
Are there countries that use slashes for d/m/y?

What surprised me was "1-2-2014":

    
    
      From Thursday, 2 October 2014 at 1:02:00 +0000 (UTC)
      to Wednesday, 1 January 2014 at 0:00:00 +0000 (UTC)
    

On top of the "where did it get those timestamps from", time flows backwards
in that interval.

~~~
EyeballKid
> Are there countries that use slashes for d/m/y?

Oh yes, loads of them. Lots more than use m/d/y anyway. See
[https://en.wikipedia.org/wiki/Date_format_by_country](https://en.wikipedia.org/wiki/Date_format_by_country)

Canada looks the most hellish, eg: "Immigration Canada Stamps use DD/MM/YYYY
and Canada Customs Stamps use MM/DD/YYYY." eek!
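Since both conventions are common, a parser can at least notice when a slash date has two valid readings, and it can refuse intervals that end before they start. A minimal sketch follows. The function names are hypothetical, and this is not how Duckling decides between readings.

```python
import datetime


def slash_date_readings(text):
    """Every valid reading of 'a/b/yyyy' as month-first and as day-first."""
    a, b, year = (int(part) for part in text.split("/"))
    readings = {}
    for convention, (month, day) in (("m/d/y", (a, b)), ("d/m/y", (b, a))):
        try:
            readings[convention] = datetime.date(year, month, day)
        except ValueError:
            pass  # e.g. 13/1/2014 has no month-first reading
    return readings


def checked_interval(start, end):
    """Reject an interval whose end comes before its start."""
    if end < start:
        raise ValueError(f"interval runs backwards: {start} -> {end}")
    return start, end


readings = slash_date_readings("1/2/2014")
if len(set(readings.values())) > 1:
    print("ambiguous:", readings)  # January 2 vs February 1: ask, or use a locale default

try:
    # The backwards "1-2-2014" interval quoted above would be caught here.
    checked_interval(datetime.datetime(2014, 10, 2, 1, 2), datetime.datetime(2014, 1, 1))
except ValueError as err:
    print(err)
```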

------
coldcode
Looks very cool, but sadly I have no way to use Clojure on my iOS apps.

------
sanemat
I hit the "try me" button, but it does not work; nothing happens.

~~~
ar7hur
Hey, sorry, a little downtime on the demo site :) It's back now

