Natty: a natural language date parser in Java (joestelmach.com)
10 points by joestelmach on May 25, 2010 | 6 comments



OK, it can't parse:

  The day before Sunday week
  The day before Sunday
  Next Sunday
  Sunday
Now, you may have tried putting in "Sunday" and it worked, but it didn't when I tried it.

I had a leading space.

Going to the "Let use know" link takes me to a github issues reporting page. I stared at it for about 30 seconds, then decided life's too short and I'd report my findings here.

So I've made it parse some of the above. It appears that there were odd spaces, but I can't retrieve the exact cases now, so I can't really make a sensible bug report.

But it still doesn't parse "The day before Sunday week," nor "The day before a week on Sunday." It also doesn't parse "26-05-2010," the usual UK date format, but you probably knew that.

Is your test data set available?


Thanks for the feedback.

I'll admit to the amateur mistake of not handling leading white space (that's now fixed on the master branch.)
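For anyone hitting the same whitespace problem in their own code, here is a minimal sketch of normalizing input before handing it to a parser. This is not Natty's actual fix; the class and method names are hypothetical, and it only handles ASCII whitespace.

```java
// Hypothetical helper, not part of Natty: cleans up user input before parsing.
class InputNormalizer {
    // Trims leading/trailing whitespace and collapses internal runs
    // of spaces, tabs, and newlines into a single space.
    static String normalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }
}
```

With this, `InputNormalizer.normalize(" Sunday")` yields `"Sunday"`, so a leading space no longer reaches the parser.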

It looks like the only example from your list that couldn't be parsed is "The day before Sunday week". Forgive my ignorance, but I've never seen 'week' used in that context here in the US. If you'd be willing to describe the proper use, I'll look at implementing it.

As for the UK format, please see the issue I created here: http://github.com/joestelmach/natty/issues#issue/3, and lend some advice if you'd like.
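To make the UK format concrete, here is a rough sketch of day-first parsing using the standard java.time API (Java 8+). It is only an illustration of the format, not how Natty handles or plans to handle it; the class and method names are made up.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;

// Hypothetical helper, not part of Natty: parses day-first dates such as "26-05-2010".
class UkDate {
    // "uuuu" rather than "yyyy" so that STRICT resolution works without an era field.
    private static final DateTimeFormatter DAY_FIRST =
            DateTimeFormatter.ofPattern("dd-MM-uuuu")
                    .withResolverStyle(ResolverStyle.STRICT);

    // Returns the parsed date, or empty if the text is not a valid dd-MM-uuuu date.
    static Optional<LocalDate> parseDayFirst(String text) {
        try {
            return Optional.of(LocalDate.parse(text.trim(), DAY_FIRST));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

Note the ambiguity this leaves open: "05-06-2010" is accepted and read as 5 June, while a US reader would mean May 6. A string like "05-26-2010" is rejected because 26 is not a valid month.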

The test set is available here: http://github.com/joestelmach/natty/blob/master/src/test/gun...


In the UK and Australia, "Sunday week" would be taken to mean one week after the Sunday after today. Also phrased as "a week on Sunday."
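In date arithmetic that definition comes out roughly like the sketch below, using java.time (Java 8+). The class and method names are hypothetical, and this is just one reading of the phrase, not anything from Natty.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical illustration of "Sunday week" / "a week on Sunday".
class SundayWeek {
    // "<day> week": the first <day> strictly after today, plus one more week.
    static LocalDate dayWeek(LocalDate today, DayOfWeek day) {
        return today.with(TemporalAdjusters.next(day)).plusWeeks(1);
    }

    // "The day before Sunday week": one day earlier than the date above.
    static LocalDate dayBeforeSundayWeek(LocalDate today) {
        return dayWeek(today, DayOfWeek.SUNDAY).minusDays(1);
    }
}
```

Taking today as Tuesday, May 25, 2010: the Sunday after today is May 30, so "Sunday week" is June 6, and "the day before Sunday week" is Saturday, June 5.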

I have a use for the library, but cannot use Java. It's nice to know that such a solution exists, and perhaps I can push towards an equivalent for my contexts.

Thanks for your response.


Nice, joestelmach! We're using JChronic (https://jchronic.dev.java.net/) at DotSpots right now for this stuff internally. I'd love to switch to something better maintained.

FWIW, Ruby's chronic contains some date formats that this doesn't seem to support right now ('5 minutes ago'):

http://chronic.rubyforge.org/
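For reference, the "N units ago" form can be sketched with a plain regex as below. This is only a toy to show the expected result; it is not how Chronic or Natty work (Natty is grammar-based), and the class name and the set of supported units are assumptions.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy parser for phrases like "5 minutes ago".
class RelativeAgo {
    // Up to 9 digits so the amount always fits in a long.
    private static final Pattern AGO = Pattern.compile(
            "(\\d{1,9})\\s+(second|minute|hour|day|week)s?\\s+ago",
            Pattern.CASE_INSENSITIVE);

    // Returns the moment the phrase refers to, relative to 'now',
    // or empty if the text does not have the "<n> <unit> ago" shape.
    static Optional<LocalDateTime> parse(String text, LocalDateTime now) {
        Matcher m = AGO.matcher(text.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        long amount = Long.parseLong(m.group(1));
        // ChronoUnit constants are plural: SECONDS, MINUTES, HOURS, DAYS, WEEKS.
        ChronoUnit unit = ChronoUnit.valueOf(m.group(2).toUpperCase(Locale.ROOT) + "S");
        return Optional.of(now.minus(amount, unit));
    }
}
```

Passing `now` in explicitly, instead of calling `LocalDateTime.now()` inside, keeps the result reproducible, which matters for any shared test set.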

Aside: I'd really like to see a publicly-available set of natural language date test cases shared across these projects.


Thanks for the feedback.

The chronic project was actually the original motivation behind natty. I think the chronic project is great, but I believe the grammar-based, AST approach is the way to go for long-term maintainability.

Implementing relative times has been on my list of things to do (in addition to recurrence.) I created a feature request here: http://github.com/joestelmach/natty/issues#issue/4, so feel free to list any time formats you'd like to see implemented.

I agree that a generic list of test cases would be nice. Any thoughts on how such a list should be published?


A list that can be used for interoperability tests would be great, even across languages. For example, Perl has DateTime::Format::Natural, which provides this functionality. There's a list of supported inputs at http://search.cpan.org/dist/DateTime-Format-Natural/lib/Date... .

I guess such a list of test cases could simply be a JSON file containing the input strings plus an output specification. The options for output would only be explicit datetimes, datetimes relative to now, or timespans.
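As a rough idea of what one entry might look like once loaded, here is a sketch in Java. The JSON field names in the comment and all class and method names are invented for illustration; no such shared format exists yet as far as this thread is concerned.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical in-memory form of one shared test case. An entry in the JSON
// file might look like (field names are assumptions):
//   {"input": "5 minutes ago", "kind": "relative", "offset": "-PT5M"}
final class DateTestCase {
    enum Kind { ABSOLUTE, RELATIVE_TO_NOW, TIMESPAN }

    final String input;
    final Kind kind;
    final LocalDateTime absolute; // set only for ABSOLUTE
    final Duration duration;      // signed offset for RELATIVE_TO_NOW, length for TIMESPAN

    private DateTestCase(String input, Kind kind, LocalDateTime absolute, Duration duration) {
        this.input = input;
        this.kind = kind;
        this.absolute = absolute;
        this.duration = duration;
    }

    static DateTestCase absolute(String input, LocalDateTime expected) {
        return new DateTestCase(input, Kind.ABSOLUTE, expected, null);
    }

    static DateTestCase relativeToNow(String input, Duration offset) {
        return new DateTestCase(input, Kind.RELATIVE_TO_NOW, null, offset);
    }

    static DateTestCase timespan(String input, Duration length) {
        return new DateTestCase(input, Kind.TIMESPAN, null, length);
    }

    // Resolves the expected datetime against a fixed reference 'now'.
    // Timespans have no single expected datetime, so they are rejected here.
    LocalDateTime expectedAt(LocalDateTime now) {
        switch (kind) {
            case ABSOLUTE:
                return absolute;
            case RELATIVE_TO_NOW:
                return now.plus(duration);
            default:
                throw new IllegalStateException("timespan has no single datetime: " + input);
        }
    }
}
```

Each project's test runner would then feed `input` to its own parser with the same reference `now` and compare the result against `expectedAt(now)`.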



