Show HN: tu – Convert natural language date/time to UTC (github.com/ad-si)
83 points by adius on April 12, 2024 | 45 comments



https://github.com/ad-si/tu/blob/main/src/main.rs#L37

I'll definitely be using this but I'd suggest replacing print with println.


Glad you like it! =)

It's supposed to be usable "inline", so it shouldn't print a newline. (E.g. `touch note-$(tu yesterday).md`)

Some terminals have an option to still write the $ prompt to the next line and use an enter symbol (⏎) to signify that no newline was printed.


Trailing newlines are removed in backtick and $(..) substitution per POSIX:

> The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters and eliminated during field splitting, depending on the value of IFS and quoting that is in effect. If the output contains any null bytes, the behavior is unspecified.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...

I don't know of any shell that doesn't follow that.
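
The quoted rule is easy to check by running a tiny script through `sh` and printing what the substitution captured. This is only a sketch in Python: it assumes a POSIX-compatible `sh` is on the PATH, and the helper name `captured_by_substitution` is made up for illustration.

```python
import subprocess


def captured_by_substitution(printf_format: str) -> str:
    """Return what x=$(printf FORMAT) stores in x, wrapped in brackets.

    Assumes a POSIX-compatible `sh` is available on PATH.
    """
    # $1 is the printf format. The inner printf runs inside $(...);
    # the outer printf shows the captured value between [ and ].
    script = 'x=$(printf "$1"); printf "[%s]" "$x"'
    result = subprocess.run(
        ["sh", "-c", script, "sh", printf_format],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Trailing newlines vanish, embedded ones survive.
print(repr(captured_by_substitution(r"a\n\n\n")))
print(repr(captured_by_substitution(r"a\n\nb\n")))
```

So a tool that ends its output with a newline should still be usable inside `$(...)`, as long as the expansion is quoted.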


I’ve never had a problem in Bash with `$()` and newlines. All tools that I’ve used terminate their output with a newline. Is this a problem in other shells, perhaps?


Using a linter is a good idea.

It's not command substitution, but the contents of $variables will be expanded and split, so you need to quote them or the newlines will get lost.

  $ touch asdd
  $ foo=$(echo -e '\n\nboop\n*dd\n'); echo $foo "$foo"
  boop asdd

  boop
  *dd
Try creating files with newlines and glob characters in their names, then forget to quote the filenames. Chaos ensues.


Can it check whether it's writing to a terminal? There are many programs that will write colorized output to a terminal but not colorized output to a file.
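
A minimal sketch of that idea, in Python rather than the tool's Rust (Rust's standard library has a comparable check in `std::io::IsTerminal`). The function names `render` and `emit` are hypothetical and not taken from tu.

```python
import sys


def render(timestamp: str, is_terminal: bool) -> str:
    """Append a newline only when a person is reading the output directly."""
    return timestamp + "\n" if is_terminal else timestamp


def emit(timestamp: str, stream=sys.stdout) -> None:
    # isatty() is True for an interactive terminal and False when the
    # stream is a pipe, a file, or a $(...) command substitution.
    stream.write(render(timestamp, stream.isatty()))
```

With this shape the prompt would start on a fresh line in a terminal, while `touch note-$(tu yesterday).md` would still receive the bare timestamp.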


UTC is not a representation or a display-format.

That display-format appears to be (derived from?) ISO 8601. And I think ISO 8601 is not bound to UTC; it could equally be used to represent Solar Time. Possibly the most natural representation of a UTC instant would be an integer, because fundamentally UTC is a count of milliseconds.

Just quibbling about terminology! Not knocking the program, I like it.


> Possibly the most natural representation of a UTC instant would be an integer, because fundamentally UTC is a count of milliseconds

It's not! It's a subtle point, but because of leap seconds, the correct representation of UTC is the tuple (year, month, day, hour, minute, second, ...), with the `...` being filled in with your desired precision.

From: http://www.madore.org/~david/computers/unix-leap-seconds.htm...

> Unlike TAI and UT1, the UTC time scale should not be considered as a pure real number (or seconds count): instead, it should be viewed as a broken-down time (year-month-day-hour-minute-second-fraction) in which the number of seconds ranges from 0 to 60 inclusive (there can be 61 or 59 seconds in a minute); during a positive leap second the number of seconds takes the value 60 (while a negative leap second would skip the value 59, but this has never occurred).

...

> If we attempt to condense UTC to a single number (say, the number of seconds since 1970-01-01T00:00:00 or since 1900-01-01T00:00:00, or the number of 86400s-days since 1858-11-17T00:00:00, or something of the sort), we encounter the problem that the same value can refer to two different instants since the clock has been set back one second (negative leap seconds, of course, would cause no such difficulty).

Most datetime libraries get this wrong. I know because I'm working on a new one that specifically doesn't get this wrong.
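
The collision described in the quote can be reproduced with the Python standard library. This is only an illustration of the problem, not of how any particular datetime library handles it; the helper names are made up.

```python
import calendar
from datetime import datetime

# Broken-down UTC fields for the leap second at the end of 2016
# and for the first second of 2017.
LEAP_SECOND = (2016, 12, 31, 23, 59, 60)
NEXT_SECOND = (2017, 1, 1, 0, 0, 0)


def posix_seconds(fields: tuple) -> int:
    """Collapse broken-down UTC fields to a POSIX-style second count."""
    # calendar.timegm does plain arithmetic on the fields, so second=60
    # simply spills over into the next minute.
    return calendar.timegm(fields)


def datetime_accepts(fields: tuple) -> bool:
    """Report whether datetime can hold these fields at all."""
    try:
        datetime(*fields)
    except ValueError:
        return False
    return True


# Two different instants, one number.
print(posix_seconds(LEAP_SECOND), posix_seconds(NEXT_SECOND))
# datetime rejects second=60 outright.
print(datetime_accepts(LEAP_SECOND), datetime_accepts(NEXT_SECOND))
```

The tuple form keeps the two instants distinct; the single integer does not.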


We could technically do # of minutes since 1970-01-01T00:00:00 (or whatever) & seconds though, no? (int32 minutes, double seconds) would get us a pretty big range in a pretty compact format.
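
A rough sketch of that (minutes, seconds) idea in Python. The type `MinuteSecond` and the function `to_minute_second` are hypothetical names, and Python's `int` stands in for the int32 mentioned above (an int32 of minutes would cover roughly four thousand years either side of the epoch).

```python
import calendar
from typing import NamedTuple


class MinuteSecond(NamedTuple):
    minutes: int      # whole UTC minutes since 1970-01-01T00:00
    seconds: float    # 0 <= seconds < 61, leaving room for a leap second


def to_minute_second(year, month, day, hour, minute, second) -> MinuteSecond:
    # Minutes are counted from the calendar fields only, so they are
    # unaffected by leap seconds; the second (possibly 60) is kept apart.
    minutes = calendar.timegm((year, month, day, hour, minute, 0)) // 60
    return MinuteSecond(minutes, float(second))


leap = to_minute_second(2016, 12, 31, 23, 59, 60)
after = to_minute_second(2017, 1, 1, 0, 0, 0)
```

Here `leap` and `after` stay distinct and still sort in the right order, because tuples compare field by field.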


There are lots of equivalent representations. The main point here is that it's misleading to think of UTC as just a timestamp from some epoch. It needs something richer than that.


It's RFC 3339.


I have been searching[0] for something like this, so thank you! It doesn't quite parse my original example, `14 december 11:20`, but it does parse `14 december 2024 11:20`.

I was not aware GNU date could do this as well, but that also doesn't parse my original example.

A small configuration of adding your preferred timezone or reading the system timezone would be nice. The code seems simple enough that I may try to add it myself.

[0] https://emacs.stackexchange.com/questions/79703


Thanks for the good suggestions! I added tracking issues at https://github.com/ad-si/tu/issues.

I'd be very happy about a PR! ;-)


This is great synchronicity. I just recently needed this. And I needed it in a Rust project as well, can't get any better.

I was trying to do stuff with chrono-english [0] and parse_datetime [1].

I noticed chrono-english doesn't work for both `tomorrow 4pm` and `4pm tomorrow`.

I second the desire for the timezone support.

[0] https://docs.rs/chrono-english/

[1] https://docs.rs/parse_datetime/


Pretty neat, though natural language is famously ambiguous. From the example,

  tu today      -> 2024-03-16T12:56:41.905455Z
  tu tomorrow   -> 2024-03-17T12:56:41.905455Z
I interpret "today" as the time period between 00:00 and 23:59:59.999999, and tomorrow similarly.
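
Read that way, "today" maps to a pair of instants rather than a single one. A minimal Python sketch, assuming a half-open interval and a caller-supplied time zone (UTC by default); `day_interval_utc` is a made-up name and not something tu provides.

```python
from datetime import date, datetime, time, timedelta, timezone


def day_interval_utc(day: date, tz=timezone.utc):
    """Return the half-open UTC interval [start, end) covering `day` in `tz`."""
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


start, end = day_interval_utc(date(2024, 3, 16))
```

Using the start of the next day as an exclusive end avoids having to pick a precision for 23:59:59.999999.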


The classic example is 'next Saturday'. Is this the next Saturday (since today is Friday, that would be tomorrow) or the Saturday of next week (i.e. eight days' time)?

And then you end up needing a locale-based week, since different cultures (e.g. US versus UK) use different days to start the week (Sunday/Monday).


> 'next Saturday'. Is this the next Saturday (since today is Friday, that would be tomorrow) or the Saturday of next week (i.e. eight days' time)?

this is never ambiguous. "saturday" is the saturday of the current week, and "next saturday" is the saturday of the following week


> this is never ambiguous

What is that based on? Your personal perception is not evidence of wider ambiguity either way (and neither is mine).


What is “next Monday” on a Saturday?


"Monday" is two days after, and "next Monday" is 9 days after


Your boss comes up to you and says "I need this by Tuesday" and then storms out of the room. You look down at your watch to confirm you're not crazy. Today is Tuesday. When does your boss need it by?


boss spoke incorrectly. boss should have said "next tuesday". what you're asking for is essentially AI, which is beyond the scope of this tool
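
For reference, the two readings argued over in this sub-thread can be written down as two small functions. This is a sketch in Python with made-up names; it takes no position on which reading is "correct" and ignores the locale-dependent start of the week.

```python
from datetime import date, timedelta

MONDAY, TUESDAY, SATURDAY = 0, 1, 5  # date.weekday() numbering


def upcoming(today: date, weekday: int) -> date:
    """First `weekday` strictly after `today` (1 to 7 days ahead)."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def week_after_upcoming(today: date, weekday: int) -> date:
    """The occurrence one week after `upcoming`."""
    return upcoming(today, weekday) + timedelta(days=7)


friday = date(2024, 4, 12)
saturday = date(2024, 4, 13)
tuesday = date(2024, 4, 16)
```

On the Friday, `upcoming(friday, SATURDAY)` is one day away and `week_after_upcoming(friday, SATURDAY)` is eight days away. On the Saturday, Monday comes out as two and nine days away. On the Tuesday, `upcoming(tuesday, TUESDAY)` is a full week away, which may or may not be what the boss meant.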


Those examples had me puzzled. I would have expected "tu tomorrow" to be 00:00:00 of the next calendar day (converted to UTC), rather than "now + 24 hours". If I wanted a particular time (besides 00:00:00) on the next day, I'd like to be able to input something like "9 AM tomorrow" or "tomorrow +9h".
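
The behaviour asked for here could look roughly like the following Python sketch. `tomorrow_utc` is a hypothetical helper, not part of tu, and the fixed UTC+1 offset in the usage lines is a stand-in for a real zone such as one from `zoneinfo.ZoneInfo`.

```python
from datetime import datetime, time, timedelta, timezone


def tomorrow_utc(now: datetime, tz, at: time = time.min) -> datetime:
    """Wall-clock time `at` on the next calendar day in `tz`, as UTC.

    `now` must be timezone-aware; `tz` is any tzinfo, for example a
    zoneinfo.ZoneInfo or a fixed offset.
    """
    local_tomorrow = now.astimezone(tz).date() + timedelta(days=1)
    local = datetime.combine(local_tomorrow, at, tzinfo=tz)
    return local.astimezone(timezone.utc)


# Assumed local zone for the example: a fixed UTC+1 offset.
local_zone = timezone(timedelta(hours=1))
now = datetime(2024, 3, 16, 12, 56, 41, tzinfo=timezone.utc)
midnight = tomorrow_utc(now, local_zone)             # 00:00 local tomorrow
nine_am = tomorrow_utc(now, local_zone, time(9, 0))  # "9 AM tomorrow"
```

Note that neither result equals `now + timedelta(days=1)`, which is the "now + 24 hours" reading.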


also depends if you mean "today in UTC" or "today in local time expressed as UTC". most people want the latter


This makes me think of a fun fact I learned this week: by default Unreal Engine parses dates with a broken regex that stops working on European daylight saving time, and pretty much everyone around here has to patch it themselves to have a game that works in summer…


I am not 100% sure of the use case for this ... though it looks kinda interesting?

I run my servers in UTC (as everyone should (and some (many?) products require))

Everything else handles timezones "correctly" (in my experience)


My use-case would be exactly the one described in the README. For example, I would like to add one-off events to my calendar by writing an unstructured note.

For production stuff like servers I wouldn't even think about it.



At first I thought you had posted the Tuesday printer bug

https://beza1e1.tuxen.de/lore/print_on_tuesday.html


Why not just use `at`?


Does anyone have a good dataset for testing libraries like this? I suppose it's not difficult to generate one these days, but curious if there is a standard.


I don't think this is very well made. Uses a hand rolled parser with no formal grammar. Special case logic is not documented, e.g. https://github.com/ad-si/tu/blob/main/src/chrono_english/par...

But I guess it solved the author's problem and we can't argue with that.


how easy would this be to rewrite in C?


Dates honestly terrify me. I am already in a panic state over not having some alternative timekeeping system to the jumbled mess of date formats and time zone interactions, and Unix time is simply unreadable. I wonder if the concept of dates should even still exist anymore. When a global org says "get this in by tomorrow", that could be as little as 4 hours away or as much as 20, and I have to go do math to figure out what people even mean.

Abstracting it even further into NLP is even scarier.

I don't have a solution; I just feel deep pain about this as a remote worker dealing with logs, timestamps, and who is where. There has to be some different way.


The most complex piece of code I ever wrote was a scheduler for raising triggers in a form such as "every 2 minutes, from midnight to 5am, on the Tues and Thurs of each week, unless it falls on 25th Dec", etc, etc. The main problem I recall was that the server running the triggers and the client entering them might be in different time zones, the server having to honour the client's version, and those TZs might have different DST switchover dates, etc.
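
To give a feel for the rule alone, here is a small Python sketch that expands that one example for a single day in the client's wall-clock time. It is not the scheduler described above: the name `triggers_for` and the half-open window are assumptions, and the hard part (mapping each wall-clock time to the server's zone across DST changes) is deliberately left out.

```python
from datetime import date, datetime, time, timedelta

TUESDAY, THURSDAY = 1, 3  # date.weekday() numbering


def triggers_for(day: date) -> list:
    """Client wall-clock trigger times (naive datetimes) for one day."""
    if day.weekday() not in (TUESDAY, THURSDAY):
        return []
    if (day.month, day.day) == (12, 25):
        return []  # the "unless it falls on 25th Dec" exception
    start = datetime.combine(day, time(0, 0))
    # Assumption: the window is half-open, [00:00, 05:00), which gives
    # 150 triggers at 2-minute spacing.
    return [start + timedelta(minutes=2 * i) for i in range(150)]


tuesday_triggers = triggers_for(date(2024, 4, 16))
```

Each naive time would then have to be interpreted in the client's zone, trigger by trigger, before the server can act on it.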

Basically, how humans measure time is psychopathic.


Not knocking the tool, just an alternative for people who might not be aware.

GNU date almost does the same as their examples:

    date -Is -ud 'today'    --> 2024-04-12T14:04:08+00:00
    date -Is -ud 'tomorrow' --> 2024-04-13T14:04:08+00:00
    date -Is -ud '2 day'    --> 2024-04-14T14:04:08+00:00
    date -Is -ud '9 week'   --> 2024-06-14T14:04:08+00:00
    date -Is -ud '1 month'  --> 2024-05-12T14:04:08+00:00

    date -Is -ud '2024-04-10T13:31:46+04:00'     --> 2024-04-10T09:31:46+00:00
    date -Is -ud 'Wed, 14 Feb 2024 23:16:09 GMT' --> 2024-02-14T23:16:09+00:00
And if you want Z format, you can use a custom format string:

    date -ud '1 month' +'%FT%TZ' --> 2024-05-12T14:13:22Z
So it would be easy enough to add an alias/function to your shell configurations and just use built-in date rather than installing a new program. Assuming GNU date relative date parsing meets your needs - I don't think it's quite as flexible as some libraries, e.g. in Python.


I was thinking (and trying out) exactly the same, thanks for posting! There are many reasons to reinvent the wheel, so I have no specific feelings about this one. In my case I will certainly stick with GNU date. But there is one command line tool which complements `date` in a very helpful way: https://www.fresse.org/dateutils/


I built a JavaScript library for my needs: https://github.com/derhuerst/parse-human-relative-time


Standard PSA: Do note that storing future times only in UTC can lose important information for almost everybody at some point. When a country changes its daylight saving time dates (as happens regularly and has happened before for the US) or decides to stay on daylight saving time forever (as the EU and US are both proposing), any future events stored as UTC may then be at an incorrect local time.

The safest format for a stored future timestamp is a local time, an IANA timezone name, and, if you need it for efficiency, a derived UTC time. But you need to rederive the UTC time whenever the timezone database changes (or whenever the related entity/user changes their timezone), or it may become incorrect, as above.
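
A minimal Python sketch of that storage shape. The class `FutureEvent` and the `resolve` parameter are hypothetical; the two stand-in rule sets below exist only to imitate a time zone database update and do not reflect any real rule change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class FutureEvent:
    local: datetime   # naive wall-clock time, the source of truth
    tz_name: str      # IANA name such as "Europe/Oslo"

    def derive_utc(self, resolve: Callable = ZoneInfo) -> datetime:
        """Recompute the UTC instant from the current zone rules.

        `resolve` maps a zone name to a tzinfo; by default it is
        zoneinfo.ZoneInfo, i.e. the installed time zone database.
        """
        aware = self.local.replace(tzinfo=resolve(self.tz_name))
        return aware.astimezone(timezone.utc)


# Stand-in rule sets (assumptions for illustration only).
def rules_with_dst(name):      # summer time in force: UTC+2
    return timezone(timedelta(hours=2))


def rules_without_dst(name):   # summer time abolished: UTC+1
    return timezone(timedelta(hours=1))


event = FutureEvent(datetime(2030, 7, 1, 10, 0), "Europe/Oslo")
before_change = event.derive_utc(rules_with_dst)
after_change = event.derive_utc(rules_without_dst)
```

The stored local time never changes; only the derived UTC value moves when the rules do, which is why a cached UTC column needs to be recomputed after a database update.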


If you're worried about the future timestamp being at a specific local time (say 10 AM on a future date), it's actually better to store a naive time and a location than the IANA timezone name.

Then, if a state decides to stop doing DST, or any other funkiness like that, you can still get the time right.


What if my country splits itself in two, and the location “Oslo/Norway” that used to describe my time zone no longer describes the place I live? Say they change the time zone offset for Oslo/Norway to be one hour off from what it used to be, but in the place I live the offset remains as it was before the split.


Don't use Oslo/Norway if you're in e.g. Trondheim; store Trondheim/Norway.

I should note that it is not possible to make a database of geolocation->UTC offset that is legal in all countries, since there are countries with disputed borders, some of which care very much that no software imply the "wrong" border.


Oslo/Norway is the name that common time zone databases use for all of Norway.


Comment you originally responded to suggested:

> If you're worried about the future timestamp being at a specific local time (say 10 AM on a future date), it's actually better to store a naive time and a location than the IANA timezone name.

Europe/Oslo would be the IANA name for mainland Norway (Svalbard and Jan Mayen are in different IANA regions).

If you're meeting somewhere at a specific time, you might as well specify that place really precisely (Maybe: Nidaros Cathedral, Trondheim, Norway) and then use that place to infer the proper UTC offset when the date approaches.

There's no perfect solution as landmarks, names, &c. can all change.


> Europe/Oslo

Sorry yeah, this is the one I was thinking of when I said Oslo/Norway.

Agree with your other points as well.

I guess to increase the long term robustness one could record some additional data:

GPS coordinates for the intended event

GNSS coordinates using another system such as BDS or GLONASS for the intended event

The current time with time zone at the time when the record was created

The IANA name of the location where the record was created



