Start sending dates the right way (aka The ISO8601 101) (tempus-js.com)
86 points by Keithamus on May 21, 2012 | 80 comments



ISO 8601 defines a number of lesser-known features. For example, it allows not only fractional seconds (12:34:56.78) but also fractional hours (12.3) and minutes (12:34.93). There is a special notation for the midnight at the end of a given day (24:00:00 or 24:00), and another for a positive leap second (08:59:60 etc.). There are three ways to write a date: 2012-05-21, 2012-W21-1 and 2012-142. Intervals can be specified in terms of start and end, start and duration, duration and end, or just a duration (i.e. no context). There are also recurring intervals, which may or may not be bounded in the number of recurrences. And so on and on and on.

That said, ISO 8601 tries to cover most cases for date/time representation. Implementing every bit of ISO 8601 is not desirable of course, but it is certainly worth looking at.
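For instance, here's a quick sketch (Python 3.6+, since it relies on the %G/%V/%u strftime directives) showing the same day written in all three date forms:

    from datetime import date

    d = date(2012, 5, 21)
    print(d.strftime("%Y-%m-%d"))   # 2012-05-21  (calendar date)
    print(d.strftime("%G-W%V-%u"))  # 2012-W21-1  (week date)
    print(d.strftime("%Y-%j"))      # 2012-142    (ordinal date)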


> 2012-142

Does that represent the 142nd day of the year? Isn't that ambiguous when you can specify a year and month in the same format?


The format is YYYY-DDD with leading zeroes.


So 2012-142 is not a valid date? Should it be 2012-0142?


2012-142 is valid. 2012-012 is valid. 2012-12 is not. For the nth day, specify exactly three digits. No confusion with months.


To be precise, 2012-12 is valid and refers to a month. There is indeed no ambiguity.


So how would I refer to the 12th day of the year?


2012-012. Duh.


I don't see the need for a placeholder for the n-thousandth day of the year. Unless we want to accommodate something like 2011-0367 also equating to 2012-0002?


... with ISO8601 intervals you can express this as a string format, rather than using something ghastly like seconds or milliseconds:

  'P6Y4M4DT3H45M15S'
To me, that is absolutely no less "ghastly" than just saying the period is 3600 seconds (or whatever...)


Well, 3600 seconds can be expressed much more simply than that: 'PT1H'. Which is more human readable than '3600'. The point of intervals is a compromise between human and machine readability - the point being that we can more easily make sense of long periods of time expressed in unit values rather than milli/microseconds.
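For what it's worth, pulling the pieces out of a duration string is a small job. Here's a rough sketch (a hypothetical parse_duration helper; it ignores weeks and negative durations) that keeps each unit separate rather than collapsing everything to seconds, which you can't do reliably for months and years anyway:

    import re

    # Minimal ISO 8601 duration matcher: PnYnMnDTnHnMnS (weeks not handled).
    DURATION_RE = re.compile(
        r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
        r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
    )

    def parse_duration(text):
        match = DURATION_RE.match(text)
        if not match:
            raise ValueError("not an ISO 8601 duration: %r" % text)
        keys = ("years", "months", "days", "hours", "minutes", "seconds")
        return {k: float(v) for k, v in zip(keys, match.groups()) if v is not None}

    print(parse_duration("PT1H"))              # {'hours': 1.0}
    print(parse_duration("P6Y4M4DT3H45M15S"))  # all six components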


> 'PT1H'. Which is more human readable than '3600'

Not to mention six years, four months, four days, three hours, forty-five minutes and fifteen seconds, which is nothing short of 200072715 seconds (which my brain definitely wants to parse as two billion and something).


Yeah, I was going to say that too...

> The nice thing about ISO intervals is that they are human readable

That P6 monstrosity is not human-readable. Not by 99.999% of humans, anyway.


Nonsense, it's very readable. You could tell someone who can't code to say "period" for "P", "hour" for "H", "time" for "T" etc. and they'd be able to read out the period exactly and accurately every time. They would also know how to write their own forms of this. Humans can also look at it and know, intuitively, without a calculator, how long it is.

Trying to get them to multiply out seconds (incl. all the fun with leap seconds!) would be hard.


What is all this "Human Readable" nonsense. Who the hell is reading all these date times?

If two pieces of software are passing dates around, use a UNIX, UTC timestamp. If a human wants to read it they're probably a programmer and know how to parse a timestamp. If they're not a programmer, then you shouldn't be showing them unformatted date times anyway.


(1) dates/times before 1970 cannot be represented using a Unix timestamp

(2) you'll have much fun with leap seconds

(3) a Unix timestamp is just a number represented as floating point ... not much different from all other numbers. A standard date representation on the other hand has a unique format that's distinguishable from all other data types. This comes in handy not only for humans reading it, but also for parsers of weak formats such as CSV

Also, data lives longer than both the code and the documentation describing it; this is the main argument for text-based protocols.

(4) programmers are people with feelings too and would rather read human-readable dates (since they are so common) than look up the boilerplate for converting a Unix timestamp to a human-readable format. Here's the Python version, which I know by heart just because I had to deal with a lot of timestamps:

     import time
     from datetime import datetime
     timestamp = time.time()  # now
     datetime(*time.gmtime(timestamp)[:6]).strftime("%c")
Oh wait, that gives me the date in UTC. Ok, that date is meant for me, the programmer that knows what he's doing, so here's how I can read it:

     datetime(*time.localtime(timestamp)[:6]).strftime("%c")
Great, now I should write a blog post about it. Or you know, just use standard date formats that are human readable, because there's a reason why our forefathers used the date representations that we use today, instead of counting scratches on wood or something ;-)


Times before 1970 can be stored as negative numbers.


True, but many libraries still do not know how to deal with negative timestamps. For instance, PHP on Windows prior to 5.1.0. MySQL's FROM_UNIXTIME was also not working last time I tried it.

And many applications and scripts can break, like those that store and manipulate timestamps assuming a certain format (e.g. positive integers).

The Unix timestamp was created to deal with timestamps, not with date representations. Therefore this requirement (dates prior to 1970) was not an issue that had to be dealt with explicitly.


Calling out PHP is a bit of a cheap shot. Nothing works properly in PHP.


This is good general advice but breaks for things which don't fit the Unix timestamp assumption of second-level accuracy: I need ISO8601 because I handle dates which lack precision below a day at best, and often are just a month or year. If I'm formatting dates there's an implied precision difference between displaying '1923-1927' and 'January 1st 1923 to January 1st 1927'.

If you need to handle variable precision, time zones, ranges, etc. you can either invent your own format or use ISO-8601, which at least has the virtue of being more human readable and more likely to have been encountered before.


Very good advice! ISO8601 is a perfect tradeoff between the machine readable Unix timestamp, and the human readable mess used in http.

The one thing libraries often get wrong is that the time zone designator should always be present and default to 'Z'. It's pretty rare to want timestamps in local time except during debugging, but that's often the default. It catches a lot of people out.


Yes, this is one thing that I've had a lot of difficulty with when trying to implement a reliable ISO parser. I've found that Python is one of the worst offenders with this, as by default the datetime classes have no concept of tz, and it is significantly more difficult to attach tz info to a Python date. (Loosely mentioned in the article).


Take a look at pytz and python-dateutil
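A minimal sketch of what those give you (assuming both packages are installed): dateutil parses the offset into a tz-aware datetime, and pytz supplies proper zone objects to convert with.

    from dateutil import parser
    import pytz

    dt = parser.parse("2012-05-21T12:20:34+01:00")  # tz-aware datetime
    print(dt.astimezone(pytz.utc).isoformat())      # 2012-05-21T11:20:34+00:00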


RFC3339 is a pretty reasonable profile of ISO8601 http://www.ietf.org/rfc/rfc3339.txt

I'm always a bit nervous when people talk about ISO8601, given that very few people have probably read the spec, and are likely guessing as to its content.


I agree. The main barrier to actually reading the ISO8601 spec is that it costs money, and is not cheap (if memory serves, it's ~$150), leaving people to read the draft specs (which are harder to get hold of in their full form) or the Wikipedia entry.


For 99% of use cases on the web, you'll probably manage to get away with the W3 subset:

http://www.w3.org/TR/NOTE-datetime

which avoids most of the awkward corners, and for which a parser is a bit easier to write.
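As a rough illustration, here's a sketch of such a parser (Python 3; it handles only the full date-plus-time form of the W3C note, not the shorter year/month/date variants):

    import re
    from datetime import datetime, timedelta, timezone

    # YYYY-MM-DDThh:mm:ss[.s+](Z|+hh:mm|-hh:mm)
    W3C_DT = re.compile(
        r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(\.\d+)?"
        r"(Z|[+-]\d{2}:\d{2})$"
    )

    def parse_w3c(text):
        m = W3C_DT.match(text)
        if not m:
            raise ValueError("not a W3C datetime: %r" % text)
        y, mo, d, h, mi, s = map(int, m.groups()[:6])
        frac = float(m.group(7) or 0)
        tzd = m.group(8)
        if tzd == "Z":
            tz = timezone.utc
        else:
            sign = 1 if tzd[0] == "+" else -1
            tz = timezone(sign * timedelta(hours=int(tzd[1:3]), minutes=int(tzd[4:6])))
        return datetime(y, mo, d, h, mi, s, int(round(frac * 1000000)), tzinfo=tz)

    print(parse_w3c("2012-05-04T12:20:34.000343+01:00"))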


Possibly true, except that parsing dates and times should almost always be done with strptime() or a similar library function.


Full ISO 8601 is complicated. Try to restrict yourself to RFC 3339 (http://tools.ietf.org/html/rfc3339), a profile of ISO8601, if possible.


This is a great recommendation but if you're trying to be interoperable with other languages it's just not possible.

If you're in control of the output format though, I would fully agree.


The original article is about generating date strings, and I would give it more credence if it mentioned the RFC 3339 profile. (edit: I just noticed you're probably the author; please excuse my abrupt tone)


It's true, I should have mentioned RFC3339 in the article. I will make some amendments to it later today.


You can pry my "<milli|micro>seconds since the epoch" timestamps from my cold dead hands.


You'll shoot yourself soon enough when you need to handle timezones and become sick of inventing your own ∆epoch+TZ format :-)


Unix timestamps are the amount of time since the epoch in GMT, so there is no need to store the timezone.


Nope. They don't include leap seconds.

The technical definition is "Number of days since 1st Jan 1970 in GMT" × 86400 + "Number of seconds since midnight GMT"

Unix clocks do funny things when there's a leap second, which happens about every 1½ years. Some pause for a second, some jump forward then backward, etc.
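To make that concrete (a small sketch; the 2012-06-30 leap second is a real one): every UTC midnight lands on an exact multiple of 86400, so the leap second 23:59:60 gets no number of its own.

    import calendar

    # 2012-07-01 00:00:00 UTC, the instant right after the 2012-06-30 leap second
    midnight = calendar.timegm((2012, 7, 1, 0, 0, 0))
    print(midnight)          # 1341100800
    print(midnight % 86400)  # 0 -- every UTC midnight is a multiple of 86400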


Attaching a timezone to a time is information in itself. By coercing time data to UTC you are losing this bit of information.

A user may want to have some information shown as 8AM PST at the same time he wants another one shown as 8AM CET.


A timestamp and a timestamp with time zone are two different things and you need to use the appropriate one (which is usually timestamp with time zone). However, the argument is really not about attaching timezone information to a timestamp, which is an unfortunate but necessary thing, but about storing times in the archaic multiple-base mixed format humans traditionally use (because it's human-friendly) that computers simply do not need. An ISO timestamp mixes bases 60, 24, 12, a weird mix of bases 28, 29, 30 and 31, and a weird mix of 365 and 366. This is craziness.


You mean UTC? GMT is tied to London, and could be confused with BST during summertime. UTC isn't.


Yes.

Technically UTC = GMT, always has been, and for this "epoch time" conversation they are completely identical.

However there's a people problem. Some people think "GMT = the time in London now", which it isn't, since the UK switches to BST for daylight saving. Saying "UTC" avoids the "time in London" interpretation problem.


Then you'll just love TAI64:

http://cr.yp.to/libtai/tai64.html


So your software doesn't support timezones? Good luck with that.


Yes it does. Store UTC times, display in local timezone.


I suppose the only drawback to using epoch alone would be if you need to remember the timezone the date was stored from. But that's easy enough to fix.


which, if you think the problem through, leads you to conclude that you do need to store the time zone, unless you just don't really care about having accurate times. Read through some of the other comments and you'll realize that there are certain kinds of calculations and comparisons you just can't do without the time zone, a historical time zone database, and a leap seconds database. Do not make the mistake of assuming that dealing with times and dates can be easy.


Mad late reply, but I use epoch time when my timing needs aren't demanding, i.e. I can get by without worrying about epoch time not being precise to the second, and I don't need to worry about dates except to display them in non-UTC.

For my needs, 90% of the time ISO8601 is overkill and unnecessary.

But dates in what I do don't need to be complicated, which is rare. Also, I never said working with times and dates is easy; evaluate your needs for the situation. Going full tilt with timezones and full date parsing for, say, some general server logs doesn't always make sense, is all I'm saying. Bye!


I like to have a floating point value of seconds since the epoch for my dates. Python is a wonderful language.


+01:00 is not a useful timezone. If you want human-friendly semantics for things like "+1 day", you need to use the symbolic name for the timezone (Europe/Lisbon or whatever). If you just want an absolute time, you're better off expressing it in UTC, possibly even as seconds since epoch.


Correct, these are time offsets, not time zones. Although, absolute times are exactly what they are for and the reason why e-mail headers use this format. Whenever referring to offsets ("+1 day") or recurring events such as in a calendar app ("every 4/1 at midnight"), a real timezone is needed. I've seen plenty of code introduce DST-related bugs by taking the current timezone offset and using it to do date/time calculation throughout the life of the application.


There is a difference between "+01:00" and "Europe/Lisbon". One is always 1 hour away from UTC, the other isn't and includes summer time.
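A small sketch of that difference (assuming pytz is installed; the 2012-03-25 EU DST change is just an example): the named zone's offset moves with summer time, while a fixed "+01:00" never would.

    from datetime import datetime
    import pytz

    lisbon = pytz.timezone("Europe/Lisbon")
    winter = lisbon.localize(datetime(2012, 3, 24, 12, 0))
    summer = lisbon.localize(datetime(2012, 3, 26, 12, 0))

    print(winter.isoformat())  # 2012-03-24T12:00:00+00:00  (WET)
    print(summer.isoformat())  # 2012-03-26T12:00:00+01:00  (WEST)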


Yes, exactly. My point is: what is the use case where you would ever want "+01:00"? If you want to express an absolute[1] time, you use UTC or seconds since epoch. If you want to express a time that's meaningful to humans, you need a symbolic timezone (otherwise the answers to questions like "what is the time 1 day after this" will be surprising).

[1] Yes, I know there's no such thing as absolute time; perhaps "machine time" expresses what I mean.


Exactly. One should just about always store UTC time.


Beware that older browsers, including IE8, won't automatically parse ISO8601 dates, so Date.parse('2012-05-04T12:20:34.000343+01:00') or new Date('2012-05-04T12:20:34.000343+01:00') return the date representation of 'invalid date'. ISO8601 parsing was only introduced with JavaScript 1.8.5. This means that if you're supporting older browsers, you'll need to write your own parser on the client side or use a third-party library (I generally use my own regexp-based parser).


If you look at the website this article is on (http://tempus-js.com/) it is a JavaScript library for replacing the Date object with something that offers more functionality and is browser compatible down to IE6 & co.


Safari still does not parse partial dates, i.e. new Date("2012-05-04") gives 'Invalid Date'. Very annoying.


If you're sending or storing local dates and times, it may actually be better to use the local time (as ISO 8601 does) with the (Olson tz) name of the time zone it's in, instead of the time zone offset as in ISO 8601. This is more resilient against future time zone rule changes, at least that's what this article argues: http://fanf.livejournal.com/104586.html .


The article doesn't mention the fact that ISO8601 strings will also sort correctly in chronological order, or does this not matter since epoch-seconds do as well (as long as there are enough leading zeroes)?
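For what it's worth, a tiny sketch of that property (it holds as long as the strings share the same offset and precision):

    stamps = ["2012-05-21T09:00:00Z", "2011-12-31T23:59:59Z", "2012-05-04T12:20:34Z"]
    print(sorted(stamps))  # chronological order, no date parsing needed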


I never understood what the T was for, besides being visual noise. Why not just use spaces?

    2012-05-04 12:20:34.000343 +01:00
Or, if spaces are not allowed for an unknown reason then:

    2012.05.04-12:20:34.000343+01:00
Much better, still I prefer the simplicity of this:

    20120504.122034
In UTC, almost as compact as an epoch, but human readable. Add as many digits as needed for nanosecond precision.


The first option is generally allowed by protocols where spaces are appropriate. Using the other alternatives would get one into the business of defining standards, which no sane person with an appreciation for the subtlety of that task would do unless they had no choice. Having multiple separators or mashing the numbers together would undermine the distinctiveness of ISO8601, and the distinctiveness allows someone to know the precise semantics of a date even when it is taken out of context.


Postgres's timestamp type uses a space instead of a "T", e.g., "1999-01-08 04:05:06"

http://www.postgresql.org/docs/8.0/static/datatype-datetime....


You say "visual noise", I say "readability".


I don't get the argument that timestamps are not human readable -- they are; the only thing you need is good viewing software. And yes, it is worth it, since at least 99.999999% of accesses to this data come from machines.


To paraphrase your own comment:

Timestamps are human readable. You just need a machine to change them into a human readable format


I was trying to say that it is better to invest a bit of time in improving data viewers/editors to present timestamps in a nice form than to waste a huge amount of effort on parsing and serializing stuff no human will ever see. Besides, human readable is a relative term -- even ASCII text requires serious machine processing to become a human-readable pixel pattern.


They're certainly not human readable, but I want to know who these people are that are reading the date times being passed around by two pieces of software.

Also, why are people passing around date times for specific timezones? Converting a UTC timestamp to local time is trivial in every language I've used. Converting a local time to another local time isn't.

It sounds to me like a problem that doesn't exist, and a solution that causes more problems. I'll be sticking with unix timestamps thank you very much.


I don't understand why you would ever use anything other than seconds/micros/millis since epoch. Basically no parsing; Efficiently storable; Simpler; Easier; Better.

Only argument I've ever heard is human readability... If you're reading these by hand so often then just write a script/tool/whatever to convert them to human readable. This is still easier since you don't have to find an acceptable ISO8601 implementation. And frankly, how often are you reading the dates manually but not as part of some log output of your program where it could convert it to human readable before logging?

This is sending dates the wrong way.


One reason is that Unix time is not actually seconds since the epoch. It doesn't include leap seconds (a leap second is added about every 1½ years). Currently Unix time is about 30 or so seconds behind "Number of seconds since (Unix) epoch".


Why does that matter? You can reliably calculate the missing seconds when converting it for output.


No, you can't reliably calculate it forwards or backwards. To figure out the offset for dates in the past, you'd have to store the list of when leap seconds were added. This system is then hardly "just the number of seconds since epoch", but now has arbitrary data attached. The official group that decides leap seconds can't predict things more than 6 months in advance. How are you going to figure out in advance what's going to happen in 2 years' time?


So every mail header in the world should represent:

  Date: Thu, 17 May 2012 08:36:04 -0700 (PDT)
as

  Date: 1337268964
because no one needs to look at mail headers and it should be the job of the user's mail reader to display it in a human readable form? I do not think it is so black and white. In the case of mail headers, the benefit of keeping the timestamps human readable outweighs the programmer cost for parsing them correctly.


"no one needs to look at mail headers and it should be the job of the user's mail reader to display it in a human readable form".

Exactly...

Who reads email headers? Maybe some server admin who's going through some backups of emails looking for something. He might even appreciate having a date format that's lexicographically ordered for his searching purposes.

Having a human readable date is like changing HTTP Content-Length to say "one million four hundred and fifty thousand and sixty four" so that when I'm reading through my raw server logs I can easily see the magnitude of the length of the responses.


I prefer epoch time for all the reasons you mentioned, but there are situations where you need something else. For example, you can't represent a holiday in epoch time (July 4th has four different start times in the lower 48), and if you need to do precise calculations epoch time is ambiguous around leap seconds.

I wish there were something simpler than 3339/8601; it really is a daunting mess of incompatible implementations and optional elements, but time is hard.


Reliability and lack of information. The problem with a "format" such as timestamps is that there are no defined parameters, especially across languages. I cannot rely on a timestamp sent in seconds being read as seconds on the other side (for example JS does not use seconds, it uses ms), and there is no way to tell one from the other, either. I cannot look at a ms timestamp and say "this is in ms" versus a second-based timestamp. It also lacks information, such as tz, and is of a fixed specificity. What happens when I want higher-resolution time due to customer feedback? Go back and change the whole system to use ms instead of s.

ISO8601 defines its parameters within the body. I know I am parsing seconds from an ISO stamp because it is appended with an 'S'. It can support variable specificity, so I don't have to be second-accurate if it isn't needed.


Unix time (which is UTC, and expressed in seconds) does not embed a time zone, which is useful information in some contexts.

For example, with e-mail you want to know both the absolute time (so the mail client can indicate how long ago that was) and the sender's local time (important for human interaction).

Implementation-wise, making the timezone explicit forces the implementer to think about it at least once and not make the mistake of writing a local time, which works while testing on systems that are in the same timezone and breaks down in the wild.


What do we do about date calculations that cross an epoch wrap?


Signed 64 bit seconds since epoch is enough.


Is the whole Internet on 64-bit time_t now? Will they all be by 2038? I'm guessing we won't know that answer until 2037...


The x32 ABI, which got merged in the newly released Linux 3.4, defines time_t to be 64-bit. I am starting to feel hopeful that we will be better prepared.


That's cool for Linux, going forward. Now we just have to take care of all the old mainframes and legacy systems that will never get their O/S updated...


We won't know until 1901 ...

   >>> import datetime
   >>> datetime.datetime.utcfromtimestamp(-2**31)
   datetime.datetime(1901, 12, 13, 20, 45, 52)


Since nobody else has mentioned this, an interesting essay on time by Erik Naggum; it was written with a view to implementing a time library for Common Lisp, but should be readable for non-Lisp users:

http://naggum.no/lugm-time.html



