
ISO-8601, YYYY, yyyy, and why your year may be wrong - ingve
https://ericasadun.com/2018/12/25/iso-8601-yyyy-yyyy-and-why-your-year-may-be-wrong/
======
kstenerud
It probably would have been better to use entirely different letters (xxxx
instead of YYYY) to reduce sources of human error.

Notwithstanding these small issues, iso8601 is a godsend, but even with this
spec it's amazing how many times we just get it plain wrong dealing with time.
Time is hard! It gets even worse in binary formats, dealing with leap second
tables, time zones, daylight savings, different epochs etc, which is why I
developed the smalltime [1] format.

[1]
[https://github.com/kstenerud/smalltime](https://github.com/kstenerud/smalltime)

~~~
gpvos
The Twitter bug in December 2014 was caused by using %G instead of %Y. So that
won't help much. G comes before Y alphabetically, as does almost any other
letter...
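
For anyone who wants to see the two years disagree, here is a minimal Python sketch. It uses `date.isocalendar()` for the ISO week-based year (what %G prints) and `.year` for the calendar year (what %Y prints). The sample date is only an illustration, picked because it is the Monday that opens ISO week 1 of 2015.

```python
from datetime import date

# Monday 29 December 2014 already belongs to ISO week 1 of 2015.
d = date(2014, 12, 29)

calendar_year = d.year           # the value %Y would print
iso_year = d.isocalendar()[0]    # the value %G would print
iso_week = d.isocalendar()[1]    # the value %V would print

print(calendar_year, iso_year, iso_week)  # 2014 2015 1
```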

~~~
userbinator
_G comes before Y alphabetically_

That's irrelevant. There's a comment I made on a discussion about this a while
ago:

[https://news.ycombinator.com/item?id=17059958](https://news.ycombinator.com/item?id=17059958)

It's sad that most people seem to put the blame on everything else _except_
the developer --- who was simply not exercising any common sense nor thinking
critically. There's a very good reason for how the format specifiers were
assigned, and anyone who doesn't notice the pattern (and surprising deviations
from it) has no one to blame but his/herself.

~~~
acqq
But what is relevant is that the manpage explanation is still "precise but
misleading":

[http://manpages.ubuntu.com/manpages/cosmic/man3/strftime.3.h...](http://manpages.ubuntu.com/manpages/cosmic/man3/strftime.3.html)

"The ISO 8601 week-based year (see NOTES) with century as a decimal number.
The 4-digit year corresponding to the ISO week number (see %V). "

It's still misleading in the sense: "do we want ISO standardized year? Hell
yeah! We've found what we want."

It induces all the wrong reflexes: is ISO datetime "the name of the _most
standard_ date format that can't be read wrongly? Yes. (
[https://en.wikipedia.org/wiki/ISO_8601](https://en.wikipedia.org/wiki/ISO_8601)
) Is it "ISO 8601"? Yes. Do we get "four digit" "decimal" year? Yes. Hm...
it's "week-based"? "Well every year has weeks, so I guess it's all right." Has
references to some other part of documentation, "that means that the details
probably don't matter much here." Etc.

The proper documentation would be: "%G (short for "wronG in most use cases")
returns the "week-number-based-otherwise-wronG-year" used only for the special
number-of-weeks-based calendar representations, according to the ISO 8601
rules for such representations, and it only by accident sometimes looks like
the common calendar year. Should not be used unless it's specifically needed
to produce such a number-of-week-based calendar. For the details about these
calendars see (the reference)."

In the latter case the "don't touch this unless you know you need exactly this"
is explicit. It even gives a good mnemotechnic device for remembering the
"wrongness" of using it in most of the cases. Good documentation is really
important, even if the "traditional" -n-x users feel some kind of satisfaction
in having the most misleading or useless documentation, to the point that when
somebody asks some specific question they point to the very man pages which
exactly don't answer that exact question (there are many such examples on the
web). And I've personally seen exactly these same self-satisfied programmers
behaving just like the users they subconsciously (or consciously) mock, that
is, being equally clueless, once they are in front of any other, probably even
a bit better manual, but for the topics they are not familiar with. Like
setting up a darned printer.

Being user hostile is never a virtue, and never something that should be
supported or explained away.

~~~
userbinator
You are omitting the requirement of common sense, or should I say discouraging
it, and that is a dangerous path to go down. Almost all of the letters are
strongly mnemonic. Someone who does not even think about what %y or %Y would
be, after reading %G and thinking it correct, is going to have trouble with a
lot of other things too.

~~~
acqq
> Someone who does not even think about what %y or %Y would be, after reading
> %G and thinking it correct, is going to have trouble with a lot of other
> things too.

I'm surely not questioning that specific claim that you make now. However, my
major claim is:

Everybody is stupid, if he's not working in the area of his narrow specialty,
and more than that, even those who are working in the areas familiar to them
will not always have ideal circumstances in which they will use the manuals or
the APIs. Therefore designing anything only for those who have infinite time
and concentration to use your product is inherently wrong.

Specifically, I can imagine a person who, under ideal conditions, would spend
the necessary days learning all the details of those formats and date use
cases, but who in other circumstances has to produce some result fast or while
distracted. The other people in charge of confirming what that person produced
can also fail to recognize the error, to the point that the subtly wrong
implementation is never properly tested. This demonstrably happens in
practice. And I have also seen people capable of learning and remembering a
huge number of (for me) unnecessary switches for commands x, y, z, x1, y1, z1
etc. who, in some other, not too stressful situation, were unable to install
the mentioned printer on the same OS.

In the same sense, the writer of manual who spends days learning about all the
features that he documents should not assume that his readers will spend the
same amount of time or will have necessary conditions to figure out all
nuances that were "obvious" to the writer at that specific moment. In fact the
similar forces that produce buggy code are the ones that produce poor
documentation.

So we should not try to excuse both bad code and bad documentation, but
instead support the "empathy" for the possible less-than-ideal conditions of
our users.

And I claim that the documentation obviously produced without understanding
what most of the users need to be able to easily read from it is bad, and that
it should be recognized as such.

------
marcosdumay
It would be nice if anywhere at the beginning of the post or around it the
author said it was about swift, or that the blog is focused on it.

I understand other languages may have similar problems, but they will
certainly have very different formats.

~~~
wool_gather
> they will certainly have very different formats

Not necessarily; as the article discussed, the format characters are part of a
Unicode standard. Apple/Swift Foundation uses the ICU library[0] for the heavy
lifting of date formatting, which is certainly widely available (by design).

[0]:[http://site.icu-project.org/](http://site.icu-project.org/)

~~~
marcosdumay
That reference seems to be offline at the moment...

I don't think it's widely used. It is great that some standards body took that
problem into their scope, but it may take a long while until most languages
agree on anything here.

~~~
tedunangst
Yeah, what crazy obscure languages expose a binding to strftime???

------
pavlov
IMO there’s no valid reason for date parsers to accept YYYY by default.
There’s no sensible use case where you’d want to mix week-based dates with
month-based ones, and the latter type is way more common.

So why not have a separate “WeekOfYearDateFormatter” subclass for the rare use
case? The default class could then explicitly fail when you’re accidentally
using YYYY in a format string, saving you from a weird end-of-year bug that
might spoil your holiday.

~~~
ivan_gammel
It would reduce the value of a pattern-based API, whose whole idea is that you
can specify any format with a combination of special characters. But you are
right: this case is so special that it would make sense to protect the
majority of developers from misuse. If a builder pattern is used for date
formatters, then it could throw an exception unless a certain flag is also set:

    
    
        var formatter = format("ww-e-YYYY")
                        .withWoYCalculations()
                        .build();
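
The same opt-in idea can be sketched in Python as a guard that runs before a pattern reaches the formatter. Everything here is hypothetical: `checked_pattern` and its `allow_week_year` flag are made-up names, not part of any real date library, and the check is deliberately crude (it would also flag a 'Y' inside quoted literal text).

```python
def checked_pattern(pattern: str, allow_week_year: bool = False) -> str:
    """Return the pattern unchanged, or raise if it uses 'Y' without opting in.

    Hypothetical helper for illustration only; it does not parse quoted
    literal sections of a pattern.
    """
    if "Y" in pattern and not allow_week_year:
        raise ValueError(
            f"pattern {pattern!r} uses the week-based year 'Y'; "
            "pass allow_week_year=True if that is intended"
        )
    return pattern


print(checked_pattern("yyyy-MM-dd"))                       # yyyy-MM-dd
print(checked_pattern("ww-e-YYYY", allow_week_year=True))  # ww-e-YYYY
try:
    checked_pattern("YYYY-MM-dd")
except ValueError as exc:
    print("rejected:", exc)
```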

~~~
naniwaduni
One of the big benefits of using these APIs is often that you implicitly
accept a greater range of user formats than you anticipated.

------
Groxx
Most people, for their _entire_ career, should literally never write out a
date/time-format string. Even once. If you want ISO8601, _use that constant_ ,
don't write it out.

Use pre-defined formats, like 8601, 3339, Long, Short, etc. Or datetime
skeletons if your system supports it and you MUST do something non-standard,
and even then do a day or three of research before typing the first letter.
Basically nothing else is even remotely acceptable for internationalization
(and whatever the "intercalendarization" equivalent would be) and stands a
good chance to get even the _extreme basics_ like "yyyy" wrong.

~~~
paulryanrogers
IIRC, ISO8601 isn't a single constant, fixed format. It's a variety of
optional representations that generally follow the rule least-to-most
specific. For example 19991231 would be as valid as 1999-12-31.

~~~
Groxx
Yeah, good point on 8601. But that's _even more_ of a reason to use a pre-
defined 8601-parser/printer/whatever instead of a hand-written format string.
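
As one illustration of "use the pre-defined printer/parser", Python's standard library can emit and read back an ISO 8601 timestamp without any format string. This is only a sketch with an arbitrary sample timestamp; `fromisoformat()` is guaranteed to read what `isoformat()` writes, not every representation ISO 8601 allows.

```python
from datetime import datetime, timezone

moment = datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

text = moment.isoformat()               # printer: no format string involved
parsed = datetime.fromisoformat(text)   # parser for what isoformat() emits

print(text)              # 1999-12-31T23:59:59+00:00
print(parsed == moment)  # True
```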

If you're using 8601 for an interchange format... don't. But if you're forced
to emit a hard-coded format for whatever consumer can't be forced to do things
sanely, this would be one of those exceptions to "most people". And it should
immediately stop after that exception.

------
pjungwir
In Postgres `extract(week from t)` has a similar danger. If you combine it
with `extract(year from t)` then you land in the wrong place. You need to
combine it with `extract(isoyear from t)` instead. That bug is sort of the
opposite of this one: instead of parsing a date and using the ISO year by
mistake, it's formatting a date and omitting the ISO year by mistake.
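
The same trap can be reproduced outside Postgres. A small Python sketch (the sample date is arbitrary; `date.fromisocalendar` needs Python 3.8 or later) shows how pairing the ISO week with the calendar year lands almost a year away:

```python
from datetime import date

d = date(2018, 12, 31)                               # a Monday
iso_year, iso_week, iso_weekday = d.isocalendar()    # 2019, 1, 1

# Wrong: ISO week number combined with the calendar year.
wrong = date.fromisocalendar(d.year, iso_week, iso_weekday)
# Right: ISO week number combined with the ISO year.
right = date.fromisocalendar(iso_year, iso_week, iso_weekday)

print(wrong)  # 2018-01-01
print(right)  # 2018-12-31
```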

~~~
naniwaduni
This seems substantially less likely to be a problem since

(a) G-V-u is relatively rare compared to Y-m-d

(b) The choice-of-year problem is inherent to week-based dates, but not to
month-based dates, since years are not week-aligned, so knowing the difference
is table stakes for implementors of week-based timestamps

Now, "isoyear" seems like an awful name in various ways, not least of which is
people seeing "iso" and assuming that this is what they really want (i.e. the
same problem all over again)...

------
hafthor
Java:

new java.text.SimpleDateFormat("YYYY-MM-dd").parse("2018-12-30") returns Sun
Dec 31 2017

new java.text.SimpleDateFormat("YYYY-MM-dd").format(new
java.util.Date("12/30/2018")) returns 2019-12-30

java.time from Java8 also affected.

java.time.format.DateTimeFormatter.ofPattern("YYYY-MM-dd").format(java.time.LocalDate.parse("2018-12-30"))
returns 2019-12-30
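
Note that 30 December 2018 is still in ISO year 2018, so the 2019 above is not the ISO rule. It is consistent with a US-style week definition, where weeks start on Sunday and week 1 is the week containing 1 January. The Python sketch below reimplements that rule by hand, under the assumption that this is what the formatter's locale applied; `us_style_week_year` is a made-up helper name.

```python
from datetime import date, timedelta


def us_style_week_year(d: date) -> int:
    """Week-based year when weeks start on Sunday and week 1 contains 1 January.

    Assumed rule, for illustration: under it, a week belongs to the year in
    which its Saturday falls.
    """
    days_since_sunday = (d.weekday() + 1) % 7   # weekday(): Monday=0 ... Sunday=6
    saturday = d - timedelta(days=days_since_sunday) + timedelta(days=6)
    return saturday.year


d = date(2018, 12, 30)          # a Sunday
print(us_style_week_year(d))    # 2019  (matches the Java output above)
print(d.isocalendar()[0])       # 2018  (ISO week-based year)
print(d.year)                   # 2018  (calendar year)
```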

~~~
adamvoncorswant
Isn't it fun that while we're all discussing 'yyyy' vs 'YYYY', it seems like
we're missing out that what everyone actually should use most of the time is
rather 'uuuu'?

~~~
lstamour
uuuu appears to be Java-specific:
[https://stackoverflow.com/questions/41177442/uuuu-versus-yyy...](https://stackoverflow.com/questions/41177442/uuuu-versus-yyyy-in-datetimeformatter-formatting-pattern-codes-in-java)

~~~
adamvoncorswant
Yes, indeed so. That's why I answered in the java thread. Sorry for not making
that clear.

------
monochromatic
Some additional stories I enjoyed about this “feature” (which I’d never heard
of):

[https://rachelbythebay.com/w/2018/04/20/iso/](https://rachelbythebay.com/w/2018/04/20/iso/)

[https://rachelbythebay.com/w/2018/05/13/dates/](https://rachelbythebay.com/w/2018/05/13/dates/)

------
sam_goody
This appears to be about the DateFormatter class in Swift.

However, it states there that YY is part of the Unicode standard, so I imagine
it might affect other languages as well.

~~~
saagarjha
Foundation (where DateFormatter is defined) does use Unicode’s standard, so
this should be language agnostic.

~~~
ubernostrum
A lot of languages take their date/time formatting from C strftime (and quite
a few simply use light wrappers around actual strftime), where the format code
for ISO year is %G.

And FWIW, Python's (strftime-based) datetime library won't let you mix ISO and
non-ISO format codes. Trying to use %G with %m, for example, raises an
exception, as does trying to use %Y with %V (%V is the ISO week number format
code).

~~~
ubernostrum
And just to clarify a bit: the specific restriction Python imposes is that if
a strptime() format string contains one of %G (ISO year) or %V (ISO week
number), it must also contain the other one, and must contain a day-of-week
format code (%A, %a, %u, or %w).

Examples:

'%G/%m' is illegal; it contains %G without %V, and does not contain a weekday
format code. Attempting to call strptime() with this format raises ValueError.

'%V/%u' is illegal; it contains a weekday format, but has %V without %G.
Raises ValueError.

'%G/%V' is illegal; it contains both %G and %V, but does not contain a weekday
format code. Raises ValueError.

'%G/%V/%u' is legal; it contains both %G and %V, and contains a weekday format
code.

'%G/%V/%w' is legal; it contains %G and %V and a weekday format code. It's a
bad idea, though, because %w numbers days 0-6 starting Sunday, while ISO (%u)
numbers them 1-7 starting Monday.

If you need to work with ISO week date formats for some reason, you should
stick to one of these two format strings:

'%G-W%V-%u'

or

'%GW%V%u'

The date of this comment (December 26, 2018) comes out as either '2018-W52-3'
or '2018W523' using those format strings.
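
These rules are easy to check for yourself. The sketch below wraps `datetime.strptime()` in a small helper (`parses` is just a name made up for this example) and tries the format strings listed above; the input strings are sample values chosen to match each format.

```python
from datetime import datetime


def parses(text: str, fmt: str) -> bool:
    """Return True if strptime() accepts the text/format pair."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


print(parses("2018/12", "%G/%m"))        # False: %G without %V and weekday
print(parses("52/3", "%V/%u"))           # False: %V without %G
print(parses("2018/52", "%G/%V"))        # False: no weekday code
print(parses("2018/52/3", "%G/%V/%u"))   # True

print(datetime.strptime("2018-W52-3", "%G-W%V-%u"))  # 2018-12-26 00:00:00
```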

~~~
bonzini
What is the rationale for forcing the presence of the day of week? It seems
plausible that a weekly report, generated every Sunday for the previous week,
would have %GW%V as the title. Seems more correct than using %V together with
%w at least.

~~~
ubernostrum
I don't know for certain, but what I would guess is that strptime() without a
day-of-week indicator is ambiguous.

strptime() produces a datetime object, which consists of year, month, day,
hour, minute, second, microsecond, time zone, fold. If you do something like
"2018-12" with format "%Y-%m", strptime() will fill in the remaining arguments
with day=1 and all time components set to zero, so what you get is
datetime(year=2018, month=12, day=1, hour=0, minute=0, second=0,
microsecond=0).

That works because it's unambiguous -- there aren't multiple possible
numbering schemes for the day of the month in the strptime() formatting
options.

But there _are_ multiple possible numbering schemes for the day of the week,
which means a year + week with no day-of-week format code is ambiguous. Worse,
the two options don't even share a start: one of them begins numbering at 0
(Sunday) and the other at 1 (Monday).

So I'd guess the insistence on a day-of-week format code is to force you to
indicate which day-numbering scheme you want, in order to avoid the possible
ambiguity.

(and you might think it's reasonable to assume if someone uses ISO year + ISO
week number, they'd also want ISO day-of-week number, but we're talking about
dates and times here, and "reasonable" left the building a long time ago)
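
A short sketch of both points, using sample values only: the defaults strptime() fills in for a year + month, and the two weekday numberings disagreeing about Sunday (7 under the ISO scheme, 0 under the Sunday-first scheme).

```python
from datetime import date, datetime

# Missing fields get defaults: day=1 and a midnight time.
print(datetime.strptime("2018-12", "%Y-%m"))   # 2018-12-01 00:00:00

sunday = date(2018, 12, 30)
print(sunday.isoweekday())        # 7  (ISO numbering, as with %u)
print(sunday.isoweekday() % 7)    # 0  (Sunday-first numbering, as with %w)

# The same Sunday needs a different digit depending on the weekday code.
a = datetime.strptime("2018/52/7", "%G/%V/%u")
b = datetime.strptime("2018/52/0", "%G/%V/%w")
print(a == b, a.date())           # True 2018-12-30
```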

------
scottlamb
How hard would it be to make a liner for the common cases if this? I'd expect
most format strings to be literals (no concatenation or anything), passed to
the formatting API in a way that can be easily statically determined. Any use
of YYYY without we should be assumed to be a mistake.

It can also be detected at runtime. Debug builds could warn it crash.

Future APIs just shouldn't have YYYY mean this. Use another letter.
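
A very rough sketch of such a static check, in Python. It assumes format strings appear as double-quoted literals on a single line and treats any literal containing YYYY but no week field ('w') as suspect; `find_suspect_patterns` and the `fmt(...)` calls in the sample are made up for illustration, and a real linter would work on the language's syntax tree instead of a regex.

```python
import re

# Naive: double-quoted string literals that do not span lines.
LITERAL = re.compile(r'"([^"\n]*)"')


def find_suspect_patterns(source: str) -> list:
    """Return (line number, literal) pairs that use YYYY without a week field."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for literal in LITERAL.findall(line):
            if "YYYY" in literal and "w" not in literal:
                hits.append((lineno, literal))
    return hits


sample = '\n'.join([
    'a = fmt("YYYY-MM-dd")',   # suspect
    'b = fmt("ww-e-YYYY")',    # week-based on purpose
    'c = fmt("yyyy-MM-dd")',   # fine
])
print(find_suspect_patterns(sample))  # [(1, 'YYYY-MM-dd')]
```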

~~~
scottlamb
Bad autocorrect / typos:

    
    
       * s/liner/linter/
       * s/we/ww/
       * s/warn it crash/warn or crash/
    

Sorry; I typed this comment on my phone in a hurry, then didn't look at it
again until I could no longer edit it.

------
dmitrygr
As always, the 2-part "Falsehoods programmers believe about time" is a good
read:

[https://infiniteundo.com/post/25326999628/falsehoods-program...](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time)

[https://infiniteundo.com/post/25509354022/more-falsehoods-pr...](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time)

------
tnorthcutt
A little amusing that PHP, everyone’s favorite whipping language, eliminates
this particular footgun (it uses ‘o’ for the week-based year).

~~~
rachelbythebay
I’ve seen it done. Had to patch a few, too.

~~~
saagarjha
‘o’ comes first alphabetically, so I can see how this can still be an issue.

------
glitchc
Maybe YYYY should be changed to IIII minimizing confusion.

Related article (heavily discussed on HN in the past):

[http://rachelbythebay.com/w/2018/04/20/iso/](http://rachelbythebay.com/w/2018/04/20/iso/)

------
sneak
Seems like a good candidate for a GitHub sitewide search, with (manually
filed) issues against improper uses (maybe a cut and paste) to alert callers
who are using it incorrectly.

------
estebank
This is one of those bugs that scare me the most, because they are literally a
time bomb.

Twitter was affected by this a few years back
[https://www.google.com/amp/s/amp.theguardian.com/technology/...](https://www.google.com/amp/s/amp.theguardian.com/technology/2014/dec/29/twitter-2015-date-bug)

~~~
macintux
Agreed. Calendar/time-related bugs are brutally hard to test for, one reason I
was so pessimistic about Y2K.

------
jimmychangas
Software i18n is still hard to do, even in 2018.

Not completely related to the post, but sometime ago I had to build a calendar
widget, in javascript, capable of displaying ISO and Hijri dates
simultaneously. It turns out the most reliable way to convert between the two
is to use a lookup table, similar to what Java Time does. Algorithmic
implementations available started to drift in odd ways after a certain number
of days.

------
Waterluvian
Are these date format strings part of ISO-8601 or standardized in some way?
They look pretty much identical in JS and Python.

If so, maybe a big help in preventing human error are editor plugins that
verbalize what a given string represents. I found the regex websites that do
this to be invaluable in learning and validating regex.

~~~
saagarjha
> Are these date format strings part of ISO-8601 or standardized in some way?
> They look pretty much identical in JS and Python.

They’re part of Unicode.

> maybe a big help in preventing human error are editor plugins that verbalize
> what a given string represents

This is a good idea, but it’s easy to get this wrong. There are websites that
let you enter a format string and it will format the date for you as a
“preview”, and they often have a list of format specifiers that you can pick
from. The issue is that it’s easy to pick YYYY because it might end up coming
first in the list and have a description like “the year”, which makes it seem
no different than the one you’d want to use.

------
Pxtl
Case sensitivity is a mistake in every platform that embraces it. I will die
on that hill.

~~~
lmkg
I'll go ahead and voice some support. My position isn't as extreme, but I do
think that case-sensitivity is overused and should not be the default. There
are places where it makes sense and should be used, but not many compared to
how much it's used in the wild.

This is _especially_ true for programming languages. Case sensitivity means
that fooBar and FooBar can be valid identifiers in the same scope but refer to
different bindings. I see many ways where that can produce errors and very
few (possibly none) where it can help create well-structured code. If the
names shadow, clash, or override then the error cases become much easier to
see and diagnose.

Honestly one of my favorite tiny things about programming in Lisp is the
identifier rules. The caps insensitivity, plus '-' being available because
there's no infix operators, means that multi-word identifiers are easier to
type (no shift key) and IMHO more aesthetically pleasing than underscores or
snake case.

~~~
Pxtl
Yeah, I could see a language taking the position that referencing fooBar as
FooBar is an _error_ , and that level of case-sensitivity makes sense to me
(my name is Martin not martin) but allowing both Martin and martin to exist as
distinct identifiers is crazy.

On the subject of spaces, I do wonder why no language has used enough
punctuation to make spaces allowed in identifiers. I mean, most Algol-derived
languages could get away with it if they put punctuation between type keywords
and variables:

    
    
        private, my class: my instance = my factory function(my parameter one, my parameter two);

~~~
ken
Lisp allows spaces in identifiers, or any other character. You just have to
escape them:

    
    
        (defun |my + function| (x y) (+ x y))
        (|my + function| 2 2)
    

(Or maybe write a reader macro to deal with them in some other way.)

~~~
Pxtl
Hah, that's even uglier than the SQL "just put square brackets around it"
approach.

We're looking for something _nicer_ than underscores and hyphens.

~~~
andreareina
Being that whitespace is the primary way most languages separate tokens, yeah
putting whitespace inside tokens is going to be ugly.

------
emmelaich
Golang's time formatting seemed odd.

Now it looks like genius.

------
geocar
When designing a new API, it makes sense to consider this kind of thing: How
are people going to use it, and how are people going to screw it up? Kids
these days copy and paste everything so bugs tend to multiply if you (god
forbid) make a popular API. Date formatter APIs are opaque and anyone trying
to use one in the morning or before the 12th day of a month has to check the
documentation, so what do we actually gain with this abstraction?

There's a small number of formats you might need that it makes sense to try to
enumerate them:

* ISO8601 (and get it right, it's a comma not a dot)

* that weird ISO8601 variant that uses a dot

* kdb's .z.p

* DJB's TAI

* ISO/IEC 9899 asctime/ctime

* IEEE 1003.1 "ls" format

* Yankee-doodle format/other localised formats

* RFC1123/RFC7231

* RFC2109

* RFC822/RFC2822

* Fancy "X units since/until" relative time

Do you really need so many others that you should have "yyyy-MM-dd" anywhere
in your code? Each of these are trivial to construct from a struct tm/Date
object, that you'll end up with fewer bugs if you stop making up mini-
languages for dates and just do it directly. Oh and your code will be faster.
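
To make "just do it directly" concrete, here is a minimal Python sketch that builds two of the formats above without a pattern mini-language: an ISO 8601 calendar date assembled from the fields, and an RFC 2822 / HTTP-style date from the standard library's `email.utils.format_datetime`. The timestamp is an arbitrary sample.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

moment = datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

# ISO 8601 calendar date, assembled field by field.
iso_date = f"{moment.year:04d}-{moment.month:02d}-{moment.day:02d}"

# RFC 2822 / HTTP-style date; usegmt=True requires a UTC datetime.
http_date = format_datetime(moment, usegmt=True)

print(iso_date)   # 1999-12-31
print(http_date)  # Fri, 31 Dec 1999 23:59:59 GMT
```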

~~~
fanf2
Both comma and dot are allowed. My copy of ISO 8601-2004 says in section
4.2.2.4 “Representations with decimal fraction“:

<< If a decimal fraction is included, lower order time elements (if any) shall
be omitted and the decimal fraction shall be divided from the integer part by
the decimal sign specified in ISO 31-0, i.e. the comma [,] or full stop [.].
Of these, the comma is the preferred sign. >>

Also, RFC 3339 is useful to have as a specific standard profile of ISO 8601,
because 8601 comprises several formats with lots of options.

~~~
geocar
> Both comma and dot are allowed.

They _are_ distinct formats: You're not going to alternate them in your
output. You're going to output one or the other.

> Also, RFC 3339 is useful to have as a specific standard profile of ISO 8601,
> because 8601 comprises several formats with lots of options.

RFC3339 describes ISO8601 well enough. Is there something you think I missed?

~~~
fanf2
I didn't notice any glaring omissions, tho if you are including DJB's binary
format you should also include the NTP and PTP scales.

RFC 3339 doesn't include the ISO 8601 week calendar formats, nor does it
include the 8601 syntax for time periods. 3339 is a lot simpler than 8601.

------
robrichard
Formatting dates with a format string is an anti-pattern. Format strings are
too easy to get wrong and don't scale when you need to account for different
locales. JavaScript's Intl functions do this correctly by providing an api
that accepts a locale and options like weekday: narrow|short|long, year:
numeric|2-digit, etc.

------
giancarlostoro
This sounds to me like a really weird YYYY2k problem.

In all seriousness, that is weird. In JavaScript you can't even format a
datetime string as far as I've researched, the only way is to import some
third party library, otherwise you're concatenating a bunch of function calls
to piece together the string you want.

~~~
tatersolid
Perhaps not part of ECMA standards, but browsers/node have Intl.DateTimeFormat
as the standard library

[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DateTimeFormat)

~~~
giancarlostoro
I believe I've tried that approach before too, but fell short when my boss
wanted a specific format that just wasn't doable with that approach. I'm still
a little shocked that this isn't something part of ECMA somehow given how many
more other programming languages support using time formatted strings.

~~~
tatersolid
The whole point of Intl.DateTimeFormat is to _intentionally not_ give you the
specific format your boss wants, _because your boss is almost certainly wrong_
about how a date should be formatted in many cases.

You tell the API you want the date and the hours and minutes, but not seconds.
It localizes the result correctly for the visitor’s preferred culture
(language-country).

If you show someone from England “4/5/2019”, they will think it represents a
date 29 days after the one someone from the USA reads in the same string.
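
The ambiguity is easy to demonstrate with a short Python sketch that parses the same string month-first and day-first. By this calculation the two readings (5 April and 4 May) are 29 days apart.

```python
from datetime import datetime

text = "4/5/2019"
us_reading = datetime.strptime(text, "%m/%d/%Y").date()   # month first
uk_reading = datetime.strptime(text, "%d/%m/%Y").date()   # day first

print(us_reading)                       # 2019-04-05
print(uk_reading)                       # 2019-05-04
print((uk_reading - us_reading).days)   # 29
```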

------
Svip
Using 'e' in the 'ww-e-YYYY' example is pretty bad form, as it is dependent on
your local system. So on a US system, 1 in the 'e' field means Sunday, but in
say a German system, it would mean Monday. Although, I cannot figure out which
one would be system independent.

------
rzzzt
The same format string for "week years" was introduced in Java 7:
[http://www.juandebravo.com/2015/04/10/java-yyyy-date-format/](http://www.juandebravo.com/2015/04/10/java-yyyy-date-format/)

~~~
mehrdadn
I get how the year went _forward_ at the end of the previous year in your
example, but how in the world in this post's article did it move _back_? I'm
so confused.

~~~
rzzzt
I think that's the date parser's behavior in Swift (if no additional
information is given in the input string, revert to the first day of the week
before the first week of the year). The article I linked to displays what the
formatter does.

------
djhworld
Got bitten by this one, I can't remember the specific details other than the
bug was due to using YYYY in a system that parsed log lines.

The bug manifested itself around this time of the year, so not a great time to
be called out to find billions of log lines are being rejected by the system!

------
firethief
This website degrades unusually poorly with 3rd-party JS disabled. It looks
fine, and I was really confused until I realized the main content was in
images that had been omitted without any kind of placeholders.

------
imrehg
Seems like we crushed the site (I get a database error)

~~~
LeifCarrotson
[https://outline.com/xSadBb](https://outline.com/xSadBb)

TLDR: ww-e-YYYY gives you week number, day in week, and the ISO year in which
to count weeks. yyyy-MM-dd gives you the calendar year, month, and day. Using
YYYY when you mean yyyy gives you unexpected results.

YYYY-MM-dd unexpectedly - or expectedly, depending on what you expect from a
programming language and ISO spec combo - gives week zero (you didn't specify
ww), day zero (again you didn't specify e), which means it gives you the first
day of the last full week of the preceding year. 2019-1-1 parsed with
YYYY-MM-dd will return a Date of December 23rd, 2018.

