
Strftime's alpha-sorted man page vs. well-meaning people - zdw
http://rachelbythebay.com/w/2018/04/20/iso/
======
hyperman1
She raises a very valid criticism. The Java SimpleDateFormat has a similar
gotcha concerning %k vs %h ('What do you mean, my program only works in the
morning'?)
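
For readers who have not hit this one, here is a minimal sketch of the same 12-hour vs. 24-hour trap. It is written in Python rather than Java, so the directives are `%I` and `%H` instead of the SimpleDateFormat letters, and the sample timestamp is made up for illustration.

```python
from datetime import datetime

# Made-up sample time: half past three in the afternoon.
afternoon = datetime(2018, 4, 20, 15, 30)

# %H is the 24-hour clock; %I is the 12-hour clock and needs %p to be unambiguous.
as_24h = afternoon.strftime("%H:%M")
as_12h = afternoon.strftime("%I:%M")

# Parsing the 12-hour text back without an AM/PM marker silently lands in the morning.
round_trip = datetime.strptime(as_12h, "%I:%M")

print(as_24h)           # 15:30
print(as_12h)           # 03:30
print(round_trip.hour)  # 3
```

Everything looks fine until the first timestamp after noon goes through the round trip.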

I don't think her solution is right, however:

    
    
      A better fix is to get people away from using format strings altogether. 
    

What do you propose instead?

    
    
      have one person get it right, [...]  then ban all other attempts to use the strings directly
    

That's basically part of an in-house library for something simple. In my
experience, contractors either flat-out refuse to use it ('reinventing the
square wheel, har har har'), or don't read the docs and rewrite it using the
API. Then they extract common code from their application together with some
half-baked framework, and declare it a new in-house library. So when the next
contractor comes, he has one more reason to ignore the well thought out
library. In-house libraries work fine for complex problems, but large
collections of small utilities get rewritten by everyone.

No. Learn the details of your programming language, libraries and frameworks.
Even their gotchas. Read blogs, listen to coworkers, learn the idioms.

As a result, you don't store this knowledge in your current corner of some
random company. You store it in the ecosystem of the language. It will
influence future languages, and slowly the gotchas either disappear or become
near-universal standards.

And I guess that's exactly what this blog post helped to accomplish. Thanks,
Rachel ;-)

~~~
michaelt

      What do you propose instead?
    

Java provides a set of predefined date formatters in the standard library [1]
- like 'ISO_LOCAL_DATE_TIME' and 'RFC_1123_DATE_TIME'

Of course, they don't cover every possibility - if you want milliseconds,
you've got to roll your own :)
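
The same "use a predefined format" idea exists outside Java. As a rough sketch in Python (chosen only for illustration, with a made-up timestamp), the standard library can emit ISO 8601 and RFC 1123 style text without any format string:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Made-up sample instant, timezone-aware and in UTC.
moment = datetime(2018, 12, 31, 23, 59, 59, 123000, tzinfo=timezone.utc)

iso_text = moment.isoformat()                           # ISO 8601, full microseconds
iso_millis = moment.isoformat(timespec="milliseconds")  # ISO 8601, trimmed to milliseconds
rfc_text = format_datetime(moment, usegmt=True)         # RFC 1123 style, as used in HTTP headers

print(iso_text)    # 2018-12-31T23:59:59.123000+00:00
print(iso_millis)  # 2018-12-31T23:59:59.123+00:00
print(rfc_text)    # Mon, 31 Dec 2018 23:59:59 GMT
```

None of these calls involves a single-letter directive, so there is nothing to mistype.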

[1]
[https://docs.oracle.com/javase/8/docs/api/java/time/format/D...](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html)

~~~
hyperman1
Very true. Another example is the jQuery UI date picker, which stores nice
defaults in the i18n files.

I would classify these as storing the knowledge in the ecosystem.

------
userbinator
_Do you really want all of your programmers learning this lesson?_

IMHO yes, because if they don't pay attention to such things, they're going to
make even more subtle mistakes in the future.

Besides the explicit mention to "see NOTES", the fact that most of the format
specifiers are mnemonic for the common parts, like H M and S for Hour, Minute,
and Second, should be enough to arouse some suspicion when you see G being
used for "Year". They weren't assigned arbitrarily --- with the most mnemonic
ones being assigned first --- and presumably, people wanted to print years
very early in its history, so your next thought should be "then what did they
use Y or y for?" If you're really observant, noticing that months and days are
lowercase, you may also ask "what's g?" That will naturally lead you to the
right choice.

My point is, there's a pattern to these things, and the instinct to ask "why?"
and find out when something doesn't feel "right" is very important. More
abstractly, "reading requires thinking".

~~~
_vertigo
I don’t think that man pages being oriented in a more user friendly manner
will lead to programmers making mistakes in subtler areas.

You’re basically saying that programming even basic things should be hard so
that people learn to be thorough, but I think that people will have less time
to be thorough on the hard parts of programming if 1) even the easy parts
require navigating a needlessly complex mass of documentation and 2)
programmers have to spend time fixing all of the bugs caused each week by
needlessly complex documentation.

~~~
userbinator
I'm saying that the strftime() format specifiers for "basic things" are _not_
hard, since they're mnemonics, and anyone who doesn't find it the least bit
unusual to use %G instead of %Y when trying to print a year is not paying
attention. I do not think the documentation is "needlessly complex".

In fact, I just asked someone who _doesn't know programming at all_, after
giving her "%m %d %H %M %S" and saying it was a way of formatting a date, to
guess how to add a year in front. She guessed... %y. That's actually the two-
digit year, which is close enough --- the point is she didn't say %G or some
other letter. I'd almost be willing to bet that if you repeated this exercise
with many other non-programmers, you'd find the same result.

Then, why would someone be perfectly fine with using %G? To rebut the first
sentence of the article, "I have another tale of humans being humans, and
getting into trouble when using computer systems", this is not "humans being
humans"; it's closer to the "intelligent human losing all semblance of
critical thought when put in front of a computer" phenomenon, which certainly
deserves further investigation.

~~~
kamikaz1k
You should have asked this person to do the exercise while having the manual
in front of them. Perhaps they might have made the same mistake?

Regardless, your test was specifically to confirm your bias that the mnemonic
is enough to make a correct decision. There have been plenty of discussions
about people blindly following authority. Can you really not imagine someone
reading the entry and going "ah, that's weird... But ok"?

------
kbob
"RTFM" used to be a valid critique, and it's my immediate response to this
article.

But maybe I'm wrong. The M keeps getting bigger, and our reading time keeps
getting more fractured. Maybe it's okay to just throw code out without knowing
what it does.

Or maybe management/engineering culture needs to be clear on the external costs
of slapdash development and discourage breaking things fast.

~~~
oblio
> The M keeps getting bigger, and our reading time keeps getting more
> fractured.

Also the "M"s are generally really, really bad for any practical task. I'd
argue that many times they're also bad as references.

They're also not indexed or hyperlinked, there are no tables or any smarter form
of formatting. I know that there's also the info format but that's rarely used
and it's definitely not default, many distros don't even install the tools for
it...

It's a really thankless job to migrate documentation so nobody's going to do
it within our lifetimes, so we're stuck with man for the foreseeable future :(

------
fanf2
The irony of this bug is that immediately above %G is %F (short for %Y-%m-%d)
which is what they wanted...
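
To make the failure concrete, here is a small Python sketch. It assumes the platform's C library understands `%G` (glibc does), and it uses `date.isoformat()` as a stand-in for `%F`. The date is picked because it is a Monday that already belongs to ISO week 1 of the following year.

```python
from datetime import date

new_years_eve = date(2018, 12, 31)  # a Monday, already in ISO week 1 of 2019

buggy = new_years_eve.strftime("%G-%m-%d")    # week-based year glued to calendar month and day
correct = new_years_eve.strftime("%Y-%m-%d")  # calendar year
shorthand = new_years_eve.isoformat()         # same text as the correct form, no format string

print(buggy)      # 2019-12-31 (with glibc)
print(correct)    # 2018-12-31
print(shorthand)  # 2018-12-31
```

The two format strings agree on almost every day of the year, which is why the bug tends to show up only around New Year.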

------
oherrala
There are 7458 hits for simple code search for "%G-%m-%d" on GitHub:

[https://github.com/search?q=%22%25G-%25m-%25d%22&type=Code](https://github.com/search?q=%22%25G-%25m-%25d%22&type=Code)

Plenty of code to go through and fix.

~~~
EdiX
Almost all of them are from the documentation of flysystem which is some kind
of PHP abstraction layer for filesystems.

~~~
TomK32
Not for much longer I hope
[https://github.com/thephpleague/flysystem/pull/927](https://github.com/thephpleague/flysystem/pull/927)

------
yason
Well.

Programming, like life in general, is made out of imperfections. There is a
shitload of corner-cases in APIs both ancient and modern that you cannot
possibly "fix for all" or, alternatively, learn to avoid. In many cases, it
suffices to take that %G after a cursory glance, bump into some weird non-
continuous dates later, read the man page again in the heat of a WTF, and
change it to %F.

There are applications and cases where dates, and especially the exactness,
correctness, and coherency of dates, are hugely important.
Software for banking, accounting, train scheduling, or flying come to mind. In
these cases the standard library behaviour can be tested for expected values,
or rather, a custom and audited library be written by the team.

But for the majority of programs, things like little glitches in dates,
compatibility of encoding conventions in file names and paths, floating point
numbers used as decimals etc. will be of little significance. It takes a lot
of these to align to unlock one big disaster, and while it is good to strive
for perfection there's always a cost to all the little things. If the failures
are manageable it is often more practical to just wait for those that will
eventually happen.

------
kazinator
The issue raised in this article is extremely tenuous.

The "Linux Programmer's Manual" man page from the Linux documentation project
has it as:

    
    
           %G     The ISO 8601 week-based year (see NOTES) with century as a deci‐
                  mal number.  The 4-digit year corresponding to the ISO week num‐
                  ber (see %V).  This has the same format and value as %Y,  except
                  that  if  the  ISO  week  number belongs to the previous or next
                  year, that year is used instead. (TZ)
    

This clearly says "%G is some ISO nonsense you don't want to be using".

But the "week-based year" alone should alert you that something is wrong.
Because, like, the year that you know is not week-based! There aren't exactly
52 weeks in a year: 365 isn't divisible by 7, and if it were, then 366
wouldn't be.

A (see NOTES) is also alarming. Why do you have to see some notes about
planting the year part of the date into a simple date string?

The glibc manual has it here:
[https://www.gnu.org/software/libc/manual/html_node/Formattin...](https://www.gnu.org/software/libc/manual/html_node/Formatting-Calendar-Time.html#Formatting-Calendar-Time)
It also makes it perfectly clear that this is something weird:

 _" This has the same format and value as %y, except that if the ISO week
number (see %V) belongs to the previous or next year, that year is used
instead. This format was first standardized by ISO C99 and by POSIX.1-2001."_

 _Three_ warning labels here: ISO, C99 and POSIX 2001!

C99 itself gives it like this.

 _%G is replaced by the week-based year (see below) as a decimal number (e.g.,
1997). [tm_year, tm_wday, tm_yday]_

If you don't "see below", that is your problem, just like ignoring "(see
NOTES)". The "see below" paragraph has this:

 _%g, %G, and %V give values according to the ISO 8601 week-based year. In
this system, weeks begin on a Monday and week 1 of the year is the week that
includes January 4th, which is also the week that includes the first Thursday
of the year, and is also the first week that contains at least four days in
the year. [... more explanation]_
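
The rule quoted above can be worked through by hand. The following is a minimal Python sketch of the "first Thursday" formulation, not the code any libc actually uses. `iso_week_year` is a hypothetical helper name, and the result is cross-checked against Python's own `date.isocalendar()`.

```python
from datetime import date, timedelta

def iso_week_year(day):
    """Return (week-based year, week number) using the 'first Thursday' rule."""
    # weekday() is 0 for Monday, so this moves to the Thursday of the same Monday-based week.
    thursday = day + timedelta(days=3 - day.weekday())
    # The week belongs to whichever calendar year contains its Thursday.
    year = thursday.year
    # Count whole weeks between January 1st and that Thursday, then add one.
    week = (thursday - date(year, 1, 1)).days // 7 + 1
    return year, week

# Cross-check against the standard library over roughly thirty years of days.
start = date(2000, 1, 1)
mismatches = []
for offset in range(366 * 30):
    day = start + timedelta(days=offset)
    if iso_week_year(day) != tuple(day.isocalendar()[:2]):
        mismatches.append(day)

print(iso_week_year(date(2018, 12, 31)))  # (2019, 1)
print(iso_week_year(date(2016, 1, 1)))    # (2015, 53)
print(mismatches)                         # []
```

The two printed dates are the interesting cases: one falls in the next week-based year, the other in the previous one.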

~~~
wffurr
Those are not obvious "warning labels". Standards usually sound like a _good_
thing.

Both comments bury the lede. The bit that alludes to but does not actually
state "this does not do what you want" is at the _end_ of the description, not
the beginning where it might actually warn someone off.

"You didn't read and understand the whole documentation for this complex
feature" is not making it easy to do the right thing. It's just fodder for
people "in the know" to make fun of normies.

~~~
kazinator
> _You didn't read and understand the whole documentation for this complex
> feature_

Formatting a year's date into a string doesn't require a complex feature; the
"complex feature" part is a warning itself. The date object (struct tm) has a
year field that is a simple integer (almost: it's years since 1900). Sticking
that into a string in the format "2018" is trivial. The word "weeks" doesn't
enter into it and we don't need ISO to figure it out.

Also, you have to be super naive to believe that G was used for "year" instead
of the letter Y, and then _not even wonder_ out of curiosity that, if this is
how you're supposed to obtain the year, why Y was not available and what the
heck Y _is_ being used for. The software field requires people who are curious
like this and read documentation critically (as they do code).

Why would someone invent a date formatting printf thing in which %H is hour,
but year is %G? Why would they forget to add the year feature, then use Y for
something else, and add year support under G in some new standard? Of course,
any nonsense lettering is due to historic extension when letters are running
out. Why would the basic feature of generating year part of a date string be a
historic extension?

> _Standards usually sound like a good thing._

The first standard for a programming language or API is often good; then it is
usually severely screwed in subsequent revisions. This is massively evident in
C, C++, POSIX, Unicode, WWW, IP, USB, Ethernet, ...

~~~
wffurr
The whole point, which you are missing, is to make the API simple and obvious
for a naive reader, which is most of us. Not everyone is as immersed in Unix
and C culture as you clearly are.

~~~
kazinator
> _make the API simple and obvious_

Note that _make the API_ is a very distinct activity from _document the API_ ,
which is what is being criticized here.

A better API would use long identifiers for all weird stuff like %{iso-week-
based-year} instead of %G.

> _Unix and C culture_

I'm confident I could properly understand an above-par engineering document
written in plain English, such as what is being criticized here, from any
branch of computing.

You can't pin this on Unix and C culture.

Those man pages are documenting this weird ISO "week-based year" about as well
as can be. If you read documents in such a manner that you drop random
adjectives (such that "week-based year" looks like "year" to your eyes) and
ignore anything in parentheses as being optional such as "(see ...)" notices,
that's a comprehension problem.

If the descriptions were not sorted alphabetically, that would also be
criticized as unfriendly. Users would have to exhaustively scan that section
when trying to decipher a format string that someone else wrote.

------
NelsonMinar
This exact feature trap killed Twitter Android several years ago:
[https://www.theguardian.com/technology/2014/dec/29/twitter-2...](https://www.theguardian.com/technology/2014/dec/29/twitter-2015-date-bug)

------
corrigible
So, people don't always pay attention to what they're reading?... strftime(3)
says[0]:

"This has the same format and value as %Y, except that if the ISO week number
belongs to the previous or next year, that year is used instead. (TZ)
(Calculated from tm_year, tm_yday, and tm_wday.)"

[0]: [http://man7.org/linux/man-pages/man3/strftime.3.html](http://man7.org/linux/man-pages/man3/strftime.3.html)

~~~
paultopia
The problem is that this text makes no sense standing alone... why would
anyone imagine that a week number might belong to a year other than the year
it's in without a separate detailed explanation?

~~~
samatman
> This has the same format and value as %Y

When I see a line like that, I check %Y immediately.

Unless I'm tired, in a hurry, or just not paying enough attention.

~~~
gpvos
...which happens to many people once in a while...

------
craigds
What on earth is %G even for? In what situation would that _ever_ be useful?

~~~
hyperman1
You should never use it in a day-month-year situation, but it is very useful
in a weekofyear-year situation commonly encountered in bookkeeping.

You have to decide what to do with the week that gets broken up when a new
year starts, and the ISO standard makes sure everybody uses the same week
numbers.
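
As an illustration of the week-of-year case, here is a short Python sketch. `iso_week_label` is a hypothetical helper, and the `YYYY-Www` output shape is the ISO 8601 week notation. It also shows the mirror image of the bug: pairing the calendar year with the ISO week number.

```python
from datetime import date

def iso_week_label(day):
    """Format a date as an ISO 8601 week label such as '2019-W01'."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"

monday = date(2018, 12, 31)

right = iso_week_label(monday)
# Mixing the calendar year with the ISO week number labels the wrong week.
wrong = f"{monday.year}-W{monday.isocalendar()[1]:02d}"

print(right)                               # 2019-W01
print(wrong)                               # 2018-W01
print(iso_week_label(date(2018, 12, 30)))  # 2018-W52
```

So the week-based year has to travel together with the week number, and the calendar year with month and day.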

I personally experienced how this mattered when someone in our company was
unaware of ISO and decided to invent another week scheme as the new standard.
First came the people who complained Outlook numbered them differently. Then
the complaints that all other companies delivered using another system. Then
it turns out data transfers from the government didn't match either. Finally
one guy pops up and sends the link to ISO in wikipedia.

Our whole 1000 man part of the org was in CC of the original mail, and
naturally we received all reply complaints too. After a very noisy few days in
the inbox, all of us knew all about ISO weeks and why they matter. Training
couldn't be better.

------
zorkw4rg
you might call this a gotcha of this obscure standard called UNIX/POSIX,
although...

> %G The ISO 8601 week-based year (see NOTES) with century as a decimal
> number. The 4-digit year corresponding to the ISO week number (see %V). This
> has the same format and value as %Y, except that if the ISO week number
> belongs to the previous or next year, that year is used instead. (TZ)

I think that's pretty clear, so it's actually just a case of not reading /
bothering to understand the documentation. I think if you really want to
bulletproof a system against incompetency we should probably not start with C.

~~~
kazinator
I don't think Rachel is incompetent. She had a brain-slip and is trying to
blame it on some external factors.

"I can't explain why I didn't understand this simple, clear piece of
documentation. But, aha, if the sections weren't sorted such that it came
first before the common thing I was looking for, this wouldn't have happened."

It's just grasping at straws.

There is no way for a documentation writer to predict who will have a
momentary brain lapse, reading what passage.

Putting something more toward the back of the document isn't a guarantee that
someone won't trip up on it.

~~~
rachelbythebay
It was never about me. I didn’t make this error. I did, however, fix it and
added steps to make it not happen again.

------
jwilk
Timezone abbreviations (like _PDT_ ) are neither human- nor machine-readable.
Avoid them.

~~~
rocqua
How to represent the CEST timezone then? Writing "Central European Summer
Time" in full is very big, but UTC+2:00 is actually something else than CEST.

I think standard abbreviations are a nice middle ground, at least when a time-
zone is needed. Otherwise, just report stuff in UTC.

~~~
andrewaylett
You should probably use the time zone identifier from the IANA list, which
would be 'Europe/Paris' or similar.

[https://www.iana.org/time-zones](https://www.iana.org/time-zones)

Accessible copy here:
[https://en.wikipedia.org/wiki/List_of_tz_database_time_zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones)

~~~
hyperman1
There are subtle differences between them. I saw one where a French contractor
set up a superdome using Europe/Paris, while the server was located in
Europe/Brussels.

Turns out there is a period of a few months in I think 1937 where there is a 1
hour difference between them. This messed up birth dates: People get born the
day before, at 23:00.

So our software encountered an unholy trinity of bugs, legacy and the Oracle
SOAP stack. Unless all servers have the same time zone, these people got the
birth day mangled when passing data between servers. Which means that
superdome infected the time zone for every server created after it.

~~~
andrewaylett
Dates are hard :(. Doubly so when interacting with other systems that may or
may not be set up correctly.

I'd never knowingly set up a server to use anything but UTC, because the
server's timezone shouldn't have any effect on application code and this way,
even people in the UK will notice six months of the year.

In this particular case, it sounds like the application was creating data with
an implied timezone without any verification that the timezone the _user_ was
thinking of matched the timezone of the data, and you may have been better off
not associating the point with any timezone -- or indeed with any time, just a
date.

~~~
hyperman1
The problem turns up with xsd dates. Applications added time zones in the +2:00
format. The Oracle SOAP stack helpfully simplified everything to a datetime.
Today this zone is the same for Brussels and Paris. So each server decided to
convert it back to its native format, then causes trouble when it encounters
the past.

I tried for a while to run my local development JVM with an insane default
time zone, Japanese character set, etc.. I hoped to shake bugs out of my
programs. Unfortunately, almost none of the libraries I have to use survived,
so I had to change it back. Sure did find a lot of bugs ;-)

------
speedplane
So many frameworks try to be smart about timezones and user settings. Showing
local time to users is fine, but please don't touch my UTC time that's
actually stored in the dbase.
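
A rough Python sketch of that separation: the stored value stays in UTC and only the rendered copy is shifted. The fixed +02:00 offset is a simplifying assumption for the example. A real application would look up the viewer's zone rather than hard-code an offset.

```python
from datetime import datetime, timedelta, timezone

# What lives in the database: an aware timestamp in UTC (made-up value).
stored = datetime(2018, 4, 20, 12, 0, tzinfo=timezone.utc)

# Hypothetical viewer offset, used only when rendering.
viewer_zone = timezone(timedelta(hours=2))
shown = stored.astimezone(viewer_zone)

print(stored.isoformat())  # 2018-04-20T12:00:00+00:00
print(shown.isoformat())   # 2018-04-20T14:00:00+02:00
print(shown == stored)     # True: same instant, different rendering
```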

------
kazinator
The compiler could help here.

It makes sense for %G to appear with %V, but not with %m and %d.

Nonsensical combinations like this can be diagnosed with a compiler warning if
the strftime format is a string literal directly passed to strftime, or found
to do so via data flow.
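
Such a check does not need a compiler to prototype. Below is a minimal lint sketch in Python. `suspicious_mix` is a hypothetical name, and the two directive sets are an assumption about what counts as "week-based" and "calendar". No existing tool is being described here.

```python
import re

# Assumed groupings: week-based year directives vs. calendar month/day directives.
WEEK_BASED_YEAR = {"G", "g"}
CALENDAR_PARTS = {"m", "d", "e", "b", "B", "j"}

# A directive is '%', optional flags, optional width, optional E/O modifier, one letter (or '%').
DIRECTIVE = re.compile(r"%[-_0^#]*\d*[EO]?([A-Za-z%])")

def suspicious_mix(fmt):
    """Return True if a format string combines a week-based year with calendar month/day."""
    used = set(DIRECTIVE.findall(fmt))
    return bool(used & WEEK_BASED_YEAR) and bool(used & CALENDAR_PARTS)

print(suspicious_mix("%G-%m-%d"))   # True
print(suspicious_mix("%Y-%m-%d"))   # False
print(suspicious_mix("%G-W%V"))     # False
print(suspicious_mix("%%G-%m-%d"))  # False: '%%' is a literal percent sign
```

Like the compiler warning, this only helps when the format string is visible as a literal at the call site.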

A run-time warning would be nice, but problematic.

------
xfitm3
I generally support efforts to reduce human error, but the author goes too far
by calling for format strings to be abolished.

Localization is hard, which is why these functions exist.

There is ctime() but it includes a trailing \n which isn’t always what you
want.

~~~
masklinn
> There is ctime() but it includes a trailing \n which isn’t always what you
> want.

It's also not useful what with being in a hard-coded US customary format.

------
true_religion
Interestingly enough, I've never heard of %G or the ISO Week. I thought I was
familiar with the ups and downs of strftime but I've mostly used it through
Python and the Python reference helpfully neglects to mention %G or any
tricksy parameters that you may get confused by.

[https://docs.python.org/2/library/time.html](https://docs.python.org/2/library/time.html)

~~~
jwilk
> Additional directives may be supported on certain platforms, but only the
> ones listed here have a meaning standardized by ANSI C

%G was added only in C99.

~~~
paultopia
That gotcha was _added?_ Woah. I was thinking it must have been some legacy
thing that hasn't gotten removed because there's code from, like, the dark
ages buried in the drivers and such that depends on it.

~~~
gkya
It was probably useful to someone else. I don't get why it should be a gotcha.
Libraries can not be optimized for individuals and features that one does not
use are not gotchas.

------
jcims
Isn't part of the problem the fact that it's a pain in the ass to write unit
tests for system time?
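
One common way around that, sketched in Python: pass the date in as a parameter so the test never depends on the system clock. `report_filename` is a hypothetical function invented for the example, and the boundary dates are the ones where the calendar year and the ISO week-based year disagree.

```python
from datetime import date

def report_filename(today=None):
    """Build a dated file name; `today` can be injected so tests skip the system clock."""
    if today is None:
        today = date.today()
    return f"report-{today.isoformat()}.csv"

# Boundary dates where a %G-style bug would produce the wrong year.
cases = {
    date(2018, 12, 31): "report-2018-12-31.csv",
    date(2016, 1, 1): "report-2016-01-01.csv",
}
for day, expected in cases.items():
    assert report_filename(day) == expected
```

With the clock injectable, the New Year edge cases can run on every build instead of once a year in production.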

------
TomK32
Here's how the formats should be listed and grouped:

[https://apidock.com/ruby/DateTime/strftime](https://apidock.com/ruby/DateTime/strftime)

------
karmakaze
How about using a collation sequence like aAbBcC...? then many of the misused
flags are near the intended ones.
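
A quick Python sketch of what that ordering would look like, using a handful of directives as sample data. `interleaved_key` is a hypothetical sort key.

```python
# Sample directives only, not the full strftime list.
directives = ["%G", "%Y", "%g", "%y", "%H", "%h", "%M", "%m", "%d", "%D"]

def interleaved_key(directive):
    letter = directive[1]
    # Sort by letter ignoring case, then put lowercase before uppercase.
    return (letter.lower(), letter.isupper())

ascii_order = sorted(directives)
interleaved = sorted(directives, key=interleaved_key)

print(ascii_order)  # ['%D', '%G', '%H', '%M', '%Y', '%d', '%g', '%h', '%m', '%y']
print(interleaved)  # ['%d', '%D', '%g', '%G', '%h', '%H', '%m', '%M', '%y', '%Y']
```

This puts pairs like %m/%M and %y/%Y side by side, though %G still ends up next to %g rather than next to %Y.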

------
emilfihlman
This is not valid criticism at all.

This is just your programmer a) being lazy and not reading the manual (which
is short) and b) not writing tests.

------
alanbernstein
I think the real lesson here is "don't use man pages". Seriously, that's often
my last resort.

