Responses to strftime and %G (rachelbythebay.com)
93 points by weinzierl 7 months ago | hide | past | web | favorite | 59 comments



This mistake happened at Twitter ( https://twitter.com/mattklein123/status/984587433646764032 ) - and people lost jobs.


If that account is a true reflection of events - it's really disappointing that engineers got fired (edit: or resulted in engineers leaving).

A company needing hours to identify/fix a minor code issue that affected 40M people is a senior management failure - not engineering.

When I read the initial %G post - I couldn't see how you could/would even test for it - unless you'd heard about such a scenario previously.

Rolling heads in engineering for understandable errors just creates a toxic environment for building anything remotely interesting going forward.


> When I read the initial %G post - I couldn't see how you could/would even test for it - unless you'd heard about such a scenario previously.

Interestingly, because IIRC the Gregorian calendar has a 400-year "full" cycle (hence the "Doomsday Rule"), depending on the "expense" of your processes you could just use exhaustive testing: there are fewer than 150k days in 400 Gregorian years. Considering exhaustive testing is feasible across 32-bit ranges (https://randomascii.wordpress.com/2014/01/27/theres-only-fou...), ~17 bits isn't much of an issue.

You would still need to define properties which hold across the board, but then you could just check literally every date of the cycle.

This gets significantly less feasible if you need to check full calendaring (datetime x timezones).
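
A rough sketch of that exhaustive check in Python, assuming the property under test is "the formatted year equals the calendar year". The helper names and the 2000 through 2399 window are illustrative choices, not something from the original posts. It uses `date.isocalendar()` to get the ISO week-based year (the value `%G` is meant to print) so the result does not depend on the platform's C library.

```python
from datetime import date, timedelta

def days_in_cycle(start_year=2000, years=400):
    """Yield every date in one full 400-year Gregorian cycle."""
    day = date(start_year, 1, 1)
    end = date(start_year + years, 1, 1)
    while day < end:
        yield day
        day += timedelta(days=1)

def iso_year(day):
    """ISO 8601 week-based year, i.e. the value %G is meant to print."""
    return day.isocalendar()[0]

all_days = list(days_in_cycle())

# Property: the year we print should be the calendar year.
# Collect every day on which a %G-style year would violate it.
mismatches = [d for d in all_days if iso_year(d) != d.year]

print(len(all_days), "days checked")
print(len(mismatches), "days where the week-based year differs")
print("first offender:", mismatches[0])
```

Every offender falls in the last days of December or the first days of January, which is why a test suite that only samples mid-year dates never sees the problem.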


Were they actually fired? I assumed that they got tired of the shitshow and quit.


"Blameless postmortems anyone?"


Reading this thread, it looks like the reason management got pissed was that Android users didn't log back in after the bug logged them out. So they fired people because of a bug that caused users who didn't find enough value in the app to log in again to stop using an app that they weren't using already. Sounds like Twitter has been dying a slow death for a long time.


Pretty much.

> it looks like the reason management got pissed was because android users didn't log back in after the bug logged them out

Q4 of 2015 was the first time they reported[1] a negative figure for Monthly Active Users. They had squeaked by with 0% and 1% growth before, but that quarter broke the dam and reported a negative figure. The last few days of 2015 could have been leveraged to annoy the piss out of people with NYE tweet notifications to get users to inadvertently click on a link and squeak their MAU figure up to the flatline they were used to. With the 40M logged-out Android accounts, it probably would have been fairly easy to get that to a flatline number with enough notifications.

It's basically hovered around 0-1% ever since, but even then it's only had 2 more quarters of negative quarter over quarter MAU figures reported[2].

Not that it's any excuse. And from the next quarter's results, a lot of users appeared to log back in eventually. Just not in time for that NYE dash to pump up the quarterly numbers. Which is likely a reason management threw such a tantrum to assign blame.

[1] The QoQ% for MAUs in the US: http://files.shareholder.com/downloads/AMDA-2F526X/626329529...

[2] Q2'17 and Q4'17, also for QoQ% for US only: http://files.shareholder.com/downloads/AMDA-2F526X/626329529...


"Don't ever log people out" is pretty much the credo for a lot of big things I've seen. A surprising number of folks will never manage to get logged back in. Passwords? They have no idea what they are, or how to reset them. They'll either set up a new account or churn entirely.


But that doesn't justify firing the programmer who wrote "%G" instead of "%Y". If they chose to create a stupidly complex login process that involves memorising some string you're never supposed to use (because you should never be logged out), that's a design bug. What kind of idiot designs a system and says "the way to access the system is by using a memorised string" and "you should never log any users out"? It's frustrating and it makes no sense.

Passwords. They're even stupider than the idiot who works for every single news company, who thought they should resize videos shot on a phone so they can be displayed on a widescreen tv, and then using that on your news app that people only ever use on a phone.

It's shocking how few companies will let you log in by sending a token to your phone or your email. It's how I bloody log in anyway, I just have to put up with your stupid password reset process and then be able to three times type some gibberish string that I'm going to forget.

As for those who log people out - well, my email is logged in on my phone anyway. If I've lost my phone no-one cares about your dingy little app. And even if they did, they have access to it by virtue of having my email account.

Bleh.


If all programmers get fired for bugs that make it into production ... well you know.


Nice find! "Probably because it comes first in the man page". Amazing!

The Twitter situation is in fact what got me looking into the problem where I worked, and I patched a great many potential holes. I also left behind things that'll hopefully keep it from being re-introduced.


This, and the thread from the parent tweet, is a fun read!


There is a god somewhere, dispensing Cosmic Karma as deserved.

Yesterday I posted in the original thread:

   The Java SimpleDateFormat has a similar gotcha concerning %k vs %h
Today one of my self-written utilities crashes in a weird way. Hacky as it was, it did its job nicely for over two months. Half an hour troubleshooting later it turns out the DATE FORMAT STRING is WRONG. Really. I kid you not. Neither %k nor %h turns out to be right, this specific program has a 1/86400 chance of absolutely requiring %H .

Then again, the standard format strings would not have helped here, I need to match the undocumented behavior of an existing program. And it only crashes when someone inputs something at exactly 00:00:00 and so far no bleary eyed user had to burn the midnight oil.

It is in test for now. In 8 days, this baby aids the data migration to production. This bug would have aborted the launch of a program which has to be released before the GDPR awakens. Very close call.

There is a lesson about hubris somewhere in here.


I posted a feature request to findbugs, which seems to have been renamed to spotbugs. At least Java shall suffer no more (if they implement this, at least)

https://github.com/spotbugs/spotbugs/issues/637


The backstory about the 'rename' is that the original FindBugs maintainer sort of abandoned it, so the community forked it:

https://mailman.cs.umd.edu/pipermail/findbugs-discuss/2016-N...

I thought it was a nice demonstration that a project can fairly smoothly survive a regime change without a smooth transfer of power.


Nobody will look for developers with the following line in the ad:

- Must be an expert in strftime / Date / datetime

This is something everyone expects devs should know, but unfortunately working with timezones and date objects is super hard.

The example with %G is too specific for the whole problem. There are always date conversion / libraries / ISO / Locale specific / etc.. Damn it, even Apple returns specifically "PST" in their own responses ( why?! ) [1].

Everyone has a story with manipulating date / time and that's already a sign how bad the situation is.

Trying to be positive ( and funny ), I hope there will be some big and great conference in around 1747247444 [2], where the United Nations' computer department decides that it's too stupid to have so many rules for measuring time and sets a unified standard with no DST, leap years, etc.

1 : https://gist.github.com/lxcid/4187607

2 : https://en.wikipedia.org/wiki/Unix_time


> Everyone has a story with manipulating date / time and that's already a sign how bad the situation is.

Why yes. Some people even collated those stories:

http://infiniteundo.com/post/25326999628/falsehoods-programm...


The Pope tried. Took 200 years for his modifications to be adopted.


I also had this response: nonsense mixtures of conversions could be diagnosed by the compiler or possibly at run-time.

Example:

  foo.c:123:warning: strftime week-based year %G mixed with non-week-based elements.
You'd never use the week-based year %G with a regular month %m or day %d; the combination is suspicious and probably wrong.

Probably, %G is used for formatting a year-week date, like %G-%V: week-based year, plus week-in-a-week-based year.

I'm not a big fan of run-time warnings that spew something to stderr and the like, and aborting a C program because strftime was called with %G and %m in the same date string seems awfully wrongheaded. It could occur with dynamic strings. These kinds of things could only be done with some API adjustment.

Higher level languages which wrap strftime have more leeway in implementing some of these ideas.
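
As a minimal sketch of what such a check could look like in a higher-level wrapper, here is a Python function that flags format strings mixing week-based and calendar-based directives. The function names and the two directive sets are my own assumptions about what counts as "suspicious"; this is not an existing lint rule, and it ignores modifiers such as `%E` and `%O`.

```python
import re

# Directives that only make sense in an ISO week-based date.
WEEK_BASED = {"G", "g", "V"}
# Directives that belong to the ordinary calendar date.
CALENDAR_BASED = {"m", "d", "e", "b", "B", "j"}

def directives(fmt):
    """Return the set of conversion letters used in fmt, skipping literal '%%'."""
    letters = {m.group(1) for m in re.finditer(r"%([A-Za-z%])", fmt)}
    letters.discard("%")
    return letters

def mixes_week_and_calendar(fmt):
    """True if fmt combines week-based and calendar-based directives."""
    found = directives(fmt)
    return bool(found & WEEK_BASED) and bool(found & CALENDAR_BASED)

for fmt in ("%G-%m-%d", "%G-W%V", "%Y-%m-%d"):
    verdict = "suspicious" if mixes_week_and_calendar(fmt) else "ok"
    print(fmt, "->", verdict)
```

Returning a boolean leaves the policy (warn, raise, or log) to the caller, which sidesteps the objection to aborting or spewing to stderr.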


Often these issues come down to two sides digging in their heels: "just don't make mistakes" vs "just make it foolproof". Somehow accidents continue to happen.

What I have noticed with manuals is that I usually approach them in one of three different modes:

* what is this / how does it work / is it what I need?

* which flag or option does _?

* what does this flag or option do?

The first requires a bit more introduction and tutorial than a typical manual will provide because they're usually written by someone who is so familiar with what it does that they can't remember life before understanding it. There is either too little explanation or too much detail, eg arcane ISO standards and niche uses like "week number". Most likely you need examples for it to finally sink in.

The second is probably the most frequent use of a manual, when you're still testing things out. You know the behavior you want and are scanning through the descriptions to find the right match. It's like reading a dictionary by value to find the right key. This is where the infamous "which tar options do I want?" meme comes from.

The third is what most manuals are optimized for. It's a reference for quickly looking up someone else's code to figure out what it does.

Each of these uses requires a different format. In the first case you need a lot more text and instruction that would be unnecessary noise for the other two situations. In the second case you want things organized by function, and preferably with context about the most commonly used options. The third is where alphabetical order makes sense.


Perhaps optimizing for fewer characters typed is optimizing for the wrong thing, no?

C-style format strings could be updated to work differently. Instead of having to look up and memorize the format characters for hex one could use %{hex}. Instead of wondering what %d/%u/%p are for, one could use %{int}, %{unsigned int}, or %{pointer}. For date/time formatting using %{year} vs %{iso-week-year} would solve the specific problem in the OP.

%{m} vs %{b} vs %{B} requires both explanation and memorization. It also suffers from capital letter confusion, with m/M catastrophically poisoning the output while b/B is merely somewhat incorrect.

%{month} vs %{month-name-short} vs %{month-name-full} requires no explanation and makes it impossible to confuse minute vs month.

A better-designed string formatting system would require minimal explanation, little memorization, be trivial to review for correctness, be self-documenting, and be easily learned by example.
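
A small sketch of how the named directives could be layered on top of the existing machinery, assuming a simple lookup table that translates each `%{name}` into a strftime code. Only `%{year}`, `%{iso-week-year}`, `%{month}`, `%{month-name-short}` and `%{month-name-full}` come from the comment above; the remaining names and the `to_strftime` helper are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical mapping from readable names to strftime directives.
NAMED = {
    "year": "%Y",
    "iso-week-year": "%G",
    "iso-week": "%V",
    "month": "%m",
    "month-name-short": "%b",
    "month-name-full": "%B",
    "day": "%d",
    "hour": "%H",
    "minute": "%M",
    "second": "%S",
}

def to_strftime(template):
    """Translate %{name} placeholders into a plain strftime format string."""
    def replace(match):
        name = match.group(1)
        if name not in NAMED:
            # Fail loudly instead of silently emitting a wrong date.
            raise ValueError(f"unknown directive: {name}")
        return NAMED[name]
    return re.sub(r"%\{([^}]*)\}", replace, template)

fmt = to_strftime("%{year}-%{month}-%{day} %{hour}:%{minute}")
print(fmt)
print(datetime(2018, 12, 31, 23, 59).strftime(fmt))
```

With this shape, picking the week-based year requires typing `iso-week-year` in full, which is hard to do by accident and easy to spot in review.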


I agree we don't need to optimize for characters typed, and needing to memorize or look up formatting directives is a time sink.

What I _do_ want to see optimized for is readability - and this often is related to more compact representations. This is not hard-and-fast, and there are tradeoffs in both directions. This tradeoff becomes more tricky as you need to express finer grained control over formatting.

Consider

  %{year}-%{month}-%{date}
  %{year-4-digit}-%{month-2-digit}-%{date-2-digit}
  %YYYY-%MM-%DD
Or, outside the context of dates,

  Computed %u records in %u seconds (%0.3f MB/s)
  Computed %{unsigned int} records in %{unsigned int} seconds (%{float-precision-3} MB/s)
  Computed %{correctlyProcessedRecordsCount} in %{totalProcessingTime as seconds} (%{processingRate as megabytes-per-second with 3 digits precision} MB/s)
I don't have a great solution here, I just want to point out that the verbose cases can also get ungainly. I think, if we try to optimize for "format strings read approximately like the text they output", that probably gets close to the best middle ground.


And you haven't even touched localization (digit grouping, decimal separators, singular/plural noun forms, currency formats, week/month names with correct capitalization, …) yet.


I don't think any of these solutions is really right.

    formatted = Calendars::gregorian->format("y-m-d h:m:s z", date)
is perfectly fine. And when you want to do week-year formatting

    formatted = Calendars::weekyear->format("y-m-d h:m:s z", date)
will get you there. And you can use the Islamic or Chinese or Persian or Javanese or Julian calendars without having to have additional confusions. You could reconstruct your own Gregorian lunar calendar and use it.

One of the major problems in the article is mixing two unrelated things into one package: the customary western calendar and the ISO week-year calendar. You can see why someone might want to do that (because some businesses like to use them both simultaneously for different purposes) but that was bad design (you can tell, because there's the same concept - the year - with two different possible values).

Since we can't always be precise, we need to make sure that we always improve our libraries so that they are confusion-proof. If it's explicit, it's good.


It would also be unwieldily verbose, and break backward compatibility.

Besides, what's stopping you from writing a template/preprocessor solution that "transpiles" to printf-strings?


> Killing format strings makes l10n much harder.

Wow I'm not sure who posted this comment but they got it exactly backwards.

Format strings being easy to use (and the alternatives often being non-existent) make it very likely to have shit l10n, because most every culture needs their own format string so you'll have to fuck it up for each and every culture before being corrected (if you ever are).

Despite there being an entire organisation dedicated to — amongst others — collating and encoding locale-dependent date and time formats: https://www.unicode.org/cldr/charts/33/by_type/date_&_time.g... (warning: humongous page)

Datetime format strings need to exist as the underlying low-level API for these, but the average developer, the developer who is not working on a datetime parsing or formatting library, should not have to ever write a "raw" datetime format string.

Sadly, even in the best of worlds they'll have to deal with inane datetime formats created or specified by some coked-up aye-aye, but that's basically the exception.


I wonder if this isn't an out of place recitation of a correct argument:

Format strings that allow you to reorder arguments are a godsend. I can use printf in c with a translation bundle, and reorder the arguments as my language needs:

  printf("second is %2$d and first is %1$d",first,second);
Compare with cout<<args; there is much less influence you can exercise.

So taking this capability away makes l10n harder. But this is another kind of format string.

In Java there is the MessageFormat API. It can do almost everything l10n requires, at the price of perl-like format string complexity.
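
The argument reordering described above has a direct analogue in Python's `str.format`, where positional indices let each translation place the arguments in whatever order it needs. The sketch below assumes a tiny hand-written translation table; the locale keys and the `render` helper are made up for illustration.

```python
# Hypothetical translation bundle: each entry may order the arguments differently.
TEMPLATES = {
    "en": "first is {0} and second is {1}",
    "xx": "second is {1} and first is {0}",
}

def render(locale, first, second):
    """Format the same two arguments with a locale-specific template."""
    return TEMPLATES[locale].format(first, second)

print(render("en", 1, 2))
print(render("xx", 1, 2))
```

The calling code passes the arguments in one fixed order, and only the template decides where they appear.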


An interesting point, that might be it.



at least it's easier to spot though


A response that isn't mentioned here (perhaps because no one issued it?): WTFM.

The manual is a community project. Improve it and send a patch! Do not be afraid to send patches to anything and everything, this is the beauty of the open source world we live in.


This might happen more if the owner of man-pages entered the 21st century and took Github pull requests instead of maintaining a convoluted email patch mailing process [1].

[1] https://www.kernel.org/doc/man-pages/patches.html


Fuck that. Emailing patches is a great way of collaboration. The time it'd take you to get your head out of the GitHub clouds and learn how to use email for patches (you even found a guide!) would be less than the amount of time the article author spent on both of their articles.


I'll be honest, you come across as a bit of a curmudgeon, but you're totally right! In a fraction of the time it took them to write an article, and a response to the responses to that article, they could have just emailed a better wording and fixed the problem!


If I had a dollar for every time I've hit the MSDN documentation on .NET string formatting options, I wouldn't have to work anymore. It's one of those things that I can't keep in my head, and have to go read the docs, try it out a few times in LinqPad, swear at something, then read the docs again. Almost like regular expressions. Something to do with embedding a stringly-typed super-terse DSL breeds these problems.

I can't help but think this is another one of those holdovers from the bad old days of tiny main memory and 80-char wide displays that we're still beholden to out of inertia.


For the %G issue, I think it should be as simple as changing the %G description in the man page to "This is probably not what you want to use. See %Y. [insert current description here]".


It does, though not simple enough: _"This has the same format and value as %Y, except that if the ISO week number belongs to the previous or next year, that year is used instead."_


> This has the same format and value as %Y

"Sweet, I can stop reading here, this is what I want!" I'm sure I'll make this mistake at some point.
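
To make the "except" clause in that description concrete, here is a short Python sketch around the 2018/2019 boundary. It uses `date.isocalendar()` to obtain the ISO week-based year instead of calling `strftime("%G")`, so it does not rely on the platform's C library; the helper names are illustrative.

```python
from datetime import date

def iso_week_year(day):
    """ISO 8601 week-based year, the value %G is meant to produce."""
    return day.isocalendar()[0]

def buggy_format(day):
    """Mimic what "%G-%m-%d" yields: week-based year with calendar month and day."""
    return f"{iso_week_year(day)}-{day:%m-%d}"

for day in (date(2018, 12, 30), date(2018, 12, 31), date(2019, 1, 1)):
    print(day.isoformat(), "->", buggy_format(day))
```

Monday 31 December 2018 already belongs to ISO week 1 of 2019, so the mixed format reports a date a full year in the future for that one day.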


What if man pages colorized (or greyscaled) oft-used vs. esoteric or less common options? Just having a common visual cue could both save time and reduce risk.


I think `tar` is a great analogy for date formatting -- "Here are a few examples, one of which is probably what you want."

`tar cvzf archive_to_write.tar.gz dirname`

`tar xvf archive_to_unpack.tar`

Cf

"%Y-%m-%d %H:%M:%S" -> "2001-01-31 23:59:61"

That's a nice example. A four digit year, an hour past 12, months padded, two leap seconds. (:00 would be good for seconds too, as would 01 for the day.)

Maybe a short note, then next example -- 12 hour clocks, AM/PM, string months, truncated year. Then the next example, maybe this "week-based" date format. Or maybe just leave that for the end "reference" section.

Lead with two or three short examples that cover 95% of usage. It'll make the document a handful of lines longer, but it'll save these bugs and it'll save people time.
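
As a sketch of what such a lead-in section might contain, here are the two most common cases run through Python's `strftime`, which uses the same directives. The sample timestamp is arbitrary, and `%b` and `%p` are locale-dependent, so the second string assumes the default C locale.

```python
from datetime import datetime

moment = datetime(2001, 1, 31, 23, 59, 59)

# 24-hour clock, zero-padded month and day, four-digit calendar year.
iso_like = moment.strftime("%Y-%m-%d %H:%M:%S")

# 12-hour clock with AM/PM, abbreviated month name, two-digit year.
# %b and %p depend on the locale.
casual = moment.strftime("%d %b %y, %I:%M %p")

print(iso_like)
print(casual)
```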


> Put it another way: the me of 2018 knows this. The me of 1998 probably did not.

For this, I partly blame the dearth of mentoring[1] in the industry. It's been discussed here on HN countless times before, and I'm pretty sure that has been the case since well before even 1998.

The author does, of course, point out the "rush-rush-rush" culture of shipping fast and fixing later (if the company still exists). That does certainly make sense for very early startups, but it seems to have permeated everywhere, if it didn't originate in larger companies in the first place.

I also partly blame the pushing out of programmers (and perhaps other technical professionals) with this magnitude of experience: to continue in their careers they face a choice of switching to a much less programming-focussed job such as management [2] or consulting or professional services or sales engineering, contracting, or essentially accepting permanent wage stagnation by pretending to be less experienced and going for those kinds of jobs.

Some of this is ageism, I expect, but a better explanation seems to be economic incentives. Technical debt is invisible in quarterly results, but big salaries of senior engineers are. I'll keep my fingers crossed that now that money is going toward flying^H^H^H^H^H^Hself-driving cars instead of 140^H^H^H280 characters, this could change. Likely not in time to beat my parents to retirement but for me.

[1] By which I mean hiring relatively inexperienced, but presumably talented, staff and having more experienced staff members train them in best practices and process and skills that may not otherwise be learnable from reading, formal education, or even pair programming (though that can be a vehicle for mentorship). Management buy-in on this is key, especially in startups, where the managers themselves may be the ones who can benefit the most from such mentoring.

[2] Which could, in theory, be a beneficial use of all that gained wisdom, but, then, the experience stagnates, and they might just have had a single-step Peter Principle promotion.


The Rachel Kroll of 1998 would have been hard pressed to be familiar with %G, something that only got standardized in ISO 9899:1999 and IEEE 1003.1:2001 and was somewhat experimental in the years before. (I don't remember precisely when I added these to my C library back in the 1990s, but they were definitely interesting but highly non-standard extensions in the 1990s.) Of course this supports xyr point that this was not "documented all over the place".


Sorry, I didn't mean to imply that anyone in 1998 should have known specifically about %G. I was, rather, referring, more generally, to something like the wisdom of finishing the man page, even if one thinks one has what one needs partway through, even if one is in a hurry.

That's something I've personally tried to teach whenever I can, along with as many rationales for why this is the case that I can come up with at the time.

This inherent flaw/tradeoff, of ordering, is one of them. No matter what order the author chooses, some reader (sometimes the same person at different times) will find that order suboptimal.


I will cite fanf2 from the previous thread:

> The irony of this bug is that immediately above %G is %F (short for %Y-%m-%d) which is what they wanted...

https://news.ycombinator.com/item?id=17058036


I posit that it's the wording.

Continuing to use either "The ISO 8601 week-based year" or "ISO year" is enough to keep people confused whenever they read or hear it. The reasoning is simple: do we want an "international standard year"? -- hell yeah!

We all pattern match, so "ISO" and "year" match too much of what we want, so we wouldn't even try to doubt our decision to use it!

The proper wording is: don't ever use the word "year" in this case, keep calling the number just a "special G number":

"%G -- the very special G number used only when creating week-of-the-year calendars, see NOTES"

Then in NOTES it would have to be again elaborated that the number obtained by G is a very special number used only for week-of-the-year calendars as specified in a specific ISO standard in this case meaningful only for week-of-the-year calendars and nothing else, and even then, that there are other week-of-the-year calendar standards that do the calculation differently from that ISO specification, and that all this doesn't have anything to do with ISO standard date formatting! And never should this number be referred to as "year". Just a "G number for week-of-the-year." Write that it is "often, but sometimes definitely not the same as the year number."

That's in line with "easy things should be easy (and obvious) and hard things less easy", especially if the hard thing is, for almost all uses, the wrong thing.

But we all know that a lot of man pages are often both completely precise but at the same time almost useless:

https://www.mercurial-scm.org/pipermail/mercurial/2015-March...

"generates a random man page for a made-up git command... more or less indistinguishable from the official docs"

Of course the approach (being precise while remaining useless) is much older and much more universal than that specific tool and specific pages at some point in time.



> The test has to be better than the code it's testing.

Indeed.

The quote is intentionally taken out of context, but I’ve been treating this as a logical tautology for at least 10 years, and it has served me well.



HN explicitly allows some resubmissions when the other posts haven't attracted attention. Please don't link to empty duplicates; link only when the current post is a dupe of an actual discussion.


This is a great response post!


The simpler answer to this is to use portability layers that work around common problems. This is partly why it is suggested you use libraries rather than rolling your own code.


Since when did using a standard library become "rolling your own code"?


"Standard" does not mean "universally compatible". It means "standard", as in, something an authority or custom has established as a model or example. Gather together every "standard" system call you can find and run unit tests on them on multiple platforms and you will discover they don't always return the same results.

"Using a standard library" is a form of "rolling your own code" when the thing you want to do is non-trivial. Sometimes it's hard to know if what you're doing is non-trivial, but certain things should be a given. Dates and times are non-trivial. IPC is non-trivial. Memory management is non-trivial. Sure, you can call malloc() and free() yourself, but it's really not that hard to screw them up. Same for dates and times: they are more difficult than they appear.

This is part of why people write things like date libraries, to work around portability problems and provide what you actually want. In this case it's not so much a portability problem as a problem in understanding what you want to do and how best to do it.


I read it as "use abstractions over the (problematic) standard library, and ideally don't roll those yourself".


Here's a strftime fix for Python:

    >>> import strftime_fix
    >>> import datetime
    >>> datetime.datetime(year=2018, month=12, day=31).strftime('%G-%m-%d')
    '2018-12-31'
    >>> datetime.date(year=2018, month=12, day=31).strftime('%G-%m-%d')
    '2018-12-31'
https://github.com/Mortal/strftime-fix


I hope this is a joke…


README doesn't tell you that, but it also changes semantics of %Y, so that %Y and %G are swapped.

So I don't think it was meant to be serious.


Who'd write/pick %G from the man page when they want a year, and day and month have their respective d and m letters? Just because it comes first? And it's sprinkled with NOTES and a suspiciously long explanation.

> %G The ISO 8601 week-based year (see NOTES) with century as a decimal number. The 4-digit year corresponding to the ISO week number (see %V). This has the same format and value as %Y, except that if the ISO week number belongs to the previous or next year, that year is used instead. (TZ) (Calculated from tm_year, tm_yday, and tm_wday.)


Your comment is addressed in the post, twice.



