Big problems at the timezone database (joda.org)
103 points by CodeIsTheEnd on Sept 25, 2021 | 32 comments

I've carefully followed the debate so far, and while I do believe there should be some change in the tzdb development process and the offending patch should be reverted for now, I ultimately disagree with Stephen Colebourne. In fact I found Colebourne more misleading than Paul Eggert, the current tzdb coordinator.

There have been two major ways to use the Timezone Database other than using the bundled tzcode. One is to use the compiled TZif file (intended to be used by tzcode, but also standardized in RFC 8536). Another is to use the textual format that is compiled to TZif via zic. The latter option was never intended but is used by a significant number of downstream projects, including Joda-Time. And every time the textual format changes slightly, Colebourne complains about the breakage even though that breakage was induced by Joda-Time itself. It has also caused a significant maintenance burden for the tzdb, most notably the rearguard/vanguard split [1].

This time Colebourne is complaining not because pre-1970 timestamps will be altered---they have changed a lot in the past---but because Joda-Time was falsely assuming that the zone never turns into an alias. Therefore Colebourne should have requested more concrete and reasonable guarantees for the tzdb. Instead he is claiming Joda-Time is representative of downstream projects and the tzdb should follow what Joda-Time is assuming. This annoying attitude is evident when you also look at Mark Davis (who is in charge of CLDR); while Davis agrees that the patch should be reverted (reasonably so) he is much more careful about his wording.

Technically speaking it has been true since 2014 that any time zones that looked alike before 2014 can be merged. The only difference is that multiple such time zones across multiple countries had not been merged yet at that time. That's what Eggert refers to as the equity or fairness principle, but personally I have come to think that he is actually giving marginal and tangential reasons to implicitly express his disdain for Colebourne.

[1] https://news.ycombinator.com/item?id=24587730

We have a client in Azerbaijan who was very angry that their timezone was shown as Asia/Yerevan. As you may know, Azerbaijan is in a state of cold war with Armenia. It can get pretty controversial.

It is not recommended to directly expose the textual time zone identifier for that reason. CLDR for example provides a mapping from the tzid to the translated name, and that can be tailored. In case your client doesn't like that internal identifier, the correct response would be abolishing the textual time zone identifier (there have been multiple attempts so far).

I wrote a dismissive comment, but I'm curious why you think it's possible not to expose TZ.

There are a lot of contexts in which I need to know the TZ. Logs are a really common one (though they should be in UTC). Calendars are the most common.

I've seen a few calendar systems that default to the invite sender's TZ. I needed to know TZ to appropriately schedule my calendar.

> I'm curious why you think it's possible not to expose TZ.

I didn't mean that. I'm dissecting the problem into two cases here: for end users it is always possible to replace the tzid with labels (possibly tailored) so the tzid doesn't have to be exposed at all, and for developers it would be necessary to abolish textual tzids. The latter would require cooperation and consensus on the part of the tzdb (and its contributors).

> I've seen a few calendar systems that default to the invite sender's TZ. I needed to know TZ to appropriately schedule my calendar.

For recurring events, you absolutely need to know the TZ. You have teams in the Bay Area and London (common for many tech companies). You are in SF and have a recurring meeting with London on Monday mornings. The meeting today (Sep 27) is at 10AM, or 6PM London time. At what time will the meeting be on Nov 1?
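The question can be worked through with a minimal Python sketch. It assumes the meeting is anchored to San Francisco wall-clock time, that the standard-library zoneinfo module (Python 3.9+) can find an installed tz database, and the helper name is made up for illustration:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SF = ZoneInfo("America/Los_Angeles")
LONDON = ZoneInfo("Europe/London")


def london_time_of_sf_meeting(year, month, day, hour=10):
    """Hypothetical helper: London wall-clock time of a meeting held at `hour` in SF."""
    sf_local = datetime(year, month, day, hour, tzinfo=SF)
    return sf_local.astimezone(LONDON)


# Same recurring 10AM SF meeting, two different Mondays in 2021.
for month, day in [(9, 27), (11, 1)]:
    meeting = london_time_of_sf_meeting(2021, month, day)
    print(meeting.strftime("%b %d %H:%M %Z"))
```

On Nov 1 the London side lands at 5PM rather than 6PM, because the UK left summer time on Oct 31, 2021 while the US did not until Nov 7. A stored UTC offset alone could not tell you that.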

>It is not recommended to directly expose the textual time zone identifier for that reason.

Texan here. When I was a kid, I would get salty when the computer said I was in "Chicago time". I thought we deserved our own time zone.

The decision might yet backfire when each EU country (yeah, Norway isn't in the EU) decides to adopt a different timezone, because each of them will get to decide whether to keep UTC+1 or UTC+2: https://www.wired.co.uk/article/clocks-change-uk-2020-daylig...

Then again, that article might be outdated...

> The TZ Coordinator's argument is that there is a fairness/equity problem if Oslo is allowed to keep its pre-1970 history but other locations (typically in Africa) are not.

Ah, the 'F' word.

This becomes the fig leaf for vast injustice in the name of justice.

To be, uh, fair, I didn't actually know where Abidjan was (the example given in this article). The urban area has a population of over five million, compared to greater Reykjavik's population of under a quarter million - a bit smaller than the small city where I grew up, even!

So while I do agree that backwards compatibility is important, and I hope the technical changes are resolved in a way that seems to match the consensus of everyone else on the list besides the TZ Coordinator... if one were constructing a time zone database from first principles these days, I'm not sure what the argument would be to prefer Reykjavik as the synecdoche for the time zone to Abidjan, besides "It's in Europe."

If we started today from scratch, I think timezones wouldn't be set by towns but by arbitrary numbers or letters (something like "+8:00 DST-2")

Having to choose a single town for each timezone is just bound to be problematic whatever the criteria are, and it would also mean endless arguments about changing timezone names when the chosen town doesn't meet these criteria anymore.

Storing it purely as a numerical offset isn't enough, because you need to know the location. Does DST apply? When does DST start? Did DST exist in 1971? Will DST exist next year? Will it be DST next week? I'm not sure what you intended with "DST-2" in your proposed format exactly, but you really need a database for these things since there isn't a universal agreement on these things. Maybe there ought to be, but there isn't so we'll have to deal with that.

Plus countries change timezones. I can set it to "+8:00" and in two years my country changes its timezone and now the stored information (and thus my time) is incorrect. We also need to know on which date the change takes effect, because otherwise we would show the wrong time for tomorrow or next week.
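To make the difference concrete, here is a small Python sketch comparing a stored offset with a tz database identifier. It assumes the standard-library zoneinfo module and an installed tz database; the helper name and the choice of Europe/Amsterdam are only for illustration:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

fixed = timezone(timedelta(hours=1))   # "+1:00" stored as a bare offset
zone = ZoneInfo("Europe/Amsterdam")    # a tz database identifier

winter = datetime(2021, 1, 15, 12, 0)
summer = datetime(2021, 7, 15, 12, 0)


def offsets(tz):
    """Hypothetical helper: UTC offsets on a winter and a summer date."""
    return [winter.replace(tzinfo=tz).utcoffset(),
            summer.replace(tzinfo=tz).utcoffset()]


print("fixed offset:", offsets(fixed))
print("zone id:     ", offsets(zone))
```

The fixed offset stays at one hour all year, while the zone identifier yields two hours in July because the database knows about DST in that location.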

I actually store timezones as "<country>.<zone>". e.g. NL.Europe/Amsterdam. This is because users select the country first in the UI (rather than a zone), and this way it will always be clear which country the user selected. It would be confusing at best, and potentially offensive, to show a different country later.
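Roughly, parsing that stored value could look like the sketch below. The two-letter country code, the type name, and the function name are my assumptions for illustration, not a description of any real schema:

```python
from typing import NamedTuple
from zoneinfo import ZoneInfo


class StoredZone(NamedTuple):
    country: str  # assumed to be a two-letter country code picked in the UI
    tzid: str     # tz database identifier, e.g. "Europe/Amsterdam"


def parse_stored_zone(value: str) -> StoredZone:
    """Split '<country>.<zone>' and check that the zone part is loadable."""
    country, sep, tzid = value.partition(".")
    if not sep or len(country) != 2 or not tzid:
        raise ValueError(f"expected '<country>.<zone>', got {value!r}")
    ZoneInfo(tzid)  # raises ZoneInfoNotFoundError for an unknown identifier
    return StoredZone(country.upper(), tzid)


print(parse_stored_zone("NL.Europe/Amsterdam"))
```

Keeping the country separate means the UI can always show what the user picked, even if the zone identifier later becomes an alias for a zone named after a city in another country.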

> If we started today from scratch, I think timezones wouldn't be set by towns but by arbitrary numbers or letters (something like "+8:00 DST-2")

That is exactly the wrongest thing to do, because your version makes it impossible to plan future local events.

It would essentially be a completely useless intermediate layer between something very similar to the current timezones and the "absolute" times below.

Er... Imagine selecting among those alphanumeric identifiers at system install time.

Is it worse than looking at a world map and trying to check the representative town in the same zone as yours?

In a way it would help a lot to expand geographic knowledge, especially if Europe and the US were all mapped to somewhere in the Southern Hemisphere.

How about "The city in the zone that the most people in the world could point to on a globe"?

Forcing the whole globe to choose the same city is pretty much the opposite of equity and fairness - people from e.g. Africa or South America will know of different cities than people from North America or Northern Europe, so I don't quite understand why you'd force them all into the same crutch. Computers are supposed to make lives easier for people and since we can somehow bring ads to everyone in the world, we can probably also extend timezone databases to include most of the world.

Replace "equity and fairness" with "consistency" and then it makes much more sense. I think Paul Eggert keeps using those words mainly because the patch was originally triggered by external complaints, but "consistency" better explains the gist of the situation.

There are two ways to make the tzdb more consistent: either merging time zones that are alike since 1970, or splitting time zones as much as possible. The latter was the pre-2014 approach and much harder to maintain (after all, should we split time zones just for newly discovered pre-1970 time zone differences?), so the tzdb has gradually switched to the former for a decade. This patch is just a continuation of this ongoing switch.

Alternatively, it might be possible to keep the latter approach but limit the scope so that the maintenance remains doable. For example the tzdb can forgo textual time zone identifiers and declare that downstream projects are responsible for the mapping to the external world (and thus politics). However that would make the tzdb much less useful. The current policy does seem to maximize the value of the database without much trouble (that is, limited to a minor drama) and any change to the policy requires a serious consideration about that. In comparison the forking proposal by Colebourne is at best naive.

My chaotic side would love yearly elections or a competition to decide which town will be the winner for each timezone every year.

Bonus points if it's an online poll where the game becomes hacking as many votes as possible without getting caught.

Is there a way to ascertain how much code this will break?

As I understand it, there are two possible breakages. The first is https://www.hyrumslaw.com/ breakages, of the sort described in the article - currently, it is possible to distinguish Europe/Oslo and Europe/Berlin, even though they behave the same post-1970. In the planned release it will not be; they will be aliases for each other. This is semver-incompatible in the strictest sense (in much the same way that, say, changing the toString of some object or the formatting of some CLI tool could theoretically break downstream code).
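A rough way to see the "behave the same post-1970" part is to compare UTC offsets directly. This is only a sketch: it assumes Python's zoneinfo with an installed tz database, samples once per day at midnight UTC (so it would miss two zones switching at different hours on the same day), and the function name is made up:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def same_offsets(tzid_a, tzid_b, start_year=1970, end_year=2021):
    """Hypothetical check: do two zones agree on the UTC offset once per day?"""
    a, b = ZoneInfo(tzid_a), ZoneInfo(tzid_b)
    t = datetime(start_year, 1, 1, tzinfo=timezone.utc)
    end = datetime(end_year, 12, 31, tzinfo=timezone.utc)
    while t <= end:
        if t.astimezone(a).utcoffset() != t.astimezone(b).utcoffset():
            return False
        t += timedelta(days=1)
    return True


print(same_offsets("Europe/Oslo", "Europe/Berlin"))
print(same_offsets("Europe/Oslo", "Europe/London"))
```

Code that only asks "what is the offset at this post-1970 instant?" cannot tell the two identifiers apart either way. Code that compares the identifiers themselves, or expects both to remain distinct full zones, is where the Hyrum's-law breakage would show up.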

The second is that, if you're actually processing dates before 1970 and you actually care about historic time zones, your code will break unless you go out of your way to load the "backzone" data file.

Probably there is not very much code that is actively processing pre-1970 timestamps (most applications are handling current or relatively recent data) and handling them in localized format instead of having turned them to UTC already and actually doing computations on them (instead of just treating them as strings for display purposes) and working with any of the affected time zones.

People who have turned pre-1970 local timestamps to UTC already are in some danger of seeing breakage: if they try to turn them back into the local time from the same timezone, they won't get back what they originally entered.

But a database with historical changes over time would resolve that. Isn't that part of the argument?

"Location x in Timezone P was y definition until z date, when it merged to a definition that coincides with Timezone Q"

As I understand it, that is what you would have had in tz 2021a, but not in the just-released 2021b (unless you enable backzone, but if some distributors start doing that then it's effectively a fork).

I think the API promise is that even if there are known discrepancies pre-1970, the standard database already might not be including that, and so people with pre-1970 timestamps already should have been using backzone.

Shouldn't each time zone, or definition within, have a time domain given for when the definition is valid?

Even within a single timezone, DST dates have changed, sometimes differently for different places within the same timezone.

DST start/stop dates have changed multiple times in the last couple of decades, so even if pre-1970 isn't a common use, it seems the method for dealing with it would already be required.

Yes, and the tz database handles this. (In fact the most immediate trigger for the 2021b release is that Samoa decided to get rid of daylight savings, but obviously did not retroactively change any times in the past... which points to an obvious constraint on that definition, you can specify the starting time where the definition is valid, but you cannot specify the ending time unless you can predict the future behavior of all governments worldwide. There have been tzdata releases with about a day to spare.)

In the past, most cities set their noon to actual solar noon, and when rail travel and communications made them pick a time relative to GMT, for a time they specified solar noon in GMT. For instance, America/New_York in the late 1800s gets you GMT-4:56, because that's what was actually used.
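You can look at this yourself with a couple of lines of Python. This assumes the zoneinfo module and an installed tz database that carries the full history for America/New_York; the exact value printed depends on that data:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")

# One date before standard time was adopted, one modern date.
early = datetime(1880, 1, 1, 12, 0, tzinfo=ny)
modern = datetime(2021, 1, 1, 12, 0, tzinfo=ny)

print(early.utcoffset(), early.tzname())
print(modern.utcoffset(), modern.tzname())
```

Note that Python prints negative offsets as a timedelta in the "-1 day, HH:MM:SS" form, so the early value needs a little mental arithmetic to read as hours behind GMT.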

The rule for the tz database is that if two legislative time zones have the same definition from 1970 onwards, they might only represent them as a single time zone in the database. For instance, there's no America/Miami distinct from America/New_York, even though solar noon is a couple minutes different.

As it happens, Oslo and Berlin are the same from 1970 onwards, but the tz database currently has distinct Europe/Oslo and Europe/Berlin definitions. The proposal is to get rid of that distinction.

What if Oslo or Berlin vote to change their time zones, though? What if Norway pulls out of the EU in a "Nexit" referendum and then changes their DST to suit their economy? Wouldn't it be worth keeping the distinct Oslo and Berlin time zones?

Well, the same thing could happen to Miami, and there's no America/Miami at all right now. There's no sense defining every possible location in the world as a time zone to future-proof against that. It would have to be defined in a future version of the database.

Interesting fact: Norway is not a member state of the EU.

Not saying anything about this particular case, but with most technical arguments, the most heated tend to be the lowest stakes.

> Not saying anything about this particular case

Yes you are.

I mean, I'm pissed Boston doesn't have its own entry too, but frankly, I wouldn't have it any other way. America/New_York is simply larger, and has now been around much longer (not the city itself).

If we want to get into a complicated mess, let's talk about daylight savings time and regional exceptions, but I'm not really sure how we're going to make everything "fair" one way or another. Next we'll be arguing about states vs countries, population vs power, god knows what else... it's all just not in the interest of good time keeping.

Time zones are a necessary evil, ideally our code doesn't suffer from our petty human concerns.

> Time zones are a necessary evil, ideally our code doesn't suffer from our petty human concerns.

I don't understand this position. The sole purpose of our code is to serve "petty" human concerns.

Time zones are certainly not petty human concerns; the notion that we sleep when it's dark and we're active when it's light is fundamental to our nature - as is the fact that we want to coordinate our activity with those around us. Time zones are probably the most effective outworking of this. Some alternatives are conceivable (for instance, we could record times in UTC) but they would effectively introduce the same problems (for instance, if business hours are set to run from 23:00-07:00 in some location, and then as trade patterns change it's agreed to reset them to 22:00-06:00, we need some way to know that we need to reschedule events scheduled for 06:30 in that location but not events scheduled for 06:30 in a different location) along with new ones (for instance, most people would be pretty upset at having the date change in the middle of the day, just because it's convenient in Europe).

Timezones are fundamental to any code that cares about human events in the future or the past.
