Timezone-naive datetimes are one of the most dangerous objects in Python (nerderati.com)
14 points by todsacerdoti 55 days ago | 21 comments



The article doesn't really go into why this is an issue; it says that it's dangerous but doesn't follow through by explaining the danger. It even goes as far as saying that a time without a timezone is worthless, but this is not my experience.

A good general rule of thumb is that for things that happened in the past, UTC is generally fine. You only need a timezone if there was some kind of locality to the event (e.g.: when _and_ where this happened), but usually you can convert to local time. So storing UTC time is fine, adding the timezone in rare cases.

For things that are scheduled in the future, usually you will want to store local time + a timezone identifier (in Olson format, e.g. America/Toronto, not offsets), to guard against timezone/DST changes (yes, they happen all the time).
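A minimal sketch of that rule, assuming Python 3.9+ with the standard-library `zoneinfo` module and an available tz database. The `stored` record and the `resolve_to_utc` helper are hypothetical names for illustration. The idea is to keep the wall-clock time and the zone identifier as the source of truth, and to compute the UTC instant only when it is needed, so that a later tz database update is picked up automatically.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def resolve_to_utc(wall_time_iso: str, zone_name: str) -> datetime:
    """Resolve a stored wall-clock time to a UTC instant using current tz rules."""
    naive = datetime.fromisoformat(wall_time_iso)          # wall-clock time, no zone yet
    local = naive.replace(tzinfo=ZoneInfo(zone_name))      # attach the stored zone
    return local.astimezone(timezone.utc)                  # same instant, expressed in UTC


# Hypothetical stored record for a future event: wall time plus zone identifier.
stored = {"wall_time": "2030-03-15T13:00:00", "zone": "America/Sao_Paulo"}

meeting_utc = resolve_to_utc(stored["wall_time"], stored["zone"])
print(meeting_utc.isoformat())
```

The printed UTC value depends on whatever rules the installed tz database holds for that zone at the time the code runs, which is the point: the stored 13:00 local time stays fixed even if the offset changes.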

There are exceptions to the rule, but in my case at least 90% of the times I work with are timestamps in UTC, and 9.99% are things that happen in the future where I can apply the local time + tz rule.

But yes there are exceptions:

* Alarm clocks (if you set an alarm at 8am, you typically don't want this to shift when you travel). This is called floating time.
* If you need records of when something was originally scheduled vs when it actually happened if a timezone change occurred.
* If you need to handle the awkward DST time change where every time between 1 and 2am happens twice.
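The last exception can be illustrated with a short sketch, assuming Python 3.9+ and `zoneinfo`. The date and zone are chosen for illustration: on 2021-11-07 clocks in New York went back from 2am to 1am, so 01:30 occurred twice. The standard `fold` attribute selects which of the two occurrences is meant.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

new_york = ZoneInfo("America/New_York")

# fold=0 (the default) is the first 01:30, still on daylight time (UTC-4).
first = datetime(2021, 11, 7, 1, 30, tzinfo=new_york)
# fold=1 is the second 01:30, after the clocks went back (UTC-5).
second = datetime(2021, 11, 7, 1, 30, fold=1, tzinfo=new_york)

print(first.utcoffset(), second.utcoffset())
# The two wall-clock readings are identical but the instants are an hour apart.
print(second.timestamp() - first.timestamp())
```

Without `fold` (or an explicit offset) stored alongside the local time, there is no way to tell afterwards which of the two instants was recorded.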


Author here.

I admit that I didn't fully explain the danger, because I thought it was somewhat self-evident: if you attempt to compare two datetimes correctly, you must know their timezones. Anyone who has used Google Calendar to schedule a meeting in SF when they live in NYC can attest to that.

> You only need a timezone if there was some kind of locality to the event (e.g.: when _and_ where this happened), but usually you can convert to local time. So storing UTC time is fine, adding the timezone in rare cases.

We're both saying the same thing, here.

While UTC is not _technically_ a timezone, most databases/programming languages/etc. let you attach UTC to a datetime. When you say that you're "storing UTC time", you are implicitly storing it with enough information.

The difference is that I don't like the implicit nature of assuming a timezone-naive datetime is UTC, because that is a very dangerous assumption to make. Just store the timezone, even if it's UTC!


It's a pretty solid sign of an inexperienced developer when they think UTC needs to always be listed as the timezone. Most people with more experience know that if there's no timezone attached it should always be populated in, and assumed to already be in, UTC.

That said, it's rarely mentioned, which is a problem. And as the parent comment mentions, there are valid cases for tracking things in local timezones to retain user-local consistency.


There's a long history of databases using a client-controlled timezone per connection (which defaults to the server's local timezone) and storing only local time. I wouldn't rely on seeing UTC unless the team is very careful about migrating legacy data and consistently using TIMESTAMP WITH TIME ZONE columns and functions.


I think on the first read I didn't get that your article was basically about 'floating times', which upon a second read I totally agree is a weird default and good to avoid!


Isn't it still better to store a future event in epoch seconds/ms?


No, because countries update their DST rules all the time, so you don't know for certain which UTC time corresponds to the 'wall clock time'. If we agree on a 1pm meeting in São Paulo, it might not be clear until later what UTC time that is.


I do not like that Python does not allow tz-naive time to be interpreted as UTC, which does not need a timezone. So you have to waste space by using a tz-aware format or you have to add the TZ +00 manually in some way.


“Epoch timestamps are timezone-naive”

Er, not really. Not meaningfully. They can be freely compared with timezone-aware timestamps. They cannot be freely compared with time zone-naive timestamps.

“Timezone naive” is a distinction that applies to datetime objects. It just doesn’t apply to epoch time. The problem with naive timestamps is that you have to somehow remember what the correct time zone is. You don’t have to remember that for epoch time.

The reason you want timezone-aware datetime objects is because you can safely and unambiguously convert them to, say, epoch time.
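A small sketch of that distinction, using only the standard library. The specific dates are arbitrary illustration values. Two aware datetimes with different offsets that name the same instant produce the same epoch value, while ordering a naive datetime against an aware one is rejected outright.

```python
from datetime import datetime, timedelta, timezone

# The same instant written with two different UTC offsets.
noon_utc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
seven_am_minus5 = datetime(2024, 1, 1, 7, 0, tzinfo=timezone(timedelta(hours=-5)))

# Aware datetimes convert to epoch time unambiguously.
same_instant = noon_utc.timestamp() == seven_am_minus5.timestamp()

# A naive datetime carries no zone, so Python refuses to order it against an aware one.
naive = datetime(2024, 1, 1, 12, 0)
try:
    naive < noon_utc
    comparison_rejected = False
except TypeError:
    comparison_rejected = True

print(same_instant, comparison_rejected)
```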


> Er, not really. Not meaningfully. They can be freely compared with timezone-aware timestamps

Author here. My original post was a little ambiguous on this topic; I've updated it to make it clearer.

The tl;dr is that in Python, `time.time()` calls the C stdlib `time` function (at least in CPython), which follows the POSIX standard. It turns out that the POSIX standard does _not_ mention timezones at all: https://pubs.opengroup.org/onlinepubs/9699919799/functions/t...

To wit, you can't actually assume that timestamps are UTC in Python, which is a different kind of insanity:

``` datetime.datetime.utcnow().replace(tzinfo=pytz.timezone("America/Toronto")).timestamp() ```

differs materially from

``` datetime.datetime.utcnow().timestamp() ```


> To wit, you can't actually assume that timestamps are UTC in Python, which is a different kind of insanity:

The code you’ve put in this comment is some… utter nonsense, sorry! Not sure how else to put it. And you’ve come to a wrong conclusion as a result. And the wrong conclusion does not even make sense.

When you call .replace(), you’re constructing the Toronto time which has the same local time as the current UTC time. This MUST have a different timestamp than the current UTC time.

Basically, it is 5:00 PM right now in Toronto, UTC-4, which is 9:00 PM UTC. Your code is “Give me the current local time UTC (9:00 PM), replace the time zone with Toronto (9:00 PM Toronto), and then give me the timestamp for that.” The result is nothing more than a timestamp four hours in the future. You’ve managed to construct a sequence of API calls that adds four hours to the current time.

Some problems:

1. You SHOULD know better than to call .utcnow(). When you try it, Python will print out this message:

> DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

2. In general, .replace() CHANGES the datetime object into a different one, which represents a different moment in time. If you want the same moment in time but a different time zone, you want .astimezone(), which gives you the SAME moment in time, but with a different timezone.

3. Timestamps are safe to compare. They represent instants in time and you do not need to worry about timezone conversions when you work with timestamps.

4. The notion that “you can't actually assume that timestamps are UTC in Python” is nonsensical, because timestamps don’t have timezones! A timestamp isn’t UTC in the first place. It simply represents an instant in time.
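The difference between .replace() and .astimezone() described in point 2 can be sketched as follows. This uses the standard-library `zoneinfo` rather than the `pytz` from the parent comment, and a fixed instant (9:00 PM UTC on an arbitrary summer date, when Toronto is at UTC-4) instead of the current time, so the result is reproducible.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

toronto = ZoneInfo("America/Toronto")

# A fixed, timezone-aware instant: 9:00 PM UTC.
now_utc = datetime(2024, 7, 1, 21, 0, tzinfo=timezone.utc)

# astimezone() keeps the instant and changes the wall-clock reading (5:00 PM Toronto).
converted = now_utc.astimezone(toronto)

# replace() keeps the wall-clock reading and changes the instant (9:00 PM Toronto).
relabelled = now_utc.replace(tzinfo=toronto)

print(converted.isoformat(), converted.timestamp() == now_utc.timestamp())
print(relabelled.isoformat(), relabelled.timestamp() - now_utc.timestamp())
```

The relabelled value lands four hours after the original instant, which matches the effect described above.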

I really think there are some fundamental misconceptions at work here. I am hoping that this comment will help bust some of your misconceptions and help you arrive at a more correct understanding of how time works:

- datetime objects can be timezone naïve or timezone aware,

- timestamps are in an entirely separate category, where timezones do not matter at all.

I recommend looking at something like Java’s JSR-310 library for more information about how a well-designed API would look. In particular, think about how Instant, LocalDateTime, OffsetDateTime, and ZonedDateTime map to Python classes. The Java classes are more strict in their semantics and can serve somewhat as a point of reference for how you “should” think about time.


When I discovered the arrow module when I was starting with Python (15 or 20 years ago), this changed my life. One common API for everything.

Today I never use a date without Arrow, no matter how simple the operation is. I am an amateur dev, so there may be performance or portability reasons that make it a bad idea for professional devs, but, man, how much easier my life is.

I looked for similar modules in TS (found them) and Go (did not find anything)


Same with JavaScript timezone-specific date-times. My solution is to generally favour https://www.npmjs.com/package/@date-fns/utc so that my code works the same without any effort in any timezone.


The trouble is when you have to deal with timezone-aware business logic. `date-fns-tz` does exist, but the amount of brain contortions required to keep track of everything is frankly exhausting, not to mention error-prone.

Suffice it to say that I'm waiting with bated breath on the Temporal API (https://github.com/tc39/proposal-temporal).


Yep, I am in the Zulu habit.


There really should be a 'from_utc_timestamp(...)' which will cover 99% of timestamp parsing in Python.

Because there isn't one, people are always dragging in timezone libs or dealing with the consequences.


There is .utcfromtimestamp(). The problem is that this function returns a naïve datetime.

You don’t need to pull in a library, though. There is a UTC zone defined in the standard library. You pass it in as the second argument to .fromtimestamp():

  datetime.fromtimestamp(ts, UTC)
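A runnable version of that call, as a sketch. It uses `timezone.utc`, which works on older Python 3 versions too; `datetime.UTC` is an alias for it added in Python 3.11. The value of `ts` is an arbitrary example.

```python
from datetime import datetime, timezone

ts = 1_700_000_000  # example epoch seconds

# Passing a tz makes the result timezone-aware instead of naive local time.
aware = datetime.fromtimestamp(ts, tz=timezone.utc)

print(aware.isoformat())
print(aware.tzinfo is not None)   # aware, not naive
print(aware.timestamp() == ts)    # round-trips to the same epoch value
```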


This is a big issue in the exchange of data via databases and APIs.

IMHO, the initialization of a timezone-free datetime should ideally emit a warning in Python.


There are linters that give you this statically, but you need to know to run the linter. An example: https://docs.astral.sh/ruff/rules/call-datetime-without-tzin...


Just to be sure, is this rule enabled by default?


In the case of ruff, I don't believe so. Your config needs to specify ALL or DTZ.



