Hacker News new | past | comments | ask | show | jobs | submit login
Working with Postgres Types (jonudell.net)
65 points by alainchabat on Aug 8, 2021 | hide | past | favorite | 27 comments



I didn’t read the whole article (probably not aimed at me) but I just want to say that you should never, ever, ever use “timestamp without time zone” in PG because it is actually “clock face time” rather than “moment in time” and therefore the moment you try to do anything at all clever with time, you will start to question your own sanity. Well I did anyway.

There is not enough room here to explain fully but “Timestamp *with* time zone” effectively means UTC time which is the closest equivalent in PG to Unix epoch time and almost certainly what most people want. It doesn’t actually store a time zone.

You’ll thank me later :)


Both postgres timestamptz and timestamp are the same, they both store timestamp as UTC epoch time. Also timestamptz does not store the timezone given by the client, the only thing it stores is the UTC epoch time, so you'll lose the original timezone information. The only difference between the two is that with timestamptz postgres does a conversion between the text representation and the stored value based on a configured timezone (session, or database configured).

The decision on which type to use IMO depends a lot more on the behaviour of the database driver of your programming language, and which one works better with the type.

If you're not writing an application that uses a db driver, but interacting with pg over psql manually, timestamptz is preferrable because a lot of pg functions works better with timestamptz. If you use timestamp, you would need to be using explicit timezone conversion and be consistent to avoid weird double timezone conversion issues, which makes it far more verbose.


> “Timestamp *with* time zone” ... doesn’t actually store a time zone.

Thanks, I would never have guessed that!


I've made a "linter" tool to enforce this: https://github.com/kristiandupont/schemalint


TIL: one can write a Postgres hook to enforce this in the database at 'create table' time. This would fit nicely in the ProcessUtility hook.


Seconded. I now store timestamps as the number of microseconds since the epoch. It may not be as fancy as an actual timestamp type, but at least it works without surprises.


I went down this path for a time and I recommended it for MySQL some time ago. But actually epoch time and timestamptz are equivalent. You can specify the number of decimals in a timestamptz to get to microsecond precision if you like.


Except you have no idea when if you look at the database manually.


Luckily we provide a GUI so they can look at it in context.


Can you expand? I don't understand clock face time vs moment in time. If you have a link, I'm happy to go there.

By reading the doc, it seems tome that timestamp without timezone should be used whenever the entirety of time operations are handled by the application. I'd argue that using timestamptz might actually cause bugs due to the conversion, often overlooked because everything is configured to use UTC.


Let's say your application,for whatever reason, sends 3 different timestamps to pg: 2021-08-08T05:00:00+01 2021-08-08T06:00:00+02 2021-08-08T07:00:00+03

Do you want PG to store the same value or different values?

If you answer 'same', you want with time zone.

Now your answer might be: "make sure your application doesn't do that" in which case your probably fine either way but then you need to make sure you do that correctly.


That is pretty clear when I choose the type. Timestamp WITHOUT timezone would imply that information is truncated. Now, my application works only in UTC so I don't want any transformation to happen and it will be explicit (and timezone stored in a separate field) in the application when this has to happen. If this is the case, is there any problem with using timestamp without timezone?


If your application only works in UTC then you should use timestamptz.

The problem is that a timestamp of “00:00 1 jan” may have a different UTC time depending on the timezone your client is in. So if your laptop is set to Melbourne time then the UTC time would be “14:00 31 Dec”. If your server is set to UTC then the code you deploy would treat it as 0000 UTC. The behaviour changes depending on environment.

This happens because PG silently converts timestamp to timestamptz for many time operations, so the final result will depend on the environment TZ, which you don’t necessarily control or set. From the perspective of the SQL expression, the behaviour becomes nondeterministic.

If you want to guarantee deterministic behaviour you should use timestamptz which does not require conversion.

Timestamp is really only useful IMO when you have a feature such as “enable this setting from 2pm local time wherever the user is”

Side note: As much as I love PG, the implicit casting between these two incompatible types is pretty baffling.


I'm not sure if what you are describing is correct. If the client sends utc (as in a web browser), the server would transform it to UTC and then work with UTC in any context. So the database will always and only see UTC values coming in and out. Then, the application will transform the time into whatever timezone is needed. A point could be made in ensuring that postgres timezone should be set to UTC. But if this is the situation, even with the conversion nothing would happen


I'm definitely correct. The client I'm referring to is the PG client, not the browser. The browser connects to some HTTP server which is a client of the PG database.

Assuming you send the time in epoch time format, you would use:

    select to_timestamp(1628506048);
to convert to a `timestamp with time zone` (yes that is correct). If you insert this value into a database using `timestamp without time zone` then the database forgets that the time is UTC and will use whatever TZ is set on the client that happens to run the query.

If you can 100% guarantee that all your devices - developer machines, test machines, VMs, production, the whole works, if you can absolutely 100% guarantee that they are all set to UTC then timestamp and timestamptz are largely the same.

But why take the risk? I can tell you from experience that learning this the hard way is really hard.


Yep, the problem we had was all our tests ran fine on dev machines (local tz) but failed when deployed to production (UTC). We were doing some pretty complicated time calculations so it caused us a lot of trouble. I didn’t want to mandate UTC because the code should work deterministically regardless if the server time zone. But also, technically it’s a bug if you think you’re working in UTC, but you’re actually working in clock face time.


That's exactly how we handle it: all the clients and servers involved are set to utc.

I don't see it as a risk, I'd recommend doing that in any instance to save from confusion eventually, but thanks for clarifying in details.

Any chance you have an example query (or instructions to follow) to trigger a bad case? I'd like to verify our setup

I see your point, I need to evaluate the consequences of it


I don't see any problems. If your application only works on in utc then both types will behave the same. Regarding transformations, I'd argue that both types do a transformation just different ones. So if you are chasing a bug where a time zone was sent to the db, either type might be a problem depending on what you were expecting.


This is not correct. Timestamptz does not do a transformation when used in expressions requiring timestamptz (most of them) which timestamp does.

The two types are different and incompatible. Timestamptz refers to a fixed moment in time regardless of your local Timezone. Timestamp refers to the time on the clock wherever you are in the world. A timestamp of “7pm” means “7pm” wherever your server - or client, depending on how it gets executed - happens to be.

It’s really really confusing because select now(), which returns Timestamptz, internally uses UTC time, but prints the time in your local time zone eg ‘10:16 Aug 9 2021+10’ in my case.

Select now()::timestamp prints the same time, but without the timestamp, leading you to think that the former stores the +10, but it doesn’t.

In fact the +10 is telling you that PG knows that the time is UTC and your local TZ is UTC+10. The latter, without the offset, is telling you that PG doesn’t know what time zone the time is in and so can’t convert it to your local time zone.

It is super confusing but the best possible advice is never use a plain timestamp without time zone because it leads to a bunch of subtle errors that will drive you nuts.

Put another way: if you are used to using Unix Epoch Time, then the equivalent data type in PG is timestamptz.


I agree with you, but what I meant by both doing different transformations is that:

If the client sends 2021-08-08T07:00:00+03 to the server, in either case the offset of +03 is going to be lost. As you know, timestamp will keep the clock time and timestampz will keep the "moment in time" (under the hood by converting to UTC.) I'd argue those are just two different rules for "dropping the offset".


Oh ok yep sure, you're right. If you want to keep the timestamp then you need to keep it in both cases.

But with timestamptz you don't need to use the timezone in most calculations while with timestamp you do need to use the timezone otherwise unexpected behaviour can creep in.

But yes I do agree that you're technically correct, and that's the best kind of correct after all :)


- clock face time, lossy. non geo-aware/geo-normalized value. generally a scalar, or partial date time. value may not be robust if consumed outside of reporter’s locale

- moment in time, geo-aware. scalar + geo-value, such as TZ or UTC offset

I (mediocrely) wrote about this years ago: https://cdaringe.com/managing-application-dates


Or alternatively,

"timestamp without time zone default (now() at time zone 'utc')"

is fine as well and does make the time zone explicit in the type definition.


No, this just makes the default value the UTC time at the time a row is created, but it doesn’t help.

Let’s say I create a row at “12:34 5 April UTC”. If I now change client time zones and run a query, the epoch time will change to “12:34 5 April EST”. That’s probably not what you want.

The underlying issue is that most PG functions use timestamptz, and PG silently converts timestamp to timestamptz. This conversion uses the local TZ and can lead to nondeterministic behaviour from the perspective of the SQL expression.

Put another way, timestamptz is perfectly stable, timestamp can be nondeterministic, use timestamptz unless you are absolutely sure you know what you are doing.


Interesting read, not just this post but the whole series. I used to follow Udell's writings (though I can't recall now where they were published; must have been some magazine, I think, not just his own blog), but it seems he drifted out of my sight sometime in the last decade. Thanks for reminding me of him!


Agreed, I used to read Jon a lot back in the magazine days. Glad to see he is still at it!

This page will answer some of your questions:

https://blog.jonudell.net/about/


He used to write for Byte magazine.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: