
Working with time in Postgres - neovintage
http://www.craigkerstiens.com/2017/06/08/working-with-time-in-postgres/
======
orf
You cannot have a post entitled "Working with time in Postgres" and fail to
mention Range Types[1]!

If you're using Postgres right now and have any columns like start_* and end_*
for anything (e.g. numbers or dates), you need to stop what you are doing and
use a range type. They are amazing. You can do things like a unique index that
ensures there are no overlapping ranges, you can do efficient
inclusion/exclusion indexing and much more.

Use them. I'm always surprised more people don't know about them.

1.
[https://www.postgresql.org/docs/9.6/static/rangetypes.html](https://www.postgresql.org/docs/9.6/static/rangetypes.html)
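
The overlap constraint mentioned above can be sketched roughly like this (the reservation table and its columns are made up for illustration):

    
    
      CREATE TABLE reservation (
          id     serial PRIMARY KEY,
          during tstzrange NOT NULL,
          -- reject any row whose range overlaps (&&) an existing row's range
          EXCLUDE USING gist (during WITH &&)
      );
      
      INSERT INTO reservation (during)
      VALUES ('[2017-06-08 10:00, 2017-06-08 11:00)');
      -- a second, overlapping insert now fails with an
      -- exclusion-constraint violation
    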

~~~
scott_karana
Practically speaking, how would you use these for noting starts and ends of
long-running jobs, say?

Would you set the interval starting time, but leave the end of the interval as
"present"/infinity? And then update the end of the interval when the job
finished? Wouldn't you also need to have a cleanup function to manually
"close" intervals if the worker crashed and restarted?

~~~
orf
I'm not sure to be honest, I would set the end as infinity I think.

I wouldn't have the worker process handle this itself though, as you would
need some form of cleanup. But you'd need the same with two individual
columns.

------
davidw
"I was in favour of space exploration until I realised what it'd mean for date
time libraries" -- Joe Wright

[https://twitter.com/joe_jag/status/510048646482894848](https://twitter.com/joe_jag/status/510048646482894848?lang=en)

------
mrtbld
> Postgres has two types of timestamps. It has a generic timestamp and one
> with timezone embedded in it.

That's not correct, timestamptz doesn't have a timezone embedded in it. It's
just that it's timezone-aware. A timestamptz corresponds to a universal point
in time that has many human representations, one for each timezone. psql
uses the session's timezone setting (which defaults to the instance's) to
convert a timestamptz to a displayable string, so timestamptz values are
always displayed with a timezone, but that info does not come from the stored
value.

Timestamptz needs timezone information only for operations that would give
different results in different timezones, e.g. display as string, extract the
day part, add a 1-month interval (DST info needed), etc. Comparing two
timestamptz however doesn't require any timezone info.

The difference between timestamp and timestamptz is not about what they store,
but about how they behave.

Edit: In my experience, this is not always obvious because postgres uses the
default timezone of the instance whenever it needs such info with timestamptz
operations. Using an explicit timezone often requires convoluted code.
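
A quick way to see this, assuming you're in psql (the timezone names here are arbitrary):

    
    
      SET timezone = 'UTC';
      SELECT '2017-06-08 14:00:00+02'::timestamptz;
      -- 2017-06-08 12:00:00+00
      
      SET timezone = 'America/New_York';
      SELECT '2017-06-08 14:00:00+02'::timestamptz;
      -- 2017-06-08 08:00:00-04  (same stored instant, different display)
    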

~~~
mixmastamyk
Thanks. If I store all datetimes from my app in UTC, with end users in more
than one timezone, which type should I use?

~~~
mrtbld
Well, I would use timestamptz, using the user's timezone only to convert for
display. Use cases for plain timestamp are very limited.

Just make sure you include timezone info in the string representations in your
SQL queries, for example '2000-01-01T00:00:00Z', where Z stands for UTC.
Otherwise you would insert a plain timestamp into a timestamptz column, in
which case postgres uses the local timezone setting for the conversion,
implicitly; this is not what you want.
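
For instance, with the session timezone set to 'America/New_York' (an arbitrary choice for illustration):

    
    
      SELECT '2000-01-01T00:00:00Z'::timestamptz;
      -- 1999-12-31 19:00:00-05  (explicit UTC instant, shown in local time)
      
      SELECT '2000-01-01T00:00:00'::timestamptz;
      -- 2000-01-01 00:00:00-05  (no zone given: read as local midnight,
      --                          which is 05:00 UTC, a different instant)
    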

See [http://phili.pe/posts/timestamps-and-time-zones-in-postgresql/](http://phili.pe/posts/timestamps-and-time-zones-in-postgresql/)

Also, you should use an equivalent type in your app, i.e. Python datetime with
tzinfo or JS Date. And beware of UTC offsets: they can't handle DST. Python's
pytz and JS's moment-timezone provide DST-aware timezone info (which is built
into postgres).

Edit: if you can rely on your users' system time for display, that's even
better, because you wouldn't have to explicitly deal with that DST-aware
timezone info.

------
ender7
Every time I deviate from using unix milliseconds as my timestamps, I end up
regretting it. If we use unix seconds, we get infinite bugs related to people
forgetting to convert to millis when comparing against the current time. If we
use Date objects, it's an even larger surface of potential bugs. Every Date
interface I've ever seen makes it far too easy to accidentally create a
relative time (i.e. anything that can't be mapped unambiguously to a single
unix millis timestamp. Usually means a datetime that defaults to the current
timezone). Does anyone have a preferred method that avoids these pitfalls?

At the end of the day I always come back to "solution with lots of possible
bugs" or "unix millis everywhere". And I always choose the latter. It means we
can't use nice date features in a lot of databases, but...eh? They've never
seemed worth it.

~~~
waffle_ss
The problem with Unix milliseconds is it's actually not always increasing
thanks to leap seconds. A positive leap second will result in the fractional
part of the last second of the day going up to .999..., then resetting to
.000... over again.

[https://en.wikipedia.org/wiki/Unix_time#Leap_seconds](https://en.wikipedia.org/wiki/Unix_time#Leap_seconds)

~~~
akira2501
There's always TAI64.

[http://dyscour.se/post/12679668746/using-tai64-for-logging](http://dyscour.se/post/12679668746/using-tai64-for-logging)

------
luhn
I love Postgres' time handling, even more so whenever I have to handcraft
time-based queries in other databases, like MongoDB (which is more often than
I'd like).

Some things the author didn't mention that I like:

* Timestamp with time zone string parsing: '2013-06-27 13:15:00 US/Pacific'::timestamptz

* Timezone-aware to timezone-naive conversion (or vice versa): mytztime AT TIME ZONE 'US/Pacific'

* I haven't used tstzrange yet, but it looks pretty powerful.

------
cuu508
Another tip: if you work with per-user custom timezones, then "SELECT
some_date AT TIME ZONE %(users_timezone)s" is also sometimes useful and
needed.

Normally you would want to receive timezone-aware timestamps from the
database, and format them in the user's timezone at display time--perhaps in
a template. But if you're e.g. aggregating data for a day-over-day or month-
over-month report, then the conversion to naive dates needs to happen on the
database side, so that day and month boundaries match the user's timezone.
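
As a rough sketch (the signups table and its created_at timestamptz column are invented here), a month-over-month count in the user's timezone might look like:

    
    
      SELECT date_trunc('month', created_at AT TIME ZONE 'US/Pacific') AS month,
             count(*)
      FROM signups
      GROUP BY month
      ORDER BY month;
    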

~~~
yawaramin
I'm a bit of a n00b at date handling. What's the benefit of communicating with
TZ-aware timestamps versus TZ-less timestamps with the shared understanding
that they're always in UTC? With the latter approach I can also convert to my
local timestamp for display.

~~~
zAy0LfpBZLC8mAC
That the database can't do computations on what it's not aware of. If you want
to ask the database for "events today", the database needs to know what span
of time corresponds to "today" for the user.
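
As a sketch, assuming an events table with a timestamptz column occurred_at, "events today" for a user in Tokyo might look like:

    
    
      SELECT *
      FROM events
      WHERE (occurred_at AT TIME ZONE 'Asia/Tokyo')::date
            = (now() AT TIME ZONE 'Asia/Tokyo')::date;
    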

------
greggyb
I never see discussion of fiscal calendars in these sorts of threads and
articles. Whenever I read these, all I see is something that would fall apart
as soon as approximately half of my clients look at it.

We have found that it is more annoying to keep track of the behavior of a
library than of a home-grown date dimension. In my organization, we tend to
use a standardized pattern that can handle arbitrary calendars, even when
we're dealing with standard calendars.

~~~
solidsnack9000
Do you guys use a custom date type, then?

~~~
greggyb
No. Dates are dates. Everyone can agree when a specific day exists. It's all
about grouping.

Dates have attributes that group them together. "Month" is an attribute that
you're familiar with. "Fiscal Period" can take on many specific definitions
but it is analogous to "Month".

Those two concepts share a lot of properties. They each collect a series of
contiguous dates. Each is adjacent to a similar grouping that falls
sequentially and the last date that exists in one is one day prior to the
first day that exists in the next. Each falls within a larger category like
"Year" or "Fiscal Year".

Year+Period forms a composite key for a period. We can also assign a
monotonically increasing field that increments by one with each subsequent
period. That field allows simple arithmetic to shift forward and backward. We
typically call this attribute PeriodIndex or PeriodSequential. I'll abbreviate
to PI here.

If you have a reference PI, you can always find the immediate prior period by
subtracting one from the reference. We can assign these for any grain of time
period. We typically see Week, Period, Quarter, Semester, Year.

This is the baseline of how we handle dates. There are plenty of utility
fields we'll maintain for specific time-based needs, but it's all sugar on top
of that.
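
A very rough sketch of such a date dimension and the PI arithmetic (the table, columns, and reference PI are all invented for illustration):

    
    
      CREATE TABLE date_dim (
          date_stamp    date PRIMARY KEY,
          fiscal_year   int NOT NULL,
          fiscal_period int NOT NULL,  -- analogous to "Month"
          period_index  int NOT NULL   -- monotonically increasing PI
      );
      
      -- sales for a reference period (PI = 100) and the one just before it
      SELECT d.fiscal_year, d.fiscal_period, sum(s.amount) AS total
      FROM sales s
      JOIN date_dim d ON d.date_stamp = s.sale_date
      WHERE d.period_index BETWEEN 100 - 1 AND 100
      GROUP BY d.fiscal_year, d.fiscal_period;
    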

------
kclay
One thing I learned about working with postgres and time is that the timezone
is based on the timezone of the connecting client and not the actual server. I
can't tell you how long it took me to debug code due to my workstation being
in CST but servers in EST, and then storing dates as UTC. Bundle that with
calculating upcoming birthdays within 15-30 days ahead, and leap years.

Yeah, I didn't like it one bit. Sorta reminds me of when I had to develop a
Gantt chart component in Flex for a client; so many problems with dates.

------
purple-dragon
Fantastic article as usual. One correction: the literal for midnight UTC
(00:00:00) is 'allballs', not 'allsballs' as mentioned. I know this because it
made me giggle when I first discovered it, and it subsequently became an
immature joke around the office for a day or two afterwards.

~~~
craigkerstiens
Doh! Will fix.

------
deepsun
Last time I checked, I couldn't store a datetime _with_ its timezone. It was
really strange that such a powerful database doesn't support storing full ISO
datetimes, like '2017-01-01T00:00:00Z'. Instead, it converted the value to a
timezone-less instant, losing the information about the original timezone
along the way. Sure, I could fetch it back using any timezone I want, but I
really wanted to know the original timezone it was in.

~~~
Nullabillity
When would you want to use something other than UTC for business logic? Time
zones (and their related nonsense) should be a view-layer concern.

~~~
zAy0LfpBZLC8mAC
Because there actually are times that are specified in terms of local time and
are not fixed to a specific timezone. Take a birthday, for a trivial example:
The span of time in UTC that corresponds to someone's birthday depends on
their location at the time.

~~~
talaketu
True that anniversaries are not fixed to a timezone, but these are not
timestamps, and generally not even dates. Just recurrences specified by month
and day (or maybe time of day, e.g. Armistice Day observed at 11:00am).

Better example: typically an events schedule should specify the destination
timezone when registering an event, so that if regulations on the local
timezone change, scheduled event times remain valid.

------
gtrubetskoy
The week example is a tad misleading, 2017-01-01 is a Sunday, which in
some/most? countries is the first day of the week.

If the date were 2016-01-01 and you compared it with what week Postgres thinks
it is, you'd get:

    
    
      SELECT date_part('week', '2016-01-01'::date);
       date_part 
      -----------
              53
      (1 row)
    

This is because 2016-01-01 is still the 53rd week of 2015.

Edit: Actually, 2017-01-01 is week 52 according to Postgres, probably because
it uses Monday as the first day of the week.

~~~
lobster_johnson
Postgres uses the ISO definition of week for "week", which starts on Monday.
For "dow", it uses the American week definition.

~~~
anarazel
isodow for the sane definition ;)

------
joeclark77
This would have been real useful to me about a week ago as I was writing
several of these types of queries!

On the debate of "timestamp vs timestamptz" I reached the opposite conclusion
of the author: I've got Amazon RDS instances set to UTC and my timestamps are
stored as UTC times with no timezone awareness. Instead, I add the timezone
while querying. I think this is better because I never have to remember
anything about server settings!

I discovered that the `AT TIME ZONE` clause has two meanings, so I sometimes
have to use it twice. In this example which selects all records created this
month:

    
    
        ...WHERE create_date  AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York' > date_trunc('month',current_date)
    

the first occurrence of `AT TIME ZONE` tells postgres that the timestamp is in
UTC (which it is) and the second occurrence subtracts four or five hours
(depending on daylight saving time) to show New York time. If I only had the
second such clause it would _add_ that many hours... it would think I was
giving it a New York timestamp and I wanted to see the UTC time.
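
The two meanings in isolation, as a sketch (session timezone assumed to be UTC):

    
    
      -- naive timestamp AT TIME ZONE zone: "this value IS in zone"
      SELECT '2017-06-08 12:00'::timestamp AT TIME ZONE 'America/New_York';
      -- 2017-06-08 16:00:00+00  (a timestamptz)
      
      -- timestamptz AT TIME ZONE zone: "show this instant AS zone's wall clock"
      SELECT '2017-06-08 12:00+00'::timestamptz AT TIME ZONE 'America/New_York';
      -- 2017-06-08 08:00:00  (a naive timestamp)
    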

------
eastern
Actually using generate_series makes little sense. Why should one repeatedly
calculate data that will never change?

I have this table:

    
    
      CREATE TABLE all_dates (
          date_stamp     date NOT NULL,
          is_month_end   boolean,
          is_year_end    boolean,
          is_week_end    boolean,
          is_quarter_end boolean,
          CONSTRAINT all_dates_pkey PRIMARY KEY (date_stamp)
      );
    

filled with data from 1st Jan 1980 to 31st Dec 2050, which is the range my
application needs.

It's a mere ~26k rows and has a whole host of uses.
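
A hypothetical query against it, assuming the flag columns are pre-populated:

    
    
      -- month-end snapshot dates in a reporting window
      SELECT date_stamp
      FROM all_dates
      WHERE is_month_end
        AND date_stamp BETWEEN '2016-01-01' AND '2016-12-31';
    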

------
mirekrusin
timestamptz doesn't embed timezone, it stores it as utc without any timezone
information.

timestamp does the same - stores value without timezone information.

the difference is with writing/reading those values where timestamptz behaves
as you'd expect and timestamp ignores timezone information.

timestamptz - gives you the thing that exists: a unique point in time, i.e. if
person A in australia and person B in europe hit the red button at the same
time - timestamptz will have the same value, regardless of the fact that those
two timestamp strings had different representations.

timestamp - gives you a local view of time: when person A in australia wakes
up at 6am to work and person B in europe wakes up at 6am to work - they will
hit the snooze button and it will create the same value in the database - even
though those events happened hours apart.

in both cases you'd have to store the timezone in a separate column if you
want to know which timezone the timestamp was generated in. let me repeat -
both cases lose the timezone information. they just do it in different ways -
timestamp by ignoring it completely and timestamptz by mapping it correctly to
the unix epoch.
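
the first case can be checked directly - two representations of one instant:

    
    
      SELECT '2017-06-08 20:00:00+10'::timestamptz
           = '2017-06-08 12:00:00+02'::timestamptz;
      -- true: both map to 10:00 UTC
    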

------
revicon
Weird, the example in the post (after changing table/field names for my
database)

    
    
      with weeks as (
        select week as week
        from generate_series('2017-01-01'::date, now()::date, '1 week'::interval) weeks
      ),
    
      SELECT weeks.week,
        count(*)
      FROM weeks,
        test_results
      WHERE
        test_results.date_created > weeks.week
      AND
        test_results.date_created <= weeks.week - '1 week'::interval
    

Throws an error for me...

    
    
      ERROR:  syntax error at or near "SELECT"
      LINE 5: SELECT weeks.week,
              ^

~~~
barsonme
yeah, the comma after the "with" block shouldn't be there.

i.e.,

    
    
        ... weeks
        )
        SELECT weeks.week

~~~
revicon
makes sense. After removing it...

    
    
      with weeks as (
        select week as week
        from generate_series('2017-01-01'::date, now()::date, '1 week'::interval) weeks
      )
    
      SELECT weeks.week,
        count(*)
      FROM weeks,
        test_results
      WHERE
        test_results.date_created > weeks.week
      AND
        test_results.date_created <= weeks.week - '1 week'::interval
    

it throws...

    
    
      ERROR:  column "week" does not exist
      LINE 2:     select week as week
                         ^

I would move this to the post's own "replies" section, but it doesn't have
one.

~~~
combatentropy
This should work. I'm sorry, I conformed it to my own style (lowercase and
different indentation).

    
    
        with weeks as (
            select week
            from generate_series(
                '2017-01-01'::date,
                now()::date,
                '1 week'::interval
            ) week
        )
        select weeks.week, count(1)
        from weeks, test_results
        where
            test_results.date_created > weeks.week and
            test_results.date_created <= (weeks.week + '1 week'::interval)
        group by week
        ;

------
pvaldes
I understand from the discussion that if you want to have a field with only
four qualitative categories (0, 1, 2, 3, with zero meaning "none" and 3 being
"very much") you could use a numrange or int4range, for example, instead of
the standard integer type. Interesting. Apart from being much more restrictive
in the allowed input, are there other advantages (less memory?) or cons
(possible portability problems?) that we should be aware of?

Footnote:

> Here’s just a few examples of things you could do with interals:

The author of the article might want to fix the 'interals' typo in the text.

------
midmoon2001
Handling birthdays and ages, my fav:

    
    
      select age('1971-01-01'::date);
      select age('2015-01-01'::date, '1971-01-01'::date);
    

------
rainbowliquor
date_trunc() is one way, but to_char is even better, as you can format the
resulting output into something nicer. Doing:

    
    
      SELECT DATE_TRUNC('week', CURRENT_TIMESTAMP);
    

gives:

    
    
      2017-06-05 00:00:00+00
    

vs:

    
    
      SELECT TO_CHAR(CURRENT_TIMESTAMP, 'YYYY-WW"wk"');
    

gives:

    
    
      2017-23wk

~~~
extesy
It's not the same: date_trunc returns timestamp, to_char returns a string.

~~~
rainbowliquor
True, but in the use case of OP's example. In the article OP says "So if we
wanted to find the count of users that signed up per week:", and "2017-06-05
00:00:00+00" isn't a week, it's a date (with a timestamp as well, which isn't
pertinent) that happens to be the beginning of a week. Using TO_CHAR() with a
format string makes it more legible and more recognizable.

------
andrewfromx
Have to mention
[https://github.com/activityclub/pointspaced](https://github.com/activityclub/pointspaced)
in a hn post about time. I use psd for so many queries now vs sql.

------
jontro
Small question / nitpick:

    
    
      WHERE created_at >= now() - '1 week'::interval
    

would mean in the last 7 days, right? Not last week?

Did some work on this recently in mysql and had to resort to calculating this
using strtotime('last week');

~~~
craigkerstiens
Correct, it would give the results from this exact moment in time to that same
timestamp 7 days ago. Were you thinking it might give you up to say the start
of the last week or something?

~~~
jontro
Reading the end of the sentence "within the past week:" just above. However I
would be interested to know if the "last week" date range is easily doable in
postgres :)

~~~
craigkerstiens
Will try to clarify it a bit further. Offhand you could probably do something
like the following to find if it fell within some week range:

    
    
      created_at BETWEEN date_trunc('week', now() - '1 week'::interval)
                     AND date_trunc('week', now())
    

------
Dowwie
the generate_series example is a mess

For those who want to implement this in Python, I've written a gist:
[https://gist.github.com/Dowwie/bec0a29bcd37eea41cde8d51886267a4](https://gist.github.com/Dowwie/bec0a29bcd37eea41cde8d51886267a4)

------
CharlesW
Any recommendations for similar "best practices" articles/guides for MySQL?

