Hacker News new | comments | show | ask | jobs | submit login

https://www.postgresql.org/docs/9.6/static/rangetypes.html

https://www.postgresql.org/docs/9.6/static/functions-range.h...

Holy shit. Why hasn't anyone talked about it sooner? I've seen literally dozens of tables with

    begin TIMESTAMP,
    end   TIMESTAMP,
and with handmade validation against intersection. And there is even a union operation! Seriously, my mind is blown.

Rails even supports it!

http://edgeguides.rubyonrails.org/active_record_postgresql.h...




People don't know to look for what they don't know.

It's a bit of an issue for something offering unique or novel features, because there's the risk no one will ever find them.

I'd possibly think that most people also just go "we'll use a db" and pick from mysql and postgres (probably based on which one comes to mind first for them), but don't ever think about it any more than that.

I wonder whether an official "cool things you possibly don't know about (and when/why they're useful)" page could help? Or maybe a "designing a schema/model? here a some features that might make your life better" page could help spread the word? Postgres is lucky to have Craig as an advocate, and I love reading everything he writes, but maybe someone just getting exposed to Postgres doesn't know to look at Craig's stuff? (Although, maybe Google rankings would make this page appear early enough to do a great service...)

I was talking to another dev (who's very experienced) about dbs and I said I'd always expect to go postgres because of the extra features (and intentionally tightly couple to it) and the response was "yeah but how often do you actually use them" which initially shocked me (answer was "all the time"....), but now that I think about it; it's probably a pretty common perspective...


I'm a bit flattered. The real credit goes to all of the devs who contribute to Postgres, though I'm happy to have been involved in some way and contributed some outside perspective to many of them that are friends.

Of course... I'd welcome more reading my blog, at the same time there are a lot of others that introduce good Postgres content to the world. This was my biggest goal in starting to curate Postgres weekly (http://www.postgresweekly.com) which focuses on Postgres for app devs. There's a ton in Postgres, which 9 years into spending way too much time with Postgres I'm still discovering each week so trying to help make a better way for others to learn was key.

Sadly the hard part about the cool things isn't the most important. Cool gets a lot of attention, but the really safe and stable things like transactional DDL, rich indexing, MVCC all that doesn't get as much love as hstore, JSONB, or HyperLogLog which is a shame cause they're really important when you're running a real business which cares about data integrity.


>> cool things you possibly don't know about

A hugely useful feature of postgres, that nearly no ORM/ActiveRecord library makes use of, is the RETURNING clause of INSERT statements[1] (also supported on UPDATE and DELETE statements, though those are less useful).

A write statement with a RETURNING clause returns a resultset just like SELECT, with one row per row inserted, updated, or deleted. In this way, you can request the values of columns whose data was dynamically determined by the postgres server. This is especially useful for sequence (ie: SERIAL, autoincrement) values, as well as other columns with default values (such as an insert datetime, or a column with a value based on a complex expression).

This functionality is amazing, yet is so underused. Try to do a multi-row insert statement with MySQL with an auto-increment primary key, and know which IDs were assigned to your rows. You can't, because there is no guarantee your auto-increment values will be sequential.

Example query, which will return the auto-increment sequence id and default insertion timestamp of the two inserted rows:

    INSERT INTO members (name) VALUES ('Foo'), ('Bar') RETURNING id, created_at;
[1] https://www.postgresql.org/docs/current/static/sql-insert.ht...


Some people don't want to use SQL. Or learn it. Or read documentation.

The anti ORM movement convinced only a few interested database users.


Many ORMs like ActiveRecord, the Django one and SQLAlchemy support it just fine, so it's probably just a matter of reading the docs :)


I find the Django docs a bit more digestible then PG docs, personally.

It's because of the Django docs that I've learned about a few of those awesome fields types. https://docs.djangoproject.com/en/1.11/ref/contrib/postgres/...


Letting go of ORMs is so easy with good lightweight model binding libraries. e.g. Dapper (https://github.com/StackExchange/Dapper)


We use something like Dapper at where I work. I think for some projects it's fine, but for some a full ORM can really pay dividends. Especially if you're still working out your problem domain, all the hand crafted SQL can be a pain to update if you find you need to add just 1 more field, or if you figure out you need to split a table up because of a new unforseen feature, etc.


Dapper is great for reads but it doesn't handle complex updates nearly as well, although I'd argue most of those are architectural/design issues in nature.


How do model binding libraries compare to ORM's w.r.t. performance, e.g. query optimization?


ORMs are doing more work and are naturally going to have more overhead. With Dapper , at least with the way I used it, you're going to write your own SQL queries so there is nothing to optimize. I've even run into some nasty SQL server parameter sniffing issues. But despite that I still prefer the style over ORMs.


MyBatis is great for Java as well.


> The anti ORM movement convinced only a few interested database users.

IMO this was because they were too dogmatic and ignored the things ORM's did better, focusing on performance that didn't matter or on issues that come up even when you don't use an ORM (n+1 in the architecture).

That said, I don't think anyone that hasn't learned SQL really don't understand their ORM either. It's one of the fundamentals that any developer should know before they ever write production code.


A rather important library we use at work is built on Potsgres range types. And we interact with it through an ORM (a co-worker of mine will be presenting how it works, and the open-source release of it, at PyCon this year).


ORMs are tool. Arguments for and against them are tired and irrelevant. Use the tool if and where it helps.


And you can create constraints to ensure only unique ranges in a table!


Not just unique ranges (which is trivial), but also non-overlapping ranges!


Sorry, that's what I meant!


There goes a slab of my painfully thought out test cases.


Careful with this if you want to use it for partitioning data (it was atleast true till 9.6 in my experiments). You will never hit the exclusion parameter and consequently end up scanning all the child tables in vain.

Hopefully the partitioning thing is soon fixed 'properly'.


Um, can you explain it like I am 5, why this is a Holy Shit thing?


Because everyone who has been constrained to other databases has done things like "SELECT start, end FROM reservations" and then tried to write the where clause to find the range, had bugs (this person reserved a room for longer than the expected time).

And the end of the day, they ended up selecting all the start and end dates and then building their own interval tree ( https://en.wikipedia.org/wiki/Interval_tree ) and doing overlap detection themselves.

If they were using a database with good stored procedure support and had the skills to write the stored procedure or function in the database, they might then fight with writing the overlap detection there... but they likely found it a profoundly unpleasant experience.

And then here's PostgreSQL. It has it native. Hours, days, weeks of time spent writing, testing, debugging and otherwise fighting with this problem now suddenly become something along the lines of "SELECT isempty(daterange('2017-04-30', '2017-05-07')) FROM reservations" or something to that effect.

It represents a significant time savings when you need it for things that are more interesting than reimplementing data structures that aren't in the standard libraries.


Specifically, here's how that would be done in Postgres:

  create table reservation1 (duration tstzrange);
  select * from reservation1 where duration && '[2017-04-30, 2017-05-07]';
In a database without range types, it would look like this:

  create table reservation2 (start timestamptz, stop timestamptz, check (start <= stop));
  select * from reservation2 where '2017-04-30' <= stop and start <= '2017-05-07';
Personally, I don't think range types by themselves are anything special. However, combined with indexes and exclusion constraints, they're really great:

  create table reservation1 (duration tsrange, exclude using gist (duration with &&));
  insert into reservation1 values ('[2017-04-30, 2017-05-07]');  -- ok
  insert into reservation1 values ('[2017-05-10, 2017-05-12]');  -- ok
  insert into reservation1 values ('[2017-05-05, 2017-05-08]');  -- error!
Here's the difference indexes made when retrieving all the scheduling conflicts with a fixed interval (500k rows):

  No index (start, stop)        123ms
  Btree index (start, stop)     29ms
  GIST index (duration)         0.2ms
Quite a difference!


I have known about these for quite some time, but my only issue is that not all drivers in a particular language support range types.


If your driver or ORM is insufficient, you can use views:

  create table reservation (room text, duration tstzrange);
  insert into reservation (room, duration) values
      ('A100', '[2017-04-30, 2017-05-07]'),
      ('A100', '[2017-05-10, 2017-05-12]');
  
  create view my_orm_sucks (room, start, stop) as
      select room, lower(duration), upper(duration) from reservation;
Let's pretend building A caught on fire, so we have to move to building FM:

  select * from my_orm_sucks;
   room |         start          |          stop          
  ------+------------------------+------------------------
   A100 | 2017-04-30 00:00:00-07 | 2017-05-07 00:00:00-07
   A100 | 2017-05-10 00:00:00-07 | 2017-05-12 00:00:00-07

  update my_orm_sucks set room = 'FM200' where room = 'A100';
The underlying table gets updated:

  select * from reservation;
   room  |                      duration                       
  -------+-----------------------------------------------------
   FM200 | ["2017-04-30 00:00:00-07","2017-05-07 00:00:00-07"]
   FM200 | ["2017-05-10 00:00:00-07","2017-05-12 00:00:00-07"]
Note that updates involving the start and stop columns won't work with a view like that. You can use the rules system to get around that:

https://www.postgresql.org/docs/current/static/sql-createrul...

https://www.postgresql.org/docs/current/static/rules-views.h...

Or better yet, fix the driver or ORM!


Thank you for that idea with views, I had not thought about that perspective. It will come in handy if I have to use ranges in the future. I do think ranges are a better way to model things.


Which language(s)?


not a single driver in go supports them


Well, that seems to be a recurring issue with go.

Not nearly as mature libs as Java.


I am using Go, that is where I ran into the issue. But for the most part, I just stick to ANSI sql and I am ok.


Couldn't you just have a constraint based on more then one column? In fact for many applications there are different types of resources (rooms, people, etc) which you might want to be unique within anyway.


You still would need PostgreSQL's exclusion constraints to handle overlapping ranges since unqiue constraints do not handle checking for overlapping ranges, only exact equality. The below example guarantees that the same room can only be booked once.

    CREATE TABLE reservations (
        room text,
        during tstzrange,
        EXCLUDE USING gist (room WITH =, during WITH &&)
    );
You could also still have two columns and use an expression.

    CREATE TABLE reservations (
        room text,
        start timestamptz,
        stop timestamptz,
        EXCLUDE USING gist (room WITH =, tstzrange(start, stop, '[]') WITH &&)
    );


EXCLUDE constraints require the range to be a single value. (You could EXCLUDE over an expression based on both columns but it's nicer to have one.)


What do you mean by nicer? Not all drivers support range. Why is it better than an exclusion over an expression with two columns? (with simpler data types that the driver certainly supports)




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: