Hacker News new | past | comments | ask | show | jobs | submit login
Mastering PostgreSQL in Application Development (masteringpostgresql.com)
440 points by thibaut_barrere on Nov 6, 2017 | hide | past | web | favorite | 94 comments

As someone that cares greatly about the Postgres ecosystem there aren't nearly enough books out there, so really excited to see this one come to life. Instead of targeting DBAs the focus on app devs is very refreshing so that people can start to leverage more of the awesome in Postgres. I haven't yet read the book myself but absolutely will.

If you're curious to read some of Dimitri's writing ahead of time his blog is also a great resource - http://tapoueh.org/

I think it's to a degree a function of significant volume of high quality material available in the open. It's hard to sell a book when you can go to Bruce Momjian's site https://momjian.us and get access to a ton of information and that is just one resource.

Agreed there are a number of pieces out there in the open, but not many in a consolidated form or that flow from one step to the next. Perhaps it is because there are a number that blog about all things Postgres, but a really common question I hear from people is what book should I pick up if I want to become an expert at Postgres. And while some books are out there, there hasn't been a perfect answer to date, hoping this one helps :)

Thanks for sharing the link to Momjian’s blog. I enjoyed reading his posts for content and his concise writing style.

"focus on app devs"

This is exactly the type of book I've been wanting for a long time.

I've been wondering for a while about the practicalities of using postgres as a replacement or near-replacement for a traditional backend - does anyone have any opinions on how far it's wise to go with this (postgres-only? authentication? direct connections from the client?) and/or how feasible that kind of thing is?

Derek Sivers wrote in 2015 about merging more of the business logic into postgres, including generating JSON in postgres [0]. I've personally used this approach and found it quite powerful. There may be downsides to that approach if somebody else is going to be maintaining the project and doesn't understand postgres too well.

There's also PostgREST[1], which exposes a REST API to a postgres database and has been discussed on HN before [2,3]

[0] https://sivers.org/pg [1] https://postgrest.com/en/v4.3/ [2] https://news.ycombinator.com/item?id=13959156 [3] https://news.ycombinator.com/item?id=9927771

This depends a lot on the usecase. You have to consider things beyond the simple serving of requests, things like rate-limiting, caching, logging, event tracking, for example.

About a year ago I designed a system for a client that was just Postgres with a Go http frontend. Go was used to handle http requests and responses, to translate the API from http to postgres functions/views, and serve the response straight from postgres. Even authentication and access control was handled using postgres's role system and RLS was looked into (and found to be viable for when needed). Postgres was a good choice because the project involved mainly a lot of data wrangling. Of course, I could do it this way because it was a completely internal system with a fixed number of users, and the burden of maintenance was minimal.

Main reasons for not going this direction, IMO, would be:

1. Developer proficiency, while building and maintaining. Far more people know Python, Ruby, JavaScript etc. than SQL. Far more people know how to think in imperative programming than to think in data models. Of course, one can write Postgres-hosted imperative programs as functions (in languages ranging from plpgsql to javascript), but at that point using the same language in a node or django app is much easier.

2. Unsuitability. Some things are just not suited to be run in a database. Realtime multiplayer games, multiple-source data compositing APIs, view rendering etc. come to mind. The truth remains, that general purpose programming languages (or special purpose, where the purpose is serving, languages) and environments have more possibilities than an environment that grew around dealing with data.

Sounds like client-server development on oracle 20 years ago...

Why not? There's only so many use cases for a relational database - "everything old is new again", and all that.

Or Sybase Open Server.

I haven't put anything into production yet but I've been playing with PostGraphQL (https://github.com/postgraphql/postgraphql). It automatically creates a GraphQL endpoint based on your schema using Node. You define your application logic as functions in your schema. It also can handle authentication for you. I'd definitely recommend checking it out.

Not sure that I would do authentication directly in PG (although it has many different methods [0]), I think it's still doable. You could move whole oAuth to PG through PL/SQL (maybe there is an extension for it?).

I would also encourage to introduce a proxy in between (e.g. pgbouncer [1]) to help you to handle connections and deal with the authentication.

We're using PG as "the brain" of our service since 2014 and it never failed us. The biggest downside is that there are no tools to help to debug or measure the performance of the code you wrote. However once you have functional logic, you can write fairly lightweight application layer around it and change it as often as you want.

You can also scale this setup almost indefinitely thanks to extension like citus [2] and still keep your application layer fairly thin.

[0] https://www.postgresql.org/docs/current/static/auth-methods....

[1] https://pgbouncer.github.io

[2] https://www.citusdata.com

> Not sure that I would do authentication directly in PG

I agree. But I am thinking about making a database role for each user. The user becomes that role after signing into the front end, like Apache.

   db.exec('set role to $1', req.remote_user);
Apache 2.4's form-based authentication makes this attractive.

> The biggest downside is that there are no tools to help to debug or measure the performance of the code you wrote.

Isn't there the explain command (https://www.postgresql.org/docs/current/static/using-explain...) and the \timing option for the command-line client, psql (https://www.postgresql.org/docs/10/static/app-psql.html)?

> I agree. But I am thinking about making a database role for each user. The user becomes that role after signing into the front end, like Apache.

I figured. If you isolate the data well enough, it could work. I'm always paranoid when it comes to DB.

> Isn't there the explain command and the \timing option for the command-line client, psql? Yes, there is. This however doesn't help you with triggers and UDF, which is how you usually create the logic in the DB.

I don't know how much cross-browser testing you do, but the site looks completely broken in Firefox, renders fine on Chrome.

EDIT, I tried adding https to check (cert is wrong too, however) and it looks better. Be aware the link posted (for some reason) is HTTP only.

Hmm. Firefox 56.0 running on Ubuntu 16.04 and it's working fine minus the hijacked scroll. Note to all front end developers: STOP HIJACKING SCROLL! It's bad UX. No exceptions.

I'm also seeing broken/missing CSS on Debian/FF 56.0.2.

Yep, that would tally, I'm on Ubuntu 17.10

The linked version works fine for me, Windows 10, Firefox 56.0.2

Launched today, and already has the best sellers package.

you must be new to internet marketing

Not really surprising, it probably just selects the SKU with the highest number of sales. Even if that's just 1 sale it's still the 'best seller'.

@tapoueh what level is this targeted at? It looks like it starts from a relatively basic level and covers pg usage fairly broadly. Would you recommend it to advanced users, or does it not go too deep into advanced usage?

The book definitely go into advanced usage. If you're already fluent in LATERAL joins, GROUPING SETS, Arrays, and JSON techniques, you might get bored to some degree.

Every single chapter has some beginner, then intermediate, then advanced topics covered, so I think everybody should have some of the right content, really. I tried and paid attention!

Why do you need my email to see the table of contents?

So that you can receive updates if I make some. You don't have to, though, read http://tapoueh.org/blog/2017/11/mastering-postgresql-in-appl... where you will find the full Table of Contents, or just buy the book ;-)

Best not to sneak an email collection scheme in on people even if it really is only for your newsletter. Most people have more than enough spam in their inbox already, if someone wants to get updates it's very easy for them to just click a button that says "Click here to get updates".

Email for updates being a foot in the door for further communications is pro forma and legally regulated. I think you're unnecessarily suspicious.

Even if it isn't nefarious, it's often unwanted. I usually turn around if I have to provide my email to see some sort of whitepaper or sample because it means I will have to later unsubscribe from whatever email list I was put on and hope they don't share the list with others (which happens, often).

Whenever someone asks for an email, I just type lalelu@yopmail.com, and then I go to http://yopmail.com/lalelu and click the activation link.

It's a beautiful service.

If only there was an email service that let you create multiple email addresses for free! Or better yet one that let you create an email address that only exists for 10 minutes. A 10 minute mail service if you will.

Basic marketing to sell the book...don't be dense

The background image in the e-mail collection form can be opened in a new tab. It's full-size, so you can see the ToC without buying to book or signing up.

I wish there was a discount code for the Dead Tree Edition. I have a lot of trouble focusing on reading PDFs on my computer versus having a physical book in front of me. And I don't have a tablet, unfortunately.

FWD15 gets 15% off, which basically negated the price of shipping for me.

Thank you kind sir. You dfinitely made me move forward with a purchase.

I have a curiosity: Some new SQL technologies, such as NuoDB, or MemSQL, chose to implement horizontal scaling by separating the data storage and transaction processing machines. I find this awesome because the data storage can scale independently from the heavy lifting of maintaining transactions and running big queries.

Additional machines can be added or removed from processing queries without having to scale the storage, and not have to worry about the complexities of data sharding, and data rebalancing.

Does anyone have experience with this kind of paradigm? Can PostgreSQL do something similar?

MemSQL doesn't quite do that. It has 2 tiers, aggregators and leaves. Leaf nodes store the data but they also do computation of local node data while aggregators split up the original query and assemble results with final ordering and filtering.

PostgreSQL can do something similar by using FDW (foreign data wrappers) to child databases, which can be other postgres nodes or different datastores entirely. It's still relatively naive with access but possible, and I'm not aware of any polished product that will do it all for you. The closest is systems like Citus, PipelineDB, TimescaleDB that work as extensions but the nodes do both compute and storage.

However there are now several SQL execution engines that you can use like Apache Spark (SQL), Apache Drill, Presto, Dremio, and others that will run SQL queries and joins over several different data sources, so you can scale each layer independently. It works especially well for data lake/warehouse needs where you can run an elastically scaling group of execution nodes against files in a cloud storage bucket. Not as fast as a focused system but cheap and effective.

Postgres doesn't have anything built in today, though a single node will comfortably scale to ... let's say somewhere between 100GB and 10TB depending on your use-case. Beyond that you can do read scaling with replicas, implement something with Foreign Data Wrappers, or hand-roll a sharding solution and stay on vanilla.

Alternatives to the DIY approach include open-source things like Postgres-BDR, CitusData, TimeScaleDB or closed-source single-vendor things Amazon Aurora, and RedShift.

Looks like a pretty rote implementation of Authority[0], but with a more modern UI. I wish we used PostgreSQL at my day job, this looks pretty interesting! I particularly like the idea of including the Docker container at the highest price point.

Edit: Looks like email sign ups are through ConvertKit, which is Nathan's company. Definitely Authority!

[0] http://nathanbarry.com/authority/

Just FYI, in the "The Dead Tree Edition" section, "professionally" is spelled wrong:

A professionnally printed paperback of 332 pages, with the same content.

Maybe it’s the first edition of the website :D

Thanks, fixed now!

"The Full Edition comes with the eletronic book package and an archive with a PostgreSQL dump file ready to be restored in your own instance, plus all the 265 queries as separate files for you to run and edit!"

eletronic is also a typo.

Fixed now. I really should have had my reviewer work on the website too, not just the book itself. Thanks guys!

The “dead tree edition.” Good play, sirs.

I’m going have to get that one though. I can’t seem to focus/finish when reading educational books on my kindle or in PDF form.

I like the “turn thousands of lines of code into queries”... I can’t count the number of times I’ve seen application code doing something a query could do either to appease an ORM or because a backend engineer wasn’t fluent in intermediate SQL.

Yeah... i'm totally guilty of abusing the 'python can do it' hammer instead of leveraging the DB. Just last week i looked at an old snippet of code that was essentially a 'query data, load in python, process, shove back in postgres' workflow. I was like, AHA i know better. spent 15 mins figuring out how do accomplish the same thing in the database. Now that shitty code is gone, replaced with 1 SQL statement that runs in seconds, compared to tens of minutes!

It feels good!

I had a rails/ar based report that I "fixed" at my last job that when it ran it generated 1100 reads (not even N+1, a lot of other crazy aggregates, etc).

It was complicated, but I got it into one query, subquery and a partition.

Went from 45s of app time to process to a <1s query!

Look interesting but I'm not crazy about the fact that to get the source + data, I have to get the $90 version.

True and it's unclear how usable cheaper options will be without those files. It's pretty much a de facto standard that source code going along a technical book is available for free. This is too bad because I would have purchased it instantly otherwise.

The book contains references for every data source it uses, all of them are available as Open Data. So you can load all the data, and copy/paste all the SQL queries with just the Book Edition.

The Full Edition isn't so much about the code contained in the book (SQL queries) as it is about having the very database used in the book handy to replay the queries.

I understand that the Full Edition doesn't make the same value for everyone, and it's fine. That's why it's not the only option. Given how many are sold compared to the other options, I'm confident it still is a solid option though.

Yep, the only version that's attractive enough (enterprise) is about 3/4 of the minimum wage here. I can't buy it, and it's a hard sell to my bosses, especially when the company doesn't even have more than 10 employees.

I've setup an Offer Code for the next 48h guys. Feel free to use it! http://masteringpostgresql.com/#packages

Thanks! That's awesome of you, but when I try to use it (as "hackernews") on the Enterprise Edition, the payment form's telling me it is invalid. What's up?

Went to fast and fat-fingered it. Works now!

Great, thanks again!

FYI. My attempt to use your gracious Offer Code on the payment form of the Full Edition, did not apply...

Fixed now!

One request :)

Please consider a Fully Dead Edition. It would give access to all the digital content, but include a paper version of the book. Thanks!

That's a good idea, unfortunately the providers don't know each other and don't make it as easy as it sounds. Given interest into that, I will see about making something happen, if possible. Thanks!

Thanks but it does not work in the dead tree edition?

I bought the dead tree edition. The FAQ page says that I should send author an email with the receipt to get the pdf and mobi version; however, I can't find the author's email address anywhere. Any clues?

disappointed /u/tomca32... easy as: SELECT email FROM authors;...

dim [at] tapoueh.org per his github: https://github.com/dimitri

Enjoy your day.

Edit: revised @ to [at] to reflect format in author's FAQ update.

The e-mail address is easily found in the paperback version, and I just added it to the FAQ as well, thanks for reporting the bug!

I didn't see any reference to PL/pgSQL in the table of contents. Does your book go into common strategies with stored procedures (e.g. When to LOOP to build a temp table vs. using table joins)?

No, I don't address that point in this book.

It gives you plenty SQL techniques to avoid ever having to LOOP in a PL/pgSQL function, and show how to use PL/SQL rather than PL/pgSQL each time it is possible, though.

Sweet. I'll give your book a read then. Gaining a more solid SQL foundation to avoid LOOP in the first place is what I need.

Looks interesting, wish I could "preview" before buying. Thankfully before venturing out on my own in the web development world, I was a member on a small business analytics team were we would all regularly write 3+ page SQL queries for reports. Sometimes they's take days to run. It was awesome and horrifying all at once but a good experience nonetheless and has allowed me to do what I do today.

I see in the FAQ that Tapoueh promises to refund anyone who isn't satisfied for any reason.

There is a 1 chapter preview available, not even halfway down the page.

It would be nice if I could buy it on Amazon.

I agree, I was disappointed not to find this on Amazon, would have bought if there, found another book with the same name though! I do ALL my reading on iPad Kindle app, thousands of books, I can’t be arsed to fill out forms to buy a book also other book or pdf readers I’ve tried have always been crap compared to kindle app.

I like the idea of it targeting developers but aren't most developers going to end up using an ORM in the end?

For example SQLALchemy in Python and ActiveRecord in Rails.

Is the book a ton of SQL examples or general theory?

Most of my day to day database activities as a web developer are creating good indexes, data modeling and setting up queries with an ORM.

I spent years in all sorts of ORMs and have sworn off them in favor of pure-SQL patterns like yesql (for Clojure) and massive-js (for Node.js). For me, the boilerplate of writing SQL queries is trivial compared to the productivity and performance inefficiencies of trying to understand and work around the magic of ORMs. Writing pure queries also encourages good practices like writing field lists in your SELECT statements and using more of Postgres' built-in functions.

The book covers some theory when it's needed and comes with lots of practical examples, all using real world data sets to make them easier to understand and relate to.

Using an ORM and how to best integrate “raw SQL” in your code is also discussed, and data modelling gets its own chapter too. I think you will like it!

Is it aimed at developers from ground 0, or someone with 10+ years of experience?

I've been working with both raw SQL and ORMs for a pretty long time. I'm definitely not at the DBA-tier and really complex queries scare the shit out of me but I haven't ever hit a complete roadblock when it came to modeling and querying data in a reasonably efficient way.

With that said, an ideal book for me would be one that just showed 100+ real world data modeling examples and how to query it in a bunch of real ways. Ideally an expert's thought process would be explained every step of the way. It'd happily pay $100+ for that just because I'm sure using it as a reference guide would save me enough hours in dev time to make it worth it.

You know what? Just buy the book and if you don't like it, tell me. I'll refund you. My guess is that you will like it ;-)

I have used PostgreSQL for 12 years. Early on I tried a few ORMs and I found them to be just something else to learn. If I had any issues, I had to figure out how it was all working on the SQL back end anyway.

My recommendation is to just directly learn SQL instead.

That's a good question. I don't use an ORM, but you do, so that tells us that 50% of developers use an ORM ;-)

Does anyone have any stats on what percentage of developers use an ORM?

IMO an ORM is not a substitute for knowing SQL, much less studying the actual database.

To me, an ORM is a tool that allows someone that knows SQL to write statements they would write by hand using business-level vocabulary.

Also, you have to be savvy enough to know when you are hitting the max expressiveness of the ORM and need some powerful SQL you better write by hand. Active Record certainly provides API for that, and it is an API you are supposed to leverage when the circumstance arises (Rails team over here).

This book seems like an amazing resource however I definitely cannot afford the price tag at $179 unfortunately. :(

Keep scrolling. :) The cheapest version, at $39, is way way down at the bottom.

I was a bit surprised by that as well. If you scroll further down the page, there are less expensive options. The eBook is $39. Paperback is $49.

Does it cover development in some particular language? E.g python+sql?

Mastering PostgreSQL in Application Development is meant to target any development environment, yet it does target some ecosystems: Python, PHP, Java and Go receive a special treatment, and I must confess Python is covered way more than the other choices.

Thanks for putting the time and effort into this, Dimitri

why do i have to give my email, to even get a look at the table of content?

not saluted

tapoueh why are you not using https on your site?

I want to buy the book but not sure if I should enter my card details on a non https site?

If you click through the links to purchase, you'll be able to see that purchasing his books are through third-party distributors which are all secure.

The payment forms were in HTTPS from the beginning, but the webpages (hosted at GitHub Pages). Now it's all HTTPS with a proper domain certificate, thanks to CloudFlare. Enjoy!

Chrome still complains that the payment form is using non secure elements on the page but anyway I bought the book.

Not really feeling sold on this.

It's called "Mastering PostgreSQL" but the site reads like it's trying to convince beginner programmers that SQL iteself is useful.

Even the case made here: http://tapoueh.org/blog/2017/11/mastering-postgresql-in-appl... seems like it would only be convincing to someone who has never had to write much SQL and also has not already had to do a pattern like that in an API call.

Yes the book targets more the SQL beginners, which are in my experience developers of all levels, from beginners to experts.

The title is inspired to a great deal by an idea of how to best learn new things, which is very nicely covered in the following Ted Talk: https://www.ted.com/talks/sal_khan_let_s_teach_for_mastery_n...

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact