OP here (happy to see this on HN!) - if you're interested in our work in this area, I hosted a webinar earlier this week where we walked through a new constraint programming based system (utilizing CP-SAT) for Postgres index selection that we're starting to roll out in pganalyze:
Compared to the initial version, this updated version is both more configurable and better at handling competing objectives in index selection (write overhead vs. query performance).
If you want to give this a try, and have pganalyze set up / want to try it out, feel free to send me a message (email in profile).
Ok, from the battle trenches: even if there is a perfect index for your query, the slower plan might still win, and there is little you can do about it. So you often rewrite the query, or even restructure tables, to achieve the performance you need.
If rewriting the query and remodelling the data are out of the question, the options are much more limited.
Second, queries don't just have an RPS number; they have hourly, weekly, and seasonal distributions. They evolve, become deprecated, and have different SLAs, and tables have different write-to-read ratios. There was an instance recently where I slowed down a query quite intentionally, by deleting a very good index for that query and substituting it with a worse, but much smaller, BRIN index.
The thing is, this particular query did not matter as much, its SLA permitted the slowdown, and the write path was far more important.
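To make the trade-off concrete, a sketch of that kind of swap (table and column names are made up):

    -- Trade read speed on one query for a much smaller index and a
    -- cheaper write path (hypothetical append-only events table).
    DROP INDEX events_created_at_btree;      -- large B-tree, great for this query
    CREATE INDEX events_created_at_brin
        ON events USING brin (created_at);   -- tiny, good enough given the SLA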
What I would like, and I assume GP as well, is the ability to write the query at a low level - as in, bypass SQL and the planner, and give Postgres the post-planner bytecode itself.
Exactly. I use hints as well, but in the end, you wonder, why waste your time trying to get the optimizer to do what you want, when you could bypass it entirely?
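For readers who haven't used them: hints in Postgres come from the third-party pg_hint_plan extension, which reads special comments at the start of a query. A minimal sketch, assuming the extension is installed and loaded, with invented table/index names:

    /*+ IndexScan(orders orders_customer_id_idx) */
    SELECT *
    FROM orders
    WHERE customer_id = 42;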
If your query doesn't know its own cardinality and can't scan, do you just know you're always going to return one row? I ask because otherwise, in my mind, disabling statistics is usually a Bad Plan.
I’ve often wondered why we don’t put a thin tiny proxy in front of psql (or augment pgbouncer, other proxies) to collect telemetry on all real world queries against the db. Being a middle man there would give you lots of visibility into every single query being made as well as how long it takes. I guess the modern integrated stats tools help.
I think the real problem is not indexing correctly but rather modeling your problem correctly. You can throw indexes at the problem, but that is sometimes just a band-aid over a more integral issue, which would call for inventing a new schema, new access patterns, etc.
A relational database really shines when you have all sorts of queries on tables, so it's hard to optimize your model for every potential future query. Precalculating these and caching the results in separate tables is not very efficient, and requires you to predetermine every query you want to answer.
Indexing in this situation is often the better alternative.
Ideally yes, but sometimes your tables get so monstrously large that the auto-analyze is sampling thousandths of a percent of rows, and fiddling with sampling targets can throw off seemingly unconnected queries.
Planners are mysterious black boxes, which we can at best influence.
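For reference, the knobs being fiddled with here are the per-column statistics targets (or default_statistics_target globally); a sketch with made-up names:

    -- Sample one hot column more aggressively, instead of raising
    -- default_statistics_target for everything.
    ALTER TABLE big_table ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE big_table;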
Have a look at the Postgres filter for the Envoy proxy: [1] (blog announcement post).
While it doesn't capture query performance as of today, it collects notable telemetry for Postgres in exactly the way you mention: simply because the traffic flows through it, it can collect data "for free", and definitely without taking any resources from the upstream database.
It can also offload SSL from Postgres. The filter could be extended for other use cases.
Disclaimer: my company developed this plugin for Envoy with the help of Envoy's awesome community.
We have this with APM monitoring, where every SQL query in the application becomes a span in a trace. It is very useful. We can quickly see the specific parameters that make an otherwise fast query slow.
> I think the real problem is not indexing correctly but rather modeling your problem correctly.
There's also the issue that usage might change over time due to events outside of your control, which affects what the correct solution should look like.
For example, there's always a tradeoff between development effort and performance. It doesn't make financial sense to spend tons of engineering time optimizing a process that's done once per day and for which the simple solution is more than fast enough, even though it takes an hour.
But if external circumstances change and that process suddenly needs to run every minute... perhaps not so easy to accommodate without drastic changes.
I definitely agree on problems not being modeled correctly, although I also frequently see application developers forego indexes. I think this is fundamentally a lack of understanding or inexperience.
Exactly, I have always been curious about the same thing - if we have live statistics on queries, then what stops us from cloning the data (to correctly capture statistics such as skew, etc.), running these live queries with a bunch of knobs tuned automatically (which indexes, on which columns, etc.), and reporting the cost-throughput Pareto optimum to the user? Is the configuration space too large to explore randomly? Is its structure too complex to produce automatic suggestions?
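Part of this exists today in the form of hypothetical indexes via the hypopg extension, which lets the planner cost an index without actually building it - the automated search over configurations is the missing piece. A rough sketch, assuming hypopg is available (table name invented):

    CREATE EXTENSION IF NOT EXISTS hypopg;

    -- Register a hypothetical index; nothing is written to disk.
    SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

    -- Plain EXPLAIN (without ANALYZE) will now consider it.
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;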
I had no idea what was going on under the hood but used PGAnalyze for about 6 months after launching a new product. It was excellent at suggesting indexes and monitoring unused ones.
However, after a few months the ROI crept down until it wasn't worth it anymore (access patterns stabilized). I'm tempted to bring it back once in a while, but the price tag keeps me from having it always on.
That makes sense, if the structure of your queries is largely static per commit of your codebase. You'd probably get more benefit out of a tool like this by running it as part of a CI pipeline, since that's the best time to test code changes including new queries. But then the challenge becomes simulating user activity and traffic levels during the CI pipeline. That's a solvable problem, but it's not as easy as just observing the real traffic in your prod network.
Exactly the same experience. And I wish it wasn’t. I also believe the pricing is wrong. I’d continue using it at $100/mo. But otherwise I’m just cancelling.
You can find our pricing on the website [0] - generally $100/month per instance (though note our smallest plan is $150/mo). pganalyze Index Advisor, which is the subject of the post, is available with all plans.
Generally I (Founder and CEO) feel the pricing is fair for the value provided for production databases, and it allows us to run the business as an independent company without external investors, whilst continuing to invest in product improvements. That said, it may not be a good fit if you have a small production database, or only make database-related changes infrequently.
Seeing as you're here, the reason the price is too high for us is that "SCALE" plan is too much for us and the "PRODUCTION" plan is too little. I'd happily pay $200/month for two servers, but can't justify $400/month for two (we only have two right now).
Thanks, that's good feedback! And agreed, two servers is a bit of a gap in our pricing right now. If you're open to paying annually instead of monthly, we have a bit more flexibility, since we can set them up as custom contracts (feel free to reach out).
> This is powered by a modified version of the Postgres planner that runs as part of the pganalyze app. This modified planner can generate EXPLAIN-like data from just a query and schema information - and most importantly, that means we can take query statistics data from pg_stat_statements and generate a generic query plan from it. You can read more about that in our blog post "How we deconstructed the Postgres planner".
Having something like this available as a library inside of Postgres seems really beneficial for tool authors. I wonder what the odds of getting it upstreamed are?
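For context, the input side of what the quote describes is just the normalized query texts and stats that pg_stat_statements already exposes, along the lines of (column names as of Postgres 13+):

    SELECT queryid, calls, mean_exec_time, query
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 20;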
Author here - what we call "pg_plan" internally is essentially very similar in spirit to what we do with pg_query [0], but the difference is that we pull in a lot more code from Postgres, and have more overrides in places that are not needed for the use case (e.g. MVCC handling, etc).
My gut feeling tells me that the chances of having an upstream library that contains the parser/parse analysis/planner are slim, mainly because there is a lot of entanglement with reading files on disk, memory management, etc. - I suspect one of the pushbacks would be that it would complicate development work for Postgres itself, to the point that it's not worth the benefits.
For the pg_query library, on the other hand, I have hopes that we can upstream this eventually - there are enough third-party users out there to clearly show the need, and it's much more contained (i.e. raw parser + AST structs). Hopefully something we can spend a bit of time on next year.
> My gut feeling tells me that the chances of having an upstream library that contains the parser/parse analysis/planner are slim.
Yea. The parser alone would be doable and not even that hard. But once you get to parse analysis and planning, you need to access the catalogs (for parse analysis to look up object names and do permission checks, for planning to access operator definitions, statistics etc). Which in turn needs a lot of the catalog / relation cache infrastructure. By that point you've pulled in a lot of postgres.
Of course you could try to introduce a "data provider" layer between parse analysis and catalogs, but that'd be a lot of work. And it'd be quite hard to get right in places - e.g. doing name lookups without acquiring heavyweight locks on objects, before having done permission checks, in a concurrency safe way, relies on a bunch of subsystems working together.
I mean, great, it catches more easy wins, but until you do an EXPLAIN ANALYZE your plan analysis is worth sh*t (sorry, I am highly skeptical of those solutions based on personal experience). I have seen so many cases where you add a "perfect" index for one query and it completely wrecks the performance of another query, because the planner decides to change its plan and makes a worse decision. It is very common in Postgres to intentionally not add a good index, or to modify the query, because we don't have query planner hints (rant for another day).
We need better tooling around indexes, but it needs to take into account all the queries and run them against the real dataset IMO. I think the AWS Aurora plan manager is on the right track, if it were combined with your tech.
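For completeness, that means checking the actual plan and runtime against the real data, along the lines of (placeholder query):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM orders
    WHERE customer_id = 42
    ORDER BY created_at DESC
    LIMIT 10;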
I'm a weirdo here on hacker news because I've mainly worked with Microsoft SQL databases my entire career, but it absolutely blows my mind when I remember that postgres can't lock in a query plan.
It's so nice being able to sleep at night and not have to worry that an ultra-important query will randomly shit the bed.
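For anyone curious, the SQL Server mechanism being referred to is presumably Query Store plan forcing, roughly (T-SQL, placeholder IDs):

    -- Pin a known-good plan captured by Query Store; the IDs come from
    -- sys.query_store_query and sys.query_store_plan.
    EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;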
One of my buddies who works at a PostgreSQL-powered place once bitched about how they had a maintenance job that truncated a big table, then another job refreshed statistics, then the table rapidly filled up again and all the query plans were awful, because the table is usually massive.
I get the religion behind the open source stuff: with SQL, you shouldn't have to tell the thing how to build the query plan, you should just tell it what you want and let it figure out how to do it.
But if humans are expected to perform maintenance on the database, setting up scheduled jobs, refreshing statistics, identifying missing indexes, etc., then those same humans should be able to specify a query plan and/or specific indexes to use in a query.
I worked at a company with a big, complex Postgres database behind the back end, and probably half of all the incidents we had were related to the query planner suddenly deciding to change its approach.
Eventually you get a vague idea of how to coax Postgres to make the plans you want it to, but the fact that you can't at least lock it in and the plan might change with any number of factors at any time... That's just bad design.
The guys at pganalyze do a fantastic job explaining the inner workings of PostgreSQL on their YouTube channel, especially the "5 minutes of Postgres" series:
> A fundamental decision we've made for the pganalyze Indexing Engine is that we break down queries into smaller parts we call "scans". Scans are always on a single table, and you may be familiar with this concept from reading an EXPLAIN plan. For example, in an EXPLAIN plan you could see a Sequential Scan or Index Scan, both representing a different scan method for the same scan on a given table.
https://www.youtube.com/watch?v=SlNQTtfjlnI
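To make the "scan" terminology concrete, the same table can be accessed via different scan methods depending on the query; an illustrative sketch (table, index, and plans are hypothetical):

    EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';
    -- e.g. Index Scan using users_email_idx on users ...

    EXPLAIN SELECT * FROM users WHERE lower(email) LIKE '%example%';
    -- e.g. Seq Scan on users ...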