Hacker News new | past | comments | ask | show | jobs | submit login

I kind of hate this "tl;dr" culture. I can explain the reason why this is never going to change (though it may get better with index scans) but I expect you'll just see a wall of text and decide not to read it. But if you never read long answers, you have no right to expect solutions to your problems or to understand anything. So I hope you'll read this, it may help.

PostgreSQL and most modern, ACID-compliant RDBMSes use MVCC instead of locks. MVCC has a lot of upsides, namely, you almost never need table-level locking for writes, but it ensures you need to look at the whole table to do COUNT. That's because MVCC is basically multiple parallel universes for data. What you see depends critically on which transaction you're in when you look.

For example, suppose you have a process that needs to know the number of users in some user table. You start your transaction, and then another process starts a transaction that goes through and deletes certain users. While this is happening, you need the count. In MVCC, deletion doesn't delete, it just marks rows with the transaction ID range in which they're visible. The rows that are being deleted by the other process are just being marked dead as of that process's transaction ID, but your transaction ID is below that one, so they're not dead to you. Later on, when the youngest transaction alive is older than the oldest transaction that can see these rows, PostgreSQL will actually expunge the data (VACUUM does this).

Because of MVCC, there is more than one legitimate answer the question of COUNT; potentially, one for each transaction in progress. The answer is changing constantly. People tend not to think in these terms; we tend to think, well the table has so many rows in it, right? Why don't you just increment the count when you write one? It's not true, because with ACID compliance, uncommitted transactions should not have visible effects until they commit and transactions in progress should not see anything vary during their operation. So there really are multiple right answers at any given moment. The expectation that SELECT COUNT(*) will be fast is built on the assumption that the database has some sort of master count of what rows are in the table it can just glance at. But it doesn't, and to have one would require removing MVCC altogether and using locks instead.




I'm not sure I buy your argument. It is certainly true that with MVCC "there is more than one legitimate answer the question of COUNT", but that's true for any query, not just COUNT. Given that we can maintain transactional correctness for the data, it's not at all clear to me why the same mechanisms cannot be used to maintain transactional correctness for the metadata.


Yes, but we're not surprised when those other queries have to read the table. For some reason it's surprising with COUNT. I maintain this is not a technical problem so much as an intuition problem.

Explain how you're going to maintain this metadata in more depth. If it can be done in an MVCC manner (i.e. without introducing locks) then it should be explored. How's it going to work?


We can just maintain multiple versions of the row count.


It might be worth seeing what the cost of this would be in practice. I suspect if the answer were this easy the developers would have just done it by now, but you never know.


I agree. Given count(star) is a common need you'd think there would be multiple count(star) results in the metadata for the table, each with a generation associated with it.


It's a short, easy query to write. I'm not convinced it's a big need.

So, your proposal is basically this: make a metadata table. On every insert or delete from the table, update the metadata table accordingly. MVCC will make sure anybody reading from it will get the right answer. Is that about right?


I do not see how this would be implemented without introducing extra locking which would harm concurrency, which would remove some of the performance advantage of having MVCC.

Now I see it could be worth taking that penalty on specified tables, jsut like you do for indexes. And some databases do support this in a more general form: materialized views.


I think you could fake this proposal with triggers. It might be worth seeing what the additional cost is.


Proper solution (not fake with triggers) should be at most as costly as having partial index on the table.


^ tl;dr no, it's still slow

sorry, I couldn't help myself :-)

I understand that it's semi-hard problem given MVCC but I could keep track of counts myself for each set of conditions and each table that I need. It's as simple as keeping count as record in some table, incrementing it in on insert, decrementing in on delete and adjusting on update if row begins or stops to satisfy conditions I want to count by. The only grudge I have with PostrgreSQL is that when confronted with this problem PostgreSQL fans respond with "We've got MVCC so it is supposed to be slow. Stop thinking it's supposed to be fast." not with "Hmm... Maybe we should introduce new facility similar to indexes where you could define what you want to count and the db will count by this conditions and provide results for you fast."


> The only grudge I have with PostrgreSQL is that when confronted with this problem PostgreSQL fans respond with "We've got MVCC so it is supposed to be slow. Stop thinking it's supposed to be fast." not with "Hmm... Maybe we should introduce new facility similar to indexes where you could define what you want to count and the db will count by this conditions and provide results for you fast."

Plausible, but somewhat bothersome. It could be a special case of the materialized view problem, but it would take some convincing that to suggest that a special count-optimizing physical structure is worth the additional knob and maintenance burden it brings with it. Not impossible to convince, but it would require some tight argumentation with ample evidence.

In my quick assessment, the need for faster counts over lots of data is subsumed by materialized views, which unfortunately is a very tough feature to write, but has the advantage of generally considered being being worth it. A cut-down version of materialized views that just supports counting on various dimensions that adds more bulk to the planner as well as requiring execution maintenance and bug-fixing seems less-worth it. Somewhere in-between is a variant that supports more aggregate functions besides count(), particularly ones where one can define an inverse transition function (which in the case of count is "add one for every record").

There are many interesting features one could add -- and this is one of them -- but at the end of the day there is going to be an assessment by the long-time community members who have the privilege of fixing obscure bugs for years as to:

* Who is going to write it?

* How long will it take vs. other features?

* Is a feature is going to be maintained to a level of quality we find acceptable into the foreseeable future?

* Is the impact worthwhile?

* Could this in any way be done as an extension?

The level of the bar of quality has changed over time, too, pretty much inexorably moving up. Getting LISTEN/NOTIFY in now would be much harder than when it showed up in...Postgres95 (and, granted, it was pretty busted back then, apparently, along with...a lot of stuff). LISTEN/NOTIFY turned out to have a good impact/complexity ratio, so it gets good maintenance, but in a hypothetical world where it was not already committed with a good history of service and use I bet it would have required quite some convincing to add.

There is some old code in existence now that would not be accepted again today. A more sad example is dusty implementation of hash indexes that thankfully few people use or see reason to use. Hash indexes possess ample warnings in the manual to not use them. Yet, it doesn't seem reasonable to deprecate them for people that rely on them, and nobody seems to care enough to improve it, or even convincingly whine about improving it. The community is pretty sensitive to these code barnacles, so if one goes in wanting such a feature, one has to be well-prepared to argue both feasibility of implementation and size and duration of impact.


People ask "why others are using MySQL instead proper database like PostgreSQL?"

Some of the people try and get burned because features that MySQL did without anyone noticing now take so much time that one wonders if he triggered some strange bug that caused db to do full table scan where one is unnecessary. Then he goes to the internet and look for clues and he sees response "oh, it's how it supposed to work, nobody is working on because it does not seem like a problem" over and over.

I think that feature might have an impact.

Humble question from person who never tried to build a database engine:

Do you know why db can't just keep track of number of entries in a full or partial primary key indexes and use them to give count fast ?


> Some of the people try and get burned because features that MySQL did without anyone noticing now take so much time that one wonders if he triggered some strange bug that caused db to do full table scan where one is unnecessary. Then he goes to the internet and look for clues and he sees response "oh, it's how it supposed to work, nobody is working on because it does not seem like a problem" over and over.

Just about every non-trivial database implementation will have a pathology that others do not. In this case, InnoDB is was faster because it supported index-only scans, a feature that was very useful, but hard to implement in PG because of some details of its MVCC implementation -- now there is an implementation in 9.2 that gives it a shot at being in the same class of performance even on wide-row tables. However, both are still much slower than MyISAM: if you search for "why is count slow on innodb?" you'll get a lot of hits -- it suffers much the same defect vs. MyISAM (which has poor SMP concurrency, allowing it to count things as a luxury) that Postgres does.

Perhaps "not a problem" is a lazy answer for "it's hard to get exactly right, and nobody in the same class of implementation is bothering, and this has been discussed to death and is more work than you realize, so don't expect anyone to implement it soon unless you intend to argue it (unless you are also a long-term maintainer) and do it." As you can see...that was many more words, including giving background on two MySQL storage engines.

> Do you know why db can't just keep track of number of entries in a full or partial primary key indexes and use them to give count fast ?

It could. But you'd have to spec out a brand new top-level set of utility commands (like CREATE INDEX) to make this physical structure, it now is yet another knob that can drastically change performance characteristics, the planner gets to scan the projections and qualifiers for yet another little optimization, someone else gets to maintain it (unless you are maintainer), and then there's going to be the person that asks "why isn't is this snapshot-isolated version not even nearly as fast as caching the count in memcached?" (or, if one does it the other way, the reverse), and then finally when you get around to implementing a more general set of functionality you'll be stuck with this old form for basically eternity (depending on your release policy) much like hash indexes. Don't forget to add backup-and-restore support, manual pages, and EXPLAIN and/or other diagnostic output, if necessary.

On the plus side, it could end the endless howling about this problem.

Personally, I think we can get enough hooks in place that someone should be able to implement this secondary structure in a satisfying way without coupling it inextricably into the database, and that's a project I'd have enthusiasm for.

If someone else made this feature in another popular database that serves similar use cases and it got used a lot, then I think the understanding of the upsides would be such that such a feature is more likely to happen. Index-scans fall into that category, and a special structure not seen in other databases doesn't appear to at this time.


Thank you for all the information.

> On the plus side, it could end the endless howling about this problem.

:-) I think that would be also immense relief for the howlers.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: