
I thought a lot of optimizers used tabulated cardinality statistics to estimate the cost of various join algorithms.



They do, but joins can still cause unpredictable jumps in the amount of data scanned, even when the plan is perfectly optimized.

Think of an analytic query over a star schema: the query might need 10 or 20 joins to align all the dimension data with each fact table, and it scans most of the data (say, an aggregate report over the last year), but those joins don't reduce the cardinality much.

If you look at the data access pattern, it's scattered all over the place for each joined row. You end up needing to read in a lot of data, possibly many times, to complete the query.
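
Roughly, and with made-up numbers, the effect looks like this: each fact row probes a dimension table at an effectively arbitrary key, so nearly every joined row can land on a different page unless the dimension fits in cache. A minimal sketch (hypothetical sizes and layout):

    import random

    # Hypothetical sizes, just to illustrate the access pattern.
    DIM_ROWS = 1_000_000        # rows in one dimension table
    ROWS_PER_PAGE = 100         # dimension rows per page
    FACT_BATCH = 10_000         # fact rows joined in one pass

    random.seed(42)

    # Each fact row references a dimension key in no particular order.
    fact_keys = [random.randrange(DIM_ROWS) for _ in range(FACT_BATCH)]

    # The page a dimension row lives on, assuming rows are laid out by key.
    pages_touched = {key // ROWS_PER_PAGE for key in fact_keys}

    print(f"{FACT_BATCH} fact rows touch {len(pages_touched)} of "
          f"{DIM_ROWS // ROWS_PER_PAGE} dimension pages")
    # Roughly 6300 distinct pages for 10k probes here: close to one
    # random page read per joined row unless the dimension is cached.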


Pretty much all modern commercial DBs do use summary (or sampled?) statistics of the cardinalities. The alternative is a rule-based rewriter driven by heuristics only, and AFAIK those justifiably went out of use decades ago.
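
For reference, a sketch of the classic equi-join estimate those statistics feed into (assuming uniform value distributions; `ndv` here stands for the number of distinct join-key values, a name used just for illustration):

    def estimate_join_cardinality(rows_r: int, rows_s: int,
                                  ndv_r: int, ndv_s: int) -> float:
        """Classic equi-join estimate: |R JOIN S| ~= |R| * |S| / max(ndv).

        Assumes uniform distributions and that the smaller key domain is
        contained in the larger one; real optimizers refine this with
        histograms and sampled statistics.
        """
        return rows_r * rows_s / max(ndv_r, ndv_s)

    # A fact table joined to a small dimension on its key barely reduces
    # cardinality, matching the star-schema case above:
    print(estimate_join_cardinality(100_000_000, 10_000, 10_000, 10_000))
    # -> 100000000.0  (one dimension row per fact row)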



