Hacker News
Ask HN: Why is sampling a database not instantaneous?
1 point by whatever1 6 months ago | 4 comments
The other day I was trying to sample some rows from an (admittedly big) database, so I executed select * from database limit 10.

It took more than a minute to execute (and it wasn't sitting in a queue). The data engineer claims my query is a bad one because the db is big and I did not include a where clause.

But are the db engines that naive?




Which database?

In Postgres, a LIMIT query can sometimes perform poorly if the table statistics are not kept up to date. You can run `ANALYZE tablename` to refresh the stats manually, but the autovacuum process should do this periodically if configured correctly.
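A minimal sketch of both points above, using Python's sqlite3 as a stand-in for Postgres (the table name `events` and its columns are invented for illustration; in Postgres you would run `ANALYZE tablename;` the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [(f"row-{i}",) for i in range(100_000)])
conn.commit()

# Refresh the planner's statistics (same statement name as Postgres).
conn.execute("ANALYZE")

# A bare LIMIT with no ORDER BY lets the scan stop as soon as
# 10 rows are found, so on a plain table it returns almost instantly.
rows = conn.execute("SELECT * FROM events LIMIT 10").fetchall()
print(len(rows))  # 10
```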


Redshift, with around 5TB of data (the table has ~30 columns)


What exactly do you mean when you say "FROM database"?

If a "database" is a single table I would be surprised, however, if "database" is a view then potentially there is a lot that needs to go on under the hood to determine which 10 rows should be returned - particularly if something is doing a group by and cannot constrain it through any where constraints.


What type of database was it?

On MySQL that would normally be quite fast. SQL Server (which I'm not very familiar with) doesn't support LIMIT at all; the equivalent is SELECT TOP 10, and a LIMIT clause produces a syntax error rather than being silently ignored.





