
You do need access to an index/DB of all files in a bucket in order to delete them in parallel. Otherwise you're stuck paginating with the B2 API.

You need a DB of all of the dead entries that need to be deleted, and that’s a fine thing to have.

There are lots of problem spaces where deletion is expensive, so it gets time-shifted away from peak system load. Some sort of reaper goes around tidying up when it can.

But I think by far my favorite variant is amortizing deletes across creates. Every call to create a new record pays the cost of deleting N records (if N are available). This keeps you from exhausting your resource, but also keeps read operations fast. And the average and minimum create times are more representative of the actual costs incurred.
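
A minimal sketch of the amortized-delete idea, as a toy in-memory store. The class name `AmortizedStore`, the `reap_per_create` parameter, and the assumption that keys are never reused are all made up for illustration; a real system would reap rows, blocks, or objects instead of dict entries.

```python
from collections import deque


class AmortizedStore:
    """Toy store: every create also reclaims up to N logically deleted records."""

    def __init__(self, reap_per_create=2):
        self.records = {}        # physical storage
        self.dead = deque()      # keys deleted logically, not yet reclaimed
        self.reap_per_create = reap_per_create

    def delete(self, key):
        # Cheap logical delete: only queue the key; reclamation is deferred.
        # Assumption: keys are unique and never re-created after deletion.
        self.dead.append(key)

    def create(self, key, value):
        # Pay down up to N pending deletes (fewer if fewer are queued).
        for _ in range(min(self.reap_per_create, len(self.dead))):
            self.records.pop(self.dead.popleft(), None)
        self.records[key] = value


store = AmortizedStore(reap_per_create=2)
for i in range(5):
    store.create(i, "pic")
for i in range(5):
    store.delete(i)          # nothing is physically removed yet
store.create(5, "pic")       # this create reclaims two dead records
```

Because each create removes at most N records and adds one, the backlog shrinks whenever N is at least 1 and creates keep arriving, and no single call ever pays for the whole cleanup.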

Variants of this show up in real-time systems.

My case was really simple. I was done with my ML pipeline and nuked the database, but pics in B2 remained with no quick way to get rid of them and/or to stop the recurring credit card charges.

IMO an "Empty" button should have been implemented by Backblaze.

Would this technique have been faster?

A single pass: paginating through all entries in the bucket without deletion, just to build up your index of files. And then using that index to delete objects in parallel.
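
A rough sketch of that two-phase approach, run against a fake in-memory bucket rather than the real B2 API. The `list_page(token)` and `delete_file(name)` callables are hypothetical stand-ins; with B2 you would presumably wrap the paginated listing call and the per-file delete call behind them.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(list_page):
    """Phase 1: walk every page once and collect file names, deleting nothing."""
    index, token = [], None
    while True:
        names, token = list_page(token)
        index.extend(names)
        if token is None:        # no further pages
            return index


def delete_all(index, delete_file, workers=16):
    """Phase 2: fan the deletes out over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for completion and re-raises any worker exception.
        list(pool.map(delete_file, index))


def make_fake_bucket(n, page_size=100):
    """Hypothetical stand-in for a bucket: returns (files, list_page, delete_file)."""
    files = {f"img{i:05d}.jpg" for i in range(n)}
    ordered = sorted(files)      # snapshot used for paginated listing

    def list_page(token):
        start = token or 0
        nxt = start + page_size
        return ordered[start:nxt], (nxt if nxt < len(ordered) else None)

    def delete_file(name):
        files.discard(name)

    return files, list_page, delete_file


files, list_page, delete_file = make_fake_bucket(250)
index = build_index(list_page)
delete_all(index, delete_file, workers=8)
```

The listing pass is still sequential, since each page needs the previous page's token, so any speedup comes only from the delete phase running concurrently.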

I believe S3 is the same way.

S3 has an "Empty bucket" button, unlike B2.

Disclaimer: I work at Backblaze.

> no way to empty a bucket.

Backblaze currently recommends you do this by writing a “Lifecycle rule” to hide/delete all files in the bucket, then let Backblaze empty the bucket for you on the server side in 24 hours: https://www.backblaze.com/b2/docs/lifecycle_rules.html
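
For illustration, a rule along those lines might look like the following, written as a Python dict and serialized to JSON. The field names follow my reading of the lifecycle rules documentation linked above, and the one-day values are an assumption, so check the docs before relying on this.

```python
import json

# Assumed rule shape: an empty prefix matches every file in the bucket.
empty_bucket_rule = {
    "fileNamePrefix": "",
    "daysFromUploadingToHide": 1,   # hide each file one day after upload
    "daysFromHidingToDelete": 1,    # delete each file one day after hiding
}

# Rules are supplied as a list on the bucket; printed here rather than sent anywhere.
payload = json.dumps({"lifecycleRules": [empty_bucket_rule]}, indent=2)
print(payload)
```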
