Hacker News
How did I convert the 33 GB Dataset into a 3 GB file Using Pandas? (medium.com/aatomz-research)
7 points by Liriel on Sept 20, 2022 | 4 comments



This is, err... interesting. There are a couple of issues, and I think it would be good if the author interrogated some of the tradeoffs made here.

I'm not sure the GC thing is correct, and it certainly won't work in an interactive shell: you're collecting the garbage while there's still a reference to chunk, so it won't free that memory. But (AFAIK) since you're in a loop, the next assignment to chunk replaces the previous one, which frees the old chunk's memory automatically anyway. So (I could be wrong about this) it works, but not for the reason you think.
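
To make the reference point concrete, here is a minimal sketch that assumes CPython's reference counting. `Chunk` and `trace_chunk_lifetimes` are hypothetical stand-ins (not from the article), and a weak reference is used only to observe when an object goes away:

```python
import gc
import weakref


class Chunk:
    """Hypothetical stand-in for one DataFrame chunk from a chunked read."""

    def __init__(self, n):
        self.rows = list(range(n))


def trace_chunk_lifetimes(num_chunks=3):
    alive_after_collect = []  # is the current chunk still alive after gc.collect()?
    previous_freed = []       # was the previous chunk freed by rebinding `chunk`?
    previous_ref = None
    for _ in range(num_chunks):
        chunk = Chunk(1000)   # rebinding drops the last reference to the old chunk
        if previous_ref is not None:
            previous_freed.append(previous_ref() is None)
        gc.collect()          # `chunk` is still referenced, so it cannot be reclaimed
        previous_ref = weakref.ref(chunk)
        alive_after_collect.append(previous_ref() is not None)
    return alive_after_collect, previous_freed


alive, freed = trace_chunk_lifetimes()
print(alive)  # on CPython: [True, True, True]
print(freed)  # on CPython: [True, True]
```

If that holds, the explicit gc.collect() call never touches the current chunk, and the old one is already gone by the time it runs.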

Also,

> So we have to convert it to ‘float16’ or ‘float32’ to minimise memory usage.

Well, I mean, technically float16 does use less space, yes. But it also does a few other things you might want to mention, like reducing the maximum range of your values to about +/-65k (65504, to be exact).
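
A rough way to check the range limit without pandas: the standard library's struct module has an "e" format, which is IEEE 754 half precision, the same 16-bit layout float16 uses. `fits_in_float16` is a hypothetical helper for illustration. Note the difference in failure mode: struct refuses out-of-range values, whereas NumPy (as far as I know) casts them to inf.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_float16(value):
    """Return True if `value` can be packed as a finite half-precision float."""
    try:
        struct.pack("<e", value)
        return True
    except (OverflowError, struct.error):
        # struct rejects values that are too large for the "e" format
        return False


for v in (1234.5, FLOAT16_MAX, 70000.0, -70000.0):
    print(v, fits_in_float16(v))
```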


Yeah, I found it really odd that they didn't mention that rather important trade-off.


Trade-off ? What trd-of ?


You lose precision.

This Stack Overflow answer gives a straightforward example:

https://stackoverflow.com/a/64069641/1018861
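
For a quick local check of the same effect (a sketch that is not taken from the linked answer), you can round-trip values through half precision with the standard library. `to_float16` is a hypothetical helper. The "e" struct format is IEEE 754 half precision, so the rounding should match float16:

```python
import struct


def to_float16(value):
    """Round `value` to the nearest half-precision float and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


for v in (0.1, 2049.0, 60001.0):
    print(v, "->", to_float16(v))
# 0.1 -> 0.0999755859375
# 2049.0 -> 2048.0
# 60001.0 -> 60000.0
```

Half precision keeps only 11 significant bits, so integers above 2048 are no longer all representable, and near the top of the range the gap between neighbouring values is 32.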



