
This seems like a lot of unsafe work just to avoid a temporary copy of the string's byte encoding. Isn't the garbage collector tuned for exactly that sort of short-lived data?

You are right: it would go into the first generation (gen 0) and be collected pretty quickly, unless it is larger than about 85 KB (the threshold is 85,000 bytes), which many strings that you want to compress are. At that size the allocation lands on the Large Object Heap, and that is a disaster, because the LOH is collected only along with generation 2 and is not compacted by default. Unsafe work should not be scary. If you check out the code of GetByteCount for the UTF8 encoding (as an example), you will see that Microsoft did the same thing.

I think this article is great. However, I think you should spend a little more time (like your last comment) describing why this is important; make it obvious why people should appreciate your work. I, for one, wish I was better at memory management since I spend most of my time in managed C# (and other languages with memory management). For your next post, spend a little more time spelling out the details for why each step is important: what is gained by not copying _this_ memory? Compare the memory usage profile of the managed vs. unmanaged algorithms. Explain the constraints of the GC (and perhaps some history). Many people assume the GC will always do a better job at managing memory than they can, so explain the corner cases for when that's not true.

tkelogg, I am really happy you found the post interesting and informative. Sometimes one forgets to explain things that may seem obvious, and thinks that the code speaks for itself when it doesn't. I'll keep your advice in mind when I write my next post. In the meantime, I have edited the post and added some additional explanations. Thank you :)

