Hacker News
The Limits of Copy-on-write: How Ruby Allocates Memory (brandur.org)
86 points by izend on Aug 28, 2017 | 10 comments

I wrote this one. I've been working with Ruby roughly six years, but didn't understand its memory internals at all beyond the surface level until I started doing the research for this one.

If there's one takeaway/point of interest that I'd recommend looking at, it's the novel way that Ruby shares a pointer value between actual pointers to memory and special "immediate" values that simply occupy the pointer value itself [1]. For example, a fixnum (most integers) has its binary value shifted one bit to the left, and then a "1" flag is set in its rightmost position so that Ruby can recognize it. It doesn't have to go to the heap, and the runtime can take advantage of that for faster calculations.

[1] https://brandur.org/ruby-memory#value
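
To make the fixnum encoding described above concrete, here is a minimal Python sketch of the shift-and-flag scheme on a 64-bit word. The names (`int_to_value`, `value_to_int`, `is_fixnum`, `add_fixnums`) are made up for illustration and are not Ruby's internal API, and overflow checking in `add_fixnums` is deliberately left out.

```python
# Sketch of fixnum tagging on a 64-bit word. All names are hypothetical.
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1
FIXNUM_FLAG = 0x1  # lowest bit set means "this word is an integer, not a pointer"


def int_to_value(n):
    """Encode n by shifting left one bit and setting the flag bit."""
    if not -(1 << 62) <= n < (1 << 62):
        raise OverflowError("does not fit in a 63-bit fixnum")
    return ((n << 1) | FIXNUM_FLAG) & WORD_MASK


def is_fixnum(value):
    """True if the word carries the fixnum flag in its lowest bit."""
    return (value & FIXNUM_FLAG) == FIXNUM_FLAG


def value_to_int(value):
    """Decode a tagged word back to a signed integer."""
    if not is_fixnum(value):
        raise TypeError("not a fixnum")
    if value >= 1 << (WORD_BITS - 1):
        value -= 1 << WORD_BITS  # reinterpret the word as signed
    return value >> 1            # arithmetic shift drops the flag bit


def add_fixnums(a, b):
    """Add two tagged words without decoding: (2x+1) + (2y+1) - 1 == 2(x+y) + 1."""
    return (a + b - 1) & WORD_MASK
```

The point of `add_fixnums` is that the sum of two tagged words is only one subtraction away from the tagged result, which is one way a runtime could avoid decoding and re-encoding on every addition.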

> it's the novel way

This is usual in Lisp (compilers/implementations), and I wouldn't be surprised if it was invented in the seventies, once large (i.e. 36-bit) registers were available.

I've seen that pointer-sharing trick in other languages in the past. Back in the '80s we did that in an app on Mac OS, when the memory pointers didn't require all the bits.

This is usually called a tagged pointer and is a common way to optimize highly dynamic languages: https://en.wikipedia.org/wiki/Tagged_pointer

(I just noticed that two of my articles are cited for that page. How neat!)

For more examples, Objective-C does this on 64-bit, and many Lisp implementations use it. Swift uses it for certain enum payloads.

An interesting variant is to stuff your pointers into the payload of an IEEE754 NaN value. This is popular in JavaScript, since JS numbers are IEEE754 doubles.
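
A rough Python sketch of the NaN-payload idea, for illustration only. The bit layout below (sign bit plus quiet-NaN bits as the "pointer" marker, 48-bit payload) is an assumption chosen for simplicity and is not the layout of any particular JavaScript engine. It also assumes that NaNs produced by real arithmetic are canonicalized to the positive quiet NaN, so they never collide with boxed pointers.

```python
import struct

# Hypothetical layout: exponent bits all ones plus the quiet bit marks a NaN;
# adding the sign bit on top marks "this NaN carries a pointer".
QNAN = 0x7FF8_0000_0000_0000
POINTER_TAG = 0x8000_0000_0000_0000 | QNAN
PAYLOAD_MASK = (1 << 48) - 1


def box_double(x):
    """Return the raw 64-bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def unbox_double(bits):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box_pointer(addr):
    """Hide a (stand-in) 48-bit address in the payload of a NaN."""
    if addr & ~PAYLOAD_MASK:
        raise ValueError("address does not fit in 48 bits")
    return POINTER_TAG | addr


def is_pointer(bits):
    """True if the word has the pointer tag rather than being an ordinary double."""
    return (bits & POINTER_TAG) == POINTER_TAG


def unbox_pointer(bits):
    """Recover the address from a boxed pointer."""
    if not is_pointer(bits):
        raise TypeError("not a boxed pointer")
    return bits & PAYLOAD_MASK
```

Every ordinary double keeps its normal bit pattern, so numeric code pays nothing; only values that would otherwise be NaN are repurposed.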

Nice article covering both tagged integers and pointers in NaN: https://nikic.github.io/2012/02/02/Pointer-magic-for-efficie...

Copy of Mozilla's article presenting their switch to NaN tagging: https://evilpie.github.io/sayrer-fatval-backup/cache.aspx.ht... HN discussion with more history & links: https://news.ycombinator.com/item?id=1569825

Fascinating — I had no idea the technique had a name (the Ruby source isn't very heavy on comments). I'll update the article to include it.

This technique is so common in dynamic language implementations that it even has hardware support in some architectures, such as the SPARC tagged-add instruction (TADDcc), which adds two registers and sets the overflow condition code if either of the low two bits of either operand was set.
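
As a rough illustration of what a tagged add checks, here is a Python simulation of that behavior on 32-bit words. It is a sketch of the semantics as I understand them from the SPARC manual, not an emulator; the function name `taddcc` and the convention that integers carry a `00` tag in their low two bits are assumptions for this example.

```python
MASK32 = 0xFFFF_FFFF
TAG_MASK = 0b11  # low two bits hold the type tag; 00 is assumed to mean "integer"


def taddcc(a, b):
    """Simulate a tagged add on two 32-bit words.

    Returns (result, overflow). overflow is True if either operand has a
    nonzero tag, or if the signed 32-bit addition itself overflowed.
    """
    result = (a + b) & MASK32
    tag_set = ((a | b) & TAG_MASK) != 0
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    arithmetic_overflow = sign_a == sign_b and sign_r != sign_a
    return result, tag_set or arithmetic_overflow
```

With integers stored as `n << 2`, adding two of them gives the correctly tagged sum with the overflow flag clear; if either operand is actually a tagged pointer, the flag is set and the runtime can branch to a slower generic path.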

Here is a Common Lisp compiler (SBCL) that uses tagged pointers for some common objects: https://github.com/sbcl/sbcl/blob/master/doc/internals/objec...

That is how Lisps speed up list operations, since the list is their most-used data structure.

Since most pointers in macOS are at least 4-byte aligned (guaranteed by malloc), their low two bits are always zero, so you can use those bits for flags without having to shift anything.

Of course Obj-C already uses this for tagged-pointer numbers/strings/etc, so you've got competition if you want to start using this for something.
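
A small Python sketch of the low-bit trick, using plain integers as stand-ins for addresses. The helper names and the 4-byte alignment constant are assumptions for illustration; real code would do the same masking on a pointer-sized integer in C or Objective-C.

```python
ALIGNMENT = 4              # assumed minimum alignment of every address
FLAG_MASK = ALIGNMENT - 1  # 0b11: the bits that are always zero in an aligned address


def pack(addr, flags):
    """Store up to two flag bits in the unused low bits of an aligned address."""
    if addr & FLAG_MASK:
        raise ValueError("address is not aligned")
    if flags & ~FLAG_MASK:
        raise ValueError("flags do not fit in the free bits")
    return addr | flags


def address_of(word):
    """Mask the flags off to recover the original address (no shift needed)."""
    return word & ~FLAG_MASK


def flags_of(word):
    """Extract the flag bits."""
    return word & FLAG_MASK
```

Because the address bits are untouched, recovering the pointer is a single mask, which is the "without having to shift anything" part.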
