
Speeding Up Ruby with Shared Strings - tenderlove
https://tenderlovemaking.com/2018/02/12/speeding-up-ruby-with-shared-strings.html
======
hyperpape
The optimization of reusing the character array for a string was used in Java
until roughly the Java 7 days, when it was dropped because it could cause a
space leak:
[https://stackoverflow.com/questions/15612157/substring-metho...](https://stackoverflow.com/questions/15612157/substring-method-in-string-class-causes-memory-leak#15612188).

One difference in Java was that you could slice out the middle of a string
(you just kept the start and end of the array, iirc). I wonder if that
difference makes this approach workable for Ruby, or if that implies that one
team made a mistake.
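
A rough sketch of that space leak, written in Ruby rather than Java. `SharedSlice` is a made-up class for illustration, not an API from either language, and it only assumes what the comment describes: a slice keeps the parent's whole buffer plus an offset and a length.

```ruby
# Toy model of the old Java substring scheme: a slice stores a reference to
# the parent's buffer plus an offset and a length, so slicing is O(1).
# SharedSlice is a hypothetical class, not a real Ruby or Java API.
class SharedSlice
  attr_reader :buffer, :offset, :length

  def initialize(buffer, offset = 0, length = buffer.size - offset)
    @buffer = buffer   # shared with the parent, never copied
    @offset = offset
    @length = length
  end

  # O(1): no characters are copied, only the window changes.
  # Unlike a suffix-only scheme, the window may sit in the middle.
  def slice(start, len)
    if start < 0 || len < 0 || start + len > @length
      raise ArgumentError, "slice out of range"
    end
    SharedSlice.new(@buffer, @offset + start, len)
  end

  def to_s
    @buffer[@offset, @length]
  end
end

big   = SharedSlice.new(("x" * 1_000_000 + "needle").freeze)
small = big.slice(1_000_000, 6)
big   = nil  # the parent slice is gone, but its buffer is not

puts small.to_s          # needle
puts small.buffer.size   # 1000006
```

The six-character slice keeps the full million-character buffer reachable, which is the leak the Stack Overflow answer describes.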

~~~
aardvark179
It’s not quite that simple. Java strings and Ruby strings have quite different
semantics because Java strings are immutable whilst Ruby strings are both
mutable and can be mutated in place by C extensions. In TruffleRuby we use
ropes to represent strings, which allows sharing of substrings, and convert them
to unique mutable ropes when they are accessed by C extensions, but that
requires quite a lot of machinery to implement.

~~~
hyperpape
I always forget that about Ruby strings.

However, I’m not sure I can think through how it changes the performance
trade-offs of sharing.

~~~
aardvark179
Well, if strings are immutable then the substring can point into the parent's
array, and the worst that can happen is a space leak. If your strings are
mutable then you need some extra bookkeeping so that attempts to mutate the
parent or child won’t affect the other. Whether you get a gain from sharing is
going to depend on the length of your strings and what operations you perform
on them.
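
One way to picture that extra bookkeeping is a copy-on-write flag. The sketch below is a toy, with a hypothetical `CowString` class; it is not how MRI or TruffleRuby actually do it, only an illustration of the idea that a shared buffer must be copied before it is mutated.

```ruby
# Toy copy-on-write string. CowString is a hypothetical class for
# illustration; real VMs do this bookkeeping internally and differently.
class CowString
  def initialize(buffer, offset = 0, length = buffer.size - offset, shared = false)
    @buffer = buffer
    @offset = offset
    @length = length
    @shared = shared   # true while another CowString may point at @buffer
  end

  # O(1) substring: parent and child are both marked as sharing.
  def slice(start, len)
    @shared = true
    CowString.new(@buffer, @offset + start, len, true)
  end

  # Mutation: take a private copy first if the buffer might be shared.
  def upcase!
    unshare
    @buffer[@offset, @length] = @buffer[@offset, @length].upcase
    self
  end

  def to_s
    @buffer[@offset, @length]
  end

  private

  def unshare
    return unless @shared
    @buffer = @buffer[@offset, @length]  # copy only our own window
    @offset = 0
    @shared = false
  end
end

parent = CowString.new("hello world".dup)
child  = parent.slice(6, 5)

child.upcase!
puts parent.to_s  # hello world
puts child.to_s   # WORLD

parent.upcase!
puts parent.to_s  # HELLO WORLD
puts child.to_s   # WORLD
```

The flag here is deliberately conservative: the parent still copies on its first mutation even though the child has already moved to a private buffer. That extra copy is part of the cost being weighed against the O(1) substring.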

If you are trying to reimplement an existing VM/standard library then you may
also need to honour the expectations of developers. If they expect creating a
substring to be an O(1) operation and code against that expectation then you
probably need to honour it.

------
mmanfrin
Aaron is my personal career idol. I love seeing his work come up on here; it
gives me a view into the deeper workings of Ruby that are years beyond my
expertise, but in a way that I can mostly grok what's going on. I'm super
happy GitHub hired him to hack around on Ruby.

Also, his talks are excellent; his closing keynote for RailsConf in particular
seems worth recommending, as it was about his work speeding up Rails using
kind of similar string optimizations/caching:

[https://www.youtube.com/watch?v=BTTygyxuGj8](https://www.youtube.com/watch?v=BTTygyxuGj8)

~~~
tenderlove
Thank you so much. This really means a lot to me!

~~~
mmanfrin
You mean a lot to me :] thanks for being a person in the Ruby community that
we all can aspire to be like.

------
vidarh
Object allocations really are a big deal with Ruby. My Ruby compiler project
has been really slow moving again over the last year, but as far as I can tell
the current main hurdle is getting a basic garbage collector in place: even
after adding various object caches, I still exhaust the 2GB heap (the compiler
was started when i386 was a reasonable initial architecture choice...).

Granted, that's worse than it'd be for MRI, as it includes allocating actual
objects even for Fixnum and Symbol (MRI uses tagged values for both and just
pretends they're "real" objects), though I added caching for both that cut
hundreds of thousands of allocations.
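
For readers unfamiliar with tagged values, here is a minimal sketch of the general idea, assuming a scheme where the low bit of a machine word marks an immediate integer. The helper names and the sample address are hypothetical, and this models the technique rather than MRI's actual C code.

```ruby
# Toy tagged-value scheme: the low bit of a word says "this is an integer
# stored inline", so no heap object is allocated for it.
# All names here are hypothetical helpers, not MRI internals.
FIXNUM_FLAG = 0x01

# Shift the integer left and set the tag bit.
def encode_fixnum(n)
  (n << 1) | FIXNUM_FLAG
end

# Heap pointers are aligned, so their low bit is 0.
def fixnum?(word)
  (word & FIXNUM_FLAG) == FIXNUM_FLAG
end

# Arithmetic right shift drops the tag and restores the sign.
def decode_fixnum(word)
  word >> 1
end

word = encode_fixnum(21)
puts word                  # 43
puts fixnum?(word)         # true
puts decode_fixnum(word)   # 21
puts fixnum?(0x7f00_1000)  # false (looks like an aligned pointer)
```

A compiler that allocates a real object per integer pays for every one of those, which is why caching them cuts so many allocations.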

------
heartbreak
Is it documented anywhere what length of string can be embedded within an
RString?

~~~
mperham
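24 bytes on 64-bit builds, which is 23 characters plus the trailing NUL. An
RVALUE slot is 40 bytes and needs two 8-byte elements (flags and class
pointer), leaving 24.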

------
banachtarski
Looks like something that would benefit more from a string_view type
abstraction.

------
PricelessValue
So essentially string interning for Ruby?

~~~
hyperpape
There are significant differences. Depending on the case, it can be more or
less effective.

1) If you intern all strings, then you're guaranteed to have one copy of
"Hello World", which this doesn't (you could concatenate "Hello" and "World"
twice).

2) String interning doesn't let similar strings share the same array. The Ruby
trick lets you have

    "Hello World"
    "World"
    "orld"
    "rld"
    "ld"
    "d"

all share the same underlying array, whereas string interning gives you a
separate array for each.
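
The two points above can be sketched side by side. This is a toy model, not Ruby's internals: `intern` and `SuffixView` are hypothetical names, with a Hash standing in for an intern table and a struct standing in for a shared string.

```ruby
# 1) Interning: one canonical copy per distinct content.
# POOL and intern are hypothetical stand-ins for an intern table.
POOL = {}

def intern(str)
  POOL[str] ||= str.dup.freeze
end

a = intern("Hello " + "World")
b = intern("Hello" + " World")
puts a.equal?(b)  # true

# 2) Suffix sharing: a view is a buffer reference plus an offset.
# SuffixView is a hypothetical model of a shared string.
SuffixView = Struct.new(:buffer, :offset) do
  def to_s
    buffer[offset..-1]
  end
end

buffer = "Hello World".freeze
views  = [0, 6, 7, 8, 9, 10].map { |off| SuffixView.new(buffer, off) }

puts views.map(&:to_s).inspect
# ["Hello World", "World", "orld", "rld", "ld", "d"]
puts views.all? { |v| v.buffer.equal?(buffer) }  # true

# Interning the same six strings yields six separate objects.
interned = views.map { |v| intern(v.to_s) }
puts interned.map(&:object_id).uniq.size  # 6
```

Interning deduplicates equal contents but stores each distinct string separately; suffix sharing does the reverse.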

~~~
rimliu
I think Substring in Swift is something similar?

~~~
_sdegutis
Yeah, it’s a pretty common optimization. Some languages do this with array
slicing.

------
raitom
Kinda surprised, when I see a title like that on HN, that not many people are
talking about Crystal [https://crystal-lang.org](https://crystal-lang.org).
Basically Ruby with C-like performance.

So far I’ve been more convinced by Crystal than Elixir as the next language
for a Ruby developer.

~~~
eropple
You don't? The joke of the Rust Evangelism Strike Force needs to be updated
with the Crystal one, near as I can tell. Nearly every thread where Ruby's
discussed has the Crystal Guy (specific identity of any given Crystal Guy up
for debate) parachuting in.

Crystal is not the next language for this Ruby developer. Ruby remains it,
along with JavaScript when Node makes sense. It often feels like the folks
evangelizing Crystal really don't get why people use Ruby--if I need static
typing I'm going to go use Kotlin or Rust or C++; looking like Ruby and not
having any of the things that make Ruby valuable to me (namely, the features
implied by _dynamic typing_ , and to forestall the usual, no, Crystal macros
don't replace it) is a demerit, not a positive.

~~~
hnzix
_> if I need static typing_

And that's when the Static Typing Is The One True Way person parachutes in. :)

~~~
carlmr
It is, but I won't try to sell it to Ruby devs. :P

