Speeding Up Ruby with Shared Strings (tenderlovemaking.com)
194 points by tenderlove 11 months ago | 37 comments

The optimization of reusing the character array for a string was used in Java until roughly the Java 7 days, when it was dropped because it could cause a space leak: https://stackoverflow.com/questions/15612157/substring-metho....

One difference in Java was that you could slice out the middle of a string (you just kept the start and end of the array, iirc). I wonder if that difference makes this approach workable for Ruby, or if that implies that one team made a mistake.
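To make the offset-based sharing and the space leak concrete, here is a rough sketch in Ruby. The `SliceView` class and its methods are invented for illustration; this is neither Java's old `String` implementation nor MRI's internals, just the general idea of keeping a reference to the parent's storage plus a start and a length.

```ruby
# Hypothetical sketch of offset/length substring sharing.
# Not Java's or MRI's actual code; all names here are made up.
class SliceView
  attr_reader :buffer, :offset, :length

  def initialize(buffer, offset, length)
    if offset < 0 || length < 0 || offset + length > buffer.length
      raise ArgumentError, "slice out of range"
    end
    @buffer = buffer # shared with the parent, never copied
    @offset = offset
    @length = length
  end

  # O(1): returns a new view over the same buffer, no characters copied.
  def slice(start, len)
    if start < 0 || len < 0 || start + len > @length
      raise ArgumentError, "slice out of range"
    end
    SliceView.new(@buffer, @offset + start, len)
  end

  # Copies the viewed characters out into an ordinary String.
  def to_s
    @buffer[@offset, @length]
  end
end

big = ("x" * 1_000_000 + "needle").freeze
view = SliceView.new(big, 0, big.length).slice(1_000_000, 6)

puts view.to_s          # needle
puts view.buffer.length # 1000006
```

The six-character view keeps the whole million-character buffer reachable, which is the space leak mentioned above.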

Good question, I'll have to read up on it. It looks like it's possible to share out of the middle of a string with MRI, but it requires certain compiler options that aren't on by default.

Thanks for the comment! <3

Does this optimization mean we can dispense with Bootsnap? The magic Bootsnap does makes me fearful and it adds yet another moving part that can break :(

It’s not quite that simple. Java strings and Ruby strings have quite different semantics because Java strings are immutable, whilst Ruby strings are mutable and can also be mutated in place by C extensions. In TruffleRuby we use ropes to represent strings, which allows sharing of substrings, and convert them to unique mutable ropes when they are accessed by C extensions, but that requires quite a lot of machinery to implement.
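For readers who haven't met ropes, a toy version looks roughly like this. The `Leaf`, `Concat`, and `SubRope` classes are hypothetical and far simpler than anything in TruffleRuby; the sketch only shows that concatenation and substrings can be represented without copying characters, and that a unique mutable copy is built when one is needed.

```ruby
# Hypothetical minimal rope; real implementations are much more involved.
class Leaf
  attr_reader :length

  def initialize(text)
    @text = text.dup.freeze # immutable leaf storage
    @length = @text.length
  end

  def flatten_into(out)
    out << @text
    out
  end
end

class Concat
  attr_reader :length

  def initialize(left, right)
    @left = left
    @right = right
    @length = left.length + right.length
  end

  def flatten_into(out)
    @left.flatten_into(out)
    @right.flatten_into(out)
    out
  end
end

class SubRope
  attr_reader :length

  def initialize(child, offset, length)
    @child = child
    @offset = offset
    @length = length
  end

  # Naive: flattens the child first, then copies out the requested range.
  def flatten_into(out)
    out << @child.flatten_into("".dup)[@offset, @length]
    out
  end
end

hello = Leaf.new("Hello ")
world = Leaf.new("World")
greeting = Concat.new(hello, world)  # no characters copied
tail = SubRope.new(greeting, 6, 5)   # shares the nodes under greeting

mutable = tail.flatten_into("".dup)  # unique, mutable copy
mutable << "!"

puts mutable                         # World!
puts greeting.flatten_into("".dup)   # Hello World
```

Mutating the flattened copy leaves the rope untouched, which is roughly the property needed before handing a string to code that may write to it in place.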

I always forget that about ruby strings.

However, I’m not sure I can think through how it changes the performance trade-offs of sharing.

Well, if strings are immutable then the substring can point into the parent's array and the worst that can happen is a space leak. If your strings are mutable then you need some extra bookkeeping so that attempts to modify the parent or child won’t affect the other. Whether you get a gain from sharing is going to depend on the length of your strings and what operations you perform on them.
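The extra bookkeeping for mutable strings is usually some form of copy-on-write. A minimal sketch, assuming a made-up `CowString` wrapper (this is not how MRI implements it): copies share one buffer until somebody writes, and the writer takes a private copy first.

```ruby
# Hypothetical copy-on-write wrapper; names are for illustration only.
class CowString
  def initialize(buffer, shared: false)
    # A fresh string gets its own buffer; a cheap copy reuses the parent's.
    @buffer = shared ? buffer : buffer.dup
    @shared = shared
  end

  # O(1): the copy points at the same buffer and both sides are marked shared.
  def cheap_copy
    @shared = true
    CowString.new(@buffer, shared: true)
  end

  def append(text)
    unshare! # take a private buffer before mutating
    @buffer << text
    self
  end

  def shares_buffer_with?(other)
    @buffer.equal?(other.buffer)
  end

  def to_s
    @buffer.dup
  end

  protected

  attr_reader :buffer

  private

  # Conservative: the flag is never cleared on the other side, so the last
  # remaining holder may copy once more than strictly necessary.
  def unshare!
    return unless @shared
    @buffer = @buffer.dup
    @shared = false
  end
end

parent = CowString.new("Hello")
child = parent.cheap_copy
puts parent.shares_buffer_with?(child) # true

child.append(" World")
puts parent.to_s                       # Hello
puts child.to_s                        # Hello World
puts parent.shares_buffer_with?(child) # false
```

Whether this pays off depends, as the comment says, on string lengths and on how often the shared strings end up being written to.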

If you are trying to reimplement an existing VM/standard library then you may also need to honour the expectations of developers. If they expect creating a substring to be an O(1) operation and code against that expectation then you probably need to honour it.

I wonder how Ruby string performance compares to Perl then. They seem to have the same constraints.

Aaron is my personal career idol. I love seeing his work come up on here, it gives me a view in to the deeper workings of ruby that are years beyond my expertise, but in a way that I can mostly grok what's going on. I'm super happy github hired him to hack around on ruby.

Also, his talks are excellent; his closing keynote for RailsConf in particular seems worth recommending, as it was about his work speeding up Rails using kind of similar string optimizations/caching:


Thank you so much. This really means a lot to me!

You mean a lot to me :] thanks for being a person in the ruby community that we all can aspire to be like.

Object allocations really are a big deal with Ruby. My Ruby compiler project has been really slow moving again over the last year, but as far as I can tell the current main hurdle is a basic garbage collector, as even after adding various object caches I still exhaust the 2GB heap (the compiler was started when i386 was a reasonable initial architecture choice...)...

Granted that's worse than it'd be for MRI, as it includes allocating actual objects even for Fixnum and Symbol (MRI uses tagged values for both and just pretends they're "real" objects), though I added caching for both that cut hundreds of thousands of allocations.

Is it documented anywhere what length of string can be embedded within an RString?

24 bytes. On a 64-bit build an RVALUE slot is 40 bytes and the RBasic header needs two 8-byte elements, which leaves 24.
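Spelled out as arithmetic, using only the figures from the answer above (a 40-byte slot and two 8-byte header elements). The constant names are made up for illustration and are not MRI identifiers.

```ruby
# Figures taken from the comment above; constant names are hypothetical.
SLOT_BYTES = 40     # size of one object slot
HEADER_FIELDS = 2   # header elements stored in every object
FIELD_BYTES = 8     # size of each header element

EMBED_BYTES = SLOT_BYTES - HEADER_FIELDS * FIELD_BYTES
puts EMBED_BYTES # 24
```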

Looks like something that would benefit more from a string_view type abstraction.

So essentially string interning for Ruby?

There are significant differences. Depending on the case, it can be more or less effective.

1) If you intern all strings, then you're guaranteed to have one copy of "Hello World", which this doesn't (you could concatenate "Hello" and "World" twice).

2) String interning doesn't let similar strings share the same array. The Ruby trick lets

    "Hello World"

and substrings sliced from it all share the same underlying array, whereas string interning gives you a separate array for each.
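A small sketch of the interning side of that comparison, assuming a hand-rolled `InternPool` class (hypothetical, not Ruby's built-in mechanism). Equal contents always map to one object, but a substring is a separate pool entry in its own right.

```ruby
# Hypothetical intern pool, for illustration only.
class InternPool
  def initialize
    @pool = {}
  end

  # Returns the single canonical frozen copy for this content.
  def intern(text)
    @pool[text] ||= text.dup.freeze
  end

  def size
    @pool.size
  end
end

pool = InternPool.new
a = pool.intern("Hello World")
b = pool.intern("Hello " + "World") # built separately, same content
c = pool.intern("World")            # a substring of a, but its own entry

puts a.equal?(b) # true
puts a.equal?(c) # false
puts pool.size   # 2
```

Point 1 is the first result: the concatenated string collapses into the existing entry. Point 2 is the second: "World" is stored as a distinct object, and nothing in this pool lets it reuse the storage of "Hello World".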

I think Substring in Swift is something similar?

Yeah it’s a pretty common optimization. Some languages do this with array slicing.

Why couldn't you do both?

Kinda surprised when I see a title like that on HN that not so many people are talking about Crystal https://crystal-lang.org. Basically Ruby with C performance.

So far I’ve been more convinced by Crystal than Elixir as the next language for a Ruby developer.

You don't? The joke of the Rust Evangelism Strike Force needs to be updated with the Crystal one, near as I can tell. Nearly every thread where Ruby's discussed has the Crystal Guy (specific identity of any given Crystal Guy up for debate) parachuting in.

Crystal is not the next language for this Ruby developer. Ruby remains it, along with JavaScript when Node makes sense. It often feels like the folks evangelizing Crystal really don't get why people use Ruby--if I need static typing I'm going to go use Kotlin or Rust or C++; looking like Ruby and not having any of the things that make Ruby valuable to me (namely, the features implied by dynamic typing, and to forestall the usual, no, Crystal macros don't replace it) is a demerit, not a positive.

> if I need static typing

And that's when the Static Typing Is The One True Way person parachutes in. :)

It is, but I won't try to sell it to Ruby devs. :P

But Crystal is a totally different language. The syntax and some things like the basic object model look similar, but the whole way you use it and the ecosystem are totally separate. I don't think it's 'basically Ruby' at all - I don't think much of the Ruby ecosystem (Rails for example) would translate onto Crystal.

Departing the Ruby ecosystem is a dealbreaker in many situations. Crystal is faster but you’re gonna spend more time reinventing the wheel and building common tools that are sitting on the shelf in Ruby land. Not picking a side here, just trying to point out why Ruby is still very popular.

Neither are "next languages," all three differ significantly in scope. For the western Ruby dev (meaning Rails people) there is a much clearer and more consistent path forward with Elixir/Phoenix.

After a couple of paid projects with Phoenix I'm not sure it's a path forward. More like a fork in the road to a different destination. I'll steal pattern matching from Elixir and keep using Ruby on Rails. Phoenix adds too many unnecessary complications for the average web app, starting with Ecto. Potentially great when building complex systems, but there are not many of them.

My suggestions are: simple to average web apps, stay with Ruby (Rails) or Python (Django); complex, look into Elixir (Phoenix.)

I never used Crystal so I can't say anything about it.

Would you care to elaborate on the complications you've experienced?

I'm curious because my personal experience has been very much the opposite. I've built nothing but small projects using Phoenix, and I've considered it to be a very straightforward companion whenever I've needed to add a web interface to my Elixir apps.

My experience is that it's not so much that Elixir and Phoenix are complicated, but rather that they're very different from what a Rails (or Rails-like) developer is used to.

There's not just Phoenix' deceptively Rails-ish approach, but also Elixir as a functional (and somewhat strange?) language, and while you can do without knowing much about OTP, it's not exactly invisible.

On top of that, I find that all the best packages embrace this weirdness. Ecto is wonderful, and by far my favorite 'ORM', but it's very different from the more common ORMs. Splitting up the view layer into templates and views is, in my opinion, a step up from the Rails approach. But it took me a while to adjust. The sparing use of macros is also 'weird', but again I find it a step up from what came before.

I kind of feel the same way about Elixir/Phoenix as I do about functional programming in general: Whether it's actually as complicated as it is often said to be or not (perhaps just a matter of 'what we learned first'?), I find that once I got over the hurdle of learning the building blocks and mindset, the stuff I could do felt more fun, more expressive and less brittle, and the approach is starting to infect all the other coding I do.

At the same time I completely understand that if you're comfortable with Rails, and it does the job, or if you use a lot of gems that might not have a (mature) equivalent in Elixir, there's no good reason to bother unless you need channels, perhaps. I have the luxury of being able to use my own tech for many projects, as well as the time available to dive into the Elixir/Phoenix ecosystem.

That's a shame, I haven't done much with Elixir/Phoenix but it looked really promising. How frustrating that these things have to be so complicated.

> Kinda surprised when I see a title like that on HN that not so many people are talking about Crystal

There is a dedicated contingent promoting Crystal in every thread about Ruby. Don't really know that we need any more (better, perhaps, but not more.)

> Basically Ruby with C performance.

Well, vaguely Ruby-ish syntax, without the dynamic features or ecosystem, and with a static type system.

> So far I’ve been more convinced by Crystal than Elixir as the next language for a Ruby developer.

Not sure what that has to do with the subject here, which has nothing to do with either Elixir or the next language for a Ruby developer.

Language (or tool) evangelism with no connection to the thread beyond offering an alternative to a language mentioned in the thread is basically a uniquely geeky form of threadcrapping, and it's really not any better than any other threadcrapping.

I assume if all you're building are command-line apps, that might be true. Generally speaking though, I think most Ruby developers are smart enough to move where the market and ecosystem is, and are not constrained by syntax.

Go to Crystal if you just want to go faster.

Go to Elixir if you actually want to learn more, expand your mind, and are tired of bugs that can only occur in a mutable OOP language. And if you want to take advantage of the extra guarantees provided by a functional language, such as functional immutable data structures making concurrency idiotproof.

There isn't much that is "mind-expanding" about Elixir coming from Ruby or really any other interesting, dynamically-typed language that I know. It's fine but it's nothing special; BEAM is a better argument for it than Elixir itself.

Maybe rein it in a bit.

Crystal is so strict about type coercions, it's actually way harder to use than you realize.

Seriously, try converting over a decently complex script to crystal from ruby, you'll run into complicated problems.

Elixir is a successful experiment. Crystal could have become a success if they decided to have 100% syntactic equivalence. As it is, it requires rewrites in even trivial console scripts.

This is what baffles me about it. It keeps being touted by some as a "faster Ruby", while they explicitly make decisions that make it incompatible with Ruby even when it comes to relatively simple syntax.
