Helix: Rust and Ruby, Without the Glue

twelvechairs · on May 15, 2016

Very excited to see the development of the Ruby/Rust space. The two languages together would seem a joy of a workflow from concept through to maintenance, and the wealth of important Ruby personalities currently involved in Rust (as well as others) makes me hopeful that this will become a well-trodden and well-documented workflow sooner rather than later.

I think this could help Ruby regain some of its early excitement, and remove the negativity of its 'just a scripting language', 'only for rails, which is too slow for modern development anyway' image.

steveklabnik · on May 15, 2016

There's a _lot_ of stuff in this space: mrusty, ruru, Helix. It's exciting stuff.

eropple · on May 15, 2016

mrusty is super cool. My only beefs with it are around the difficulties of mruby (package availability, 1.9.3 limitations, generally being "off the beaten path"), not of mrusty itself, but it's such a fantastic way to quickly build a scripting layer that I fell in love with it almost immediately.

I would pay a decent chunk of money for a cleanly integrated MRI implementation inside of Rust, but I also will not be holding my breath for it.

xuejie · on May 16, 2016

Just out of curiosity: which limitations of Ruby 1.9.3 are critical in your case?

Not to start a flame war, but in my experience Ruby 1.9 is already quite good; later versions of Ruby only introduce minor syntax and semantic changes, which are trivial to work around. This is nothing like the big differences between 1.8 and 1.9.

I agree about the package availability problem with mruby, though.

eropple · on May 17, 2016

Nothing is critical, I'm just so used to writing Ruby 2.3 at this point that 1.9 isn't as enjoyable. I wish it were better, I'm not saying it's bad.

rattray · on May 15, 2016

> With Rails reaching version 5.0, there are plenty of APIs that are heavily used and extremely feature-stable. My goal with Helix is to eventually make it possible to reimplement a lot of these APIs as a native extension gem.

Sounds extremely exciting!

fouc · on May 15, 2016

Great blog post.. First time I've been sold on Rust, sounds like it's got some great features. Maybe I just had to hear about it from a Ruby dev. :)

Typos in the article: "slimed down", "you code could"

Is "needle_length = needle.length" actually necessary? I thought repeated calls would be zero cost, but I'm guessing I'm wrong.

kibwen · on May 15, 2016

If Ruby is anything like Python, nonlocal name lookups in loops can indeed be performance bottlenecks.

pg_bot · on May 14, 2016

I would be interested to see how Ruby's Set class performs in the Zesty example. If that is actually a bottleneck in the application, I would rather reach for something that is in the Standard Library instead of going fully native.

halostatue · on May 15, 2016

Ruby’s Set class is implemented in Ruby, wrapping a Hash (`{ key => true }`), not in C (or Java or…). It’s fairly good when you’re testing for containment, but the implementation will probably be as bad as or worse than the array version for the #fully_contains behaviour… aside from the requirement that Zesty’s arrays be sorted.

Conceptually, a Set will do much better even with this sort of optimization.
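For a rough comparison of the two approaches, here is a minimal Rust sketch. The function names are hypothetical and this is not the Zesty code: one version makes a single linear pass over two sorted slices (which is what the sorted-array requirement buys you), the other does a `HashSet` lookup per needle element and needs no ordering.

```rust
use std::collections::HashSet;

// Hypothetical helper names; a sketch, not the article's implementation.

/// True if every element of `needle` appears in `haystack`.
/// Assumes both slices are sorted in ascending order (duplicates allowed).
fn fully_contains_sorted(haystack: &[u32], needle: &[u32]) -> bool {
    let mut h = 0;
    for &n in needle {
        // Advance through the haystack until we reach or pass `n`.
        while h < haystack.len() && haystack[h] < n {
            h += 1;
        }
        if h == haystack.len() || haystack[h] != n {
            return false;
        }
    }
    true
}

/// Same check with a hash set: no ordering requirement, O(1) average lookups.
fn fully_contains_set(haystack: &HashSet<u32>, needle: &[u32]) -> bool {
    needle.iter().all(|n| haystack.contains(n))
}

fn main() {
    let haystack = vec![1, 3, 5, 7, 9];
    let set: HashSet<u32> = haystack.iter().copied().collect();
    println!("{}", fully_contains_sorted(&haystack, &[3, 7])); // true
    println!("{}", fully_contains_sorted(&haystack, &[3, 4])); // false
    println!("{}", fully_contains_set(&set, &[9, 1]));         // true
}
```

The merge version is O(n + m) with no allocation but only works on sorted input; the set version pays for hashing but accepts any order.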

bjz_ · on May 15, 2016

Interesting side-note: Rust's `HashSet<T>` is actually implemented by wrapping a `HashMap<T, ()>`. But because `()` is zero-sized (unlike a boolean), Rust can optimise a ton of stuff out.
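A toy version of that layout, as a sketch: `UnitSet` is a hypothetical name, and the standard library's real `HashSet` has more machinery than this. It stores `()` as the map value and prints the sizes involved, showing that a `()` value takes no space while a `bool` takes a byte.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::mem::size_of;

/// Hypothetical toy set: a map whose values are the zero-sized `()`.
struct UnitSet<T> {
    map: HashMap<T, ()>,
}

impl<T: Eq + Hash> UnitSet<T> {
    fn new() -> Self {
        UnitSet { map: HashMap::new() }
    }

    /// Returns true if the value was not already present.
    fn insert(&mut self, value: T) -> bool {
        self.map.insert(value, ()).is_none()
    }

    fn contains(&self, value: &T) -> bool {
        self.map.contains_key(value)
    }
}

fn main() {
    // `()` occupies no space, so each entry costs only as much as its key.
    println!("size_of::<()>()   = {}", size_of::<()>());   // 0
    println!("size_of::<bool>() = {}", size_of::<bool>()); // 1
    println!(
        "(u64, ()) = {} bytes, (u64, bool) = {} bytes",
        size_of::<(u64, ())>(),
        size_of::<(u64, bool)>()
    );

    let mut s = UnitSet::new();
    s.insert("basil");
    println!("{}", s.contains(&"basil")); // true
}
```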

MichaelGG · on May 15, 2016

But the Rust version of the blank check doesn't handle any encoding but UTF-8. Helix could wrap up a char iterator for it I suppose, one that calls rb_enc_codepoint_len?

And isn't there some common C lib that exposes Unicode functions like is_whitespace? Granted, using a cargo crate is easier than finding and adding a .h, and far easier than getting and linking another lib.
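On the Rust side, the standard library already exposes Unicode whitespace classification as `char::is_whitespace` (based on the Unicode `White_Space` property), so no extra C library is needed once the data is a `&str`. A minimal sketch of a blank check, assuming the input has already been converted to UTF-8; the name `is_blank` is hypothetical and this is not the article's code.

```rust
/// Hypothetical helper: true if the string is empty or all Unicode whitespace.
/// `&str` guarantees valid UTF-8, so the encoding question is settled before this runs.
fn is_blank(s: &str) -> bool {
    s.chars().all(char::is_whitespace)
}

fn main() {
    for s in ["", "  \t\n", "\u{3000}", " a "] {
        // U+3000 is the ideographic space, which counts as whitespace.
        println!("{:?} -> {}", s, is_blank(s));
    }
}
```

The Ruby side still has to hand over UTF-8 bytes, which is exactly the encoding question raised above.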

wycats · on May 15, 2016

The Rust version does a type coercion from a Ruby VALUE to a Rust String. The type coercions are defined generically using Rust traits (see the Helix README), so once somebody defines the RubyString -> String coercion, everyone benefits.

In this case, the coercion needs to ask Ruby for the encoding tag and ask Ruby to validate the encoding (which it does often enough that the result is often cached), but after that we can safely coerce directly into a UTF-8 string.

If we wanted to support other encodings, we could fall back to using Ruby's transcoding support (string.encode("UTF8")) and again, once someone does the work once it'll work for all helix users.
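The coercion described above could look roughly like the sketch below. Everything here is hypothetical: `RubyStr` stands in for a Ruby string (its raw bytes plus the name of its encoding tag), and `FromRuby` is an illustrative trait, not Helix's actual API. The point is only the shape: check the tag, validate, and only then produce a Rust `String`; other encodings are reported as needing a transcoding step first.

```rust
// Hypothetical types: `RubyStr` and `FromRuby` are illustrations, not Helix's API.

/// Stand-in for a Ruby string: its raw bytes plus the name of its encoding tag.
struct RubyStr {
    bytes: Vec<u8>,
    encoding: &'static str,
}

#[derive(Debug, PartialEq)]
enum CoercionError {
    /// Not reinterpretable as UTF-8; would need transcoding first.
    NeedsTranscoding(&'static str),
    /// The tag allowed a direct conversion, but the bytes were invalid.
    InvalidBytes,
}

/// Illustrative coercion trait: one impl per target Rust type, shared by all users.
trait FromRuby: Sized {
    fn from_ruby(value: &RubyStr) -> Result<Self, CoercionError>;
}

impl FromRuby for String {
    fn from_ruby(value: &RubyStr) -> Result<Self, CoercionError> {
        match value.encoding {
            "UTF-8" => {}
            // US-ASCII is a subset of UTF-8, provided every byte really is ASCII.
            "US-ASCII" if value.bytes.is_ascii() => {}
            "US-ASCII" => return Err(CoercionError::InvalidBytes),
            other => return Err(CoercionError::NeedsTranscoding(other)),
        }
        // Validate rather than trusting the tag: the tag alone proves nothing.
        String::from_utf8(value.bytes.clone()).map_err(|_| CoercionError::InvalidBytes)
    }
}

fn main() {
    let utf8 = RubyStr { bytes: "héllo".as_bytes().to_vec(), encoding: "UTF-8" };
    let sjis = RubyStr { bytes: vec![0x93, 0xFA, 0x96, 0x7B], encoding: "Shift_JIS" };
    println!("{:?}", String::from_ruby(&utf8));
    println!("{:?}", String::from_ruby(&sjis));
}
```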

MichaelGG · on May 15, 2016

I was just pointing out that the C and Rust versions provided weren't quite equivalent.

chancancode · on May 15, 2016

You are correct; this is definitely a bug.

Helix is set up to do the right thing – it already goes through a coercion protocol, so we can easily add the encoding check there. We just missed that detail when porting the code and will fix it soon.

I suppose that echoes my point about how systems programming is hard to get right: there are just too many details you have to remember!

This is why having a shared solution like Helix is beneficial. By moving all the unsafe code into a common library, it's more likely that someone will notice the problem and fix it for everyone.

This actually touches on an interesting point I would like to elaborate on. When we say {Helix/Rust/Ruby} is safe, there is an important caveat – {Helix/Rust/Ruby} themselves could of course have bugs. I have definitely experienced segfaults on Ruby myself.

While true, this caveat is not particularly interesting. It is not sleight of hand. Moving code around doesn't magically remove human errors; that's not the point. It's about establishing clear boundaries of responsibility. (This is why unsafe blocks in Rust are great.)

When you get a segfault in Ruby, you know for certain that your code is not the problem. Sure, you might be doing something weird, but it is part of the contract that the VM is not supposed to crash no matter what you do. As a result, memory safety is just not something you have to constantly worry about when programming in Ruby.

It is the same thing as saying JavaScript code on a website "cannot" crash the browser, segfaults in user-space code "cannot" cause a kernel panic or malicious code "cannot" fry your chip. All of these could of course (and do) happen – but from the programmer's perspective, you can work with the assumption that they are not going to happen (and when they do, it's someone else's fault). It's not "cannot" in the "mathematically proven" sense, but it's just a useful abstraction boundary.
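To make the "clear boundaries" point concrete, here is a small sketch of the usual Rust pattern (the function name is hypothetical): a safe function checks an invariant itself and confines the `unsafe` call to one block, so responsibility for that invariant sits in exactly one place. It skips `std::str::from_utf8`'s general validation only because every ASCII byte is also valid UTF-8.

```rust
/// Hypothetical safe wrapper: returns the bytes as `&str` only if they are pure ASCII.
/// Callers never write `unsafe`; if this is wrong, the bug is here and nowhere else.
fn ascii_as_str(bytes: &[u8]) -> Option<&str> {
    if !bytes.is_ascii() {
        return None;
    }
    // SAFETY: every ASCII byte is a complete, valid UTF-8 sequence,
    // and we just checked that all bytes are ASCII.
    Some(unsafe { std::str::from_utf8_unchecked(bytes) })
}

fn main() {
    println!("{:?}", ascii_as_str(b"plain ascii"));  // Some("plain ascii")
    println!("{:?}", ascii_as_str("né".as_bytes())); // None
}
```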

steveklabnik · on May 15, 2016

Akira Matsuda actually suggested at RailsConf that maybe Rails handling non-UTF-8 encodings was not necessary, and maybe phasing it out was a good idea.

I wasn't present for the talk, just saw his slides.

nateberkopec · on May 15, 2016

Akira was talking about a specific context - view rendering. Which makes sense, who the hell ever renders a view in anything other than UTF-8?

Checking input, however, is a whole 'nother ballgame.

steveklabnik · on May 15, 2016

He was talking about the view layer, that's true. Even then though, your source is likely to be in UTF-8, and Rails' form helpers add

  <form accept-charset="UTF-8">

so these days, the non-UTF-8 usage in Rails apps should be pretty tiny, I would think? It'd be stuff coming from outside of forms.

aidenn0 · on May 15, 2016

The Rust version works on strings not on bytes. Strings don't have encodings.

hetman · on May 15, 2016

What do you mean? All Rust strings are UTF-8 encoded, and all Ruby strings have an associated encoding.

steveklabnik · on May 15, 2016

All Rust Strings and &strs are UTF-8 encoded; there are also other string types.
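A quick illustration of the distinction using only standard-library types: `String` refuses bytes that are not UTF-8, while `std::ffi::CString` (which only forbids interior NUL bytes) and a plain `Vec<u8>` will hold them.

```rust
use std::ffi::CString;

fn main() {
    // "é" in ISO-8859-1 is the single byte 0xE9, which is not valid UTF-8 on its own.
    let latin1_e_acute: Vec<u8> = vec![0xE9];

    // String enforces UTF-8 at construction time.
    let as_string = String::from_utf8(latin1_e_acute.clone());
    println!("String:  {}", if as_string.is_ok() { "ok" } else { "rejected" });

    // CString only requires "no interior NUL bytes"; any other bytes are fine.
    let as_cstring = CString::new(latin1_e_acute.clone());
    println!("CString: {}", if as_cstring.is_ok() { "ok" } else { "rejected" });

    // A plain Vec<u8> places no requirements on its contents at all.
    println!("Vec<u8>: {:?}", latin1_e_acute);
}
```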

x5n1 · on May 15, 2016

huh? strings have encodings. rust strings are bytes encoded in utf-8.

https://doc.rust-lang.org/book/strings.html

aidenn0 · on May 15, 2016

In Rust a string is a sequence of Unicode scalar values. I personally find it unfortunate that they dictate the storage of it at the API level, but that is a necessary evil for presenting a consistent ABI with foreign code.

I did not know that strings in Ruby have encodings. Is there a reason for that? I personally don't like mixing characters and opaque byte sequences as they are very different.

burntsushi · on May 15, 2016

> In Rust a string is a sequence of unicode scalar values.

The representation of a Rust String in memory is guaranteed valid UTF-8. To me, a "sequence of Unicode scalar values" is an abstract description, because it could be implemented via UTF-8, UTF-16 or UTF-32.

> I personally find it unfortunate that they dictate the storage of it at the API level

It is extraordinarily convenient and provides a very transparent way to analyze the performance of string operations.

For transcoding, there is the in-progress `encoding` crate: https://github.com/lifthrasiir/rust-encoding

I note that Go does things very similarly (`string` is conventionally UTF-8) and it works famously for them. They have a much more mature set of encoding libraries, but they work the same as the equivalent libraries would work in Rust: transcode to and from UTF-8 at the boundaries. See: https://godoc.org/golang.org/x/text
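The "transcode at the boundaries" approach can be sketched with nothing but the standard library for two easy cases: ISO-8859-1 (Latin-1), where each byte maps to the code point of the same value, and UTF-16, which `String::from_utf16` already handles. Anything more involved, like Shift_JIS, is what a crate such as the `encoding` one linked above is for; the helper names here are hypothetical.

```rust
/// Hypothetical helper: decode ISO-8859-1 bytes (byte value N is code point U+00NN).
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Hypothetical helper: encode back to Latin-1, failing on characters above U+00FF.
fn encode_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u8) } else { None })
        .collect()
}

fn main() {
    // Input boundary: foreign encodings become UTF-8 `String`s.
    let from_latin1 = decode_latin1(&[0x63, 0x61, 0x66, 0xE9]);     // "café"
    let from_utf16 = String::from_utf16(&[0x65E5, 0x672C]).unwrap(); // "日本"
    println!("{} {}", from_latin1, from_utf16);

    // Output boundary: convert back only when writing out.
    println!("{:?}", encode_latin1(&from_latin1)); // Some([99, 97, 102, 233])
    println!("{:?}", encode_latin1(&from_utf16));  // None
}
```

Inside the program everything is UTF-8; only the edges know about other encodings.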

MichaelGG · on May 15, 2016

Ruby's Japanese heritage is probably why it handles encodings like that - I think there were multiple encodings it had to deal with at once or something. Also, Unicode doesn't completely handle all kanji, in that there are some whose old-style forms are not available in Unicode. But maybe that's not relevant.

aidenn0 · on May 15, 2016

Unicode now handles all the Kanji in JIS. I wouldn't be surprised if Ruby predated that. It almost certainly predates good library support for all the Kanji in JIS.

GolDDranks · on May 15, 2016

I think the problem isn't whether it handles all the Kanji in JIS – it does. But the problem is that JIS at the time was so common that it didn't necessarily make sense to settle exclusively for then-less-used UTF-8. That would make re-encodings necessary at interfaces and on IO.

steveklabnik · on May 15, 2016

Ruby encoding stuff changed a lot over its history; it was one of the big changes from 1.8 to 1.9.

twelvechairs · on May 15, 2016

It's a better way of doing things: you can handle things in their native format rather than having to arbitrarily convert to UTF8 (which is an 'encoding' itself).

[edit] I remember a talk where Matz was asked this specific question and tried to explain it clearly, but seemed confused as to how the questioner could have such a poor grasp of Unicode (the difference between monolingual Americans and Japanese, I guess).

kibwen · on May 15, 2016

String is just a thin wrapper around Vec<u8> with some extra convenience functions for working with UTF-8 (plus the guarantee that its contents are valid UTF-8). There's nothing stopping anyone from just using Vec<u8> to handle non-UTF-8 data in their native format, nor stopping anyone from writing convenience types like String for other encodings.
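As a sketch of the "convenience type for another encoding" idea, here is a hypothetical `Latin1String` newtype over `Vec<u8>`. Nothing like it exists in the standard library; it only shows that such a type is a few lines of ordinary code, storing bytes in their native encoding and decoding only when displayed.

```rust
use std::fmt;

/// Hypothetical owned string type that stores ISO-8859-1 bytes directly.
struct Latin1String(Vec<u8>);

impl Latin1String {
    fn from_bytes(bytes: Vec<u8>) -> Self {
        // Every byte sequence is valid Latin-1, so there is nothing to check.
        Latin1String(bytes)
    }

    /// In a single-byte encoding, character count equals byte count.
    fn char_len(&self) -> usize {
        self.0.len()
    }

    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

impl fmt::Display for Latin1String {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Decode on the fly for display; the storage stays Latin-1.
        for &b in &self.0 {
            write!(f, "{}", b as char)?;
        }
        Ok(())
    }
}

fn main() {
    let s = Latin1String::from_bytes(vec![0x63, 0x61, 0x66, 0xE9]);
    println!("{} has {} chars in {} bytes", s, s.char_len(), s.as_bytes().len());
}
```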

twelvechairs · on May 15, 2016

Yeah, right, so Ruby has effectively just made a bunch of these (and done the hard work for you of defining how to convert between them and work with them all in similar ways), and the higher-level class that covers UTF8 and a whole bunch of others is called 'String'. It's really what you want from a high-level language: to just work with different encodings out of the box, without having to convert to a standard internal type (like UTF8) to do so.

wtetzner · on May 15, 2016

Well, it's hard to say, really. It depends on what you're doing. The benefit of converting to UTF-8 when making a string from bytes is that string operations have predictable performance, and strings have predictable memory-usage. But of course, you then have to pay the cost of converting to UTF-8.

On the other hand, if you just track the encoding in your string type, then you don't have to pay a conversion cost at the boundary, but each encoding will have different memory-usage and performance characteristics.

atombender · on May 15, 2016

The reason is that Ruby supports non-Unicode encodings that are not subsets of Unicode. Not possible if your string is Unicode.

MichaelGG · on May 15, 2016

Right, so how do you get from Ruby strings (various encodings) to a Rust string? The sample code just calls std::str::from_utf8_unchecked(s) which is obviously not dealing with Ruby encodings.
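To see what the unchecked call skips, here is what the checked version, `std::str::from_utf8`, reports for bytes from a Ruby string tagged Shift_JIS. The bytes below are the Shift_JIS encoding of 日本, and the first byte (0x93) is a UTF-8 continuation byte, so validation fails immediately. `from_utf8_unchecked` would skip this check entirely.

```rust
use std::str;

fn main() {
    // Shift_JIS bytes for "日本"; they are not valid UTF-8.
    let sjis: &[u8] = &[0x93, 0xFA, 0x96, 0x7B];

    match str::from_utf8(sjis) {
        Ok(s) => println!("valid UTF-8: {}", s),
        Err(e) => println!(
            "rejected: valid up to byte {}, bad sequence length {:?}",
            e.valid_up_to(),
            e.error_len()
        ),
    }
    // `str::from_utf8_unchecked(sjis)` would compile, but producing a `&str`
    // from these bytes breaks its UTF-8 invariant, which is undefined behavior.
}
```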

aidenn0 · on May 15, 2016

Yeah, that's a clear bug. I was not aware that Ruby strings had encodings.

x5n1 · on May 15, 2016

All strings have encodings. It is not possible to represent a string which is a series of bytes except with encodings. I guess you probably mean default encoding or no encoding support... which implies ASCII, better known as US-ASCII.

aidenn0 · on May 16, 2016

In the high-level language I am most familiar with (Common Lisp), strings do not have encodings because they are vectors of characters, not vectors of bytes. How the string is actually stored in memory is an implementation detail.

Encoding is purely an artifact of I/O if your language has a character type that can represent all possible characters you might want to read or write.

Rust's strings are almost this; if there were no way to get a string's raw representation, nor perform bytewise slices, then how the string was stored in RAM would be an implementation detail rather than part of the public API. Rust, being a systems language, probably does need to specify this so that it doesn't incur encode/decode overhead when dealing with foreign code that can understand utf-8.
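A small illustration of the point that Rust's `str` exposes its UTF-8 storage in the public API: length is measured in bytes, slicing is by byte offset, and slicing inside a multi-byte character is rejected (`get` returns `None`; plain indexing would panic).

```rust
fn main() {
    let s = "né"; // 'é' is two bytes in UTF-8

    println!("bytes: {}, chars: {}", s.len(), s.chars().count()); // bytes: 3, chars: 2
    println!("raw bytes: {:?}", s.as_bytes());                     // [110, 195, 169]

    // Byte offsets are part of the API: offset 2 falls inside 'é'.
    println!("{}", s.is_char_boundary(2)); // false
    println!("{:?}", s.get(0..2));         // None
    println!("{:?}", s.get(0..1));         // Some("n")
}
```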

ksec · on May 15, 2016

So it is Rusted Rails? XD

Anyway, I think the same could be applied to Ruby Core as well. I have been calling for a Rusted Ruby for a long time.

Though I am not sure if this is a good thing for other Ruby implementations like JRuby.

shrugger · on May 15, 2016

I think this is a GREAT thing for other implementations like JRuby. Rusty Ruby would be able to learn from some of the implementation tricks that implementations like JRuby and rbx have picked up along the way, and maybe those guys will learn something from the Rusty Ruby crew too.

Rust, as far as I know (not at all, honestly), doesn't interface with Java that well, so for JVM projects, it's nice to have a familiar language like Ruby that can tie in and make prototyping way easier.

I think people misunderstand how awesome JRuby is, because Ruby has never performed that well and JRuby has performed very well. Performance is not the only reason to bother implementing a language. There's also the issue of mindshare: a large group of people might already know Ruby, and in those scenarios it'd be easier to just give them JRuby and let them go to town than to try to drag them through learning Java.

Rusty Ruby will do the same thing for Rust, I think. Rust is a very intricate, well-thought-out language, and I think it would benefit a lot from playing off the shared knowledge of thousands of Ruby users.

carlosft · on May 17, 2016

What are the dependencies for installing the gem? I assume the machine will need rustc/cargo to compile the gem.

steveklabnik · on May 17, 2016

I am not super knowledgeable about these specific details, but RubyGems lets you upload prebuilt binaries, so this generally isn't needed.

solipsism · on May 15, 2016

Helix is the name Perforce is using to rebrand their set of tools. Just an FYI, I'm not saying that means this should not be called Helix.

poizan42 · on May 15, 2016

The amount of name reuse in software in recent years really frustrates me. It makes it so much harder to find the information you are looking for.

Besides that, I really don't get why Perforce is renaming their version control system to Helix. Perforce is a well-known name; it seems weird to just drop a strong brand.

bigfcjjyfcg · on May 15, 2016

This is fucking cool. You have convinced me to pick up Rust.

hobs · on May 15, 2016

Isn't a set containment problem a really simple problem to solve with a database, e.g. an inner join and a count?

bjz_ · on May 15, 2016

For something that is in memory and throw-away, why go through a DB?

hobs · on May 15, 2016

I was taking the angle of "Why is the problem taking so long when it seems like a simple thing to calculate?", which was a little orthogonal to the theme of the post as a whole, so I can see why there would be some confusion.

The article talked about 30 minutes to run through a "is this set of items in this other set of a bunch of items".

While it would probably make sense to do this efficiently in memory the reality is the company mentioned are probably doing it inefficiently in memory; asking some tool which can do this type of work effectively and is fairly well known was my basic suggestion.

placeybordeaux · on May 15, 2016

Why hit the network/a central resource when you could just do it locally?

fouc · on May 15, 2016

It's a webapp, they're likely already querying all the meal information from a mysql or postgresql db. Chances are they could have made a change to their db models and written some sql to handle food requirements checking instead of handling it after the query. But that could've been premature optimization ultimately, and to speed it up now, it's probably easier to optimize the slow ruby code with some rust instead.

hobs · on May 15, 2016

Exactly. I don't mean to beat a dead horse, but most bog-standard web apps have the quoted items in a DB anyway, and these are very common problems.

For example if we have a few tables we can determine which menu items a user could eat:

  users
  user_ingredient_exclusions
  ingredients
  menu_ingredients
  menu

  -- menu items a user can eat: items with no ingredient on that user's exclusion list
  select m.*
  from menu m
  where not exists (
    select 1
    from menu_ingredients mi
    inner join user_ingredient_exclusions e on e.ingredient_id = mi.ingredient_id
    where mi.menu_id = m.id
    and e.user_id = @id
  )
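For the in-memory side of this debate, the same "which menu items can this user eat" question is a per-item disjointness check against the user's exclusions. A minimal Rust sketch with hypothetical data shapes (ingredient IDs as `u32`, a menu as a list of (name, ingredient IDs) pairs), not code from the article:

```rust
use std::collections::HashSet;

/// Hypothetical helper: names of menu items sharing no ingredient with `excluded`.
fn edible_items<'a>(menu: &[(&'a str, Vec<u32>)], excluded: &HashSet<u32>) -> Vec<&'a str> {
    menu.iter()
        .filter(|(_, ingredients)| ingredients.iter().all(|i| !excluded.contains(i)))
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    // Hypothetical data: ingredient 3 stands for peanuts.
    let menu: Vec<(&str, Vec<u32>)> = vec![
        ("pad thai", vec![1, 2, 3]),
        ("green salad", vec![4, 5]),
    ];
    let excluded: HashSet<u32> = [3].iter().copied().collect();
    println!("{:?}", edible_items(&menu, &excluded)); // ["green salad"]
}
```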

drewbug · on May 15, 2016

Sometimes, it'll be faster.

placeybordeaux · on May 16, 2016

Unless you specify some asymmetry it will always be faster to do it locally.

jilljennV · on May 15, 2016

May I say? PRAISE HELIX.

accbcc · on May 15, 2016

Rust has no ecosystem of its own, just pyramid selling. Swift will beat it!

desireco42 · on May 15, 2016

:) I wish I understood this jab. Can you explain it to me?

accbcc · on May 14, 2016

I like Swift. Rust syntax gives me a headache.

Rotten194 · on May 15, 2016

I like apples. Oranges look weird.

pzh · on May 15, 2016

Hmm. Kind of makes you wonder why you're using Ruby at all, which is slow as hell. It appears that once they port all Rails libraries to Rust, it may not take that much effort to create Rust on Rails and get rid of Ruby.

rattray · on May 15, 2016

Ruby is phenomenally easy to write. This post essentially describes a way to make Ruby less slow, which would reduce the incentive to leave the language.

Once they port all Rails libraries to Rust, you can still use the extremely-ergonomic Ruby programming language, but it won't be "slow as hell" anymore. And ruby performance is "tolerable in most circumstances" as-is.

I certainly agree that it could be awesome to have Rust on Rails for those who would find even Rails on Rust performance intolerable, and Iron's abstractions inadequate.

brightball · on May 15, 2016

This is the exact reason I'm headlong into Elixir and Phoenix right now. It's like the best of both worlds.

donpdonp · on May 15, 2016

I find a great answer in this area is Crystal. I consider it 'Go in Ruby clothing'. Like Go, Crystal is a typed language and outputs binary executables. They kept the beautiful Ruby syntax and made better internals. http://crystal-lang.org/

tmikaeld · on May 15, 2016

I believe it is weakly typed, so type-checked at compile time.

tmikaeld · on May 16, 2016

Why downvote this and not comment as to why?

It's their own words.

technion · on May 15, 2016

Plenty of gems use native C to provide best possible performance. This doesn't mean everyone is planning to rewrite their Ruby applications in C.

Rust provides a safer option - which is worthwhile. There have been several examples of gems with memory issues. It doesn't change anything, unless this is a general "Ruby is too slow" sort of rant.

eropple · on May 15, 2016

It does no such thing as make me wonder why I'm using Ruby. I can and do write Rust, and I enjoy doing so. But I write Ruby more quickly and can more effectively build tools for both my own consumption and that of others, using techniques that I can't use in Rust. Being able to use Rust or something that isn't an unpleasant minefield (lookin' at you, C) where I need it for performance is extremely valuable, but there's no reason to throw out the beneficial, powerful aspects of Ruby to do it.

danso · on May 15, 2016

How many platforms and frameworks have been completely ported over to a better language? There are many languages that have arguably succeeded PHP, and yet WordPress and Facebook remain on PHP, which would belie your assumption that "it may not take that much effort to [re]create [some framework] and get rid of [that framework's original language]"