
People say the title of the article "Ruby Outperforms C: Breaking the Catch-22" is misleading, which is true: this is about JIT-optimized Ruby code outperforming an extension written in C.

But to give some context: the author, Aaron Patterson, is a member of both the Ruby and Rails core teams. The article and its headline are clearly targeting the Ruby community, where it has been very well received. I think it's a good title for the intended audience.

The post clarifies in the first section:

> In this post I’d like to present one data point in favor of maintaining a pure Ruby codebase, and then discuss some challenges and downsides of writing native extensions. Finally we’ll look at YJIT optimizations and why they don’t work as well with native code in the mix.

edit: added the original title of the Hacker News post / article


This is specifically about breaking the myth that performing expensive self-contained operations (e.g., parsing GraphQL) in a native extension (C, Rust, etc.) is always faster than doing them in the interpreted language.

The JS ecosystem has the same problem: people think rewriting everything in Rust will be a magic fix. In practice, there's always the problem highlighted in the post (transitioning is expensive and causes optimization bailouts), as well as the cost of actually getting the results back into Node-land. This is why SWC abandoned the JS API for writing plugins: constantly bouncing back and forth while traversing AST nodes was even slower than Babel (e.g. https://github.com/swc-project/swc/issues/1392#issuecomment-...)
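You can feel that boundary cost from plain Ruby without writing an extension. Here is a minimal sketch using Fiddle from the standard library (the harness and iteration count are mine, purely illustrative): calling libc's strlen round-trips through the FFI on every call, while String#bytesize stays inside the VM where a JIT can specialize it.

    require "fiddle"
    require "benchmark"

    # Look up strlen in the already-loaded process image (libc).
    strlen = Fiddle::Function.new(Fiddle::Handle::DEFAULT["strlen"],
                                  [Fiddle::TYPE_VOIDP],
                                  Fiddle::TYPE_SIZE_T)

    str = "hello, world"
    n   = 1_000_000

    Benchmark.bm(10) do |x|
      # One FFI transition per call: argument conversion, the call itself,
      # and boxing the result back into a Ruby Integer.
      x.report("fiddle")   { n.times { strlen.call(str) } }
      # Never leaves the VM.
      x.report("bytesize") { n.times { str.bytesize } }
    end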


Interesting that both of the points you state completely contradict my experience with LuaJIT.

Parsing has always been one of the things its tracing JIT struggled with; it is still faster than the (already fairly fast) interpreter, but on this kind of branch- and allocation-heavy code it gets nowhere near the famed 1.25x-to-1.5x-of-GCC (or so) that you can get by carefully tailoring inner-loopy code.

(But a tracing JIT like LuaJIT is different from a basic-block-versioning (BBV) JIT like YJIT, even if I haven’t yet grokked the latter.)

LuaJIT’s FFI calls, on the other hand, are very very fast. They are still slower than not going through the boundary at all, naturally, but that’s about it. On the other hand, going through the Lua/C API inherited from the original, interpreted implementation—which sounds similar to what the Ruby blog post is comparing pure-Ruby code to—can be quite slow.

The SWC situation I can’t assess quickly, but apart from the WASM overhead it sounds to me like they have a syntax tree that the JS plugin side really wants to be GCed in the GC’s memory, while the Rust-on-WASM host side really wants it refcounted in WASM memory, and that is indeed not a good situation to be in. It took a decade or more for DOM manipulation in JS to not suck, and there the native-code side was operating with deep (and unsafe) hooks into the VM and GC infrastructure, as opposed to the WASM straitjacket. Hopefully it’ll become easier when the WASM GC proposal finally materializes and people figure out how to make Rust target it.

In any case, it annoys me how hard it is in just about any low-level language to cheaply integrate with a GC. Getting a stack map out of a compiler in order to know where the references to GC-land are and when they are alive is like pulling teeth. I don’t think it should be that way.


There is a very big difference between a simple FFI system and the sort of C interface offered by Ruby and Node. Those interfaces allow objects to be passed to the native code, and the native code can then do pretty much anything to the language runtime state. This is great if you want a C library that can do anything your higher-level language could do, but it also means the JIT has to treat all those calls as impenetrable barriers that cannot be optimised through, so even a small C call can prevent the rest of your application from being optimised.
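From the Ruby side you can at least see which calls those are: a small sketch (mine, not from the comment above) using the fact that Method#source_location returns nil for methods defined in C.

    def pure_ruby; end

    p method(:pure_ruby).source_location # => ["example.rb", 1] (defined in Ruby)
    p [].method(:sort).source_location   # => nil (defined in C: opaque to the JIT)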

We got round this in TruffleRuby by running C extensions through an LLVM Bitcode interpreter that was part of the same framework as the Ruby interpreter and allowed them to be JITted together, but that had other downsides, and wasn’t great for things like parsers which had huge switch statements.


Yes, but in this case the TruffleRuby approach would fix the Shopify issue, I think? And if by downside you mean longer warmup times, that's an issue for YJIT or any other JIT too, so how much of a downside it is depends a lot on the nature of the deployment.


Shopify's bigger repos are deployed pretty much every 30 minutes. As you point out, most JITs struggle under these conditions.

But YJIT warms up extremely fast, and is able to provide real world speedup to these services almost immediately.
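(For anyone who wants to try it, a minimal sketch for Ruby 3.2+: YJIT is opt-in, and you can confirm from inside the process that it is active.)

    # Run with: ruby --yjit app.rb  (or set RUBY_YJIT_ENABLE=1 in the environment)
    p RubyVM::YJIT.enabled?      # => true when YJIT is active
    p RubyVM::YJIT.runtime_stats # a few counters by default, more with --yjit-stats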


I think a parser is perhaps the best example of a case where resorting to a compiled extension can be beaten by something more JIT-favourable.

In a language like Ruby, parsing tends to be heavily dominated by scanning text and creating objects, and: 1) you can often speed it up drastically by reducing object creation (e.g. here is Aaron writing about speeding up the GraphQL parser partly by doing that [1]); 2) creating Ruby objects and building up complex structures in a C extension is going to be almost exactly as slow as doing it in Ruby; 3) the scanning of the text mostly hits the regexp engine, which is already written in C.
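To make point 1 concrete, here is a toy sketch (my own, not Aaron's code): the same lexing pass written twice, once allocating a Token object per lexeme and once recording only integer offsets, with GC.stat counting the allocations.

    require "strscan"

    Token = Struct.new(:type, :value)
    SRC = "query { user(id: 1) { name } } " * 1_000

    # One Token (plus one String) allocated per lexeme.
    def lex_objects(src)
      ss = StringScanner.new(src)
      toks = []
      until ss.eos?
        ss.skip(/\s+/)
        break if ss.eos?
        if (word = ss.scan(/\w+/))
          toks << Token.new(:word, word)
        else
          toks << Token.new(:punct, ss.getch)
        end
      end
      toks
    end

    # Offsets only: Integers are immediates, so no per-lexeme heap objects.
    def lex_offsets(src)
      ss = StringScanner.new(src)
      toks = []
      until ss.eos?
        ss.skip(/\s+/)
        break if ss.eos?
        start = ss.pos
        ss.pos += 1 unless ss.skip(/\w+/)
        toks << start << (ss.pos - start)
      end
      toks
    end

    { "objects" => :lex_objects, "offsets" => :lex_offsets }.each do |label, m|
      before = GC.stat(:total_allocated_objects)
      send(m, SRC)
      puts "#{label}: #{GC.stat(:total_allocated_objects) - before} allocations"
    end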

(That said, I heavily favour not resorting to C extensions unless you really have to; even without going as far as some of Aaron's more esoteric tricks for that parser, you can often get a whole lot closer than you think, and the portion you need to rewrite, if you still have to, might well turn out to be much smaller than you'd expect.)

[1] https://tenderlovemaking.com/2023/09/02/fast-tokenizers-with...


I think 'magic fix' could be replaced with 'fun thing to do'


I spent a while rewriting a tiny bit of some useful Ruby in Rust and integrating it via Wasmer, and I definitely think my time expenditure is better classified as "fun thing to do" than "magic fix".

https://ossna2023.sched.com/event/1K55z/exotic-runtime-targe...

https://www.youtube.com/watch?v=EsAuJmHYWgI

My goal was to determine: how will I use Wasm as a Rubyist? Spoiler: I did not intend to use Rust, but embedding Ruby in a Wasm module and running it from within Ruby proved to be a fool's exercise. I have a feeling that some of the claims I made in this talk about there being "no theoretical benefit" to running Ruby in Wasm in Ruby will quickly be proven incorrect. But I'm not expecting it to be faster.

If it's ever faster, that will be filed under "surprising results"

Michael Yuan, who obviously knows a lot more about Wasm than I do, addressed this in his talk as well, though not necessarily from the perspective of a JIT. I guess you could consider compile-target optimization done on the target machine at runtime a type of "just in time optimization", but it's not really; it's the regular ahead-of-time type of optimization, just not fumbled in the usual way, where the majority of the benefits are immediately lost...

https://www.youtube.com/watch?v=kOvoBEg4-N4

The spoiler from that talk (for me anyway) was finding out that you do see surprising results sometimes, and sometimes it's due to a pathological case that (a) happens all the time, and (b) does not have a readily obvious solution in the form the problem regularly takes. Like Linux distros shipping generic binaries "target-optimized" for the lowest common denominator of any given architecture target, because the binaries they ship obviously have to run everywhere.

(Recap of the conversation we had off-stage after the Q&A ended: when we say "x86-64", we don't know which modern opcode extensions we really have available, so we have to ship a binary with only the oldest opcodes guaranteed to be available on any similar chip we intend to support. WebAssembly uses a compiler on the target machine, Cranelift, to translate a platform-independent binary into a platform-specific one. Running the compiler takes a while, but we can do it ahead of time. Will we come out ahead in the end?)

All of this stuff is certainly very fun to reason about :D


>to present one data point in favor of maintaining a pure Ruby codebase

Chris Seaton has been stating this for over 5 years. It is unfortunate this mental model has never caught on in Rails.


Evan Phoenix has been saying it for even longer, and created Rubinius as a means to prove it :) (it has since been abandoned, but Chris's TruffleRuby ported the Rubinius core and stdlib implementations, so it's great that they've all been feeding each other for quite a while)


Oh yes. It is unfortunate that even when something is "right" or "correct", it doesn't mean the world will move in that direction.

Hopefully now Shopify has enough resources to push this through.


Not necessarily. Something being too fast can be confusing. If you expect a process to take some time and it ends immediately, it can feel like it failed.

I remember the people from Blogger (Google) talking about this problem. People were not very familiar with blog / website builders, and users were confused when their blogs got created instantly, like: "This is a big deal, me getting an entire website; what happened, what went wrong? It must have aborted the process…"


> can be confusing

Confuse who?

An instant "success" notice beats waiting every single time, regardless of user level, if we're even classifying that.


The middlings are the problem: users tech savvy enough to think "wow, that was so fast!", but not tech savvy enough to look and see if there was a 404 or whatever in the web console.

Which, sadly, is the tech level of most UX people.

Unskilled users are just happy it was fast. Why would it take time, it's a computer!

It's a little like psychiatrists. A surprising number have loads of issues, and go into the business to help themselves.

But this skews perception.

UX people make all sorts of unfounded rules up, many created decades ago, when almost everyone was a "new user".


Both GPT-3.5 and GPT-4 hallucinated, according to the professor:

> Most used 3.5. A few used 4 and those essays also had false info. I don't think they used any browsing plug-ins but it's possible--it was a take-home assignment and not one they did in class.

https://twitter.com/cwhowell123/status/1662517400770691072


Mine was basically to ignore all the mining, grab as many food plots as possible, and focus on energy and food. It worked well against the computer players; they believed there would be enough supply. But I actually bought out the warehouse and then drove prices extremely high.

So I sold food and energy at a higher price point than the ore. Often the AI players couldn't afford food or energy anymore, so their production collapsed, and they did not have enough food to have the time to change the installations on their plots.


Bookshop.org is about that: privately funded, with the objective of letting independent booksellers stand up to the power of Amazon.

https://bookshop.org/info/about-us


Isn't Bookshop.org really just Ingram Content Group?


Yeah, Bookshop hardly counts as a win for the indies. It's basically affiliate marketing if you're an indie and you link to "your" bookshop.


Andrew Kane created a dataframe gem fairly recently: https://github.com/ankane/rover
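For anyone curious what that looks like, a minimal sketch based on the project's README (double-check the repo for the current API):

    require "rover"

    df = Rover::DataFrame.new({
      "name"  => ["Ann", "Bob", "Cora"],
      "score" => [88, 72, 95]
    })

    p df["score"].mean     # vector operations
    p df[df["score"] > 80] # boolean filtering returns a new data frame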


Nice. I did something similar recently: a form for configuring customer sites that displays a preview in an iframe.

I had to throw in some debouncing and special handling for hidden inputs, as changes to hidden inputs do not trigger a change event on the form.

In the end the markup was very simple and is very reusable (StimulusJS v1):

    <div data-controller="iframe-preview" data-iframe-preview-url="<%= preview_path(@letter) %>">
      <form data-target="iframe-preview.form">
        <textarea name="body">My letter</textarea>
      </form>
      <iframe data-target="iframe-preview.iframe"></iframe>
    </div>
This is when StimulusJS becomes really nice: when you can compose behavior in your markup with some simple data attributes. I did not think at first that I would need this controller anywhere else, but a couple of weeks later I actually did, and was able to reuse it without modification for another use case.


Nice approach. And yes, Stimulus is great for things like this.


That was a regression in version 3.4, fixed here: https://github.com/hopsoft/stimulus_reflex/pull/418


Exactly.

Basically, all you do is call a remote procedure, which you can do by adding a simple `data-reflex="click->Todo#toggle"`. In the remote method you change something about the state (for example in the db or Redis). The server rerenders the page and morphs the difference.

If you need more control, you can get it by defining which elements to render and morph.
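The server side of that example might look something like this. A sketch only: the Todo model, its done flag, and the data-id attribute are made up for illustration.

    # app/reflexes/todo_reflex.rb
    class TodoReflex < ApplicationReflex
      def toggle
        # `element` exposes the DOM element that triggered the reflex,
        # including its data attributes (assumes a data-id on the element).
        todo = Todo.find(element.dataset[:id])
        todo.update(done: !todo.done)
        # StimulusReflex then rerenders the page and morphs the difference.
      end
    end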


It looks beautiful, but please increase the contrast. Some text is #aaa on #fff, which is almost illegible. This is an accessibility issue not just for people with impaired vision, but also for anyone checking the page on a mobile phone on a sunny day.

