Concurrent threads using magic regexp vars like $1 stomp all over each other, qu...

cheald · on July 22, 2015

One of the things I've most appreciated about the JRuby community is that bugs get fixed in a hurry. My first experience with JRuby was trying it, something not working, jumping into IRC to ask about it, and headius had it diagnosed and patched 10 minutes later. From then on I was hooked.

The other major thing I like about JRuby is that it's much easier to hack on than MRI - the Java code is extremely clean and easy to understand, and it's easy to use tools like IntelliJ's debugger to diagnose issues quickly and easily. I've contributed dozens of commits to JRuby specifically because the barrier to entry is just a lot lower than it is in MRI.

The community is really, really good, and the project is extremely hackable - it's an embodiment of the best in open source, IMO, and I'm really excited for this release in the hopes that it gets more people using it.

headius · on July 22, 2015

This is not a hard one to fix but it didn't make the final release and had never been reported before, despite being largely the same for 9 years. It will be fixed as soon as we can get to it.

I will echo what others have said, though...closures capture and share a lot of other state. $~ and the related vars like $1 are supposed to be "special" but there's other state you're going to stomp on in all Ruby impls. It's better to avoid using closure state if you know it's going to be called across threads, because most of that state will be shared on all Rubies.

Freaky · on July 22, 2015

> had never been reported before, despite being largely the same for 9 years

I'll admit to encountering weird "er, why is that randomly nil" errors quite regularly with JRuby when load testing webservers, which, er, I've never reported. It's never been obvious where the fault was, really. And it always seemed unlikely to be your fault, tsk ;)

> It's better to avoid using closure state if you know it's going to be called across threads

Yeah. In this case it was a hash of name -> lambda pairs which boiled down to variations on:

    ->(obj) do
      case obj.bla
        when /foo(.*)/ then "FOO_#{$1.upcase}"
        when /bla(.*)/ then $1
      end
    end

Called in lots of tight loops from a pool of worker threads. I refactored it into something neater, but it still should have worked fine :)
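
For illustration, a minimal sketch of the kind of refactor that avoids reading `$1` by keeping the match in a local variable. This is not the commenter's actual refactor; `TRANSFORM` and `Item` are hypothetical names, and `bla` is just the accessor from the snippet above.

```ruby
# Hypothetical stand-in for one of the lambdas described above.
# The MatchData lives in the local `m`, which is created fresh on every
# call, so the result never depends on reading $~ or $1.
TRANSFORM = ->(obj) do
  value = obj.bla
  if (m = /foo(.*)/.match(value))
    "FOO_#{m[1].upcase}"
  elsif (m = /bla(.*)/.match(value))
    m[1]
  end
end

# Hypothetical object with a `bla` accessor, for demonstration only.
Item = Struct.new(:bla)

TRANSFORM.call(Item.new("foobar"))  # => "FOO_BAR"
TRANSFORM.call(Item.new("blah"))    # => "h"
TRANSFORM.call(Item.new("other"))   # => nil
```

`Regexp#match` still sets `$~` as a side effect, but nothing here reads it back, so the value returned depends only on locals.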

headius · on July 22, 2015

It would be worth proposing to ruby-core that captured closures are thread-local, but that would break a lot of code that actually depends on the sharing. Programming is hard :-(

jrochkind1 · on July 22, 2015

Huh, if the local variable was only scoped to the closure block, and not above it, I would never expect it to be shared. I would think "avoid using closure state" means exactly that, use only local variables scoped no higher than the closure block itself. (It's true this can sometimes be difficult to ensure in ruby; block local variables can help).

Do I understand things right, or am I wrong here?

I guess the question is where the regexp special vars are scoped to though, I see how that's not entirely clear.

headius · on July 23, 2015

You're largely right. The problem is that $~ (and related vars) and $_ are scoped to the nearest method body. If they were scoped to the closure itself, there'd be no problem.
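
A small single-threaded sketch of that scoping, as observed on MRI: a block writes to the same `$~` as the method that encloses it. The method name is made up for the example, and nothing here exercises threads.

```ruby
# Hypothetical method name, for demonstration only.
def last_match_after_block
  "abc" =~ /b/                 # $~ now holds the match for "b"
  [1].each { "xyz" =~ /y/ }    # the block overwrites the method's $~
  $~[0]                        # read back at method level
end

last_match_after_block  # => "y" on MRI
```

Because the block has no `$~` of its own, any code that runs the same closure more than once is reading and writing one shared slot.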

rugmug5 · on July 22, 2015

You really shouldn't be relying on globals anyway. Using $1 etc. is a serious code smell. It shouldn't by definition be threadsafe in any event. The only way that would work would be if you rely on the threads not actually running at the same time as an implementation detail of MRI.

infraruby · on July 22, 2015

$~ and friends are not global:

    def f
      p $~ # => nil
      "123" =~ /\w+/
      p $~ # => #<MatchData "123">
    end

    p $~ # => nil
    "abc" =~ /\w+/
    p $~ # => #<MatchData "abc">

    f

    p $~ # => #<MatchData "abc">

headius · on July 22, 2015

Matz himself has said he regrets putting these variables in the language, and some day they may disappear. We'll match behavior as much as possible, but they're a relic no matter how you slice it.

cheald · on July 22, 2015

Well, it's fair to say "it works in MRI"; $1 being frame-local and thread-local is an exception to the rule, but it is How It Works. JRuby should behave similarly, full stop.

Freaky · on July 22, 2015

They're what MRI calls "svars", or special variables. They're thread and scope-local, and regardless of your feelings towards them, they're still used all over the place, in old and new code alike.

Confusion · on July 22, 2015

Your comment has little to do with JRuby 9000, as the bug reported there was observed in JRuby 1.7.20. Yes, JRuby has bugs, like MRI, like any interpreter or compiler.

Freaky · on July 22, 2015

The same bug still exists in JRuby 9000, and was in fact the first thing I ran into when trying it.

Considering threading is the main selling point, one of the commonest patterns of regexp use being completely and dangerously broken with them would have seemed like something of a showstopper.

eropple · on July 22, 2015

"Commonest" is a big claim. I'm not a JRuby user, but I sling a whole bunch of Ruby and I've never, not once, used the global regexp stuff--indeed, I didn't know it existed before just now.

I can see wanting it fixed, but it's a pretty odd hill to die on.

tinco · on July 22, 2015

I've been writing Ruby for 7 years now, and I've known about the regex svars for about as long. Never used them once. They're obviously bad style.

dragonwriter · on July 22, 2015

> one of the commonest patterns of regexp use

Is it really? Even fairly early in the Ruby 1.8.x era, most recommendations I saw were that the magic regexp globals (and many other magic globals) were a perlism that should generally be avoided.

jrochkind1 · on July 22, 2015

Huh, I use `$1` all the time, and see it all the time. Probably because the alternative with an explicit match object ends up being relatively a lot more code and a lot harder to read, really.

If avoiding `$1` has been often recommended for a while... I think it's a recommendation more often ignored than followed.

thescrewdriver · on July 23, 2015

I've never seen $1 used in production code.

Freaky · on July 23, 2015

Try grepping your lib directory sometime. In fact, let me do it for you, on a relatively clean install:

time, benchmark, irb, pry, rubygems, resolv, erb, rake, open-uri, debug, rack, cgi/util, optparse, getoptlong, bundler, activesupport, actionpack, activemodel, erubis, slim, haml, sass, sequel, roda, nokogiri, rugged, faraday, mime-types, thor, test-unit, tzinfo, mail, ffi

thescrewdriver · on July 29, 2015

Ouch. That's pretty awful.

VeejayRampay · on July 23, 2015

I don't think it's that common a pattern. All those perlisms make code unreadable anyway. I much prefer things like match with named captures, which generally make everything more obvious.
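
A minimal sketch of the named-capture style mentioned here. The pattern, the constant `LINE`, and the method `parse_pair` are all invented for the example.

```ruby
# Hypothetical pattern: a "key=value" line with two named groups.
LINE = /\A(?<key>\w+)=(?<value>.*)\z/

# Returns [key, value] for a matching line, or nil otherwise.
def parse_pair(line)
  m = LINE.match(line)
  return nil unless m
  [m[:key], m[:value]]
end

parse_pair("name=jruby")  # => ["name", "jruby"]
parse_pair("garbage")     # => nil
```

The captures are read by name from the local `MatchData`, so the code never touches `$1` or `$2`.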

Confusion · on July 22, 2015

Ah, I didn't get that from your original post. So JRuby 9000 shipped with a known concurrency bug that, on the face of it, seems likely to be a problem in actual use. I agree that seems a bit worrying.

headius · on July 22, 2015

You have to ship some time. Noisy bugs get fixed...and patches are always accepted :-)

Confusion · on July 23, 2015

Of course it can be completely reasonable to ship with a known bug, which is why my initial reaction to the OP was: why are you bringing this up here?

However, not all bugs are made equal and this one seems relatively likely to actually cause problems for users. Wouldn't someone running some service and handling say 10 requests/second, while using the regex global variables, run into this bug on a daily basis?

So I guess I'm just interested in how such a decision is made: what bugs get shipped and which block a release?

If this bug would reasonably cause problems in such a situation and if the policy of JRuby is to ship anyway, that seems to be a relevant piece of information to consider for someone using JRuby in production and considering whether to upgrade.

Perhaps one should always check the list of open bugs for the version of a compiler/interpreter one intends to start using, but I've never done so, haven't been bitten by a bug yet (AFAIK) and yet this one seems one that could be a problem. The main problem is of course that this may just be 'the curse of knowledge' in play.

So I guess I was being a bit dismissive towards OP, then I was sympathetic and now I'm mostly thinking about how I should handle this as someone using JRuby in production. Perhaps I should just ignore it. Perhaps I should investigate the set of concurrency tests and contribute a few. Perhaps I should be conservative and only use 'proven' JRuby versions. Perhaps I should learn to stop worrying and love the bomb :)

VeejayRampay · on July 23, 2015

Or perhaps you should simply shy away from global perlisms in a Ruby app, which are really bad practice anyway.

gsnedders · on July 22, 2015

What VM doesn't ship with bugs likely to be a problem in actual use, though?

incepted · on July 22, 2015

I can make my code arbitrarily fast if it doesn't have to be correct.