
Concurrent threads using magic regexp vars like $1 stomp all over each other, quite nasty: https://github.com/jruby/jruby/issues/3031

Stumbling over a serious race condition in the first 5 minutes of trying it with real code makes me a bit wary. All the performance in the world isn't much good if it's randomly wrong :/




One of the things I've most appreciated about the JRuby community is that bugs get fixed in a hurry. My first experience with JRuby was trying it, something not working, jumping into IRC to ask about it, and headius had it diagnosed and patched 10 minutes later. From then on I was hooked.

The other major thing I like about JRuby is that it's much easier to hack on than MRI - the Java code is extremely clean and easy to understand, and it's easy to use tools like IntelliJ's debugger to diagnose issues quickly and easily. I've contributed dozens of commits to JRuby specifically because the barrier to entry is just a lot lower than it is in MRI.

The community is really, really good, and the project is extremely hackable - it's an embodiment of the best in open source, IMO, and I'm really excited for this release in the hopes that it gets more people using it.


This is not a hard one to fix but it didn't make the final release and had never been reported before, despite being largely the same for 9 years. It will be fixed as soon as we can get to it.

I will echo what others have said, though...closures capture and share a lot of other state. $~ and the related vars like $1 are supposed to be "special" but there's other state you're going to stomp on in all Ruby impls. It's better to avoid using closure state if you know it's going to be called across threads, because most of that state will be shared on all Rubies.


> had never been reported before, despite being largely the same for 9 years

I'll admit to encountering weird "er, why is that randomly nil" errors quite regularly with JRuby when load testing webservers, which, er, I've never reported. It's never been obvious where the fault was, really. And it always seemed unlikely to be your fault, tsk ;)

> It's better to avoid using closure state if you know it's going to be called across threads

Yeah. In this case it was a hash of name -> lambda pairs which boiled down to variations on:

    ->(obj) do
      case obj.bla
        when /foo(.*)/ then "FOO_#{$1.upcase}"
        when /bla(.*)/ then $1
      end
    end

Called in lots of tight loops from a pool of worker threads. I refactored it into something neater, but it still should have worked fine :)
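For anyone wondering what the refactor might look like: this is only a sketch, not the actual code. It keeps each match in a local MatchData instead of reading $1. `Item` and `label_for` are made-up names for illustration. Regexp#match still writes $~ as a side effect, but nothing here reads it back.

```ruby
# Hypothetical stand-in for the objects the lambda receives.
Item = Struct.new(:bla)

# Same two branches as the lambda above, but each capture is read from a
# block-local MatchData (m) rather than from $1.
label_for = ->(obj) do
  str = obj.bla
  if (m = /foo(.*)/.match(str))
    "FOO_#{m[1].upcase}"
  elsif (m = /bla(.*)/.match(str))
    m[1]
  end
end

label_for.call(Item.new("foobar"))
```

Since `m` is local to each call, concurrent callers should not be able to see each other's captures, whatever the implementation does with $~.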


It would be worth proposing to ruby-core that captured closures are thread-local, but that would break a lot of code that actually depends on the sharing. Programming is hard :-(


Huh, if the local variable was only scoped to the closure block, and not above it, I would never expect it to be shared. I would think "avoid using closure state" means exactly that, use only local variables scoped no higher than the closure block itself. (It's true this can sometimes be difficult to ensure in ruby; block local variables can help).

Do I understand things right, or am I wrong here?

I guess the question is where the regexp special vars are scoped to though, I see how that's not entirely clear.


You're largely right. The problem is that $~ (and related vars) and $_ are scoped to the nearest method body. If they were scoped to the closure itself, there'd be no problem.
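A small sketch of what "scoped to the nearest method body" means in practice (`last_digit_seen` is a made-up name): a match performed inside a block is visible to the enclosing method once the block has finished, because both use the same $~ slot.

```ruby
# The block passed to each sets $~ in the enclosing method's frame,
# so $1 is still readable after the iteration is over.
def last_digit_seen(strings)
  strings.each { |s| s =~ /(\d)/ }
  $1
end

last_digit_seen(%w[a1 b2])
```

Every block or lambda created in the same method body shares that one slot, which is why handing such a lambda to several threads can get ugly.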


You really shouldn't be relying on globals anyway. Using $1 etc. is a serious code smell. It shouldn't by definition be threadsafe in any event. The only way that would work would be if you rely on the threads not actually running at the same time as an implementation detail of the MRI.


$~ and friends are not global:

  def f
    p $~ # => nil
    "123" =~ /\w+/
    p $~ # => #<MatchData "123">
  end

  p $~ # => nil
  "abc" =~ /\w+/
  p $~ # => #<MatchData "abc">

  f

  p $~ # => #<MatchData "abc">


Matz himself has said he regrets putting these variables in the language, and some day they may disappear. We'll match behavior as much as possible, but they're a relic no matter how you slice it.


Well, it's fair to say "it works in MRI"; $1 being frame-local and thread-local is an exception to the rule, but it is How It Works. JRuby should behave similarly, full stop.


They're what MRI calls "svars", or special variables. They're thread and scope-local, and regardless of your feelings towards them, they're still used all over the place, in old and new code alike.
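To illustrate the thread-local half, here is a rough sketch of the behavior I'd expect on MRI (`svar_demo` is a made-up name, and I'm only claiming this for the simple case of a block passed straight to Thread.new):

```ruby
# A match made inside a freshly started thread should not disturb the
# $~ of the frame that started it.
def svar_demo
  "abc" =~ /(\w+)/

  in_thread = Thread.new do
    "xyz" =~ /(\w+)/
    $1                      # the thread sees its own match
  end.value

  in_main = $1              # on MRI this is still the "abc" match
  [in_thread, in_main]
end

svar_demo
```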


Your comment has little to do with JRuby 9000, as the bug reported there was observed in JRuby 1.7.20. Yes, JRuby has bugs, like MRI, like any interpreter or compiler.


The same bug still exists in JRuby 9000, and was in fact the first thing I ran into when trying it.

Considering threading is the main selling point, one of the commonest patterns of regexp use being completely and dangerously broken with them would have seemed like something of a showstopper.


"Commonest" is a big claim. I'm not a JRuby user, but I sling a whole bunch of Ruby and I've never, not once, used the global regexp stuff--indeed, I didn't know it existed before just now.

I can see wanting it fixed, but it's a pretty odd hill to die on.


I've been writing Ruby for 7 years now, and I've known about the regex svars for about as long. Never used them once. They're obviously bad style.


> one of the commonest patterns of regexp use

Is it really? Even fairly early in the Ruby 1.8.x era, most recommendations I saw were that the magic regexp globals (and many other magic globals) were a perlism that should generally be avoided.


Huh, I use `$1` all the time, and see it all the time. Probably because the alternative with an explicit match object ends up being relatively a lot more code and a lot harder to read, really.

If avoiding `$1` has been often recommended for a while... I think it's a recommendation more often ignored than followed.


I've never seen $1 used in production code.


Try grepping your lib directory sometime. In fact, let me do it for you, on a relatively clean install:

time, benchmark, irb, pry, rubygems, resolv, erb, rake, open-uri, debug, rack, cgi/util, optparse, getoptlong, bundler, activesupport, actionpack, activemodel, erubis, slim, haml, sass, sequel, roda, nokogiri, rugged, faraday, mime-types, thor, test-unit, tzinfo, mail, ffi


Ouch. That's pretty awful.


I don't think it's that common a pattern. All those perlisms make code unreadable anyway. I much prefer things like match with named captures, which generally make everything more obvious.
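As a sketch of the named-capture style (`parse_release` and the date format are made up for the example):

```ruby
# Pull fields out of a string using named groups on an explicit MatchData,
# with no reliance on $~ or $1.
def parse_release(str)
  m = /(?<year>\d{4})-(?<month>\d{2})/.match(str)
  return nil unless m
  { year: m[:year], month: m[:month] }
end

parse_release("released 2015-07")
```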


Ah, I didn't get that from your original post. So JRuby 9000 shipped with a known concurrency bug that, on the face of it, seems likely to be a problem in actual use. I agree that seems a bit worrying.


You have to ship some time. Noisy bugs get fixed...and patches are always accepted :-)


Of course it can be completely reasonable to ship with a known bug, which is why my initial reaction to the OP was: why are you bringing this up here?

However, not all bugs are made equal and this one seems relatively likely to actually cause problems for users. Wouldn't someone running some service and handling say 10 requests/second, while using the regex global variables, run into this bug on a daily basis?

So I guess I'm just interested in how such a decision is made: what bugs get shipped and which block a release?

If this bug would reasonably cause problems in such a situation and if the policy of JRuby is to ship anyway, that seems to be a relevant piece of information to consider for someone using JRuby in production and considering whether to upgrade.

Perhaps one should always check the list of open bugs for the version of a compiler/interpreter one intends to start using, but I've never done so, haven't been bitten by a bug yet (AFAIK) and yet this one seems one that could be a problem. The main problem is of course that this may just be 'the curse of knowledge' in play.

So I guess I was being a bit dismissive towards OP, then I was sympathetic and now I'm mostly thinking about how I should handle this as someone using JRuby in production. Perhaps I should just ignore it. Perhaps I should investigate the set of concurrency tests and contribute a few. Perhaps I should be conservative and only use 'proven' JRuby versions. Perhaps I should learn to stop worrying and love the bomb :)


Or perhaps you should simply shy away from global perlisms in a Ruby app, which are really bad practice anyway.


What VM doesn't ship with bugs likely to be a problem in actual use, though?


I can make my code arbitrarily fast if it doesn't have to be correct.



