version instead, but I use `=~` almost exclusively, so that would still be a big style change. I'd probably end up setting a global timeout per app and then overriding it for individual checks as needed?
Yeah the implementation they've chosen seems totally perfect to me. Sane global default, easily overridable globally or locally.
There's no easy way to override it locally when using `=~`, but I can't imagine too many cases where you'd want a local timeout anyway... you can just switch away from the `=~` syntax for those.
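For concreteness, here's a minimal sketch of the two levels being discussed, using the Ruby 3.2 API (the pathological pattern and the values are just illustrations):

```ruby
# Global default: applies to every Regexp match in the process.
Regexp.timeout = 1.0

# A per-regexp override requires constructing the Regexp explicitly,
# which is why a literal used with `=~` can't carry its own timeout.
check = Regexp.new('\A(a+)+\z', timeout: 0.05) # deliberately catastrophic pattern

begin
  check.match?("a" * 40 + "!")
rescue Regexp::TimeoutError => e
  warn "regexp gave up: #{e.message}"
end
```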
This is mostly a denial-of-service mitigation tool, something you'd just want to apply globally to avoid disasters spawned by malformed or malicious input. In practice, it's hard to imagine a use case where you'd really want to be twiddling the knobs on a regexp-by-regexp basis.
Yes, good point. I was initially thinking that it would make sense to always ask yourself "how long should this take" and tune appropriately, but for the vast majority of regexes that's overkill, especially if you're not doing anything O(n^2). Sticking a 1-second timeout in there gives you a lot of headroom, and you can just get more specific for any exceptions.
> I was initially thinking that it would make sense to always ask yourself "how long should this take" and tune appropriately, but for the vast majority of regexes that's overkill
More than being overkill, it's actually impossible, right?
The execution time will also vary greatly based on baseline CPU performance and current server load.
A regexp that takes 10ms to process right now might take 500ms tomorrow when your server is under heavy load. So we can't predict how much time each regexp "needs."
But, like you said, we can set a somewhat ridiculously high limit to help prevent regex-based oopsies or re-based DoS attacks from dragging us down =)
It does not make sense to me; the best solution is to have an implementation like RE2 that does not have those problems.
Adding a timeout is a bit strange, first because you don't know in advance how long a large search is going to take. The timeout is a failsafe against something that should be fixed in the first place.
> the best solution is to have an implementation like RE2 that does not have those problems.
By design RE2 isn't fully compatible with Onigmo. As another poster mentioned, a hybrid "use RE2 when possible; fall back to Onigmo otherwise" approach was considered and rejected for well-explained reasons https://bugs.ruby-lang.org/issues/18653
Maybe in addition to `Regexp.timeout = 1.0` there could also be a `Regexp.parser = :re2` option with `:onigmo` being the default.
I think a limit on stack/recursion/backtracking depth would be a tad more elegant than a timeout and would keep your code's behaviour the same between different machines.
It'd be harder to control the perceived performance of user-facing applications, though. If I can set the timeout, I can guarantee that something happens within X seconds, instead of within X iterations, which could perform differently from machine to machine.
No, because we’re not trying to find out if the computation will ever halt with unbounded time. It just determines if it will halt after a certain number of steps, which you can always determine by executing that number of steps and bailing out.
They could easily add another regex flag to indicate a "safe" regex. That really isn't the problem here. The problem is that adding a safe regex engine doesn't retroactively fix the thousands and thousands of regexes in gems which can't be unilaterally forced over to the new engine.
Actually, that is the problem. No, they can't 'easily' add a new flag (or they would have). It doesn't fix the problem of being broken by default, and any fundamental change in behavior will lead to problems as well.
The point is that the language is fundamentally broken here, and because it's syntax sugar there isn't anything you can realistically do. A function call can be replaced, deprecated, and then removed, and all of that can be automated and made non-disruptive over a few releases.
The grand-parent post exhibits feigned confusion and uses opinions as arguments. It also proposes an _optimal_ solution that will most likely not get implemented in place of the pragmatic one. We see this kind of argumentation when discussing crypto libraries.
The OP is not _confused_, they just don't like the solution.
It makes zero sense, if I'm completely honest. Why stop at regexps? What about every other long-running function? This is better handled a layer above, in a general fashion - imagine a server running some queries and being able to kill/discard/interrupt them if they seem stuck.
If you don’t stop long processes on a server, you expose yourself to denial of service attacks. It’s good practice to have a maximum duration for database queries as well.
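To illustrate the database side - a sketch assuming Rails with PostgreSQL; `statement_timeout` is a standard PostgreSQL setting, and the durations here are arbitrary:

```ruby
# Cap every query issued over this connection.
ActiveRecord::Base.connection.execute("SET statement_timeout = '5s'")

# Or tighten it for one risky query only; SET LOCAL is scoped to the transaction.
ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute("SET LOCAL statement_timeout = '500ms'")
  Report.where(status: "pending").to_a # `Report` is a hypothetical model
end
```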
Of course you should limit the query duration - just not at some arbitrary function level, as is done here. To me this sounds like PHP-level incompetence in API design.
Would WASM compilation help solve the Ruby / Rails timeout problem?
https://devcenter.heroku.com/articles/h12-request-timeout-in...
> H12 - Request Timeout in Ruby (MRI)
> Rack-timeout limitations
> Due to the design of the Ruby programming language’s Thread#raise API, which timing out code depends on, there is a known bug with rack-timeout that can put your app into a bad state if rack-timeout fires too frequently.
I wonder why they didn't just include an option to use a non-backtracking algorithm, like RE2's[1]. As far as I know, that would completely eliminate the possibility of catastrophic backtracking.
Wrapping RE2 with a fallback to the existing engine to try to maintain compatibility was explored; that, like the timeout approach, is pretty clearly a stopgap measure. Actually implementing an RE2-style algorithm for Ruby's existing code and functionality, without RE2's compatibility and toolchain warts, is a bigger but more permanent solution, and one that I don't think has really been ruled out or fully explored.
One notable thing is Ruby apps packaged in a single .wasm file. This may make Ruby CLI apps easier to distribute, and may eventually replace things like Docker or shipping your Ruby code to a server.
You don't just execute a .wasm file; it requires a runtime, which will JIT-compile the code into machine code and handle the (WASI) system interfaces (e.g. read, write, stat, etc.).
Yes, this is true of all interpreted languages. But consider the use case the OP was contrasting with (Docker): that includes not only the Ruby runtime but an entire Linux OS as well.
Well, vendor them and tar it up. You could make a ruby script in a few lines that would load the dependencies and then execute the app all as one file.
I'm fairly confident you could write a Ruby script that would untar a folder and then execute the entry point in 2-3 lines. Then the trick is just making this script and the tar binary into one single executable file. You could do that trivially by encoding the binary in base64, but I'm fairly sure there would be a way to embed it as straight binary in a way that the Ruby interpreter wouldn't complain about.
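Purely as a hypothetical sketch of that idea (not an existing tool): assume a gzipped tarball of the app is appended after `__END__`, with its entry point at `app/main.rb` inside the archive:

```ruby
#!/usr/bin/env ruby
require "rubygems/package"
require "zlib"
require "fileutils"
require "tmpdir"

dest = Dir.mktmpdir("app")

# DATA is an IO positioned just past __END__, i.e. at the embedded tarball.
Gem::Package::TarReader.new(Zlib::GzipReader.new(DATA)) do |tar|
  tar.each do |entry|
    path = File.join(dest, entry.full_name)
    if entry.directory?
      FileUtils.mkdir_p(path)
    else
      FileUtils.mkdir_p(File.dirname(path))
      File.binwrite(path, entry.read)
    end
  end
end

load File.join(dest, "app", "main.rb")

# Raw tar.gz bytes are appended below.
__END__
```

It's more than 2-3 lines once you handle directories, but the shape is the same: unpack, then `load` the entry point.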
You're skipping the fun part. What's in that tarball?
In order to gather the dependencies into the tarball in the first place, they need to be installed. And not all your dependencies are handled by `bundler`. In any interesting project you're going to be depending on system libraries that Rubygems has no way of telling the OS it needs. You need to `apt install <foo>` for interesting values of `foo`.
That means you've got a few choices: either you document what those OS packages are and rely on the operator knowing that they need to pay attention, or you try to automagically install them when you unpack your tarball, or you include everything you need from the build system in the artefact. This means running the application and finding all the files it touches via `dlopen` and hoping you exercised all the interesting code paths.
Option 1: you're back to installing dependencies.
Option 2: you're wishing you'd just used `dpkg` to build yourself a .deb, and you're still installing dependencies.
Option 3: now you need to figure out how to swap out OS libraries after the ruby interpreter has started, and need to get all the configuration right to point into your untarred filesystem, and you're wishing you'd just used Docker.
I was thinking the same thing - isn't Ruby particularly hard to package, since it doesn't support static compilation? It would be nice to just sidestep all of that with a hermetic little WASM distribution.
Not sure it covers what you expect, but there are tools out there to convert a ruby script into a single binary executable, like https://github.com/loureirorg/rb2exe
This looks awesome! I've already played around with pyodide and coldbrew doing the same thing for CPython. I use it for an in-memory playground [0] of an open-source desktop app I build [1]. I've been waiting for Ruby, Julia, and R support to add them in too.
That said, I am not seeing a link in here about how to actually use this code. Is there a good tutorial/example somewhere?
What I can see here with WASM is that in the near future the Ruby community could build its own frontend frameworks, so you could develop your app using 98% Ruby.
There were attempts in the past to use Ruby on the frontend by compiling it to JS (opalrb), so I'm looking forward to seeing how the Ruby community exploits WASM.
While I don't know how good it will be, the possibilities WASM brings to the table are still interesting: no longer being forced to use one language for the browser - use whatever language you like!
WASI doesn't support sockets yet, and in any case there's no browser API available to implement socket-level networking. Practical use of this would require bridging over to JS to implement HTTP requests and the like.
What would really excite me is adding WASM support for native extensions. Installing gems in our CI for arm64/M1 takes about 40 times longer than the actual build.
Regarding WASM, what are the benefits to users? Is it that html/js/css loads 2-10 times faster?
Also, what's the impact on a typical rails/ruby dev? Do they have to learn anything new to enjoy improvements WASM brings, or will all the changes be 'under the hood' (i.e. in ruby and/or rails)?
In the browser, applications can run compute-expensive tasks faster with WASM. For example, decompression of compressed 3D models (https://google.github.io/draco/). AFAIK Figma uses WASM for some expensive calculations. If there is already C/C++ code, it is especially useful. However, as JavaScript is already really fast, for most use cases the overhead doesn't pay off. WASM in the browser did not take off as expected by many.
WASM on the server is interesting for companies providing compute services like edge computing or traditional FaaS like AWS Lambda. As WASM has very good security features (sandboxing), it is much cheaper to provide a secure edge compute service for WASM only than one that supports Java, .NET, Node, Python, Go, ... However, neither the end user nor the developer is the winner here. For most languages WASM is slower. For developers it is obviously easier to hand just their .NET code to the cloud provider than to compile it to WASM. Most (if not all) cloud providers also support Node.js, so if you need to develop an edge function I'd highly suggest just writing it in TypeScript.
So, WASM can help cloud providers make even more money, and it can help very demanding web applications. Otherwise, it's a lot of hype...
Sounds useful for applications that have heavy computations client side, and for those which need better security when using Lambdas (or similar), but otherwise not particularly significant for users/devs that don't have those problems.
I've been very attracted to learning Ruby a couple of times, being exhausted by the JS ecosystem. Everybody who's used it seems to fall in love with it, but I can't get over just how slow it is... It takes a fresh installation of Discourse over 10 minutes to start up again on a small underpowered VM, and it uses 10x as much RAM as an alternative platform such as Flarum.
I'm one of those people that fell in love with Ruby, and yeah the speed is the biggest downside. That said, a lot of the bulk is often Rails. I usually use Sinatra now and it's pretty light. On the smallest VM it usually starts quickly and runs fine for quite a while. One even survived an HN Hug. There are also some big improvements coming with Ruby 3 (if you aren't already upgraded to that) and more to come. But you definitely "pay" a fee in CPU/memory for the privilege of using Ruby. In most cases, it's way worth it IMHO. I've also been loving Elixir lately. It's got much the same feeling of beauty that Ruby does, and it's much lighter and lightning fast. I often measure response times in microseconds rather than milliseconds!
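For a sense of scale, this is essentially an entire classic-style Sinatra service (a generic example, not the poster's code; assumes the sinatra gem plus a server such as puma is installed):

```ruby
# app.rb - run with `ruby app.rb`
require "sinatra"

get "/" do
  "hello from a tiny Ruby service"
end
```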
Expecting Discourse to give you an idea of a typical Ruby/Rails app will be very misleading. I've been working on Rails apps since 2008 and I've never had anything take more than ~10 *seconds* to boot up. For me, Ruby hasn't been slow for years. I work on both Rails and Node-based apps on a regular basis and the two are roughly comparable IMHO in terms of daily DX (perf-wise).
It’s funny, I’ve been writing Ruby code since Rails 1 and over the past few months I’ve been learning and using Node and Typescript and I find it incredibly productive compared to Rails. I get that Rails comes with a lot of great stuff out of the box, but the benefits of static typing for actually writing a lot of code are immense.
I've been doing ruby for the last year or so coming from statically typed languages and yes... 95% of our bugs in prod would not have happened if we had a good typing system.
It doesn't compile Ruby code to wasm code; it compiles the Ruby interpreter to wasm, so it'll be roughly the same performance as the Ruby interpreter on Windows or Linux.
Also Rails is plenty quick these days, tons of people running it at massive scale.
The best case WASM performance is roughly 20-50% slower than native code, depending on the runtime and the type of code executed.
In the browser you have to also factor in the warmup time.
I'd imagine an interpreter will suffer a lot because certain C tricks like computed goto don't work directly. (This will hopefully be improved by future Wasm proposals)
(Note: that's still plenty fast enough for most use cases, and performance will improve)
This is by far my favorite tech talk of all time. It goes into why WASM can run faster than native code in many contexts, the reason being that it gets around the overhead of OS security rings.
That’s not the right conclusion to take from that video.
WASM isn't faster than native code. It's that an operating system written from the ground up to use a language VM (for instance, wasm) to implement all memory protection on the system (and, most importantly, no other memory protection, including the page tables or processor-level isolation that normally separates kernel code from user space) may end up being more performant than what we have now.
Running wasm on a standard OS kernel like Linux/Windows/Darwin is not going to give you the benefits. The benefits come from eliminating the system call overhead associated with switching in and out of the kernel protection context, which is something you need if you're executing raw machine code that can load/store any memory address. If you simply eliminate the ability to run arbitrary machine code, you can just let everything run in kernel space and use the language VM (like wasm) to protect memory. The result may or may not be faster overall, but it's purely hypothetical today because essentially no operating system works this way. (Microsoft wrote a research OS back in the '00s to try this out, but it ultimately didn't turn into a shipping product. There may be other research OSes out there that toy with this idea, but nothing in production... maybe the old Symbolics Lisp machines worked on this principle, but I'm not sure. There may have been some similar machines in the Smalltalk days as well.)
> Microsoft wrote a research OS back in the ‘00s to try this out but it ultimately didn’t turn into a shipping product.
They called it SIPs (software isolated processes). You can see the benefit of WASM for this problem, though. It has gained significant traction, and it is significantly simpler than CLR. I really hope something comes of it.
WASM requires an interpreter which must be native.
The argument is that this interpreter can be smarter about what crosses OS security rings. But those same improvements could be done in the native compiler or interpreter.
The next argument could be that many things using the WASM target would focus more effort on improving it so all WASM targets benefit outpacing their individual optimizations.
This one is harder to dismiss outright, but instead of optimizing for machine code you are now optimizing your WASM output.
Also this intermediate byte code representation already exists for both LLVM and JVM, which many languages target.
It is difficult to see WASM magically improving performance at all and especially not dramatically enough to encourage people to switch to it for that reasoning.
This talk is about asm.js, which is a precursor technology to wasm; the parent's logic seems to be "wasm is an improvement on asm.js". I have no idea if the kernel-isolation benefits the Gary Bernhardt talk is about apply.
While the parent does seem to be treading into Poe's law territory, it's not entirely correct to dismiss that talk's relationship to wasm based on the dates you're quoting.
Bernhardt explicitly mentions asm.js in the talk, which is the precursor to wasm (it's even mentioned in the Wikipedia article you skimmed a bit too quickly). asm.js was released in February 2013.
I'm surprised HN has such a short memory, but the impetus for that talk was a clearly disturbing trend at the time implying that everything should be done in javascript. Node.js was gaining rapid popularity, people were discussing javascript as the new C for using as the language to write example code in, and while things like asm.js were exciting, they seemed to point towards the hilariously nightmarish future Bernhardt is discussing there.
asm.js was first mentioned in 2013. asm.js was eventually superseded by wasm and is pretty much the beginning of wasm as we know it. Didn't watch the talk, but could asm.js be the thing the presenter was talking about?
Yes, but they're not portable/interoperable in the way that a WASM version would be -- which is why the WASM version is exciting, right?
(Somebody correct me if I'm wrong; I know what WASM is but I'm not sure how it's employed in practice outside of in-browser tech demos of games and things)
Does the WASM version even support JIT? As far as I know, it's conceptually not possible in WASM, since the runtime is essentially a Harvard machine (code isn't addressable as data).
What I mean is doing JIT compilation inside WASM, not a WASM implementation running WASM programs using JIT (which is definitely possible and being done, as far as I know).
I'm not sure how the former would work in a pure Harvard-style architecture, where it's not possible to reference code as data (and consequently also not possible to emit new code).
What would be interesting to JIT compile in the context of Ruby isn't the Ruby interpreter itself (both AOT and JIT should be fine for that), but the Ruby programs it would be running.
Not really. Opal is a source-to-source compiler that compiles Ruby to JavaScript. Ruby 3.2 compiles the whole Ruby VM and runtime to wasm, which then runs Ruby inside a real Ruby VM nested within the JS VM.
A good analogy is that Opal is like PureScript, whereas Ruby 3.2 is like GHCJS.
The syntax feels a little rough, although I have no idea how to make it better: