A Case for Safe Eval (github.com/robert-j-webb)
58 points by haburka on June 2, 2018 | 42 comments

Rather than trying to sanitize the input to 'eval', why not just write a separate expression evaluator? Then you can know for sure that there's no way to craft an input that causes an arbitrary function call, because your evaluator has no way to make an arbitrary function call. Also, you can define whatever operators you like.
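A minimal sketch of that approach follows. It assumes the only things users need are numbers, `+ - * /`, unary minus and parentheses. `tokenize` and `evaluateArithmetic` are hypothetical names, not taken from the linked repo. Because the evaluator can only add, subtract, multiply and divide, no input can reach a function call:

```javascript
// Lexer: numbers, the four operators and parentheses; anything else is an error.
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (/\s/.test(ch)) { i++; continue; }
    if ("+-*/()".includes(ch)) { tokens.push(ch); i++; continue; }
    const m = /^\d+(\.\d+)?/.exec(input.slice(i));
    if (m) { tokens.push(Number(m[0])); i += m[0].length; continue; }
    throw new SyntaxError(`Unexpected character '${ch}' at ${i}`);
  }
  return tokens;
}

// Recursive-descent parser/evaluator for the (assumed) grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '-' factor | '(' expr ')'
function evaluateArithmetic(input) {
  const tokens = tokenize(input);
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function expr() {
    let value = term();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const rhs = term();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  function term() {
    let value = factor();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const rhs = factor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  function factor() {
    const tok = next();
    if (typeof tok === "number") return tok;
    if (tok === "-") return -factor();
    if (tok === "(") {
      const value = expr();
      if (next() !== ")") throw new SyntaxError("Expected ')'");
      return value;
    }
    throw new SyntaxError(`Unexpected token ${tok}`);
  }

  const result = expr();
  // Full recognition: leftover tokens mean the input was not a single expression.
  if (pos !== tokens.length) throw new SyntaxError(`Unexpected token ${peek()}`);
  return result;
}
```

For example, `evaluateArithmetic("(1 + 2) * 3")` returns `9`, while `"(() => 5)()"` is rejected at the `=` with a `SyntaxError`. Supporting a new operator means adding one branch here, rather than re-auditing a regex.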

This use case doesn't seem to me to be one that's likely to arise often.

Why stop at Perl? DSLs and limited environments are almost as old as Lisp.

Nobody said it was new.

If you had an evaluator that collected all the mutations to global state into a batched commit, à la software transactional memory, imagine all the fun games you could play.
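Here is a much-reduced sketch of that idea. It assumes the evaluated code only ever sees a wrapped global object. Writes go into a pending log, reads check the log first, and nothing reaches the real object until `commit()`. This is write buffering only, with no conflict detection and no tracking of deletes or nested objects, so it is not real STM. `transactional` is a hypothetical name:

```javascript
// Wrap `target` so that assignments are buffered instead of applied.
function transactional(target) {
  const pending = new Map();
  const proxy = new Proxy(target, {
    get(obj, key, receiver) {
      // Reads see buffered writes first, so code observes its own changes.
      return pending.has(key) ? pending.get(key) : Reflect.get(obj, key, receiver);
    },
    set(obj, key, value) {
      // Record the write; the real object is untouched for now.
      pending.set(key, value);
      return true;
    },
  });
  return {
    proxy,
    // Apply every buffered write to the real object in one batch.
    commit() {
      for (const [key, value] of pending) target[key] = value;
      pending.clear();
    },
    // Throw the buffered writes away.
    rollback() {
      pending.clear();
    },
  };
}

const state = { score: 1 };
const tx = transactional(state);
tx.proxy.score = tx.proxy.score + 10;
console.log(tx.proxy.score, state.score); // 11 1
tx.commit();
console.log(state.score); // 11
```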

> Additionally, (() => 5)() doesn't work because we don't allow open and close parens next to each other.

Not really, I can write

    '((/**/) => 5)(/**/)'.replace(/(\(\s*\)|[^0-9.()+\-*\/><=!&|?:])+/g, '')
    > "((/**/)=>5)(/**/)"
    > 5

Using eval on data that are provided by a user is always a bad idea. You cannot be sure that your sanitiser will be safe against new syntax elements.

Yup, this does indeed work.

> Using eval on data that are provided by a user is always a bad idea.

True. Most bugs with security impact seem to happen in the input handling layer, where user input is parsed into data structures. One of the most fundamental problems in today's networked world is how software must be able to safely communicate with virtually any other computer in the world. Servers are configured to indiscriminately accept connections from anyone who tries to connect; no concept of trust is involved.

Or simply use the replacer to generate what you want - include illegal characters like letters, and once they get removed, you get what you want: `((x)=>5)(x)` becomes `(()=>5)()` once the x is removed.
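The trick can be reproduced directly. This sketch reuses the regex from the snippet quoted earlier in the thread, on the assumption that it is the one the article uses. `strip` is a hypothetical helper name:

```javascript
// The sanitizing regex from the snippet earlier in the thread: delete "()" pairs
// (with optional whitespace inside) and every character outside the allowed set.
const SANITIZER = /(\(\s*\)|[^0-9.()+\-*\/><=!&|?:])+/g;

// Hypothetical helper: a single strip pass, as the sanitizer performs it.
function strip(input) {
  return input.replace(SANITIZER, "");
}

const stripped = strip("((x)=>5)(x)");
console.log(stripped); // (()=>5)()

// The pass deleted the "illegal" letters, which left behind exactly the "()"
// pairs it was meant to block; it never re-checks its own output.
console.log(new Function(`return ${stripped};`)()); // 5
```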

I don't see a way to exploit it, but that means very little (as shown by the fact that the author missed this one).

> include illegal characters like letters, and once they get removed

Removing characters you know to be illegal and then trying to run it anyway is asking for trouble. If it contains illegal characters, it should fail immediately.

If the author is rejecting ‘()’, they should probably also reject ‘/*’ and ‘//’.
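Here is a sketch of the fail-fast alternative suggested above: validate the input and reject it, rather than deleting characters and running whatever is left. It keeps the thread's allowed character set plus plain spaces, and adds the `/*` and `//` checks. `validateExpression` is a hypothetical name. This is still a deny-list of known tricks, so it narrows the holes discussed here without proving that none remain:

```javascript
// Allowed: the thread's character set plus ordinary spaces (assumed harmless here).
const ALLOWED = /^[0-9.()+\-*\/><=!&|?: ]*$/;

// Hypothetical fail-fast check: throw instead of silently deleting characters.
function validateExpression(input) {
  if (!ALLOWED.test(input)) {
    throw new SyntaxError("Input contains characters outside the allowed set");
  }
  // Empty parentheses, as used by zero-argument calls and arrow functions.
  if (/\( *\)/.test(input)) {
    throw new SyntaxError("Empty parentheses are not allowed");
  }
  // Comments can hide the inside of "()" from the check above.
  if (input.includes("/*") || input.includes("//")) {
    throw new SyntaxError("Comments are not allowed");
  }
  return input;
}
```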

> This is very quickly becoming a huge pain for me to write. I have to write a compiler, and a lexer. I have to continue extending this for each new kind of operator added.

Counterpoint: good luck getting that magical regex to scale to many new operators. Or other data types, for that matter. What if, for instance, you wanted to add support for strings? Suddenly, the entire original approach is useless and building a tiny compiler is the best and safest option.

It's also the case that, as olegbulatov demonstrated, you're fighting an uphill battle trying to sanitize code with a regex. There's really no reasonable way to prove with 100% certainty that a regexp blocks _all possible expressions_ that could do nasty things. Writing a tiny compiler might be a pain in the ass, but you can be very sure of the safety it offers... and it doesn't need to use eval().

Yup. Sanitizing arbitrary expressions is even harder than sanitizing SQL expressions. After years of seeing how regex-based SQL injection prevention doesn't work, we shouldn't apply the same approach here.

This seems like just another form of string trepanation: https://thoughtstreams.io/glyph/string-trepanation/

You're trying to get the evil out of the string, at which point the remainder of the string is good. That's not how strings work. Even if it happens to work now, who knows what syntax will show up next year (and there are plenty of examples on this page of working around the regex). You're in a constant battle of figuring out whether bytes are good or evil, when the actual answer is that they're neither.

Full recognition before processing! http://langsec.org/occupy/

In the E family, cousins to ECMAScript, the eval() function is confinement-safe and cannot access objects which it hasn't been given explicit access to use. The proof actually can be extended to any lambda calculus which doesn't have global mutable state.

Edit: I should add that the crux of the technique involves a two-argument eval() with an explicit environment, as opposed to the single-argument eval() of ECMAScript.
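The same idea can be sketched in JavaScript. This is not E's actual `eval`, only an illustration over a hypothetical AST format. The evaluator's only way to resolve a name is to look it up in the environment the caller passes in, so the expression can reach nothing it was not explicitly given:

```javascript
// Hypothetical AST shapes for this sketch:
//   { type: "num", value }            numeric literal
//   { type: "var", name }             variable reference
//   { type: "bin", op, left, right }  binary arithmetic
function evaluate(node, env) {
  switch (node.type) {
    case "num":
      return node.value;
    case "var":
      // Own properties only, so inherited names like "constructor" are not reachable.
      if (!Object.prototype.hasOwnProperty.call(env, node.name)) {
        throw new ReferenceError(`${node.name} is not in the environment`);
      }
      return env[node.name];
    case "bin": {
      const a = evaluate(node.left, env);
      const b = evaluate(node.right, env);
      switch (node.op) {
        case "+": return a + b;
        case "-": return a - b;
        case "*": return a * b;
        case "/": return a / b;
        default: throw new SyntaxError(`Unknown operator ${node.op}`);
      }
    }
    default:
      throw new SyntaxError(`Unknown node type ${node.type}`);
  }
}
```

With `env = { price: 2 }`, the tree for `price * 3` evaluates to `6`. A reference to `globalThis` throws, because the only names in scope are the ones the caller chose to pass.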

Right. Access to the environment is the dangerous part, not the eval function. There's a proposal to tame the JS global environment along those lines: https://github.com/tc39/proposal-frozen-realms

Note however that the threat model here does not include denial of service. You have to address that at another level like OS processes or E vats.

I can't believe nobody else pointed out JSFuck.

Try the converter at http://www.jsfuck.com/ -- or browse the esolang entry: https://esolangs.org/wiki/JSFuck

Spoiler: it can do everything normal JS can. Just because it uses 20,000 characters to do a simple if statement doesn't mean your eval is 'safe'.
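A few lines show how JSFuck builds strings out of only `[]()!+` using JavaScript's type coercions. This illustrates the general trick, not the site's converter:

```javascript
// ![] is false (arrays are truthy), false + [] is the string "false",
// and +[] is 0, so this picks out "false"[0].
const f = (![] + [])[+[]];
// !![] is true, so the same indexing picks out "true"[0].
const t = (!![] + [])[+[]];
// !+[] is true, true + true is 2, so this picks out "false"[2].
const l = (![] + [])[!+[] + !+[]];
console.log(f + t + l); // ftl
```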

So we restrict it to a subset that you can’t figure out how to exploit? How reassuring.

Haha... exactly. Seems questionable to me.

This idea is neat, and the article does a good job of highlighting some of the potential evil and potential good of eval. However, as many other commenters have pointed out, it's far from a complete or trustable solution--and even if it's good enough for now, it may not be good enough for the JS of tomorrow (this language iterates bloody fast).

Far better would be a capability-based sandbox API in the browser (like the Node.js one: https://nodejs.org/api/vm.html#vm_vm_createcontext_sandbox_o...) that also supports filters like "stack depth" to limit infinite recursion, and/or some sort of opcount to limit other nonterminating functions. Combine those two things, and you have at least a way to secure arbitrary code you want to eval.
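For reference, here is roughly what the linked Node.js API looks like with an execution limit. There are two caveats. First, Node's own documentation says the `vm` module is not a security mechanism and should not be used to run untrusted code. Second, its `timeout` option only bounds run time, not stack depth or operation count. `runLimited` and its defaults are assumptions for this sketch:

```javascript
const vm = require("vm");

// Hypothetical helper: evaluate `code` with only the globals in `sandbox`
// (plus JavaScript built-ins) and abort it after `timeoutMs` milliseconds.
function runLimited(code, sandbox = {}, timeoutMs = 50) {
  const context = vm.createContext({ ...sandbox });
  return vm.runInContext(code, context, { timeout: timeoutMs });
}

console.log(runLimited("x * 2 + 1", { x: 20 })); // 41
console.log(runLimited("typeof require", {}));    // undefined

try {
  runLimited("while (true) {}");
} catch (err) {
  // The infinite loop is interrupted once the timeout expires.
  console.log("stopped:", err.message);
}
```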

I think that most people with a real/non-insane need for such a service would be willing to pay a fair amount of performance (e.g. introducing an Erlang-esque reduction count idea to code inside the sandbox) to get it working safely.
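Here is a toy illustration of the reduction-count idea. It is loosely modelled on Erlang's reductions rather than reproducing them: it aborts when the budget runs out instead of preempting as Erlang's scheduler does. The tiny interpreter runs a hypothetical four-instruction stack machine and charges one reduction per instruction, so a runaway program stops whatever it is doing:

```javascript
// Hypothetical instruction set for this sketch:
//   ["push", n]  push the number n
//   ["add"]      pop two numbers and push their sum
//   ["jmp", i]   continue at instruction i
//   ["halt"]     stop and return the top of the stack
function runWithBudget(program, maxReductions) {
  const stack = [];
  let pc = 0;
  let reductions = 0;
  for (;;) {
    // Charge one reduction per instruction before executing it.
    reductions += 1;
    if (reductions > maxReductions) {
      throw new RangeError(`Reduction budget of ${maxReductions} exhausted`);
    }
    // Running off the end of the program behaves like "halt".
    const [op, arg] = program[pc] ?? ["halt"];
    switch (op) {
      case "push": stack.push(arg); pc += 1; break;
      case "add": stack.push(stack.pop() + stack.pop()); pc += 1; break;
      case "jmp": pc = arg; break;
      case "halt": return stack[stack.length - 1];
      default: throw new SyntaxError(`Unknown instruction: ${op}`);
    }
  }
}

console.log(runWithBudget([["push", 2], ["push", 3], ["add"], ["halt"]], 10)); // 5
try {
  runWithBudget([["jmp", 0]], 1000); // an infinite loop
} catch (err) {
  console.log(err.message); // Reduction budget of 1000 exhausted
}
```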

Mind you, even if that existed it'd be hard to use correctly; people would try to safe-eval code, fail, and open up over-promiscuous capabilities (just like redis: http://antirez.com/news/118), putting us back at square one. For many purposes, though, it would be a very interesting and powerful tool when wielded carefully.

While fun, I do hope this is a joke, since, as other comments have said, it’s fairly straightforward to make a DSL for the problems this could solve. Beyond that, this would never be safe from syntax changes in new versions of ECMAScript, and sadly it would cause issues similar to the recent smoosh controversy (where a common library implemented `flatten` in a way that was incompatible with the new ES specification).

Title should be:

> A Case for Safe Eval

The author's argument is that there is at least one valid use case for eval, not that it belongs in all web projects.

Also, "in JavaScript" can be appended to make clear that this is specific to that programming language.

"Using JavaScript's eval function to evaluate limited arithmetic expressions safely"

At this point, why not just implement a restricted interpreter in JS with a simple language? As soon as you do anything non-trivial, you're going to want variables and function calls.

This is similar to what Tcl does for Safe Interpreters.


I see why it doesn’t allow their example of “(() => 5)()”

But what is preventing me from doing “((a) => 5)(0)”, since the regex only requires at least one non-space character between parens...

Or any function call really?

`(a)` isn't allowed because letters aren't allowed, and you can't put a number in the argument location (e.g. `((1) => 2)(3)` doesn't parse as valid JS).
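This is easy to check without running anything. `new Function` compiles its body and throws a `SyntaxError` if it does not parse, but it does not execute that body. `parses` is a hypothetical helper:

```javascript
// Returns true if `source` parses as a JavaScript expression.
// new Function only compiles here; the resulting function is never called.
function parses(source) {
  try {
    new Function(`return ${source};`);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parses("((1) => 2)(3)")); // false: 1 is not a valid parameter name
console.log(parses("((a) => 5)(0)")); // true, but the filter strips the letter "a"
```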

A commenter above offered


HN formatting is messing this up. It should have two asterisks between each "//"

indent 2+ spaces for code-like formatting :)

  *plenty of asterisks*
Also simplifies ASCII art since it's monospaced:

        |\__/,|   (`\
      _.|o o  |_   ) )

Maybe the author of this could provide a web page that allows anybody to enter "safe" JavaScript.

Provide a $10 prize for setting a cookie to value xyz, or for calling function zyx().

Repeat until safe. (Makes me think that bug bounties really are useful, but not that useful, for security.)

Need to raise the bounty quite a lot to make white hat more attractive than black hat.

Yet lots of people do security CTF competitions (https://ctftime.org/ctf-wtf/) for free. Offer less instead!

I agree that safe eval and safe JSON parsing should be higher-class citizens in the JS world. Just curious, is this (one of the reasons) why companies like Salesforce introduce an alternate language such as Visualforce, to have more control over safety? It would be nice not to have to force users to learn a new language to enable dynamic runtime features. I feel that it’s near impossible to outsmart all bad actors, but it would be nice to enable full user creativity by allowing access to an existing language.

If you really wanted to do this, you could avoid a whole bunch of badness by spawning a worker, clearing out all the timer and networking APIs, and just evaluating the script there.

Infinite loops etc. could still happen, but they would not significantly affect your page, and the worker gets killed when the user navigates.
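Here is a browser-only sketch of that suggestion. It assumes a Blob-URL worker and a one-second kill timer; `buildWorkerSource`, `evalInWorker` and the list of blanked globals are all assumptions. Blanking globals is best effort, and the worker still runs with the page's origin, so this contains crashes and runaway loops more than it contains malicious code:

```javascript
// Hypothetical source for a throwaway worker: blank out a few timer and
// network globals (best effort only; other APIs and paths remain), then
// evaluate whatever expression arrives and post back the result.
function buildWorkerSource() {
  return `
    const blocked = ["setTimeout", "setInterval", "fetch",
                     "XMLHttpRequest", "WebSocket", "importScripts"];
    for (const name of blocked) {
      try { self[name] = undefined; } catch (e) { /* leave it if not writable */ }
    }
    self.onmessage = (event) => {
      let reply;
      try {
        // Indirect eval: runs in the worker's global scope, not this handler's.
        reply = { ok: true, value: String((0, eval)(event.data)) };
      } catch (err) {
        reply = { ok: false, error: String(err) };
      }
      self.postMessage(reply);
    };
  `;
}

// Browser-only: evaluate one expression in a fresh worker and kill it after
// timeoutMs, so an infinite loop cannot outlive the request.
function evalInWorker(expression, timeoutMs = 1000) {
  const url = URL.createObjectURL(new Blob([buildWorkerSource()], { type: "text/javascript" }));
  const worker = new Worker(url);
  const cleanUp = () => {
    worker.terminate();
    URL.revokeObjectURL(url);
  };
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      cleanUp();
      reject(new Error("Timed out"));
    }, timeoutMs);
    worker.onmessage = (event) => {
      clearTimeout(timer);
      cleanUp();
      if (event.data.ok) resolve(event.data.value);
      else reject(new Error(event.data.error));
    };
    worker.postMessage(expression);
  });
}
```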

As far as their regex goes, I’m not sure it actually prevents the wonders of []-based programming.

Found it again: jsfuck.com does the translation automatically. I have not tested whether it’s doable yet.
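One thing can be checked quickly: the single pass quoted earlier in the thread, assuming that regex is the one the article uses. Its allowed set does not include `[` or `]`, so they are simply deleted. This says nothing about other encodings, and the strip-then-run design still has the problems discussed above. `stripOnce` is a hypothetical helper:

```javascript
// Sanitizing regex as quoted earlier in the thread.
const SANITIZER = /(\(\s*\)|[^0-9.()+\-*\/><=!&|?:])+/g;

// Hypothetical helper: one strip pass.
function stripOnce(input) {
  return input.replace(SANITIZER, "");
}

// JSFuck for "f" loses every bracket, leaving text that no longer parses.
console.log(stripOnce("(![]+[])[+[]]")); // (!+)+
```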

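Python has ast.literal_eval built into the standard library. It uses Python's own parser to build a syntax tree and then evaluates only literal nodes (numbers, strings, tuples, lists, dicts, sets, booleans and None), so you don't have to write your own lexer and parser and so on. It only accepts literals, though, not general arithmetic like `2 * 3`, and it's somewhat slower, since the tree walk is written in pure Python.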

Wouldn't doing `eval` in a service worker provide an even greater level of safety? Might even be enough to drop some of the character restrictions, if the use case was broader than the one here.

An answer on StackOverflow https://stackoverflow.com/a/26488003 claims that web workers count as same origin, but you can totally solve this use case with <iframe sandbox>. (And other answers say that sites like JSFiddle do exactly this.)

There's an example of using this for untrusted eval in the linked HTML5Rocks post: https://www.html5rocks.com/en/tutorials/security/sandboxed-i... The amount of code involved is pretty minimal and doesn't involve opaque regexes, so something like that is probably the actual best option for someone who wants safe eval.
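Here is a browser-only sketch of the `<iframe sandbox>` approach. It loosely follows the pattern the linked posts describe but is not copied from them. The frame gets `allow-scripts` and deliberately not `allow-same-origin`, and it talks to the page only through `postMessage`. The function names and message format are assumptions:

```javascript
// Hypothetical document for the sandboxed frame: evaluate each message and reply.
function buildSandboxDocument() {
  return `<!doctype html><script>
    window.addEventListener("message", (event) => {
      let reply;
      try {
        reply = { ok: true, value: String((0, eval)(event.data)) };
      } catch (err) {
        reply = { ok: false, error: String(err) };
      }
      // The parent's origin is not known here; "*" keeps the sketch simple.
      event.source.postMessage(reply, "*");
    });
  <\/script>`;
}

// Browser-only: a hidden frame with sandbox="allow-scripts" and no
// allow-same-origin, so its content gets an opaque origin and cannot reach
// the parent page's DOM, cookies or storage.
function createSandbox() {
  return new Promise((resolve) => {
    const iframe = document.createElement("iframe");
    iframe.setAttribute("sandbox", "allow-scripts");
    iframe.style.display = "none";
    iframe.onload = () => resolve(iframe);
    iframe.srcdoc = buildSandboxDocument();
    document.body.appendChild(iframe);
  });
}

// Send one expression to the frame and wait for its reply.
function evalInSandbox(iframe, expression) {
  return new Promise((resolve, reject) => {
    function onMessage(event) {
      if (event.source !== iframe.contentWindow) return; // ignore other windows
      window.removeEventListener("message", onMessage);
      if (event.data.ok) resolve(event.data.value);
      else reject(new Error(event.data.error));
    }
    window.addEventListener("message", onMessage);
    // The frame's origin is opaque ("null"), so "*" is the only usable target.
    iframe.contentWindow.postMessage(expression, "*");
  });
}
```

In a page, `createSandbox().then((frame) => evalInSandbox(frame, "6 * 7")).then(console.log)` should log `42`.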

That StackOverflow user also has their own library that does this using a sandboxed iframe and a web worker: https://github.com/asvd/jailed

This is what I do for a "learn to code" game I'm building. The sandboxed Web worker is served with CSP headers and only interacts with the main part of the page through worker messages.
