Rather than trying to sanitize the input to 'eval', why not just write a separate expression evaluator? Then you can know for sure that there's no way to craft an input that causes an arbitrary function call, because your evaluator has no way to make an arbitrary function call. Also, you can define whatever operators you like.
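To make the idea concrete, here is a minimal sketch of such an evaluator. It assumes the only things you need are numbers, `+ - * /`, unary minus and parentheses, and the function name `evaluate` is just for illustration. It is a plain recursive-descent parser that computes values as it goes. Anything it does not recognize is a syntax error, and there is no code path that calls `eval` or a user-chosen function.

```javascript
// A tiny arithmetic evaluator: numbers, + - * /, unary minus, parentheses.
// It never calls eval or Function, so input cannot reach arbitrary code.
function evaluate(src) {
  let pos = 0;

  function skipSpaces() {
    while (pos < src.length && /\s/.test(src[pos])) pos++;
  }

  // Look at the next non-space character (undefined at end of input).
  function peek() {
    skipSpaces();
    return src[pos];
  }

  function expect(ch) {
    if (peek() !== ch) throw new SyntaxError(`Expected '${ch}' at ${pos}`);
    pos++;
  }

  // number := digits ('.' digits)?
  function parseNumber() {
    skipSpaces();
    const match = /^\d+(\.\d+)?/.exec(src.slice(pos));
    if (!match) throw new SyntaxError(`Unexpected input at ${pos}`);
    pos += match[0].length;
    return Number(match[0]);
  }

  // factor := '-' factor | '(' expr ')' | number
  function parseFactor() {
    const ch = peek();
    if (ch === '-') { pos++; return -parseFactor(); }
    if (ch === '(') { pos++; const v = parseExpr(); expect(')'); return v; }
    return parseNumber();
  }

  // term := factor (('*' | '/') factor)*
  function parseTerm() {
    let value = parseFactor();
    for (let op = peek(); op === '*' || op === '/'; op = peek()) {
      pos++;
      const rhs = parseFactor();
      value = op === '*' ? value * rhs : value / rhs;
    }
    return value;
  }

  // expr := term (('+' | '-') term)*
  function parseExpr() {
    let value = parseTerm();
    for (let op = peek(); op === '+' || op === '-'; op = peek()) {
      pos++;
      const rhs = parseTerm();
      value = op === '+' ? value + rhs : value - rhs;
    }
    return value;
  }

  const result = parseExpr();
  if (peek() !== undefined) throw new SyntaxError(`Trailing input at ${pos}`);
  return result;
}

console.log(evaluate('2 * (3 + 4)')); // 14
```

Supporting a new operator means adding a branch to the grammar rather than re-auditing a regex.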
This use case doesn't seem to me to be one that's likely to arise often.
If you had an evaluator that collected all mutations to global state into a single batched commit, à la software transactional memory, imagine all the fun games you could play.
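A rough sketch of the batching half of that idea, assuming the evaluator is handed a `Proxy` as its environment instead of the real state object. Writes are staged in a `Map` and only applied on `commit()`; `rollback()` drops them. The name `transactional` is hypothetical, and this is only loosely STM-like: there is no concurrency handling or conflict detection.

```javascript
// Hypothetical sketch: an environment whose writes are buffered and only
// applied to the real state on commit(), loosely in the spirit of STM.
function transactional(state) {
  const pending = new Map(); // property key -> staged value

  const view = new Proxy(state, {
    get(target, key) {
      // Reads see staged writes first, then the committed state.
      return pending.has(key) ? pending.get(key) : target[key];
    },
    set(target, key, value) {
      pending.set(key, value); // stage the write; the target is untouched
      return true;
    },
  });

  return {
    view, // hand this to the evaluator instead of `state`
    commit() {
      for (const [key, value] of pending) state[key] = value;
      pending.clear();
    },
    rollback() {
      pending.clear();
    },
  };
}
```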
> Using eval on data that are provided by a user is always a bad idea.
True. Most bugs with security impact seem to happen in the input handling layer, where user input is parsed into data structures. One of the most fundamental problems in today's networked world is how software must be able to safely communicate with virtually any other computer in the world. Servers are configured to indiscriminately accept connections from anyone who tries to connect; no concept of trust is involved.
Or simply use the replacer to build what you want: include characters it considers illegal, like letters, and once they are removed, what's left is the code you intended. For example, `((x)=>5)(x)` becomes `(()=>5)()` once the x is removed.
I don't see a way to exploit it, but that means very little (as shown by the fact that the author missed this one).
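A hypothetical reconstruction of that bypass. The letter-stripping regex below stands in for whatever the article actually uses; the point is that the removal step itself produces a working arrow-function call.

```javascript
// Hypothetical sanitizer in the style being criticized: strip every letter,
// then hand whatever is left to eval. (Not the article's actual regex.)
function stripLettersThenEval(input) {
  const stripped = input.replace(/[a-z]/gi, '');
  return { stripped, result: eval(stripped) };
}

const { stripped, result } = stripLettersThenEval('((x)=>5)(x)');
console.log(stripped); // (()=>5)()
console.log(result);   // 5
```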
> include illegal characters like letters, and once they get removed
Removing characters you know to be illegal and then trying to run it anyway is asking for trouble. If it contains illegal characters, it should fail immediately.
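A sketch of the reject-don't-repair approach, assuming a hypothetical arithmetic alphabet `ALLOWED`. Passing the check says nothing about whether the text is safe to `eval`; it only guarantees that bad input is rejected rather than silently rewritten into something else.

```javascript
// Whitelist check that rejects instead of repairing: if anything outside the
// allowed alphabet appears, fail immediately rather than stripping it.
const ALLOWED = /^[0-9+\-*\/().\s]*$/; // hypothetical alphabet for arithmetic

function assertOnlyAllowedChars(input) {
  if (!ALLOWED.test(input)) {
    throw new TypeError('Input contains disallowed characters');
  }
  return input; // unchanged: nothing is ever removed
}
```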
> This is very quickly becoming a huge pain for me to write. I have to write a compiler, and a lexer. I have to continue extending this for each new kind of operator added.
Counterpoint: good luck getting that magical regex to scale to many new operators. Or other data types, for that matter. What if, for instance, you wanted to add support for strings? Suddenly, the entire original approach is useless and building a tiny compiler is the best and safest option.
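To illustrate the scaling point, here is a hypothetical lexer sketch in which operators live in one table and double-quoted strings are just another token type. Adding an operator is one entry in `OPERATORS` (kept longest-first so `**` wins over `*`). The escape handling is deliberately simplistic: a backslash takes the next character literally.

```javascript
// Hypothetical lexer sketch: operators live in one table, so adding a new
// operator is a one-line change, and string literals are just another token.
const OPERATORS = ['**', '+', '-', '*', '/', '(', ')']; // longest first

function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) { i++; continue; }

    // Number literal: digits with an optional fractional part.
    const num = /^\d+(\.\d+)?/.exec(src.slice(i));
    if (num) {
      tokens.push({ type: 'number', value: Number(num[0]) });
      i += num[0].length;
      continue;
    }

    // String literal: double-quoted; a backslash takes the next char literally.
    if (ch === '"') {
      let value = '';
      i++; // skip opening quote
      while (i < src.length && src[i] !== '"') {
        if (src[i] === '\\' && i + 1 < src.length) i++;
        value += src[i++];
      }
      if (src[i] !== '"') throw new SyntaxError('Unterminated string');
      i++; // skip closing quote
      tokens.push({ type: 'string', value });
      continue;
    }

    // Operators: first (longest) table entry that matches at position i.
    const op = OPERATORS.find((o) => src.startsWith(o, i));
    if (op) {
      tokens.push({ type: 'op', value: op });
      i += op.length;
      continue;
    }

    throw new SyntaxError(`Unexpected character '${ch}' at ${i}`);
  }
  return tokens;
}
```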
It's also the case that, as olegbulatov demonstrated, you're fighting an uphill battle trying to sanitize code with a regex. There's really no reasonable way to prove with 100% certainty that a regex blocks _all possible expressions_ that could do nasty things. Writing a tiny compiler might be a pain in the ass, but you can be very sure of the safety it offers... and it doesn't need to use eval().
Yup. Sanitizing arbitrary expressions is even harder than sanitizing SQL expressions. After years of watching regex-based SQL injection prevention fail, we shouldn't apply the same approach here.
You're trying to get the evil out of the string, at which point the remainder of the string is supposedly good. That's not how strings work. If it happens to work now, who knows what syntax will show up next year (and there are plenty of examples on this page of working around the regex). You're in a constant battle of figuring out whether bytes are good or evil when the actual answer is that they're neither.
In the E family, cousins to ECMAScript, the eval() function is confinement-safe and cannot access objects which it hasn't been given explicit access to use. The proof actually can be extended to any lambda calculus which doesn't have global mutable state.
Edit: I should add that the crux of the technique involves a two-argument eval() with an explicit environment, as opposed to the single-argument eval() of ECMAScript.
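A sketch of what explicit-environment evaluation means, using a toy AST rather than E's actual semantics: the node shapes (`num`, `var`, `call`) and the name `evalIn` are made up for illustration. Names resolve only through own properties of `env`, so nothing like `process` or `constructor` is reachable unless you put it there.

```javascript
// Hypothetical AST evaluator in the spirit of a two-argument eval(expr, env):
// names resolve only through the env passed in, never through globals.
function evalIn(node, env) {
  switch (node.type) {
    case 'num':
      return node.value;
    case 'var':
      // Own properties only, so prototype names like 'constructor' are unreachable.
      if (!Object.prototype.hasOwnProperty.call(env, node.name)) {
        throw new ReferenceError(`'${node.name}' is not in the environment`);
      }
      return env[node.name];
    case 'call': {
      const fn = evalIn(node.callee, env);
      if (typeof fn !== 'function') throw new TypeError('Callee is not a function');
      return fn(...node.args.map((arg) => evalIn(arg, env)));
    }
    default:
      throw new SyntaxError(`Unknown node type '${node.type}'`);
  }
}
```

The functions you place in `env` are the capabilities; handing over something like `Function` would undo the whole point.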
Right. Access to the environment is the dangerous part, not the eval function. There's a proposal to tame the JS global environment along those lines: https://github.com/tc39/proposal-frozen-realms
Note however that the threat model here does not include denial of service. You have to address that at another level like OS processes or E vats.
This idea is neat, and the article does a good job of highlighting some of the potential evil and potential good of eval. However, as many other commenters have pointed out, it's far from a complete or trustworthy solution, and even if it's good enough for now, it may not be good enough for the JS of tomorrow (this language iterates bloody fast).
Far better would be a capability-based sandbox API in the browser (like the Node.js one: https://nodejs.org/api/vm.html#vm_vm_createcontext_sandbox_o...) that also supports limits like "stack depth" to stop infinite recursion, and/or some sort of op count to stop other non-terminating functions. Combine those two things, and you have at least a way to secure arbitrary code you want to eval.
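For reference, this is roughly how the Node.js `vm` module is used: `vm.runInNewContext` runs code against a fresh global object built from the object you pass, and the `timeout` option (in milliseconds) aborts long-running synchronous code. `runInSandbox` is a hypothetical wrapper. Node's own documentation states that `vm` is not a security mechanism and should not be used to run untrusted code; the classic escape goes through `this.constructor.constructor` back to the host's `Function`.

```javascript
const vm = require('vm');

// Run code in a fresh context that only sees the names we pass in, and cap
// wall-clock time for synchronous code with the timeout option (ms).
function runInSandbox(code, globals = {}, timeoutMs = 50) {
  // Copy the globals so the caller's object is not turned into the context.
  return vm.runInNewContext(code, { ...globals }, { timeout: timeoutMs });
}

console.log(runInSandbox('a * b', { a: 6, b: 7 })); // 42
console.log(runInSandbox('typeof process'));       // the string "undefined"

try {
  runInSandbox('while (true) {}');
} catch (err) {
  console.log('aborted:', err.message);
}
```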
I think that most people with a real/non-insane need for such a service would be willing to pay a fair amount of performance (e.g. introducing an Erlang-esque reduction count idea to code inside the sandbox) to get it working safely.
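A toy version of the reduction-count idea, assuming an AST interpreter: each visited node costs one unit of fuel, and evaluation throws once the budget is spent. The node types (`num`, `add`, `repeat`) and `evalWithFuel` are hypothetical; a real sandbox would also need to cap recursion depth and memory.

```javascript
// Hypothetical sketch of an Erlang-style reduction budget: every node the
// evaluator visits costs one unit of fuel, and running out aborts evaluation.
class OutOfFuel extends Error {}

function evalWithFuel(node, fuel) {
  const budget = { remaining: fuel };

  function step(n) {
    if (--budget.remaining < 0) throw new OutOfFuel('Reduction budget exhausted');
    switch (n.type) {
      case 'num':
        return n.value;
      case 'add':
        return step(n.left) + step(n.right);
      case 'repeat': {
        // Evaluate the body `times` times and sum the results.
        let total = 0;
        for (let i = 0; i < n.times; i++) total += step(n.body);
        return total;
      }
      default:
        throw new SyntaxError(`Unknown node type '${n.type}'`);
    }
  }

  return step(node);
}
```

Because the check runs on every step, even a loop that would otherwise spin for billions of iterations stops after at most `fuel` node visits.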
Mind you, even if that existed it'd be hard to use correctly; people would try to safe-eval code, fail, and open up overly promiscuous capabilities (just like Redis: http://antirez.com/news/118), putting us back at square one. For many purposes, though, it would be a very interesting and powerful tool when wielded carefully.
While fun, I do hope this is a joke, since, as other comments have said, it’s fairly straightforward to write a DSL for the problems this is meant to solve. Beyond that, this would never be safe from syntax changes in new versions of ECMAScript, and sadly it could cause issues similar to the recent smoosh controversy (where a common library implemented `flatten` in a way that was incompatible with the new ES specification).
At this point, why not just implement a restricted interpreter in JS for a simple language? As soon as you do anything non-trivial, you're going to want variables and function calls.
`(a)` isn't allowed because letters aren't allowed, and you can't put a number in the argument location (e.g. `((1) => 2)(3)` doesn't parse as valid JS).
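One way to check the parsing claim without executing anything: the `Function` constructor parses its body immediately and throws a `SyntaxError` if the body is invalid, and the functions created below are never called.

```javascript
// Check which arrow-function shapes the JS parser accepts, without running them:
// new Function(...) parses the body immediately and throws SyntaxError if invalid.
function parses(source) {
  try {
    new Function(source); // created but never invoked
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parses('return ((x) => 2)(3)')); // true
console.log(parses('return ((1) => 2)(3)')); // false
console.log(parses('return (() => 2)()'));   // true
```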
I agree that safe eval and safe JSON parsing should be first-class citizens in the JS world. Just curious: is this (one of the reasons) why companies like Salesforce introduce an alternate language such as Visualforce, to have more control over safety? It would be nice not to force users to learn a new language to enable dynamic runtime features. I feel that it’s near impossible to outsmart all bad actors, but it would be nice to enable full user creativity by allowing access to an existing language.
If you really wanted to do this, you could avoid a whole bunch of badness by spawning a worker, clearing out all the timer and networking APIs, and evaluating the script there.
Infinite loops etc. could still happen, but they would not significantly affect your page, and the worker gets killed when the user navigates away.
As far as their regex goes, I’m not sure it actually prevents the wonders of `[]`-based programming.
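For anyone unfamiliar, `[]`-based programming relies on type coercion to build letters out of a handful of symbols. The right-hand sides below use only `!`, `+`, `[`, `]` and grouping parentheses; JSFuck-style encoders extend the same trick to arbitrary code using only `[]()!+`. This sketch assumes a filter that lets those characters through.

```javascript
// Letters built from symbols alone, via type coercion.
const f = (![] + [])[+[]];       // ![] is false; false + [] is "false"; +[] is 0
const u = ([][[]] + [])[+[]];    // [][[]] is undefined; undefined + [] is "undefined"
const n = ([][[]] + [])[+!![]];  // +!![] is 1, so this is "undefined"[1]
console.log(f + u + n); // fun
```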
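Python has `ast.literal_eval` in the standard library: it parses the input with Python's own parser and then safely evaluates only literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, `None` and the like). So you don't have to write your own lexer and parser and so on. It only covers literals, though, so something like `2*3` is rejected, and its evaluation step is written in Python, so it's somewhat slower.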
Wouldn't doing `eval` in a service worker provide an even greater level of safety? Might even be enough to drop some of the character restrictions, if the use case was broader than the one here.
An answer on Stack Overflow (https://stackoverflow.com/a/26488003) claims that web workers count as same-origin, but you can totally solve this use case with `<iframe sandbox>`. (And other answers say that sites like JSFiddle do exactly this.)
There's an example of using this for untrusted eval in the linked HTML5Rocks post: https://www.html5rocks.com/en/tutorials/security/sandboxed-i... The amount of code involved is pretty minimal and doesn't involve opaque regexes, so something like that is probably the actual best option for someone who wants safe eval.
That StackOverflow user also has their own library that does this using a sandboxed iframe and a web worker: https://github.com/asvd/jailed
This is what I do for a "learn to code" game I'm building. The sandboxed Web worker is served with CSP headers and only interacts with the main part of the page through worker messages.