I recently experimented using Gemini 2 Flash and Gemini 2 Flash Thinking with unrestricted code execution in a local sandbox and open sourced the results:
https://github.com/gradion-ai/freeact. Here's an example that uses Gemini 2 Flash Thinking as an agent that acts via code, which is executed in a sandbox based on IPython and Docker:
https://gist.github.com/krasserm/dcdae47f85ee9922e3284953d07...

Gemini's built-in code execution environment is restricted to selected Python libraries like NumPy or SymPy and prevents the model from installing new packages, among other limitations. While this may be useful for agentic applications in restricted environments, it may prevent agents from adapting to new environments, especially agents that write their actions in code (see the CodeAct paper: https://arxiv.org/abs/2402.01030).
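
For anyone unfamiliar with the "actions as code" idea, here is a minimal sketch of the execution step: the model emits Python source, the source is run, and the captured output goes back to the model as the observation. This is not freeact's API. The function name `execute_action` and the sample action are hypothetical, and a plain subprocess is not a sandbox, so treat it as an illustration of the loop only (a real setup would run this inside something like a Docker container).

```python
import subprocess
import sys


def execute_action(code: str, timeout: float = 10.0) -> str:
    """Run a code action in a fresh Python process and return its output.

    Hypothetical helper for illustration only. A subprocess gives no
    isolation, so this is NOT a sandbox.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    # stdout and stderr together form the observation fed back to the model,
    # so the agent can see tracebacks and correct its next action.
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical code action, standing in for model output.
    action = "import math\nprint(math.factorial(5))"
    print(execute_action(action))
```

Because errors come back as text, the agent can react to a failed import by installing the missing package in its next action, which is exactly what a restricted environment rules out.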
Has anyone else experimented with Gemini 2 as a CodeAct agent? I'd be particularly interested in hearing about approaches to unrestricted code execution.