Marginally less unsafe evaluation of LLM-generated code #490

@eb8680

Description

When a Template generates higher-order objects like Callables or types that require some amount of Python evaluation to decode, we should do something a little less dangerous and error-prone than naively invoking Python's builtins.eval/exec on an arbitrary code string in the original application context.

There's no way to guarantee that any nontrivially expressive form of LLM-generated Python code execution is safe, but there are a few things we could look into that might help mitigate the worst failure modes mostly automatically:

  • We temporarily removed the synthesis functionality in handlers.llm in part for this reason (handlers.llm.synthesis should be replaced with a stub in initial release #470). When it's added back, evaluation should still only happen when a specific handler (or at least an environment variable) is active, not by default; the first sketch after this list shows one way such a gate might look.
  • RestrictedPython implements a more comprehensive and configurable suite of analyses and transformations at the level of ast.AST objects, as well as drop-in replacements for builtins.compile/exec/eval aimed at preventing code from accessing the surrounding Python environment. We could potentially use one or both parts of this library directly (see the first sketch after this list).
  • smolagents implements a meta-circular evaluator of Python ast.ASTs for a subset of valid Python code. Using or adapting that code would be straightforward, and it could have applications beyond the LLM module (a toy illustration of the approach follows this list). A related idea would be to implement a custom interpreter at the level of CPython VM bytecode rather than ast.AST, as in Add a disassembler for reconstituting compiled generator expressions #288.
  • Type annotations (Make __signature__ a lazily computed property #451), doctests (Support and exploit doctests in higher-order Template specifications #433) and string formatting may sneakily trigger calls to builtins.eval even inside a meta-circular evaluator. We might want to do something about these edge cases specifically, e.g. scrubbing annotations from generated code after typechecking (sketched after this list).
  • progent implements a tiny DSL for tool access policies that is related to our discussion of effect types in Support effect type annotations #448.
  • All of the above are distinct from and complementary to sandboxing, where we run a Python process or subprocess in a fresh, isolated OS-level environment (e.g. a Docker container) or even on a remote machine (e.g. AWS Lambda). Probably any deployed LLM application that is both connected to the outside world and executing generated code should be running in a container, something we could attempt to heuristically detect and warn a user about (a best-effort detection heuristic is sketched after this list). However, deeply nesting sandboxes at the level of individual Template calls seems clunky and challenging to debug, without much additional security upside.
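
To make the first two points concrete, here is a minimal sketch of gating evaluation behind an opt-in flag and running it through RestrictedPython's drop-in replacements rather than builtins.compile/exec. The ALLOW_LLM_EVAL environment variable and the restricted_exec helper are hypothetical names for illustration; the real gate would be an active handler as described above.

```python
import os

from RestrictedPython import compile_restricted, safe_builtins


def restricted_exec(source: str) -> dict:
    """Compile and run generated code with RestrictedPython instead of
    builtins.compile/exec, and only when explicitly enabled."""
    # Hypothetical opt-in flag; a dedicated handler would be the real gate.
    if os.environ.get("ALLOW_LLM_EVAL") != "1":
        raise RuntimeError("evaluation of LLM-generated code is disabled by default")

    # compile_restricted rejects or rewrites AST nodes that would let the
    # code reach into the surrounding environment (dunder access, etc.).
    bytecode = compile_restricted(source, filename="<llm-generated>", mode="exec")

    namespace: dict = {}
    # A pruned __builtins__ leaves no direct path to builtins.eval, open,
    # __import__ and friends.
    exec(bytecode, {"__builtins__": safe_builtins}, namespace)
    return namespace
```

Something like `restricted_exec("answer = 1 + 2")["answer"]` would then be the only supported entry point for decoding generated definitions.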
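
The meta-circular-evaluator idea, reduced to a toy: walk the ast.AST of a small expression subset directly instead of handing strings to builtins.eval. This is not smolagents' implementation, just an illustration of the shape of the approach; anything outside the whitelisted node types is rejected outright.

```python
import ast
import operator

# Whitelisted binary operators; anything outside this subset is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_expr(source: str, env=None):
    """Evaluate a tiny expression subset (literals, names, arithmetic)
    by walking the ast.AST directly, never calling builtins.eval."""
    env = env or {}

    def walk(node: ast.AST):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported node: {ast.dump(node)}")

    return walk(ast.parse(source, mode="eval"))
```

`eval_expr("a * 2 + 1", {"a": 20})` returns 41, while `eval_expr("__import__('os')")` raises instead of importing anything.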
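
For the annotation edge case, one option is a small ast.NodeTransformer pass that runs after typechecking and strips annotations from the generated source, so nothing downstream (e.g. typing.get_type_hints, which calls eval on string annotations) can evaluate them later. A rough sketch, ignoring async defs and lambda defaults:

```python
import ast


class StripAnnotations(ast.NodeTransformer):
    """Remove type annotations from already-typechecked generated code so
    they cannot trigger implicit evaluation later."""

    def visit_FunctionDef(self, node: ast.FunctionDef):
        self.generic_visit(node)
        node.returns = None
        for arg in node.args.args + node.args.posonlyargs + node.args.kwonlyargs:
            arg.annotation = None
        return node

    def visit_AnnAssign(self, node: ast.AnnAssign):
        # `x: int = 1` becomes `x = 1`; a bare `x: int` becomes a no-op.
        if node.value is None:
            return ast.copy_location(ast.Pass(), node)
        return ast.copy_location(
            ast.Assign(targets=[node.target], value=node.value), node
        )


def scrub_annotations(source: str) -> str:
    tree = StripAnnotations().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Applied to a generated `def f(x: int) -> int: ...`, this leaves behavior unchanged but leaves no annotations around to evaluate.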
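
And for the sandboxing point, the kind of heuristic container check we might warn with. It is Linux-specific, easily fooled, and advisory only; the paths and marker strings below are common conventions, not guarantees.

```python
import os
import warnings


def warn_if_not_containerized() -> None:
    """Best-effort heuristic: warn when generated code is about to run
    outside an obvious container."""
    in_container = os.path.exists("/.dockerenv")
    try:
        with open("/proc/1/cgroup") as f:
            contents = f.read()
        in_container = in_container or any(
            marker in contents for marker in ("docker", "kubepods", "containerd")
        )
    except OSError:
        pass  # non-Linux or unreadable procfs; stay silent rather than guess
    if not in_container:
        warnings.warn(
            "Executing LLM-generated code outside a detected container/sandbox",
            stacklevel=2,
        )
```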

None of these approaches comes with strong security guarantees in adversarial environments, but together they should be enough to keep our own LLM applications from blowing up themselves or their machines, and they are automatic and lightweight enough to provide rapid feedback to an LLM when its proposed output violates these constraints.
