Marginally less unsafe evaluation of LLM-generated code #490

@eb8680

Description

When a Template generates higher-order objects like Callables or types that require some amount of Python evaluation to decode, we should do something a little less dangerous and error-prone than naively invoking Python's builtins.eval/exec on an arbitrary code string in the original application context.

There's no way to guarantee that any nontrivially expressive form of LLM-generated Python code execution is safe, but there are a few things we could look into that might help mitigate the worst failure modes mostly automatically:

  • We temporarily removed the synthesis functionality in handlers.llm in part for this reason (handlers.llm.synthesis should be replaced with a stub in initial release #470). When it's added back, evaluation should still only happen when a specific handler (or at least an environment variable) is active, not by default; the first sketch after this list shows one way such a gate might look.
  • RestrictedPython implements a more comprehensive and configurable suite of analyses and transformations at the level of ast.AST objects, as well as drop-in replacements for builtins.compile/exec/eval aimed at preventing code from accessing the surrounding Python environment. We could potentially use one or both parts of this library directly (see the first sketch after this list).
  • smolagents implements a meta-circular evaluator of Python ast.ASTs for a subset of valid Python code. Using or adapting that code would be straightforward, and it could have applications beyond the LLM module (a toy illustration of the approach follows this list). A related idea would be to implement a custom interpreter at the level of CPython VM bytecode rather than ast.AST, as in Add a disassembler for reconstituting compiled generator expressions #288.
  • Type annotations (Make __signature__ a lazily computed property #451), doctests (Support and exploit doctests in higher-order Template specifications #433) and string formatting may sneakily trigger calls to builtins.eval even inside a meta-circular evaluator. We might want to do something about these edge cases specifically, e.g. scrubbing annotations from generated code after typechecking (sketched after this list).
  • progent implements a tiny DSL for tool access policies that is related to our discussion of effect types in Support effect type annotations #448.
  • All of the above are distinct from and complementary to sandboxing, where we run a Python process or subprocess in a fresh, isolated OS-level environment (e.g. a Docker container) or even on a remote machine (e.g. AWS Lambda). Probably any deployed LLM application that is both connected to the outside world and executing generated code should be running in a container, something we could attempt to heuristically detect and warn a user about (a best-effort detection heuristic is sketched after this list). However, deeply nesting sandboxes at the level of individual Template calls seems clunky and challenging to debug, without much additional security upside.
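
To make the first two points concrete, here is a minimal sketch of gating evaluation behind an opt-in flag and running it through RestrictedPython's drop-in replacements rather than builtins.compile/exec. The ALLOW_LLM_EVAL environment variable and the restricted_exec helper are hypothetical names for illustration; the real gate would be an active handler as described above.

```python
import os

from RestrictedPython import compile_restricted, safe_builtins


def restricted_exec(source: str) -> dict:
    """Compile and run generated code with RestrictedPython instead of
    builtins.compile/exec, and only when explicitly enabled."""
    # Hypothetical opt-in flag; a dedicated handler would be the real gate.
    if os.environ.get("ALLOW_LLM_EVAL") != "1":
        raise RuntimeError("evaluation of LLM-generated code is disabled by default")

    # compile_restricted rejects or rewrites AST nodes that would let the
    # code reach into the surrounding environment (dunder access, etc.).
    bytecode = compile_restricted(source, filename="<llm-generated>", mode="exec")

    namespace: dict = {}
    # A pruned __builtins__ leaves no direct path to builtins.eval, open,
    # __import__ and friends.
    exec(bytecode, {"__builtins__": safe_builtins}, namespace)
    return namespace
```

Something like `restricted_exec("answer = 1 + 2")["answer"]` would then be the only supported entry point for decoding generated definitions.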
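
The meta-circular-evaluator idea, reduced to a toy: walk the ast.AST of a small expression subset directly instead of handing strings to builtins.eval. This is not smolagents' implementation, just an illustration of the shape of the approach; anything outside the whitelisted node types is rejected outright.

```python
import ast
import operator

# Whitelisted binary operators; anything outside this subset is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_expr(source: str, env=None):
    """Evaluate a tiny expression subset (literals, names, arithmetic)
    by walking the ast.AST directly, never calling builtins.eval."""
    env = env or {}

    def walk(node: ast.AST):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported node: {ast.dump(node)}")

    return walk(ast.parse(source, mode="eval"))
```

`eval_expr("a * 2 + 1", {"a": 20})` returns 41, while `eval_expr("__import__('os')")` raises instead of importing anything.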
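
For the annotation edge case, one option is a small ast.NodeTransformer pass that runs after typechecking and strips annotations from the generated source, so nothing downstream (e.g. typing.get_type_hints, which calls eval on string annotations) can evaluate them later. A rough sketch, ignoring async defs and lambda defaults:

```python
import ast


class StripAnnotations(ast.NodeTransformer):
    """Remove type annotations from already-typechecked generated code so
    they cannot trigger implicit evaluation later."""

    def visit_FunctionDef(self, node: ast.FunctionDef):
        self.generic_visit(node)
        node.returns = None
        for arg in node.args.args + node.args.posonlyargs + node.args.kwonlyargs:
            arg.annotation = None
        return node

    def visit_AnnAssign(self, node: ast.AnnAssign):
        # `x: int = 1` becomes `x = 1`; a bare `x: int` becomes a no-op.
        if node.value is None:
            return ast.copy_location(ast.Pass(), node)
        return ast.copy_location(
            ast.Assign(targets=[node.target], value=node.value), node
        )


def scrub_annotations(source: str) -> str:
    tree = StripAnnotations().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Applied to a generated `def f(x: int) -> int: ...`, this leaves behavior unchanged but leaves no annotations around to evaluate.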
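
And for the sandboxing point, the kind of heuristic container check we might warn with. It is Linux-specific, easily fooled, and advisory only; the paths and marker strings below are common conventions, not guarantees.

```python
import os
import warnings


def warn_if_not_containerized() -> None:
    """Best-effort heuristic: warn when generated code is about to run
    outside an obvious container."""
    in_container = os.path.exists("/.dockerenv")
    try:
        with open("/proc/1/cgroup") as f:
            contents = f.read()
        in_container = in_container or any(
            marker in contents for marker in ("docker", "kubepods", "containerd")
        )
    except OSError:
        pass  # non-Linux or unreadable procfs; stay silent rather than guess
    if not in_container:
        warnings.warn(
            "Executing LLM-generated code outside a detected container/sandbox",
            stacklevel=2,
        )
```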

None of these approaches comes with strong security guarantees in adversarial environments, but together they should be enough to keep our own LLM applications from blowing up themselves or their machines, and they are automatic and lightweight enough to provide rapid feedback to an LLM when its proposed output violates these constraints.
