When a `Template` generates higher-order objects like `Callable`s or types that require some amount of Python evaluation to decode, we should do something a little less dangerous and error-prone than naively invoking Python's `builtins.eval`/`exec` on an arbitrary code string in the original application context.
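For concreteness, the failure mode is roughly the following (the generated string here is a made-up example, not real `Template` output; the point is only that `eval` hands it the full power of the surrounding interpreter):

```python
# Naive decoding of an LLM-generated Callable: the string below is a made-up
# example of generated code, and eval runs it with unrestricted access to
# builtins, globals, the filesystem, the network, etc.
generated = "lambda xs: sorted(xs, key=len)"
decode = eval(generated)  # nothing stops "__import__('os').system(...)" here
print(decode(["bb", "a", "ccc"]))
```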
There's no way to guarantee that any nontrivially expressive form of LLM-generated Python code execution is safe, but there are a few things we could look into that might help mitigate the worst failure modes mostly automatically:
- We temporarily removed the synthesis functionality in `handlers.llm` in part for this reason (`handlers.llm.synthesis` should be replaced with a stub in initial release #470). When it's added back, evaluation should still only happen when a specific handler (or at least an environment variable) is active, not by default.
- `RestrictedPython` implements a more comprehensive and configurable suite of analyses and transformations at the level of `ast.AST` objects, as well as drop-in replacements for `builtins.compile`/`exec`/`eval` aimed at preventing code from accessing the surrounding Python environment. We could potentially use one or both parts of this library directly (a minimal sketch follows this list).
- `smolagents` implements a meta-circular evaluator of Python `ast.AST`s for a subset of valid Python code (a toy version of the idea is sketched below). Using or adapting that code would be straightforward, and it could have applications beyond the LLM module. A related idea would be to implement a custom interpreter at the level of CPython VM bytecode rather than `ast.AST`, as in Add a disassembler for reconstituting compiled generator expressions #288.
- Type annotations (Make `__signature__` a lazily computed property #451), doctests (Support and exploit doctests in higher-order `Template` specifications #433), and string formatting may sneakily trigger calls to `builtins.eval` even inside a meta-circular evaluator. We might want to do something about these edge cases specifically, e.g. scrubbing annotations from generated code after typechecking (sketched below).
- `progent` implements a tiny DSL for tool access policies that is not unrelated to our discussion of effect types in Support effect type annotations #448.
- All of the above are distinct from and complementary to sandboxing, where we run a Python process or subprocess in a fresh, isolated OS-level environment (e.g. a Docker container) or even on a remote machine (e.g. AWS Lambda). Probably any deployed LLM application that is both connected to the outside world and executing generated code should be running in a container, something we could attempt to heuristically detect and warn a user about (a possible heuristic is sketched after this list). However, deeply nesting sandboxes at the level of individual `Template` calls seems clunky and challenging to debug, without much additional security upside.
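To make the `RestrictedPython` option a bit more concrete, here is a minimal sketch of its drop-in compiler replacement. The policy (a bare `safe_builtins` table and nothing else) is just for illustration, not a proposal:

```python
from RestrictedPython import compile_restricted, safe_builtins

def restricted_exec(source: str) -> dict:
    """Compile generated code through RestrictedPython's AST transformer
    and execute it against a minimal builtins table."""
    byte_code = compile_restricted(source, filename="<llm-output>", mode="exec")
    namespace = {"__builtins__": safe_builtins}
    exec(byte_code, namespace)
    return namespace

# Ordinary definitions work; constructs like underscore attribute access are
# rejected at compile time, and names like open/__import__ are simply absent
# from the restricted builtins at run time.
ns = restricted_exec("def double(x):\n    return 2 * x\n")
print(ns["double"](21))
```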
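The `smolagents`-style meta-circular evaluator, reduced to a toy sketch (this is not smolagents' actual code, just the shape of the approach): the interpreter walks `ast.AST` nodes itself instead of handing strings to `builtins.eval`, so only explicitly whitelisted node types and names can ever execute.

```python
import ast
import operator

_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_SAFE_NAMES = {"len": len, "min": min, "max": max}

def eval_expr(node: ast.AST, env: dict):
    """Evaluate a whitelisted subset of expression nodes; anything else fails."""
    if isinstance(node, ast.Expression):
        return eval_expr(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](eval_expr(node.left, env),
                                      eval_expr(node.right, env))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        fn = env[node.func.id]
        return fn(*(eval_expr(a, env) for a in node.args))
    raise ValueError(f"disallowed syntax: {ast.dump(node)}")

def safe_eval(source: str, **bindings):
    env = {**_SAFE_NAMES, **bindings}
    return eval_expr(ast.parse(source, mode="eval"), env)

print(safe_eval("len(xs) * 2 + 1", xs=[1, 2, 3]))  # -> 7
```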
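For the annotation edge case, one possible shape of the "scrub after typechecking" idea is an `ast.NodeTransformer` pass over the generated module (a sketch, not an implemented pass; vararg/kwarg annotations are omitted here):

```python
import ast

class StripAnnotations(ast.NodeTransformer):
    """Drop annotations from generated code once typechecking has run, so
    nothing downstream ever needs to evaluate them lazily."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.returns = None
        for arg in (*node.args.posonlyargs, *node.args.args, *node.args.kwonlyargs):
            arg.annotation = None
        return node

    def visit_AnnAssign(self, node):
        # `x: T = value` becomes `x = value`; a bare `x: T` is dropped entirely.
        if node.value is None:
            return None
        return ast.copy_location(
            ast.Assign(targets=[node.target], value=node.value, type_comment=None),
            node)

source = "def f(x: 'Sneaky') -> 'AlsoSneaky':\n    y: int = x\n    return y\n"
tree = ast.fix_missing_locations(StripAnnotations().visit(ast.parse(source)))
print(ast.unparse(tree))
```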
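And for the heuristic container check in the last item, something along these lines might be enough to warn rather than enforce (Linux-specific, easily fooled, and the `ALLOW_UNSANDBOXED_EVAL` variable is hypothetical):

```python
import os
from pathlib import Path

def probably_in_container() -> bool:
    """Best-effort heuristic for whether we are running inside a container."""
    if Path("/.dockerenv").exists() or Path("/run/.containerenv").exists():
        return True
    try:
        cgroup = Path("/proc/1/cgroup").read_text()
    except OSError:
        return False
    return any(token in cgroup for token in ("docker", "kubepods", "containerd", "lxc"))

# ALLOW_UNSANDBOXED_EVAL is a hypothetical opt-out, not an existing setting.
if not probably_in_container() and os.environ.get("ALLOW_UNSANDBOXED_EVAL") != "1":
    print("warning: executing generated code outside a container")
```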
None of these approaches comes with strong security guarantees in adversarial environments, but together they should be enough to keep our own LLM applications from blowing up themselves or their machines, and they are automatic and lightweight enough to provide rapid feedback to an LLM when its proposed output violates these constraints.