LocalCommandLineCodeExecutor copies the parent process's entire os.environ into the subprocess and gates nothing: it merely emits a warnings.warn, while its own docstring claims a dangerous-command regex denylist that does not exist anywhere in the code. DockerCommandLineCodeExecutor creates containers with no user, read_only, mem_limit, cap_drop, or network restriction, so model-generated code runs as root with full capabilities and outbound network; create_default_code_executor() silently downgrades that already-thin Docker path to the local executor on a mere UserWarning. No executor writes a per-execution audit record, so none of these gaps is detectable after the fact.
autogen_ext.code_executors (five executors) plus the autogen_core.code_executor abstraction. Each executor is a CodeExecutor subclass with an execute_code_blocks method that writes the model's code to a work-directory file and runs it via asyncio.create_subprocess_exec (local), a Docker exec_run (docker), an nbclient kernel (jupyter), a kernel-gateway websocket (docker_jupyter), or an Azure dynamic-sessions HTTP endpoint (azure). A create_default_code_executor() factory in __init__.py selects Docker when available and otherwise falls back to the local executor. The shared _common.py holds the file-naming and pip-silencing helpers. There is no audit-logging surface and no approval gate anywhere in the subsystem; security-relevant container, environment, and timeout behavior is governed entirely by constructor defaults.
Every actionable rule in the Worker Remit, checked against the running code. Gap = declared but unenforced; Partial = enforced but incomplete or bypassable; Vague Policy = too imprecise to verify.
| Rule ID | Section | Rule (quoted) | Status | Finding |
|---|---|---|---|---|
| R-01 | Behavioral Constraints — What every executor must always do | "All file read and write operations MUST be confined to a configured work directory, and any path resolving outside it MUST be rejected." | Partial | PRAX-2026-05-29-005 |
| R-02 | Behavioral Constraints — What every executor must always do | "Every code execution MUST be subject to a configured wall-clock timeout, and processes that exceed it MUST be terminated." | Partial | PRAX-2026-05-29-009 |
| R-03 | Behavioral Constraints — What every executor must always do | "Stdout, stderr, and exit status MUST be captured completely and never silently discarded." | Verified | — |
| R-04 | Behavioral Constraints — What every executor must always do | "The execution environment MUST be isolated from the host caller — a failing or malicious execution MUST NOT be able to read or modify the parent process's environment, credentials, or state." | Gap | PRAX-2026-05-29-001 |
| R-05 | Behavioral Constraints — What every executor must always do | "Persistence across executions MUST be limited to the configured state mechanism — preserved kernel state for Jupyter executors, work-directory files for command-line executors — and state MUST NEVER leak between unrelated sessions." | Enforcement Not Possible | — |
| R-06 | Behavioral Constraints — What every executor must always do | "Each execution MUST be recorded to an audit log capturing timestamp, executor kind, language, source agent, work directory, timeout, exit status, and a digest — not the body — of the executed code." | Gap | PRAX-2026-05-29-002 |
| R-07 | Behavioral Constraints — What every executor must never do | "Code MUST NOT be executed with host-level privileges when a less-privileged option achieves the same task." | Gap | PRAX-2026-05-29-003 |
| R-08 | Behavioral Constraints — What every executor must never do | "Instructions embedded in the code source that attempt to escape the sandbox, escalate privileges, or exfiltrate data MUST NOT be acted on." | Enforcement Not Possible | — |
| R-09 | Behavioral Constraints — What every executor must never do | "Work-directory confinement MUST NOT be bypassed under any condition, including symlinks, parent-directory traversal, absolute paths, or runtime-supplied volume mount overrides." | Partial | PRAX-2026-05-29-005 |
| R-10 | Behavioral Constraints — What every executor must never do | "Code MUST NOT be loaded or executed from remote URLs or unverified sources on the LLM's behalf — the LLM's generated code is the only accepted input." | Verified | — |
| R-11 | Behavioral Constraints — What every executor must never do | "Error output MUST NOT be silently swallowed or transformed before it is returned to the caller." | Verified | — |
| R-12 | Behavioral Constraints — What every executor must never do | "The executor MUST NOT connect to services, databases, or networks not explicitly permitted by its configuration." | Gap | PRAX-2026-05-29-004 |
| R-13 | Human approval is required for | "Use of the local host executor in production MUST require human approval — it runs code directly on the host OS without containerization; default production deployments should use a containerized executor, and local execution is acceptable only when the host is an ephemeral, isolated, operator-approved sandbox." | Gap | PRAX-2026-05-29-006 |
| R-14 | Human approval is required for | "Enabling any network egress from the executor environment beyond a configured allow-list MUST require human approval." | Gap | PRAX-2026-05-29-004 |
| R-15 | Human approval is required for | "Mounting host volumes into the container executor at any path other than the configured work directory MUST require human approval." | Gap | PRAX-2026-05-29-007 |
| R-16 | Human approval is required for | "Any change to the executor's resource limits — CPU, memory, timeout — that raises the ceiling MUST require human approval." | Gap | PRAX-2026-05-29-010 |
| R-17 | Authorized output destinations | "No outbound network traffic from the executor itself (the executed code may make network calls if the sandbox permits, but the executor does not initiate its own)" | Verified | — |
| R-18 | Out of Scope | "The executor does not auto-upgrade, fetch dependencies at runtime from package registries without explicit configuration, or modify its own code" | Partial | PRAX-2026-05-29-011 |
Findings, ordered by severity — each linked to its remit rule, evidence, and a recommended action. Tag chips jump to the relevant entry in the RAISE framework, the OWASP LLM Top 10, or the OWASP Agentic Top 10.
CRITICAL PRAX-2026-05-29-001 LocalCommandLineCodeExecutor copies the full parent os.environ into the subprocess and gates execution with only a warning, not host isolation.
"The execution environment MUST be isolated from the host caller — a failing or malicious execution MUST NOT be able to read or modify the parent process's environment, credentials, or state."
- In _execute_code_dont_check_setup, build the subprocess env from an explicit allow-list (PATH plus virtualenv additions) instead of os.environ.copy(), so host credentials are never inherited by executed code.
- Add an explicit opt-in flag (e.g. allow_host_env, default False) and require the operator application to set it before any host-environment passthrough occurs.
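The allow-list approach recommended above can be sketched as follows. This is a minimal illustration, not the executor's code: `build_subprocess_env`, `DEFAULT_ENV_ALLOWLIST`, and the `virtualenv_bin` parameter are hypothetical names, and the variables on the allow-list are an assumption about what a typical script needs.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Hypothetical allow-list: the variable names are illustrative only.
DEFAULT_ENV_ALLOWLIST = ("PATH", "LANG", "LC_ALL", "TMPDIR")


def build_subprocess_env(
    parent_env: Mapping[str, str],
    allowlist: Iterable[str] = DEFAULT_ENV_ALLOWLIST,
    virtualenv_bin: Optional[str] = None,
    allow_host_env: bool = False,
) -> Dict[str, str]:
    """Return the environment to hand to the child process."""
    if allow_host_env:
        # Explicit operator opt-in: pass the whole host environment through.
        env = dict(parent_env)
    else:
        # Default: copy only allow-listed variables that are actually set.
        env = {k: parent_env[k] for k in allowlist if k in parent_env}
    if virtualenv_bin:
        # Put the virtualenv's bin directory first on PATH.
        env["PATH"] = virtualenv_bin + os.pathsep + env.get("PATH", "")
    return env
```

The result would be passed as the `env=` argument of `asyncio.create_subprocess_exec`, so a credential such as `AWS_SECRET_ACCESS_KEY` in the parent process never reaches executed code unless the operator sets `allow_host_env=True`.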
CRITICAL PRAX-2026-05-29-002 No executor records a per-execution audit log, leaving every code execution and every isolation gap undetectable after the fact.
"Each execution MUST be recorded to an audit log capturing timestamp, executor kind, language, source agent, work directory, timeout, exit status, and a digest — not the body — of the executed code."
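One way to capture the fields this rule names is sketched below. The helper `audit_record` and its JSON-lines layout are assumptions made for illustration; the subsystem defines no such function, and where the line is written (file, logging handler, external sink) is left open.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(
    executor_kind: str,
    language: str,
    source_agent: str,
    work_dir: str,
    timeout: int,
    code: str,
    exit_status: Optional[int] = None,
    now: Optional[datetime] = None,
) -> str:
    """Build one JSON audit line; the code body itself is never included."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,
        "executor_kind": executor_kind,
        "language": language,
        "source_agent": source_agent,
        "work_dir": work_dir,
        "timeout": timeout,
        # None on the "started" entry, filled in on the "completed" entry.
        "exit_status": exit_status,
        # Digest only, so the log cannot leak the executed code.
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

Emitting one line before execution and a second with `exit_status` set afterwards means an execution that crashes the executor still leaves a trace.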
- Emit an audit entry at the start of execute_code_blocks recording timestamp, executor class, language, configured work_dir, timeout, and a SHA-256 digest of the code (not the body), and a second entry with exit_code on completion.
CRITICAL PRAX-2026-05-29-003 DockerCommandLineCodeExecutor creates containers with no user, read_only, cap_drop, or mem_limit, so model code runs as root with full capabilities.
"Code MUST NOT be executed with host-level privileges when a less-privileged option achieves the same task."
- In containers.create, set user="1000:1000" (or a non-root UID baked into the image), read_only=True with a writable /workspace volume, cap_drop=["ALL"], and security_opt=["no-new-privileges"] as the defaults.
- Add mem_limit and pids_limit defaults and surface them as constructor parameters governed by the resource-ceiling approval rule.
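The container defaults listed above could be collected in one place, roughly as in this sketch. The function name `hardened_container_kwargs` and the default values `"512m"` and `128` are illustrative assumptions; the keyword names themselves are the ones the Docker SDK for Python accepts on `containers.create` and `containers.run`.

```python
from typing import Any, Dict


def hardened_container_kwargs(
    work_dir: str,
    mem_limit: str = "512m",   # assumed conservative default
    pids_limit: int = 128,     # assumed conservative default
) -> Dict[str, Any]:
    """Keyword arguments for a least-privilege container."""
    return {
        "user": "1000:1000",                      # non-root UID:GID
        "read_only": True,                        # root filesystem is immutable
        "cap_drop": ["ALL"],                      # no Linux capabilities
        "security_opt": ["no-new-privileges"],    # block setuid escalation
        "mem_limit": mem_limit,
        "pids_limit": pids_limit,
        # Only the work directory is writable inside the container.
        "volumes": {work_dir: {"bind": "/workspace", "mode": "rw"}},
        "working_dir": "/workspace",
    }
```

With the SDK installed, usage would look like `docker.from_env().containers.create("python:3-slim", **hardened_container_kwargs("/srv/work"))`. Code that needs scratch space outside `/workspace` would additionally need a `tmpfs` mount, which this sketch omits.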
CRITICAL PRAX-2026-05-29-004 Docker containers are created with default networking and no egress control, letting model code exfiltrate data or pull payloads with no approval.
"The executor MUST NOT connect to services, databases, or networks not explicitly permitted by its configuration. / Enabling any network egress from the executor environment beyond a configured allow-list MUST require human approval."
- Default to network_mode="none" on containers.create and require an explicit, operator-approved allow-list parameter to enable any egress, satisfying the network-approval rule.
- Replace publish_all_ports=True with binding only the single kernel-gateway port to 127.0.0.1.
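A minimal sketch of the two networking defaults above, assuming the Docker SDK for Python's `network_mode`, `network`, and `ports` keyword arguments. The helper names `egress_kwargs` and `loopback_port_binding` are hypothetical, and modelling the approved allow-list as a single pre-configured Docker network name is a simplification.

```python
from typing import Any, Dict, Optional


def egress_kwargs(approved_network: Optional[str] = None) -> Dict[str, Any]:
    """No networking unless an operator names an approved Docker network."""
    if approved_network is None:
        return {"network_mode": "none"}
    # Assumption: the approved network enforces the egress allow-list itself.
    return {"network": approved_network}


def loopback_port_binding(
    container_port: int, host_port: Optional[int] = None
) -> Dict[str, Any]:
    """Publish one container port on 127.0.0.1 only."""
    bound = container_port if host_port is None else host_port
    return {"ports": {f"{container_port}/tcp": ("127.0.0.1", bound)}}
```

Both helpers return dictionaries that can be merged into the arguments of `containers.create`. A published port has no effect under `network_mode="none"`, so the loopback binding applies only to a container that has been given a network.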
HIGH PRAX-2026-05-29-005 Work-directory confinement is enforced only for code carrying an explicit "# filename:" header; code without one and runtime volume overrides bypass it.
"All file read and write operations MUST be confined to a configured work directory, and any path resolving outside it MUST be rejected. / Work-directory confinement MUST NOT be bypassed under any condition, including symlinks, parent-directory traversal, absolute paths, or runtime-supplied volume mount overrides."
- Document explicitly that work-directory confinement governs only where the executor writes the code file, not where executed code may read/write, and rely on container read_only + non-root user (PRAX-2026-05-29-003) for the runtime boundary.
- Validate extra_volumes at construction and reject any host bind outside the configured work directory unless an explicit approval flag is set.
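The construction-time check on extra_volumes might look like the sketch below. It assumes extra_volumes uses the Docker SDK mapping shape (host path mapped to a dict with `bind` and `mode`); `validate_extra_volumes` and the `allow_outside_work_dir` flag are hypothetical names standing in for the approval gate.

```python
from pathlib import Path
from typing import Dict, Mapping


def validate_extra_volumes(
    extra_volumes: Mapping[str, Dict[str, str]],
    work_dir: str,
    allow_outside_work_dir: bool = False,
) -> None:
    """Raise ValueError for any host bind that escapes the work directory."""
    root = Path(work_dir).resolve()
    for host_path in extra_volumes:
        # resolve() collapses ".." segments and follows symlinks.
        resolved = Path(host_path).resolve()
        inside = resolved == root or root in resolved.parents
        if not inside and not allow_outside_work_dir:
            raise ValueError(
                f"host bind {host_path!r} is outside work directory {str(root)!r}"
            )
```

Calling this in the constructor, before any container is created, turns an out-of-bounds mount into an immediate error rather than a silent widening of host exposure.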
HIGH PRAX-2026-05-29-006 create_default_code_executor silently downgrades Docker to the local host executor on a UserWarning, with no approval gate for local-in-production.
"Use of the local host executor in production MUST require human approval — it runs code directly on the host OS without containerization; default production deployments should use a containerized executor, and local execution is acceptable only when the host is an ephemeral, isolated, operator-approved sandbox."
- Make the fallback opt-in: raise instead of downgrading unless the caller passed an explicit allow_local_fallback=True, so local-in-production requires a deliberate operator decision.
- Narrow the bare except Exception at line 64 to specific Docker exceptions and log the failure rather than silently discarding it.
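The opt-in fallback described above can be sketched independently of the real factory. Everything here is hypothetical: `choose_executor`, `DockerUnavailableError`, and the two factory callables stand in for whatever create_default_code_executor() actually calls, and the real Docker exception types are not named in the source.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class DockerUnavailableError(RuntimeError):
    """Stand-in for the specific Docker exceptions the factory should catch."""


def choose_executor(
    make_docker: Callable[[], T],
    make_local: Callable[[], T],
    allow_local_fallback: bool = False,
) -> T:
    """Prefer Docker; fall back to local only on explicit opt-in."""
    try:
        return make_docker()
    except DockerUnavailableError as exc:
        # Log the failure instead of silently discarding it.
        logger.error("Docker executor unavailable: %s", exc)
        if not allow_local_fallback:
            raise
        logger.warning("Falling back to local executor by explicit opt-in")
        return make_local()
```

Catching a narrow exception type means unrelated errors, such as a misconfigured image name, surface to the caller instead of being converted into a downgrade.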
HIGH PRAX-2026-05-29-007 extra_volumes host mounts are accepted into the Docker container with no approval gate or path restriction.
"Mounting host volumes into the container executor at any path other than the configured work directory MUST require human approval."
- Gate extra_volumes behind an explicit approval flag and, by default, reject any bind whose host path is outside the configured work directory.
HIGH PRAX-2026-05-29-008 LocalCommandLineCodeExecutor's docstring claims a dangerous-command regex denylist that does not exist anywhere in the code.
- Implement the documented denylist in _execute_code_dont_check_setup before the subprocess runs and document its exact coverage and limits.
HIGH PRAX-2026-05-29-009 The non-containerized Jupyter executor's timeout shields the running cell, so an over-time execution is not interrupted in the kernel.
"Every code execution MUST be subject to a configured wall-clock timeout, and processes that exceed it MUST be terminated."
MEDIUM PRAX-2026-05-29-010 No memory or PID resource ceiling is configurable on the Docker executor, so the resource-ceiling approval rule has nothing to gate.
"Any change to the executor's resource limits — CPU, memory, timeout — that raises the ceiling MUST require human approval."
- Add mem_limit and pids_limit config fields with conservative defaults and pass them to containers.create, then treat raising them as the approval-gated action the remit describes.
MEDIUM PRAX-2026-05-29-011 The default Docker image is the moving tag python:3-slim and dependencies are floor-pinned, so the executor runtime is not reproducibly fixed.
"The executor does not auto-upgrade, fetch dependencies at runtime from package registries without explicit configuration, or modify its own code"
- Pin the default image to a specific python:3-slim@sha256:... digest and document the update cadence.
- Tighten the executor-extra version specifiers and commit a lockfile so the runtime is reproducible.
MEDIUM PRAX-2026-05-29-012 DockerJupyterServer chmods the host bind directory to world-writable 0o777, widening host filesystem exposure.
Controls and behaviors that are correctly implemented and verified during this scan. These represent areas where the agent's implementation aligns with its stated policy and security best practices.
Language whitelist with explicit unknown-language rejection
Every executor matches the requested language against a fixed <code>SUPPORTED_LANGUAGES</code> list and returns exit code 1 for anything outside it, constraining execution to Python and a known shell set (Azure is Python-only).
Work-directory path-traversal check on explicit filenames
When model code carries a <code># filename:</code> header, <code>get_file_name_from_content</code> resolves the path and calls <code>Path.relative_to(workspace)</code>, raising and aborting if the file would land outside the work directory.
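The resolve-then-`relative_to` pattern described here can be illustrated in a few lines. This is not the body of `get_file_name_from_content`; `confined_path` is a hypothetical helper showing only the check itself.

```python
from pathlib import Path


def confined_path(work_dir: str, filename: str) -> Path:
    """Return the resolved target path, or raise ValueError if it escapes."""
    workspace = Path(work_dir).resolve()
    # Joining with an absolute filename discards the workspace prefix,
    # so absolute paths are caught by the same check as "../" traversal.
    candidate = (workspace / filename).resolve()
    candidate.relative_to(workspace)  # raises ValueError when outside
    return candidate
```

The check constrains only where the code file is written; it says nothing about paths the executed code opens at runtime, which is the limitation recorded in PRAX-2026-05-29-005.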
Azure dynamic-sessions executor uses a managed sandbox with scoped bearer tokens
The Azure executor delegates execution to an Azure Container Apps dynamic-sessions endpoint, authenticating per request with a scoped <code>dynamicsessions.io/.default</code> access token rather than running code on the host.
Every executor enforces a configurable wall-clock timeout
All five executors accept a <code>timeout</code> (default 60s), reject values below 1, and bound each execution with <code>asyncio.wait_for</code> or the container <code>timeout</code> command.
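For a subprocess-based executor, bounding a run with `asyncio.wait_for` and then terminating the over-time process could look like the following sketch. `run_with_timeout` is a hypothetical helper, and the exit code 124 is borrowed from the convention of the `timeout` command as an assumption, not taken from the executors' source.

```python
import asyncio
import sys
from typing import Sequence, Tuple


async def run_with_timeout(
    args: Sequence[str], timeout: float
) -> Tuple[int, str, str]:
    """Run a command and kill it if it exceeds the wall-clock timeout."""
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # wait_for only cancels the awaiting task; the child must be
        # killed explicitly and then reaped.
        proc.kill()
        await proc.wait()
        return 124, "", "Timeout"
    return proc.returncode or 0, stdout.decode(), stderr.decode()
```

The explicit `proc.kill()` is the part that satisfies the "MUST be terminated" half of the timeout rule: cancelling the awaiting coroutine alone leaves the child running. A kernel-based executor would need an equivalent interrupt sent to the kernel, which this sketch does not cover.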
Log files found in the agent's workspace during this scan. Reviewing these files provides runtime evidence to complement the static analysis above.
| Path | Source | Content Type | Purpose | Last Modified | Status |
|---|---|---|---|---|---|
| (runtime stderr / Python logging handlers — not configured by the subsystem) | docker/_docker_code_executor.py, local/__init__.py, docker_jupyter/_jupyter_server.py | unstructured plaintext (logging.debug/info/error and warnings.warn) | container lifecycle debug, cancellation diagnostics, temp-file cleanup errors, and security warnings — not per-execution audit records | unknown | Inferred |
Each card represents one category and shows the top 3 findings. All items in the Findings section.
Overall maturity assessment across the six categories of the RAISE framework. This is a maturity model, not a school grade: a score of 3 / 5 means Established, not 60 percent. Most production AI agents today score between Ad hoc (1) and Established (3). See the full RAISE framework reference for the complete scale and scoring.
Maturity Scoring Rubric
Every score above is based on this scale. A score is a snapshot of observable posture — not a verdict on the people or team behind the system.
| Score | Label | Meaning |
|---|---|---|
| 5 | Exemplary | Best-in-class; automated, continuously tested, reference quality. Rarely achieved in shipping systems. |
| 4 | Strong | Comprehensive controls, active management, minor gaps. Production-ready. |
| 3 | Established | Documented controls consistently applied; known gaps accepted. A respectable baseline. |
| 2 | Partial | Some controls exist but coverage is incomplete; key gaps remain. |
| 1 | Ad hoc | Informal or inconsistent measures; relies on individual judgment. |
| 0 | Absent | No evidence this category is addressed at all. |