PRAXEN
agent behavior verifier
Deep Agents CLI Analysis Report
Completed May 29, 2026
7 Findings
3 High
4 Medium
RAISE maturity 2.15 / 5.0
Executive Summary
Agent Remit (as declared)
Deep Agents CLI is a local developer command-line tool that scaffolds, bundles, and ships a `deepagents` agent for deployment via three subcommands — `init`, `dev`, and `deploy` — running the bundle on a local `langgraph dev` server or pushing it to a managed deployment platform (LangSmith/LangGraph Platform) under the developer's own credentials. Its central promise is that the bundle faithfully and only reflects the developer's reviewed project: assembled solely from declared sources, with no surface added or exposed that the developer did not configure. The remit requires an explicit operator-confirmation gate before any deploy that would leave the API open without authentication, mandates that remote MCP servers be reached over TLS and pinned to a known-good version, and forbids credentials from ever being written, committed, logged, or folded into the seeded payload.
Behavior Summary (as observed)
The dominant pattern is *operative controls with honest, scoped gaps in the deploy path* — the bundler genuinely assembles only declared sources, skips `.env` from the seed payload, and validates config before shipping, but three guarantees the remit makes are enforced only partially. The headline gap is a conditional safety gate: the anonymous-open-API confirmation in `commands._deploy()` fires only when `[frontend].enabled`, so an anonymous-auth deploy with no frontend generates the permissive `auth.py` and ships an open API with no warning and no prompt. Alongside it, `_validate_mcp_for_deploy()` validates MCP transport type but never checks that http/sse URLs use TLS, and remote MCP servers are carried into the bundle with no version pin or integrity check — both direct divergences from Action-Boundary and Known-Good-Baseline rules. Underneath, the deploy tooling installs no logging and ships no red-team or security-disclosure artifacts, and the in-repo threat model is explicitly stale (pre-0.1.0 split).
Scope of Analysis
Python package `deepagents_cli` whose in-scope surface is the `deploy` module: `commands.py` (the `init`/`dev`/`deploy` handlers), `bundler.py` (assembles the build directory and `_seed.json`), `config.py` (parses and validates `deepagents.toml`), `templates.py` (string templates for the generated `deploy_graph.py`, `auth.py`, `pyproject.toml`, MCP loader), and `context_hub.py` (LangSmith Hub backend). `config.py:validate()` rejects unknown TOML sections and validates MCP transport, and `bundler._build_seed()` explicitly excludes `.env` from the seed payload. The anonymous-auth confirmation prompt lives in `commands._deploy()` but is guarded by a frontend-enabled condition; `_validate_mcp_for_deploy()` checks transport type but not URL scheme; no `logging` handler is ever configured; and the root `.mcp.json` declares two remote HTTP MCP servers (langchain docs/reference) with no version pin.
Remit Coverage

Every actionable rule in the Worker Remit, checked against the running code. Gap = declared but unenforced; Partial = enforced but incomplete or bypassable; Vague Policy = too imprecise to verify.

Verified: 7 · Gap: 1 · Partial: 6 · Vague Policy: 0 · Enforcement Not Possible: 0 · Total Rules: 14
Rule ID | Section | Rule (quoted) | Status | Finding
R-01 | Action Boundaries | "The CLI MUST bundle only from the project's declared sources — its configuration, system prompt, skills, declared MCP servers, and declared subagents — and MUST NOT pull in undeclared content." | Verified | n/a
R-02 | Approval Requirements | "A deploy that opens the deployed API without authentication MUST be confirmed by the operator before it runs; it MUST NOT proceed unattended." | Partial | PRAX-2026-05-29-001
R-03 | Action Boundaries | "Remote MCP servers carried into the bundle MUST be reached over TLS, and their transport configuration MUST be validated before the bundle is shipped." | Partial | PRAX-2026-05-29-002
R-04 | Action Boundaries | "Project configuration MUST be validated before bundling, and a bundle that fails validation MUST NOT be deployed." | Verified | n/a
R-05 | Forbidden Actions | "Credentials — deployment-platform keys, model-provider API keys, hub and tracing tokens, frontend auth secrets — MUST NOT be written into the project, committed to version control, logged, or printed." | Verified | n/a
R-06 | Forbidden Actions | "Secret material MUST NOT be embedded into bundle artifacts that are not meant to carry it — environment files travel as environment files, never folded into the seeded-memory or skills payload." | Verified | n/a
R-07 | Forbidden Actions | "The CLI MUST NOT silently mutate, generate, or deploy any agent surface the developer did not declare in the project." | Partial | PRAX-2026-05-29-001
R-08 | Behavioral Expectations | "Before a deploy, the CLI MUST present the operator a clear summary of what the bundle contains and where it will be shipped — enough for the developer to recognise the surface being deployed." | Verified | n/a
R-09 | Behavioral Expectations | "A dry run MUST generate the deployment artifacts without shipping them or contacting the deployment platform's mutating endpoints." | Partial | PRAX-2026-05-29-006
R-10 | Escalation and Limits | "The project SHOULD publish a threat model and a security-disclosure process, and SHOULD keep the threat model current with what the package actually ships — confirming that the bundler carries only declared sources, that secrets never land in the seeded payload, and that the unauthenticated-deploy confirmation gate genuinely fires." | Partial | PRAX-2026-05-29-005
R-11 | Known Good Baseline | "Dependencies MUST be version-controlled with a committed, pinned lockfile, pinned to compatible ranges, and the dependency tree kept small and reviewable." | Verified | n/a
R-12 | Known Good Baseline | "The deployment bundle's own dependency set MUST be derived only from the project's declared model provider, declared MCP usage, declared sandbox provider, and declared auth provider — never hand-edited into the bundle out of band." | Partial | PRAX-2026-05-29-004
R-13 | Known Good Baseline | "Remote MCP servers declared in the project MUST be pinned to a known-good, integrity-checked version, so the deployed agent does not auto-install an unpinned server afresh." | Gap | PRAX-2026-05-29-003
R-14 | Behavioral Expectations | "The CLI operates under direct developer supervision as a one-shot command; it MUST NOT run as an unattended background service." | Verified | n/a
Findings Register

Findings, ordered by severity, each linked to its remit rule, evidence, and a recommended action.

HIGH PRAX-2026-05-29-001 The anonymous-open-API confirmation gate fires only when a frontend is configured, so an anonymous-auth deploy with no frontend ships an open API silently.
Policy Rule — R-02, R-07 (Worker Remit):
"A deploy that opens the deployed API without authentication MUST be confirmed by the operator before it runs; it MUST NOT proceed unattended. / The CLI MUST NOT silently mutate, generate, or deploy any agent surface the developer did not declare in the project."
libs/cli/deepagents_cli/deploy/commands.py:287 — is_anonymous requires config.frontend.enabled — lines 287-293; anonymous auth without a frontend bypasses the warning+confirm block at 293-326 entirely. libs/cli/deepagents_cli/deploy/bundler.py:246 — auth.py is written whenever auth_provider is not None (line 246), so the permissive AUTH_BLOCK_ANONYMOUS handler ships even when no frontend triggered the confirm.
Recommended Action
  • In `commands._deploy()` compute the anonymous-API condition from `config.auth.provider == "anonymous"` alone (drop the `frontend.enabled` requirement) so the warning and confirmation fire for every anonymous deploy.
  • Add a unit test asserting that an anonymous-auth, no-frontend deploy raises the confirmation prompt and aborts on a non-`y` answer.
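The corrected gate could look roughly like the sketch below. It is a simplified stand-in, not the CLI's code: `DeploySettings`, `needs_open_api_confirmation`, and `confirm_open_api` are hypothetical names, and the config shape only mirrors the `auth.provider` and `frontend.enabled` fields named above. The condition reads the auth provider alone, and anything other than an explicit `y` (including end-of-input on an unattended run) aborts.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AuthSettings:
    provider: Optional[str] = None  # e.g. "anonymous"; None when no auth is declared


@dataclass
class FrontendSettings:
    enabled: bool = False


@dataclass
class DeploySettings:
    """Hypothetical, simplified stand-in for the parsed deepagents.toml."""
    auth: AuthSettings = field(default_factory=AuthSettings)
    frontend: FrontendSettings = field(default_factory=FrontendSettings)


def needs_open_api_confirmation(settings: DeploySettings) -> bool:
    # Decided by the auth provider alone; frontend.enabled is deliberately ignored.
    return settings.auth.provider == "anonymous"


def confirm_open_api(settings: DeploySettings, ask: Callable[[str], str] = input) -> bool:
    """Return True if the deploy may proceed, False if it must abort."""
    if not needs_open_api_confirmation(settings):
        return True
    try:
        answer = ask("This deploy exposes the API without authentication. Continue? [y/N] ")
    except EOFError:
        # No interactive operator (e.g. closed stdin): never proceed unattended.
        return False
    return answer.strip().lower() == "y"
```

Injecting `ask` keeps the prompt testable: the unit test requested above can pass a lambda returning `"n"` and assert that the deploy aborts.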
HIGH PRAX-2026-05-29-002 Deploy validates MCP transport type but never checks that http/sse MCP endpoints use TLS, so a plaintext http:// MCP server passes validation and ships.
Policy Rule — R-03 (Worker Remit):
"Remote MCP servers carried into the bundle MUST be reached over TLS, and their transport configuration MUST be validated before the bundle is shipped."
libs/cli/deepagents_cli/deploy/config.py:293 — _validate_mcp_for_deploy loops servers checking only `transport == "stdio"` (lines 293-299); no check that server_config["url"] starts with https://. libs/cli/deepagents_cli/deploy/templates.py:446 — _load_mcp_tools accepts any http/sse url via _expand(cfg["url"]) (lines 446-466) with no TLS scheme guard before MultiServerMCPClient connects.
Recommended Action
  • In `_validate_mcp_for_deploy()`, after expanding `${VAR}` references, reject any http/sse server whose `url` does not begin with `https://` (allowing `http://127.0.0.1`/`localhost` only if a local-loopback exception is intended).
  • Add a unit test that a plaintext `http://` MCP url produces a validation error and aborts the deploy.
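A possible shape for the TLS check, assuming each server entry is a dict with an optional `url` key and that `${VAR}` references expand like shell variables. This sketch uses `os.path.expandvars`, which may not behave exactly like the CLI's own `_expand`; `mcp_tls_errors` and the loopback allowlist are hypothetical.

```python
import os
from urllib.parse import urlparse

# Hypothetical local-development exception list.
LOOPBACK_HOSTS = frozenset({"localhost", "127.0.0.1", "::1"})


def mcp_tls_errors(servers: dict, allow_loopback_http: bool = False) -> list[str]:
    """Return one error per URL-bearing MCP server that is not reached over HTTPS."""
    errors = []
    for name, cfg in servers.items():
        if "url" not in cfg:
            continue  # stdio servers have no URL to check
        # Expand ${VAR} references first so the check sees the real scheme.
        url = os.path.expandvars(cfg["url"])
        parsed = urlparse(url)
        if parsed.scheme == "https":
            continue
        if (
            allow_loopback_http
            and parsed.scheme == "http"
            and parsed.hostname in LOOPBACK_HOSTS
        ):
            continue
        errors.append(f"MCP server {name!r}: url must use https:// (got {url!r})")
    return errors
```

An unset variable is left unexpanded by `os.path.expandvars`, so a URL such as `${MISSING}/mcp` has no scheme and is rejected rather than silently accepted.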
HIGH PRAX-2026-05-29-003 Remote MCP servers are copied into the bundle by URL only, with no version pin or integrity check, so the deployed agent resolves an unpinned server at runtime.
Policy Rule — R-13 (Worker Remit):
"Remote MCP servers declared in the project MUST be pinned to a known-good, integrity-checked version, so the deployed agent does not auto-install an unpinned server afresh."
.mcp.json:2 — Two remote http MCP servers declared as bare urls (docs-langchain, reference-langchain) — no version, digest, or pin field. libs/cli/deepagents_cli/deploy/bundler.py:194 — mcp.json copied verbatim to _mcp.json (lines 194-197) with no integrity or version processing before it ships in the bundle.
Recommended Action
  • Require declared remote MCP servers to carry a pinned, integrity-checkable reference (e.g. a content digest or version field) and verify it in `_validate_mcp_for_deploy()` before copying into the bundle.
  • Record the resolved MCP server identity/digest in the bundle summary so the operator sees exactly which MCP surface is being shipped.
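One way to express the pin requirement, assuming a hypothetical schema in which each remote server entry carries a `version` string and a `sha256` digest of some artifact the operator can fetch and hash (for example a published manifest). Neither field exists in the current `.mcp.json`; both names, and the helpers below, are illustrative.

```python
import hashlib
import re

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def mcp_pin_errors(servers: dict) -> list[str]:
    """Flag remote MCP servers that lack a version pin or a well-formed digest."""
    errors = []
    for name, cfg in servers.items():
        if "url" not in cfg:
            continue  # only remote (URL-bearing) servers need a pin
        if not cfg.get("version"):
            errors.append(f"MCP server {name!r}: missing 'version' pin")
        if not _SHA256_HEX.fullmatch(cfg.get("sha256") or ""):
            errors.append(f"MCP server {name!r}: missing or malformed 'sha256' digest")
    return errors


def digest_matches(content: bytes, expected_sha256: str) -> bool:
    """Integrity check: compare the fetched artifact's SHA-256 with the pinned value."""
    return hashlib.sha256(content).hexdigest() == expected_sha256
```

A bare-URL entry like the two in the root `.mcp.json` would produce two errors each under this check.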
MEDIUM PRAX-2026-05-29-004 The generated bundle's inferred dependencies are emitted as bare unpinned package names with no committed bundle lockfile.
Policy Rule — R-12 (Worker Remit):
"The deployment bundle's own dependency set MUST be derived only from the project's declared model provider, declared MCP usage, declared sandbox provider, and declared auth provider — never hand-edited into the bundle out of band."
libs/cli/deepagents_cli/deploy/bundler.py:480 — deps list built from bare provider/sandbox/auth package names (lines 480-512) with no version specifier appended. libs/cli/deepagents_cli/deploy/templates.py:1129 — PYPROJECT_TEMPLATE pins only deepagents==0.5.3; {extra_deps} are interpolated as unpinned strings and no lockfile is generated for the bundle.
Recommended Action
Pin each inferred dependency in `_render_pyproject()` to a tested compatible range and emit (or generate) a lockfile alongside the bundle's `pyproject.toml`.
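A sketch of the pinning step, assuming a hypothetical `tested_ranges` table maintained alongside the templates. The package names and ranges used with it are placeholders, not tested ranges for real packages. Failing on an unknown name keeps an unpinned dependency from slipping into the bundle silently.

```python
def render_dependencies(inferred: list[str], tested_ranges: dict[str, str]) -> list[str]:
    """Attach a tested version range to every inferred dependency, or fail loudly."""
    unique = list(dict.fromkeys(inferred))  # de-duplicate, keep first-seen order
    missing = [name for name in unique if name not in tested_ranges]
    if missing:
        raise ValueError(f"no tested version range for: {', '.join(missing)}")
    return [f"{name}{tested_ranges[name]}" for name in unique]
```

Once `pyproject.toml` carries pinned ranges, a lockfile could be generated in the build directory with `uv lock`, the same tool behind the CLI's own committed `uv.lock`.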
MEDIUM PRAX-2026-05-29-005 The deploy tooling installs no logging and the bundled threat model is stale, leaving deploy actions recorded only as transient print output.
Policy Rule — R-10 (Worker Remit):
"The project SHOULD publish a threat model and a security-disclosure process, and SHOULD keep the threat model current with what the package actually ships — confirming that the bundler carries only declared sources, that secrets never land in the seeded payload, and that the unauthenticated-deploy confirmation gate genuinely fires."
libs/cli/deepagents_cli/deploy/commands.py:572 — Deploy progress reported via print() ("Deploying to LangSmith Deployments...", lines 572-582); no logging handler is configured anywhere in deepagents_cli. libs/cli/THREAT_MODEL.md:5 — Banner (lines 5-13) states the doc predates the 0.1.0 split and its REPL/MCP/sandbox sections no longer apply — stale versus what the package now ships.
Recommended Action
  • Configure a structured logger (JSON or key-value, file or stderr handler) for the deploy path that records each bundle and deploy action with timestamp, agent name, auth mode, and destination.
  • Regenerate `THREAT_MODEL.md` against the current `deepagents_cli.deploy` surface and add a SECURITY.md disclosure process.
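A minimal structured audit logger using only the standard `logging` module. The logger name `deepagents_cli.deploy.audit` and the field set (`action`, `agent_name`, `auth_mode`, `destination`) are assumptions chosen to match the recommendation above, not existing CLI names.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    FIELDS = ("action", "agent_name", "auth_mode", "destination")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for name in self.FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)


def deploy_logger(stream=None) -> logging.Logger:
    """Return the audit logger, attaching a JSON handler on first use."""
    logger = logging.getLogger("deepagents_cli.deploy.audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A deploy step would then call, for example, `deploy_logger().info("deploy started", extra={"action": "deploy", "agent_name": "my-agent", "auth_mode": "anonymous", "destination": "langsmith"})`; swapping `StreamHandler` for `logging.FileHandler` gives a durable file record instead of stderr.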
MEDIUM PRAX-2026-05-29-006 The local `dev` command seeds the remote LangSmith hub repo before starting the local server, contacting a mutating platform endpoint during a nominally local run.
Policy Rule — R-09 (Worker Remit):
"A dry run MUST generate the deployment artifacts without shipping them or contacting the deployment platform's mutating endpoints."
libs/cli/deepagents_cli/deploy/commands.py:415 — _dev calls _seed_hub_repo(config, build_dir) (line 415) before the langgraph dev launch, with no dry-run/local-only guard. libs/cli/deepagents_cli/deploy/commands.py:533 — _seed_hub_repo performs backend.upload_files(batch) (line 533) — a mutating commit to the remote LangSmith Hub repo, default backend="hub".
Recommended Action
Gate `_seed_hub_repo()` in `_dev()` behind an explicit opt-in (or route `dev` to the local store backend) so a local development run does not create or mutate a remote hub repo by default.
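A sketch of the opt-in, with the seeding and server-launch steps injected as callables so the default path can be checked without network access. `run_dev` and the `seed_remote` flag are hypothetical; `seed_hub_repo` stands in for the CLI's `_seed_hub_repo(config, build_dir)`.

```python
from typing import Any, Callable


def run_dev(
    config: Any,
    build_dir: str,
    *,
    seed_hub_repo: Callable[[Any, str], None],
    start_server: Callable[[str], None],
    seed_remote: bool = False,
) -> None:
    """Start the local server; touch the remote hub repo only on explicit opt-in."""
    if seed_remote:
        # Mutating call against the remote hub; reached only when the developer asks.
        seed_hub_repo(config, build_dir)
    start_server(build_dir)
```

Defaulting `seed_remote` to `False` makes the local run side-effect free unless the developer passes an explicit flag.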
MEDIUM PRAX-2026-05-29-007 No adversarial-testing artifact or security-scanning CI exists for the deploy package; testing is functional only.
libs/cli/tests/unit_tests/deploy — Test suite asserts seed/config/auth-present shapes only; no test covers the anonymous-no-frontend open-API path or a non-TLS MCP url. .github/workflows — Workflows cover lint/test/lockfile/version checks; no codeql/bandit/semgrep/pip-audit security-scanning job present.
Recommended Action
Add negative-path tests for the security gates (anonymous-no-frontend confirmation, non-TLS MCP rejection, unpinned-MCP rejection) and wire a static-analysis/dependency-audit job into CI.
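The negative-path tests could take roughly this shape. To stay self-contained, the sketch tests a hypothetical `validate_for_deploy` stand-in rather than the CLI's real validators, and the `sha256` pin field is an illustrative assumption.

```python
import unittest
from typing import Callable, Optional
from urllib.parse import urlparse


def validate_for_deploy(
    auth_provider: Optional[str], mcp_servers: dict, confirm: Callable[[], bool]
) -> list[str]:
    """Hypothetical stand-in for the deploy gates; returns a list of errors."""
    errors = []
    for name, cfg in mcp_servers.items():
        if "url" not in cfg:
            continue  # stdio servers carry no URL
        if urlparse(cfg["url"]).scheme != "https":
            errors.append(f"{name}: non-TLS url")
        if not cfg.get("sha256"):
            errors.append(f"{name}: unpinned")
    # The gate takes no frontend input, so a no-frontend deploy cannot skip it.
    if auth_provider == "anonymous" and not confirm():
        errors.append("anonymous deploy not confirmed")
    return errors


class DeployGateNegativeTests(unittest.TestCase):
    def test_anonymous_deploy_without_confirmation_is_rejected(self):
        errors = validate_for_deploy("anonymous", {}, confirm=lambda: False)
        self.assertEqual(errors, ["anonymous deploy not confirmed"])

    def test_plaintext_mcp_url_is_rejected(self):
        servers = {"docs": {"url": "http://example.invalid/mcp", "sha256": "0" * 64}}
        self.assertEqual(validate_for_deploy(None, servers, lambda: True), ["docs: non-TLS url"])

    def test_unpinned_mcp_server_is_rejected(self):
        servers = {"docs": {"url": "https://example.invalid/mcp"}}
        self.assertEqual(validate_for_deploy(None, servers, lambda: True), ["docs: unpinned"])
```

In the real suite the same three cases would target `commands._deploy()` and `_validate_mcp_for_deploy()` directly, and could run under `python -m unittest` or pytest in the existing test workflow.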
What's Working Well

Controls and behaviors that are correctly implemented and verified during this scan. These represent areas where the agent's implementation aligns with its stated policy and security best practices.

Bundle assembled solely from declared project sources

`bundler._build_seed` and `bundle()` read only the canonical declared layout (AGENTS.md, skills/, mcp.json, user/, subagents/) and never pull in undeclared content, satisfying the CLI's core declared-sources-only guarantee.

libs/cli/deepagents_cli/deploy/bundler.py:328
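The allowlist approach can be illustrated with a small collector that walks only the declared layout. This is a sketch, not `bundler.bundle()`: `collect_declared_sources` and the two tuples are hypothetical and only mirror the layout listed above. Anything outside the allowlist, such as a stray `notes.txt` or `.env`, is never visited.

```python
from pathlib import Path

# Hypothetical allowlist mirroring the declared layout named in the report.
DECLARED_FILES = ("AGENTS.md", "mcp.json")
DECLARED_DIRS = ("skills", "user", "subagents")


def collect_declared_sources(project: Path) -> list[str]:
    """Return project-relative paths drawn only from the declared layout."""
    found = []
    for name in DECLARED_FILES:
        if (project / name).is_file():
            found.append(name)
    for name in DECLARED_DIRS:
        root = project / name
        if root.is_dir():
            found.extend(
                p.relative_to(project).as_posix()
                for p in sorted(root.rglob("*"))
                if p.is_file()
            )
    return found
```

Enumerating what is allowed, rather than filtering out what is forbidden, means a new undeclared file cannot reach the bundle by default.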

Secrets excluded from the seeded-memory payload

The bundler builds `_seed.json` from memory and skills only and copies `.env` as a standalone file, so model/platform credentials never get folded into the seeded-memory or skills payload.

libs/cli/deepagents_cli/deploy/bundler.py:199
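A simplified model of that separation, assuming a hypothetical seed layout with `memory` and `skills` keys; the real `_seed.json` schema may differ. `write_bundle` is an illustrative name. The `.env` file is copied byte-for-byte and never read into the dictionary that becomes the seed.

```python
import json
import shutil
from pathlib import Path


def write_bundle(project: Path, build_dir: Path) -> None:
    """Write _seed.json from memory and skills only; copy .env as a separate file."""
    seed = {"memory": {}, "skills": {}}
    agents_md = project / "AGENTS.md"
    if agents_md.is_file():
        seed["memory"]["AGENTS.md"] = agents_md.read_text()
    skills = project / "skills"
    if skills.is_dir():
        for path in sorted(skills.rglob("*")):
            if path.is_file():
                seed["skills"][path.relative_to(skills).as_posix()] = path.read_text()
    build_dir.mkdir(parents=True, exist_ok=True)
    (build_dir / "_seed.json").write_text(json.dumps(seed, indent=2))
    env_file = project / ".env"
    if env_file.is_file():
        # Secrets travel as their own file and are never read into the seed dict.
        shutil.copy2(env_file, build_dir / ".env")
```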

Config validated before any bundle is produced

`DeployConfig.validate` checks required files, MCP transport, sandbox/auth/memories settings, and credentials, and `_deploy`/`_dev` abort on any error before bundling — a failing config is never shipped.

libs/cli/deepagents_cli/deploy/config.py:204

Committed pinned lockfile and dependency-hygiene CI for the CLI

The CLI ships a committed `uv.lock` and the repo runs dependabot plus `check_lockfiles` and `check_sdk_pin` workflows, keeping the CLI's own dependency tree pinned and reviewable.

libs/cli/uv.lock:1

Pre-deploy bundle summary presented to the operator

`print_bundle_summary` shows the agent name, model, auth mode (explicitly flagging "anonymous (API open to anyone)"), seeded files, MCP presence, sandbox, and destination before a deploy proceeds.

libs/cli/deepagents_cli/deploy/bundler.py:520
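A sketch of the kind of summary described, written as a pure function so the wording of the anonymous-auth warning can be asserted in tests. `format_bundle_summary` and its parameters are hypothetical; only the label "anonymous (API open to anyone)" is taken from the report.

```python
from typing import Optional


def format_bundle_summary(
    agent: str,
    model: str,
    auth_mode: Optional[str],
    seeded_files: list[str],
    has_mcp: bool,
    sandbox: Optional[str],
    destination: str,
) -> str:
    """Render a plain-text pre-deploy summary for the operator."""
    if auth_mode == "anonymous":
        auth_label = "anonymous (API open to anyone)"  # flagged explicitly
    else:
        auth_label = auth_mode or "not configured"
    rows = [
        ("Agent", agent),
        ("Model", model),
        ("Auth", auth_label),
        ("Seeded files", ", ".join(seeded_files) or "none"),
        ("MCP", "yes" if has_mcp else "no"),
        ("Sandbox", sandbox or "none"),
        ("Destination", destination),
    ]
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label.ljust(width)} : {value}" for label, value in rows)
```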
Discovered Log Files

Log files found in the agent's workspace during this scan. Reviewing these files provides runtime evidence to complement the static analysis above.

No logging handler is configured anywhere in `deepagents_cli` (no basicConfig/FileHandler/dictConfig); deploy progress is emitted only via print(), and the generated bundle ships no logging config — see PRAX-2026-05-29-005.
OWASP LLM Top 10 (2025) Coverage

All findings are listed in the Findings section.

OWASP Agentic Top 10 (2026) Coverage

All findings are listed in the Findings section.

RAISE Maturity Posture

Overall maturity assessment across the six categories of the RAISE framework. This is a maturity model, not a school grade: a score of 3 / 5 means Established, not 60 percent. Most production AI agents today score between Ad hoc (1) and Established (3). See the full RAISE framework reference for the complete scale and scoring.

2.15 / 5.0
Weighted Maturity Score · Partial
Partial. Deep Agents CLI has a coherent, narrowly scoped deploy surface with several real controls on the agent's path: declared-source-only bundling, a secrets-excluded seed payload, pre-bundle config validation, a committed pinned lockfile for the CLI itself, and an anonymous-deploy confirmation prompt. It is therefore not a hobby-grade target. But its strongest remit promises are enforced only partially: the open-API confirmation gate is conditional on a frontend being configured, MCP TLS is never checked, and the deployed bundle's own dependencies (and the remote MCP servers it carries) ship unpinned. The Zero Trust, red-team, and monitoring categories pull the score down: no logging is installed, there is no adversarial-testing or security-disclosure evidence, and the bundled threat model is stale.
Limit Your Domain
3 / 5
Confidence: High  |  Weight: 15%  |  Weighted: 0.45
The CLI exposes exactly three subcommands (`init`/`dev`/`deploy`), with bare invocation redirecting to the separate `deepagents-code` package, and `config._parse_config` rejects unknown TOML sections and keys, so the deploy surface is tightly scoped to the remit's Known Good Baseline and that scoping is enforced in code.
Balance Your Knowledge Base
3 / 5
Confidence: Medium  |  Weight: 15%  |  Weighted: 0.45
The CLI does not ingest external content into an LLM context — it bundles only the project's declared sources — and `bundler._build_seed` deliberately omits `.env` so secrets never enter the seeded memory payload; the residual concern is that declared MCP/skills content is carried verbatim and the in-repo threat model is stale.
Implement Zero Trust
2 / 5
Confidence: High  |  Weight: 25%  |  Weighted: 0.50
`config.validate` runs before any bundle and the anonymous-deploy confirmation prompt exists, but the prompt is gated on `[frontend].enabled` (so an anonymous deploy with no frontend ships an open API silently) and `_validate_mcp_for_deploy` checks transport type without ever verifying http/sse URLs use TLS — real controls with exploitable gaps.
Manage Your Supply Chain
3 / 5
Confidence: High  |  Weight: 15%  |  Weighted: 0.45
The CLI ships a committed `uv.lock` plus dependabot and `check_lockfiles`/`check_sdk_pin` CI, but the generated bundle's `pyproject.toml` adds model/sandbox/auth partner deps as bare unpinned names with no bundle lockfile, and remote MCP servers are carried into the bundle with no version pin or integrity check.
Build an AI Red Team
1 / 5
Confidence: Medium  |  Weight: 15%  |  Weighted: 0.15
Functional unit and integration tests exist for the bundler, config, and hub backend, but there is no adversarial-testing artifact, no security-scanning CI workflow (no codeql/bandit/semgrep/pip-audit), no SECURITY.md disclosure process, and the bundled `THREAT_MODEL.md` is explicitly stale (pre-0.1.0 split).
Monitor Continuously
1 / 5
Confidence: High  |  Weight: 15%  |  Weighted: 0.15
Modules obtain `logging.getLogger(__name__)` loggers but no `basicConfig`, `FileHandler`, or `dictConfig` is ever installed by the deploy tooling, and the generated bundle ships no logging configuration — deploy progress is conveyed only through `print()` statements, leaving no durable, structured action record.

Maturity Scoring Rubric

Every score above is based on this scale. A score is a snapshot of observable posture — not a verdict on the people or team behind the system.

Score | Label | Meaning
5 | Exemplary | Best-in-class; automated, continuously tested, reference quality. Rarely achieved in shipping systems.
4 | Strong | Comprehensive controls, active management, minor gaps. Production-ready.
3 | Established | Documented controls consistently applied; known gaps accepted. A respectable baseline.
2 | Partial | Some controls exist but coverage is incomplete; key gaps remain.
1 | Ad hoc | Informal or inconsistent measures; relies on individual judgment.
0 | Absent | No evidence this category is addressed at all.
Weighting: the weighted overall above is the sum of each category's score × weight (the per-category weights are shown with each category). Zero Trust carries a heavier weight by design (25%, against 15% for each of the other five categories); see the RAISE framework reference for the rationale.
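The overall figure can be reproduced from the six category entries. This arithmetic check uses only the scores and weights printed in the report, and also confirms that the weights sum to 100%.

```python
# (score, weight) pairs as reported in each category entry.
CATEGORIES = {
    "Limit Your Domain": (3, 0.15),
    "Balance Your Knowledge Base": (3, 0.15),
    "Implement Zero Trust": (2, 0.25),
    "Manage Your Supply Chain": (3, 0.15),
    "Build an AI Red Team": (1, 0.15),
    "Monitor Continuously": (1, 0.15),
}


def weighted_maturity(categories: dict[str, tuple[int, float]]) -> float:
    """Sum of score x weight, after checking that the weights total 1.0."""
    total_weight = sum(weight for _, weight in categories.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return round(sum(score * weight for score, weight in categories.values()), 2)
```

`weighted_maturity(CATEGORIES)` returns 2.15 (0.45 + 0.45 + 0.50 + 0.45 + 0.15 + 0.15), matching the headline score.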