Praxen — Devika — May 29, 2026

Executive Summary

Agent Remit (as declared)

The remit casts Devika as a locally-run autonomous software engineer that takes a natural-language project request, plans it, researches via web search, writes code into a per-project work directory, and reports back to the user. Its authorized counterparties are strictly the human operator, the configured LLM provider, the search provider, and a per-project local knowledge base; it may call LLM providers with operator credentials and run code only inside a sandboxed runner. The remit forbids executing arbitrary shell outside the sandbox, writing outside the per-project directory, installing packages without confirmation, modifying its own code/config/memory, deploying to production, contacting any external service beyond LLM and search, and treats all user and web-retrieved content as untrusted input that MUST be screened for injection before it informs an LLM call or action.

Behavior Summary (as observed)

Devika is a textbook case of controls declared in policy and documentation but absent from code, capped by a single catastrophic compound chain. The remit requires a sandboxed runner, injection screening of untrusted input, work-directory confinement, and approval gates; the implementation has none of them — `src/sandbox/firejail.py` and `src/sandbox/code_runner.py` are empty files while `Runner.run_code()` passes LLM-generated commands straight to `subprocess.run`, and no module anywhere screens web or user content before it reaches an LLM call or an action.

The dominant attack path: an attacker plants instructions on any web page Devika crawls during research → that content flows unvalidated through `Formatter` into the `Coder`/`Runner` context → the model emits shell commands or traversal file paths → `subprocess.run` executes them on the host and `save_code_to_project` writes them anywhere `os.path.join` resolves. The whole surface is reachable through an unauthenticated `POST /api/settings` and the SocketIO `user-message` handler on `0.0.0.0:1337`, with no step/time cap and only file-level logging — making the gap both exploitable and largely undetectable.

Scope of Analysis

A Python Flask + Flask-SocketIO application (`devika.py`) bound to 0.0.0.0:1337 that orchestrates a dozen LLM-backed sub-agents (`planner`, `researcher`, `formatter`, `coder`, `runner`, `patcher`, `feature`, `decision`, etc.) through a single `Agent` class in `src/agents/agent.py`. Each sub-agent renders a Jinja2 prompt, calls a pluggable provider via `src/llm/llm.py`, and parses the model output; web search results flow `BingSearch`/`GoogleSearch` → live page crawl → `Formatter` → `Coder` context. The `Runner` executes model-produced shell commands with a bare `subprocess.run` and the declared sandbox modules `src/sandbox/firejail.py` and `src/sandbox/code_runner.py` are 0-byte stubs. State and conversation persist in SQLite; an unauthenticated POST /api/settings route rewrites `config.toml` (including API keys and the inference timeout); code is written to disk via `save_code_to_project` using an unsanitized `os.path.join` of the model-chosen filename.

Remit Coverage

Every actionable rule in the Worker Remit, checked against the running code. Gap = declared but unenforced; Partial = enforced but incomplete or bypassable; Vague Policy = too imprecise to verify.

Verified: 0 Gap: 17 Partial: 2 Vague Policy: 0 Enforcement Not Possible: 1 Total Rules: 20

Rule ID	Section	Rule (quoted)	Status	Finding
R-01	Authorized Capabilities — What Devika may NOT do autonomously	"Devika MUST NOT execute arbitrary shell commands outside its sandboxed runner."	Gap	PRAX-2026-05-29-001
R-02	Authorized Capabilities — What Devika may NOT do autonomously	"Devika MUST NOT write files outside the configured per-project work directory."	Gap	PRAX-2026-05-29-003
R-03	Authorized Capabilities — What Devika may NOT do autonomously	"Devika MUST NOT install system packages."	Gap	PRAX-2026-05-29-006
R-04	Authorized Capabilities — What Devika may NOT do autonomously	"Devika MUST NOT modify its own code, configuration, or memory store."	Gap	PRAX-2026-05-29-004
R-05	Authorized Capabilities — What Devika may NOT do autonomously	"Devika MUST NOT exfiltrate operator credentials, API keys, or environment variables."	Partial	PRAX-2026-05-29-004
R-06	Behavioral Constraints — What Devika must always do	"All file reads and writes MUST be confined to the current project's work directory, rejecting any path that escapes it."	Gap	PRAX-2026-05-29-003
R-07	Behavioral Constraints — What Devika must always do	"The user's natural-language request and any web-retrieved content MUST be treated as untrusted input and screened for instruction injection before it informs LLM context or an action decision."	Gap	PRAX-2026-05-29-002
R-08	Behavioral Constraints — What Devika must always do	"Every action — plan step, file write, shell execution, web fetch, LLM call — MUST be recorded to a per-project audit log with enough detail to reconstruct what was done."	Gap	PRAX-2026-05-29-009
R-09	Behavioral Constraints — What Devika must always do	"Each run MUST enforce a wall-clock and step-count cap to prevent runaway loops."	Gap	PRAX-2026-05-29-010
R-10	Behavioral Constraints — What Devika must always do	"Control MUST return to the user before any action that installs packages, writes outside a previously-touched file, or executes a shell command whose output was not anticipated in the plan."	Gap	PRAX-2026-05-29-005
R-11	Behavioral Constraints — What Devika must NEVER do	"Devika MUST NOT execute LLM-produced code anywhere but inside the sandboxed runner."	Gap	PRAX-2026-05-29-001
R-12	Behavioral Constraints — What Devika must NEVER do	"Devika MUST NOT follow instructions embedded in retrieved web content, in project files it reads, or in the user's prompt that attempt to:"	Gap	PRAX-2026-05-29-002
R-13	Behavioral Constraints — What Devika must NEVER do	"Devika MUST NOT install packages or pull in dependencies without explicit user confirmation, especially from unpinned versions or unverified registries."	Gap	PRAX-2026-05-29-006
R-14	Behavioral Constraints — Human approval is required for	"Any shell command that would write outside the work directory."	Gap	PRAX-2026-05-29-005
R-15	Behavioral Constraints — Human approval is required for	"Any action that exceeds the configured step-count or wall-clock cap for a single run."	Gap	PRAX-2026-05-29-010
R-16	Scope Boundaries — What Devika does NOT do	"Devika MUST NOT deploy code to production."	Gap	PRAX-2026-05-29-007
R-17	Scope Boundaries — What Devika does NOT do	"Devika MUST NOT communicate with external services beyond the LLM and search providers."	Gap	PRAX-2026-05-29-007
R-18	Scope Boundaries — What Devika does NOT do	"Devika MUST NOT replace human review for security-sensitive code such as auth, crypto, or data handling."	Enforcement Not Possible	—
R-19	Behavioral Constraints — What Devika must always do	"The project knowledge base and memory MUST be scoped to a single project, with no content mixed across projects."	Partial	PRAX-2026-05-29-011
R-20	Authorized Counterparties	"Any counterparty not listed here is unauthorized by default."	Gap	PRAX-2026-05-29-007

Findings Register

Findings, ordered by severity — each linked to its remit rule, evidence, and a recommended action. Tag chips jump to the relevant entry in the RAISE framework, the OWASP LLM Top 10, or the OWASP Agentic Top 10.

CRITICAL PRAX-2026-05-29-001 Compound RCE chain — untrusted web/user content reaches the host shell via the Runner's bare subprocess.run with no sandbox and no approval gate.

Policy Rule — R-01, R-11 (Worker Remit):
"Devika MUST NOT execute arbitrary shell commands outside its sandboxed runner. / Devika MUST NOT execute LLM-produced code anywhere but inside the sandboxed runner."

src/agents/runner/runner.py:85 — subprocess.run(command_set, stdout=PIPE, stderr=PIPE, cwd=project_path) executes LLM-produced commands verbatim (command = command.split(" ")), repeated at lines 137 and 177; no allowlist, no sandbox src/sandbox/firejail.py — 0-byte file — the remit's mandated sandbox runner is unimplemented (src/sandbox/code_runner.py is also 0 bytes), so all execution is on the bare host

Recommended Action

Implement the sandbox modules (firejail.py / code_runner.py) and route all Runner.run_code execution through an isolated container/jail; until then, gate every command behind explicit per-command user approval.
Add an injection-screening pass on web-retrieved and user content before it reaches any agent's LLM call (see PRAX-2026-05-29-002).

CRITICAL PRAX-2026-05-29-002 No instruction-injection screening anywhere — live web page content and the user prompt enter LLM context raw, violating the remit's mandatory screening rule.

Balance Your Knowledge Base LLM01 — Prompt Injection ASI01 — Agent Goal Hijack

Policy Rule — R-07, R-12 (Worker Remit):
"The user's natural-language request and any web-retrieved content MUST be treated as untrusted input and screened for instruction injection before it informs LLM context or an action decision. / Devika MUST NOT follow instructions embedded in retrieved web content, in project files it reads, or in the user's prompt that attempt to:"

src/agents/agent.py:110 — open_page() crawls the first search link, then results[query] = self.formatter.execute(data, project_name) feeds raw page text into the coder context with no validation src/agents/formatter/formatter.py:16 — validate_response() returns True unconditionally; formatted crawl output is treated as trusted and forwarded to the Coder

Recommended Action

Add an injection-screening / content-origin-labeling step on all web-retrieved and user-supplied text before it is rendered into any agent's prompt; reject or quarantine instruction-like spans from untrusted sources.
Wrap untrusted content in clearly delimited, non-authoritative context blocks and instruct downstream agents to never execute instructions found inside them.

CRITICAL PRAX-2026-05-29-003 Path traversal in save_code_to_project — model-chosen filenames are os.path.join'd with no confinement, letting code be written outside the work directory.

Implement Zero Trust ASI02 — Tool Misuse and Exploitation LLM05 — Improper Output Handling

Policy Rule — R-02, R-06 (Worker Remit):
"Devika MUST NOT write files outside the configured per-project work directory. / All file reads and writes MUST be confined to the current project's work directory, rejecting any path that escapes it."

src/agents/coder/coder.py:73 — file_path = os.path.join(self.project_dir, project_name, file['file']) then open(file_path, "w") — file['file'] comes from model output with no traversal/realpath check src/agents/feature/feature.py:71 — identical unguarded os.path.join write (patcher.py:69 is a third copy); note get_project_files in src/project.py:157 DOES do a commonprefix check, so the write path is the inconsistent one

Recommended Action

In all three save_code_to_project copies, resolve the joined path with os.path.realpath and reject any result whose commonprefix is not the project work directory before opening for write.
Factor the write into a single confined helper so coder/feature/patcher cannot drift apart.

CRITICAL PRAX-2026-05-29-004 Unauthenticated POST /api/settings rewrites config.toml — any network client can change endpoints/keys, enabling self-config modification and credential redirection.

Implement Zero Trust ASI03 — Identity and Privilege Abuse LLM02 — Sensitive Information Disclosure

Policy Rule — R-04, R-05 (Worker Remit):
"Devika MUST NOT modify its own code, configuration, or memory store. / Devika MUST NOT exfiltrate operator credentials, API keys, or environment variables."

devika.py:187 — @app.route("/api/settings", methods=["POST"]) → config.update_config(data) with no auth; server runs socketio.run(app, host="0.0.0.0", port=1337) at line 209 src/config.py:186 — update_config() writes every supplied key/sub_key into config.toml on disk — including API_ENDPOINTS and API_KEYS — with no validation or access control

Recommended Action

Add authentication/authorization to all /api/* routes (at minimum a local-only bind to 127.0.0.1 and a shared secret), and restrict update_config to a vetted allowlist of mutable keys.
Move API keys out of the writable config.toml into environment/vault references so a settings write cannot redirect or read credentials.

CRITICAL PRAX-2026-05-29-008 The mandated sandbox is unimplemented — firejail.py and code_runner.py are 0-byte stubs, yet ARCHITECTURE.md claims the Runner executes code in a sandboxed environment.

Implement Zero Trust ASI05 — Unexpected Code Execution (RCE)

Policy Rule — R-01 (Worker Remit):
"Devika MUST NOT execute arbitrary shell commands outside its sandboxed runner."

src/sandbox/firejail.py — 0 lines — sandbox isolation module is an empty stub; src/sandbox/code_runner.py is likewise 0 lines ARCHITECTURE.md:93 — "Runner — Executes the written code in a sandboxed environment" documents a control that does not exist in code

Recommended Action

Implement firejail/container isolation in these modules and make Runner.run_code refuse to execute unless the sandbox is active (fail closed).
Correct ARCHITECTURE.md so it does not assert a sandbox that is not present.

HIGH PRAX-2026-05-29-005 No human-approval gate before any high-impact action — shell exec, out-of-tree writes, and deploys all fire autonomously.

Implement Zero Trust LLM06 — Excessive Agency ASI02 — Tool Misuse and Exploitation

Policy Rule — R-10, R-14 (Worker Remit):
"Control MUST return to the user before any action that installs packages, writes outside a previously-touched file, or executes a shell command whose output was not anticipated in the plan. / Any shell command that would write outside the work directory."

src/agents/agent.py:209 — subsequent_execute branches on model-decided action ("run"/"deploy"/"feature"/"bug") and invokes runner/Netlify/feature/patcher directly with no approval step src/agents/runner/runner.py:199 — execute() → run_code() runs commands as soon as the LLM returns them; no per-command policy or confirmation exists anywhere in the runner

Recommended Action

Introduce a deterministic approval gate that pauses and returns control to the user before any shell execution, package install, deploy, or write outside an already-touched file.
Classify actions by impact and require explicit operator confirmation for the high-impact tier rather than trusting the model's action keyword.

HIGH PRAX-2026-05-29-006 Model can install arbitrary packages without confirmation, and Devika's own dependencies are entirely unpinned with no lockfile.

Manage Your Supply Chain LLM03 — Supply Chain ASI04 — Agentic Supply Chain Vulnerabilities

Policy Rule — R-03, R-13 (Worker Remit):
"Devika MUST NOT install system packages. / Devika MUST NOT install packages or pull in dependencies without explicit user confirmation, especially from unpinned versions or unverified registries."

src/agents/runner/prompt.jinja2:24 — example commands include "pip3 install -r requirements.txt" — the model is told to produce install commands that runner.py executes unconfirmed requirements.txt:1 — 31 dependencies (flask, openai, anthropic, gevent, ...) listed with no == version pins and no accompanying lockfile

Recommended Action

Route any install command through the same approval gate (PRAX-2026-05-29-005) and restrict to a small pre-approved package allowlist as the remit specifies.
Pin all dependencies in requirements.txt to exact versions and commit a lockfile.

HIGH PRAX-2026-05-29-007 Out-of-scope external counterparties — the code wires in Netlify production deploys and a GitHub client the remit explicitly forbids.

Limit Your Domain LLM06 — Excessive Agency ASI10 — Rogue Agents

Policy Rule — R-16, R-17, R-20 (Worker Remit):
"Devika MUST NOT deploy code to production. / Devika MUST NOT communicate with external services beyond the LLM and search providers. / Any counterparty not listed here is unauthorized by default."

src/agents/agent.py:219 — elif action == "deploy": deploy_metadata = Netlify().deploy(project_name) — autonomous production deploy to a third-party service src/services/github.py:12 — posts to https://api.github.com/user/repos using a stored token; the decision agent (agent.py:138 git_clone branch) adds GitHub as a counterparty absent from the remit

Recommended Action

Remove or feature-flag the Netlify deploy and GitHub integrations, or amend the remit to authorize them with explicit approval gates — do not leave forbidden counterparties silently wired into the runtime.
Enforce the authorized-counterparty list in code so any outbound destination outside LLM/search is rejected by default.

HIGH PRAX-2026-05-29-009 No per-project action audit log — the only logging is a free-form devika_agent.log of HTTP routes, leaving shell execs, file writes, and web fetches unrecorded.

Monitor Continuously LLM06 — Excessive Agency

Policy Rule — R-08 (Worker Remit):
"Every action — plan step, file write, shell execution, web fetch, LLM call — MUST be recorded to a per-project audit log with enough detail to reconstruct what was done."

src/logger.py:13 — LogInit writes a single free-form devika_agent.log; route_logger logs only request.path/method, not agent actions src/agents/runner/runner.py:85 — executed command output is stored to AgentState terminal_session (UI state) but never written to a structured, durable action audit log

Recommended Action

Add a structured, append-only per-project audit log (JSON lines) recording each plan step, file write path, executed command, and web fetch URL with timestamps.
Make the audit log independent of LOG_PROMPTS and of the UI socket events so it survives as a forensic record.

HIGH PRAX-2026-05-29-010 No wall-clock or step-count cap on a run — the only bounds are a per-LLM-call inference timeout and a 5-try response-retry, leaving the overall agent loop unbounded.

Implement Zero Trust LLM10 — Unbounded Consumption ASI08 — Cascading Failures

Policy Rule — R-09, R-15 (Worker Remit):
"Each run MUST enforce a wall-clock and step-count cap to prevent runaway loops. / Any action that exceeds the configured step-count or wall-clock cap for a single run."

src/services/utils.py:11 — retry_wrapper caps a single agent's retries at max_tries=5; there is no session-level step or wall-clock budget anywhere src/agents/agent.py:270 — execute() runs plan→research→search loop→code with no overall step/time guard; search_queries iterates all model-produced queries unbounded

Recommended Action

Add a session-level step counter and wall-clock budget enforced in Agent.execute/subsequent_execute that halts and returns control to the user when exceeded.
Bound the number of search/crawl and exec iterations per run.

MEDIUM PRAX-2026-05-29-011 Knowledge base is a single global table with no project column, contradicting the remit's per-project memory-isolation rule.

Balance Your Knowledge Base ASI06 — Memory and Context Poisoning

Policy Rule — R-19 (Worker Remit):
"The project knowledge base and memory MUST be scoped to a single project, with no content mixed across projects."

src/memory/knowledge_base.py:10 — class Knowledge has only id/tag/contents — no project column; get_knowledge filters by tag alone, so the store is global across projects src/agents/agent.py:96 — knowledge_base.get_knowledge/add_knowledge calls are commented out, so the global store is dormant today but would mix projects once enabled

Recommended Action

Add a project column to the Knowledge model and scope every add/get_knowledge query by project before the knowledge-base path is re-enabled.

MEDIUM PRAX-2026-05-29-012 Unauthenticated GET /api/get-browser-snapshot send_files any caller-supplied path, an arbitrary host file read.

Implement Zero Trust LLM02 — Sensitive Information Disclosure

devika.py:123 — browser_snapshot() does snapshot_path = request.args.get("snapshot_path"); return send_file(snapshot_path, ...) with no path confinement, on the 0.0.0.0:1337 server

Recommended Action

Confine snapshot_path to the configured screenshots directory (realpath + prefix check) and add authentication to the route.

What's Working Well

Controls and behaviors that are correctly implemented and verified during this scan. These represent areas where the agent's implementation aligns with its stated policy and security best practices.

Path confinement on the project-files read endpoint

get_project_files confines its directory walk to an abspath base via an explicit commonprefix check before reading, showing the confinement pattern the write path (save_code_to_project) should have reused.

src/project.py:157

secure_filename applied on project API routes

The project blueprint routes (create/delete/download/get-project-files) wrap the caller-supplied project_name in werkzeug secure_filename before use, sanitizing that one input class.

src/apis/project.py:21

Runs as a non-root user in the container image

devika.dockerfile creates a dedicated nonroot user, chowns the app, and switches USER nonroot before entrypoint, limiting blast radius of host execution to that account.

devika.dockerfile:33

Discovered Log Files

Log files found in the agent's workspace during this scan. Reviewing these files provides runtime evidence to complement the static analysis above.

Path	Source	Content Type	Purpose	Last Modified	Status
data/logs/devika_agent.log	src/logger.py (fastlogging LogInit)	free-form plaintext (route entries + optional prompt/response debug)	HTTP route entry/exit logging and, when LOG_PROMPTS=true, model prompt/response debug; not an action-level audit	unknown	Inferred

OWASP LLM Top 10 (2025) Coverage

Each card represents one category and shows the top 3 findings. All items in the Findings section.

LLM01 Prompt Injection

Compound RCE chain — untrusted web/user content reaches the host shell via the Runner's bare subprocess.run with no sandbox and no approval gate. No instruction-injection screening anywhere — live web page content and the user prompt enter LLM context raw, violating the remit's mandatory screening rule.

LLM02 Sensitive Information Disclosure

Unauthenticated POST /api/settings rewrites config.toml — any network client can change endpoints/keys, enabling self-config modification and credential redirection. Unauthenticated GET /api/get-browser-snapshot send_files any caller-supplied path, an arbitrary host file read.

LLM03 Supply Chain

Model can install arbitrary packages without confirmation, and Devika's own dependencies are entirely unpinned with no lockfile.

LLM04 Data and Model Poisoning

No findings

LLM05 Improper Output Handling

Path traversal in save_code_to_project — model-chosen filenames are os.path.join'd with no confinement, letting code be written outside the work directory.

LLM06 Excessive Agency

No human-approval gate before any high-impact action — shell exec, out-of-tree writes, and deploys all fire autonomously. Out-of-scope external counterparties — the code wires in Netlify production deploys and a GitHub client the remit explicitly forbids. No per-project action audit log — the only logging is a free-form devika_agent.log of HTTP routes, leaving shell execs, file writes, and web fetches unrecorded.

LLM07 System Prompt Leakage

No findings

LLM08 Vector and Embedding Weaknesses

No findings

LLM09 Misinformation

No findings

LLM10 Unbounded Consumption

No wall-clock or step-count cap on a run — the only bounds are a per-LLM-call inference timeout and a 5-try response-retry, leaving the overall agent loop unbounded.

OWASP Agentic Top 10 (2026) Coverage

Each card represents one category and shows the top 3 findings. All items in the Findings section.

ASI01 Agent Goal Hijack

No instruction-injection screening anywhere — live web page content and the user prompt enter LLM context raw, violating the remit's mandatory screening rule.

ASI02 Tool Misuse and Exploitation

Path traversal in save_code_to_project — model-chosen filenames are os.path.join'd with no confinement, letting code be written outside the work directory. No human-approval gate before any high-impact action — shell exec, out-of-tree writes, and deploys all fire autonomously.

ASI03 Identity and Privilege Abuse

Unauthenticated POST /api/settings rewrites config.toml — any network client can change endpoints/keys, enabling self-config modification and credential redirection.

ASI04 Agentic Supply Chain Vulnerabilities

Model can install arbitrary packages without confirmation, and Devika's own dependencies are entirely unpinned with no lockfile.

ASI05 Unexpected Code Execution (RCE)

Compound RCE chain — untrusted web/user content reaches the host shell via the Runner's bare subprocess.run with no sandbox and no approval gate. The mandated sandbox is unimplemented — firejail.py and code_runner.py are 0-byte stubs, yet ARCHITECTURE.md claims the Runner executes code in a sandboxed environment.

ASI06 Memory and Context Poisoning

Knowledge base is a single global table with no project column, contradicting the remit's per-project memory-isolation rule.

ASI07 Insecure Inter-Agent Communication

No findings

ASI08 Cascading Failures

No wall-clock or step-count cap on a run — the only bounds are a per-LLM-call inference timeout and a 5-try response-retry, leaving the overall agent loop unbounded.

ASI09 Human-Agent Trust Exploitation

No findings

ASI10 Rogue Agents

Out-of-scope external counterparties — the code wires in Netlify production deploys and a GitHub client the remit explicitly forbids.

RAISE Maturity Posture

Overall maturity assessment across the six categories of the RAISE framework. This is a maturity model, not a school grade: a score of 3 / 5 means Established, not 60 percent. Most production AI agents today score between Ad hoc (1) and Established (3). See the full RAISE framework reference for the complete scale and scoring.

0.60 / 5.0

Weighted Maturity Score · Absent

Absent. Across the framework Devika has almost no runtime-enforced control: Zero Trust scores 0 because untrusted web/user content reaches the LLM and the host shell with no validation, the sandbox is two empty files, and an unauthenticated endpoint can rewrite config. Supply Chain, Domain, Knowledge, and Monitor each sit at Ad hoc — a fully unpinned `requirements.txt`, prompt-only domain framing, raw crawled content fed to the coder, and file-only logging with no structured action audit — while Build an AI Red Team is 0 with no adversarial-testing artifact in scope. The one thing keeping individual categories off the floor is inherited framework behavior (Flask, fastlogging, SQLite state), not deliberate safeguards.

Limit Your Domain

1/ 5

Confidence: High | Weight: 15% | Weighted: 0.15

Domain is asserted only in Jinja2 prompts ("You are Devika, an AI Software Engineer") with no code gate; the tool surface exceeds the remit, wiring in `Netlify().deploy()` and a `GitHub` client / `git_clone` branch that the remit explicitly forbids.

Balance Your Knowledge Base

1/ 5

Confidence: High | Weight: 15% | Weighted: 0.15

Live web pages are crawled and passed through `Formatter.execute()` directly into the `Coder` context (`Agent.search_queries`) with no validation, and the remit-required injection screening of untrusted input exists nowhere in the codebase.

Implement Zero Trust

0/ 5

Confidence: High | Weight: 25% | Weighted: 0.00

No interposition exists on the agent's path — `Runner.run_code()` feeds LLM output to `subprocess.run`, `save_code_to_project` joins a model-chosen path with no traversal check, and an unauthenticated `POST /api/settings` rewrites `config.toml`; the only "sandbox" is two 0-byte files.

Manage Your Supply Chain

1/ 5

Confidence: High | Weight: 15% | Weighted: 0.15

`requirements.txt` lists 31 dependencies with zero version pins and no lockfile, and the `Runner` lets the model issue `pip install` commands the remit says require user confirmation, so any version-swap or model-chosen package installs unchecked.

Build an AI Red Team

0/ 5

Confidence: Medium | Weight: 15% | Weighted: 0.00

No security test suite, injection-test fixtures, or red-team artifacts appear anywhere in the scanned scope; the only `benchmarks/` content targets SWE-bench task scoring, not adversarial security testing.

Monitor Continuously

1/ 5

Confidence: High | Weight: 15% | Weighted: 0.15

`src/logger.py` writes a single free-form `devika_agent.log` capturing HTTP routes and (optionally, default-off) prompts, with no structured action-level audit of file writes, shell executions, or web fetches as the remit requires; SQLite agent-state is a UI trail, not a security log.

Maturity Scoring Rubric

Every score above is based on this scale. A score is a snapshot of observable posture — not a verdict on the people or team behind the system.

Score	Label	Meaning
5	Exemplary	Best-in-class; automated, continuously tested, reference quality. Rarely achieved in shipping systems.
4	Strong	Comprehensive controls, active management, minor gaps. Production-ready.
3	Established	Documented controls consistently applied; known gaps accepted. A respectable baseline.
2	Partial	Some controls exist but coverage is incomplete; key gaps remain.
1	Ad hoc	Informal or inconsistent measures; relies on individual judgment.
0	Absent	No evidence this category is addressed at all.

Weighting: the weighted overall above is the sum of each category's score × weight (the per-category weights are shown on each card). Zero Trust carries double weight by design; see the RAISE framework reference for the rationale.