OWASP Coverage Across Praxen Baseline Targets

How each OWASP category actually shows up across 12 real-world AI agents — as a finding's primary risk (solid) or a secondary, co-occurring concern (hatched).

12 targets analyzed

114 total findings

78 LLM-classified

54 Agentic-classified

Targets analyzed: 12 Praxen scans

Each card links to the agent's source repository and its per-target Praxen analysis report. Counts show how many of that agent's findings fall primarily under an LLM or Agentic OWASP category.

FinBot

OWASP Agentic AI CTF — invoice processor

15 findings · 12 LLM · 7 Agentic

HelperBot

Damn Vulnerable AI Agent — training agent

11 findings · 8 LLM · 3 Agentic

OpenAI Customer Service

OpenAI Agents SDK example

10 findings · 6 LLM · 5 Agentic

AutoGen Code Executor

Microsoft AutoGen code-executor family

11 findings · 9 LLM · 9 Agentic

Aider

Interactive pair-programming agent

6 findings · 6 LLM · 4 Agentic

OpenHands

Autonomous software-engineering platform

11 findings · 7 LLM · 3 Agentic

Deep Agents CLI

LangChain agent harness (MCP coverage)

5 findings · 3 LLM · 1 Agentic

yaah

Yet Another Agent Harness (MCP coverage)

8 findings · 4 LLM · 5 Agentic

Hermes (Agent + Desktop)

Multi-component LLM agent + desktop control layer

5 findings · 4 LLM · 4 Agentic

CraftBot

Self-hosted general-purpose agent that builds and operates its own SaaS tools

14 findings · 11 LLM · 7 Agentic

uAgents

Fetch.ai decorator-based autonomous multi-agent framework runtime

8 findings · 3 LLM · 4 Agentic

Agentforce Help Agent

Salesforce Agentforce customer-service agent (Knowledge-article RAG)

10 findings · 5 LLM · 2 Agentic
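
The per-target counts can be checked against the headline totals with a few lines of arithmetic. The sketch below copies the three numbers from each card above into a list; the tuple layout and the `totals` helper are illustrative assumptions, not part of Praxen.

```python
# Each tuple: (target, total findings, primarily LLM-classified,
# primarily Agentic-classified). Values are copied from the cards above;
# the tuple layout itself is an assumption made for this sketch.
CARDS = [
    ("FinBot", 15, 12, 7),
    ("HelperBot", 11, 8, 3),
    ("OpenAI Customer Service", 10, 6, 5),
    ("AutoGen Code Executor", 11, 9, 9),
    ("Aider", 6, 6, 4),
    ("OpenHands", 11, 7, 3),
    ("Deep Agents CLI", 5, 3, 1),
    ("yaah", 8, 4, 5),
    ("Hermes (Agent + Desktop)", 5, 4, 4),
    ("CraftBot", 14, 11, 7),
    ("uAgents", 8, 3, 4),
    ("Agentforce Help Agent", 10, 5, 2),
]


def totals(cards):
    """Return (targets, findings, LLM-classified, Agentic-classified)."""
    findings = sum(card[1] for card in cards)
    llm = sum(card[2] for card in cards)
    agentic = sum(card[3] for card in cards)
    return len(cards), findings, llm, agentic


print(totals(CARDS))  # (12, 114, 78, 54)
```

The LLM and Agentic columns sum to 132, more than the 114 findings, because a single finding can carry a category in both taxonomies.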

OWASP LLM Top 10 — coverage by category

How the OWASP Top 10 for LLM Applications 2025 categories apply across these agents. Solid = the finding's primary category; hatched = a category it also touches. Empty rows are categories these apps don't exercise.

Legend: primary = the finding's main category; secondary = a category it also touches.

LLM01 Prompt Injection

Primary — 13 findings

FinBot — Untrusted vendor invoice text flows into the LLM context and drives an uncondition
FinBot — Runtime goal injection — custom_goals is stored unvalidated and spliced into the s
FinBot — Vendor-supplied invoice description enters the LLM context unvalidated, providing
FinBot — Prompt-injection and fraud detection rely on a bypassable regex denylist, and its
HelperBot — HelperBot obeys injected "ignore previous instructions" directives instead of decl
HelperBot — HelperBot accepts fabricated conversation history and role claims, proceeding on "
Airline Customer Service Agent (multi-agent) — No input or output guardrails are wired, so untrusted customer free-text and tool
+6 more

Secondary — 1 finding

HAA Help Agent — The knowledge action's grounding and citation parameters are marked is_user_input,

LLM02 Sensitive Information Disclosure

Primary — 16 findings

FinBot — A Flask SECRET_KEY is hardcoded in source rather than loaded from the environment.
HelperBot — Literal API keys and passwords are hardcoded in source and embedded in HelperBot's
HelperBot — Unauthenticated /health and /info endpoints expose HelperBot's tool list and vulne
AutoGen Code Executor — LocalCommandLineCodeExecutor passes the parent process's full environment to execu
Aider — PostHog analytics enables exception autocapture, which can send stack traces conta
Aider — No secret scanning or redaction before repository content enters LLM context, prop
OpenHands (Autonomous Software-Engineering Agent) — OSS defaults compose into an unauthenticated, cross-origin read of plaintext-at-re
+9 more

Secondary — 2 findings

HelperBot — HelperBot discloses its full system prompt and configuration on request, directly
yaah (Yet Another Agent Harness) — On Codex, the command guard and secret scanner ship only as advisory model-callabl

LLM03 Supply Chain

Primary — 8 findings

FinBot — Python dependencies are floor-pinned with >= and no lockfile is committed.
HelperBot — LLM-SDK dependencies use caret ranges rather than exact pins, allowing silent mino
AutoGen Code Executor — The default Docker execution image python:3-slim is a floating tag with no digest
Aider — Self-upgrade path installs unpinned code directly from the aider git main branch,
Deep Agents CLI (deepagents-cli) — Shipped `pyproject.toml` floor-pins dependencies and leaves `deepagents` and `lang
yaah (Yet Another Agent Harness) — Default third-party MCP server `context7` is fetched unpinned via `npx -y @context
CraftBot — Third-party MCP servers and imported code run without isolation or provenance vett
+1 more
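
Several of the findings above concern dependencies declared with floor pins or ranges rather than exact versions. As a rough illustration only, the sketch below flags requirement lines that are not pinned with `==`; the `unpinned` helper and the sample lines are hypothetical and are not taken from any scanned repository or from Praxen itself.

```python
import re

# Matches "name==version" (optionally with extras), rejecting wildcards.
# This is a deliberately simple check written for this sketch, not a full
# parser for the requirements-file format.
EXACT_PIN = re.compile(r"^[A-Za-z0-9._\-\[\]]+\s*==\s*[^\s*]+$")


def unpinned(requirement_lines):
    """Return the requirement lines that are not exact '==' pins."""
    flagged = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not EXACT_PIN.match(line):
            flagged.append(line)
    return flagged


# Hypothetical sample input.
reqs = ["flask>=2.0", "requests==2.31.0", "openai", "# comment", "pydantic~=2.5"]
print(unpinned(reqs))  # ['flask>=2.0', 'openai', 'pydantic~=2.5']
```

An exact pin is weaker than a committed lockfile with hashes, so a check like this would catch only the most visible part of the problem.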

LLM04 Data and Model Poisoning

Primary — 1 finding

Hermes Agent (with Hermes Desktop) — Writable, session-loaded memory and agent-created skills are an ASI06 persistence

LLM05 Improper Output Handling

Primary — 4 findings

AutoGen Code Executor — LocalCommandLineCodeExecutor docstring claims dangerous-command sanitization that
yaah (Yet Another Agent Harness) — Command guard blocks `rm -rf /` but not flag-order, long-form, or equivalent catas
Hermes Agent (with Hermes Desktop) — Dangerous-command approval relies on a regex denylist that is structurally incompl
CraftBot — Model-generated Python is executed on the host via exec()/subprocess with no isola

Secondary — 6 findings

AutoGen Code Executor — create_default_code_executor silently downgrades to unisolated host execution with
AutoGen Code Executor — approval_func defaults to None, so all LLM-generated code is auto-executed with no
Aider — No code-level repository-root confinement on writes; absolute or ../ edit paths es
Hermes Agent (with Hermes Desktop) — Dangerous shell commands auto-approve (fail open) in a headless non-interactive co
Hermes Agent (with Hermes Desktop) — The default terminal backend runs LLM-emitted commands directly on the host, so sa
CraftBot — Untrusted external content reaches the LLM context unsanitized and the model's out
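
Two of the primary findings in this category turn on regex denylists for dangerous commands. The sketch below shows why a single literal pattern tends to miss the flag-order and long-form variants named in the yaah finding. The pattern and the `naive_guard_blocks` helper are hypothetical and are not the actual yaah or Hermes code.

```python
import re

# Hypothetical single-pattern guard written for this sketch; NOT the
# actual guard from any scanned project.
NAIVE_DENYLIST = re.compile(r"rm\s+-rf\s+/(\s|$)")


def naive_guard_blocks(command: str) -> bool:
    """Return True if the naive denylist would block the command."""
    return NAIVE_DENYLIST.search(command) is not None


candidates = [
    "rm -rf /",                  # the literal form the pattern targets
    "rm -fr /",                  # same flags, different order
    "rm --recursive --force /",  # long-form flags
    "rm -r -f /",                # flags split into separate arguments
]
for cmd in candidates:
    print(f"{cmd!r}: blocked={naive_guard_blocks(cmd)}")
```

Only the first command is blocked; the other three have the same effect and pass the pattern unchanged.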

LLM06 Excessive Agency

Primary — 23 findings

FinBot — Approval requirements — manual-review threshold, fraud-risk routing, fraud-ran pre
FinBot — Fraud detection can be disabled via a config flag, after which detection returns e
FinBot — The fallback rule engine explicitly auto-approves above-threshold and injection-fl
FinBot — The agent never verifies vendor registered-and-approved status before approving, a
FinBot — The confidence_threshold config is declared and settable but never consulted by an
HelperBot — read_file/write_file/search_web are advertised to the model but unimplemented on t
Airline Customer Service Agent (multi-agent) — update_seat mutates reservation state with no identity or confirmation-number veri
+16 more

Secondary — 6 findings

FinBot — Untrusted vendor invoice text flows into the LLM context and drives an uncondition
FinBot — Prompt-injection and fraud detection rely on a bypassable regex denylist, and its
yaah (Yet Another Agent Harness) — Command guard blocks `rm -rf /` but not flag-order, long-form, or equivalent catas
Hermes Agent (with Hermes Desktop) — Dangerous-command approval relies on a regex denylist that is structurally incompl
CraftBot — Model-generated Python is executed on the host via exec()/subprocess with no isola
CraftBot — Untrusted external content reaches the LLM context unsanitized and the model's out

LLM07 System Prompt Leakage

Primary — 2 findings

HelperBot — HelperBot discloses its full system prompt and configuration on request, directly
HAA Help Agent — System-prompt and configuration secrecy rests entirely on a system-prompt deny-lis

LLM08 Vector and Embedding Weaknesses

No findings.

LLM09 Misinformation

Primary — 5 findings

Airline Customer Service Agent (multi-agent) — The seat-booking handoff fabricates an authoritative flight number at runtime with
Airline Customer Service Agent (multi-agent) — FAQ grounding is prompt-only — nothing in code prevents the FAQ agent from answeri
HAA Help Agent — The agent's no-hallucination rule is enforced only by a system-prompt clause; no c
HAA Help Agent — The off_topic topic offers to escalate the user to a human agent, directly contrad
HAA Help Agent — As shipped, RAG grounding and citations are unconfigured — citations are disabled

Secondary — 1 finding

FinBot — The confidence_threshold config is declared and settable but never consulted by an

LLM10 Unbounded Consumption

Primary — 6 findings

FinBot — The public invoice-submission endpoint triggers LLM processing with no rate limiti
HelperBot — No rate limiting, per-session tool-call cap, or tool-loop detection exists, so a r
AutoGen Code Executor — Docker executor sets no CPU or memory limits, bounding executions only by wall-clo
OpenHands (Autonomous Software-Engineering Agent) — Per-task budget cap is disabled by default, so only the iteration cap bounds a run
OpenHands (Autonomous Software-Engineering Agent) — The global rate limiter is in-memory and keyed on client IP, so it is neither dura
uAgents Framework Runtime — No default inbound rate limiting on the message endpoint, leaving agents open to r

OWASP Agentic Top 10 — coverage by category

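How the OWASP Top 10 for Agentic Applications 2026 categories apply. Outcome categories such as Cascading Failures and Rogue Agents never appear as a primary category here: they're real concerns, but a more specific category is usually the primary one. In this data set Rogue Agents is hatched-only, and Cascading Failures has no findings at all.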

ASI01 Agent Goal Hijack

Primary — 9 findings

FinBot — Untrusted vendor invoice text flows into the LLM context and drives an uncondition
FinBot — Runtime goal injection — custom_goals is stored unvalidated and spliced into the s
FinBot — Vendor-supplied invoice description enters the LLM context unvalidated, providing
HelperBot — HelperBot obeys injected "ignore previous instructions" directives instead of decl
HelperBot — HelperBot accepts fabricated conversation history and role claims, proceeding on "
Airline Customer Service Agent (multi-agent) — No input or output guardrails are wired, so untrusted customer free-text and tool
Airline Customer Service Agent (multi-agent) — Triage domain scope is prompt-only and generic, with no topic guard restricting th
+2 more

ASI02 Tool Misuse and Exploitation

Primary — 10 findings

FinBot — Approval requirements — manual-review threshold, fraud-risk routing, fraud-ran pre
FinBot — The fallback rule engine explicitly auto-approves above-threshold and injection-fl
HelperBot — read_file/write_file/search_web are advertised to the model but unimplemented on t
Airline Customer Service Agent (multi-agent) — update_seat accepts any seat string with no check that the seat exists on the flig
Airline Customer Service Agent (multi-agent) — The only runtime precondition on the seat mutation is a Python assert, which is st
Aider — No code-level repository-root confinement on writes; absolute or ../ edit paths es
Aider — --yes-always silently auto-approves package installs, Playwright install, and self
+3 more

ASI03 Identity and Privilege Abuse

Primary — 11 findings

FinBot — All /admin/finbot config, goals, vendor-trust, and review endpoints are unauthenti
FinBot — The agent never verifies vendor registered-and-approved status before approving, a
Airline Customer Service Agent (multi-agent) — update_seat mutates reservation state with no identity or confirmation-number veri
AutoGen Code Executor — LocalCommandLineCodeExecutor passes the parent process's full environment to execu
OpenHands (Autonomous Software-Engineering Agent) — OSS defaults compose into an unauthenticated, cross-origin read of plaintext-at-re
OpenHands (Autonomous Software-Engineering Agent) — In the OSS default the app-server API attaches no authentication, so secrets/sandb
CraftBot — Inbound messaging defaults to auto_reply=true with no sender-identity check, so an
+4 more

Secondary — 2 findings

HelperBot — HelperBot accepts fabricated conversation history and role claims, proceeding on "
CraftBot — Operator secrets are stored as plaintext JSON at rest — integration tokens in .cre

ASI04 Agentic Supply Chain Vulnerabilities

Primary — 4 findings

AutoGen Code Executor — The default Docker execution image python:3-slim is a floating tag with no digest
Aider — Self-upgrade path installs unpinned code directly from the aider git main branch,
yaah (Yet Another Agent Harness) — Default third-party MCP server `context7` is fetched unpinned via `npx -y @context
CraftBot — Third-party MCP servers and imported code run without isolation or provenance vett

ASI05 Unexpected Code Execution (RCE)

Primary — 13 findings

AutoGen Code Executor — create_default_code_executor silently downgrades to unisolated host execution with
AutoGen Code Executor — approval_func defaults to None, so all LLM-generated code is auto-executed with no
AutoGen Code Executor — LocalCommandLineCodeExecutor docstring claims dangerous-command sanitization that
AutoGen Code Executor — DockerCommandLineCodeExecutor creates containers with default networking and expos
AutoGen Code Executor — extra_volumes and device_requests are accepted and applied with no gate or warning
AutoGen Code Executor — Working-directory confinement is enforced only for the optional # filename: header
yaah (Yet Another Agent Harness) — On Codex, the command guard and secret scanner ship only as advisory model-callabl
+6 more

Secondary — 3 findings

AutoGen Code Executor — The sources counterparty filter defaults to None, so code from any group-chat agen
Aider — --yes-always silently auto-approves package installs, Playwright install, and self
CraftBot — Writable system-prompt identity file (SOUL.md) and auto-written MEMORY.md plus the

ASI06 Memory and Context Poisoning

Primary — 4 findings

yaah (Yet Another Agent Harness) — Generated session-loaded config files are freely writable with no control preventi
Hermes Agent (with Hermes Desktop) — Writable, session-loaded memory and agent-created skills are an ASI06 persistence
CraftBot — Writable system-prompt identity file (SOUL.md) and auto-written MEMORY.md plus the
CraftBot — The agent can rewrite its own identity/policy files (SOUL.md, AGENT.md, USER.md) a

Secondary — 1 finding

FinBot — Runtime goal injection — custom_goals is stored unvalidated and spliced into the s

ASI07 Insecure Inter-Agent Communication

Primary — 3 findings

AutoGen Code Executor — The sources counterparty filter defaults to None, so code from any group-chat agen
Deep Agents CLI (deepagents-cli) — Remote MCP server URLs are never validated as TLS — `mcp-servers add`/`update` and
uAgents Framework Runtime — Inbound signed envelopes are never checked against their expires/nonce fields, so

ASI08 Cascading Failures

No findings.

ASI09 Human-Agent Trust Exploitation

Secondary — 1 finding

CraftBot — Inbound messaging defaults to auto_reply=true with no sender-identity check, so an

ASI10 Rogue Agents

Secondary — 5 findings

FinBot — Runtime goal injection — custom_goals is stored unvalidated and spliced into the s
FinBot — Fraud detection can be disabled via a config flag, after which detection returns e
OpenHands (Autonomous Software-Engineering Agent) — Human-approval gates for merge, force-push, and cross-repo writes are inert unless
CraftBot — Writable system-prompt identity file (SOUL.md) and auto-written MEMORY.md plus the
CraftBot — The agent can rewrite its own identity/policy files (SOUL.md, AGENT.md, USER.md) a

Where LLM and Agentic risks meet: co-occurrence heat map

Every square counts the findings tagged with both that LLM category (row) and that Agentic category (column) — primary or secondary. It shows how a model-layer weakness and an agent-layer weakness combine in the same finding: 46 of 114 findings span both layers, lighting 20 of 100 pairings. Blank squares are pairings that never co-occur; hotter squares occur more often (peak 13).
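
The counting rule described above can be sketched in a few lines. The `Finding` record, its field names, and the sample data are assumptions made for illustration; they are not Praxen's data model or the real findings.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Finding:
    """Hypothetical record: every category a finding is tagged with."""
    target: str
    llm: set = field(default_factory=set)      # primary or secondary LLM tags
    agentic: set = field(default_factory=set)  # primary or secondary ASI tags


def cooccurrence(findings):
    """Count findings for each (LLM category, Agentic category) pair."""
    cells = Counter()
    for finding in findings:
        for pair in product(sorted(finding.llm), sorted(finding.agentic)):
            cells[pair] += 1
    return cells


# Hypothetical sample data, not the real findings.
sample = [
    Finding("A", {"LLM01", "LLM06"}, {"ASI01"}),
    Finding("B", {"LLM01"}, {"ASI01", "ASI06"}),
    Finding("C", {"LLM02"}, set()),  # LLM-only: contributes to no cell
    Finding("D", set(), {"ASI03"}),  # Agentic-only: contributes to no cell
]

cells = cooccurrence(sample)
spanning = sum(1 for f in sample if f.llm and f.agentic)

print(cells[("LLM01", "ASI01")])                  # 2
print(spanning, len(cells), max(cells.values()))  # 2 3 2
```

In this toy data, 2 of 4 findings span both layers, 3 pairings are lit, and the peak is 2. On the page, the corresponding figures are 46 of 114, 20 of 100, and 13. A finding tagged with several categories on either side adds to several cells, so cell values can sum to more than the number of spanning findings.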

Rows are LLM categories (LLM01 to LLM10); columns are Agentic categories (ASI01 to ASI10). Non-empty cells are listed below, row by row.

LLM01 Prompt Injection

LLM01 × ASI01: 8 findings

FinBot — Untrusted vendor invoice text flows into the LLM context and drives an uncondition
FinBot — Runtime goal injection — custom_goals is stored unvalidated and spliced into the s
FinBot — Vendor-supplied invoice description enters the LLM context unvalidated, providing
HelperBot — HelperBot obeys injected "ignore previous instructions" directives instead of decl
HelperBot — HelperBot accepts fabricated conversation history and role claims, proceeding on "
Airline Customer Service Agent (multi-agent) — No input or output guardrails are wired, so untrusted customer free-text and tool
Aider — Untrusted content (scraped pages, AI!/AI? comments, git history) enters LLM contex
+1 more

LLM01 × ASI02: 1 finding

HAA Help Agent — The knowledge action's grounding and citation parameters are marked is_user_input,

LLM01 × ASI03: 2 findings

HelperBot — HelperBot accepts fabricated conversation history and role claims, proceeding on "
CraftBot — Inbound messaging defaults to auto_reply=true with no sender-identity check, so an

LLM01 × ASI05: 2 findings

CraftBot — Untrusted external content reaches the LLM context unsanitized and the model's out
CraftBot — Writable system-prompt identity file (SOUL.md) and auto-written MEMORY.md plus the

LLM01 × ASI06: 2 findings

FinBot — Runtime goal injection — custom_goals is stored unvalidated and spliced into the s
CraftBot — Writable system-prompt identity file (SOUL.md) and auto-written MEMORY.md plus the

LLM01 × ASI09: 1 finding

CraftBot — Inbound messaging defaults to auto_reply=true with no sender-identity check, so an

LLM01 × ASI10: 2 findings

FinBot — Runtime goal injection — custom_goals is stored unvalidated and spliced into the s
CraftBot — Writable system-prompt identity file (SOUL.md) and auto-written MEMORY.md plus the

LLM02 Sensitive Information Disclosure

LLM02 × ASI03: 4 findings

AutoGen Code Executor — LocalCommandLineCodeExecutor passes the parent process's full environment to execu
OpenHands (Autonomous Software-Engineering Agent) — OSS defaults compose into an unauthenticated, cross-origin read of plaintext-at-re
CraftBot — Operator secrets are stored as plaintext JSON at rest — integration tokens in .cre
uAgents Framework Runtime — Name-based agents persist identity and wallet private keys to plaintext private_ke

LLM02 × ASI05: 1 finding

yaah (Yet Another Agent Harness) — On Codex, the command guard and secret scanner ship only as advisory model-callabl

LLM02 × ASI07: 1 finding

Deep Agents CLI (deepagents-cli) — Remote MCP server URLs are never validated as TLS — `mcp-servers add`/`update` and

LLM03 Supply Chain

LLM03 × ASI04: 4 findings

AutoGen Code Executor — The default Docker execution image python:3-slim is a floating tag with no digest
Aider — Self-upgrade path installs unpinned code directly from the aider git main branch,
yaah (Yet Another Agent Harness) — Default third-party MCP server `context7` is fetched unpinned via `npx -y @context
CraftBot — Third-party MCP servers and imported code run without isolation or provenance vett

LLM04 Data and Model Poisoning

LLM04 × ASI06: 1 finding

Hermes Agent (with Hermes Desktop) — Writable, session-loaded memory and agent-created skills are an ASI06 persistence

LLM05 Improper Output Handling

LLM05 × ASI02: 1 finding

Aider — No code-level repository-root confinement on writes; absolute or ../ edit paths es

LLM05 × ASI05: 9 findings

AutoGen Code Executor — create_default_code_executor silently downgrades to unisolated host execution with
AutoGen Code Executor — approval_func defaults to None, so all LLM-generated code is auto-executed with no
AutoGen Code Executor — LocalCommandLineCodeExecutor docstring claims dangerous-command sanitization that
yaah (Yet Another Agent Harness) — Command guard blocks `rm -rf /` but not flag-order, long-form, or equivalent catas
Hermes Agent (with Hermes Desktop) — Dangerous shell commands auto-approve (fail open) in a headless non-interactive co
Hermes Agent (with Hermes Desktop) — Dangerous-command approval relies on a regex denylist that is structurally incompl
Hermes Agent (with Hermes Desktop) — The default terminal backend runs LLM-emitted commands directly on the host, so sa
+2 more

LLM06 Excessive Agency

LLM06 × ASI01: 1 finding

FinBot — Untrusted vendor invoice text flows into the LLM context and drives an uncondition

LLM06 × ASI02: 8 findings

FinBot — Approval requirements — manual-review threshold, fraud-risk routing, fraud-ran pre
FinBot — The fallback rule engine explicitly auto-approves above-threshold and injection-fl
HelperBot — read_file/write_file/search_web are advertised to the model but unimplemented on t
Airline Customer Service Agent (multi-agent) — The only runtime precondition on the seat mutation is a Python assert, which is st
Aider — No code-level repository-root confinement on writes; absolute or ../ edit paths es
Aider — --yes-always silently auto-approves package installs, Playwright install, and self
OpenHands (Autonomous Software-Engineering Agent) — Human-approval gates for merge, force-push, and cross-repo writes are inert unless
+1 more

LLM06 × ASI03: 2 findings

FinBot — The agent never verifies vendor registered-and-approved status before approving, a
Airline Customer Service Agent (multi-agent) — update_seat mutates reservation state with no identity or confirmation-number veri

LLM06 × ASI05: 13 findings

AutoGen Code Executor — create_default_code_executor silently downgrades to unisolated host execution with
AutoGen Code Executor — approval_func defaults to None, so all LLM-generated code is auto-executed with no
AutoGen Code Executor — DockerCommandLineCodeExecutor creates containers with default networking and expos
AutoGen Code Executor — extra_volumes and device_requests are accepted and applied with no gate or warning
AutoGen Code Executor — Working-directory confinement is enforced only for the optional # filename: header
Aider — --yes-always silently auto-approves package installs, Playwright install, and self
yaah (Yet Another Agent Harness) — On Codex, the command guard and secret scanner ship only as advisory model-callabl
+6 more

LLM06 × ASI06: 1 finding

CraftBot — The agent can rewrite its own identity/policy files (SOUL.md, AGENT.md, USER.md) a

LLM06 × ASI10: 3 findings

FinBot — Fraud detection can be disabled via a config flag, after which detection returns e
OpenHands (Autonomous Software-Engineering Agent) — Human-approval gates for merge, force-push, and cross-repo writes are inert unless
CraftBot — The agent can rewrite its own identity/policy files (SOUL.md, AGENT.md, USER.md) a

LLM07 System Prompt Leakage

LLM08 Vector and Embedding Weaknesses

LLM09 Misinformation

LLM10 Unbounded Consumption

Color scale: blank = no co-occurrence; shaded cells run from 1 (cooler) to 13 (hotter).

How to read this: primary vs secondary

Every finding is classified against the OWASP Top 10 by its primary risk — the single category that best captures what an attacker could actually do — and may note secondary categories it also touches. The solid bar counts the primary classification; the hatched extension shows the secondary, co-occurring concerns, kept separate so a headline number reflects only where a category is genuinely the dominant risk, not merely implicated. Some real findings have no OWASP home at all — a missing audit trail, for instance, is something the Top 10 treats as a defensive gap to close rather than a vulnerability to classify. Those are left unclassified and appear in neither chart. Where the taxonomy reaches, and where it doesn't, is itself part of what this view measures. For how these categories are applied, see the OWASP Gen AI Security guide.
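
As a rough sketch of the rule above, the solid and hatched bar lengths for one taxonomy can be derived as follows. The `Classification` record, the `bar_lengths` helper, and the sample rows are hypothetical, chosen only to illustrate the counting; they are not Praxen's schema or data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Classification:
    """Hypothetical record for one finding within one OWASP taxonomy."""
    finding_id: str
    primary: Optional[str]            # None = no OWASP home, unclassified
    secondary: Tuple[str, ...] = ()   # categories the finding also touches


def bar_lengths(classifications):
    """Return (solid, hatched) counts per category."""
    solid, hatched = Counter(), Counter()
    for item in classifications:
        if item.primary is None:
            continue  # unclassified findings appear in neither chart
        solid[item.primary] += 1
        for category in item.secondary:
            if category != item.primary:
                hatched[category] += 1
    return solid, hatched


# Hypothetical sample rows.
sample = [
    Classification("F1", "LLM01", ("LLM06",)),
    Classification("F2", "LLM06"),
    Classification("F3", "LLM07", ("LLM02",)),
    Classification("F4", None),  # e.g. a missing audit trail
]

solid, hatched = bar_lengths(sample)
print(solid["LLM06"], hatched["LLM06"])  # 1 1
print(sum(solid.values()))               # 3
```

Keeping the two counters separate is what lets a headline number reflect only primary classifications: `F1` adds to the hatched part of the LLM06 bar without inflating its solid count.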