Praxen Documentation
Praxen is the open-source reference implementation of Agent Behavior Verification (ABV) — a proactive control model for AI agents and digital workers. It compares an AI agent's declared policy (a Worker Remit) against whatever evidence is available about that agent — source code, live deployment state, behavioral artifacts, governance docs, or any mix — and reports where observed behavior diverges from declared intent.
Make sure your agent does its job — and only its job.
Praxen is a project sponsored by Exabeam.
Where to start
| If you are… | Read this first |
|---|---|
| Setting up Praxen for the first time | Installation |
| Trying it out for the first time | Quickstart — first report against the bundled finbot example in five minutes |
| Ready to run your first real analysis | Usage |
| Writing a Worker Remit for an agent | Writing Worker Remits |
| Looking at a report and trying to understand it | Interpreting Reports |
| Disagreeing with a finding or wanting to revise it | Challenging and Revising Findings |
| Wondering why two runs gave slightly different scores | Understanding Run-to-Run Variability |
| Hit a problem on a first run | Usage § Troubleshooting |
| Trying to understand the OWASP frameworks Praxen tags against | OWASP Gen AI Security |
| Trying to understand the RAISE maturity scoring | The RAISE Framework |
How Praxen Works (in 90 seconds)
Praxen reduces agent verification to a single comparison:
- You declare what the agent is supposed to do in a Worker Remit. This is the only artifact you customize per agent.
- You point Praxen at evidence about the agent — its source code, live deployment files, conversation logs, or any combination.
- Praxen reads, compares, reports. Every finding traces to a specific rule in the Worker Remit it violates, with evidence cited from the input.
```mermaid
flowchart LR
    WR["Worker Remit<br/>(declared policy)"] --> P{{"behavior-verifier<br/>skill"}}
    EV["Evidence<br/>(source · deployment · behavior · governance)"] --> P
    P --> JSON["findings.json<br/>(canonical)"]
    JSON --> R["render.py"]
    R --> HTML["analysis.html"]
    R --> TXT["analysis.txt"]
```
The output is a self-contained HTML analysis report, a machine-readable JSON findings file, and a plain-text summary. Open the HTML in a browser; ingest the JSON in your pipeline.
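As a rough illustration of the comparison described above, here is a toy sketch in Python. It is not Praxen's implementation and does not use the Worker Remit format: the `REMIT` and `OBSERVED` structures, the rule IDs, and the `compare` helper are all hypothetical, chosen only to show a finding tracing back to a rule and to cited evidence.

```python
import json

# Hypothetical remit: each rule ID names a resource and the actions allowed on it.
# This structure is an assumption for illustration, not the Worker Remit format.
REMIT = {
    "R1": {"resource": "ledger", "allowed": {"read"}},
    "R2": {"resource": "email", "allowed": {"read", "draft"}},
}

# Hypothetical evidence: observed actions, each pointing back to its source.
OBSERVED = [
    {"resource": "ledger", "action": "read", "evidence": "session.jsonl:12"},
    {"resource": "email", "action": "send", "evidence": "session.jsonl:40"},
]


def compare(remit, observed):
    """Return one finding per observed action that its governing rule does not allow."""
    findings = []
    for event in observed:
        for rule_id, rule in remit.items():
            if rule["resource"] != event["resource"]:
                continue  # this rule does not govern the resource touched
            if event["action"] not in rule["allowed"]:
                findings.append({
                    "rule": rule_id,
                    "action": event["action"],
                    "evidence": event["evidence"],
                })
    return findings


findings = compare(REMIT, OBSERVED)
print(json.dumps(findings, indent=2))
```

In this toy data the only divergence is the `send` action on `email`, which rule `R2` does not allow. A real analysis reads far richer evidence, but the shape is the same: declared policy in, observed behavior in, rule-linked findings out.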
Four Input Shapes
Praxen is not just a source-code analyzer. Any of these — alone or in combination — are valid input:
- Source repository: a project directory, GitHub repo, or plugin source tree.
- Running deployment: live memory and bootstrap files (`MEMORY.md`, `SOUL.md`), operational logs (action reports, session JSONL, audit trails, escalation logs), live config.
- Behavioral artifacts: chat transcripts, email histories, conversation logs, decision records.
- Governance & methodology docs: red-team reports, threat models, runbooks, incident retrospectives, dependency-management policy. These feed the maturity-oriented RAISE categories (Build an AI Red Team, Monitor Continuously, Manage Your Supply Chain) that source code alone can't speak to.
The methodology adapts to whatever input you supply: categories the input doesn't cover are scored at lower confidence and explicitly noted in the report. See Usage for how to point Praxen at each type.
Frameworks
Every finding Praxen produces is classified against four industry-standard frameworks simultaneously:
- OWASP Top 10 for LLM Applications 2025: `LLM01`–`LLM10` tags
- OWASP Top 10 for Agentic AI Applications 2026: `ASI01`–`ASI10` tags
- OWASP Secure MCP Server Development Guide 2026: applied when MCP configuration is found
- RAISE Framework: six-category 0–5 maturity score; see RAISE
For an overview of the OWASP Gen AI Security Project and a one-line gloss on each LLM, Agentic, and MCP risk, see OWASP Gen AI Security. You can also browse the live OWASP Coverage Report, which aggregates LLM and Agentic Top-10 coverage across Praxen's example suite and links into each per-target analysis.
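If you post-process findings yourself, a small check on the tag ranges listed above can catch typos early. The sketch below assumes tags appear as plain strings such as `LLM01` or `ASI10`. That representation and the `split_tags` helper are assumptions for illustration, not part of Praxen's documented output.

```python
import re

# Tag ranges as listed above: LLM01-LLM10 and ASI01-ASI10.
TAG_PATTERN = re.compile(r"(LLM|ASI)(0[1-9]|10)")


def split_tags(tags):
    """Partition tag strings into (valid, invalid) by the documented ranges."""
    valid = [t for t in tags if TAG_PATTERN.fullmatch(t)]
    invalid = [t for t in tags if not TAG_PATTERN.fullmatch(t)]
    return valid, invalid


valid, invalid = split_tags(["LLM01", "ASI10", "LLM11", "asi03"])
print(valid)    # ['LLM01', 'ASI10']
print(invalid)  # ['LLM11', 'asi03']
```

The pattern is deliberately strict (upper case, two digits), so out-of-range numbers and lower-case variants are reported rather than silently accepted.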
Quick reference
- Install: `claude plugin marketplace add open-agent-ai-security/praxen` then `claude plugin install praxen@open-agent-ai-security` (or the in-session `/plugin ...` equivalents; see Installation)
- Skill name: `behavior-verifier`
- Output directory: `./reports/` relative to where you run the analysis
- Output files: `<agent>-analysis-<timestamp>.html`, `<agent>-findings-<date>.json`, `<agent>-analysis-<timestamp>.txt`
For the full specification, see PRAXEN_SPEC.md at the repo root.
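To pick up the JSON output in a pipeline, one option is to look for the newest findings file under `./reports/` using the naming pattern from the quick reference. The following is a minimal sketch: the helper names are hypothetical, the agent name `finbot` is assumed to match the bundled example, and nothing is assumed about the JSON schema (consult PRAXEN_SPEC.md for that).

```python
import json
from pathlib import Path


def latest_findings(agent, reports_dir="reports"):
    """Return the most recently modified findings file for `agent`, or None."""
    candidates = sorted(
        Path(reports_dir).glob(f"{agent}-findings-*.json"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def load_findings(path):
    """Parse a findings file as JSON; no particular schema is assumed."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # "finbot" is an assumed agent name, taken from the bundled example.
    path = latest_findings("finbot")
    if path is None:
        print("no findings file found under ./reports/")
    else:
        data = load_findings(path)
        print(path.name, type(data).__name__)
```

Sorting by modification time avoids guessing the exact `<date>` format in the filename, which this page does not specify.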
