Make sure your agent does its job, and only its job.
Praxen compares an AI agent's declared policy against real evidence — its code, live deployment state, and behavioral logs — and reports exactly where observed behavior drifts from intent, before it becomes a risk.
The loop
Define the job. Test against the job.
Agent Behavior Verification focuses on what's most critical — declared intent versus observed reality.
Define the job
Write a Worker Remit — the agent's mission, authorized tools, approved channels, counterparties, and forbidden actions.
Test reality
Point Praxen at evidence — source code, deployment files, or behavioral history — and it reads the workspace the way an auditor would.
Find the gap
Every finding answers one question — does observed behavior match declared intent? — and cites the exact rule and evidence.
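The comparison described above can be sketched in a few lines. This is a minimal illustration, not Praxen's actual remit schema or engine: the `WorkerRemit` and `Finding` classes, the `find_gaps` function, the action names, and the evidence locations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerRemit:
    """Hypothetical in-memory remit; not Praxen's real schema."""
    mission: str
    authorized_actions: frozenset
    forbidden_actions: frozenset


@dataclass(frozen=True)
class Finding:
    rule: str      # the remit rule that was violated
    evidence: str  # where the violating behavior was observed


def find_gaps(remit, observed):
    """Compare observed (action, location) pairs against the declared remit."""
    findings = []
    for action, location in observed:
        if action in remit.forbidden_actions:
            findings.append(Finding(f"forbidden_actions: {action}", location))
        elif action not in remit.authorized_actions:
            findings.append(Finding(f"authorized_actions: {action} not listed", location))
    return findings


# Made-up remit and evidence for an invoice-processing agent.
remit = WorkerRemit(
    mission="Process supplier invoices",
    authorized_actions=frozenset({"read_invoice", "post_ledger_entry"}),
    forbidden_actions=frozenset({"send_payment"}),
)
observed = [
    ("read_invoice", "agent.py:12"),
    ("send_payment", "agent.py:48"),
    ("http_post", "logs/session.jsonl:7"),
]

for finding in find_gaps(remit, observed):
    print(finding.rule, "<-", finding.evidence)
```

Each finding carries both the rule it breaks and the place it was seen, which is the pairing the report is said to cite.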
Report locally
A self-contained HTML report, machine-readable JSON, and a plain-text summary land in ./reports/. Nothing leaves your machine.
In 30 seconds
One sentence to your coding agent
That's the whole interface: "Run a Praxen behavior analysis on ./my-agent." Praxen does the rest.
Declare intent
Write a Worker Remit by hand, or have Praxen draft one from your description or docs. It's the only artifact you customize per agent.
Point at evidence
Source code, live deployment state, conversation logs, governance docs — any mix. Praxen finds the remit, reads the workspace, compares.
Read the gap
Findings tag against OWASP and RAISE, chain into compound attack paths, and trace to the exact remit rule they violate.

Verification patterns
What Praxen catches
Every analysis runs a battery of named detection patterns — not just prompt-injection screening or known-bad code signatures.
- Policy-implementation divergence — the code or behavior doesn't do what the policy document says.
- Credential exposure — secrets surfacing in unexpected locations across the workspace.
- Configuration gaps — auto-approved exec, disabled loop detection, missing rate limits.
- Capability drift — new tools or outbound destinations not in the authorized baseline.
- Compound signal reasoning — individual findings chained when they combine into a high-severity path.
- Secondary prompt discovery — session-loaded identity files like SOUL.md/AGENTS.md audited as system prompts.
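To make one of these patterns concrete, here is a rough sketch of what a credential exposure check could look like. It is an assumption about the general technique (regex scanning of workspace text), not Praxen's detection logic; the `SECRET_PATTERNS` table and `scan_text` function are hypothetical, and the two patterns are only illustrative.

```python
import re

# Illustrative patterns only; a real scanner would carry many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(path, text):
    """Return (path, line number, pattern name) for each secret-like match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((path, lineno, name))
    return hits


# The key below is the placeholder value used in AWS documentation.
sample = "region = us-east-1\nkey = AKIAIOSFODNN7EXAMPLE\n"
print(scan_text("config/settings.ini", sample))
```

Recording the path and line number alongside the pattern name is what lets a finding point at exact evidence rather than a whole file.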
See it in action
From one report to the whole suite
Start with a complete analysis, then see how Praxen's twelve-agent baseline scores against the OWASP Top 10 and the RAISE maturity model.
A full report
Walk a complete Praxen analysis end to end — FinBot, the invoice processor from the OWASP Agentic AI CTF.
Coverage summary
Findings mapped to the OWASP Top 10 for LLM Applications (2025) and Agentic AI (2026), aggregated by category across the baseline suite.
Maturity scores
A six-category 0–5 maturity model for an agent's security posture, weighted into a single overall score per target.
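One plausible reading of "weighted into a single overall score" is a weighted mean that stays on the same 0–5 scale. The sketch below assumes exactly that; the category names, weights, and the `overall_score` function are placeholders, since the page names neither the six categories nor their weights.

```python
def overall_score(scores, weights):
    """Weighted mean of per-category 0-5 scores (assumed formula)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same categories")
    for name, value in scores.items():
        if not 0 <= value <= 5:
            raise ValueError(f"{name} score {value} is outside the 0-5 range")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Placeholder categories and weights, for illustration only.
weights = {"category_1": 3, "category_2": 2, "category_3": 2,
           "category_4": 1, "category_5": 1, "category_6": 1}
scores = {"category_1": 4, "category_2": 3, "category_3": 5,
          "category_4": 2, "category_5": 0, "category_6": 1}

print(overall_score(scores, weights))
```

Dividing by the total weight keeps the result comparable across targets even if the weights do not sum to one.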
Get started in minutes
Install the Claude Code plugin
Praxen runs as a plugin in your coding agent. No pip install — the report renderer is Python-stdlib-only. Add the marketplace, install, and point it at an agent.
Full guide: Installation · Quickstart
# 1 · add the Praxen marketplace
> /plugin marketplace add open-agent-ai-security/praxen
# 2 · install the plugin
> /plugin install praxen@open-agent-ai-security
# 3 · run an analysis
> Run a Praxen behavior analysis on ./my-agent
Verify your agent before it ships.
Praxen runs pre-deployment and on every release — open source, built for the community.