Open source · runs locally · nothing phones home

Make sure your agent does its job —
and only its job.

Praxen compares an AI agent's declared policy against real evidence — its code, live deployment state, and behavioral logs — and reports exactly where observed behavior drifts from intent, before it becomes a risk.

Get started → See how agents score

[Image: Praxy, the Praxen fox, inspecting a small robot agent]


The loop

Define the job. Test against the job.

Agent Behavior Verification checks one thing: whether observed reality matches declared intent.

Step 1

Define the job

Write a Worker Remit — the agent's mission, authorized tools, approved channels, counterparties, and forbidden actions.
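As a rough illustration, the five parts of a remit could be modeled as below. This page does not show Praxen's actual remit format, so the `WorkerRemit` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkerRemit:
    """Hypothetical model of a Worker Remit; not Praxen's real schema."""

    mission: str
    authorized_tools: frozenset[str] = field(default_factory=frozenset)
    approved_channels: frozenset[str] = field(default_factory=frozenset)
    counterparties: frozenset[str] = field(default_factory=frozenset)
    forbidden_actions: frozenset[str] = field(default_factory=frozenset)

    def allows_tool(self, tool: str) -> bool:
        # A tool is in scope only if the remit lists it explicitly.
        return tool in self.authorized_tools


# Sample values are invented for illustration.
remit = WorkerRemit(
    mission="Process supplier invoices and queue them for approval",
    authorized_tools=frozenset({"read_invoice", "queue_for_approval"}),
    approved_channels=frozenset({"email"}),
    counterparties=frozenset({"registered_suppliers"}),
    forbidden_actions=frozenset({"send_payment"}),
)

print(remit.allows_tool("read_invoice"))
print(remit.allows_tool("send_payment"))
```

Making the allow-list explicit means anything not declared is out of scope by default, which matches the "and only its job" framing.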

Step 2

Test reality

Point Praxen at evidence — source code, deployment files, or behavioral history — and it reads the workspace the way an auditor would.

Step 3

Find the gap

Every finding answers one question — does observed behavior match declared intent? — and cites the exact rule and evidence.
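One way to picture a finding that cites both a rule and its evidence is sketched below. The `Finding` record, the `find_gaps` helper, the rule label, and the log locations are hypothetical stand-ins, not Praxen's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str      # the remit rule the behavior was checked against
    evidence: str  # where the behavior was observed
    detail: str    # short human-readable description


def find_gaps(
    authorized_tools: set[str],
    observed_calls: list[tuple[str, str]],
) -> list[Finding]:
    """Return one finding per observed tool call the remit does not authorize.

    observed_calls holds (tool_name, evidence_location) pairs.
    """
    findings = []
    for tool, location in observed_calls:
        if tool not in authorized_tools:
            findings.append(
                Finding(
                    rule="authorized_tools",
                    evidence=location,
                    detail=f"unauthorized tool call: {tool}",
                )
            )
    return findings


# Invented sample evidence: two tool calls seen in a behavioral log.
observed = [
    ("read_invoice", "logs/session-01.jsonl:12"),
    ("send_payment", "logs/session-01.jsonl:47"),
]
gaps = find_gaps({"read_invoice", "queue_for_approval"}, observed)
for gap in gaps:
    print(f"{gap.rule}: {gap.detail} ({gap.evidence})")
```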

Step 4

Report locally

A self-contained HTML report, machine-readable JSON, and a plain-text summary land in ./reports/. Nothing leaves your machine.

In 30 seconds

One sentence to your coding agent

That's the whole interface: "Run a Praxen behavior analysis on ./my-agent." Praxen does the rest.

Declare intent

Write a Worker Remit by hand, or have Praxen draft one from your description or docs. It's the only artifact you customize per agent.

Point at evidence

Source code, live deployment state, conversation logs, governance docs — any mix. Praxen finds the remit, reads the workspace, compares.

Read the gap

Findings are tagged against OWASP and RAISE categories, chained into compound attack paths, and traced to the exact remit rule they violate.

Verification patterns

What Praxen catches

Every analysis runs a battery of named detection patterns — not just prompt-injection screening or known-bad code signatures.

Policy-implementation divergence — the code or behavior doesn't do what the policy document says.
Credential exposure — secrets surfacing in unexpected locations across the workspace.
Configuration gaps — auto-approved exec, disabled loop detection, missing rate limits.
Capability drift — new tools or outbound destinations not in the authorized baseline.
Compound signal reasoning — individual findings chained when they combine into a high-severity path.
Secondary prompt discovery — session-loaded identity files like SOUL.md / AGENTS.md audited as system prompts.
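Compound signal reasoning can be pictured as a lookup over the patterns that fired. The sketch below is only an assumption about how such chaining might work: the pattern labels, the `CHAINS` table, and the path descriptions are invented, and this page does not describe Praxen's actual chaining logic.

```python
# Hypothetical chain rules: a set of pattern labels mapped to the
# compound path they form when every member has fired.
CHAINS = {
    frozenset({"credential_exposure", "capability_drift"}):
        "exposed secret reachable from an unauthorized outbound destination",
    frozenset({"configuration_gap", "capability_drift"}):
        "newly added tool runs under auto-approved exec",
}


def compound_paths(pattern_hits: set[str]) -> list[str]:
    """Return the description of every chain whose member patterns all fired."""
    return [
        description
        for members, description in CHAINS.items()
        if members <= pattern_hits  # subset test: all members are present
    ]


hits = {"credential_exposure", "capability_drift"}
paths = compound_paths(hits)
for path in paths:
    print(path)
```

A single pattern on its own produces no compound path here; only the combination does, which is the point of chaining.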

See it in action

From one report to the whole suite

Start with a complete analysis, then see how Praxen's twelve-agent baseline scores against the OWASP Top 10 and the RAISE maturity model.

Worked example

A full report

Walk a complete Praxen analysis end to end — FinBot, the invoice processor from the OWASP Agentic AI CTF.

14 findings · self-contained HTML

Open example report →

OWASP Top 10

Coverage summary

Findings mapped to the OWASP Top 10 for LLM Applications (2025) and Agentic AI (2026), aggregated by category across the baseline suite.

107 findings · 12 agents classified

OWASP coverage report →

RAISE Framework

Maturity scores

A six-category 0–5 maturity model for an agent's security posture, weighted into a single overall score per target.
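For intuition, weighting six 0 to 5 category scores into one number could look like the sketch below. It assumes a normalized weighted mean; the category names, the weights, and the formula itself are placeholders, since the actual RAISE categories and weighting are not given on this page.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-category scores (0 to 5) into one weighted mean."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same categories")
    for name, value in scores.items():
        if not 0 <= value <= 5:
            raise ValueError(f"score for {name} is outside the 0 to 5 range")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Placeholder categories and weights, invented for illustration.
scores = {"cat_a": 2, "cat_b": 1, "cat_c": 0, "cat_d": 3, "cat_e": 1, "cat_f": 2}
weights = {"cat_a": 2, "cat_b": 1, "cat_c": 1, "cat_d": 2, "cat_e": 1, "cat_f": 1}

overall = weighted_score(scores, weights)
print(overall)  # 14 / 8 = 1.75
```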

1.55 avg weighted · 12 agents scored

RAISE score distribution →

Get started in minutes

Install the plugin

Praxen runs as a plugin in your coding agent (such as Claude Code), so no pip install is needed. One command adds the marketplace and a second installs the plugin; then point Praxen at an agent to evaluate.


Full guide: Installation · Quickstart
Using OpenAI Codex? Steps are in the Installation guide.

Terminal

# install Praxen in your terminal
$ claude plugin marketplace add open-agent-ai-security/praxen
$ claude plugin install praxen@open-agent-ai-security
$ claude plugin list

# then, in Claude Code, point it at an agent
> Run a Praxen behavior analysis on ./my-agent

Verify your agent before it ships.

Praxen runs pre-deployment and on every release — open source, built for the community.

Read the Quickstart → Star on GitHub
