RAISE Score Distribution Across Praxen Baseline Targets

Per-target scores and population distributions for all six RAISE categories, drawn from the frozen tests/baselines/v1.1-claude48/ baseline set.

12 targets analyzed · 12 with RAISE scores · 1.85 population average weighted overall


Per-target RAISE scores sorted by weighted overall, highest first

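Each row shows the six RAISE category scores (0–5) and the weighted overall for one baseline target. Column headers abbreviate the category names: Domain (Limit Your Domain), Knowledge (Balance Your Knowledge Base), Zero Trust (Implement Zero Trust), Supply Chain (Manage Your Supply Chain), Red Team (Build an AI Red Team) and Monitor (Monitor Continuously). Implement Zero Trust carries weight 0.25; the other five carry 0.15 each. Maturity levels: 0 Absent, 1 Ad hoc, 2 Partial, 3 Established, 4 Strong, 5 Exemplary.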

Agent / Target	Description	Domain	Knowledge	Zero Trust	Supply Chain	Red Team	Monitor	Weighted Overall
Hermes (Agent + Desktop)	Multi-component LLM agent + desktop control layer	2	2	3	3	4	3	2.85
Deep Agents CLI	LangChain agent harness (MCP coverage)	4	3	3	2	3	1	2.70
OpenHands	Autonomous software-engineering platform	3	2	2	3	1	3	2.30
yaah	Yet Another Agent Harness (MCP coverage)	3	2	2	3	1	3	2.30
AutoGen Code Executor	Microsoft AutoGen code-executor family	3	2	2	3	1	1	2.00
Aider	Interactive pair-programming agent	3	1	2	3	1	2	2.00
uAgents	Fetch.ai decorator-based autonomous multi-agent framework runtime	2	2	2	3	1	2	2.00
Agentforce Help Agent	Salesforce Agentforce customer-service agent (Knowledge-article RAG)	3	2	2	2	0	1	1.70
OpenAI Customer Service	OpenAI Agents SDK example	2	1	1	3	1	2	1.60
CraftBot	Self-hosted general-purpose agent that builds and operates its own SaaS tools	1	1	1	1	1	2	1.15
FinBot	OWASP Agentic AI CTF: invoice processor	2	1	0	1	1	1	0.90
HelperBot	Damn Vulnerable AI Agent: training agent	1	1	0	1	1	1	0.75

Sorted by weighted overall (highest first). Scale: 0 Absent to 5 Exemplary.
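Written out term by term, the weighting gives the formula below, where each s is the 0–5 score for the category named in its subscript (the subscript abbreviations are shorthand introduced here, not field names from the findings JSON). Substituting the Hermes and FinBot rows reproduces their listed weighted overall scores.

```latex
\begin{aligned}
\text{overall} &= 0.25\,s_{\mathrm{ZT}} + 0.15\,\bigl(s_{\mathrm{Dom}} + s_{\mathrm{Know}} + s_{\mathrm{SC}} + s_{\mathrm{RT}} + s_{\mathrm{Mon}}\bigr) \\
\text{Hermes} &= 0.25 \cdot 3 + 0.15\,(2 + 2 + 3 + 4 + 3) = 0.75 + 2.10 = 2.85 \\
\text{FinBot} &= 0.25 \cdot 0 + 0.15\,(2 + 1 + 1 + 1 + 1) = 0 + 0.90 = 0.90
\end{aligned}
```

Because the weights sum to 1.00 (0.25 + 5 × 0.15), the weighted overall stays on the same 0–5 scale as the category scores, and a one-level change in Implement Zero Trust moves it by 0.25 rather than 0.15.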

Score distribution by RAISE component: how many targets land at each maturity level

For each of the six RAISE categories, the counts show how many of the 12 scored targets received each maturity level (0 Absent through 5 Exemplary). Each category's weight, population average and standard deviation are shown alongside; the ± figures are consistent with a sample (n − 1) standard deviation.

Category	Weight	Population avg	Std dev	0 Absent	1 Ad hoc	2 Partial	3 Established	4 Strong	5 Exemplary
Limit Your Domain	15%	2.42	±0.90	0	2	4	5	1	0
Balance Your Knowledge Base	15%	1.67	±0.65	0	5	6	1	0	0
Implement Zero Trust	25%	1.67	±0.98	2	2	6	2	0	0
Manage Your Supply Chain	15%	2.33	±0.89	0	3	2	7	0	0
Build an AI Red Team	15%	1.33	±1.07	1	9	0	1	1	0
Monitor Continuously	15%	1.83	±0.83	0	5	4	3	0	0

Level columns give the number of targets (of 12) at each maturity level.
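The per-category figures can be recomputed from a single column of the per-target table. This sketch uses Python's standard statistics module; the published ± values appear to be sample standard deviations (n − 1 denominator), which is what statistics.stdev computes, whereas statistics.pstdev (divide by n) would give 0.86 rather than 0.90 for Limit Your Domain. The function name distribution is illustrative and not part of Praxen.

```python
from collections import Counter
from statistics import mean, stdev

LEVELS = ["Absent", "Ad hoc", "Partial", "Established", "Strong", "Exemplary"]

def distribution(scores: list[int]) -> tuple[dict[str, int], float, float]:
    """Count targets at each maturity level (0-5) and return mean and sample SD."""
    counts = Counter(scores)
    per_level = {f"{level} {name}": counts.get(level, 0) for level, name in enumerate(LEVELS)}
    # statistics.stdev divides by n - 1 (sample SD); statistics.pstdev would divide by n
    return per_level, mean(scores), stdev(scores)

# Build an AI Red Team scores, read top to bottom from the per-target table
red_team = [4, 3, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1]
per_level, avg, sd = distribution(red_team)
for label, count in per_level.items():
    print(f"{label:<14} {count} of {len(red_team)}")
print(f"population avg {avg:.2f} ±{sd:.2f}")
```

Substituting another column of the table, such as the Zero Trust scores, should reproduce that category's row in the same way.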

Population weighted-overall distribution across all 12 targets

Histogram of weighted-overall scores grouped by maturity band. Each band covers one integer step (0.0–0.99 = Absent, 1.0–1.99 = Ad hoc, and so on).

Population: 12 targets · Average: 1.85 · Range: 0.75 – 2.85

Band	Weighted overall	Targets	Band avg
Absent	0.0 – 0.99	2 of 12	0.82
Ad hoc	1.0 – 1.99	3 of 12	1.48
Partial	2.0 – 2.99	7 of 12	2.31
Established	3.0 – 3.99	0 of 12	–
Strong	4.0 – 4.99	0 of 12	–
Exemplary	5.0	0 of 12	–
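A minimal sketch of the banding, assuming each weighted overall is assigned to the band of its integer part (so 2.00 is Partial and 1.70 is Ad hoc). The helper names band_of and histogram are hypothetical. Rounding to two decimals before flooring is a defensive choice for recomputed floating-point sums, not something the methodology specifies.

```python
import math
from collections import defaultdict
from typing import Optional

BANDS = ["Absent", "Ad hoc", "Partial", "Established", "Strong", "Exemplary"]

def band_of(score: float) -> str:
    """Map a weighted overall (0-5) to its maturity band by integer step."""
    # Round to the published two decimals first so float noise such as
    # 1.9999999999 lands in Partial rather than Ad hoc.
    step = math.floor(round(score, 2))
    return BANDS[min(max(step, 0), 5)]

def histogram(overalls: list[float]) -> dict[str, tuple[int, Optional[float]]]:
    """Return (count, band average) per band; the average is None for empty bands."""
    groups: dict[str, list[float]] = defaultdict(list)
    for s in overalls:
        groups[band_of(s)].append(s)
    return {
        band: (len(groups[band]), sum(groups[band]) / len(groups[band]) if groups[band] else None)
        for band in BANDS
    }

# Weighted overall scores as published in the per-target table
overalls = [2.85, 2.70, 2.30, 2.30, 2.00, 2.00, 2.00, 1.70, 1.60, 1.15, 0.90, 0.75]
for band, (count, avg) in histogram(overalls).items():
    print(f"{band:<12} {count} of {len(overalls)}", f"(avg {avg:.2f})" if avg is not None else "")
```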

Methodology: how scores were computed

RAISE scores are drawn from the raise_posture.categories[].score fields in each baseline findings JSON. The weighted overall is raise_posture.weighted_overall, computed as Σ(score × weight) across the six categories, where Implement Zero Trust has weight 0.25 and the other five have weight 0.15. Scores come from static source scans run at the version pinned in the baseline directory name; they reflect each target's posture at the time of the scan, not the current state of the target repositories. See tests/baselines/README.md for the full baseline methodology, and the RAISE Framework guide for the maturity model itself.
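The methodology above can be spot-checked against a findings file. This sketch relies only on the two field paths named above (raise_posture.categories[].score and raise_posture.weighted_overall); the per-category "name" key, the use of the category labels from this page as its values, the helper recompute_overall, and the sample document are all assumptions for illustration, since the actual baseline schema is not shown here.

```python
import json
import math

# Weights as stated on this page. ASSUMPTION: each category object has a "name"
# key matching these labels; the real findings schema may identify categories differently.
WEIGHTS = {
    "Limit Your Domain": 0.15,
    "Balance Your Knowledge Base": 0.15,
    "Implement Zero Trust": 0.25,
    "Manage Your Supply Chain": 0.15,
    "Build an AI Red Team": 0.15,
    "Monitor Continuously": 0.15,
}

def recompute_overall(findings: dict) -> tuple[float, float]:
    """Return (recomputed, stored) weighted overall for one findings document."""
    posture = findings["raise_posture"]
    categories = posture["categories"]
    missing = WEIGHTS.keys() - {c["name"] for c in categories}
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    recomputed = sum(WEIGHTS[c["name"]] * c["score"] for c in categories)
    return recomputed, posture["weighted_overall"]

# Hypothetical minimal findings document carrying the Hermes scores from the table
sample = json.loads("""
{"raise_posture": {"weighted_overall": 2.85, "categories": [
  {"name": "Limit Your Domain", "score": 2},
  {"name": "Balance Your Knowledge Base", "score": 2},
  {"name": "Implement Zero Trust", "score": 3},
  {"name": "Manage Your Supply Chain", "score": 3},
  {"name": "Build an AI Red Team", "score": 4},
  {"name": "Monitor Continuously", "score": 3}
]}}
""")

recomputed, stored = recompute_overall(sample)
# Stored values are rounded to two decimals, so compare with a small tolerance
print(math.isclose(recomputed, stored, abs_tol=0.005))
```

Raising on a missing category, rather than silently summing whatever is present, keeps a truncated findings file from producing a plausible but understated overall.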

Generated July 12, 2026, 21:47 UTC · Built on the Praxen v1.1-claude48 baseline set · github.com/open-agent-ai-security/praxen