RAISE Score Distribution Across Praxen Baseline Targets

Per-target scores and population distributions for all six RAISE categories, drawn from the frozen tests/baselines/v0.7.7-claude48/ baseline set.

12 targets analyzed
12 with RAISE scores
1.55 population avg weighted overall

Per-target RAISE scores, sorted by weighted overall (highest first)

Each row shows all six RAISE category scores (0–5) and the weighted overall for one baseline target. Implement Zero Trust carries weight 0.25; the other five carry 0.15 each. Maturity labels: 0 Absent, 1 Ad hoc, 2 Partial, 3 Established, 4 Strong, 5 Exemplary.

Agent / Target            Domain  Knowledge  Zero Trust  Supply Chain  Red Team  Monitor  Weighted Overall  Description
Hermes (Agent + Desktop)  3       3          3           4             3         3        3.15              Multi-component LLM agent + desktop control layer
yaah                      3       2          2           3             1         3        2.30              Yet Another Agent Harness (MCP coverage)
Aider                     3       2          2           3             1         2        2.15              Interactive pair-programming agent
Deep Agents CLI           3       3          2           3             1         1        2.15              LangChain agent harness (MCP coverage)
OpenHands                 3       1          1           3             1         2        1.75              Autonomous software-engineering platform
LangChain SQL Agent       2       2          1           2             1         1        1.45              create_sql_agent toolkit
AutoGen Code Executor     3       2          1           2             1         0        1.45              Microsoft AutoGen code-executor family
Sweep                     2       1          1           2             0         2        1.30              GitHub issue-to-code agent
OpenAI Customer Service   2       2          0           2             1         1        1.20              OpenAI Agents SDK example
HelperBot                 1       1          0           1             1         0        0.60              Damn Vulnerable AI Agent: training agent
Devika                    1       1          0           1             0         1        0.60              Autonomous software engineer
FinBot                    1       0          0           1             1         0        0.45              OWASP Agentic AI CTF: invoice processor

Sorted by weighted overall (highest first). Scores range from 0 (Absent) to 5 (Exemplary).
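The weighting described for the table can be checked by hand against any row. The following is a minimal sketch, not the scanner's own code: the short keys (`domain`, `zero_trust`, and so on) and the function name `weighted_overall` are illustrative names chosen here, and the scores are copied from the Hermes and FinBot rows.

```python
import math

# Weights as stated in the text: Implement Zero Trust 0.25, the other five 0.15 each.
# The dictionary keys are illustrative labels, not identifiers from the baseline files.
WEIGHTS = {
    "domain": 0.15,
    "knowledge": 0.15,
    "zero_trust": 0.25,
    "supply_chain": 0.15,
    "red_team": 0.15,
    "monitor": 0.15,
}

# The weights sum to 1.0, so the weighted overall stays on the same 0-5 scale.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def weighted_overall(scores: dict) -> float:
    """Return the sum of score * weight over the six categories, rounded to 2 places."""
    return round(sum(scores[name] * weight for name, weight in WEIGHTS.items()), 2)


# Scores in table column order: Domain, Knowledge, Zero Trust, Supply Chain, Red Team, Monitor.
hermes = dict(zip(WEIGHTS, [3, 3, 3, 4, 3, 3]))
finbot = dict(zip(WEIGHTS, [1, 0, 0, 1, 1, 0]))

print(weighted_overall(hermes))  # 3.15
print(weighted_overall(finbot))  # 0.45
```

Because Zero Trust is the only category weighted above 0.15, two targets with the same raw score total can still differ in weighted overall when their Zero Trust scores differ.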

Score distribution by RAISE component: how many targets land at each maturity level

For each of the six RAISE categories, the bars show how many of the 12 scored targets received each maturity level (0 Absent through 5 Exemplary). The population average and standard deviation are shown per category.

Limit Your Domain  ·  weight 15%  ·  population avg 2.25 ±0.87
  0 Absent        0 of 12
  1 Ad hoc        3 of 12
  2 Partial       3 of 12
  3 Established   6 of 12
  4 Strong        0 of 12
  5 Exemplary     0 of 12

Balance Your Knowledge Base  ·  weight 15%  ·  population avg 1.67 ±0.89
  0 Absent        1 of 12
  1 Ad hoc        4 of 12
  2 Partial       5 of 12
  3 Established   2 of 12
  4 Strong        0 of 12
  5 Exemplary     0 of 12

Implement Zero Trust  ·  weight 25%  ·  population avg 1.08 ±1.00
  0 Absent        4 of 12
  1 Ad hoc        4 of 12
  2 Partial       3 of 12
  3 Established   1 of 12
  4 Strong        0 of 12
  5 Exemplary     0 of 12

Manage Your Supply Chain  ·  weight 15%  ·  population avg 2.25 ±0.97
  0 Absent        0 of 12
  1 Ad hoc        3 of 12
  2 Partial       4 of 12
  3 Established   4 of 12
  4 Strong        1 of 12
  5 Exemplary     0 of 12

Build an AI Red Team  ·  weight 15%  ·  population avg 1.00 ±0.74
  0 Absent        2 of 12
  1 Ad hoc        9 of 12
  2 Partial       0 of 12
  3 Established   1 of 12
  4 Strong        0 of 12
  5 Exemplary     0 of 12

Monitor Continuously  ·  weight 15%  ·  population avg 1.33 ±1.07
  0 Absent        3 of 12
  1 Ad hoc        4 of 12
  2 Partial       3 of 12
  3 Established   2 of 12
  4 Strong        0 of 12
  5 Exemplary     0 of 12
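
The per-category counts, averages, and ± values above can be reproduced from the per-target scores. The sketch below assumes the six table columns map, in order, to the six full category names used in this section, and it uses the sample (n−1) standard deviation from Python's `statistics.stdev`, which is the variant that matches the published ± figures; the page itself does not say which variant was used. The `SCORES` dictionary and `summarize` function are names introduced here for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Per-target scores copied from the table, in row order (Hermes first, FinBot last).
# Mapping of short column headers to full category names is assumed from column order.
SCORES = {
    "Limit Your Domain":           [3, 3, 3, 3, 3, 2, 3, 2, 2, 1, 1, 1],
    "Balance Your Knowledge Base": [3, 2, 2, 3, 1, 2, 2, 1, 2, 1, 1, 0],
    "Implement Zero Trust":        [3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0],
    "Manage Your Supply Chain":    [4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1],
    "Build an AI Red Team":        [3, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1],
    "Monitor Continuously":        [3, 3, 2, 1, 2, 1, 0, 2, 1, 0, 1, 0],
}


def summarize(values):
    """Return (counts for levels 0..5, mean, sample standard deviation)."""
    counts = Counter(values)
    histogram = [counts.get(level, 0) for level in range(6)]
    return histogram, round(mean(values), 2), round(stdev(values), 2)


for category, values in SCORES.items():
    histogram, avg, sd = summarize(values)
    print(f"{category}: counts={histogram} avg={avg:.2f} sd={sd:.2f}")
```

Swapping `stdev` for `pstdev` (the population form, dividing by n) gives slightly smaller values, for example about 0.83 instead of 0.87 for Limit Your Domain.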

Population weighted-overall distribution across all 12 targets

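Histogram of weighted-overall scores grouped by maturity band. Each band covers one integer step (0.0–0.99 = Absent, 1.0–1.99 = Ad hoc, and so on), so a target's band is the maturity label of the integer part of its weighted overall.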

Population: 12 targets  ·  Average: 1.55  ·  Range: 0.45–3.15

  Absent        0.0 – 0.99   3 of 12 (avg 0.55)
  Ad hoc        1.0 – 1.99   5 of 12 (avg 1.43)
  Partial       2.0 – 2.99   3 of 12 (avg 2.20)
  Established   3.0 – 3.99   1 of 12 (avg 3.15)
  Strong        4.0 – 4.99   0 of 12
  Exemplary     5.0          0 of 12
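
The banding can be sketched as taking the integer part of each weighted overall and looking up the matching maturity label. This is an illustration under that assumption rather than the page's actual charting code; `band_of` and `band_summary` are hypothetical helper names, and the values are copied from the per-target table.

```python
import math
from statistics import mean

BANDS = ["Absent", "Ad hoc", "Partial", "Established", "Strong", "Exemplary"]

# Weighted-overall values copied from the per-target table, highest first.
OVERALL = [3.15, 2.30, 2.15, 2.15, 1.75, 1.45, 1.45, 1.30, 1.20, 0.60, 0.60, 0.45]


def band_of(score: float) -> str:
    """Map a weighted overall in [0, 5] to its band via the integer part."""
    return BANDS[min(math.floor(score), 5)]


def band_summary(scores):
    """Return {band: (count, average or None)} for every band, including empty ones."""
    grouped = {name: [] for name in BANDS}
    for score in scores:
        grouped[band_of(score)].append(score)
    return {
        name: (len(vals), round(mean(vals), 2) if vals else None)
        for name, vals in grouped.items()
    }


for name, (count, avg) in band_summary(OVERALL).items():
    print(f"{name}: {count} of {len(OVERALL)}" + (f" (avg {avg:.2f})" if count else ""))

print(f"Average {mean(OVERALL):.2f}, range {min(OVERALL):.2f} to {max(OVERALL):.2f}")
```

Run against the table values, this reproduces the counts and band averages listed above, along with the population average of 1.55.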

Methodology: how scores were computed

RAISE scores are drawn from the raise_posture.categories[].score fields in each baseline findings JSON. The weighted overall is raise_posture.weighted_overall, computed as Σ(score × weight) across the six categories, where Implement Zero Trust has weight 0.25 and the other five have weight 0.15 each. The weights sum to 1.0, so the weighted overall stays on the same 0–5 scale as the category scores. Scores come from static source scans run at the version pinned in the baseline directory name; they reflect each agent's posture at the time of the scan, not the current state of the target repositories. See tests/baselines/README.md for the full baseline methodology, and the RAISE Framework guide for the maturity model itself.
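
As a rough illustration of how a findings file might be cross-checked, the sketch below parses a small JSON excerpt and compares the stored weighted overall with a recomputed one. Only `raise_posture`, `categories[].score`, and `weighted_overall` are named in the text above; the `name` key, the category identifiers, and the `check_posture` helper are assumptions made for this example, and the real baseline files may be shaped differently. The scores are those of the OpenHands row.

```python
import json

# Hypothetical excerpt of a baseline findings JSON. The "name" key and the
# category identifiers are assumed for illustration; only raise_posture,
# categories[].score and weighted_overall come from the methodology text.
SAMPLE = """
{
  "raise_posture": {
    "weighted_overall": 1.75,
    "categories": [
      {"name": "limit_your_domain", "score": 3},
      {"name": "balance_your_knowledge_base", "score": 1},
      {"name": "implement_zero_trust", "score": 1},
      {"name": "manage_your_supply_chain", "score": 3},
      {"name": "build_an_ai_red_team", "score": 1},
      {"name": "monitor_continuously", "score": 2}
    ]
  }
}
"""

# Implement Zero Trust is weighted 0.25; every other category 0.15.
SPECIAL_WEIGHTS = {"implement_zero_trust": 0.25}
DEFAULT_WEIGHT = 0.15


def check_posture(findings: dict, tolerance: float = 0.005):
    """Recompute the weighted overall and report whether it matches the stored value."""
    posture = findings["raise_posture"]
    recomputed = sum(
        cat["score"] * SPECIAL_WEIGHTS.get(cat["name"], DEFAULT_WEIGHT)
        for cat in posture["categories"]
    )
    matches = abs(recomputed - posture["weighted_overall"]) <= tolerance
    return round(recomputed, 2), matches


recomputed, matches = check_posture(json.loads(SAMPLE))
print(recomputed, matches)  # 1.75 True
```

The tolerance of 0.005 allows for a stored value that was rounded to two decimal places.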