Per-target scores and population distributions for all six RAISE categories,
drawn from the frozen tests/baselines/v0.7.7-claude48/ baseline set.
Each row shows all six RAISE category scores (0–5) and the weighted overall for one baseline target. Implement Zero Trust carries weight 0.25; the other five carry 0.15 each.
| Agent / Target | Domain | Knowledge | Zero Trust | Supply Chain | Red Team | Monitor | Weighted Overall |
|---|---|---|---|---|---|---|---|
| Hermes (Agent + Desktop): Multi-component LLM agent + desktop control layer | 3 | 3 | 3 | 4 | 3 | 3 | 3.15 |
| yaah: Yet Another Agent Harness (MCP coverage) | 3 | 2 | 2 | 3 | 1 | 3 | 2.30 |
| Aider: Interactive pair-programming agent | 3 | 2 | 2 | 3 | 1 | 2 | 2.15 |
| Deep Agents CLI: LangChain agent harness (MCP coverage) | 3 | 3 | 2 | 3 | 1 | 1 | 2.15 |
| OpenHands: Autonomous software-engineering platform | 3 | 1 | 1 | 3 | 1 | 2 | 1.75 |
| LangChain SQL Agent: create_sql_agent toolkit | 2 | 2 | 1 | 2 | 1 | 1 | 1.45 |
| AutoGen Code Executor: Microsoft AutoGen code-executor family | 3 | 2 | 1 | 2 | 1 | 0 | 1.45 |
| Sweep: GitHub issue-to-code agent | 2 | 1 | 1 | 2 | 0 | 2 | 1.30 |
| OpenAI Customer Service: OpenAI Agents SDK example | 2 | 2 | 0 | 2 | 1 | 1 | 1.20 |
| HelperBot: Damn Vulnerable AI Agent, training agent | 1 | 1 | 0 | 1 | 1 | 0 | 0.60 |
| Devika: Autonomous software engineer | 1 | 1 | 0 | 1 | 0 | 1 | 0.60 |
| FinBot: OWASP Agentic AI CTF, invoice processor | 1 | 0 | 0 | 1 | 1 | 0 | 0.45 |
Sorted by weighted overall (highest first). Scores run from 0 (Absent) to 5 (Exemplary).
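The Weighted Overall column can be re-derived from the six per-category scores. The sketch below is a hypothetical consistency check, not part of the baseline tooling: the names `ROWS`, `WEIGHTS` and `weighted_overall` are illustrative, and rounding to two decimals is assumed to match how the table displays values. It applies the 0.25 / 0.15 weights to each row and sorts highest first; Python's sort is stable even with `reverse=True`, so tied rows keep the table's order.

```python
# Scores copied from the table, in column order:
# Domain, Knowledge, Zero Trust, Supply Chain, Red Team, Monitor.
ROWS = {
    "Hermes (Agent + Desktop)": (3, 3, 3, 4, 3, 3),
    "yaah": (3, 2, 2, 3, 1, 3),
    "Aider": (3, 2, 2, 3, 1, 2),
    "Deep Agents CLI": (3, 3, 2, 3, 1, 1),
    "OpenHands": (3, 1, 1, 3, 1, 2),
    "LangChain SQL Agent": (2, 2, 1, 2, 1, 1),
    "AutoGen Code Executor": (3, 2, 1, 2, 1, 0),
    "Sweep": (2, 1, 1, 2, 0, 2),
    "OpenAI Customer Service": (2, 2, 0, 2, 1, 1),
    "HelperBot": (1, 1, 0, 1, 1, 0),
    "Devika": (1, 1, 0, 1, 0, 1),
    "FinBot": (1, 0, 0, 1, 1, 0),
}

# Implement Zero Trust (third column) weighs 0.25; the other five weigh 0.15.
WEIGHTS = (0.15, 0.15, 0.25, 0.15, 0.15, 0.15)


def weighted_overall(scores):
    """Sum of score x weight, rounded to two decimals like the table."""
    return round(sum(s * w for s, w in zip(scores, WEIGHTS)), 2)


# Stable sort: equal scores keep their insertion (table) order.
ranking = sorted(ROWS, key=lambda name: weighted_overall(ROWS[name]), reverse=True)
for name in ranking:
    print(f"{name:26} {weighted_overall(ROWS[name]):.2f}")
```

Every recomputed value matches the Weighted Overall column, which confirms the weights stated above for all twelve targets.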
For each of the six RAISE categories, the bars show how many of the 12 scored targets received each maturity level (0 Absent through 5 Exemplary). The population average and standard deviation are shown per category.
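The per-category level counts and summary statistics can be reproduced from the same twelve rows. This is a sketch under one stated assumption: the page does not say which standard deviation it reports, so the code uses the population form (`statistics.pstdev`, dividing by n = 12) rather than the sample form. `SCORES`, `level_counts` and `category_stats` are illustrative names.

```python
from statistics import mean, pstdev

CATEGORIES = ("Domain", "Knowledge", "Zero Trust", "Supply Chain", "Red Team", "Monitor")

# One row per scored target, in the table's row and column order.
SCORES = [
    (3, 3, 3, 4, 3, 3),  # Hermes (Agent + Desktop)
    (3, 2, 2, 3, 1, 3),  # yaah
    (3, 2, 2, 3, 1, 2),  # Aider
    (3, 3, 2, 3, 1, 1),  # Deep Agents CLI
    (3, 1, 1, 3, 1, 2),  # OpenHands
    (2, 2, 1, 2, 1, 1),  # LangChain SQL Agent
    (3, 2, 1, 2, 1, 0),  # AutoGen Code Executor
    (2, 1, 1, 2, 0, 2),  # Sweep
    (2, 2, 0, 2, 1, 1),  # OpenAI Customer Service
    (1, 1, 0, 1, 1, 0),  # HelperBot
    (1, 1, 0, 1, 0, 1),  # Devika
    (1, 0, 0, 1, 1, 0),  # FinBot
]


def level_counts(rows):
    """Per category, how many targets sit at each maturity level 0..5."""
    return {
        name: [column.count(level) for level in range(6)]
        for name, column in zip(CATEGORIES, zip(*rows))
    }


def category_stats(rows):
    """Per category, (mean, population standard deviation) across targets."""
    return {
        name: (mean(column), pstdev(column))
        for name, column in zip(CATEGORIES, zip(*rows))
    }


stats = category_stats(SCORES)
for name, counts in level_counts(SCORES).items():
    avg, sd = stats[name]
    print(f"{name:13} counts={counts} mean={avg:.2f} sd={sd:.2f}")
```

For example, Domain and Supply Chain both average 2.25, while Implement Zero Trust, the most heavily weighted category, averages 13/12 (about 1.08).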
Histogram of weighted-overall scores grouped by maturity band. Each band covers one integer step (0.0–0.99 = Ad hoc, etc.).
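Assigning a weighted overall to its band amounts to taking the floor of the two-decimal value. The sketch below groups the twelve table values that way; `maturity_band` is a hypothetical helper, and because this page names only the band-0 label (Ad hoc), bands are shown by number rather than by label.

```python
import math
from collections import Counter

# Weighted-overall values copied from the table (two decimals, highest first).
WEIGHTED_OVERALL = [3.15, 2.30, 2.15, 2.15, 1.75, 1.45, 1.45, 1.30, 1.20, 0.60, 0.60, 0.45]


def maturity_band(score):
    """Integer band: 0.00-0.99 -> 0, 1.00-1.99 -> 1, and so on.

    Rounding to two decimals first keeps a value such as 1.9999999999
    (float noise from summing score x weight) in band 2 rather than band 1.
    """
    return math.floor(round(score, 2))


histogram = Counter(maturity_band(s) for s in WEIGHTED_OVERALL)
for band in range(6):
    print(f"band {band}: {'#' * histogram[band]} ({histogram[band]})")
```

On these values, bands 0 through 3 hold 3, 5, 3 and 1 targets respectively, and no target reaches band 4 or 5.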
RAISE scores are drawn from the raise_posture.categories[].score fields in each
baseline findings JSON. The weighted overall is taken from
raise_posture.weighted_overall, computed as Σ(score × weight) across the six
categories, where Implement Zero Trust has weight 0.25 and the other five
have weight 0.15. The weights sum to 1.0, so the overall stays on the same 0–5 scale
as the category scores. Scores come from static source scans run at the version pinned in the
baseline directory name; they reflect each agent's posture at the time of the scan, not the
current state of the target repositories.
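A baseline findings file can be checked against its stored overall by recomputing the sum from the category scores. This is a sketch under assumptions: the field paths raise_posture.categories[].score and raise_posture.weighted_overall come from this page, but the per-category "name" key and the exact label "Implement Zero Trust" inside the JSON are hypothetical, as are the helper names. The code refuses to guess if exactly one Zero Trust entry is not found.

```python
import json
import math
from pathlib import Path

# ASSUMPTION: each entry in raise_posture.categories carries a "name" key,
# and the Zero Trust entry is labelled exactly "Implement Zero Trust".
ZERO_TRUST_NAME = "Implement Zero Trust"
ZERO_TRUST_WEIGHT = 0.25
OTHER_WEIGHT = 0.15


def recompute_weighted_overall(findings: dict) -> float:
    """Sum score x weight over raise_posture.categories[]."""
    categories = findings["raise_posture"]["categories"]
    if len(categories) != 6:
        raise ValueError(f"expected 6 RAISE categories, got {len(categories)}")
    zero_trust = [c for c in categories if c.get("name") == ZERO_TRUST_NAME]
    if len(zero_trust) != 1:
        # Without a single Zero Trust entry the weights would not sum to 1.0.
        raise ValueError("could not identify exactly one Zero Trust category")
    total = 0.0
    for category in categories:
        is_zero_trust = category.get("name") == ZERO_TRUST_NAME
        weight = ZERO_TRUST_WEIGHT if is_zero_trust else OTHER_WEIGHT
        total += category["score"] * weight
    return total


def check_findings_file(path: str) -> tuple[bool, float, float]:
    """Compare the stored raise_posture.weighted_overall with a recomputation."""
    findings = json.loads(Path(path).read_text())
    stored = findings["raise_posture"]["weighted_overall"]
    recomputed = recompute_weighted_overall(findings)
    # Tolerate the stored value having been rounded to two decimals.
    return math.isclose(stored, recomputed, abs_tol=0.005), stored, recomputed
```

Running check_findings_file over each findings JSON in the baseline directory flags any file whose stored overall disagrees with its category scores.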
See tests/baselines/README.md for the full baseline methodology,
and the RAISE Framework guide for the maturity model itself.