praxen

Changelog

All notable changes to Praxen will be recorded here. Format roughly follows Keep a Changelog and Semantic Versioning. Entries for versions prior to 0.7.0 describe the project under its former name, Praxa, and at its former home, github.com/Exabeam/deckard. Issue references in those entries link back to Exabeam/deckard because that is where the issues themselves live — the 0.7.0 rename was a fresh-repo cutover, not a transfer.


[0.7.8] — 2026-05-31

Opus 4.8 reference re-baseline, a twelfth (multi-component) baseline target, a docs-first remit generator, and dev/main drift guards. The scan engine is unchanged: schema.py, render.py, manifest_to_findings.py, and the four knowledge bases are byte-identical to 0.7.7, and schema_version stays "2.0". The one SKILL.md change is confined to the Pre-flight remit-authoring guidance (how the skill drafts a remit on request) and is scan-orthogonal: it does not change the 12-step analysis procedure, the committed baselines, or any scan’s output.

Added

Changed

Unchanged on purpose

Notes

[0.7.7] — 2026-05-29

SKILL polish + a fresh baseline set. Two non-breaking SKILL improvements (multi-component remit guidance + source-inferred log files), an additive schema change, and a cold full-suite re-scan against all eleven targets to verify the SKILL deltas land non-breakingly. Findings engine remains the same shape: manifest_to_findings.py and the four knowledge bases are byte-identical to 0.7.6; the new behavior is calibration-only.

Added

Changed

Unchanged on purpose

Calibration notes

[0.7.6] — 2026-05-28

OWASP LLM and Agentic Top 10 coverage visualizations. Every Praxen report now carries two full-bleed 5×2 coverage grid sections (one per framework). Each grid shows the top-three most-severe findings per category as anchored chips. Empty cells are rendered as “No findings”, so the grid reads as a coverage map rather than a hit list. A new cross-baseline aggregate report tool ships under tests/baselines/ for reviewing the suite’s coverage across all eleven targets. No findings-engine change: SKILL.md, schema.py, manifest_to_findings.py, the knowledge bases, and every committed findings JSON are byte-identical to 0.7.5. The grids are a new view over data that has been in the canonical JSON since schema 2.0. No migration required.
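The per-category cell logic described above can be sketched in Python. This is a minimal illustration, not Praxen's renderer. The following are all assumptions:

- the finding fields (id, severity, owasp);
- the severity ranking;
- the helper names coverage_cells and as_grid;
- the row-major layout in rows of five.

```python
# Hypothetical sketch of the coverage-grid cell logic; not Praxen's renderer.
# Assumed: each finding is a dict with "id", "severity", and an "owasp" list
# of category IDs; the severity ranking below is illustrative.
from collections import defaultdict

SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}


def coverage_cells(findings, categories, per_cell=3):
    """Map each category to its most-severe finding IDs, or "No findings"."""
    buckets = defaultdict(list)
    for finding in findings:
        for cat in finding.get("owasp", []):
            buckets[cat].append(finding)
    cells = {}
    for cat in categories:
        # Most severe first; ties broken by ID so the output is stable.
        ranked = sorted(
            buckets[cat],
            key=lambda f: (SEVERITY_RANK.get(f["severity"], len(SEVERITY_RANK)), f["id"]),
        )
        cells[cat] = [f["id"] for f in ranked[:per_cell]] or "No findings"
    return cells


def as_grid(cells, categories, cols=5):
    """Lay the categories out row-major, five per row (orientation assumed)."""
    return [
        [(cat, cells[cat]) for cat in categories[i:i + cols]]
        for i in range(0, len(categories), cols)
    ]
```

Example usage: with categories LLM01 through LLM10, a category with four findings keeps only its three most severe. A category with no findings renders as "No findings" rather than disappearing from the grid.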

Added

Changed

Unchanged on purpose

Notes


[0.7.5] — 2026-05-27

GitHub org rename: open-ai-security → open-agent-ai-security. This is a trademark-driven rename of the org, isolated in its own release so the migration is unambiguous. No functional changes: no schema change, no scoring change, no SKILL change, no renderer-logic change. The Praxen pipeline behaves identically to 0.7.4; only the canonical URLs that Praxen emits and the plugin marketplace identifier change.

Migration for existing installs. The plugin install identifier changes from praxen@open-ai-security to praxen@open-agent-ai-security. GitHub auto-redirects the old URLs, so existing installs keep working. To land on the canonical identifier, run:

/plugin uninstall praxen@open-ai-security
/plugin marketplace remove open-ai-security
/plugin marketplace add open-agent-ai-security/praxen
/plugin install praxen@open-agent-ai-security

Changed

Unchanged on purpose

Notes


[0.7.4] — 2026-05-27

Deterministic Step 10 + Step 9.9 emission discipline + version-source cleanup + v0.7.4 re-baseline. Three workstreams in one release:

  1. SKILL.md Step 10 converts from LLM-composed JSON to a deterministic stdlib Python converter (manifest_to_findings.py). The converter mechanically translates the Step 9.9 draft manifest into the canonical findings JSON. This eliminates the principal historical stall site at scale (LLM JSON-emission bursts under load). It also eliminates the class of arithmetic-mismatch bugs surfaced by the v0.7.3 external-validation work (Lobot r1 weighted_overall mismatch; Lobot r2 stat_counts mismatch).
  2. Step 9.9 picks up chunked-write emission discipline (skeleton + Edit-append + heartbeats), and Step 3 adds a one-line source-read pacing guard. Together they close the remaining 600 s subagent-watchdog vectors at the SKILL layer.
  3. The converter now populates praxen_version and schema_version from canonical sources (.claude-plugin/plugin.json and schema.SCHEMA_VERSION, respectively). The SKILL no longer writes them, and build.sh sanity-checks that PRAXEN_SPEC.md, plugin.json, and marketplace.json agree.

Re-baselined as tests/baselines/v0.7.4-sequential/; the v0.7.0 baseline is retired.
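The Python sketch below shows why a mechanical converter removes that class of mismatch. Totals are derived from the findings list itself rather than composed separately, and versions are read from their canonical sources.

The following are assumptions:

- the manifest shape;
- the plugin.json "version" key;
- the SEVERITIES tuple;
- the top-level stat_counts layout;
- the helpers to_findings_json and versions_agree.

SCHEMA_VERSION is a local stand-in for schema.SCHEMA_VERSION. None of this is manifest_to_findings.py itself.

```python
# Hypothetical sketch of a deterministic Step 10 conversion; not
# manifest_to_findings.py. Assumed: plugin.json has a top-level "version",
# and each manifest finding carries a "severity" string.
import json
from collections import Counter
from pathlib import Path

SCHEMA_VERSION = "2.0"  # stand-in for schema.SCHEMA_VERSION
SEVERITIES = ("Critical", "High", "Medium", "Low", "Info")  # illustrative


def stat_counts(findings):
    """Count findings per severity from the list itself, so totals cannot drift."""
    counts = Counter(f["severity"] for f in findings)
    return {sev: counts[sev] for sev in SEVERITIES}


def to_findings_json(manifest, plugin_json_path):
    """Translate a draft manifest, stamping versions from canonical sources."""
    plugin = json.loads(Path(plugin_json_path).read_text(encoding="utf-8"))
    findings = manifest["findings"]
    return {
        "schema_version": SCHEMA_VERSION,
        "praxen_version": plugin["version"],
        "stat_counts": stat_counts(findings),
        "findings": findings,
    }


def versions_agree(*versions):
    """build.sh-style sanity check: every source reports the same version string."""
    return len(set(versions)) == 1
```

Because stat_counts is recomputed from the same list that is emitted, a count can only disagree with the findings if the code itself is wrong. A model drifting mid-emission can no longer cause the mismatch.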

No schema change. No scoring-formula change. findings.schema.json and schema.py are unchanged in shape; the RAISE weights, the per-rule audit methodology, and the renderer’s analytical logic are all preserved. The baselines move because the SKILL’s calibration discipline tightened (Medium-tier preservation, compound-signal escalation), not because the scoring math changed.

Added

Changed

Fixed

Notes


[0.7.3] — 2026-05-25

Subagent watchdog stall fix + skill-assisted Worker Remit authoring + HTML report v2 polish. Four workstreams in one release, all upstream of the canonical findings JSON — the schema, the scoring, and the analysis methodology are unchanged. The headline is the SKILL emission discipline that eliminates the silent-compose bursts that historically tripped the 600 s subagent no-progress watchdog on long scans; the same SKILL now also drives Worker Remit authoring from source/docs/description in addition to consuming an existing remit. The report layer picks up a masthead, jump-nav, and collapsible finding cards. The test plan is codified into three named tiers with a committed home for pre-release Full Suite Runs.

Added

Changed

Notes


[0.7.2] — 2026-05-22

Reporting-layer overhaul — a redesigned HTML report. The findings engine, the canonical JSON schema, and the RAISE scoring are unchanged; this release reworks how the report looks and links. Because the renderer and template output changed (intentionally), the committed golden fixtures and all eleven v0.7.0-sequential regression baselines were re-rendered from their unchanged JSON, and the two published examples were freshly re-scanned under 0.7.2.

Added

Changed

Notes


[0.7.1] — 2026-05-22

Three batches of operator field feedback rolled in as a single patch release. No scan logic, schema shape, or scoring behaviour changed — all changes are documentation and a small renderer tweak. The eleven regression baselines re-render byte-identical from their JSON (the renderer change is stdout-only). Patches against 0.7.0; same plugin name, same install path, same canonical findings JSON.

Added

Changed

Notes


[0.7.0] — 2026-05-20

First public soft-launch release. The project has been renamed Praxa → Praxen and moved to its new home at github.com/open-ai-security/praxen. No scan logic, schema shape, or scoring behaviour changed — this release is the rename + relocation + a small batch of pre-launch test-suite improvements that arrived alongside it. The eleven regression baselines were re-frozen under the new name (same cold runs, same findings).

Renamed

Moved

Internal

[0.6.3] — 2026-05-19

Draft manifest, plus two fixes from field testing. The headline is the draft manifest: a long scan can exhaust the coding agent’s context window and auto-compact mid-analysis, silently degrading the report — findings gathered early get summarized away before the JSON is written. The skill now checkpoints its full synthesis to disk before writing the report, so a compacted run is recoverable rather than silently incomplete (a partial mitigation for the single-pass “unsupported arc”, issue #27). Alongside it, two bugs that field scans surfaced: praxa_version was read from the analyzed agent’s plugin.json instead of recording Praxa’s own version, and policy_rule_ids / policy_rule_text were mandatory on every finding even when a finding doesn’t trace to a remit rule. The findings schema is unchanged (still "2.0").
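The skill performs this checkpoint through agent instructions, not code. The underlying pattern can still be sketched in Python: write the synthesis to disk atomically, then reload it after a compaction.

The following are hypothetical and are not Praxa's mechanism:

- the helper names checkpoint and resume;
- the draft-manifest.json file name;
- the manifest shape.

The temporary file is created in the target's directory, so os.replace stays on one filesystem and the swap is atomic.

```python
# Illustrative checkpoint/resume pattern for a draft manifest.
# Hypothetical: file name, manifest shape, and helper names.
import json
import os
import tempfile


def checkpoint(manifest, path):
    """Write atomically: readers see the old file or the new one, never half of one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(manifest, fh, indent=2, sort_keys=True)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def resume(path):
    """Reload the synthesis from disk after a context compaction, if one exists."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```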

Added

Changed

Fixed

Notes

[0.6.2] — 2026-05-18

Plugin-marketplace install fix, plus the v0.6.1 field-review cheap wins. /plugin marketplace add open-ai-security/praxen was rejected by the Claude Code marketplace schema validator — .claude-plugin/marketplace.json declared the plugin source as a bare "." where the schema requires a "./"-prefixed relative path — so the marketplace-install path silently never worked for any tagged release (the unzip-the-release path was unaffected). 0.6.2 fixes that, and bundles in the small robustness and clarity fixes from the v0.6.1 field review (one executing-LLM ran the full pipeline against a workspace and wrote up what it hit). No changes to detection logic, RAISE scoring, the Worker Remit structure, or the findings schema (still "2.0").
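A minimal Python check for the bug class fixed here flags relative-path plugin sources that lack the "./" prefix.

It assumes marketplace.json carries a "plugins" array whose entries have "name" and "source" fields. It checks only string sources. The helper names bad_relative_sources and check_file are hypothetical, and the real Claude Code validator may enforce more than this.

```python
# Sketch of a "./"-prefix check for marketplace.json plugin sources.
# Assumed layout: {"plugins": [{"name": ..., "source": ...}, ...]}.
import json


def bad_relative_sources(marketplace):
    """Return names of plugins whose string source is not "./"-prefixed."""
    bad = []
    for plugin in marketplace.get("plugins", []):
        source = plugin.get("source")
        if isinstance(source, str) and not source.startswith("./"):
            bad.append(plugin.get("name", "<unnamed>"))
    return bad


def check_file(path):
    """Load a marketplace.json from disk and run the check."""
    with open(path, encoding="utf-8") as fh:
        return bad_relative_sources(json.load(fh))
```

Under these assumptions, the pre-0.6.2 value "." is reported and the fixed "./" passes.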

Added

Changed

Fixed

Internal

Notes

[0.6.1] — 2026-05-12

MCP coverage + render robustness. The MCP Server Evaluation path — discovery → knowledge/KB_MCP_SECURITY.md → the MCP minimum-bar checklist → mcp-tagged findings, the machinery itself introduced with the knowledge base in 0.3.0 — is now widened beyond Claude-style filenames, exercised end-to-end against two real repos, and held under regression; the renderer is hardened against HTML entities in prose; and the test harness now validates every committed regression baseline. No changes to the detection logic, RAISE scoring, Worker Remit structure, or the findings schema (still "2.0").

Added

Changed

Notes

[0.6.0] — 2026-05-12

Relicensed to Apache-2.0. Praxa moves from the Exabeam commercial / by-permission license to the Apache License, Version 2.0; it’s now open source. No functional changes to the skill, detection logic, RAISE scoring, Worker Remit structure, or the findings schema (still "2.0"); this is a licensing / metadata release. (praxa_version bumps 0.5.0 → 0.6.0.)

Changed

Added

[0.5.0] — 2026-05-11

Phase 3 of the V2 harvest: GitHub Actions CI + release automation, golden-file render fixtures. No changes to the skill, the detection logic, the RAISE scoring, the Worker Remit structure, the findings schema (still "2.0"), or the report — this is a tooling / repo-infrastructure release. (There is no 0.4.0: Phase 2’s parallel map-reduce analysis path was prototyped and gated, found slower / less accurate / ~6× more expensive than the sequential pipeline, and dropped — see tests/baselines/v0.4-parallel/GATE-NOTES.md and design/DEFERRED.md.)

Added

[0.3.0] — 2026-05-11

Phase 1 of the V2 harvest: merged findings schema (schema_version: "2.0").

First implementation phase of design/V2_HARVEST_PLAN.md. It adopts the better-structured findings model from PR #1 onto the v0.2.0 pipeline, while keeping the parts of the v0.2.0 schema that PR #1 dropped. Substance of detection / RAISE scoring / Worker Remit structure / the report’s section order is unchanged; this is a JSON-shape release.

Added

Changed (BREAKING — see migration note below)

Migration

A v0.2.0 (schema_version: "1.0") findings JSON does not load in v0.3.0. For each finding:

  1. Replace "evidence": ["file:line — snippet", ...] with "evidence": [{ "file": "...", "line": <int or null>, "snippet": "..." }, ...].
  2. Replace "recommended_action": "..." with "recommended_actions": ["..."].
  3. Bump "schema_version" to "2.0" and "praxa_version" to "0.3.0".
  4. Optionally add a "description" field (omit if you have nothing beyond summary).
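Steps 1 to 3 above can be applied mechanically. A minimal Python sketch follows, assuming:

- findings live under a top-level "findings" list;
- each v1.0 evidence string has the form "file:line — snippet" (an em dash with spaces on both sides).

Strings without a numeric ":line" suffix get line set to null instead of being rejected. Step 4 is optional and is left to the author. The helper names are hypothetical.

```python
# Hypothetical v1.0 -> v2.0 findings migration (steps 1-3 above).
# Assumed: findings under "findings"; evidence strings "file:line — snippet".
import re

EM_DASH_SEP = " \u2014 "
# "path/to/file.py:42" -> file "path/to/file.py", line "42"; line is optional.
LOCATION = re.compile(r"^(?P<file>.+?)(?::(?P<line>\d+))?$")


def parse_evidence(text):
    """Split one v1.0 evidence string into a v2.0 evidence object."""
    location, _, snippet = text.partition(EM_DASH_SEP)
    m = LOCATION.match(location.strip())
    line = m.group("line") if m else None
    return {
        "file": m.group("file") if m else location.strip(),
        "line": int(line) if line is not None else None,
        "snippet": snippet.strip(),
    }


def migrate_finding(finding):
    out = dict(finding)  # shallow copy; the input is left untouched
    out["evidence"] = [
        parse_evidence(e) if isinstance(e, str) else e
        for e in finding.get("evidence", [])
    ]
    if "recommended_action" in out:
        out["recommended_actions"] = [out.pop("recommended_action")]
    return out


def migrate(doc):
    out = dict(doc)
    out["findings"] = [migrate_finding(f) for f in doc.get("findings", [])]
    out["schema_version"] = "2.0"
    out["praxa_version"] = "0.3.0"
    return out
```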

Updated

Unchanged


[0.2.0] — 2026-05-11

Render-pipeline refactor: the report is now generated by code, not by the LLM.

The skill used to produce all three output files itself, including hand-substituting ~30 placeholders and several repeat blocks into the 800-line HTML template — slow (8–12 min/render), unreliable (mid-render stalls), and a poor use of LLM tokens. It now emits a single canonical findings JSON and a bundled deterministic Python renderer turns that JSON into the HTML report and the .txt summary. Same JSON in, byte-identical output, every time. Findings counts, severity rules, RAISE scoring model, OWASP mappings, and Worker-Remit structure are unchanged in substance; the RAISE scoring discipline was refined (see below).
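A toy Python illustration shows the render-by-code idea: a pure function from findings JSON to output bytes. It uses string.Template, whose substitute() raises KeyError if a placeholder is left unfilled, instead of letting a half-rendered report slip through.

The following are invented for illustration and are not Praxa's render.py or its template:

- the template text;
- the placeholder names;
- the fields (target, findings, id, severity, summary);
- the helper names render_summary and digest.

```python
# Toy deterministic renderer: same JSON in, same bytes out.
# Template, placeholders, and fields are illustrative only.
import hashlib
import json
from string import Template

SUMMARY = Template("Target: $target\nFindings: $count\n$rows")


def render_summary(findings_json):
    data = json.loads(findings_json)
    rows = "\n".join(
        f"- [{f['severity']}] {f['id']}: {f['summary']}" for f in data["findings"]
    )
    # substitute() raises KeyError on any unfilled placeholder.
    text = SUMMARY.substitute(target=data["target"], count=len(data["findings"]), rows=rows)
    return text.encode("utf-8")


def digest(findings_json):
    """SHA-256 of the rendered bytes, handy for byte-identical baseline checks."""
    return hashlib.sha256(render_summary(findings_json)).hexdigest()
```

Nothing in the function reads a clock, the environment, or random state. Two renders of the same input therefore produce the same bytes and the same digest.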

Added

Changed

Unchanged (explicitly preserved)


[0.1.0] — 2026-05-01

First internal Praxa release.

This release is the rebranding of the project formerly known as the Exabeam Deckard Agent Security Scanner to Praxa — agent behavior verifier. No detection logic, severity rules, RAISE scoring, OWASP mappings, or evaluation behavior changed in this release. The change is naming, terminology, and documentation only.

Renamed

Terminology

Repositioned

Unchanged (explicitly preserved)

Deferred to a later release


Pre-Praxa history (legacy: Deckard releases)

For reference. Detail kept short — these were internal releases under the prior name.

[0.7.0] — 2026-04-24 (Deckard)

[0.6.6] — 2026-04-24 (Deckard)

[0.6.5] — 2026-04-23 (Deckard)

[0.6.4] and earlier (Deckard)