Praxen is an LLM-driven analyzer working from the evidence you provide. It will sometimes produce findings that are wrong, miscalibrated, or inapplicable to your context. This page describes what to do when that happens — and how to tell whether the problem is the analysis, the remit, or the agent.
There are four kinds of wrong:
The analysis missed context. Praxen only sees the evidence you point it at. If the input doesn’t include a security control that exists elsewhere — in a sibling repository, a runtime configuration injected at deploy time, a sidecar proxy — Praxen will report the gap honestly. The finding is technically correct given the input but not a real risk.
The remit is vague. A Praxen finding labeled “Vague Policy” means the rule isn’t specific enough to verify. The fix is to tighten the rule, not to dispute the finding. See Writing Worker Remits.
The reasoning is incorrect. Less common, but real. The LLM occasionally misreads code, conflates two similar patterns, or escalates a Medium-severity issue to High based on a chain that doesn’t actually exist. These are bugs in the specific analysis, not in Praxen as a whole.
You accept the risk. The finding is correct, the rule is specific, the implementation gap is real — but you’ve decided the risk is acceptable for this agent in this deployment. That’s a legitimate operator decision; it doesn’t make the finding wrong.
Each of these has a different remediation path.
If the finding is correct given the evidence Praxen saw but wrong because Praxen didn’t see something it should have:
Enforcement Not Possible (in code) rather than as a Gap.
If a rule shows up as Vague Policy in the Remit Coverage section, or if findings on that rule have low severity-confidence calibration:
A common pattern: an early-draft remit produces ten Vague Policy entries. After three iterations of tightening, the count drops to one or two — and the genuine implementation gaps become much sharper.
If the finding misreads the code, miscategorizes the issue, or asserts a chain that doesn’t actually exist:
Verify the cited evidence. Open the file at the line numbers cited in the finding. If the code Praxen describes doesn’t match what’s actually there, the finding is wrong.
Re-run the analysis. LLM analyses are not perfectly deterministic. A second run with the same inputs will sometimes produce a corrected finding (or a different incorrect finding — both happen).
Tighten the input. If the evidence directory contains noise (test fixtures with deliberate vulnerabilities, archived old code, vendored dependencies), exclude it. Praxen will reason more sharply over a focused workspace.
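The first and third steps above can be partly scripted. The following is a minimal Python sketch that assumes the evidence is a plain directory tree on disk. `show_cited_lines` returns the lines a finding cites, with a little surrounding context, so you can compare them against the finding's description. `make_focused_workspace` copies the evidence into a new directory and skips noise along the way. Both helpers and the directory names in `NOISE_PATTERNS` are hypothetical and are not part of Praxen. Adjust the patterns to match your repository's layout.

```python
import shutil
from pathlib import Path

# Hypothetical noise directory names; adjust to your repository layout.
NOISE_PATTERNS = ("tests", "fixtures", "archive", "vendor", "node_modules", ".git")


def show_cited_lines(path, start, end, context=2):
    """Return the cited lines (1-indexed, inclusive) plus `context` lines on
    each side, each prefixed with its line number, for checking a finding."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    lo = max(1, start - context)
    hi = min(len(lines), end + context)
    return [f"{n:>5}  {lines[n - 1]}" for n in range(lo, hi + 1)]


def make_focused_workspace(src, dest, patterns=NOISE_PATTERNS):
    """Copy `src` to `dest` (which must not exist yet), skipping any file or
    directory whose base name matches one of `patterns`. Returns the relative
    paths of the files that were kept."""
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns(*patterns))
    dest = Path(dest)
    return sorted(str(p.relative_to(dest)) for p in dest.rglob("*") if p.is_file())
```

Point Praxen at the focused copy rather than at the original tree. Because `shutil.ignore_patterns` matches base names, a pattern such as `tests` skips every directory named `tests` at any depth. If you need path-specific exclusions, pass your own `ignore` callable to `copytree` instead.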
If the issue persists across runs, the analysis methodology may have a calibration bug. File it through whatever channel the project’s release notes name. Include the finding ID, the cited evidence, what’s actually in the code, and what you believe the correct severity (or absence of finding) should be.
Do not edit the JSON or HTML output to “correct” the finding — those are the analysis artifacts and downstream consumers depend on them being unedited.
If the finding is correct but you’ve decided the risk is acceptable:
additionalProperties: false, validated by schema.py, so adding fields to a committed findings file would fail the next time anything re-validates it. Track risk acceptances in a separate record of your choice: a sidecar <agent>-risk-acceptances.md, your ticketing system (Jira / Linear / GitHub issues with the PRAX-… finding ID in the title), or your governance / compliance register. Record who accepted, when, and why.
Praxen is calibrated to highlight real risk; the methodology continues to evolve. Across releases you may see:
docs/RAISE.md. Updates to the rubric are noted in the project changelog and surface as score deltas across re-runs.
A regression in a re-run analysis (new findings appearing, scores moving) is not necessarily a regression in the agent. Compare against the changelog before assuming the agent got worse.
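Before you read a moved score or a new finding as an agent regression, separate out what changed between the two runs. The following is a minimal Python sketch built on two assumptions. First, you have already extracted the finding IDs and any numeric scores from each run's JSON output. The field names depend on the findings schema, so this sketch does not assume them. Second, a given issue keeps the same finding ID across runs. This page does not confirm that, so key on something stable you control if it turns out to be false. `diff_runs`, `score_deltas`, and `RunDiff` are hypothetical helpers, not part of Praxen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunDiff:
    appeared: frozenset     # IDs present only in the newer run
    disappeared: frozenset  # IDs present only in the older run
    persisted: frozenset    # IDs present in both runs


def diff_runs(older_ids, newer_ids):
    """Compare the finding IDs from two analysis runs of the same agent."""
    old, new = frozenset(older_ids), frozenset(newer_ids)
    return RunDiff(appeared=new - old, disappeared=old - new, persisted=old & new)


def score_deltas(older_scores, newer_scores):
    """Return {name: newer - older} for scores present in both runs that changed.
    Both arguments are plain mappings of score name to number."""
    shared = older_scores.keys() & newer_scores.keys()
    return {
        name: newer_scores[name] - older_scores[name]
        for name in sorted(shared)
        if newer_scores[name] != older_scores[name]
    }
```

Check each appeared or disappeared finding and each nonzero delta against the changelog for the release you moved to. Only the changes that no release note explains point toward the agent itself.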