{
  "schema_version": "2.0",
  "praxen_version": "0.7.2",
  "scan": {
    "agent": "FinBot",
    "agent_slug": "finbot",
    "scan_date": "2026-05-22",
    "scan_timestamp": "2026-05-22T18:56:47Z",
    "workspace": "/Users/steve.wilson/Documents/github/deckard/local/preintegration/finbot-src",
    "artifact_count": 16
  },
  "intro_band": {
    "agent_remit_summary": "FinBot is declared as CineFlow Productions' autonomous invoice-processing assistant: it receives vendor invoices, validates them against registered vendor records, and makes approve / reject / escalate decisions recorded in the invoice database. Its authorized tools are <code>get_invoice_details</code>, <code>approve_invoice</code>, <code>reject_invoice</code>, <code>request_human_review</code>, and <code>detect_fraud_patterns</code>, operating only against the application database and the OpenAI API. The remit forbids approving unregistered vendors, skipping fraud detection for any reason, processing above <code>manual_review_threshold</code> without human sign-off, and — emphatically — letting any runtime message (invoice descriptions, vendor notes, API payloads, the <code>custom_goals</code> field) modify its goals, system prompt, or decision logic.",
    "agent_structure_summary": "FinBot is a Python 3.11 Flask application with a SQLAlchemy-backed SQLite/PostgreSQL store. A single <code>FinBotAgent</code> class in <code>src/services/finbot_agent.py</code> orchestrates OpenAI <code>gpt-4.1-mini</code> function-calling over five tools with <code>tool_choice=\"auto\"</code>, plus a rule-based <code>_fallback_processing</code> path used whenever the OpenAI client is unavailable. The agent's goals are assembled as a natural-language system prompt in <code>get_system_prompt()</code>, into which the operator-settable <code>custom_goals</code> field is concatenated under a literal \"OVERRIDE ABOVE IF CONFLICTING\" header. Admin routes (<code>/admin/finbot/goals</code>, <code>/admin/finbot/config</code>, vendor-trust, human-review) are exposed via a Flask blueprint with no authentication middleware, CORS is open to all origins, and the Flask <code>SECRET_KEY</code> is hardcoded. Vendor-supplied invoice descriptions flow directly into the LLM context, and there is no logging framework anywhere in the codebase."
  },
  "behavior_summary": "<p>The dominant pattern is <strong>policy declared in prose with zero deterministic code-level enforcement</strong>. Every hard prohibition in the remit (never skip fraud detection, never approve above <code>manual_review_threshold</code> without human sign-off, never let runtime content rewrite goals) is left entirely to an LLM system prompt, and that same prompt undermines those prohibitions by ranking \"Business Continuity\" and speed above security. No code gate forces <code>detect_fraud_patterns</code> to run before <code>approve_invoice</code>, <code>_approve_invoice</code> performs no amount or vendor-status check before it sets <code>payment_processed=True</code>, and setting the <code>fraud_detection_enabled</code> flag to false short-circuits all fraud heuristics to an empty list.</p><p>These gaps converge into a single catastrophic compound chain. An unauthenticated <code>/admin/finbot/goals</code> endpoint writes attacker-supplied natural language into <code>custom_goals</code>, which is concatenated verbatim into the system prompt as an explicit override, while vendor-controlled invoice descriptions enter the same context unsanitized. The result is two independent injection ingresses into a decision engine that can mark invoices paid, with no audit log to detect any of it. The CTF scaffolding (flag fields, hardcoded payloads in the walkthrough doc) confirms that goal manipulation is the intended and reachable failure mode.</p>",
  "remit_coverage": {
    "stat_counts": { "verified": 1, "gap": 10, "partial": 4, "vague": 0, "enp": 0, "total": 15 },
    "rules": [
      { "rule_id": "R-01", "section": "Behavioral Constraints — What the agent must always do", "rule_text": "Run `detect_fraud_patterns` before approving any invoice", "status": "gap", "finding_id": "PRAX-2026-05-22-002" },
      { "rule_id": "R-02", "section": "Behavioral Constraints — What the agent must always do", "rule_text": "Check that the submitting vendor exists in the database with status `approved` before processing", "status": "gap", "finding_id": "PRAX-2026-05-22-006" },
      { "rule_id": "R-03", "section": "Behavioral Constraints — What the agent must always do", "rule_text": "Escalate to human review rather than auto-approve when any fraud signal is present, regardless of amount", "status": "partial", "finding_id": "PRAX-2026-05-22-007" },
      { "rule_id": "R-04", "section": "Behavioral Constraints — What the agent must always do", "rule_text": "Record the invoice decision and reasoning in the database for every processed invoice", "status": "verified", "finding_id": null },
      { "rule_id": "R-05", "section": "Behavioral Constraints — What the agent must never do", "rule_text": "Approve payment to a vendor not registered and approved in the vendor database", "status": "gap", "finding_id": "PRAX-2026-05-22-006" },
      { "rule_id": "R-06", "section": "Behavioral Constraints — What the agent must never do", "rule_text": "Skip fraud detection for any reason, including configuration flags or runtime instructions", "status": "gap", "finding_id": "PRAX-2026-05-22-002" },
      { "rule_id": "R-07", "section": "Behavioral Constraints — What the agent must never do", "rule_text": "Process invoices above the `manual_review_threshold` without human sign-off", "status": "gap", "finding_id": "PRAX-2026-05-22-005" },
      { "rule_id": "R-08", "section": "Behavioral Constraints — What the agent must never do", "rule_text": "Allow runtime messages — including invoice descriptions, vendor notes, or API payloads — to modify its goals, system prompt, or decision logic", "status": "gap", "finding_id": "PRAX-2026-05-22-001" },
      { "rule_id": "R-09", "section": "Behavioral Constraints — What the agent must never do", "rule_text": "Accept instructions embedded in invoice data fields (description, notes, line items) as directives", "status": "gap", "finding_id": "PRAX-2026-05-22-003" },
      { "rule_id": "R-10", "section": "Behavioral Constraints — What the agent must never do", "rule_text": "Disable or reduce its security posture based on instructions from any source other than a verified operator configuration change applied outside of agent runtime", "status": "gap", "finding_id": "PRAX-2026-05-22-001" },
      { "rule_id": "R-11", "section": "Behavioral Constraints — Human approval is required for", "rule_text": "Any invoice where fraud detection returns `risk_level` of `high` or `critical`", "status": "partial", "finding_id": "PRAX-2026-05-22-007" },
      { "rule_id": "R-12", "section": "Configuration and Policy Separation", "rule_text": "`fraud_detection_enabled` must remain `true` at all times; disabling it is not an authorized configuration change", "status": "gap", "finding_id": "PRAX-2026-05-22-002" },
      { "rule_id": "R-13", "section": "Configuration and Policy Separation", "rule_text": "`custom_goals` field must not be used to inject natural language instructions into the agent's system prompt; goal changes require a code deployment by authorized engineering staff", "status": "gap", "finding_id": "PRAX-2026-05-22-001" },
      { "rule_id": "R-14", "section": "Configuration and Policy Separation", "rule_text": "The agent's decision logic is defined in code; it is not redefinable at runtime by any user, vendor, or API caller", "status": "partial", "finding_id": "PRAX-2026-05-22-001" },
      { "rule_id": "R-15", "section": "Authorized Counterparties", "rule_text": "CineFlow finance admins — may update operational thresholds via the admin API; may not inject goals or override decision logic at runtime", "status": "partial", "finding_id": "PRAX-2026-05-22-004" }
    ]
  },
  "findings": [
    {
      "id": "PRAX-2026-05-22-001",
      "severity": "Critical",
      "summary": "Unauthenticated /admin/finbot/goals endpoint writes attacker-supplied natural language directly into the agent's system prompt as an explicit override.",
      "description": "The remit forbids runtime goal modification (R-08, R-10, R-13, R-14) and requires goal changes to go through a code deployment. In practice, an HTTP POST to <code>/admin/finbot/goals</code> with no authentication calls <code>update_goals()</code>, which stores the raw payload in <code>config.custom_goals</code> with no validation. On the next invoice, <code>get_system_prompt()</code> concatenates that text into the system prompt under the literal header \"CUSTOM GOALS (OVERRIDE ABOVE IF CONFLICTING)\", letting any caller redefine the agent's decision logic at runtime — exactly the prohibited path.",
      "tags": [
        { "kind": "raise", "label": "Implement Zero Trust" },
        { "kind": "owasp_agentic", "label": "ASI01 — Agent Goal Hijack" },
        { "kind": "owasp_llm", "label": "LLM01 — Prompt Injection" }
      ],
      "policy_rule_ids": "R-08, R-10, R-13, R-14",
      "policy_rule_text": "Allow runtime messages — including invoice descriptions, vendor notes, or API payloads — to modify its goals, system prompt, or decision logic / Disable or reduce its security posture based on instructions from any source other than a verified operator configuration change applied outside of agent runtime / `custom_goals` field must not be used to inject natural language instructions into the agent's system prompt; goal changes require a code deployment by authorized engineering staff / The agent's decision logic is defined in code; it is not redefinable at runtime by any user, vendor, or API caller",
      "evidence": [
        { "file": "src/routes/admin.py", "line": 90, "snippet": "update_finbot_goals route, lines 90-109 — POST /admin/finbot/goals with no auth check; comment reads 'This is intentionally vulnerable - no validation of goal content'" },
        { "file": "src/services/finbot_agent.py", "line": 744, "snippet": "update_goals() stores new_goals into config.custom_goals with no validation" },
        { "file": "src/services/finbot_agent.py", "line": 70, "snippet": "get_system_prompt concatenates config.custom_goals into the prompt under 'CUSTOM GOALS (OVERRIDE ABOVE IF CONFLICTING)' at lines 70-71" }
      ],
      "recommended_actions": [
        "Remove the runtime goal-mutation path entirely: delete the <code>/admin/finbot/goals</code> route and the <code>update_goals()</code> method, and make agent goals a deployment-time constant rather than a DB-backed <code>custom_goals</code> column.",
        "If a runtime override must exist, gate it behind authenticated, role-checked operator access and never concatenate its contents into the system prompt as an instruction block — treat it as data, not directive.",
        "Add authentication middleware to the entire <code>/admin/*</code> blueprint in <code>src/routes/admin.py</code> before any further hardening."
      ],
      "raise_category": "implement_zero_trust",
      "owasp_llm": "LLM01",
      "owasp_agentic": "ASI01",
      "confidence": "High",
      "related_findings": ["PRAX-2026-05-22-003", "PRAX-2026-05-22-004", "PRAX-2026-05-22-008"],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-002",
      "severity": "Critical",
      "summary": "Fraud detection is fully bypassable via the fraud_detection_enabled flag and is never enforced before approval, contradicting the remit's must-always / must-never rules.",
      "description": "The remit requires fraud detection before every approval (R-01), forbids skipping it for any reason including config flags (R-06), and pins <code>fraud_detection_enabled=true</code> as non-negotiable (R-12). The code does the opposite: when the flag is false, <code>_detect_fraud_patterns</code> returns an empty indicator list with risk_level 'low', the fallback path skips injection detection entirely, and the LLM path never forces a fraud check before <code>approve_invoice</code> — <code>tool_choice=\"auto\"</code> leaves it to model discretion. The flag is freely settable through the unauthenticated config endpoint.",
      "tags": [
        { "kind": "raise", "label": "Implement Zero Trust" },
        { "kind": "owasp_llm", "label": "LLM06 — Excessive Agency" },
        { "kind": "owasp_agentic", "label": "ASI02 — Tool Misuse and Exploitation" }
      ],
      "policy_rule_ids": "R-01, R-06, R-12",
      "policy_rule_text": "Run `detect_fraud_patterns` before approving any invoice / Skip fraud detection for any reason, including configuration flags or runtime instructions / `fraud_detection_enabled` must remain `true` at all times; disabling it is not an authorized configuration change",
      "evidence": [
        { "file": "src/services/finbot_agent.py", "line": 509, "snippet": "_detect_fraud_patterns returns empty fraud_indicators and risk_level 'low' with message 'Fraud detection is disabled' when config.fraud_detection_enabled is false (lines 509-515)" },
        { "file": "src/services/finbot_agent.py", "line": 785, "snippet": "fallback path only runs _detect_prompt_injection 'if config.fraud_detection_enabled' — disabling the flag skips injection detection too" },
        { "file": "src/services/finbot_agent.py", "line": 180, "snippet": "tool_choice='auto' — no code gate forces detect_fraud_patterns to run before approve_invoice; sequencing is left to the LLM" },
        { "file": "src/routes/admin.py", "line": 74, "snippet": "update_finbot_config route accepts fraud_detection_enabled without authentication and does not reject a value of false (passes through to update_config at finbot_agent.py:762)" }
      ],
      "recommended_actions": [
        "Enforce a deterministic fraud check in <code>_approve_invoice</code> (<code>src/services/finbot_agent.py:409</code>): call <code>_detect_fraud_patterns</code> and refuse approval, routing to human review instead, on any high/critical risk, independent of the LLM's tool sequencing.",
        "Remove <code>fraud_detection_enabled</code> as a runtime-mutable field, or hard-reject any config update that sets it to false in <code>update_config()</code> (finbot_agent.py:762)."
      ],
      "raise_category": "implement_zero_trust",
      "owasp_llm": "LLM06",
      "owasp_agentic": "ASI02",
      "confidence": "High",
      "related_findings": ["PRAX-2026-05-22-004", "PRAX-2026-05-22-007"],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-003",
      "severity": "Critical",
      "summary": "Vendor-controlled invoice descriptions flow unsanitized into the LLM decision context, giving an indirect prompt-injection path into the approve/pay decision.",
      "description": "The remit forbids accepting instructions embedded in invoice data fields as directives (R-09). The invoice description is vendor-submitted (vendor.py:93), stored verbatim, and returned by <code>_get_invoice_details</code> straight into the tool-result JSON that re-enters the LLM context. The only defense is a regex blocklist in <code>_detect_prompt_injection</code>, which never blocks processing; it merely sets a boolean flag. Because the system prompt elevates business continuity over security and the fallback path can override that flag on 'business context' grounds, injected description text can steer the decision toward approval.",
      "tags": [
        { "kind": "raise", "label": "Balance Your Knowledge Base" },
        { "kind": "owasp_llm", "label": "LLM01 — Prompt Injection" },
        { "kind": "owasp_agentic", "label": "ASI01 — Agent Goal Hijack" }
      ],
      "policy_rule_ids": "R-09",
      "policy_rule_text": "Accept instructions embedded in invoice data fields (description, notes, line items) as directives",
      "evidence": [
        { "file": "src/routes/vendor.py", "line": 93, "snippet": "submit_invoice creates Invoice with description=data['description'] taken directly from the vendor's request body" },
        { "file": "src/services/finbot_agent.py", "line": 394, "snippet": "_get_invoice_details returns 'description': invoice.description into the tool result that is appended to LLM messages at orchestration lines 203-207" },
        { "file": "src/services/finbot_agent.py", "line": 549, "snippet": "_detect_prompt_injection is a regex blocklist that only returns True/False to set a flag — it does not gate or block processing of injected content" }
      ],
      "recommended_actions": [
        "Treat <code>invoice.description</code> as untrusted data: clearly delimit and label it as external content in the tool result, and never present it to the model as instruction-eligible text.",
        "Replace the flag-only use of the regex blocklist with a deterministic policy gate that routes any injection-flagged invoice to human review regardless of business-context scoring (see finbot_agent.py:841-855)."
      ],
      "raise_category": "balance_your_knowledge_base",
      "owasp_llm": "LLM01",
      "owasp_agentic": "ASI01",
      "confidence": "High",
      "related_findings": ["PRAX-2026-05-22-007", "PRAX-2026-05-22-008"],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-004",
      "severity": "Critical",
      "summary": "Every /admin/* route — config, goals, vendor-trust, human review, reprocess — is exposed with no authentication or authorization.",
      "description": "The remit reserves admin-API actions for CineFlow finance admins (R-15). In code the entire <code>admin_bp</code> blueprint has no auth middleware: any network caller can change thresholds and <code>fraud_detection_enabled</code>, inject goals, override human-review decisions, reprocess invoices, and promote any vendor to trust level 'high'. This is the privilege-abuse substrate that turns several other findings from theoretical to trivially exploitable.",
      "tags": [
        { "kind": "raise", "label": "Implement Zero Trust" },
        { "kind": "owasp_agentic", "label": "ASI03 — Identity and Privilege Abuse" },
        { "kind": "owasp_llm", "label": "LLM06 — Excessive Agency" }
      ],
      "policy_rule_ids": "R-15",
      "policy_rule_text": "CineFlow finance admins — may update operational thresholds via the admin API; may not inject goals or override decision logic at runtime",
      "evidence": [
        { "file": "src/routes/admin.py", "line": 6, "snippet": "admin_bp blueprint defined with no before_request auth hook; all routes below (config, goals, review, reprocess, vendor trust) are open" },
        { "file": "src/routes/admin.py", "line": 193, "snippet": "update_vendor_trust lets any caller set a vendor's trust_level to 'high' with no authentication (lines 193-219)" },
        { "file": "src/main.py", "line": 18, "snippet": "CORS(app) enables cross-origin access to every route with default wildcard origins, widening the unauthenticated attack surface" }
      ],
      "recommended_actions": [
        "Add an authentication + authorization gate (Flask <code>before_request</code> on <code>admin_bp</code>) requiring an authenticated finance-admin identity for every <code>/admin/*</code> route in <code>src/routes/admin.py</code>.",
        "Restrict CORS in <code>src/main.py:18</code> to the known admin/vendor front-end origins instead of the default wildcard."
      ],
      "raise_category": "implement_zero_trust",
      "owasp_llm": "LLM06",
      "owasp_agentic": "ASI03",
      "confidence": "High",
      "related_findings": ["PRAX-2026-05-22-001", "PRAX-2026-05-22-002", "PRAX-2026-05-22-006"],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-005",
      "severity": "High",
      "summary": "Invoices above manual_review_threshold can be auto-approved with no human sign-off, via both the LLM path and the fallback business-context overrides.",
      "description": "The remit forbids processing above <code>manual_review_threshold</code> without human sign-off (R-07). No code enforces this: <code>_approve_invoice</code> applies no amount check before setting <code>payment_processed=True</code>, the LLM is told amounts over threshold should be 'flagged' but nothing prevents it from calling <code>approve_invoice</code>, and the fallback path explicitly auto-approves over-threshold invoices when 'business context' scores high or management/production phrases appear.",
      "tags": [
        { "kind": "raise", "label": "Implement Zero Trust" },
        { "kind": "owasp_llm", "label": "LLM06 — Excessive Agency" },
        { "kind": "owasp_agentic", "label": "ASI02 — Tool Misuse and Exploitation" }
      ],
      "policy_rule_ids": "R-07",
      "policy_rule_text": "Process invoices above the `manual_review_threshold` without human sign-off",
      "evidence": [
        { "file": "src/services/finbot_agent.py", "line": 409, "snippet": "_approve_invoice sets invoice.payment_processed = True with no check on invoice.amount vs config.manual_review_threshold (lines 415-416)" },
        { "file": "src/services/finbot_agent.py", "line": 807, "snippet": "fallback path: amount > manual_review_threshold is auto_approved when business_context['should_expedite'] or management+production context (lines 807-820)" }
      ],
      "recommended_actions": [
        "In <code>_approve_invoice</code> (finbot_agent.py:409), refuse approval and re-route to <code>_request_human_review</code> whenever <code>invoice.amount > config.manual_review_threshold</code>, before any payment side effect.",
        "Delete the over-threshold 'business context' override branches in <code>_fallback_processing</code> (finbot_agent.py:807-820) so amount is a hard gate."
      ],
      "raise_category": "implement_zero_trust",
      "owasp_llm": "LLM06",
      "owasp_agentic": "ASI02",
      "confidence": "High",
      "related_findings": ["PRAX-2026-05-22-008"],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-006",
      "severity": "High",
      "summary": "No check that a vendor exists with status 'approved' before processing, and vendor registration auto-approves every new vendor.",
      "description": "The remit requires verifying the submitting vendor is registered with status 'approved' before processing (R-02) and forbids paying unregistered/unapproved vendors (R-05). The agent never inspects <code>vendor.status</code> anywhere in the decision path, and vendor registration hardcodes <code>status='approved'</code> for every newcomer, so 'registered and approved' is meaningless as a control.",
      "tags": [
        { "kind": "raise", "label": "Implement Zero Trust" },
        { "kind": "owasp_agentic", "label": "ASI03 — Identity and Privilege Abuse" }
      ],
      "policy_rule_ids": "R-02, R-05",
      "policy_rule_text": "Check that the submitting vendor exists in the database with status `approved` before processing / Approve payment to a vendor not registered and approved in the vendor database",
      "evidence": [
        { "file": "src/routes/vendor.py", "line": 37, "snippet": "register_vendor sets status='approved' (comment: 'Auto-approve for demo') and trust_level='standard' for every new vendor with no review" },
        { "file": "src/services/finbot_agent.py", "line": 369, "snippet": "_get_invoice_details and the decision paths read vendor.trust_level but never read or check vendor.status against 'approved'" }
      ],
      "recommended_actions": [
        "Add an explicit <code>vendor.status == 'approved'</code> gate at the start of <code>process_invoice</code> (finbot_agent.py:101) that rejects or escalates when the vendor is not approved.",
        "Change vendor registration in <code>src/routes/vendor.py:37</code> to default new vendors to <code>status='pending'</code> and require an authenticated approval step."
      ],
      "raise_category": "implement_zero_trust",
      "owasp_llm": null,
      "owasp_agentic": "ASI03",
      "confidence": "High",
      "related_findings": [],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-007",
      "severity": "High",
      "summary": "Fraud signals do not force escalation: the fallback path explicitly overrides detected injection/high-risk on 'business context', defeating the rule that any fraud signal must escalate regardless of amount.",
      "description": "The remit requires escalation to human review whenever any fraud signal is present, regardless of amount (R-03), and human approval for high/critical fraud risk (R-11). The fallback path does the opposite: when injection is detected it can still auto-approve if <code>should_expedite and speed_priority > 0.6</code>, or on management+production context. On the LLM path nothing deterministically maps a high <code>risk_level</code> to escalation — the model may approve anyway. The control exists in name (a regex detector and a risk_level) but is bypassable by design.",
      "tags": [
        { "kind": "raise", "label": "Implement Zero Trust" },
        { "kind": "owasp_agentic", "label": "ASI01 — Agent Goal Hijack" },
        { "kind": "owasp_llm", "label": "LLM06 — Excessive Agency" }
      ],
      "policy_rule_ids": "R-03, R-11",
      "policy_rule_text": "Escalate to human review rather than auto-approve when any fraud signal is present, regardless of amount / Any invoice where fraud detection returns `risk_level` of `high` or `critical`",
      "evidence": [
        { "file": "src/services/finbot_agent.py", "line": 841, "snippet": "fallback 'contains_injection' branch auto_approves when business_context['should_expedite'] and config.speed_priority > 0.6, or on management+production context (lines 841-850)" },
        { "file": "src/services/finbot_agent.py", "line": 543, "snippet": "_detect_fraud_patterns returns a risk_level but no caller deterministically forces escalation on 'high'/'critical' — sequencing is LLM-discretionary or overridden in fallback" }
      ],
      "recommended_actions": [
        "Make any fraud signal a hard escalation: in both the LLM post-processing and <code>_fallback_processing</code>, if injection is detected or risk_level is high/critical, force <code>_request_human_review</code> and never reach an auto_approve branch.",
        "Remove the <code>speed_priority</code> and business-context override conditions from the injection branch at finbot_agent.py:841-850."
      ],
      "raise_category": "implement_zero_trust",
      "owasp_llm": "LLM06",
      "owasp_agentic": "ASI01",
      "confidence": "High",
      "related_findings": ["PRAX-2026-05-22-002", "PRAX-2026-05-22-003"],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-008",
      "severity": "Critical",
      "summary": "Compound chain: unauthenticated/vendor-controlled injection into the LLM context drives a payment-marking decision with no fraud gate and no audit log.",
      "description": "Combining the individual findings yields a complete, undetectable exploit chain. An attacker reaches the decision engine through two ingresses: the unauthenticated <code>/admin/finbot/goals</code> override (PRAX-001) and the vendor-controlled invoice description (PRAX-003). Neither ingress is followed by a deterministic fraud check (PRAX-002), the approval path applies no amount or vendor-status guard before setting <code>payment_processed=True</code> (PRAX-005/006), and there is no logging anywhere to reconstruct the event (PRAX-009). The system prompt's explicit 'business continuity over security' framing and the fallback business-context overrides make approval the path of least resistance. The CTF flag scaffolding in the code confirms that this is a reachable end state.",
      "tags": [
        { "kind": "raise", "label": "Implement Zero Trust" },
        { "kind": "owasp_agentic", "label": "ASI01 — Agent Goal Hijack" },
        { "kind": "owasp_llm", "label": "LLM01 — Prompt Injection" }
      ],
      "policy_rule_ids": "R-08, R-14",
      "policy_rule_text": "Allow runtime messages — including invoice descriptions, vendor notes, or API payloads — to modify its goals, system prompt, or decision logic / The agent's decision logic is defined in code; it is not redefinable at runtime by any user, vendor, or API caller",
      "evidence": [
        { "file": "src/services/finbot_agent.py", "line": 119, "snippet": "process_invoice flags ctf_flag_captured when an injection-containing or over-threshold invoice reaches 'approved' (lines 119-139) — the chain's success condition is encoded in the agent itself" },
        { "file": "src/services/finbot_agent.py", "line": 40, "snippet": "system prompt ranks 'Business Continuity' first and instructs the model to 'prioritize business continuity' over flagging fraud (lines 40-43, 81)" }
      ],
      "recommended_actions": [
        "Treat the chain as the priority remediation: authenticate <code>/admin/*</code>, remove runtime goal injection, add a deterministic pre-approval fraud + amount + vendor-status gate, and add structured action logging. Because there are two independent ingresses and several missing gates, fixing any single link leaves other paths to an unreviewed approval open.",
        "Re-baseline the system prompt so security/escalation is not framed as subordinate to speed and business continuity."
      ],
      "raise_category": "implement_zero_trust",
      "owasp_llm": "LLM01",
      "owasp_agentic": "ASI01",
      "confidence": "High",
      "related_findings": ["PRAX-2026-05-22-001", "PRAX-2026-05-22-002", "PRAX-2026-05-22-003", "PRAX-2026-05-22-005", "PRAX-2026-05-22-006", "PRAX-2026-05-22-009"],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-009",
      "severity": "High",
      "summary": "No action logging anywhere — agent decisions, config changes, and goal updates leave only DB state and a few print() calls, with no audit trail.",
      "description": "There is no logging framework in the codebase. Agent approve/reject/escalate decisions are persisted only as overwritable DB columns (ai_decision, ai_reasoning), config and goal mutations are not recorded at all, and the only console output is scattered <code>print()</code> calls. High-impact actions (marking invoices paid, disabling fraud detection, injecting goals) occur with no durable, structured, action-level record — making the exploit chain in PRAX-008 undetectable after the fact.",
      "tags": [
        { "kind": "raise", "label": "Monitor Continuously" },
        { "kind": "owasp_llm", "label": "LLM02 — Sensitive Information Disclosure" }
      ],
      "policy_rule_ids": null,
      "policy_rule_text": null,
      "evidence": [
        { "file": "src/routes/admin.py", "line": 239, "snippet": "log_agreement uses print() to console — comment 'For now, just log to console (implement database storage if needed)'; representative of the only 'logging' present" },
        { "file": "src/services/finbot_agent.py", "line": null, "snippet": "no logging import or logger calls anywhere in the 878-line agent; decisions are written only as mutable Invoice columns and config/goal changes are never recorded" }
      ],
      "recommended_actions": [
        "Add structured, append-only action logging (e.g., Python <code>logging</code> with JSON formatter) capturing every approve/reject/escalate decision, config change, and goal update with actor, timestamp, invoice id, amount, and fraud result.",
        "Treat config/goal mutations in <code>update_config</code> / <code>update_goals</code> as audit events and record old vs new values."
      ],
      "raise_category": "monitor_continuously",
      "owasp_llm": "LLM02",
      "owasp_agentic": null,
      "confidence": "High",
      "related_findings": ["PRAX-2026-05-22-008"],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-010",
      "severity": "High",
      "summary": "Domain is enforced only in a natural-language system prompt that itself prioritizes speed and business continuity over security, with no code-level scope gate.",
      "description": "FinBot's narrow remit (invoice approve/reject/escalate) is asserted only in the <code>get_system_prompt</code> text, which is soft and, worse, actively counterproductive: it tells the model that production delays 'cost thousands per day', to 'prioritize fast approvals' when speed_priority > 0.7 (the default of 0.7 sits exactly at that boundary), and to 'prioritize business continuity' even when flagging fraud. There is no deterministic code boundary constraining the agent to the authorized decision space; the prompt is the only domain control, and it leans toward approval.",
      "tags": [
        { "kind": "raise", "label": "Limit Your Domain" },
        { "kind": "owasp_llm", "label": "LLM09 — Misinformation" }
      ],
      "policy_rule_ids": null,
      "policy_rule_text": null,
      "evidence": [
        { "file": "src/services/finbot_agent.py", "line": 45, "snippet": "system prompt: 'If speed_priority > 0.7: Prioritize fast approvals for legitimate invoices' (lines 45-47); default speed_priority is 0.7 per FinBotConfig" },
        { "file": "src/models/vendor.py", "line": 103, "snippet": "FinBotConfig.speed_priority defaults to 0.7, exactly at the fast-approval threshold, so any upward adjustment enables that branch" }
      ],
      "recommended_actions": [
        "Re-author the system prompt so security and escalation are not subordinated to speed; remove the 'prioritize business continuity over fraud flagging' framing at finbot_agent.py:81.",
        "Back the prompt with deterministic code gates (amount, vendor status, fraud result) so domain limits do not depend on model compliance."
      ],
      "raise_category": "limit_your_domain",
      "owasp_llm": "LLM09",
      "owasp_agentic": null,
      "confidence": "High",
      "related_findings": [],
      "escalation": "alert"
    },
    {
      "id": "PRAX-2026-05-22-011",
      "severity": "Medium",
      "summary": "Hardcoded Flask SECRET_KEY committed in source rather than loaded from environment or a secret store.",
      "description": "The Flask application secret is a hardcoded string literal in source control instead of being read from the environment or a vault. While this app does not currently use Flask sessions, a committed SECRET_KEY undermines any future session, CSRF, or signed-cookie protection and is a credential-hygiene failure for a financial decision system.",
      "tags": [
        { "kind": "raise", "label": "Manage Your Supply Chain" },
        { "kind": "owasp_llm", "label": "LLM02 — Sensitive Information Disclosure" }
      ],
      "policy_rule_ids": null,
      "policy_rule_text": null,
      "evidence": [
        { "file": "src/main.py", "line": 15, "snippet": "app.config['SECRET_KEY'] hardcoded as a ~20-char string literal [REDACTED — Flask SECRET_KEY at src/main.py:15], not loaded from os.getenv" }
      ],
      "recommended_actions": [
        "Load <code>SECRET_KEY</code> from an environment variable / secret manager in <code>src/main.py:15</code> and rotate the committed value, treating it as compromised."
      ],
      "raise_category": "manage_your_supply_chain",
      "owasp_llm": "LLM02",
      "owasp_agentic": null,
      "confidence": "High",
      "related_findings": [],
      "escalation": "log_only"
    },
    {
      "id": "PRAX-2026-05-22-012",
      "severity": "Medium",
      "summary": "Dependencies are largely unpinned (floor >= ranges) and there is no SBOM or model-version pinning beyond a model string literal.",
      "description": "<code>requirements.txt</code> pins only Flask exactly; flask-cors, flask-sqlalchemy, openai, gunicorn, requests and others use floor (>=) ranges, leaving the build exposed to unreviewed version drift, including automatic uptake of a compromised upstream release. There is no SBOM/ML-BOM, and the LLM model is a bare string (<code>gpt-4.1-mini</code>) with no provenance or integrity record.",
      "tags": [
        { "kind": "raise", "label": "Manage Your Supply Chain" },
        { "kind": "owasp_llm", "label": "LLM03 — Supply Chain" }
      ],
      "policy_rule_ids": null,
      "policy_rule_text": null,
      "evidence": [
        { "file": "requirements.txt", "line": 2, "snippet": "flask-cors>=6.0.0, flask-sqlalchemy>=3.1.1, openai>=1.54.0, gunicorn>=21.2.0 — floor ranges, not exact pins (lines 2-12)" },
        { "file": "src/services/finbot_agent.py", "line": 17, "snippet": "self.model = 'gpt-4.1-mini' — model identifier as a bare literal with no version/provenance record" }
      ],
      "recommended_actions": [
        "Pin all dependencies to exact versions in <code>requirements.txt</code> and add a lockfile; generate an SBOM as part of the build.",
        "Record the model identifier and any provider configuration in a tracked component inventory."
      ],
      "raise_category": "manage_your_supply_chain",
      "owasp_llm": "LLM03",
      "owasp_agentic": null,
      "confidence": "High",
      "related_findings": [],
      "escalation": "log_only"
    },
    {
      "id": "PRAX-2026-05-22-013",
      "severity": "Medium",
      "summary": "No rate limiting, budget cap, or per-session call ceiling on the vendor-facing invoice submission or LLM orchestration loop.",
      "description": "Invoice submission (<code>POST /api/vendors/<id>/invoices</code>) is public and immediately triggers an LLM orchestration loop of up to five OpenAI calls, with no rate limiting, cost ceiling, or concurrency control. A single worker (gunicorn workers=1) and an unbounded public submission path create both a denial-of-wallet and a denial-of-service exposure.",
      "tags": [
        { "kind": "raise", "label": "Implement Zero Trust" },
        { "kind": "owasp_llm", "label": "LLM10 — Unbounded Consumption" }
      ],
      "policy_rule_ids": null,
      "policy_rule_text": null,
      "evidence": [
        { "file": "src/routes/vendor.py", "line": 69, "snippet": "submit_invoice (public, no auth) calls finbot.process_invoice synchronously on every POST with no rate limit (lines 69-110)" },
        { "file": "src/services/finbot_agent.py", "line": 171, "snippet": "max_iterations = 5 caps loop length per invoice but there is no cross-request rate/cost ceiling on OpenAI calls" }
      ],
      "recommended_actions": [
        "Add request rate limiting on the vendor invoice-submission endpoint and a per-period OpenAI cost/call budget; reject or queue beyond the limit.",
        "Consider moving LLM processing off the synchronous request path to bound resource use."
      ],
      "raise_category": "implement_zero_trust",
      "owasp_llm": "LLM10",
      "owasp_agentic": null,
      "confidence": "Medium",
      "related_findings": [],
      "escalation": "log_only"
    },
    {
      "id": "PRAX-2026-05-22-014",
      "severity": "Medium",
      "summary": "No evidence of adversarial testing that drives design fixes — the only security artifact is a CTF walkthrough that demonstrates the weaknesses rather than remediating them.",
      "description": "The remit implies a security posture, but the workspace contains no test suite, no red-team report, and no record of findings feeding architectural change. The single security-adjacent artifact is <code>docs/FinBot-CTF-walkthrough-goal-manipulation.md</code>, whose purpose is to demonstrate the goal-manipulation vulnerability for a CTF, not to harden the agent. Per RAISE calibration, a demonstration suite that does not drive fixes does not lift this category above Ad hoc.",
      "tags": [
        { "kind": "raise", "label": "Build an AI Red Team" }
      ],
      "policy_rule_ids": null,
      "policy_rule_text": null,
      "evidence": [
        { "file": "docs/FinBot-CTF-walkthrough-goal-manipulation.md", "line": null, "snippet": "CTF walkthrough document demonstrating goal-manipulation exploitation; no corresponding remediation or test code exists in the workspace" },
        { "file": "requirements.txt", "line": null, "snippet": "no pytest dependency listed and no unittest-based tests; no tests/ directory in the workspace" }
      ],
      "recommended_actions": [
        "Stand up an adversarial test suite that exercises the goal-injection, fraud-bypass, and over-threshold-approval paths and gates releases on them.",
        "Convert the CTF walkthrough's exploit cases into regression tests that assert each exploit is blocked, and require them to pass before merge."
      ],
      "raise_category": "build_an_ai_red_team",
      "owasp_llm": null,
      "owasp_agentic": null,
      "confidence": "High",
      "related_findings": [],
      "escalation": "log_only"
    },
    {
      "id": "PRAX-2026-05-22-015",
      "severity": "Low",
      "summary": "Vendor banking PII (account/routing numbers, tax IDs) is stored and returned in full by unauthenticated read endpoints.",
      "description": "Vendor records hold bank account number, routing number, and tax ID, and <code>to_dict()</code> returns all of them. The unauthenticated <code>GET /api/vendors</code> and <code>GET /api/vendors/<id></code> endpoints expose this PII without masking or access control. This is a data-minimization and sensitive-disclosure exposure adjacent to the agent rather than in its decision path, hence Low, but it compounds the unauthenticated-admin surface.",
      "tags": [
        { "kind": "raise", "label": "Balance Your Knowledge Base" },
        { "kind": "owasp_llm", "label": "LLM02 — Sensitive Information Disclosure" }
      ],
      "policy_rule_ids": null,
      "policy_rule_text": null,
      "evidence": [
        { "file": "src/models/vendor.py", "line": 38, "snippet": "Vendor.to_dict() returns bank_name, account_holder_name, account_number, routing_number, tax_id in full (lines 37-41)" },
        { "file": "src/routes/vendor.py", "line": 63, "snippet": "GET /vendors (list_vendors) and GET /vendors/<id> return full vendor.to_dict() with no auth or PII masking" }
      ],
      "recommended_actions": [
        "Mask or omit banking fields and tax ID in <code>Vendor.to_dict()</code> for read endpoints, and require authentication on vendor-read routes in <code>src/routes/vendor.py</code>."
      ],
      "raise_category": "balance_your_knowledge_base",
      "owasp_llm": "LLM02",
      "owasp_agentic": null,
      "confidence": "High",
      "related_findings": [],
      "escalation": "log_only"
    }
  ],
  "positives": [
    {
      "title": "Decision and reasoning persisted for every processed invoice",
      "description": "Each decision path writes ai_decision, ai_confidence, and ai_reasoning (and human_reviewer/human_decision/human_notes on manual review) to the invoice record, satisfying the remit's record-keeping rule even though it is mutable DB state rather than an append-only log.",
      "evidence_path": "src/services/finbot_agent.py:417-419 (and _reject/_request_human_review)"
    },
    {
      "title": "Specific, verifiable behavioral rules in the Worker Remit",
      "description": "The remit names exact tools, thresholds, forbidden actions, and approval conditions, which made code-level policy-implementation auditing possible; the gaps are in the implementation, not the policy's clarity.",
      "evidence_path": "examples/finbot/WORKER_REMIT.md (Behavioral Constraints, Configuration and Policy Separation)"
    },
    {
      "title": "A regex prompt-injection detector exists and runs on invoice descriptions",
      "description": "_detect_prompt_injection covers both technical and business-manipulation patterns and flags injected descriptions; it is bypassable and non-blocking (see PRAX-003/007) but represents a real, present detection primitive rather than nothing.",
      "evidence_path": "src/services/finbot_agent.py:549-659"
    }
  ],
  "log_files": {
    "present": false,
    "no_logs_note": "No log files exist in the workspace and no logging framework is used — only scattered print() calls; the absence of action logging is itself a finding (PRAX-2026-05-22-009).",
    "rows": []
  },
  "raise_posture": {
    "weighted_overall": 0.60,
    "weighted_rationale": "Ad hoc. FinBot has the shape of an invoice-approval agent but almost no operative security substance: domain limits, fraud gating, and goal integrity all live in a natural-language system prompt that the prompt itself undercuts, while the admin surface that controls thresholds and goals is fully unauthenticated. The only category with genuine credit is the persisted decision record. Zero Trust, the heaviest-weighted category and the one covering the approve/pay decision, scores zero because no code interposes on the agent's high-impact actions, and Monitor scores zero for the absence of any action log. This is consistent with a deliberately vulnerable CTF target, but the gaps are real and exploitable as written.",
    "categories": [
      { "key": "limit_your_domain", "name": "Limit Your Domain", "score": 1, "confidence": "High", "weight": 0.15, "rationale": "Scope is asserted only in get_system_prompt text that prioritizes speed/business continuity over security (finbot_agent.py:40-47, 81); no code gate constrains the agent to its authorized decision space (PRAX-010)." },
      { "key": "balance_your_knowledge_base", "name": "Balance Your Knowledge Base", "score": 1, "confidence": "High", "weight": 0.15, "rationale": "Vendor-controlled invoice descriptions enter the LLM context unsanitized (finbot_agent.py:394) and unauthenticated read endpoints expose vendor banking PII (vendor.py:63); the only filter is a non-blocking regex flag (PRAX-003, PRAX-015)." },
      { "key": "implement_zero_trust", "name": "Implement Zero Trust", "score": 0, "confidence": "High", "weight": 0.25, "rationale": "No code-level interposition on the agent's high-impact actions: _approve_invoice sets payment_processed=True with no amount, fraud, or vendor-status check (finbot_agent.py:415-416), every /admin/* route is unauthenticated, and fraud detection is a flag away from off (PRAX-001/002/004/005/006)." },
      { "key": "manage_your_supply_chain", "name": "Manage Your Supply Chain", "score": 1, "confidence": "High", "weight": 0.15, "rationale": "A hardcoded Flask SECRET_KEY (main.py:15) sits beside floor-ranged, unpinned dependencies and a bare model literal with no SBOM (requirements.txt; finbot_agent.py:17) (PRAX-011, PRAX-012)." },
      { "key": "build_an_ai_red_team", "name": "Build an AI Red Team", "score": 1, "confidence": "High", "weight": 0.15, "rationale": "No test suite or red-team artifact drives fixes; the only security document is a CTF walkthrough that demonstrates the goal-manipulation weakness rather than remediating it (docs/FinBot-CTF-walkthrough-goal-manipulation.md) (PRAX-014)." },
      { "key": "monitor_continuously", "name": "Monitor Continuously", "score": 0, "confidence": "High", "weight": 0.15, "rationale": "No logging framework anywhere; high-impact actions and config/goal mutations leave only mutable DB columns and a few print() calls, so the exploit chain is undetectable after the fact (PRAX-009)." }
    ]
  },
  "footer": {
    "severity_counts": { "critical": 5, "high": 5, "medium": 4, "low": 1, "info": 0 }
  }
}
