design/V2_HARVEST_PLAN.md §5 proposed porting PR #1’s parallel-analysis idea: six
RAISE-category mapper agents run concurrently, then a reducer agent does the
cross-category compound-signal reasoning and assembles the canonical findings JSON. The
hypothesis was ≈4–5× wall-clock on the analysis phase (the render is already deterministic
and ~0.08 s). §5.1 built the orchestrator (skills/behavior-verifier/parallel.py +
SKILL-PARALLEL.md, PR #5). §5.2 is the gate: run it on real targets, diff against
tests/baselines/v0.3-sequential/, time a controlled A/B, then make the §5.3 call
(ship-as-default / opt-in / drop).
This file is that gate’s record. Verdict: §5.3 outcome 3 — drop the parallel path.
| metric | v0.3-sequential | v0.4-parallel | |
|---|---|---|---|
| weighted RAISE | 0.60 | 0.60 | exact parity ✓ |
| per-category | (sums to 0.60) | LYD 1 / BYKB 0 / IZT 0 / MYSC 1 / BART 1 / MC 1 | in band ✓ |
| Critical themes | sys-prompt API-key embed; no path validation; no input validation; false-history accept; no audit logging | all present (+ supply-chain / CORS / rate-limit broken out) | superset coverage ✓ |
| findings | 13 (5C/4H/3M/1I) | 24 (3C/8H/9M/3L/1I) | count ~1.8×, severity skewed lower ✗ |
Timing (map phase, 6 mappers parallel, single-target, repo pre-cloned, analysis only): mapper durations 65 / 85 / 92 / 103 / 184 / 236 s → map wall-clock ≈ 236 s; then reducer ≈ ~5 min; render ≈ 0.08 s ⇒ parallel total ≈ ~9 min vs sequential ≈ ~4.5 min (the latter incl. clone — but even discounting the clone, parallel is not faster).
The weighted score landed dead-on because it comes straight off the six mapper category scores, which were sound. But the reducer under-consolidates (kept 24 of 23 merged mapper findings — essentially no dedup, +1 compound) and under-escalates vs the holistic analyst (Critical→High drift). Category-isolated mappers each surface their own slice; folding them into the “one root cause = one finding” view the holistic pass produces is exactly what the reducer is for, and it isn’t doing enough of it.
Repo pre-cloned, single-target (no contention), analysis + render only, sequential then parallel back to back.
| metric | v0.3-sequential (frozen) | sequential re-run (this A/B) | v0.4-parallel | |
|---|---|---|---|---|
| weighted RAISE | 0.45 | 0.45 | 0.85 | drifts +0.40 — edge of band, wrong direction ✗ |
| per-category | LYD1/BYKB0/IZT0/MYSC1/BART0/MC1 | same | LYD1/BYKB1/IZT1/MYSC1/BART0/MC1 | mappers inflate BYKB & IZT 0→1 ✗ |
| findings | 17 (7C/6H/3M/1L) | 13 (7C/6H) | 29 (4C/11H/8M/4L/2I) | count ~2×; Critical bar slipped 7→4 ✗ |
| Critical themes | empty sandbox stubs; raw subprocess.run; path-traversal write; no injection screening; unauth /api/settings; Netlify deploy |
all present | most present but downgraded; /api/settings and the arbitrary-file-read endpoints missing entirely |
coverage hole ✗ |
| wall-clock (analysis+render, pre-cloned, single-target) | — | 353 s (~5.9 min) | map phase max(122,124,135,78,45,171) = 171 s + reducer ~5 min ⇒ ~8 min | parallel SLOWER ✗ |
| token cost | ~136 K (one pass) | ~136 K | ~270 K mappers + a large reducer ≈ ~6× | much more expensive ✗ |
The reducer is a serial bottleneck that costs about as much as the whole sequential analysis. It ingests six detailed mapper reports and re-does the cross-category compound reasoning, severity calibration, dedup, and prose synthesis — that’s most of a holistic pass — and it runs after the map phase, not during it. So parallel ≈ map_phase(~3 min) + reduce(~5 min ≈ a full analysis) > sequential(~6 min). There is no architectural fix: the reduce step can’t be cheap because the cross-category reasoning is the entire point of the design. PR #1’s “≈4–5×” came from a one-target spot-check; it does not hold up.
shell=True”, “an undefined Jinja var means search_results don’t reach the coder” — and bump a genuinely-Absent category to Ad hoc. The reducer adopts the mapper scores rather than re-applying the downward calibration anchor against the compound picture (it was explicitly instructed to, and didn’t). HelperBot happened to land dead-on; Devika drifted +0.40./api/settings config-rewrite + API-key-leak (sequential: Critical; it’s on Devika’s must-catch list in tests/README.md) and the /api/get-browser-snapshot / /api/download-project-pdf arbitrary-file-read endpoints (sequential: High) — because nobody on the six-mapper roster owns “review the Flask/web API surface.” That’s not a tuning issue; splitting the analysis by RAISE category creates blind spots for cross-cutting issues by construction.Signal-to-noise. Parallel’s “extra” findings over sequential are one genuinely useful catch (browser_interaction scope creep) and a pile of Low/Info padding (src/experts/ stubs, GitHub client, RAG stub, curl |
bash toolchain, “ARCHITECTURE.md overstates auditing”, an undefined-Jinja-var nit, LOG_PROMPTS default). Net negative. |
The parallel map-reduce path is slower, less accurate, and ~6× more expensive than the sequential pipeline — on the small target (HelperBot) and on the worst-case-for-parallel big target (Devika). It fails outcome 1 (no speedup — it’s a slowdown) and it’s worse than outcome 2 (calibration genuinely drifts and there’s a structural coverage hole, not just “close but not there”). Per §5.3 outcome 3: drop it, close the phase, reclaim the effort. Phase 3 (GitHub Actions + automated testing) follows directly.
The §5.1 orchestrator build was sound engineering — the architecture just doesn’t pay off, for reasons inherent to “split the analysis six ways then re-merge” rather than anything fixable in the prompts. skills/behavior-verifier/parallel.py, skills/behavior-verifier/SKILL-PARALLEL.md, and tests/parallel/test_parallel.py stay recoverable on branch phase2/parallel-analysis (PR #5, closed not merged) and are listed in design/DEFERRED.md. Nothing parallel ships in a release.
The report artifacts the A/B produced (/tmp/praxa-helperbot-parallel/, /tmp/praxa-devika-par/, /tmp/praxa-devika-seq/) are not committed; this dir + this file are the record.