Same Dockerfile shape as the sister repo: computer-use Ubuntu 22.04 + Node.js 20 + Claude Code CLI v2.1.45 + Flask 3.0 + mcp-vnc MCP server + Firefox ESR with an EFS-backed profile. The Flask state machine has 9 production states (IDLE → SETTING_UP → DATA_SET → AGENT1_RUNNING → (AGENT1_COMPLETE | AGENT1_ERROR) → AGENT2_RUNNING → (AGENT2_COMPLETE | AGENT2_ERROR)), also the same shape as the sister.

The mock FEMA HTTP server and Chrome have both been removed; Firefox ESR is the only remaining browser. The Agent 1 script navigates to the live https://www.disasterassistance.gov/ and stops at the Login.gov page; the user completes 2FA inside the noVNC session; then the user (or an operator) calls /agent2/run.

The repo includes aws_deployment/ CDK stacks (ECS, API Gateway, CloudFront, Secrets, Storage, Observability) and aws_demo_deploy/REPLIT_DEPLOY.md for Replit iframe integration. The /survivor-info/p1 and /survivor-info/p2 split endpoints are present (not removed, as previously documented).
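The state chain described above can be sketched as an enum plus a transition table. This is only an illustration of the forward path: the names `State`, `ALLOWED`, and `transition` are hypothetical, the real `state.py` was described as an in-memory dict, and restart edges are deliberately left out because the summary does not spell them out.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    SETTING_UP = "SETTING_UP"
    DATA_SET = "DATA_SET"
    AGENT1_RUNNING = "AGENT1_RUNNING"
    AGENT1_COMPLETE = "AGENT1_COMPLETE"
    AGENT1_ERROR = "AGENT1_ERROR"
    AGENT2_RUNNING = "AGENT2_RUNNING"
    AGENT2_COMPLETE = "AGENT2_COMPLETE"
    AGENT2_ERROR = "AGENT2_ERROR"


# Forward edges only. Assumption: Agent 2 may start only from
# AGENT1_COMPLETE; restart edges out of the *_ERROR states are not modeled.
ALLOWED = {
    State.IDLE: {State.SETTING_UP},
    State.SETTING_UP: {State.DATA_SET},
    State.DATA_SET: {State.AGENT1_RUNNING},
    State.AGENT1_RUNNING: {State.AGENT1_COMPLETE, State.AGENT1_ERROR},
    State.AGENT1_COMPLETE: {State.AGENT2_RUNNING},
    State.AGENT2_RUNNING: {State.AGENT2_COMPLETE, State.AGENT2_ERROR},
}


def transition(current: State, target: State) -> State:
    """Return the target state if the edge is allowed, else raise ValueError."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Keeping the edges in one table makes it easy to reject out-of-order calls such as /agent2/run before Agent 1 has completed.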

Role in the system: Production sibling of af-disaster-assistance-gov-agent — same Flask state machine, same per-survivor container, but with mock infrastructure removed and Firefox ESR as the only browser. Includes AWS CDK + Replit integration.

Surfaces:

Flask data-plane :5001 (same 8 endpoints as sister)
noVNC :6080 (vnc_embed.html for iframe)
VNC :5900
Firefox ESR with persisted profile in EFS
Claude Code CLI agent runtime
AWS CDK stacks under aws_deployment/
Replit integration docs (aws_demo_deploy/REPLIT_DEPLOY.md)

User workflows

Step	Expected result
Build + start	/health = ok
Submit survivor	Container ready
Run Agent 1	State → AGENT1_COMPLETE; manual user step required
Manual Login.gov	User authenticated
Run Agent 2	State → AGENT2_COMPLETE
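A caller driving this workflow has to wait for each state change before taking the next step. The helper below is a minimal polling sketch, assuming a hypothetical `get_state` callable that performs `GET /state` and returns the state name as a string; the actual response shape of `/state` is not documented here.

```python
import time
from typing import Callable, Set


def wait_for_state(
    get_state: Callable[[], str],
    targets: Set[str],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the reported state is in `targets`, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in targets:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing both the success and the error state as targets (for example `{"AGENT1_COMPLETE", "AGENT1_ERROR"}`) lets the caller stop polling as soon as the agent finishes either way.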

API endpoints

Method	Path	Purpose
GET	`/health`	Liveness
POST	`/agent_health`	LLM smoke test
GET	`/state`	Current orchestration state
POST	`/survivor-info`	Submit full survivor JSON; generate scripts; launch Firefox to real FEMA URL
POST	`/survivor-info/p1`	Partial submission (pages 1-N before Login.gov)
POST	`/survivor-info/p2`	Partial submission (post-login pages)
GET	`/agent1/status_detailed`	Verbose Agent 1 status (logs, sentinels, retry counts)
GET	`/agent2/status_detailed`	Verbose Agent 2 status
POST	`/agent1/restart`	Force-restart Agent 1 subprocess
POST	`/agent2/restart`	Force-restart Agent 2 subprocess
POST	`/test/test_kup`, `/test/test_kup2`, `/test/force-status`	Test-only endpoints for state-machine harness
POST	`/agent1/run`	Spawn Agent 1 subprocess
GET	`/agent1/status`	Poll Agent 1
POST	`/agent2/run`	Spawn Agent 2 (only after manual login)
GET	`/agent2/status`	Poll Agent 2

Third-party APIs

API	Role
Amazon Bedrock (Anthropic models), or the direct Anthropic API	LLM backend for Claude Code CLI
https://www.disasterassistance.gov/	Real FEMA application portal (target)
Login.gov	2FA between Agent 1 and Agent 2

Service dependencies

Service	Role
AWS ECS Fargate	Per-survivor task launch
AWS EFS	Firefox profile + work dir
AWS Secrets Manager	Bedrock + JWT + VNC keys
AWS API Gateway + Lambda (control plane)	Container manager (start/stop/list)
AWS CloudFront	Origin restriction + signed URLs (mTLS optional)
AWS DynamoDB	Container state + audit
Replit (frontend host)	Embeds noVNC iframe + calls control plane
af-backend-go-api	Alternative control-plane caller (production path)

Analysis

Overall health: 3.4 / 5 (acceptable)

Score	Dimension
4	Module overview / clarity of intent
3	External dependencies
3	API endpoints
3	Database schema
4	Backend services
3	WebSocket / real-time
3	Frontend components
4	Data flow clarity
4	Error handling & resilience
3	Configuration
3	Data refresh patterns
3	Performance
4	Module interactions
4	Troubleshooting / runbooks
4	Testing & QA
3	Deployment & DevOps
2	Security & compliance
4	Documentation & maintenance
3	Roadmap clarity

af-fema-real-ai-agent — Prop-Build Analysis

Document Type: Critical Review & Analysis (companion to prop-build-template.md)
Scope: Per-Repo / Per-Module
Subject: af-fema-real-ai-agent (FEMA Real AI Agent: per-survivor Claude-computer-use container driving production DisasterAssistance.gov)
Reviewer(s): Claude (automated code review)
Date: 2026-04-09
Version: 0.1
Confidence Level: Medium
What would raise confidence: running container locally, observing a full Agent 1 → Login.gov → Agent 2 run, access to CloudWatch logs + DynamoDB audit events, interview with Gordon, review of container_work_dir/fema-apply-agent/agent1.md + agent2.md prompts, and the CDK stacks under aws_deployment/.

Inputs Reviewed:

Prop-build doc: /Users/andres/src/af/af-analysis/data/af-fema-real-ai-agent.yaml
Companion docs: api-examples.md, data-flow.md, runbook.md, deployment.md
Source: /Users/andres/src/af/af-fema-real-ai-agent/fema_agent/src/web_service/{app.py,state.py,agent_runner.py} plus tree listing of survivor_api/, raw_builder/, intermediate_builder/, aws_deployment/, specs/021-mock-to-real-fema/.
Not executed; no prod metrics; agent prompt files (agent1.md/agent2.md) not inspected line-by-line.

Part A — Per-Repo / Per-Module Analysis

A.1 Executive Summary

Overall health: Functional, reasonably well-factored Python/Flask orchestrator around Claude Code CLI driving a real federal benefits site, but security posture is weak for something that handles FEMA IA PII and a live Login.gov session.
Top risk: No application-layer authentication on any Flask endpoint (app.py:110-564) combined with PII-laden survivor JSON flowing into prompt text that Claude then executes against disasterassistance.gov — a classic prompt-injection + PII-in-prompt surface. See A.6.5 and A.11.
Top win / thing worth preserving: Surgical Firefox profile lock cleanup that preserves cookies (app.py:42-84) and the resume-first / fresh-prompt retry escalation in agent_runner.restart_agent (agent_runner.py:164-184) — both are thoughtful, failure-mode-driven code worth propagating to the sister repo.
Single recommended next action: Put authenticated, signed-URL-only access in front of every Flask route (not just at CloudFront) and add an explicit survivor-data sanitizer before values are interpolated into as_agent1.txt / as_agent2.txt.
Blocking unknowns: The actual content of agent1.md / agent2.md (the prompts Claude receives) was not read; CDK auth/network posture was taken from the YAML summary only; no coverage/flake data.

A.2 Health Scorecard

#	Dimension	Score (1–5)	Justification
1	Module overview / clarity of intent	4	README + YAML + specs/021-mock-to-real-fema/ make the purpose and Agent1/Agent2 split unambiguous.
2	External dependencies	3	Hard dependency on Bedrock, Login.gov, live FEMA site; no abstraction to swap providers even though `LLM_PROVIDER` env hints at it.
3	API endpoints	3	Clean REST shape and error handlers (`app.py:92-106`), but ~16 routes all unauthenticated at app layer; test endpoints (`/test/test_kup*`, `/test/force-status`) live in the same app (`app.py:624,699,796` per YAML).
4	Database schema	3	DynamoDB tables (ContainerInstance, AuditEvent) per YAML; not inspected directly; in-container state is an in-memory dict in `state.py` — fine for per-survivor lifetime, but opaque.
5	Backend services	4	Clear separation: `survivor_api/` (parse), `raw_builder/`/`intermediate_builder/` (script gen), `web_service/agent_runner.py` (process mgmt), `state.py` (FSM).
6	WebSocket / real-time	3	noVNC is stock; only real-time surface; 30s heartbeat is pragmatic.
7	Frontend components	3	No real FE in this repo beyond `vnc_embed.html`; scope is narrow and appropriate.
8	Data flow clarity	4	The 9-state FSM and sentinel-file protocol are explicit; companion `data-flow.md` traces it.
9	Error handling & resilience	4	`check_inactivity_and_kill` (`agent_runner.py:241-283`), SIGTERM→SIGKILL escalation (`:210-238`), stale-detection + retry count, resume-first strategy are all solid.
10	Configuration	3	Env vars documented; but `WORK_DIR` defaulted twice (`app.py:26`, `agent_runner.py:16`) and `subprocess.run(..., env=os.environ.copy())` in `/agent_health` (`app.py:146`) passes the entire host env to the LLM CLI.
11	Data refresh patterns	3	EFS-backed Firefox profile persists across runs, with intentional lock cleanup; acceptable.
12	Performance	3	Per-section timeouts (100s A1, 300s A2) and 600s inactivity kill (`agent_runner.py:19`) are reasonable; no metrics to verify.
13	Module interactions	4	Clean boundary between Flask plane and Claude CLI subprocess.
14	Troubleshooting / runbooks	4	`runbook.md` + `RUN_BOOK.md` + inline docstrings cover stale locks, DOM drift, Login.gov expiry.
15	Testing & QA	4	YAML cites 874 passing tests, tests tree includes unit/integration/contract; test count is a strong signal even without coverage numbers.
16	Deployment & DevOps	3	CDK stacks present (not read); Dockerfile clean; no evidence of staged rollouts or canaries.
17	Security & compliance	2	Unauth Flask routes, `shell=True` subprocess with f-string interpolation, PII flowing into LLM prompts + EFS profile, broad env passthrough to LLM CLI. See A.6.5, A.11.
18	Documentation & maintenance	4	YAML + specs + companion md files are unusually thorough for a WIP-feeling repo.
19	Roadmap clarity	3	`specs/021-mock-to-real-fema/` and `022-secure-demo-deploy/` suggest direction but no explicit roadmap doc was reviewed.

Overall score: 3.37 average (64 points across 19 rows). Weighted reading: operationally thoughtful, but the security rating (2) is a load-bearing concern that should pull the effective score down for any prod-readiness decision.
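As a quick arithmetic check, the unweighted mean of the 19 scores in the scorecard table can be recomputed directly; it comes to about 3.37, which rounds to the 3.4 headline figure.

```python
# Scores copied from the A.2 scorecard, rows 1 through 19.
scores = [4, 3, 3, 3, 4, 3, 3, 4, 4, 3, 3, 3, 4, 4, 4, 3, 2, 4, 3]

average = sum(scores) / len(scores)
print(len(scores), sum(scores), round(average, 2))  # 19 64 3.37
```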

A.3 What's Working Well

Strength: Surgical Firefox profile lock cleanup that deletes only lock and .parentlock and explicitly preserves cookies/storage.
- Location: fema_agent/src/web_service/app.py:42-84
- Why it works: The docstring names the exact failure mode (new container IP vs. old lock) and the code walks only one level so it cannot accidentally nuke profile state. This is the kind of narrow, well-justified hack that saves incidents.
- Propagate to: af-disaster-assistance-gov-agent (sister repo) if not already there.
Strength: Resume-first retry strategy with graceful fallback to fresh-prompt.
- Location: fema_agent/src/web_service/agent_runner.py:164-184 (restart_agent) + :120-161 (run_agent_resume, run_agent_fresh_with_resume).
- Why it works: Encodes real knowledge (--continue is broken with --mcp-config, so --resume <sessionId> is used) and escalates to a fresh prompt that self-skips completed pages using temp/ logs. Debt-aware, not debt-blind.
- Propagate to: Any other Claude Code CLI orchestrator in the org.
Strength: SIGTERM → grace → SIGKILL process-group termination.
- Location: agent_runner.py:210-238 (kill_process_group) + :241-283 (check_inactivity_and_kill).
- Why it works: start_new_session=True at spawn (:51, :137, :159) plus os.killpg ensures the shell wrapper and the claude-full child both die; inactivity is measured from max(start_time, latest_mtime) to avoid false-positive kills from stale temp/ files.
- Propagate to: Sister repo and any future agent runner.
Strength: Explicit P1/P2 cross-validation with county match.
- Location: app.py:359-370.
- Why it works: Catches a whole class of "wrong disaster" data-entry errors before they hit Claude and the real gov site.
- Propagate to: Any other form-automation agent.
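The SIGTERM, grace period, SIGKILL pattern noted among the strengths above can be sketched as follows. This is not the repo's `kill_process_group`; it is a minimal POSIX-only illustration that assumes the child was spawned with `start_new_session=True`, so that the child leads its own process group and `os.killpg` reaches the shell wrapper and its descendants together.

```python
import os
import signal
import subprocess


def kill_process_group(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """SIGTERM the child's process group, wait, then SIGKILL if still alive."""
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to signal
    pgid = os.getpgid(proc.pid)
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # group vanished between poll() and killpg()
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)  # escalate after the grace period
        except ProcessLookupError:
            pass
        return proc.wait()
```

On POSIX the return code is the negative signal number when the child dies from a signal, so a caller can tell a graceful SIGTERM exit from a forced SIGKILL.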

A.4 What to Improve

A.4.1 P0 — Unauthenticated Flask data plane handling PII

Problem: Every route on the Flask app is defined without an @requires_auth wrapper. /survivor-info, /survivor-info/p1, /survivor-info/p2, /agent1/run, /agent2/run, /agent*/restart, and the /test/* endpoints all accept anonymous POSTs. The YAML argues "network isolation + CloudFront signed URLs" is sufficient, but nothing in this repo enforces that — any workload that lands in the VPC (misconfigured SG, sidecar compromise, SSRF from another service, or a misrouted ALB) can POST survivor PII directly.
Evidence: fema_agent/src/web_service/app.py:110, 114, 214, 239, 333, 418, 460, 560, 564 (and /test/* at :624,:699,:796 per YAML §3.2). No before_request auth hook; no abort(401) anywhere in app.py.
Suggested change: Add a before_request that validates an HS256 JWT (the same AI_JWT_SECRET already used at the API Gateway layer per YAML §3.3) on every non-/health route; move test-only endpoints behind an env flag that is false in prod images.
Estimated effort: S
Risk if ignored: PII exfiltration; unauthorized hijack of a container mid-session after the user has completed Login.gov 2FA (attacker can ride the authenticated Firefox profile).
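To make the suggested check concrete, here is a stdlib-only sketch of HS256 verification. It assumes the token carries an `exp` claim and that the secret is the value of `AI_JWT_SECRET`; the function name `verify_hs256` is hypothetical, and a maintained JWT library would be the more sensible choice in the real service.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims of a valid, unexpired HS256 token, else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise ValueError("malformed token")
    if header.get("alg") != "HS256":  # rejects "none" and other algorithms
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time compare
        raise ValueError("bad signature")
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        raise ValueError("expired or missing exp")
    return claims
```

In the Flask app, a `before_request` hook could call this on the bearer token of every non-`/health` request and `abort(401)` on `ValueError`.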

A.4.2 P0 — `shell=True` subprocess spawns with f-string interpolation

Problem: Three functions build command strings via f-string and pass them to subprocess.Popen(..., shell=True): run_agent, run_agent_resume, run_agent_fresh_with_resume. session_id and agent_file are interpolated into the shell string. Today session_id is a server-generated uuid.uuid4() (app.py:425, 471) and agent_file is a hard-coded constant, so there is no current injection path — but the pattern is fragile: any future refactor that lets a caller pass a session_id (e.g. for test harnesses, or resuming from DynamoDB) instantly becomes RCE-as-root-in-container.
Evidence: fema_agent/src/web_service/agent_runner.py:40-52, 127-139, 148-161.
Suggested change: Replace with list form: subprocess.Popen(["claude-full", "--session-id", session_id, "-p", prompt], shell=False, ...). Drop shell=True entirely.
Estimated effort: S
Risk if ignored: Latent command injection one refactor away; also makes it hard to reason about quoting of any prompt that ever contains a " or $.
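The list form suggested above can be sketched like this. The `claude-full`, `--session-id`, and `-p` tokens are taken from the suggested change; `build_claude_argv` and `echo_argument` are hypothetical helpers, and any further flags the real runner passes are omitted. The second function only demonstrates that, without a shell, metacharacters in an argument arrive at the child verbatim.

```python
import subprocess
import sys
import uuid
from typing import List


def build_claude_argv(session_id: str, prompt: str) -> List[str]:
    """Build an argv list for Popen(..., shell=False)."""
    # Validate and normalize even though session IDs are server-generated
    # today; uuid.UUID raises ValueError for anything that is not a UUID.
    canonical = str(uuid.UUID(session_id))
    return ["claude-full", "--session-id", canonical, "-p", prompt]


def echo_argument(value: str) -> str:
    """Pass `value` to a child process as one argv entry and read it back."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

Because each list element becomes exactly one argument, a prompt containing `"` or `$` needs no quoting at all.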

A.4.3 P1 — Survivor PII is interpolated into LLM prompts with no sanitization layer

Problem: SurvivorFemaApplication.generate_agent1_script_from_p1 / generate_agent2_script (called at app.py:296, 395) take attacker-controllable JSON and produce as_agent1.txt / as_agent2.txt files that Claude Code then reads verbatim and executes against the real gov site. There is no evidence of (a) stripping control tokens, (b) length caps, (c) detection of prompt-injection strings like "ignore previous instructions", or (d) a deny-list for URLs/domains. The entire survivor JSON is also kept in state.py in-memory and persisted to EFS as the script files. PII (address, disability info, income, household/deceased members) passes through Bedrock as prompt text.
Evidence: app.py:295-296, 387-395 (generator invocation); YAML §4 enumerates the PII fields.
Suggested change: Introduce a survivor_sanitizer module that (1) validates every field against a tight allow-list regex, (2) caps each field length, (3) rejects strings containing obvious injection markers, (4) logs a redacted copy only; redact PII from any stderr/stdout tails returned by /agent_health and /agent*/status (app.py:154-167, agent_runner.py:302-313). Also document the DPA/BAA status of the Bedrock route since PII leaves the VPC.
Estimated effort: M
Risk if ignored: Prompt injection drives Claude to submit bogus data to the real FEMA site, exfiltrate cookies, or click malicious links inside Firefox. PII exposure to any log sink that captures stdout/stderr.
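A sanitizer along the lines suggested above might look like the sketch below. Everything in it is an assumption for illustration: the field names, the regexes, the 200-character cap, and the marker list are hypothetical and far narrower than real survivor data would need (for example, the name pattern rejects non-ASCII names).

```python
import re

MAX_LEN = 200  # hypothetical per-field cap

# Hypothetical allow-list: field name -> pattern the whole value must match.
FIELD_PATTERNS = {
    "first_name": re.compile(r"[A-Za-z][A-Za-z .'-]*"),
    "county": re.compile(r"[A-Za-z][A-Za-z .'-]*"),
    "zip_code": re.compile(r"\d{5}(-\d{4})?"),
}

# Obvious injection markers only; a deny list is a backstop, not a defense.
INJECTION_MARKERS = re.compile(
    r"ignore (all )?previous instructions|system prompt|https?://",
    re.IGNORECASE,
)


def sanitize_survivor(data: dict) -> dict:
    """Return a validated copy of `data`, or raise ValueError."""
    clean = {}
    for key, value in data.items():
        pattern = FIELD_PATTERNS.get(key)
        if pattern is None:
            raise ValueError(f"unexpected field: {key}")
        if not isinstance(value, str) or len(value) > MAX_LEN:
            raise ValueError(f"bad type or length for field: {key}")
        if INJECTION_MARKERS.search(value) or not pattern.fullmatch(value):
            # The message names the field but never echoes the value (PII).
            raise ValueError(f"rejected value for field: {key}")
        clean[key] = value
    return clean
```

Note that a value such as "Ann. Ignore previous instructions" passes the name pattern and is caught only by the marker check, which is why both layers are shown.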

A.4.4 P1 — `/agent_health` hands the LLM subprocess the entire host environment

Problem: subprocess.run(["claude-full", "-p", prompt], env=os.environ.copy(), ...) passes every env var — including BEDROCK_API_KEY, AI_JWT_SECRET, AWS credentials injected by the task role — to a child that is explicitly meant to be a "quick smoke test."
Evidence: fema_agent/src/web_service/app.py:140-147.
Suggested change: Build a minimal allow-list env (PATH, HOME, LLM_PROVIDER, LLM_MODEL_NAME, AWS_REGION, provider key).
Estimated effort: S
Risk if ignored: Secret exfiltration via LLM output or future observability hooks.
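The allow-list could be built with a small helper such as the one below. The variable names come from the suggested change and the environment-variable table in this document; `build_child_env` and the provider-to-key mapping are hypothetical.

```python
import os
from typing import Dict, Mapping

ALLOWED_ENV = ("PATH", "HOME", "LLM_PROVIDER", "LLM_MODEL_NAME", "AWS_REGION")

# Assumption: each provider needs exactly one credential variable.
PROVIDER_KEY = {"bedrock": "BEDROCK_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def build_child_env(source: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy only allow-listed variables, plus the active provider's key."""
    env = {name: source[name] for name in ALLOWED_ENV if name in source}
    key_name = PROVIDER_KEY.get(source.get("LLM_PROVIDER", ""))
    if key_name and key_name in source:
        env[key_name] = source[key_name]
    return env
```

The result would be passed as `env=build_child_env()` in place of `env=os.environ.copy()`, so `AI_JWT_SECRET` and unrelated credentials never reach the child.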

A.4.5 P2 — Test-only routes live in the production app

Problem: /test/test_kup, /test/test_kup2, /test/force-status are in app.py and reachable in every built image with no env guard.
Evidence: YAML §3.2 cites app.py:624,699,796.
Suggested change: Gate with if os.environ.get("FEMA_ENABLE_TEST_ROUTES") == "1": inside create_app; fail-closed in prod.
Estimated effort: S
Risk if ignored: Unexpected state transitions from unauth traffic; widens blast radius of A.4.1.
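A fail-closed gate can be as small as the sketch below, which decides which paths get registered. The `FEMA_ENABLE_TEST_ROUTES` name is the one proposed above; `routes_to_register` and the abbreviated production list are hypothetical.

```python
from typing import List, Mapping

PRODUCTION_ROUTES = ["/health", "/state", "/survivor-info"]  # abbreviated
TEST_ROUTES = ["/test/test_kup", "/test/test_kup2", "/test/force-status"]


def routes_to_register(env: Mapping[str, str]) -> List[str]:
    """Include test routes only when the flag is exactly "1"."""
    routes = list(PRODUCTION_ROUTES)
    # Unset, empty, "true", or "0" all leave test routes off (fail closed).
    if env.get("FEMA_ENABLE_TEST_ROUTES") == "1":
        routes += TEST_ROUTES
    return routes
```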

A.5 Things That Don't Make Sense

Observation: _clear_firefox_profile_locks walks the top-level dir, breaks after the first iteration, then does a second manual two-level walk.
- Location: app.py:59-83.
- Hypotheses considered: defensive belt-and-suspenders; or the author found os.walk descended too far and spliced in a second pass.
- Question for author: Would a single glob.glob(os.path.join(firefox_profile_dir, "**/lock"), recursive=True) plus .parentlock glob be equivalent and cleaner?
Observation: check_agent_status returns stderr = stdout_text when stderr is empty.
- Location: agent_runner.py:305-308.
- Hypotheses considered: back-compat with older callers that only inspect stderr.
- Question for author: Is there still any caller that only looks at stderr? If not, drop the aliasing — it hides the real stream and complicates log grepping.

A.6 Anti-Patterns Detected

A.6.1 Code-level

A.6.2 Architectural

Big ball of mud
Distributed monolith
Chatty services
Leaky abstraction / inappropriate intimacy between layers
Golden hammer
Vendor lock-in without exit strategy
Stovepipe / reinvented wheel
Missing seams for testing — subprocess spawning, os.environ reads, time.time(), and os.walk are all called directly with no injection point.

A.6.3 Data

God table / EAV / missing indexes / N+1 / unbounded growth / nullable-everything / shared DB — N/A (not enough visibility into DynamoDB schemas from code read).

A.6.4 Async / Ops

Poison messages with no dead-letter queue — N/A (no queue).
Retry storms / no backoff — mitigated by retry_count ceiling.
Missing idempotency keys on non-idempotent ops — /agent*/run guarded by FSM state, effectively a per-container idempotency key.
Hidden coupling via shared state — Flask state.py in-memory dict is the single source of truth; any Flask worker count >1 would silently corrupt it. Single-worker assumption is not asserted.
Work queues without visibility / depth metrics
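The single-worker assumption could be asserted at startup with a guard like the one below. This sketch assumes the app is served by a WSGI server that reads its worker count from `WEB_CONCURRENCY` (gunicorn does this for its default); how the container actually launches Flask was not verified, so treat the variable name as an assumption.

```python
from typing import Mapping


def assert_single_worker(env: Mapping[str, str]) -> None:
    """Raise RuntimeError unless the configured worker count is exactly 1."""
    raw = env.get("WEB_CONCURRENCY", "1")
    try:
        workers = int(raw)
    except ValueError:
        raise RuntimeError(f"WEB_CONCURRENCY is not an integer: {raw!r}") from None
    if workers != 1:
        raise RuntimeError(
            f"in-memory FSM requires exactly 1 worker, got {workers}"
        )
```

Calling this once inside `create_app` would turn silent state corruption into an immediate startup failure.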

A.6.5 Security

Secrets in code, .env committed, or logs — stdout/stderr tails returned by /agent_health (app.py:154-167) and check_agent_status (agent_runner.py:302-313) can leak env-loaded secrets or PII echoed by the LLM.
Missing authn/z on internal endpoints — every route in app.py is unauthenticated (see A.4.1).
Overbroad IAM roles / least-privilege violations — not reviewed (no CDK inspection).
Unvalidated input crossing a trust boundary — survivor JSON is key-allow-listed but field-level values are fed into a script file that Claude then executes (prompt injection surface; see A.4.3).
PII/PHI in logs or error messages — in-memory survivor_data, EFS-persisted as_agent1.txt/as_agent2.txt, and process stderr returned via HTTP all carry PII. No redaction layer found.
Missing CSRF / XSS / SQLi / SSRF protections — Flask JSON API so CSRF N/A; no SQL; SSRF implicit in Firefox surface.

A.6.6 Detected Instances

#	Anti-pattern	Location (file:line)	Severity (P0/P1/P2)	Recommendation
1	God function (`create_app`)	`fema_agent/src/web_service/app.py:86-820`	P2	Split into `routes/health.py`, `routes/survivor.py`, `routes/agents.py` blueprints.
2	Duplicated `DEFAULT_WORK_DIR`	`app.py:26`, `agent_runner.py:16`	P2	Extract to `web_service/config.py`.
3	Near-duplicate agent1/agent2 route handlers	`app.py:418-502`	P2	Parameterize on `agent_key` like `_handle_restart` already does at `:522-549`.
4	Missing seams for testing	`agent_runner.py:45-52, 132-139, 156-161, 241-283`	P2	Inject a `ProcessLauncher` + `Clock`.
5	Single-process in-memory FSM with no worker guard	`state.py` (module-level dict)	P1	Assert `workers == 1` at startup or move state out of process.
6	Stdout/stderr tails leaked in HTTP responses	`app.py:154-167`, `agent_runner.py:302-313`	P1	Redact before return; log full body internally only.
7	Unauth endpoints on PII-handling data plane	`app.py:110,114,214,239,333,418,460,560,564`	P0	JWT `before_request` hook.
8	Unvalidated survivor JSON values interpolated into LLM prompts	`app.py:295-296, 387-395` (via `SurvivorFemaApplication`)	P0	Sanitizer + length caps + prompt-injection deny list.
9	`shell=True` + f-string interpolation in Popen	`agent_runner.py:40-52, 127-139, 148-161`	P0 (latent)	Use list form, drop `shell=True`.
10	`env=os.environ.copy()` passed to LLM subprocess	`app.py:146`	P1	Allow-list env.
11	Test routes reachable in prod image	`app.py:624,699,796` (per YAML)	P2	Env-gate.
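For row 6, a redaction pass over stdout/stderr tails could start from something like this sketch. The helper name, the 2000-character tail, the 8-character minimum, and the single SSN pattern are all hypothetical; real survivor data would need more patterns than this.

```python
import re
from typing import Iterable

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_tail(text: str, secrets: Iterable[str], tail_chars: int = 2000) -> str:
    """Mask known secret values and SSN-shaped strings, then keep the tail."""
    for secret in secrets:
        # Skip empty or very short values that would over-match.
        if len(secret) >= 8:
            text = text.replace(secret, "[REDACTED]")
    text = SSN_RE.sub("[REDACTED-SSN]", text)
    # Truncate last, so a secret straddling the cut is still masked.
    return text[-tail_chars:]
```

The `secrets` argument would typically be the values of the secret-bearing environment variables, collected once at startup.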

A.7 Open Questions

Q: Is there any authentication at the container's Flask layer, or is the YAML's "VPC isolation + CloudFront signed URLs" the only control? If the latter, what stops an in-VPC service from posting directly?
- Blocks: A.4.1, A.11.
- Who can answer: Gordon / platform-sec.
Q: Have the agent1.md / agent2.md prompts been reviewed for prompt-injection resilience against untrusted survivor input?
- Blocks: A.4.3.
- Who can answer: Gordon / AI safety reviewer.
Q: Does the Bedrock path have a DPA/BAA in place for the PII being sent? FEMA IA data includes income, disability, deceased persons.
- Blocks: A.16 (if compliance is claimed).
- Who can answer: legal / compliance.
Q: Is Flask run with workers=1? If not, state.py's in-memory dict is unsafe.
- Blocks: A.6.4.
- Who can answer: deployment doc / Dockerfile CMD.

A.8 Difficulties Encountered

Difficulty: agent1.md / agent2.md prompt templates live under container_work_dir/fema-apply-agent/ (per YAML) and were not read as part of this review.
- Impact on analysis: Cannot concretely grade prompt-injection resilience — the "PII into prompts" finding (A.4.3) is inferred from the call sites, not from the prompt text itself.
- Fix that would help next reviewer: Commit a sanitized sample prompt to the repo root or link from README.
Difficulty: CDK stacks under aws_deployment/ were not opened; security posture at the edge is taken on faith from the YAML.
- Impact on analysis: Could not verify CloudFront signed-URL enforcement, mTLS, IAM least-privilege, or SG posture.
- Fix that would help next reviewer: Short aws_deployment/README.md per stack would shortcut this.
Difficulty: No coverage or flake numbers; test directory count alone is a weak signal.
- Impact on analysis: A.13 is mostly empty.
- Fix that would help next reviewer: pytest --cov badge or a coverage.xml artifact.

A.9 Risks & Unknowns

A.9.1 Known risks

#	Risk	Likelihood (L/M/H)	Impact (L/M/H)	Mitigation
1	Unauth Flask plane reachable from any VPC workload	M	H	JWT `before_request`; SG lockdown to API GW only.
2	Prompt injection via survivor JSON field values	M	H	Sanitizer + deny list; prompt design that quarantines user data.
3	DOM drift / CAPTCHA on real disasterassistance.gov	H	M	Runbook exists; add monitoring on agent retry count.
4	Login.gov session expiry mid-Agent-2 run	M	M	Detect 401/redirect to Login.gov; surface to user.
5	Multi-worker Flask corrupts in-memory FSM	L	H	Assert `workers=1` or move state.
6	Secrets/PII leak via stderr tails in HTTP responses	M	H	Redact before return.
7	`shell=True` becomes injectable after a future refactor	L	H	Switch to list form now.

A.9.2 Unknown unknowns

Area not reviewed: agent1.md / agent2.md prompt bodies. Reason: not in the paths I opened. Best guess at risk level: High — this is where prompt-injection defense either lives or doesn't.
Area not reviewed: aws_deployment/ CDK stacks. Reason: out of scope for time budget. Best guess at risk level: Medium — standard CDK patterns usually OK, but IAM scoping and CloudFront signed-URL enforcement need verification.
Area not reviewed: survivor_api/ field-level validators. Reason: only inspected the call site in app.py. Best guess at risk level: Medium — the sanitization verdict in A.4.3 hinges on what this module does or doesn't do.
Area not reviewed: DynamoDB ContainerInstance + AuditEvent schemas. Reason: in sibling container_manager service. Best guess at risk level: Low-Medium.
Area not reviewed: The 874-test suite content. Reason: time. Best guess at risk level: Low (tests existing is a strong signal).

A.10 Technical Debt Register

#	Debt item	Quadrant	Estimated interest	Remediation
1	Unauth Flask data plane	Reckless & Deliberate	High (security incidents)	JWT `before_request` (S).
2	`shell=True` + f-string Popen	Prudent & Inadvertent	Medium (latent)	Switch to list form (S).
3	PII into LLM prompts with no sanitizer	Reckless & Inadvertent	High (compliance + injection)	Sanitizer layer + redaction (M).
4	`env=os.environ.copy()` for LLM subprocess	Reckless & Inadvertent	Medium	Allow-list env (S).
5	`create_app` is 700+ lines	Prudent & Deliberate	Low	Blueprint split (M).
6	Duplicated `DEFAULT_WORK_DIR`	Prudent & Inadvertent	Low	Extract config module (S).
7	In-memory FSM with no worker-count assertion	Reckless & Inadvertent	Medium	Assert `workers=1` at startup (S).
8	Test routes in prod image	Prudent & Deliberate	Low-Medium	Env gate (S).
9	Stdout/stderr tails leaked over HTTP	Reckless & Inadvertent	Medium-High	Redaction layer (S).

A.11 Security Posture (lightweight STRIDE)

Category	Threat present?	Mitigated?	Gap
Spoofing (identity)	Yes — anyone in VPC can POST as "the control plane"	Partial (only at edge, per YAML)	No app-layer auth (A.4.1).
Tampering (integrity)	Yes — attacker can POST `/survivor-info/p2` to mutate in-flight state	No	Same root cause as spoofing.
Repudiation	Partial — DynamoDB AuditEvent exists per YAML	Unknown	Not verified end-to-end; no signed audit log seen in `app.py`.
Information Disclosure	Yes — stdout/stderr tails, in-memory PII, EFS-persisted scripts, Bedrock prompt payloads	Weak	Needs redaction + DPA check (A.4.3, A.4.4).
Denial of Service	Yes — `/agent_health` spawns a subprocess per call, no rate limit	Partial (409 if agent running)	Add rate limit / auth.
Elevation of Privilege	Yes — latent via `shell=True` if session_id ever becomes tainted; compromised agent runs with container role IAM	Partial	List-form Popen + IAM scoping review.
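For the Denial of Service row, the simplest rate limit is a minimum interval between accepted calls. The class below is a hypothetical in-process sketch with an injectable clock; it fits the single-container, single-worker model described in this review and is not a substitute for limits at the edge.

```python
import threading
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Allow at most one call per `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last: Optional[float] = None
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            now = self._clock()
            if self._last is not None and now - self._last < self._min_interval_s:
                return False  # too soon; the caller should respond with 429
            self._last = now
            return True
```

A route such as `/agent_health` would check `limiter.allow()` before spawning its subprocess.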

A.12 Operational Readiness

Capability	Present / Partial / Missing	Notes
Structured logs	Partial	`logger = logging.getLogger(__name__)` in `agent_runner.py` but `app.py` uses `print(..., file=sys.stderr)` (`:302-305`).
Metrics	Unknown	Not visible in code read.
Distributed tracing	Missing	No OTel imports.
Actionable alerts	Unknown	Presumed in `aws_deployment/observability` stack.
Runbooks	Present	`runbook.md` + `RUN_BOOK.md`.
On-call ownership defined	Unknown	Single author per git (Gordon).
SLOs / SLIs	Missing	Not documented.
Backup & restore tested	Unknown	EFS is the only stateful store; snapshot policy not verified.
Disaster recovery plan	Unknown	Not seen.
Chaos / failure testing	Missing	No evidence.

A.13 Test & Quality Signals

Coverage (line / branch): N/A — not reported.
Trend: N/A.
Flake rate: N/A.
Slowest tests: N/A.
Untested critical paths: Unknown; likely: prompt-injection robustness, multi-worker FSM safety, real-site DOM drift.
Missing test types: [ ] unit (present per YAML) [ ] integration (present) [ ] e2e (run_e2e_perf_test.sh present) [ ] contract (present) [x] load [x] security/fuzz.

A.14 Performance & Cost Smells

Hot paths: /agent*/status polled by control plane.
Suspected bottlenecks: Cold Firefox start + Claude Code CLI boot per container.
Wasteful queries / loops: get_latest_temp_mtime walks the full temp/ tree on every inactivity check (agent_runner.py:187-207) — probably fine at current sizes.
Oversized infra / idle resources: Fargate per-survivor is inherently spiky; without TTL enforcement (NOVNC_HEARTBEAT_SECONDS only keeps a conn open, doesn't kill idle tasks) cost could drift.
Cache hit/miss surprises: N/A.
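The inactivity measure referenced above, `max(start_time, latest_mtime)` compared against a 600 s limit, can be sketched as follows. The function names are hypothetical and the tree walk mirrors the described behavior of scanning all of `temp/` on each check.

```python
import os
import time
from typing import Optional


def latest_mtime(root: str) -> float:
    """Newest file mtime under `root`, or 0.0 if there are no files."""
    latest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                latest = max(latest, os.stat(os.path.join(dirpath, name)).st_mtime)
            except FileNotFoundError:
                continue  # file removed while walking
    return latest


def is_inactive(start_time: float, temp_dir: str, timeout_s: float = 600.0,
                now: Optional[float] = None) -> bool:
    """True when neither the start time nor any temp file is recent enough."""
    current = time.time() if now is None else now
    # Taking the max means stale files from an earlier run cannot make a
    # freshly started agent look idle.
    last_activity = max(start_time, latest_mtime(temp_dir))
    return current - last_activity > timeout_s
```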

A.15 Bus-Factor & Knowledge Risk

Who is the only person who understands X? Gordon (the sole `authors:` entry in the YAML; gordon.zhg@gmail.com appears under two git identities).
What breaks if they disappear tomorrow? Real-site DOM fixes, Login.gov handoff tuning, prompt engineering for agent1.md/agent2.md.
What is undocumented tribal knowledge? Why --resume instead of --continue (partially captured in docstring at agent_runner.py:123-125); the per-section timeouts (100s/300s) rationale.
Suggested knowledge-transfer actions: Pair-review with a second engineer on the prompt files; ADR for the Agent 1/Agent 2 split and Login.gov handoff.

A.16 Compliance Gaps

N/A — the prop-build doc does not explicitly claim HIPAA/SOC 2/PCI compliance. That said, if FEMA IA data is being processed, a reasonable auditor would ask about: (a) BAA/DPA with AWS Bedrock, (b) PII retention in EFS-backed Firefox profiles, (c) access control to the unauth Flask plane, (d) audit log integrity in DynamoDB. These are flagged here even without an explicit claim, because the data class (federal benefits PII including disability and deceased persons) would typically trigger review.

A.17 Recommendations Summary

Priority	Action	Owner (suggested)	Effort	Depends on
P0	Add JWT `before_request` auth to every Flask route except `/health`; env-gate `/test/*`	Gordon	S	AI_JWT_SECRET already exists
P0	Build a survivor-data sanitizer + prompt-injection deny list + length caps; wire into `generate_agent1_script_from_p1` and `generate_agent2_script`	Gordon + AI-safety reviewer	M	Read of `agent1.md`/`agent2.md`
P0	Replace `shell=True` + f-string Popen calls in `agent_runner.py` with list form	Gordon	S	—
P0	Redact PII/secrets from stdout/stderr tails before returning in `/agent_health` and `/agent*/status`	Gordon	S	—
P1	Build an allow-list env for `claude-full` subprocess in `/agent_health` (and future spawns)	Gordon	S	—
P1	Assert `workers=1` at Flask startup or document the single-worker requirement and Dockerfile CMD	Gordon	S	—
P1	Document / verify Bedrock DPA coverage for FEMA IA PII	Compliance	S	legal
P1	Read + security-review `container_work_dir/fema-apply-agent/agent1.md` and `agent2.md`	AI-safety reviewer	M	—
P2	Split `create_app` into Flask blueprints; extract `DEFAULT_WORK_DIR` to `config.py`	Gordon	M	—
P2	Parameterize agent1/agent2 route handlers like `_handle_restart` already does	Gordon	S	—
P2	Inject `ProcessLauncher` + `Clock` seams into `agent_runner.py` for testability	Gordon	M	—
P2	Add structured logging (swap `print(..., file=sys.stderr)` for `logger.*`)	Gordon	S	—

Environment variables

Name	Purpose
`LLM_PROVIDER`	bedrock\|anthropic
`LLM_MODEL_NAME`	Friendly alias
`BEDROCK_API_KEY`*	Bedrock auth
`ANTHROPIC_API_KEY`	Direct Anthropic API alt
`AWS_REGION`	Bedrock region
`WORKSPACE_DIR`	EFS mount point
`FIREFOX_PROFILE_DIR`	Persisted Firefox profile
`FIREFOX_CACHE_DIR`	Persisted Firefox cache
`NOVNC_HEARTBEAT_SECONDS`	Idle keepalive for CloudFront