Docker image based on ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest (Ubuntu 22.04 + X11 + noVNC). Adds Node.js 20, Claude Code CLI v2.1.45, Flask 3.0, MCP servers (chrome-devtools, mcp-vnc, playwright), and a fema_agent Python package implementing a survivor data parser, agent script generator, and an 8-state Flask state machine that spawns claude-full subprocesses for Agent 1 (pre-login) and Agent 2 (post-login). Browser is Chrome with remote debugging on 9222; profile + cookies persisted in EFS so the Login.gov session survives between Agent 1 and Agent 2.

Role in the system: Spawned per survivor by af-backend-go-api (control plane in af-infra Lambda); exposed via the Flask data-plane on :5001 and noVNC on :6080; the container is destroyed after completion.

Surfaces:

Flask data-plane HTTP API on :5001 (8 endpoints)
noVNC web on :6080 (vnc.html + vnc_embed.html for iframe)
VNC server on :5900
Mock FEMA HTTP server on :8020 (dev/training)
Chrome DevTools Protocol on :9222
Claude Code CLI as the agent runtime (claude-full wrapper)
.claude/ MCP server config + agents/commands

User workflows

Step	Outcome
Build image	Image ready
Start container	/health returns ok
Submit survivor	Container ready for Agent 1
Run Agent 1	State → AGENT1_COMPLETE / WAITING_FOR_LOGIN
Login.gov via VNC	Survivor authenticated; ready for Agent 2
Run Agent 2	State → AGENT2_COMPLETE; survivor reviews page 37 manually
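
The workflow above can be sketched as a small client that drives the documented data-plane endpoints in order. This is an illustration under stated assumptions, not the control plane's actual code: the `state` field in the `/state` response, the polling interval, and the `wait_for_login` hook (a stand-in for however the caller learns that the Login.gov step is done) are all hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional, Set

# A transport takes (method, path, json_body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(base_url: str) -> Transport:
    """Build a stdlib-only transport for a base URL such as http://localhost:5001."""
    def call(method: str, path: str, body: Optional[dict] = None) -> dict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            raw = response.read()
        return json.loads(raw) if raw else {}
    return call


def wait_for_state(call: Transport, wanted: Set[str],
                   poll_seconds: float = 5.0, max_polls: int = 120) -> str:
    """Poll GET /state until it reports one of the wanted states."""
    for _ in range(max_polls):
        # ASSUMPTION: the response carries the state name under a "state" key.
        state = call("GET", "/state", None).get("state")
        if state in wanted:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"state never reached one of {sorted(wanted)}")


def run_workflow(call: Transport, survivor: dict,
                 wait_for_login: Callable[[], None],
                 poll_seconds: float = 5.0) -> str:
    """Drive one survivor through Agent 1, the human login step, and Agent 2."""
    call("GET", "/health", None)
    call("POST", "/survivor-info", survivor)
    call("POST", "/agent1/run", None)
    wait_for_state(call, {"AGENT1_COMPLETE", "WAITING_FOR_LOGIN"}, poll_seconds)
    # Human step: the survivor authenticates with Login.gov through noVNC.
    wait_for_login()
    call("POST", "/agent2/run", None)
    return wait_for_state(call, {"AGENT2_COMPLETE"}, poll_seconds)
```

Passing the transport as a parameter keeps the sequencing logic testable without a running container; `http_transport("http://localhost:5001")` would be the real-network variant.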

API endpoints

Method	Path	Purpose
GET	/health	Liveness
POST	/agent_health	LLM connectivity smoke test
GET	/state	Current orchestration state + stale-agent flags
POST	/survivor-info	Submit full survivor JSON, generate scripts
POST	/survivor-info/p1	Partial submission (pages 1-16 only)
POST	/survivor-info/p2	Partial submission (pages 20-36 only)
POST	/agent1/run	Spawn Agent 1 subprocess
GET	/agent1/status	Poll Agent 1 subprocess
POST	/agent2/run	Spawn Agent 2 subprocess
GET	/agent2/status	Poll Agent 2 subprocess

Third-party APIs

API	Role
Anthropic Bedrock (or direct Claude API)	LLM backend for Claude Code CLI
DisasterAssistance.gov	Real FEMA application portal (target)
Login.gov	2FA gate between Agent 1 and Agent 2
Chrome DevTools Protocol	Browser automation surface

Service dependencies

Service	Role
AWS ECS Fargate	Per-survivor task launch
AWS EFS	Persisted browser profile + work dir
AWS Secrets Manager	Bedrock API key, VNC signing key, JWT secret
AWS DynamoDB	Container state + audit log
AWS CloudWatch Logs	Audit trail
af-backend-go-api	Control plane caller (start/stop/poll)
af-infra (CDK or Terraform)	ECS task def, EFS, Secrets Manager, ALB, CloudFront

Analysis

Overall health: 3.1 / 5 (acceptable)

Score	Dimension
4	Module overview / clarity of intent
3	External dependencies
2	API endpoints
3	Database schema
4	Backend services
3	WebSocket / real-time
3	Frontend components
4	Data flow clarity
3	Error handling & resilience
3	Configuration
3	Data refresh patterns
2	Performance
4	Module interactions
4	Troubleshooting / runbooks
2	Testing & QA
3	Deployment & DevOps
2	Security & compliance
4	Documentation & maintenance
3	Roadmap clarity

af-disaster-assistance-gov-agent — Prop-Build Analysis

Document Type: Critical Review & Analysis (companion to prop-build-template.md)
Scope: Per-Repo / Per-Module
Subject: af-disaster-assistance-gov-agent (DisasterAssistance.gov AI Automation Agent)
Reviewer(s): Claude (automated code review)
Date: 2026-04-09
Version: 0.1
Confidence Level: Medium
What would raise confidence: Running the container locally end-to-end against mock FEMA; access to CloudWatch metrics and ECS task telemetry; interview with Gordon Zheng on two-agent split rationale; execution of pytest + e2e perf harness.

Inputs Reviewed:

Prop-build doc: /Users/andres/src/af/af-analysis/data/af-disaster-assistance-gov-agent.yaml
Companion docs: /Users/andres/src/af/af-analysis/data/af-disaster-assistance-gov-agent/{api-examples,data-flow,runbook,deployment}.md
Source tree: /Users/andres/src/af/af-disaster-assistance-gov-agent/ (fema_agent/src/web_service/{app.py:1271, state.py:388, agent_runner.py:317})
Dashboards / metrics: not accessed
ADRs / design docs: specs/001..020 directories (listing only)
Interviews: none

A.1 Executive Summary

Overall health: Functional and cleanly split into a Flask data-plane, FSM, agent runner, and survivor parser; architecture is coherent for a per-survivor ephemeral container, but the HTTP surface is oversized and unauthenticated at the app layer.
Top risk: All 22 Flask endpoints, including destructive and PII-accepting routes, have auth: "none" and rely entirely on network isolation + CloudFront signed URLs — a single misconfiguration in ALB/SG removes all authz (see A.6.5, A.11).
Top win / thing worth preserving: Sentinel-file completion pattern (temp/agent*.completed) decouples "agent finished successfully" from "subprocess exited cleanly" — robust against unclean Claude Code exits (fema_agent/src/web_service/agent_runner.py).
Single recommended next action: Gate the Flask data-plane behind a shared-secret header or mTLS verified by the container, even behind the ALB, so app-layer authz exists as defense-in-depth.
Blocking unknowns: Actual test coverage %, CI pass rate, production error rates, concurrent-task ceiling, and whether specs/019-stuck-agent-watcher is deployed.

A.2 Health Scorecard

#	Dimension	Score (1–5)	Justification
1	Module overview / clarity of intent	4	YAML + README articulate per-survivor container and two-agent split clearly; purpose unambiguous.
2	External dependencies	3	Well-enumerated but heavy: Bedrock, Login.gov, DisasterAssistance.gov DOM, 3 MCP servers, Chrome CDP — many failure surfaces.
3	API endpoints	2	22 routes defined in `app.py` vs 10 documented; test/internal routes (`/test/*`, including `/test/force-status`, and `/agent*/restart`) ship in the same binary as prod (`app.py:707–838`).
4	Database schema	3	No local DB; state is in-memory + EFS + DynamoDB (control plane). Reasonable for ephemeral container.
5	Backend services	4	Clean separation: `state.py` FSM, `agent_runner.py` subprocess orchestration, `survivor_api/` parsing.
6	WebSocket / real-time	3	noVNC proxy inherited from base image; heartbeat added — adequate.
7	Frontend components	3	Minimal — relies on noVNC; `vnc_embed.html` iframe variant is pragmatic.
8	Data flow clarity	4	`data-flow.md` plus YAML end-to-end block trace the request path explicitly.
9	Error handling & resilience	3	Good abort codes and sentinel fallback; no circuit breakers; stale-process handling exists but relies on operator retry.
10	Configuration	3	Env-var driven, documented; no feature-flag framework; some magic defaults (600s watchdog).
11	Data refresh patterns	3	On-demand only; appropriate for this runtime model.
12	Performance	2	Targets TBD; 1 vCPU-bound container per survivor implies linear cost scaling; no load numbers captured.
13	Module interactions	4	Explicitly documented with af-backend-go-api, Bedrock, EFS, DynamoDB.
14	Troubleshooting / runbooks	4	`runbook.md` covers top 6 failure modes.
15	Testing & QA	2	`pytest` referenced but coverage unknown/null in YAML; e2e is a bash script `run_e2e_perf_test.sh`; no evidence that contract tests run in CI.
16	Deployment & DevOps	3	GitHub Actions → ECR → CDK; rollback documented; no blue/green specifics.
17	Security & compliance	2	`auth: "none"` on 22 routes; survivor PII (SSN, DOB) in-memory + EFS; sensitive Login.gov cookies on EFS; STRIDE coverage incomplete (see A.11).
18	Documentation & maintenance	4	Strong README, AGENTS.md, 20 specs directories, runbook, api-examples.
19	Roadmap clarity	3	specs/001–020 imply active phased work but phase/owner mapping not captured here.

Overall score: 3.1 — architecture and docs are above average; security posture and test signal are the drag.

A.3 What's Working Well

Strength: Sentinel-file completion signal decoupled from subprocess exit code.
- Location: fema_agent/src/web_service/agent_runner.py:23-91 and dual-location poll.
- Why it works: Claude Code CLI subprocesses can exit uncleanly; relying on exit code would produce false negatives. The sentinel reifies "agent wrote its completion marker," a stronger invariant.
- Propagate to: af-fema-real-ai-agent sister repo; any future agent-runtime containers.
Strength: Two-agent split around the Login.gov human-in-the-loop boundary.
- Location: State machine fema_agent/src/web_service/state.py:8-16 plus script generator in fema_agent/src/raw_builder/ + intermediate_builder/.
- Why it works: Makes the unavoidable human 2FA gate an explicit state (WAITING_FOR_LOGIN) rather than a hidden pause, enabling the frontend to render the VNC iframe at the right moment.
- Propagate to: Any other gov-portal automation repos encountering Login.gov / ID.me.
Strength: Clear module decomposition under fema_agent/src/ (web_service, survivor_api, raw_builder, intermediate_builder, enums).
- Location: fema_agent/src/ package layout.
- Why it works: Parser, builder, and runtime concerns are separable; supports unit testing of pure-data layers without Flask/subprocess stack.
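
The sentinel-file strength listed first above can be illustrated with a short sketch. It is not the code in `agent_runner.py`; the function names and the three result labels are hypothetical, and the only detail taken from the review is that a marker matching `temp/agent*.completed` may be found in more than one location.

```python
from pathlib import Path
from typing import Iterable, Optional


def sentinel_present(candidates: Iterable[Path]) -> bool:
    """True if any candidate location holds the completion marker."""
    return any(path.is_file() for path in candidates)


def classify_run(returncode: Optional[int], candidates: Iterable[Path]) -> str:
    """Classify an agent run, trusting the sentinel over the exit code.

    returncode is None while the subprocess is still alive, mirroring
    subprocess.Popen.poll().
    """
    if sentinel_present(candidates):
        # The agent wrote its marker, so the work is done even if the
        # CLI process later exits uncleanly or has not exited yet.
        return "completed"
    if returncode is None:
        return "running"
    # Process is gone and no marker was written: treat as failure,
    # including the case of exit code 0.
    return "failed"
```

The design choice worth noting is the last branch: a clean exit without a marker is still reported as a failure, which is what makes the sentinel the stronger invariant.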

A.4 What to Improve

A.4.1 P0 — Add app-layer authentication to the Flask data-plane

Problem: Every documented and undocumented route is auth: "none", relying solely on SG + CloudFront signed URLs.
Evidence: YAML section_3_api.endpoints[*].auth = "none"; fema_agent/src/web_service/app.py:188–838 shows no @requires_auth decorator on any route.
Suggested change: Add a shared-secret header (HS256 JWT or per-task bearer token via Secrets Manager) in a Flask before_request hook.
Estimated effort: M
Risk if ignored: One SG misconfig → full PII breach + /test/force-status allows arbitrary state transitions.
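
A minimal sketch of the suggested `before_request` gate follows, assuming a per-task bearer token that the container reads at startup. The helper names, the `Authorization: Bearer` scheme, and the decision to leave `/health` open for load-balancer probes are assumptions made for illustration; none of them is confirmed by the repo.

```python
import hmac
from typing import FrozenSet, Mapping


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    supplied = headers.get("Authorization", "")
    scheme, _, token = supplied.partition(" ")
    if scheme.lower() != "bearer" or not token or not expected_token:
        return False
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8"))


def install_auth(app, expected_token: str,
                 open_paths: FrozenSet[str] = frozenset({"/health"})) -> None:
    """Attach the check to a Flask app so every route is covered by default."""
    # Imported lazily so the pure check above stays usable without Flask.
    from flask import abort, request

    @app.before_request
    def _require_token():
        # ASSUMPTION: liveness stays unauthenticated for health probes.
        if request.path in open_paths:
            return None
        if not is_authorized(request.headers, expected_token):
            abort(401)
        return None
```

Using a single hook rather than a per-route decorator means a newly added route is protected unless it is explicitly listed in `open_paths`.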

A.4.2 P1 — Split test/internal routes out of the production Flask app

Problem: /test/agent-auth, /test/agent-complete, /test/force-status, /test/setup-mock, /test/e2e-mock, /agent1/restart, /agent2/restart, /agent*/status_detailed, and /setup-fema ship in the same binary as prod.
Evidence: fema_agent/src/web_service/app.py:707–838; YAML notes "~12 additional internal/test-only endpoints."
Suggested change: Gate behind ENABLE_TEST_ROUTES env var defaulted off, or separate Blueprint loaded only in dev/stg.
Estimated effort: S
Risk if ignored: Prod blast radius includes arbitrary state manipulation.
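
The env-var gate suggested above could look like the sketch below. `ENABLE_TEST_ROUTES` is the name proposed in this review; the accepted truthy values and the `register_blueprints` helper are hypothetical. The `app` argument only needs a `register_blueprint` method, which a Flask application provides.

```python
import os
from typing import Iterable, List, Mapping

TRUTHY = {"1", "true", "yes", "on"}


def are_test_routes_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Test routes are off unless the flag is explicitly set to a truthy value."""
    return environ.get("ENABLE_TEST_ROUTES", "").strip().lower() in TRUTHY


def register_blueprints(app, prod_blueprints: Iterable, test_blueprints: Iterable,
                        environ: Mapping[str, str] = os.environ) -> List:
    """Register production blueprints always, test blueprints only when enabled."""
    registered = []
    for blueprint in prod_blueprints:
        app.register_blueprint(blueprint)
        registered.append(blueprint)
    if are_test_routes_enabled(environ):
        for blueprint in test_blueprints:
            app.register_blueprint(blueprint)
            registered.append(blueprint)
    return registered
```

Defaulting to off means a missing or misspelled variable in a production task definition fails safe.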

A.4.3 P1 — Capture and publish test coverage

Problem: YAML lists coverage_pct: null; unknown whether fema_agent/tests/ actually exercises FSM, parser validators, and stale-detection.
Evidence: section_15_testing.unit.coverage_pct: null.
Suggested change: Wire pytest --cov=fema_agent --cov-report=xml into CI; fail under threshold.
Estimated effort: S
Risk if ignored: FSM regressions invisible until production.

A.4.4 P2 — Replace `print(..., file=sys.stderr)` with structured logging

Evidence: fema_agent/src/web_service/app.py:357-360, 472-474.
Suggested change: Python logging with JSON formatter; include request/survivor correlation id.
Estimated effort: S

A.5 Things That Don't Make Sense

Observation: Two agents spawn claude-full subprocesses, but the state machine defines both production (9) and test-only (6) states in the same enum.
- Location: fema_agent/src/web_service/state.py:8-49.
- Question for author: Why not a separate TestState enum so the prod FSM's valid-transition matrix stays minimal?
Observation: Sentinel file is polled at two locations (pre-archive and post-archive).
- Location: fema_agent/src/web_service/agent_runner.py:94+.
- Question for author: Is the ordering deterministic, or is the second poll genuinely defensive against a concurrent archive?
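
To make the first question concrete, here is a sketch of a production enum with its own transition matrix, kept apart from a test-only enum. Only `AGENT1_COMPLETE`, `WAITING_FOR_LOGIN`, and `AGENT2_COMPLETE` are state names taken from this review; every other state name, the transition table, and the `HarnessState` class are hypothetical and do not reflect the contents of `state.py`.

```python
from enum import Enum
from typing import Dict, Set


class ProdState(str, Enum):
    IDLE = "IDLE"                              # hypothetical
    SURVIVOR_SUBMITTED = "SURVIVOR_SUBMITTED"  # hypothetical
    AGENT1_RUNNING = "AGENT1_RUNNING"          # hypothetical
    AGENT1_COMPLETE = "AGENT1_COMPLETE"
    WAITING_FOR_LOGIN = "WAITING_FOR_LOGIN"
    AGENT2_RUNNING = "AGENT2_RUNNING"          # hypothetical
    AGENT2_COMPLETE = "AGENT2_COMPLETE"


class HarnessState(str, Enum):
    """Test-only states live in their own enum and never enter the prod matrix."""
    MOCK_READY = "MOCK_READY"                  # hypothetical


# Valid production transitions; anything absent is rejected.
ALLOWED: Dict[ProdState, Set[ProdState]] = {
    ProdState.IDLE: {ProdState.SURVIVOR_SUBMITTED},
    ProdState.SURVIVOR_SUBMITTED: {ProdState.AGENT1_RUNNING},
    ProdState.AGENT1_RUNNING: {ProdState.AGENT1_COMPLETE},
    ProdState.AGENT1_COMPLETE: {ProdState.WAITING_FOR_LOGIN},
    ProdState.WAITING_FOR_LOGIN: {ProdState.AGENT2_RUNNING},
    ProdState.AGENT2_RUNNING: {ProdState.AGENT2_COMPLETE},
    ProdState.AGENT2_COMPLETE: set(),
}


def parse_state(raw: str) -> ProdState:
    """Reject arbitrary strings instead of storing them as the current state."""
    try:
        return ProdState(raw)
    except ValueError:
        raise ValueError(f"unknown state: {raw!r}") from None


def transition(current: ProdState, target: ProdState) -> ProdState:
    """Return the new state, or raise if the move is not in the matrix."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

With this split, a route that forces a state for testing would have to go through `parse_state` and `transition` like any other caller, or be registered only alongside `HarnessState`.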

A.6 Anti-Patterns Detected

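A.6.1 Code-level

[x] God object / god function: single `create_app` in `app.py` holding 22 routes (A.6.6 #1)
[x] Copy-paste / duplication: `/survivor-info`, `/p1`, `/p2` share near-identical validation + env-setup (A.6.6 #2)
[x] Magic numbers: 600s and 30s threshold defaults (A.6.6 #3)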

A.6.2 Architectural

[ ] Big ball of mud
[ ] Distributed monolith
[ ] Chatty services
[x] Leaky abstraction: Flask state dict holds Popen handles alongside survivor JSON
[ ] Golden hammer
[x] Vendor lock-in: Bedrock + Claude Code CLI + chrome-devtools-mcp, no abstraction layer
[ ] Stovepipe
[x] Missing seams for testing: hard-coded filesystem paths, direct subprocess.Popen, direct Chrome CDP URLs

A.6.3 Data

N/A — no persistent DB in this repo.

A.6.4 Async / Ops

[ ] Poison messages
[ ] Retry storms
[ ] Missing idempotency keys
[x] Hidden coupling via shared state: global Flask _state mutated by multiple routes + polling thread
[ ] Work queues without visibility

A.6.5 Security

[ ] Secrets in code / .env committed
[x] Missing authn/z on internal endpoints: 22 routes, zero auth decorators
[ ] Overbroad IAM roles
[x] Unvalidated input crossing trust boundary: /test/force-status accepts arbitrary state strings
[x] PII/PHI in logs or error messages (suspected): validation errors embed field values into 400 descriptions
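
For the suspected PII finding in this list, a sketch of two complementary measures follows: building 400 descriptions that never echo the submitted value, and a last-resort scrubber for text that may already contain one. The regular expressions cover only US SSN and two common date shapes and are illustrative assumptions, not a complete PII detector.

```python
import re

# 9 digits with optional dashes in the 3-2-4 SSN layout.
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
# ISO dates (YYYY-MM-DD) or US-style dates (M/D/YYYY).
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b")


def validation_error(field: str, reason: str) -> str:
    """Describe what is wrong with a field without including its value."""
    return f"Invalid value for '{field}': {reason}"


def scrub(text: str) -> str:
    """Mask SSN-like and date-like substrings before text leaves the process."""
    text = SSN_RE.sub("[REDACTED-SSN]", text)
    return DATE_RE.sub("[REDACTED-DATE]", text)
```

Preferring `validation_error` at the source is the more reliable of the two, since pattern-based scrubbing can miss formats it was not written for.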

A.6.6 Detected Instances

#	Anti-pattern	Location (file:line)	Severity	Recommendation
1	God object / god function	`app.py:164–838` (1271 LOC, single `create_app`, 22 routes)	P1	Split into Blueprints: `health_bp`, `survivor_bp`, `agent_bp`, `test_bp`.
2	Copy-paste / duplication	`app.py:303–590` (`/survivor-info`, `/p1`, `/p2` share near-identical validation + env-setup)	P1	Extract `_ingest_survivor(payload, variant)` helper.
3	Magic numbers	`section_10_configuration.thresholds` — 600s, 30s defaults	P2	Centralize in `config.py` with rationale.
4	Leaky abstraction	`state.py:76-104` — Flask state dict holds `Popen` handles + survivor JSON + state enum	P1	Wrap in `AgentProcess` class.
5	Vendor lock-in	Bedrock + Claude Code CLI + chrome-devtools-mcp, no abstraction	P2	Document exit cost; keep `LLM_PROVIDER` switch exercised.
6	Missing testing seams	`app.py:44-57`, `agent_runner.py` — hard-coded paths, direct subprocess	P1	Inject `Clock`, `FileSystem`, `ProcessSpawner`.
7	Hidden coupling	Global `_state` mutated by multiple routes + polling thread	P1	Serialize mutations behind a locked `StateStore`.
8	Missing authn/z	`app.py:188–838` — 22 routes, zero auth	P0	A.4.1.
9	Unvalidated input	`/test/force-status` (`app.py:763-772`) accepts arbitrary state strings	P0	Disable in prod (A.4.2); validate enum.
10	PII in error messages	`app.py:383,388,422,427,432,438,498,502,520,525,530,583,586` — validation errors embed field values	P1	Scrub PII from 400 descriptions.
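
Row 7's recommendation, a locked `StateStore`, might be sketched as below. The class name comes from the table; its methods are hypothetical. The sketch assumes the stored values are plain data that can be deep-copied, so process handles such as `Popen` objects would need to live elsewhere (for example in the `AgentProcess` wrapper from row 4).

```python
import threading
from copy import deepcopy
from typing import Any, Callable, Dict, Optional


class StateStore:
    """Serialize all reads and writes of shared state behind one lock."""

    def __init__(self, initial: Optional[Dict[str, Any]] = None) -> None:
        # RLock lets a mutator call snapshot() without deadlocking.
        self._lock = threading.RLock()
        self._data: Dict[str, Any] = dict(initial or {})

    def snapshot(self) -> Dict[str, Any]:
        """Return a copy so callers cannot mutate shared state by accident."""
        with self._lock:
            return deepcopy(self._data)

    def update(self, mutate: Callable[[Dict[str, Any]], None]) -> Dict[str, Any]:
        """Apply a read-modify-write function atomically and return the result."""
        with self._lock:
            mutate(self._data)
            return deepcopy(self._data)
```

Routes and the polling thread would then call `store.update(lambda s: s.update(state="AGENT1_COMPLETE"))` rather than touching a module-level dict directly.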

A.7 Open Questions

Q: Are /test/* routes compiled out or disabled by env in production deploys?
- Blocks: A.4.2, A.6 #9
- Who can answer: Gordon Zheng / platform SRE
Q: What is the actual pytest coverage today?
- Blocks: A.2 row 15, A.4.3
- Who can answer: CI logs
Q: Is specs/019-stuck-agent-watcher merged and running in prod?
- Blocks: Confidence in stale-agent recovery
- Who can answer: Repo maintainer

A.8 Difficulties Encountered

Difficulty: No access to CI output, coverage, or runtime metrics.
- Impact: Scored row 15 conservatively.
- Fix: Publish CI summary (coverage %, test count, flake rate) into a badge or ci-summary.json.
Difficulty: This repo and its sister repo (af-fema-real-ai-agent) have "near-identical" contents but no shared library.
- Impact: Unable to tell which is canonical; duplication risk unmeasured.
- Fix: Extract shared Python package or document divergence explicitly.
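
The `ci-summary.json` idea from the first difficulty above could be produced from the Cobertura-style XML that `pytest --cov-report=xml` writes. The sketch assumes the report's root element carries a `line-rate` attribute, as coverage.py's XML output does; the keys in the returned summary and the threshold value are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def summarize_coverage(coverage_xml: str, threshold_pct: float) -> dict:
    """Turn a coverage XML report into a small, publishable summary."""
    root = ET.fromstring(coverage_xml)
    # line-rate is a fraction between 0 and 1 on the root <coverage> element.
    pct = round(float(root.attrib["line-rate"]) * 100, 1)
    return {
        "coverage_pct": pct,
        "threshold_pct": threshold_pct,
        "passed": pct >= threshold_pct,
    }


def write_summary(summary: dict, path: str = "ci-summary.json") -> None:
    """Persist the summary so a later CI step can publish it as an artifact."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(summary, handle, indent=2)
```

A CI step would read `coverage.xml`, call `summarize_coverage`, write the file, and exit non-zero when `passed` is false.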

A.9 Risks & Unknowns

A.9.1 Known risks

#	Risk	Likelihood	Impact	Mitigation
1	Data-plane exposure if SG/CloudFront misconfigures	M	H	A.4.1 add app-layer auth
2	DisasterAssistance.gov DOM change breaks Agent scripts	M	H	Script regeneration + monitoring; manual override via VNC
3	Bedrock quota / auth failure mid-submission	M	M	Runbook exists; stale-detection; retry from operator
4	Login.gov session cookie leak via EFS snapshot	L	H	EFS encryption at rest; scope IAM read to task role
5	1:1 container-to-user scaling hits Fargate limits at volume	M	M	Spot + quota raises; capacity planning TBD

A.9.2 Unknown unknowns

Area not reviewed: CDK stack under aws_deployment/ (IAM policies, SG rules, CloudFront rules).
- Reason: Out of scope for per-repo Part A.
- Best guess: M — the entire security model relies on these.
Area not reviewed: fema_agent/tests/ contents.
- Reason: Time.
- Best guess: M.
Area not reviewed: specs/001–020 (20 spec directories).
- Reason: Time.
- Best guess: L — docs only.

A.10 Technical Debt Register

#	Debt item	Quadrant	Interest	Remediation
1	Unauthenticated Flask data-plane	Reckless & Deliberate	High — one misconfig = PII breach	Per-task bearer token (A.4.1)
2	22 routes in one 1271-line `app.py`	Prudent & Inadvertent	Medium — slows review	Blueprints (A.6 #1)
3	Test routes ship in prod binary	Reckless & Deliberate	Medium — `/test/force-status` blast radius	Env-gated Blueprint (A.4.2)
4	Duplication across `/survivor-info` handlers	Prudent & Inadvertent	Low — 3× change cost	Extract helper
5	Near-duplicate sister repo `af-fema-real-ai-agent`	Prudent & Deliberate	Medium — 2× maintenance	Extract shared package
6	Unknown test coverage (null)	Prudent & Inadvertent	Medium	Wire coverage (A.4.3)
7	Global `_state` dict mutated from multiple sites	Prudent & Inadvertent	Medium — race risk	Locked `StateStore`
8	`print(..., file=sys.stderr)` for errors	Prudent & Inadvertent	Low	Logger (A.4.4)

A.11 Security Posture (lightweight STRIDE)

Category	Threat present?	Mitigated?	Gap
Spoofing (identity)	Yes — any caller reaching :5001 is trusted	Partial (network only)	No app-layer auth (A.4.1)
Tampering (integrity)	Yes — `/test/force-status` can overwrite FSM	Partial if test routes disabled (unverified)	Confirm prod gating
Repudiation	Partial — DynamoDB AuditEvent at control plane	Partial	Data-plane actions don't emit per-request audit
Information Disclosure	Yes — PII in `_state`, EFS, possibly 400 descriptions	Partial	Scrub errors; verify EFS encryption
Denial of Service	Yes — unauthenticated endpoints + subprocess spawning	Partial	No rate limit at app layer
Elevation of Privilege	Low at app layer	Partial	Task IAM role scope not reviewed

A.12 Operational Readiness

Capability	Present / Partial / Missing	Notes
Structured logs	Partial	CloudWatch stdout; some `print` paths
Metrics	Partial	`AgentExecutionTime`, `ErrorRate` listed but runbook links TBD
Distributed tracing	Missing	No trace id propagation
Actionable alerts	Partial	Metrics defined, runbook links TBD
Runbooks	Present	`runbook.md` covers top 6 issues
On-call ownership defined	Missing	—
SLOs / SLIs	Missing	Performance targets TBD
Backup & restore tested	N/A	Ephemeral containers
Disaster recovery plan	Missing	—
Chaos / failure testing	Missing	—

A.13 Test & Quality Signals

Coverage: N/A — coverage_pct: null
Untested critical paths (suspected): FSM stale-detection, dual-location sentinel poll, _clear_chrome_session_restore
Missing test types: [ ] unit [x] integration [ ] e2e [x] contract [x] load [x] security/fuzz

A.14 Performance & Cost Smells

Hot paths: Agent subprocess poll loop; sentinel file stat.
Suspected bottlenecks: Chrome rendering under X11; Bedrock latency dominates per-page time.
Oversized infra: 2 vCPU / 4 GB per survivor — verify with telemetry.

A.15 Bus-Factor & Knowledge Risk

Only-person: Single author (Gordon Zheng). Script generator + raw/intermediate builder entirely author-originated.
What breaks: DOM-selector regressions when DisasterAssistance.gov changes; claude-full wrapper tuning.
Tribal knowledge: Mapping between page numbering and script sections; rationale for 2-location sentinel polling.
Actions: Architecture walkthrough recording; rationale comments in raw_builder/; pair-review script regeneration with second engineer.

A.16 Compliance Gaps

N/A — YAML does not claim HIPAA/SOC 2/state-insurance compliance. FEMA/PII handling regime implied but not asserted. Formal compliance review belongs in af-infra.

A.17 Recommendations Summary

Priority	Action	Owner	Effort	Depends on
P0	Add app-layer auth (bearer token) to all Flask routes	Repo maintainer + SRE	M	Secrets Manager provisioning
P0	Disable `/test/*` and `/agent*/restart` in prod	Repo maintainer	S	—
P1	Split `create_app` into Blueprints; extract `_ingest_survivor` helper	Repo maintainer	M	—
P1	Wrap global `_state` in locked `StateStore`; wrap Popen in `AgentProcess`	Repo maintainer	M	—
P1	Wire pytest coverage into CI with threshold	CI owner	S	CI access
P1	Inject filesystem/clock/process seams for FSM unit tests	Repo maintainer	M	Blueprint split
P1	Scrub PII from 4xx descriptions; structured-log instead	Repo maintainer	S	Logger replacement
P2	Replace `print(stderr)` with Python logger	Repo maintainer	S	—
P2	Centralize magic-number thresholds with rationale	Repo maintainer	S	—
P2	Document Bedrock / Claude Code exit strategy or extract shared package with `af-fema-real-ai-agent`	Architect	L	Cross-repo coordination

Environment variables

Name	Purpose
`LLM_PROVIDER`	`bedrock` or `anthropic`
`LLM_MODEL_NAME`	Friendly alias mapped to Bedrock inference profile
`BEDROCK_API_KEY`*	Aliased to AWS_BEARER_TOKEN_BEDROCK
`AWS_REGION`	Bedrock region
`WORK_DIR`	Container work directory
`CLAUDE_CONFIG_DIR`	Claude credentials path inside container
`DISPLAY`	X11 display
`NOVNC_HEARTBEAT_SECONDS`	Idle keepalive for CloudFront
`AGENT_INACTIVITY_TIMEOUT_SECONDS`	Watchdog for stale agent subprocess
`PORT_5001 / PORT_5900 / PORT_6080`	Host port mappings (set in .env for start_container.sh)
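
A centralized loader for a few of these variables might look like the sketch below, in line with the review's suggestion to gather thresholds in a `config.py`. The 600-second watchdog default is the one cited in the scorecard; the 30-second heartbeat default, the `bedrock` provider default, and the `Settings` class itself are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_PROVIDERS = {"bedrock", "anthropic"}


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    agent_inactivity_timeout_seconds: int
    novnc_heartbeat_seconds: int


def _positive_int(environ: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer env var, failing loudly on junk or non-positive values."""
    value = int(environ.get(name, default))
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Validate configuration once at startup instead of at each use site."""
    provider = environ.get("LLM_PROVIDER", "bedrock").strip().lower()  # ASSUMED default
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(VALID_PROVIDERS)}")
    return Settings(
        llm_provider=provider,
        # 600s is the watchdog default cited in this review.
        agent_inactivity_timeout_seconds=_positive_int(
            environ, "AGENT_INACTIVITY_TIMEOUT_SECONDS", 600),
        # ASSUMPTION: 30s is the heartbeat default.
        novnc_heartbeat_seconds=_positive_int(environ, "NOVNC_HEARTBEAT_SECONDS", 30),
    )
```

Keeping each default next to a comment that states its origin gives the "rationale" the magic-number finding asks for.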