AidFinder

af-disaster-assistance-gov-agent

DisasterAssistance.gov AI Automation Agent

Per-survivor Anthropic computer-use container that runs the Claude Code CLI and a Flask data-plane to fill out the FEMA application. A mock-targeted training variant runs both Chrome and a mock FEMA server inside the container.

Domain role: AI agent runtime (Anthropic computer-use desktop)
Last updated: 2026-03-12
API style: REST

Docker image based on ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest (Ubuntu 22.04 + X11 + noVNC). Adds Node.js 20, Claude Code CLI v2.1.45, Flask 3.0, MCP servers (chrome-devtools, mcp-vnc, playwright), and a fema_agent Python package implementing a survivor data parser, agent script generator, and an 8-state Flask state machine that spawns claude-full subprocesses for Agent 1 (pre-login) and Agent 2 (post-login). Browser is Chrome with remote debugging on 9222; profile + cookies persisted in EFS so the Login.gov session survives between Agent 1 and Agent 2.

Role in the system: Spawned per-survivor by af-backend-go-api (control plane in af-infra Lambda); exposed via Flask data-plane on :5001 and noVNC on :6080; container destroyed post-completion

Surfaces:

  • Flask data-plane HTTP API on :5001 (10 documented endpoints)
  • noVNC web on :6080 (vnc.html + vnc_embed.html for iframe)
  • VNC server on :5900
  • Mock FEMA HTTP server on :8020 (dev/training)
  • Chrome DevTools Protocol on :9222
  • Claude Code CLI as the agent runtime (claude-full wrapper)
  • .claude/ MCP server config + agents/commands

User workflows

  • Build image

    Image ready

  • Start container

    /health returns ok

  • Submit survivor

    Container ready for Agent 1

  • Run Agent 1

    State → AGENT1_COMPLETE / WAITING_FOR_LOGIN

  • Login.gov via VNC

    Survivor authenticated; ready for Agent 2

  • Run Agent 2

    State → AGENT2_COMPLETE; survivor reviews page 37 manually

API endpoints

  • GET /health: Liveness
  • POST /agent_health: LLM connectivity smoke test
  • GET /state: Current orchestration state + stale-agent flags
  • POST /survivor-info: Submit full survivor JSON, generate scripts
  • POST /survivor-info/p1: Partial submission (pages 1-16 only)
  • POST /survivor-info/p2: Partial submission (pages 20-36 only)
  • POST /agent1/run: Spawn Agent 1 subprocess
  • GET /agent1/status: Poll Agent 1 subprocess
  • POST /agent2/run: Spawn Agent 2 subprocess
  • GET /agent2/status: Poll Agent 2 subprocess
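
The endpoint list above can be exercised from a control-plane caller roughly as follows. This is a minimal sketch, not the af-backend-go-api client: the base URL, the helper names `build_request` and `call`, and the assumption that responses are JSON are all illustrative; only the methods and paths come from the list.

```python
import json
import urllib.request

# Assumption: the container's data-plane is reachable at this base URL.
BASE_URL = "http://localhost:5001"


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request for one data-plane endpoint."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=10):
    """Send the request and decode the body (assumed here to be JSON)."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Documented happy path up to the Login.gov gate, in call order.
AGENT1_SEQUENCE = [
    ("GET", "/health"),
    ("POST", "/survivor-info"),
    ("POST", "/agent1/run"),
    ("GET", "/agent1/status"),
    ("GET", "/state"),
]
```

A caller would walk `AGENT1_SEQUENCE` with `call(...)`, re-polling `/agent1/status` until the agent reports completion, then hand off to the VNC login step.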

Third-party APIs

  • Amazon Bedrock (or direct Claude API)

    LLM backend for Claude Code CLI

  • DisasterAssistance.gov

    Real FEMA application portal (target)

  • Login.gov

    2FA gate between Agent 1 and Agent 2

  • Chrome DevTools Protocol

    Browser automation surface

Service dependencies

  • AWS ECS Fargate

    Per-survivor task launch

  • AWS EFS

    Persisted browser profile + work dir

  • AWS Secrets Manager

    Bedrock API key, VNC signing key, JWT secret

  • AWS DynamoDB

    Container state + audit log

  • AWS CloudWatch Logs

    Audit trail

  • af-backend-go-api

    Control plane caller (start/stop/poll)

  • af-infra (CDK or Terraform)

    ECS task def, EFS, Secrets Manager, ALB, CloudFront

Analysis

Overall health: 3.1 / 5 (acceptable)

  • Module overview / clarity of intent: 4
  • External dependencies: 3
  • API endpoints: 2
  • Database schema: 3
  • Backend services: 4
  • WebSocket / real-time: 3
  • Frontend components: 3
  • Data flow clarity: 4
  • Error handling & resilience: 3
  • Configuration: 3
  • Data refresh patterns: 3
  • Performance: 2
  • Module interactions: 4
  • Troubleshooting / runbooks: 4
  • Testing & QA: 2
  • Deployment & DevOps: 3
  • Security & compliance: 2
  • Documentation & maintenance: 4
  • Roadmap clarity: 3

af-disaster-assistance-gov-agent — Prop-Build Analysis

Document Type: Critical Review & Analysis (companion to prop-build-template.md)
Scope: Per-Repo / Per-Module
Subject: af-disaster-assistance-gov-agent (DisasterAssistance.gov AI Automation Agent)
Reviewer(s): Claude (automated code review)
Date: 2026-04-09
Version: 0.1
Confidence Level: Medium
What would raise confidence: Running the container locally end-to-end against mock FEMA; access to CloudWatch metrics and ECS task telemetry; interview with Gordon Zheng on two-agent split rationale; execution of pytest + e2e perf harness.

Inputs Reviewed:

  • Prop-build doc: /Users/andres/src/af/af-analysis/data/af-disaster-assistance-gov-agent.yaml
  • Companion docs: /Users/andres/src/af/af-analysis/data/af-disaster-assistance-gov-agent/{api-examples,data-flow,runbook,deployment}.md
  • Source tree: /Users/andres/src/af/af-disaster-assistance-gov-agent/ (fema_agent/src/web_service/{app.py:1271, state.py:388, agent_runner.py:317})
  • Dashboards / metrics: not accessed
  • ADRs / design docs: specs/001..020 directories (listing only)
  • Interviews: none

A.1 Executive Summary

  • Overall health: Functional and cleanly split into a Flask data-plane, FSM, agent runner, and survivor parser; architecture is coherent for a per-survivor ephemeral container, but the HTTP surface is oversized and unauthenticated at the app layer.
  • Top risk: All 22 Flask endpoints, including destructive and PII-accepting routes, have auth: "none" and rely entirely on network isolation + CloudFront signed URLs — a single misconfiguration in ALB/SG removes all authz (see A.6.5, A.11).
  • Top win / thing worth preserving: Sentinel-file completion pattern (temp/agent*.completed) decouples "agent finished successfully" from "subprocess exited cleanly" — robust against unclean Claude Code exits (fema_agent/src/web_service/agent_runner.py).
  • Single recommended next action: Gate the Flask data-plane behind a shared-secret header or mTLS verified by the container, even behind the ALB, so app-layer authz exists as defense-in-depth.
  • Blocking unknowns: Actual test coverage %, CI pass rate, production error rates, concurrent-task ceiling, and whether specs/019-stuck-agent-watcher is deployed.

A.2 Health Scorecard

# | Dimension | Score (1–5) | Justification
1 | Module overview / clarity of intent | 4 | YAML + README articulate per-survivor container and two-agent split clearly; purpose unambiguous.
2 | External dependencies | 3 | Well-enumerated but heavy: Bedrock, Login.gov, DisasterAssistance.gov DOM, 3 MCP servers, Chrome CDP; many failure surfaces.
3 | API endpoints | 2 | 22 routes defined in app.py vs 10 documented; test/internal routes (/test/*, /agent*/restart, /test/force-status) ship in the same binary as prod (app.py:707–838).
4 | Database schema | 3 | No local DB; state is in-memory + EFS + DynamoDB (control plane). Reasonable for ephemeral container.
5 | Backend services | 4 | Clean separation: state.py FSM, agent_runner.py subprocess orchestration, survivor_api/ parsing.
6 | WebSocket / real-time | 3 | noVNC proxy inherited from base image; heartbeat added; adequate.
7 | Frontend components | 3 | Minimal; relies on noVNC; vnc_embed.html iframe variant is pragmatic.
8 | Data flow clarity | 4 | data-flow.md plus YAML end-to-end block trace the request path explicitly.
9 | Error handling & resilience | 3 | Good abort codes and sentinel fallback; no circuit breakers; stale-process handling exists but relies on operator retry.
10 | Configuration | 3 | Env-var driven, documented; no feature-flag framework; some magic defaults (600s watchdog).
11 | Data refresh patterns | 3 | On-demand only; appropriate for this runtime model.
12 | Performance | 2 | Targets TBD; 1 vCPU-bound container per survivor implies linear cost scaling; no load numbers captured.
13 | Module interactions | 4 | Explicitly documented with af-backend-go-api, Bedrock, EFS, DynamoDB.
14 | Troubleshooting / runbooks | 4 | runbook.md covers top 6 failure modes.
15 | Testing & QA | 2 | pytest referenced but coverage unknown/null in YAML; e2e is a bash script run_e2e_perf_test.sh; no contract tests executed in CI are verified.
16 | Deployment & DevOps | 3 | GitHub Actions → ECR → CDK; rollback documented; no blue/green specifics.
17 | Security & compliance | 2 | auth: "none" on 22 routes; survivor PII (SSN, DOB) in-memory + EFS; sensitive Login.gov cookies on EFS; STRIDE coverage incomplete (see A.11).
18 | Documentation & maintenance | 4 | Strong README, AGENTS.md, 20 specs directories, runbook, api-examples.
19 | Roadmap clarity | 3 | specs/001–020 imply active phased work but phase/owner mapping not captured here.

Overall score: 3.1 — architecture and docs are above average; security posture and test signal are the drag.


A.3 What's Working Well

  • Strength: Sentinel-file completion signal decoupled from subprocess exit code.

    • Location: fema_agent/src/web_service/agent_runner.py:23-91 and dual-location poll.
    • Why it works: Claude Code CLI subprocesses can exit uncleanly; relying on exit code would produce false negatives. The sentinel reifies "agent wrote its completion marker," a stronger invariant.
    • Propagate to: af-fema-real-ai-agent sister repo; any future agent-runtime containers.
  • Strength: Two-agent split around the Login.gov human-in-the-loop boundary.

    • Location: State machine fema_agent/src/web_service/state.py:8-16 plus script generator in fema_agent/src/raw_builder/ + intermediate_builder/.
    • Why it works: Makes the unavoidable human 2FA gate an explicit state (WAITING_FOR_LOGIN) rather than a hidden pause, enabling the frontend to render the VNC iframe at the right moment.
    • Propagate to: Any other gov-portal automation repos encountering Login.gov / ID.me.
  • Strength: Clear module decomposition under fema_agent/src/ (web_service, survivor_api, raw_builder, intermediate_builder, enums).

    • Location: fema_agent/src/ package layout.
    • Why it works: Parser, builder, and runtime concerns are separable; supports unit testing of pure-data layers without Flask/subprocess stack.
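
The sentinel-file strength described first in this list can be illustrated with a short sketch. It is not the code in agent_runner.py: the function name `agent_finished`, the three result labels, and the marker filename used in the demo are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path


def agent_finished(sentinel_paths, exit_code):
    """Classify an agent run from its sentinel files, not only its exit code.

    sentinel_paths: candidate marker locations (hypothetical; the review
    mentions temp/agent*.completed and a dual-location poll).
    exit_code: subprocess return code, or None while it is still running.
    """
    if any(Path(p).is_file() for p in sentinel_paths):
        return "complete"  # marker written: success even on an unclean exit
    if exit_code is None:
        return "running"
    return "failed"  # process exited without ever writing the marker


# Demo: the same non-zero exit code is read differently once the marker exists.
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "agent1.completed"
    before = agent_finished([marker], exit_code=1)
    marker.touch()
    after = agent_finished([marker], exit_code=1)
```

Checking the marker first is what makes the signal robust: an exit code of 1 after the marker was written still counts as a completed run.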

A.4 What to Improve

A.4.1 P0 — Add app-layer authentication to the Flask data-plane

  • Problem: Every documented and undocumented route is auth: "none", relying solely on SG + CloudFront signed URLs.
  • Evidence: YAML section_3_api.endpoints[*].auth = "none"; fema_agent/src/web_service/app.py:188–838 shows no @requires_auth decorator on any route.
  • Suggested change: Add a shared-secret header (HS256 JWT or per-task bearer token via Secrets Manager) in a Flask before_request hook.
  • Estimated effort: M
  • Risk if ignored: One SG misconfig → full PII breach + /test/force-status allows arbitrary state transitions.
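
A minimal sketch of the suggested check, assuming a per-task bearer token delivered to the container out of band. The function name `is_authorized`, the `Authorization: Bearer` convention, and the decision to leave /health open are assumptions, not behavior taken from app.py; the Flask wiring is shown only as a comment so the core check stays standard-library only.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only for 'Bearer <token>' matching the per-task secret."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


# Hypothetical Flask wiring (not taken from app.py):
#
#   @app.before_request
#   def require_token():
#       if request.path == "/health":
#           return None  # assumption: liveness stays open for the ALB
#       if not is_authorized(request.headers.get("Authorization"), TOKEN):
#           abort(401)
```

Because the hook runs before every route, newly added endpoints would be protected by default rather than by remembering a decorator.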

A.4.2 P1 — Split test/internal routes out of the production Flask app

  • Problem: /test/agent-auth, /test/agent-complete, /test/force-status, /test/setup-mock, /test/e2e-mock, /agent1/restart, /agent2/restart, /agent*/status_detailed, and /setup-fema ship in the same binary as prod.
  • Evidence: fema_agent/src/web_service/app.py:707–838; YAML notes "~12 additional internal/test-only endpoints."
  • Suggested change: Gate behind ENABLE_TEST_ROUTES env var defaulted off, or separate Blueprint loaded only in dev/stg.
  • Estimated effort: S
  • Risk if ignored: Prod blast radius includes arbitrary state manipulation.
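
The env-gating option could look like the sketch below. ENABLE_TEST_ROUTES is the variable name proposed above; the accepted truthy values and the helper names `internal_routes_enabled` and `routes_to_register` are assumptions for illustration.

```python
import os


def internal_routes_enabled(environ=None):
    """Internal routes stay off unless ENABLE_TEST_ROUTES is explicitly truthy."""
    environ = os.environ if environ is None else environ
    value = environ.get("ENABLE_TEST_ROUTES", "")
    return value.strip().lower() in {"1", "true", "yes"}


def routes_to_register(prod_routes, internal_routes, environ=None):
    """Return the route list for this deployment; internal ones only if enabled."""
    if internal_routes_enabled(environ):
        return list(prod_routes) + list(internal_routes)
    return list(prod_routes)
```

Defaulting to off means a missing or mistyped variable fails closed, which is the safer direction for a route like /test/force-status.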

A.4.3 P1 — Capture and publish test coverage

  • Problem: YAML lists coverage_pct: null; unknown whether fema_agent/tests/ actually exercises FSM, parser validators, and stale-detection.
  • Evidence: section_15_testing.unit.coverage_pct: null.
  • Suggested change: Wire pytest --cov=fema_agent --cov-report=xml into CI; fail under threshold.
  • Estimated effort: S
  • Risk if ignored: FSM regressions invisible until production.

A.4.4 P2 — Replace print(..., file=sys.stderr) with structured logging

  • Evidence: fema_agent/src/web_service/app.py:357-360, 472-474.
  • Suggested change: Python logging with JSON formatter; include request/survivor correlation id.
  • Estimated effort: S
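
One way to realize this suggestion with only the standard library is sketched below. The logger name, the JSON field set, and the `correlation_id` attribute passed through `extra=` are assumptions, not the project's logging schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # correlation_id is a hypothetical field supplied via `extra=`.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(entry)


def make_logger(name="fema_agent.web_service", stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Demo: capture one record in memory and parse it back.
buffer = io.StringIO()
log = make_logger(stream=buffer)
log.info("agent1 started", extra={"correlation_id": "req-123"})
logged = json.loads(buffer.getvalue())
```

Keeping stderr as the default stream means existing CloudWatch capture would continue to work while the lines become machine-parseable.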

A.5 Things That Don't Make Sense

  1. Observation: Two agents spawn claude-full subprocesses, but the state machine defines both production (9) and test-only (6) states in the same enum.

    • Location: fema_agent/src/web_service/state.py:8-49.
    • Question for author: Why not a separate TestState enum so the prod FSM's valid-transition matrix stays minimal?
  2. Observation: Sentinel file is polled at two locations (pre-archive and post-archive).

    • Location: fema_agent/src/web_service/agent_runner.py:94+.
    • Question for author: Is the ordering deterministic, or is the second poll genuinely defensive against a concurrent archive?

A.6 Anti-Patterns Detected

A.6.1 Code-level

  • God object / god function — app.py:164–838 (1271 LOC file, single create_app)
  • Shotgun surgery
  • Feature envy
  • Primitive obsession
  • Dead code
  • Copy-paste / duplication — /survivor-info, /p1, /p2 near-identical handlers
  • Magic numbers — AGENT_INACTIVITY_TIMEOUT_SECONDS=600, NOVNC_HEARTBEAT_SECONDS=30
  • Deep nesting
  • Long parameter lists
  • Boolean-flag parameters

A.6.2 Architectural

  • Big ball of mud
  • Distributed monolith
  • Chatty services
  • Leaky abstraction — Flask state dict holds Popen handles alongside survivor JSON
  • Golden hammer
  • Vendor lock-in — Bedrock + Claude Code CLI + chrome-devtools-mcp, no abstraction layer
  • Stovepipe
  • Missing seams for testing — hard-coded filesystem paths, direct subprocess.Popen, direct Chrome CDP URLs

A.6.3 Data

N/A — no persistent DB in this repo.

A.6.4 Async / Ops

  • Poison messages
  • Retry storms
  • Missing idempotency keys
  • Hidden coupling via shared state — global Flask _state mutated by multiple routes + polling thread
  • Work queues without visibility
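
The hidden-coupling item above is the one a locked store would address. The sketch below shows the general shape; the class name `StateStore` matches the recommendation made later in this review, but its methods and the `polls` counter in the demo are hypothetical.

```python
import threading


class StateStore:
    """Serialize reads and writes to shared state behind one lock."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._data = dict(initial or {})

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def update(self, mutate):
        """Apply mutate(dict) while holding the lock and return its result."""
        with self._lock:
            return mutate(self._data)

    def snapshot(self):
        """Return a shallow copy so callers never hold the live dict."""
        with self._lock:
            return dict(self._data)


# Demo: four threads stand in for route handlers plus a polling thread.
store = StateStore({"polls": 0})


def bump(data):
    data["polls"] += 1


def worker():
    for _ in range(1000):
        store.update(bump)


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
total_polls = store.get("polls")
```

Routing every mutation through `update` gives a single place to add audit logging or transition checks later.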

A.6.5 Security

  • Secrets in code / .env committed
  • Missing authn/z on internal endpoints — 22 routes, zero auth decorators
  • Overbroad IAM roles
  • Unvalidated input crossing trust boundary — /test/force-status accepts arbitrary state strings
  • PII/PHI in logs or error messages (suspected) — validation errors embed field values into 400 descriptions

A.6.6 Detected Instances

# | Anti-pattern | Location (file:line) | Severity | Recommendation
1 | God object / god function | app.py:164–838 (1271 LOC, single create_app, 22 routes) | P1 | Split into Blueprints: health_bp, survivor_bp, agent_bp, test_bp.
2 | Copy-paste / duplication | app.py:303–590 (/survivor-info, /p1, /p2 share near-identical validation + env-setup) | P1 | Extract _ingest_survivor(payload, variant) helper.
3 | Magic numbers | section_10_configuration.thresholds (600s, 30s defaults) | P2 | Centralize in config.py with rationale.
4 | Leaky abstraction | state.py:76-104 (Flask state dict holds Popen handles + survivor JSON + state enum) | P1 | Wrap in AgentProcess class.
5 | Vendor lock-in | Bedrock + Claude Code CLI + chrome-devtools-mcp, no abstraction | P2 | Document exit cost; keep LLM_PROVIDER switch exercised.
6 | Missing testing seams | app.py:44-57, agent_runner.py (hard-coded paths, direct subprocess) | P1 | Inject Clock, FileSystem, ProcessSpawner.
7 | Hidden coupling | Global _state mutated by multiple routes + polling thread | P1 | Serialize mutations behind a locked StateStore.
8 | Missing authn/z | app.py:188–838 (22 routes, zero auth) | P0 | A.4.1.
9 | Unvalidated input | /test/force-status (app.py:763-772) accepts arbitrary state strings | P0 | Disable in prod (A.4.2); validate enum.
10 | PII in error messages | app.py:383,388,422,427,432,438,498,502,520,525,530,583,586 (validation errors embed field values) | P1 | Scrub PII from 400 descriptions.
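
Row 10 can be illustrated with a sketch of a validation error that names the field and the rule but never the submitted value. The error shape, the helper names, and the nine-digit SSN rule are assumptions for illustration, not the validators in app.py.

```python
def validation_error(field, reason):
    """Build a 400 description that names the field and rule, never the value."""
    return {"error": "validation_failed", "field": field, "reason": reason}


def validate_ssn(value):
    """Hypothetical check: nine digits, dashes optional.

    Returns an error dict on failure or None when the value passes.
    """
    digits = str(value).replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        return validation_error("ssn", "must contain exactly 9 digits")
    return None
```

The rejected value can still be recorded in a restricted audit log if needed; it simply should not travel back in the HTTP response or into general-purpose logs.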

A.7 Open Questions

  1. Q: Are /test/* routes compiled out or disabled by env in production deploys?
    • Blocks: A.4.2, A.6 #9
    • Who can answer: Gordon Zheng / platform SRE
  2. Q: What is the actual pytest coverage today?
    • Blocks: A.2 row 15, A.4.3
    • Who can answer: CI logs
  3. Q: Is specs/019-stuck-agent-watcher merged and running in prod?
    • Blocks: Confidence in stale-agent recovery
    • Who can answer: Repo maintainer

A.8 Difficulties Encountered

  • Difficulty: No access to CI output, coverage, or runtime metrics.
    • Impact: Scored row 15 conservatively.
    • Fix: Publish CI summary (coverage %, test count, flake rate) into a badge or ci-summary.json.
  • Difficulty: This repo and its sister repo (af-fema-real-ai-agent) have "near-identical" contents but no shared library.
    • Impact: Unable to tell which is canonical; duplication risk unmeasured.
    • Fix: Extract shared Python package or document divergence explicitly.

A.9 Risks & Unknowns

A.9.1 Known risks

# | Risk | Likelihood | Impact | Mitigation
1 | Data-plane exposure if SG/CloudFront misconfigures | M | H | A.4.1 add app-layer auth
2 | DisasterAssistance.gov DOM change breaks Agent scripts | M | H | Script regeneration + monitoring; manual override via VNC
3 | Bedrock quota / auth failure mid-submission | M | M | Runbook exists; stale-detection; retry from operator
4 | Login.gov session cookie leak via EFS snapshot | L | H | EFS encryption at rest; scope IAM read to task role
5 | 1:1 container-to-user scaling hits Fargate limits at volume | M | M | Spot + quota raises; capacity planning TBD

A.9.2 Unknown unknowns

  • Area not reviewed: CDK stack under aws_deployment/ (IAM policies, SG rules, CloudFront rules).
    • Reason: Out of scope for per-repo Part A.
    • Best guess: M — the entire security model relies on these.
  • Area not reviewed: fema_agent/tests/ contents.
    • Reason: Time.
    • Best guess: M.
  • Area not reviewed: specs/001–020 (20 spec directories).
    • Reason: Time.
    • Best guess: L — docs only.

A.10 Technical Debt Register

# | Debt item | Quadrant | Interest | Remediation
1 | Unauthenticated Flask data-plane | Reckless & Deliberate | High: one misconfig = PII breach | Per-task bearer token (A.4.1)
2 | 22 routes in one 1271-line app.py | Prudent & Inadvertent | Medium: slows review | Blueprints (A.6 #1)
3 | Test routes ship in prod binary | Reckless & Deliberate | Medium: /test/force-status blast radius | Env-gated Blueprint (A.4.2)
4 | Duplication across /survivor-info handlers | Prudent & Inadvertent | Low: 3× change cost | Extract helper
5 | Near-duplicate sister repo af-fema-real-ai-agent | Prudent & Deliberate | Medium: 2× maintenance | Extract shared package
6 | Unknown test coverage (null) | Prudent & Inadvertent | Medium | Wire coverage (A.4.3)
7 | Global _state dict mutated from multiple sites | Prudent & Inadvertent | Medium: race risk | Locked StateStore
8 | print(..., file=sys.stderr) for errors | Prudent & Inadvertent | Low | Logger (A.4.4)

A.11 Security Posture (lightweight STRIDE)

Category | Threat present? | Mitigated? | Gap
Spoofing (identity) | Yes: any caller reaching :5001 is trusted | Partial (network only) | No app-layer auth (A.4.1)
Tampering (integrity) | Yes: /test/force-status can overwrite FSM | Partial if test routes disabled (unverified) | Confirm prod gating
Repudiation | Partial: DynamoDB AuditEvent at control plane | Partial | Data-plane actions don't emit per-request audit
Information Disclosure | Yes: PII in _state, EFS, possibly 400 descriptions | Partial | Scrub errors; verify EFS encryption
Denial of Service | Yes: unauthenticated endpoints + subprocess spawning | Partial | No rate limit at app layer
Elevation of Privilege | Low at app layer | Partial | Task IAM role scope not reviewed

A.12 Operational Readiness

Capability | Present / Partial / Missing | Notes
Structured logs | Partial | CloudWatch stdout; some print paths
Metrics | Partial | AgentExecutionTime, ErrorRate listed but runbook links TBD
Distributed tracing | Missing | No trace id propagation
Actionable alerts | Partial | Metrics defined, runbook links TBD
Runbooks | Present | runbook.md covers top 6 issues
On-call ownership defined | Missing |
SLOs / SLIs | Missing | Performance targets TBD
Backup & restore tested | N/A | Ephemeral containers
Disaster recovery plan | Missing |
Chaos / failure testing | Missing |

A.13 Test & Quality Signals

  • Coverage: N/A — coverage_pct: null
  • Untested critical paths (suspected): FSM stale-detection, dual-location sentinel poll, _clear_chrome_session_restore
  • Missing test types: [ ] unit [x] integration [ ] e2e [x] contract [x] load [x] security/fuzz

A.14 Performance & Cost Smells

  • Hot paths: Agent subprocess poll loop; sentinel file stat.
  • Suspected bottlenecks: Chrome rendering under X11; Bedrock latency dominates per-page time.
  • Oversized infra: 2 vCPU / 4 GB per survivor — verify with telemetry.

A.15 Bus-Factor & Knowledge Risk

  • Only-person: Single author (Gordon Zheng). Script generator + raw/intermediate builder entirely author-originated.
  • What breaks: DOM-selector regressions when DisasterAssistance.gov changes; claude-full wrapper tuning.
  • Tribal knowledge: Mapping between page numbering and script sections; rationale for 2-location sentinel polling.
  • Actions: Architecture walkthrough recording; rationale comments in raw_builder/; pair-review script regeneration with second engineer.

A.16 Compliance Gaps

N/A — YAML does not claim HIPAA/SOC 2/state-insurance compliance. FEMA/PII handling regime implied but not asserted. Formal compliance review belongs in af-infra.


A.17 Recommendations Summary

Priority | Action | Owner | Effort | Depends on
P0 | Add app-layer auth (bearer token) to all Flask routes | Repo maintainer + SRE | M | Secrets Manager provisioning
P0 | Disable /test/* and /agent*/restart in prod | Repo maintainer | S |
P1 | Split create_app into Blueprints; extract _ingest_survivor helper | Repo maintainer | M |
P1 | Wrap global _state in locked StateStore; wrap Popen in AgentProcess | Repo maintainer | M |
P1 | Wire pytest coverage into CI with threshold | CI owner | S | CI access
P1 | Inject filesystem/clock/process seams for FSM unit tests | Repo maintainer | M | Blueprint split
P1 | Scrub PII from 4xx descriptions; structured-log instead | Repo maintainer | S | Logger replacement
P2 | Replace print(stderr) with Python logger | Repo maintainer | S |
P2 | Centralize magic-number thresholds with rationale | Repo maintainer | S |
P2 | Document Bedrock / Claude Code exit strategy or extract shared package with af-fema-real-ai-agent | Architect | L | Cross-repo coordination

Environment variables

Name | Purpose
LLM_PROVIDER | bedrock or anthropic
LLM_MODEL_NAME | Friendly alias mapped to Bedrock inference profile
BEDROCK_API_KEY* | Aliased to AWS_BEARER_TOKEN_BEDROCK
AWS_REGION | Bedrock region
WORK_DIR | Container work directory
CLAUDE_CONFIG_DIR | Claude credentials path inside container
DISPLAY | X11 display
NOVNC_HEARTBEAT_SECONDS | Idle keepalive for CloudFront
AGENT_INACTIVITY_TIMEOUT_SECONDS | Watchdog for stale agent subprocess
PORT_5001 / PORT_5900 / PORT_6080 | Host port mappings (set in .env for start_container.sh)
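
As an illustration of how a few of these variables might be read in one place, here is a minimal sketch. The variable names and the 600 s / 30 s values come from this page; the `AgentConfig` dataclass, the `load_config` helper, and the choice of `bedrock` as the fallback provider are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    llm_provider: str
    novnc_heartbeat_seconds: int
    agent_inactivity_timeout_seconds: int


def load_config(environ=None):
    """Read settings from the environment, applying the documented defaults."""
    environ = os.environ if environ is None else environ
    provider = environ.get("LLM_PROVIDER", "bedrock")  # fallback is an assumption
    if provider not in ("bedrock", "anthropic"):
        raise ValueError("LLM_PROVIDER must be 'bedrock' or 'anthropic'")
    return AgentConfig(
        llm_provider=provider,
        # 30 s idle keepalive for the noVNC connection.
        novnc_heartbeat_seconds=int(environ.get("NOVNC_HEARTBEAT_SECONDS", "30")),
        # 600 s watchdog for a stale agent subprocess.
        agent_inactivity_timeout_seconds=int(
            environ.get("AGENT_INACTIVITY_TIMEOUT_SECONDS", "600")
        ),
    )
```

Loading once into an immutable object gives the thresholds a single documented home instead of scattered literals.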