Single Python script (fema_form_filler.py, 589 LOC) using PyMuPDF (fitz). Opens fema_form_ff.pdf (a 178 KB partially-pre-filled FEMA FF-104), inserts 5 text fields by their internal PDF widget names, renders every page as a raster image at configurable DPI (default 150) and embeds the images into a new PDF — flattening the form so it can no longer be edited. Optionally uploads the result to S3 via boto3 (lazy-imported).
Role in the system: Pre-dates the agentic browser approach; retained because the FF-104 requires a wet (handwritten) signature, so the filled PDF must leave the digital domain to be signed.
Surfaces:
- CLI: python3 fema_form_filler.py [options]
- Module API: from fema_form_filler import FEMAFormFiller, fill_fema_form
- Optional S3 uploader (FEMAFormFiller.upload_to_s3)
User workflows
- Run with default test data: smoke test confirms install
- Run with custom user data: survivor receives PDF to print + sign
- Module-level invocation: caller hands the PDF off to the survivor
API endpoints
| Surface | Signature | Description |
|---|---|---|
| CLI | python3 fema_form_filler.py | Fill + flatten the FF-104 |
| Python | FEMAFormFiller(input_pdf, dpi=150) | Class constructor |
| Python | FEMAFormFiller.fill(...) | Fill with kwargs |
| Python | FEMAFormFiller.fill_from_dict(data, output_path) | Fill from dict |
| Python | FEMAFormFiller.to_base64(pdf_path) | Base64 encode the result |
| Python | FEMAFormFiller.upload_to_s3(pdf_path, bucket, s3_key?, ...) | Upload to S3 (lazy boto3 import) |
| Python | fill_fema_form(...) (convenience) | Module-level convenience wrapper |
Third-party APIs
- AWS S3 (optional): upload flattened PDF for downstream pickup
Analysis
af-fema-form-automation — Prop-Build Analysis
Document Type: Critical Review & Analysis (companion to prop-build-template.md)
Scope: Per-Repo / Per-Module
Subject: af-fema-form-automation (FEMA FF-104 Form Filler)
Reviewer(s): Claude (automated code review)
Date: 2026-04-09
Version: 0.1
Confidence Level: Medium
What would raise confidence: Running the CLI against a real template, observing a caller integration (af-backend-go-api or an agent repo) invoking the script end-to-end, and inspecting the S3 bucket policy actually used in production.
Inputs Reviewed:
- Prop-build doc: /Users/andres/src/af/af-analysis/data/af-fema-form-automation.yaml
- Source: /Users/andres/src/af/af-fema-form-automation/fema_form_filler.py (589 LOC), test_fema_form_filler.py, README.md
- Commit: 4e10f0d (only commit; initial 2025-12-16)
Part A — Per-Repo / Per-Module Analysis
A.1 Executive Summary
- Overall health: Small, single-file Python utility (~589 LOC) with a decent pytest suite and a narrow, well-defined job; it does what it claims and does it synchronously in <1s.
- Top risk: PII (name, DOB, physical address) flows through the process and out to an optional S3 bucket with no enforced encryption, no key-rotation story, no audit log, and no retention guarantees; hardening is explicitly pushed to the caller (fema_form_filler.py:325-431, yaml §17). See A.6.5 and A.10.
- Top win / thing worth preserving: The lazy boto3 import pattern (fema_form_filler.py:358-366) is exemplary: it keeps the optional dependency genuinely optional with a clean error surface; propagate to other af-* utilities.
- Single recommended next action: Add a CI workflow (pytest + lint) and a pyproject.toml / requirements.txt with pinned versions so the utility has a reproducible build and quality gate.
- Blocking unknowns: Whether any caller actually uses upload_to_s3 in production, and what the target bucket's encryption/lifecycle/IAM posture is. I could not verify either from this repo alone (see A.8, A.9.2).
A.2 Health Scorecard
| # | Dimension | Score (1–5) | Justification |
|---|---|---|---|
| 1 | Module overview / clarity of intent | 5 | Single-purpose script; README + module docstring state the job crisply (fema_form_filler.py:1-38). |
| 2 | External dependencies | 4 | Minimal runtime deps (PyMuPDF); boto3 optional & lazy-imported (:358-366). No pinning/manifest file is the sole gap. |
| 3 | API endpoints | 4 | CLI + class + convenience fn are consistent; arg parsing clean (:438-544). Return-dict schema uniform. |
| 4 | Database schema | N/A | No database. |
| 5 | Backend services | 4 | The fill+flatten pipeline is linear, readable, ~80 lines (:172-259). |
| 6 | WebSocket / real-time | N/A | Synchronous one-shot. |
| 7 | Frontend components | N/A | No UI. |
| 8 | Data flow clarity | 4 | data-flow.md companion + straight-line pipeline; easy to trace. |
| 9 | Error handling & resilience | 3 | try/except/finally around fill; S3 error classes split (:408-423); but bare except Exception swallows details (:237,316,424) and errors are returned as dicts, not raised — callers may miss failures. |
| 10 | Configuration | 3 | DPI + paths via CLI; no env-var wiring for anything besides AWS; no config file. Good enough for a one-shot util. |
| 11 | Data refresh patterns | N/A | Not applicable. |
| 12 | Performance | 4 | <1s per PDF is fine for the intended volume; no parallelization needed. |
| 13 | Module interactions | 3 | Documented in yaml §13 as "likely invoked by…" — no concrete caller contract in this repo. |
| 14 | Troubleshooting / runbooks | 4 | Companion runbook.md plus README troubleshooting section cover the common failure modes. |
| 15 | Testing & QA | 4 | ~30+ tests across 7 classes in test_fema_form_filler.py; no coverage number reported, no CI to enforce. |
| 16 | Deployment & DevOps | 2 | No CI, no pyproject.toml, no requirements.txt, no Dockerfile, no GitHub Actions (verified in yaml §2 notable). Manual pip install only. |
| 17 | Security & compliance | 2 | Handles FEMA Privacy Act PII with no in-repo encryption/retention/IAM guidance, no audit trail, bare-except in S3 path can log sensitive errors. The form is the Privacy Act release — posture needs to be stronger than "caller's problem". |
| 18 | Documentation & maintenance | 4 | Strong README, yaml prop-build, 4 companion markdown docs. |
| 19 | Roadmap clarity | 2 | Yaml §19 says "no active roadmap"; tech debt listed but not owned or dated. |
Overall score: 3.47 average across the 15 rated dimensions (the 4 N/A rows are excluded; the rated scores sum to 52). Weighted reading: the score is dragged down by deployment (#16), security (#17), and roadmap (#19). The code itself is solid, but the operational envelope around it is thin for something touching Privacy Act data.
A.3 What's Working Well
- Strength: Lazy optional dependency import for boto3
    - Location: fema_form_filler.py:357-367
    - Why it works: boto3 is imported inside upload_to_s3 inside a try/except ImportError, so the core fill path has zero AWS baggage and the S3 feature degrades gracefully with a clear remediation message. Keeps the runtime surface minimal.
    - Propagate to: af-map, af-backend-go-api helpers, any af-* Python utility where an optional heavy SDK is used by one code path.
- Strength: Defensive resource cleanup with finally
    - Location: fema_form_filler.py:248-259
    - Why it works: Both doc and new_doc are closed in finally with nested try/except, so a partial failure mid-pipeline does not leak PyMuPDF handles, which is nontrivial for a library that wraps native MuPDF state.
    - Propagate to: Other file-handle-heavy utilities in the platform.
- Strength: Uniform result-dict contract
    - Location: fema_form_filler.py:227-246, :308-322, :400-431
    - Why it works: Every public method returns {success, ..., message} so callers can branch on a single shape. Easy to script against.
    - Propagate to: Other Python helper modules consumed by the Go backend via subprocess.
- Strength: Deliberate scope boundary: wet signature stays physical
    - Location: fema_form_filler.py:56-64 (FIELD_MAP comment) + yaml §17 pii_handling
    - Why it works: place_of_birth and the signature are intentionally left blank because the form's Privacy Act semantics require a handwritten signature. Rasterizing the output also prevents downstream tampering or field extraction. This is a principled product decision, not a bug.
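The lazy-import strength above can be illustrated with a minimal sketch. It does not reproduce the code at fema_form_filler.py:357-367; the helper names optional_import and upload_sketch are hypothetical, and the actual upload step is deliberately omitted.

```python
import importlib


def optional_import(module_name, install_hint):
    """Import a module on demand.

    Returns (module, None) on success, or (None, error_dict) when the
    module is not installed. The error dict follows the
    {success, message} shape described in this review.
    """
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        return None, {
            "success": False,
            "message": f"{module_name} is not installed. {install_hint}",
        }


def upload_sketch(pdf_path, bucket, sdk_name="boto3"):
    """Hypothetical stand-in for an upload method with a lazy SDK import."""
    sdk, error = optional_import(sdk_name, f"Install it with: pip install {sdk_name}")
    if error is not None:
        return error  # core fill path never pays for the missing SDK
    # A real implementation would create a client from `sdk` and upload here.
    return {
        "success": True,
        "message": f"{sdk_name} available; would upload {pdf_path} to {bucket}",
    }
```

Because the import happens inside the function, a caller that never uploads never needs the SDK installed, and a caller that does gets a result dict with remediation text rather than an ImportError at module load.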
A.4 What to Improve
A.4.1 P0 — PII/Privacy Act data handling has no in-repo guardrails
- Problem: The utility fills a Privacy Act release form with name + DOB + full physical address and then optionally pushes the resulting PDF to an S3 bucket chosen by the caller. There is no enforced SSE, no required IAM scope, no retention/TTL, no audit log, and local output files persist in cwd by default (fema_form_signature_required.pdf) with no cleanup.
- Evidence: fema_form_filler.py:76-77 (default output in cwd), :325-431 (S3 upload takes any bucket, any creds, no SSE param, no KMS, no ServerSideEncryption in ExtraArgs at :389), yaml §17 explicitly delegates bucket hardening to caller.
- Suggested change: (a) Require ServerSideEncryption="aws:kms" (or at minimum AES256) in the ExtraArgs of s3_client.upload_file; (b) accept and forward an optional kms_key_id; (c) emit a loud warning (or refuse) if the target bucket does not have default encryption / Block Public Access; (d) add a cleanup() helper and document temp-file lifecycle; (e) write a minimal audit record (who/when/key) even if only to stderr JSON.
- Estimated effort: M
- Risk if ignored: Privacy Act breach, FEMA compliance exposure, loss of survivor trust, legal liability. The form is the Privacy Act release — this is the one repo in the fleet where "caller's problem" is the wrong answer.
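Suggestions (a) and (b) could look roughly like the sketch below. It assumes an injected client object exposing boto3's upload_file(Filename, Bucket, Key, ExtraArgs=...) method; build_extra_args and upload_encrypted are hypothetical names, not functions in the reviewed module.

```python
def build_extra_args(kms_key_id=None):
    """Build ExtraArgs that always request server-side encryption.

    With no key id, S3 uses the AWS-managed KMS key for the account.
    """
    extra_args = {"ServerSideEncryption": "aws:kms"}
    if kms_key_id:
        extra_args["SSEKMSKeyId"] = kms_key_id
    return extra_args


def upload_encrypted(s3_client, pdf_path, bucket, s3_key, kms_key_id=None):
    """Upload pdf_path with SSE enforced; the caller cannot opt out."""
    extra_args = build_extra_args(kms_key_id)
    s3_client.upload_file(pdf_path, bucket, s3_key, ExtraArgs=extra_args)
    return {
        "success": True,
        "bucket": bucket,
        "s3_key": s3_key,
        "encryption": extra_args["ServerSideEncryption"],
        "message": "Uploaded with server-side encryption",
    }
```

Passing the client in, rather than constructing it inside the function, also gives tests a seam: a fake object with an upload_file method is enough to check that the encryption arguments are always sent.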
A.4.2 P1 — No CI / no dependency manifest / no version pinning
- Problem: The repo has no pyproject.toml, no requirements.txt, no .github/workflows, no lockfile. Install is "pip install pymupdf". A PyMuPDF breaking change (e.g. widget API) will silently rot the script; there is no automated test run on push.
- Evidence: yaml §2 notable, yaml §16 pipeline: None, yaml §19 tech_debt row 1. Verified via the yaml note "No pyproject.toml / requirements.txt / Dockerfile / GitHub Actions in repo (verified 2026-04-09 via gh contents walk at commit 4e10f0d)".
- Suggested change: Add pyproject.toml with pinned pymupdf==X.Y.Z, boto3 as extras, and a GitHub Actions workflow running pytest + ruff on push/PR.
- Estimated effort: S
- Risk if ignored: Silent regressions; upgrade pain; no quality gate on future changes.
A.4.3 P1 — Bare except Exception masks failures and may log PII
- Problem: Three locations catch Exception generically and stringify str(e) into the returned message (:237-246, :316-322, :424-431). If PyMuPDF or boto3 raises with PII-containing paths or data in the exception message, that string goes into a result dict that callers may log. Additionally, :244 passes the full data dict back in the failure result.
- Evidence: fema_form_filler.py:237 (except Exception as e: ... f"Error filling form: {str(e)}"), :244 (data echoed on failure), :316 (base64), :424 (S3).
- Suggested change: Narrow the except clauses (fitz.FileDataError, OSError, etc.) and scrub PII from any error message/dict that flows to callers; log detail to a controlled sink only.
- Estimated effort: S
- Risk if ignored: PII leakage via logs; debugging gets harder because root causes get flattened into strings.
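One possible shape for the scrubbing half of this suggestion is sketched below. scrub_message and failure_result are hypothetical helpers, and plain substring replacement is an assumption: it only catches values that appear verbatim in the exception text.

```python
def scrub_message(message, data, placeholder="[REDACTED]"):
    """Replace any caller-supplied field value that appears in message."""
    scrubbed = message
    values = {str(v) for v in data.values() if v}
    # Longest values first so a short value never splits a longer one.
    for value in sorted(values, key=len, reverse=True):
        scrubbed = scrubbed.replace(value, placeholder)
    return scrubbed


def failure_result(exc, data):
    """Build a failure dict that names the fields but never echoes values."""
    return {
        "success": False,
        "error_type": type(exc).__name__,
        "message": scrub_message(str(exc), data),
        "fields_supplied": sorted(data),  # keys only, no PII
    }
```

For example, an OSError whose text embeds the survivor's name in an output path would surface to the caller with the name replaced by the placeholder, while error_type still tells the caller what kind of failure occurred.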
A.4.4 P1 — Silent default substitution of "Doe, John" into real PDFs
- Problem: fill() falls back to the built-in DEFAULT_DATA = {"name_last_first": "Doe, John", ...} (:67-73, :147-151) whenever a field is missing or empty. A partially-populated invocation from a caller silently emits a PDF with "Doe, John" baked into the raster. There is no required-field validation and no strict mode.
- Evidence: fema_form_filler.py:139-152.
- Suggested change: Add a strict: bool = False kwarg that raises on missing required fields; log a warning whenever defaults are substituted in non-strict mode; never auto-substitute in CI/production modes. Move DEFAULT_DATA to the test module.
- Estimated effort: S
- Risk if ignored: A production caller that accidentally drops a field produces a legally-ambiguous PDF with a stranger's name on it.
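A strict mode along these lines might be sketched as follows. Only name_last_first is taken from this review; the other field names in REQUIRED_FIELDS, the MissingFieldError class, and resolve_fields itself are assumptions made for illustration.

```python
import warnings

# Hypothetical required-field list; only name_last_first appears in the review.
REQUIRED_FIELDS = ("name_last_first", "date_of_birth", "address")


class MissingFieldError(ValueError):
    """Raised in strict mode when a required field is absent or empty."""


def resolve_fields(data, defaults, strict=False):
    """Return the data to render, refusing or warning on missing fields."""
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing and strict:
        raise MissingFieldError(f"missing required fields: {', '.join(missing)}")
    resolved = dict(data)
    for name in missing:
        # Non-strict mode keeps the old behavior but makes it visible.
        warnings.warn(f"substituting default for missing field {name!r}", stacklevel=2)
        resolved[name] = defaults[name]
    return resolved
```

The error and warning text carry field names only, so turning on strict mode does not itself introduce a new PII leak.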
A.5 Things That Don't Make Sense
- Observation: FIELD_MAP uses the key "print_name" internally (:60) but the public kwarg is name_first_last (:119). The mapping in fill() re-resolves this (:180).
    - Location: fema_form_filler.py:57-64 vs :116-124, :178-183
    - Hypotheses considered: Historical rename; field name mirrored from the PDF widget label.
    - Question for author: Is "print_name" vestigial from an earlier PDF revision? Can the key be renamed to "name_first_last" to eliminate the dictionary alias?
- Observation: Default test data is production code, not test code. DEFAULT_DATA (:67-73) ships in the production module and is used whenever a caller omits fields.
    - Location: fema_form_filler.py:67-73, :147-151
    - Hypotheses considered: Convenience for smoke testing; leftover dev scaffolding.
    - Question for author: Should DEFAULT_DATA move to test_fema_form_filler.py and the production fill() raise when a field is missing?
A.6 Anti-Patterns Detected
A.6.1 Code-level
- God object / god function
- Shotgun surgery (one change touches many files)
- Feature envy (method uses another class's data more than its own)
- Primitive obsession
- Dead code
- Copy-paste / duplication
- Magic numbers / unexplained constants
- Deep nesting (>3 levels)
- Long parameter lists (>4)
- Boolean-flag parameters that change behavior
A.6.2 Architectural
- Big ball of mud
- Distributed monolith (micro-services that must deploy in lockstep)
- Chatty services (N+1 at service boundary)
- Leaky abstraction / inappropriate intimacy between layers
- Golden hammer (one tool used for everything)
- Vendor lock-in without exit strategy
- Stovepipe / reinvented wheel
- Missing seams for testing (hard-coded clocks, network, filesystem)
A.6.3 Data
- God table
- EAV (entity-attribute-value) abuse
- Missing indexes on hot queries
- N+1 queries
- Unbounded growth / no retention policy
- Nullable-everything schemas
- Implicit coupling via shared database
A.6.4 Async / Ops
- Poison messages with no dead-letter queue
- Retry storms / no backoff
- Missing idempotency keys on non-idempotent ops
- Hidden coupling via shared state
- Work queues without visibility / depth metrics
A.6.5 Security
- Secrets in code, .env committed, or logs
- Missing authn/z on internal endpoints
- Overbroad IAM roles / least-privilege violations
- Unvalidated input crossing a trust boundary
- PII/PHI in logs or error messages
- Missing CSRF / XSS / SQLi / SSRF protections where relevant
A.6.6 Detected Instances
| # | Anti-pattern | Location (file:line) | Severity (P0/P1/P2) | Recommendation |
|---|---|---|---|---|
| 1 | Magic numbers (text offset +2/+9, DPI 50/600, fontsize 8) | fema_form_filler.py:109, :194, :199 | P2 | Named constants (TEXT_X_OFFSET, TEXT_Y_OFFSET, MIN_DPI, MAX_DPI, TEXT_POINT_SIZE) with a comment explaining why +2/+9 fits inside the widget rect. |
| 2 | Missing seams for testing — datetime.now() called directly (:151), filesystem path hard-coded (:76-77) | fema_form_filler.py:151, :76-77 | P2 | Inject a clock (now_fn) and a base path; trivial to mock without monkeypatching datetime. |
| 3 | Unbounded growth / no retention on output PDFs or S3 objects | fema_form_filler.py:76-77, :223, :325-431 | P1 | Caller-facing cleanup API; documented bucket lifecycle requirement; refuse to overwrite without --force. |
| 4 | Implicit overbroad IAM expectations on S3 upload | fema_form_filler.py:373-398 | P1 | Document the minimum IAM (s3:PutObject only) in README; ship a sample bucket policy. |
| 5 | PII in error messages via str(e) and echoed data dict | fema_form_filler.py:237-246, :316-322, :424-431 | P1 — Privacy Act sensitive | Narrow the except, scrub str(e) before surfacing, and never include the data dict in the returned error path unredacted. |
| 6 | Silent default substitution of "Doe, John" into real PDFs | fema_form_filler.py:67-73, :147-151 | P1 | Add strict mode; warn when defaults used. Cross-ref A.4.4. |
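Rows 1 and 2 of the table could be addressed together, roughly as sketched here. The constant names follow the recommendation in row 1; treating +2 as the x offset and +9 as the y offset, the date format, and the helpers validate_dpi, text_origin, and today_string are all assumptions.

```python
from datetime import datetime

TEXT_X_OFFSET = 2    # points right of the widget rect's left edge (assumed axis)
TEXT_Y_OFFSET = 9    # points below the widget rect's top edge (assumed axis)
TEXT_POINT_SIZE = 8  # font size the offsets were presumably tuned for
MIN_DPI = 50
MAX_DPI = 600


def validate_dpi(dpi):
    """Reject DPI values outside the supported range."""
    if not MIN_DPI <= dpi <= MAX_DPI:
        raise ValueError(f"dpi must be between {MIN_DPI} and {MAX_DPI}, got {dpi}")
    return dpi


def text_origin(rect_x0, rect_y0):
    """Return the insertion point for text inside a widget rect."""
    return (rect_x0 + TEXT_X_OFFSET, rect_y0 + TEXT_Y_OFFSET)


def today_string(now_fn=datetime.now, fmt="%m/%d/%Y"):
    """Format the current date; now_fn is the injected clock."""
    return now_fn().strftime(fmt)
```

A test can then pin the date by passing a fixed clock, for instance today_string(now_fn=lambda: datetime(2026, 4, 9)), without monkeypatching datetime.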
A.7 Open Questions
- Q: Which caller(s) actually use upload_to_s3, which bucket, which region, and what is that bucket's encryption/lifecycle/IAM policy?
    - Blocks: A.6.6 #4, A.11 Information Disclosure row
    - Who can answer: af-backend-go-api owner; infra/compliance
- Q: Is "print_name" key vestigial? Safe to rename?
    - Blocks: A.5 #1
    - Who can answer: Gordon Zheng (original author)
- Q: Are there any plans to support an e-signature path (DocuSign/Adobe Sign)? yaml §19 lists it as tech debt.
    - Blocks: Roadmap clarity (A.2 #19)
    - Who can answer: Product/compliance
A.8 Difficulties Encountered
- Difficulty: No CI artifacts, no coverage report, no runtime metrics. I could not verify that any caller actually invokes this module today.
    - Impact on analysis: A.2 #13 (module interactions) and A.9.2 below are based on yaml §13's "Likely invoked via…" language rather than grepped call sites.
    - Fix that would help next reviewer: A short USAGE.md in this repo (or pointer in README) naming the concrete caller(s), their invocation mode (subprocess vs import), and the target S3 bucket.
- Difficulty: The input PDF template (fema_form_ff.pdf) is a binary checked into the repo and I did not render or validate it.
    - Impact on analysis: Cannot verify the 5 widget names FIELD_MAP targets are still present in the current FEMA template, or that the +2/+9 offset still lands inside each rect after any template rev.
    - Fix that would help next reviewer: A golden-image test: render page 1 of the output PDF and compare hashes.
- Difficulty: No running environment; I did not execute the tests.
    - Impact on analysis: A.13 coverage fields are unknown.
    - Fix: Publish a coverage badge or pytest --cov artifact.
A.9 Risks & Unknowns
A.9.1 Known risks
| # | Risk | Likelihood (L/M/H) | Impact (L/M/H) | Mitigation |
|---|---|---|---|---|
| 1 | FEMA reissues FF-104 template; widget names shift; output PDF silently misfiles data into wrong fields | M | H | Golden-image tests; assert len(filled_fields) == 5 and raise if not. Currently fields_filled count is returned but never asserted. |
| 2 | PyMuPDF major version bump breaks widget/pixmap APIs | M | M | Pin version; add CI. |
| 3 | S3 target bucket is public or unencrypted because the caller passed in a mis-hardened bucket | L–M | H (Privacy Act) | See A.4.1. |
| 4 | PII captured in error message field and logged upstream | M | M–H | Narrow except clauses; scrub. |
| 5 | Default DEFAULT_DATA ("Doe, John") silently baked into a real survivor's PDF due to a dropped field upstream | L | H | Strict mode; cross-ref A.4.4. |
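The mitigation in row 1 amounts to turning the returned count into a hard check. The sketch below assumes fields_filled is an integer key in the result dict, which this review implies but does not confirm; TemplateDriftError and check_fields_filled are hypothetical names.

```python
EXPECTED_FIELD_COUNT = 5  # the review states that 5 fields are filled


class TemplateDriftError(RuntimeError):
    """Raised when the template no longer exposes every expected widget."""


def check_fields_filled(result, expected=EXPECTED_FIELD_COUNT):
    """Raise if the fill step touched fewer (or more) widgets than expected."""
    filled = result.get("fields_filled")
    if filled != expected:
        raise TemplateDriftError(
            f"expected {expected} filled fields, got {filled}; "
            "the PDF template may have changed"
        )
    return result
```

Returning the result unchanged lets the check be chained directly onto the existing fill call.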
A.9.2 Unknown unknowns
- Area not reviewed: The actual FEMA PDF template (fema_form_ff.pdf). I did not render it or diff it against the current FEMA.gov FF-104 revision.
    - Reason: Binary; no rendering environment in this review.
    - Best guess at risk level: Medium. Template drift is the most likely way this utility fails silently in production.
- Area not reviewed: Real caller integration. No grep across af-backend-go-api or the agent repos was performed from this review.
    - Reason: Scope limited to this repo + its yaml.
    - Best guess at risk level: Medium. I'm taking yaml §13 on faith.
- Area not reviewed: The test file test_fema_form_filler.py was not read line-by-line; I counted classes from the yaml summary.
    - Reason: Time boxed.
    - Best guess at risk level: Low. The class inventory suggests broad coverage, but I cannot attest to it.
- Area not reviewed: The 4 companion markdown files (api-examples, data-flow, runbook, deployment) were not opened.
    - Reason: Not load-bearing for code-level findings.
    - Best guess at risk level: Low.
A.10 Technical Debt Register
| # | Debt item | Quadrant | Estimated interest | Remediation |
|---|---|---|---|---|
| 1 | No CI, no dependency manifest, no version pinning | Reckless & Inadvertent | High — silent breakage on any PyMuPDF upgrade; no quality gate on PRs | Add pyproject.toml with pinned deps + GitHub Actions running pytest. Effort: S. |
| 2 | S3 upload path has no enforced SSE / IAM scoping / audit trail for Privacy Act PII | Reckless & Deliberate (yaml §17 explicitly punts to caller) | High — single compliance incident dwarfs the fix cost | Require SSE-KMS in ExtraArgs; document minimum IAM; add audit line. Effort: M. |
| 3 | Bare except Exception with str(e) surfacing PII-laden error messages | Prudent & Inadvertent | Medium — PII leak risk, debugging friction | Narrow except clauses; scrub messages. Effort: S. |
| 4 | DEFAULT_DATA lives in production module and silently substitutes | Reckless & Inadvertent | Medium — correctness hazard in partial-data invocations | Move to tests; add strict mode. Effort: S. |
| 5 | Hard-coded field mappings tied to one FEMA template revision | Prudent & Deliberate (yaml §19) | Medium — one FEMA template rev away from failure | Golden-image test + assertion on fields_filled == 5. Effort: S. |
| 6 | Magic constants (DPI bounds, text offsets, font size) | Prudent & Inadvertent | Low — readability | Named constants. Effort: S. |
| 7 | E-signature alternative not explored (yaml §19) | Prudent & Deliberate | Low today, medium if FEMA permits e-sig for FF-104 | Product decision; out of scope for code. |
A.11 Security Posture (lightweight STRIDE)
| Category | Threat present? | Mitigated? | Gap |
|---|---|---|---|
| Spoofing (identity) | Low — no auth surface in-process; survivor identity is established upstream | N/A here | Caller is responsible |
| Tampering (integrity) | Medium — rasterization prevents field edits, but the flattened PDF itself is unsigned | Partial — raster flatten (:205-220) | No digital signature / hash of output; no tamper-evident wrapper |
| Repudiation (non-repudiation) | Yes — no audit log of who generated which PDF with which data | No | Add a structured audit line (user id + timestamp + output hash) |
| Information Disclosure | Yes — primary concern. PII in the PDF, in S3 uploads, in error strings | Partial | See A.4.1, A.4.3; no SSE enforced; no log scrubbing |
| Denial of Service | Low — single-shot sync; <1s; no network listener | Implicit | N/A |
| Elevation of Privilege | Low in-process; S3 creds come from caller | N/A | Document least-privilege IAM |
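The Repudiation row's "structured audit line" could take a form like the sketch below. The record layout, the event name, and the helpers audit_record and emit_audit are assumptions; the source only asks for user id, timestamp, and output hash.

```python
import hashlib
import json
import sys
from datetime import datetime, timezone


def _utc_now():
    """Default clock: timezone-aware current time in UTC."""
    return datetime.now(timezone.utc)


def audit_record(user_id, pdf_bytes, now_fn=_utc_now):
    """Describe one generated PDF without including any form contents."""
    return {
        "event": "ff104_generated",  # hypothetical event name
        "user_id": user_id,
        "timestamp": now_fn().isoformat(),
        "output_sha256": hashlib.sha256(pdf_bytes).hexdigest(),
    }


def emit_audit(record, stream=None):
    """Write the record as one JSON line; defaults to stderr."""
    target = stream if stream is not None else sys.stderr
    target.write(json.dumps(record, sort_keys=True) + "\n")
```

Hashing the output bytes also gives a partial answer to the Tampering row: a later copy of the PDF can be checked against the logged digest.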
A.12 Operational Readiness
| Capability | Present / Partial / Missing | Notes |
|---|---|---|
| Structured logs | Missing | Plain print to stdout in CLI main (:528-535); no structured logger. |
| Metrics | Missing | — |
| Distributed tracing | Missing | — |
| Actionable alerts | Missing | Caller-owned per yaml §14. |
| Runbooks | Present | runbook.md companion. |
| On-call ownership defined | Missing | No CODEOWNERS; yaml lists a single author email. |
| SLOs / SLIs | Missing | Not meaningful for a one-shot util, but target runtime would be cheap to commit to. |
| Backup & restore tested | N/A | Stateless. |
| Disaster recovery plan | N/A | Stateless; input PDF is in-repo. |
| Chaos / failure testing | Missing | — |
A.13 Test & Quality Signals
- Coverage (line / branch): Unknown; no coverage artifact, no CI (yaml §15 coverage_pct: null).
- Trend: Unknown (single commit history).
- Flake rate: Unknown.
- Slowest tests: Unknown; the integration test renders the actual PDF and is presumably the slowest.
- Untested critical paths: Template drift (what happens if a widget is missing from a new FEMA template — no test asserts the count), strict/partial-data failure modes, S3 error paths beyond the mocked happy path.
- Missing test types: [ ] unit (present) [ ] integration (present, in-process) [x] e2e (only the default-data smoke test) [x] contract [x] load [x] security/fuzz
A.14 Performance & Cost Smells
- Hot paths: Fill + flatten at fema_form_filler.py:172-223; I/O bound, <1s per invocation.
- Suspected bottlenecks: Pixmap rendering at 150 DPI (:211); acceptable for a 2-page form.
- Wasteful queries / loops: None material; the nested widget loop at :187-203 is O(pages × widgets), which is tiny.
- Oversized infra / idle resources: N/A; runs where the caller runs.
- Cache hit/miss surprises: N/A.
A.15 Bus-Factor & Knowledge Risk
- Who is the only person who understands X? Gordon Zheng (yaml authors, single-commit history).
- What breaks if they disappear tomorrow? Re-deriving the widget-name mapping when FEMA revises the template; understanding why the +2/+9 offset was chosen.
- What is undocumented tribal knowledge? The offset/DPI calibration; the rationale for the rasterize-to-flatten approach vs PyMuPDF's native remove_widgets; the choice of which fields to leave blank.
- Suggested knowledge-transfer actions: (a) Comment the offset derivation with a line about "empirically chosen so 8pt Helvetica sits inside the widget rect on this template"; (b) add a second reviewer via CODEOWNERS; (c) record a 5-minute Loom.
A.16 Compliance Gaps
| Regulation | Requirement | Status | Gap | Remediation |
|---|---|---|---|---|
| Privacy Act of 1974 (as applied to FEMA records) | Minimum-necessary collection, secure transmission/storage, auditability of disclosure | Partial | No in-repo enforcement of SSE/IAM/audit logging; error strings may leak PII; rasterized PDFs live in cwd by default | See A.4.1; add audit line + scrubbed errors + enforced SSE-KMS on upload. |
| AWS Well-Architected — Security | SSE, Block Public Access, least-privilege IAM, logged writes | Missing (in this repo; delegated to caller) | No enforcement; no sample policy shipped | Ship a CloudFormation/Terraform snippet of the expected bucket + IAM policy in deployment.md. |
| Data minimization | Only collect what is needed | Present | Only 5 fields collected; SSN intentionally excluded; place_of_birth left blank | Preserve. Make DEFAULT_DATA dev-only to avoid shadow minimization violations (cross-ref A.4.4). |
A.17 Recommendations Summary
| Priority | Action | Owner (suggested) | Effort | Depends on |
|---|---|---|---|---|
| P0 | Enforce SSE-KMS and scoped IAM on upload_to_s3; add audit line; document required bucket policy (A.4.1, A.6.6 #4, A.16) | Module owner + infra/compliance | M | Confirmed target bucket (A.7 Q1) |
| P0 | Scrub PII from error strings and narrow except Exception clauses (A.4.3, A.6.6 #5) | Module owner | S | — |
| P1 | Add strict mode and kill silent DEFAULT_DATA substitution in production paths (A.4.4, A.6.6 #6) | Module owner | S | — |
| P1 | Add CI (GitHub Actions running pytest + ruff), pyproject.toml, pinned PyMuPDF (A.4.2, A.10 #1) | Module owner | S | — |
| P1 | Golden-image test + assertion on fields_filled == 5 to catch FEMA template drift (A.9.1 #1, A.10 #5) | Module owner | S | — |
| P1 | Document the concrete caller(s) and target S3 bucket in README (A.8) | Module owner + af-backend-go-api owner | S | — |
| P2 | Replace magic numbers with named constants; inject clock for testability (A.6.6 #1, #2) | Module owner | S | — |
| P2 | Rename print_name key to name_first_last (A.5 #1) | Module owner | S | Author confirmation |
| P2 | Explore e-signature alternative if FEMA rules permit (yaml §19) | Product | L | Compliance decision |
Environment variables
| Name | Purpose |
|---|---|
| AWS_ACCESS_KEY_ID | S3 upload (optional) |
| AWS_SECRET_ACCESS_KEY | S3 upload (optional) |
| AWS_DEFAULT_REGION | S3 upload (optional) |
