⚕️ HIPAA Compliance Architecture

As a Healthcare AI Systems Engineer, I treat HIPAA not as a legal footnote bolted on after implementation, but as a first-order force that shapes my architecture, my delivery process, and my release gates. Every design choice is filtered through one hard question: does this protect electronic protected health information (ePHI) while still helping clinicians move faster and decide better? I build so that privacy, security, and operational reliability are the default behavior of the system, not a set of extra steps a tired human has to remember at 2 a.m. under pressure.

This page is written as an engineering thesis. Where I make a claim about what HIPAA requires, I point at the regulation itself (the Privacy Rule, the Security Rule, the Breach Notification Rule, and the HITECH/Omnibus extensions) so the reasoning is inspectable instead of asserted. The goal is simple: demonstrate that I can turn the language of 45 CFR Part 164 into software behavior that is testable, observable, and defensible in an audit.

Primary Sources

HHS HIPAA for Professionals

45 CFR Part 164 (eCFR)

🏛️ The Regulatory Foundation I Build On

HIPAA is not one rule; it is a family of interlocking rules, and good healthcare engineering respects all of them at once. I keep the actual regulatory structure in my head while designing, because each rule answers a different question: who may touch the data, how it must be protected, and what happens when something goes wrong. Naming the specific citation is not pedantry — it is how a control survives scrutiny, because an auditor can trace it back to the exact obligation it satisfies.

Privacy Rule (45 CFR Part 164, Subpart E): governs who may use or disclose protected health information and under what conditions, anchored by the minimum necessary standard at §164.502(b) and §164.514(d).
Security Rule (45 CFR Part 164, Subpart C): mandates administrative (§164.308), physical (§164.310), and technical (§164.312) safeguards for electronic PHI, each split into "required" and "addressable" implementation specifications.
Breach Notification Rule (45 CFR §§164.400-414): defines what counts as a breach, the four-factor risk assessment, and notification to individuals, HHS, and in some cases the media within strict timelines.
HITECH Act (2009) and the Omnibus Final Rule (2013): extended direct liability to business associates and their subcontractors, raised civil penalty tiers, and tightened breach accountability across the data supply chain.

Read the Rules

HHS Security Rule

NIST SP 800-66 Rev. 2

🔐 Secure Systems, not Compliance Theater

A great deal of "HIPAA compliance" in the wild is theater: a badge on a marketing page, a policy PDF nobody reads, a checkbox checked the week before an audit. I work the opposite way. A control is only real if you can point to the obligation it satisfies, the code path that enforces it, and the telemetry that proves it operated. That traceability is what lets a security reviewer, a compliance officer, and an engineer all look at the same system and agree it does what it claims.

A note on intellectual honesty, because trust depends on it: this page describes engineering practice, not legal advice. The controls I design are validated alongside compliance and legal counsel, because the regulation deliberately leaves room for risk-based judgment. My job is to make those judgments implementable, observable, and reversible, and to never let a confident demo stand in for a verified control.

⚙️ How HIPAA Changes My Engineering Decisions

In healthcare AI, raw speed is not a success metric. A workflow only succeeds if it is useful to clinicians, safe for patients, and defensible in an audit — all three, simultaneously. I make compliance concrete by translating regulatory language into system behavior that can be tested in CI and observed in production, rather than promised in a document.

Minimum necessary by default: every service boundary and UI query returns only the ePHI fields the current task requires, never the full patient record, operationalizing §164.502(b) in code instead of policy prose.
Role-scoped access control at multiple layers: identity-provider claims, API authorization policies, and database row/column rules must all agree before protected data is released, so a single misconfigured layer cannot leak a record.
Encryption in transit and at rest with explicit key ownership, defined rotation windows, and a hard separation between application operators and key-management authority, aligning with the addressable specifications at §164.312(a)(2)(iv) and §164.312(e)(2)(ii).
Comprehensive, immutable audit trails for every ePHI access — who accessed what, why, from which workflow, and whether the access succeeded or failed — satisfying the audit-controls standard at §164.312(b) and supporting accounting-of-disclosures obligations.
Data-retention and deletion policies encoded into scheduled jobs, not just policy documents, so ePHI is not retained past the legal and clinical need and disposal is provable.
Deterministic policy gates that never depend on model correctness: an LLM may draft, summarize, or route, but it never holds the authorization decision that releases protected data.
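The minimum-necessary item above can be sketched as a field projection applied at a service boundary. This is a minimal illustration, not a definitive implementation: the task names, field names, and the `ALLOWED_FIELDS` mapping are hypothetical, and a real allowlist would come from a reviewed policy source.

```python
# Hypothetical task-to-field allowlist (illustrative names only).
ALLOWED_FIELDS = {
    "triage": frozenset({"patient_id", "chief_complaint", "vitals"}),
    "billing": frozenset({"patient_id", "insurance_id", "procedure_codes"}),
}


class UnknownTaskError(Exception):
    """Raised when a caller names a task that has no approved field list."""


def project_minimum_necessary(record: dict, task: str) -> dict:
    """Return only the fields the named task is approved to receive."""
    allowed = ALLOWED_FIELDS.get(task)
    if allowed is None:
        # Fail closed: an unrecognized task gets nothing, not everything.
        raise UnknownTaskError(task)
    return {key: value for key, value in record.items() if key in allowed}
```

The design choice worth noting is the fail-closed branch: a task without an approved field list raises an error instead of falling back to the full record.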

🧱 Concrete Examples of Software Respecting ePHI

Respecting ePHI means software intentionally limits exposure at every step of the lifecycle: collection, processing, inference, storage, observability, and support operations. Each example below is a place where the safe behavior is the built-in default, not a discipline the operator has to supply.

Clinical summarization agent: middleware strips direct identifiers before prompt assembly when they are not clinically required, injects only the relevant chart sections, and disables prompt/response logging the moment protected values enter the context window.
Nurse triage inbox: the UI masks phone, DOB, and address by default, then reveals each field only on an explicit, role-checked user action tied to the active encounter — and writes that reveal to the audit log.
Model observability pipeline: traces persist tool timing, model version, token cost, and rubric scores, but replace patient identifiers with irreversible, salted tokens so debugging stays useful without ever exposing raw ePHI in telemetry.
Support tooling: internal admin consoles issue just-in-time access grants with auto-expiration and mandatory reason codes, so every privileged ePHI lookup is time-boxed, attributable, and reviewable after the fact.
File export workflows: generated PDFs are encrypted, watermark-stamped with the requesting identity, and signed, with each download event logged so outbound movement of ePHI is provable rather than assumed.
Incident-response automation: a suspected overexposure event triggers immediate scoped access revocation, a targeted re-audit of the affected records, and a notification workflow pre-aligned with the Breach Notification Rule decision tree.
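As one illustration of the observability example above, identifier tokenization for traces might look like the sketch below. It uses a keyed HMAC rather than a plain salted hash, which is my substitution: the key plays the role of the secret salt. The function names, the `tok_` prefix, and the trace fields are assumptions for illustration.

```python
import hashlib
import hmac


def tokenize_identifier(patient_id: str, secret_key: bytes) -> str:
    """Map a patient identifier to a stable keyed token for telemetry."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    # Truncated hex digest; the raw identifier never appears in the token.
    return "tok_" + digest.hexdigest()[:32]


def build_trace(patient_id: str, model_version: str, latency_ms: int,
                secret_key: bytes) -> dict:
    """Assemble a trace record that carries a token instead of the identifier."""
    return {
        "patient_token": tokenize_identifier(patient_id, secret_key),
        "model_version": model_version,
        "latency_ms": latency_ms,
    }
```

Because the same identifier and key always yield the same token, traces for one patient can still be correlated during debugging. The token resists reversal only while the key stays secret, so the key belongs with the key-management authority and not with application operators.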

🛡️ The Security Rule, Translated Into Systems

The HIPAA Security Rule organizes its requirements into three safeguard families (administrative, physical, and technical) and marks each implementation specification as either "required" or "addressable." Addressable does not mean optional: under §164.306(d)(3), I must assess whether the safeguard is reasonable and appropriate, then either implement it, adopt a documented equivalent alternative, or record a defensible risk-based reason not to. The sections below map each family to the concrete engineering work I do, with the governing citation attached so the lineage from regulation to implementation is never broken.

🗂️ Administrative Safeguards — §164.308

Risk analysis, access management, training, and incident procedures

🏢 Physical Safeguards — §164.310

Facility, workstation, and device/media controls

🔧 Technical Safeguards — §164.312

Access control, audit, integrity, authentication, transmission security

🕵️ De-identification, Done Correctly

The single highest-leverage way to reduce HIPAA risk is to stop handling identifiable data wherever the clinical or product need does not require it. But de-identification is a precise legal-technical standard, not a hand-wave. HIPAA recognizes exactly two methods under §164.514: Expert Determination and Safe Harbor. I treat de-identification as a first-class, separately audited service — never as a regex sprinkled through the codebase — because getting it almost right is the same as getting it wrong.

🔒 Safe Harbor — the 18 Identifiers (§164.514(b)(2))

Remove all 18 identifier categories, and have no actual knowledge that the remaining information could identify an individual

🎓 Expert Determination & the "Actual Knowledge" Caveat (§164.514(b)(1))

Statistical de-identification and the limits of both methods
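To make the Safe Harbor method concrete, here is a hedged sketch of three of its generalization rules as I understand them from §164.514(b)(2): dates reduced to the year, ages over 89 collapsed into a single "90 or older" category, and ZIP codes truncated to three digits (or replaced with 000 where the three-digit area holds 20,000 or fewer people). It covers only a small part of the 18 categories. The function names are hypothetical, and `RESTRICTED_ZIP3` is an illustrative placeholder that would need to be built from current Census data.

```python
from datetime import date

# Illustrative placeholder only: 3-digit ZIP prefixes whose combined area
# holds 20,000 or fewer people. Must be derived from current Census data.
RESTRICTED_ZIP3 = frozenset({"036", "059"})


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or return 000 for low-population areas."""
    prefix = zip_code[:3]
    return "000" if prefix in RESTRICTED_ZIP3 else prefix


def generalize_age(age: int) -> str:
    """Collapse every age over 89 into one '90+' category."""
    return "90+" if age >= 90 else str(age)


def generalize_date(value: date) -> int:
    """Drop month and day, keeping only the year."""
    return value.year
```

This sketch does not handle the related requirement that date elements indicating an age over 89 are also removed, and it does nothing about free-text fields, which is one reason de-identification is better treated as a separately audited service than as a handful of helpers.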

🤖 Where AI Breaks HIPAA — and How I Stop It

Large language models introduce failure modes that traditional healthcare software never had. A model can memorize training data, a vector index can reconstruct source text, a debug log can quietly capture an entire prompt full of identifiers, and an agent can be coaxed into exfiltrating data through a tool call. These are not hypothetical — they are the predictable ways an AI system leaks ePHI. I design against each one explicitly, because the model is the least trustworthy component in the stack and must be treated that way.

Prompt and completion logging: the most common accidental ePHI sink in LLM products. I classify protected fields explicitly so logging, tracing, and replay tooling drop or tokenize them before anything is persisted.
Training and fine-tuning leakage: protected data never enters a training or fine-tuning corpus without de-identification plus a signed business associate agreement covering the model provider, and ideally never at all.
Vector store exposure: embeddings can be inverted to recover much of their source text, so retrieval indexes are scoped, access-controlled, and built from minimum-necessary chunks. I treat an embedding of ePHI as ePHI.
RAG retrieval boundaries: retrieval is filtered by the requesting user’s authorization before documents reach the context window, so the model can never surface a record the user was not entitled to see.
Tool-call exfiltration: agent tool outputs are schema-validated and egress-filtered so a model cannot smuggle identifiers into a web request, email, or downstream API call.
Third-party model providers: inference happens only through providers under a BAA with zero-retention terms, or through self-hosted models when the data sensitivity warrants keeping inference fully in-boundary.
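The RAG retrieval boundary in the list above can be sketched as a filter that runs after retrieval and before prompt assembly. This is a minimal illustration under stated assumptions: the `Chunk` type, the per-patient authorization set, and both function names are hypothetical, and a real system would resolve authorization from the identity and policy layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage tagged with the patient it belongs to."""
    patient_id: str
    text: str


def filter_retrieved(chunks: list, authorized_patients: set) -> list:
    """Drop every chunk the requesting user is not entitled to see."""
    return [c for c in chunks if c.patient_id in authorized_patients]


def assemble_context(chunks: list, authorized_patients: set) -> str:
    """Build the model context from authorized chunks only."""
    permitted = filter_retrieved(chunks, authorized_patients)
    return "\n".join(c.text for c in permitted)
```

The filter is plain deterministic code that runs outside the model, so a prompt cannot talk its way past it: text the user is not authorized for never reaches the context window in the first place.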

🧩 The Core Principle: the Model Is Not a Trust Boundary

Every AI-specific safeguard I build comes back to one principle: a probabilistic system must never be the thing that enforces a security guarantee. The model can draft a clinical summary, suggest a triage priority, or route a message — but the decision to release a record, the redaction of an identifier, and the logging of an access are all deterministic gates that sit outside the model and cannot be prompted away. That separation is what lets a team ship genuinely useful AI in a regulated environment without betting patient privacy on a model behaving correctly under an adversarial prompt. The intelligence lives in the model; the trust lives in the boundaries around it.

🏗️ Architecture Patterns I Apply in Healthcare AI

I structure healthcare systems so compliance survives real-world pressure — shift handoffs, incident response, on-call debugging, and production changes shipped at speed. The objective is to make the safe path the easiest path for engineers, clinicians, and operators alike, so the system stays trustworthy precisely when conditions are worst.

Treat HIPAA as an architecture constraint, not a final compliance checklist — privacy and security boundaries are drawn on the first diagram, not retrofitted before an audit.
Place policy enforcement close to the data and repeat it at every trust boundary, so a single missed guard degrades to reduced exposure rather than full disclosure (defense in depth).
Separate deterministic controls from probabilistic behavior: authorization, redaction, and logging gates must never depend on a model producing the cooperative output.
Build de-identification and re-identification as explicit, separately authorized services with their own audit trails — never as ad-hoc utility functions scattered across the codebase.
Require end-to-end lineage from user action to datastore read, so an audit can reconstruct exactly how a given piece of ePHI moved through the system and why.
Design for the worst day: assume a credential will leak, a dependency will be compromised, and an engineer will be paged at 3 a.m. — and make the safe path the path of least resistance under that pressure.
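The audit-trail and lineage patterns above depend on logs that resist after-the-fact edits. One common way to make tampering detectable is a hash chain, sketched below. This is my illustrative choice of technique, not a description of a specific production system, and the entry fields and function names are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # Placeholder "previous hash" for the first entry.


def _hash_body(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log: list, actor: str, resource: str, reason: str,
                 success: bool) -> dict:
    """Append an access record that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"actor": actor, "resource": resource, "reason": reason,
            "success": success, "prev_hash": prev_hash}
    entry = {**body, "hash": _hash_body(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True only if every entry still matches its recorded hash."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or _hash_body(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this makes edits detectable but does not prevent them. In practice it would be paired with write-once storage or an external anchor for the latest hash.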

🔩 Implementation-Level Safeguards for ePHI

At the implementation level, I enforce strict boundary controls so that protected data cannot slip through a gap that a higher-level policy assumed was closed. The list below is the kind of mechanical, testable enforcement that turns a privacy intention into a property of the running system.

API schemas reject unapproved ePHI fields at the contract layer, so an over-broad query fails validation instead of quietly returning protected data.
Queue and event payload contracts disallow full-record replication; downstream consumers receive references and minimum-necessary fields, not copies of the chart.
ETL jobs validate redaction before any analytics sink accepts an event, so de-identification failures stop the pipeline rather than polluting the warehouse.
Model-serving paths classify protected fields explicitly so prompt assembly is structurally incapable of including a disallowed identifier.
Observability sampling and retention are tuned to preserve operational signal while suppressing sensitive values, and trace stores inherit the same access controls as production data.
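As a sketch of the contract-layer rejection described in the first item above, a response validator can fail closed when a payload carries any field outside the approved set. The field names, the `ContractViolation` exception, and the function name are hypothetical. A real service would more likely express this in its schema tooling than in hand-written code.

```python
# Hypothetical approved contract for one endpoint (illustrative names).
APPROVED_RESPONSE_FIELDS = frozenset(
    {"encounter_id", "triage_priority", "chief_complaint"}
)


class ContractViolation(ValueError):
    """Raised when a payload carries fields outside the approved contract."""


def validate_response(payload: dict) -> dict:
    """Reject the whole payload if it contains any unapproved field."""
    unapproved = set(payload) - APPROVED_RESPONSE_FIELDS
    if unapproved:
        # Report field names only, never values, so the error itself
        # cannot leak protected data into logs.
        raise ContractViolation(f"unapproved fields: {sorted(unapproved)}")
    return payload
```

Rejecting the payload, instead of silently dropping the extra fields, is deliberate here: an over-broad query becomes a visible failure that someone has to fix.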

🤝 Business Associate Agreements & the Shared-Responsibility Model

The moment ePHI flows to a vendor — a cloud provider, a model API, an analytics tool, a logging service — that vendor becomes a business associate, and HITECH made business associates and their subcontractors directly liable under the Security Rule. A signed Business Associate Agreement (§164.504(e)) is the contractual floor, not the ceiling. I design integrations so the responsibility split is explicit and enforced in code, not just promised on paper.

No BAA, no ePHI: a vendor without a signed BAA never receives protected data, and the architecture enforces that boundary rather than relying on engineers to remember it.
Subprocessor awareness: I trace where data actually flows, including the subcontractors behind a vendor, because the chain of liability follows the data all the way down.
Zero-retention and scoped-use terms with AI providers, so prompts and completions are not retained, logged, or used for training beyond the agreed purpose.
Egress controls at integration boundaries: only minimum-necessary fields cross to a third party, validated by schema, so a misconfigured call cannot ship a full record.
Vendor security evidence (SOC 2, HITRUST, penetration-test results) reviewed as part of selection, with inherited controls documented rather than assumed.
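The "No BAA, no ePHI" rule above can be enforced in code with a gate in front of every outbound integration. The sketch below assumes a hypothetical in-memory `VENDOR_REGISTRY`, made-up vendor names, and a caller-supplied `transport` function. In practice the registry would be backed by contract records.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Vendor:
    name: str
    baa_signed: bool


class EgressBlocked(Exception):
    """Raised when data may not be sent to the named vendor."""


# Hypothetical registry with illustrative vendor names.
VENDOR_REGISTRY = {
    "example-llm-api": Vendor("example-llm-api", baa_signed=True),
    "example-analytics": Vendor("example-analytics", baa_signed=False),
}


def send_to_vendor(vendor_name: str, payload: dict, contains_ephi: bool,
                   transport: Callable[[str, dict], None]) -> None:
    """Send a payload only if the vendor is known and the BAA rule is met."""
    vendor = VENDOR_REGISTRY.get(vendor_name)
    if vendor is None:
        raise EgressBlocked(f"unknown vendor: {vendor_name}")
    if contains_ephi and not vendor.baa_signed:
        raise EgressBlocked(f"no BAA on file for {vendor_name}")
    transport(vendor_name, payload)
```

The gate is only as good as the `contains_ephi` classification it is given, so that flag should come from explicit field classification and not from a caller's guess.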

Comparable Compliance Work

Finequities SOC 2

Agentic Sensitive-Data Pipeline

🚨 Breach Response, Audit Trails & Evidence

A trustworthy system assumes that something will eventually go wrong and is engineered to detect it, contain it, and prove what happened. The Breach Notification Rule sets hard obligations — and hard clocks — so the audit infrastructure has to exist before the incident, not after. I build the telemetry and response paths up front so that a suspected exposure becomes an answerable question instead of a panic.

Notification timelines (§164.404): affected individuals must be notified without unreasonable delay and no later than 60 days after discovery of a breach, which only works if discovery is fast and scoping is precise.
HHS and media notice (§164.406, §164.408): a breach involving more than 500 residents of a state or jurisdiction triggers media notice, and a breach involving 500 or more individuals triggers HHS notice at the same time as individual notice, so the system must quantify blast radius accurately.
Four-factor risk assessment (§164.402): immutable audit trails let the team evaluate the nature of the data, who accessed it, whether it was actually acquired or viewed, and the extent of mitigation — with evidence, not guesses.
Encryption safe harbor: ePHI encrypted in line with HHS guidance (which points to NIST standards) is not "unsecured" PHI, so its loss generally does not trigger breach notification as long as the keys were not compromised too. That is why strong, well-managed encryption is a compliance control, not just a security one.
Accounting of disclosures (§164.528): per-access logging supports a patient’s right to an accounting of certain disclosures of their information, turning the audit trail into a patient-facing guarantee.
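The notification clocks and thresholds above lend themselves to a small worked calculation. The sketch below encodes my reading of §164.404 through §164.408: individual notice no later than 60 calendar days after discovery, media notice when more than 500 residents of one state or jurisdiction are affected, and HHS notice alongside individual notice when 500 or more individuals are affected. The function and field names are hypothetical, and the output is an aid for counsel, not a legal determination.

```python
from datetime import date, timedelta

INDIVIDUAL_NOTICE_DAYS = 60   # Outer limit; "without unreasonable delay" still applies.
HHS_THRESHOLD = 500           # 500 or more individuals.
MEDIA_THRESHOLD = 500         # More than 500 residents of one state or jurisdiction.


def individual_notice_deadline(discovered: date) -> date:
    """Latest permissible date for notifying affected individuals."""
    return discovered + timedelta(days=INDIVIDUAL_NOTICE_DAYS)


def notification_plan(discovered: date, affected_total: int,
                      affected_by_state: dict) -> dict:
    """Summarize which notices a scoped breach appears to require."""
    return {
        "individual_deadline": individual_notice_deadline(discovered),
        "hhs_notice_with_individual_notice": affected_total >= HHS_THRESHOLD,
        "media_notice_states": sorted(
            state for state, count in affected_by_state.items()
            if count > MEDIA_THRESHOLD
        ),
    }
```

For a breach discovered on 2023-03-01, `individual_notice_deadline` returns 2023-04-30. The off-by-one between "500 or more" and "more than 500" is exactly the kind of detail that is easier to get right once in tested code than under incident pressure.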

Primary Source

HHS Breach Notification Rule

HHS Minimum Necessary

✅ Diligence Questions, Answered Directly

These are the questions a security reviewer, compliance officer, or technical co-founder tends to ask when evaluating whether an engineer actually understands healthcare data — not just the vocabulary. Straight answers, no hedging.

🧾 Common Diligence Questions

⚖️ The Threat Model I Design Against

Trustworthy healthcare engineering starts from pessimism about the system’s own components. I assume a credential will eventually leak, a dependency will eventually be compromised, an engineer will eventually be paged half-asleep, and a model will eventually receive an adversarial prompt. Designing for those assumptions — rather than the happy path — is what makes a system hold up when reality stops cooperating.

Compromised credential: least privilege and short-lived, scoped access limit how much any single identity can reach, so one leaked token is contained, not catastrophic.
Insider over-access: just-in-time grants, mandatory reason codes, and reviewable logs make curiosity-driven snooping detectable and accountable.
Supply-chain compromise: dependency pinning, provenance checks, and egress filtering reduce the blast radius of a poisoned package or model.
Prompt injection and jailbreaks: deterministic authorization and egress gates outside the model mean a successful jailbreak still cannot release data the user was not entitled to.
Silent telemetry leakage: observability inherits production access controls and tokenizes identifiers, so the debugging path is not a privacy backdoor.
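The insider over-access and compromised-credential mitigations above both rest on short-lived, attributable grants. A minimal sketch follows, assuming a hypothetical `Grant` record, a 15-minute default lifetime chosen purely for illustration, and timestamps passed in by the caller so the logic stays testable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Grant:
    user: str
    patient_id: str
    reason_code: str
    expires_at: datetime


def issue_grant(user: str, patient_id: str, reason_code: str,
                now: datetime, ttl_minutes: int = 15) -> Grant:
    """Issue a time-boxed grant; a blank reason code is rejected."""
    if not reason_code.strip():
        raise ValueError("a reason code is required for privileged access")
    return Grant(user, patient_id, reason_code,
                 now + timedelta(minutes=ttl_minutes))


def is_active(grant: Grant, user: str, patient_id: str, now: datetime) -> bool:
    """A grant covers one user and one patient, and only until it expires."""
    return (grant.user == user
            and grant.patient_id == patient_id
            and now < grant.expires_at)
```

Because expiry is checked on every use, a grant that is never explicitly revoked still stops working on its own, which limits what a leaked session can reach.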

🧪 How I Prove the Controls Actually Work

A control nobody tests is a hope, not a safeguard. I treat compliance controls the same way I treat any other critical system behavior: with automated verification, evidence, and continuous monitoring. The point is to make "we are compliant" a claim backed by passing tests and live telemetry rather than a slide.

CI checks that fail the build when a schema would allow an unapproved ePHI field, a log statement could capture protected data, or a vector store is built without scoping.
Redaction and de-identification covered by test suites with known-tricky inputs, so a regression in masking is caught before it ships, not after a breach.
Synthetic and de-identified fixtures in every lower environment, so live ePHI never leaves production to begin with.
Runtime monitors and alerting on anomalous access patterns, failed authorizations, and unexpected egress, closing the loop from design-time intent to run-time reality.
Audit-friendly telemetry by default, so the evidence an assessor needs is a query away rather than a forensic reconstruction.
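To illustrate the redaction test suites mentioned above, here is a hedged sketch of a structured log scrubber together with a test that feeds it deliberately awkward input. The protected key names, the mask string, and the function names are assumptions. The scrubber keys off explicitly classified field names instead of pattern-matching values, and it assumes string keys.

```python
# Hypothetical classification of protected field names (lowercase).
PROTECTED_KEYS = frozenset({"ssn", "dob", "phone", "address", "patient_name"})
MASK = "[REDACTED]"


def scrub(value):
    """Recursively mask values stored under protected keys."""
    if isinstance(value, dict):
        return {
            key: MASK if key.lower() in PROTECTED_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


def test_scrub_handles_nested_and_mixed_case():
    """Known-tricky input: nesting, lists, and inconsistent key casing."""
    event = {
        "model_version": "v3",
        "patient": {"SSN": "000-00-0000", "visits": [{"Phone": "555-0100"}]},
        "contacts": [{"address": {"street": "1 Main St"}}],
    }
    cleaned = scrub(event)
    assert cleaned["model_version"] == "v3"
    assert cleaned["patient"]["SSN"] == MASK
    assert cleaned["patient"]["visits"][0]["Phone"] == MASK
    assert cleaned["contacts"][0]["address"] == MASK
```

A test like this runs in CI on every change, so a refactor that stops lowercasing keys or stops recursing into lists fails the build. Key-based scrubbing will not catch identifiers embedded in free text, which needs separate handling.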

⛑️ Bottom Line

HIPAA-grade engineering is, at its core, the discipline of building systems that are safe to trust with the most sensitive information people have — and being able to prove it. I bring a regulation-grounded, defense-in-depth, evidence-first approach to that work: minimum-necessary data flow, deterministic controls around probabilistic models, immutable audit trails, and de-identification treated as a first-class service. My fintech SOC 2 experience at Finequities sharpened these instincts, and I apply them directly to healthcare AI, where the stakes are not credibility but human well-being. If your team is building serious healthcare AI for real clinical workflows and you want privacy intent to show up in everyday engineering execution — not just in a policy binder — I would value the chance to talk.

Schedule a Meeting

Book Time with Ryan