How Healthcare Organizations Use AI Without Violating HIPAA
Practical guide to deploying AI in healthcare while maintaining HIPAA compliance — PHI handling, BAA requirements, de-identification, and audit trails.
TL;DR
- HIPAA’s Privacy Rule permits AI use on protected health information (PHI) when covered entities and their business associates maintain appropriate safeguards
- Any AI vendor processing PHI must sign a Business Associate Agreement (BAA) — no exceptions, including LLM API providers
- De-identification via the Safe Harbor method requires removing all 18 specified identifiers; once de-identified, the data is no longer PHI and HIPAA constraints no longer apply
- Audit trails for AI systems must log who accessed PHI, when, for what purpose, and what the system did with it
Healthcare organizations are not avoiding AI. According to McKinsey’s 2025 survey, 78% of organizations use AI in at least one business function [1]. Healthcare is no exception — radiology, clinical documentation, patient scheduling, and revenue cycle management all have active AI deployments. The problem is not adoption. It is that most healthcare AI implementations are built on shaky HIPAA foundations that would not survive an Office for Civil Rights (OCR) audit.
The root issue: AI teams treat HIPAA compliance as a checkbox exercise handled by legal, while legal treats AI as a technical problem handled by engineering. Neither side fully understands the other’s constraints, and the gap between them is where violations live.
HIPAA Basics for AI Teams
HIPAA establishes two categories relevant to AI: covered entities (healthcare providers, health plans, healthcare clearinghouses) and business associates (organizations that handle PHI on behalf of covered entities). If your AI system touches PHI in any form — processing, storing, transmitting, or analyzing — the entity operating that system must be either a covered entity or a business associate with a signed BAA [2].
This is not negotiable. There is no “research exception” that lets you feed patient records into an LLM for analysis. There is no “de minimis” threshold below which HIPAA stops applying. If the data contains any of the 18 HIPAA identifiers and relates to a patient’s health condition, healthcare provision, or payment, it is PHI and the full weight of the Privacy Rule applies.
The Business Associate Agreement Requirement
A Business Associate Agreement is a contract between a covered entity and any vendor that will create, receive, maintain, or transmit PHI [2]. For AI implementations, this means:
Needs a BAA
- LLM API providers processing clinical notes
- Cloud hosting providers storing PHI-containing datasets
- AI vendors whose models are trained on or process PHI
- Analytics platforms receiving patient-level data
- Transcription services processing patient encounters
Common mistakes
- Using consumer-tier LLM APIs (no BAA available)
- Assuming “HIPAA compliant” marketing claims substitute for a BAA
- Sending PHI to vendors during proof-of-concept without a BAA
- Using free-tier cloud services that exclude BAA coverage
- Shadow AI — clinicians using ChatGPT for clinical questions with patient context
The BAA must specify: what PHI the business associate can access, how they will safeguard it, what happens in a breach, and how PHI is returned or destroyed when the relationship ends. For AI vendors specifically, the BAA should also address whether PHI can be used for model training, whether model weights are considered to contain PHI, and how the vendor handles data deletion requests when PHI has been incorporated into training datasets.
De-identification: The Two Methods
HIPAA provides two paths for de-identifying PHI so that it can be used without Privacy Rule restrictions [3].
Safe Harbor Method
The Safe Harbor method requires removing 18 specific categories of identifiers:
```typescript
// HIPAA §164.514(b)(2) — Safe Harbor identifiers. All 18 must be addressed.
const SAFE_HARBOR_IDENTIFIERS = [
  "Names",
  "Geographic data smaller than state",
  "All dates (except year) related to individual",
  "Phone numbers",
  "Fax numbers",
  "Email addresses",
  "Social Security numbers",
  "Medical record numbers",
  "Health plan beneficiary numbers",
  "Account numbers",
  "Certificate/license numbers",
  "Vehicle identifiers and serial numbers",
  "Device identifiers and serial numbers",
  "Web URLs",
  "IP addresses",
  "Biometric identifiers",
  "Full-face photographs",
  "Any other unique identifying number or code", // The one teams miss
];
```
The 18th identifier — “any other unique identifying number, characteristic, or code” — is where most AI teams stumble. Internal patient identifiers, encounter numbers, and system-generated codes all qualify. If your de-identification pipeline strips names and dates but preserves internal record IDs, the data is still PHI.
Additionally, Safe Harbor requires that the covered entity has no actual knowledge that the remaining information could identify an individual. This means that even after removing all 18 categories, if you know the data could be re-identified through combination with other available data, you cannot rely on Safe Harbor.
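To make the redaction side concrete, here is a minimal sketch with regex rules and placeholder tokens of my own choosing. Pattern matching catches only formatted identifiers (SSNs, phone numbers, emails, dates); it misses names, internal record IDs, and free-text identifiers, so a real pipeline pairs it with NLP-based de-identification and human validation:

```typescript
// Illustrative only — regex redaction handles formatted identifiers, not
// the full Safe Harbor list. Rules run in order; each replaces matches
// with a placeholder token.
const REDACTION_RULES: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],             // Social Security numbers
  [/\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g, "[PHONE]"], // 10-digit phone numbers
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, "[EMAIL]"],     // email addresses
  [/\b\d{1,2}\/\d{1,2}\/\d{2,4}\b/g, "[DATE]"],    // dates (year-only may stay)
];

function redactFormattedIdentifiers(text: string): string {
  // Apply each rule in sequence over the accumulated text.
  return REDACTION_RULES.reduce(
    (acc, [pattern, token]) => acc.replace(pattern, token),
    text,
  );
}
```

Rule order matters here: the SSN pattern (3-2-4 digits) runs before the phone pattern (3-3-4 digits) so the two never compete for the same span.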
Expert Determination Method
The Expert Determination method under §164.514(b)(1) requires a qualified statistical or scientific expert to determine that the risk of re-identification is “very small” [3]. This approach allows more data to be retained (useful for AI model training) but requires:
- A formal statistical analysis of re-identification risk
- Documentation of the methods and results
- The expert’s determination that the risk is very small
- Periodic re-evaluation as new data sources become available that could enable re-identification
For AI training datasets, Expert Determination is often more practical than Safe Harbor because it allows retention of clinical details that are essential for model performance while still managing re-identification risk through statistical methods like k-anonymity, l-diversity, or differential privacy.
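To make the statistical side concrete, here is a hedged sketch of a k-anonymity check over quasi-identifiers — one of the tools an expert might apply, not a substitute for a formal determination. The record shape and field names (ageBand, zip3) are hypothetical:

```typescript
// A dataset is k-anonymous with respect to a set of quasi-identifiers if
// every combination of quasi-identifier values is shared by at least k rows.
type PatientRow = { ageBand: string; zip3: string; diagnosis: string };

function isKAnonymous(
  rows: PatientRow[],
  quasiIds: (keyof PatientRow)[],
  k: number,
): boolean {
  // Count the size of each equivalence class (rows sharing quasi-id values).
  const counts = new Map<string, number>();
  for (const row of rows) {
    const key = quasiIds.map((q) => row[q]).join("|");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Every class must contain at least k rows.
  return [...counts.values()].every((n) => n >= k);
}
```

A failing check tells you which generalization to apply next (wider age bands, coarser geography) before re-running the analysis.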
Audit Trails for AI Systems
HIPAA’s Security Rule requires audit controls that record and examine activity in information systems containing PHI [4]. For AI systems, this translates to logging requirements that go beyond traditional application logging.
Traditional application audit logging
- User logged in at timestamp
- Record accessed by user ID
- Data modified: field X changed from A to B
- Batch job completed with N records processed
AI system audit logging under HIPAA
- Model inference request: who initiated, what PHI was input, what output was generated
- Training data access: which records were included, consent verification status
- Model output usage: was the AI recommendation acted upon by a clinician
- PHI retention: how long does PHI persist in model context, cache, or memory
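One possible shape for such an inference audit record, sketched in TypeScript. The field names are illustrative, not a standard schema; map them onto whatever your logging infrastructure already captures:

```typescript
// Sketch of an AI inference audit record covering the HIPAA-relevant
// questions: who, when, what PHI, what output, and what happened to it.
interface AiInferenceAuditRecord {
  timestamp: string;             // ISO 8601, when the inference ran
  actorId: string;               // who initiated the request
  actorRole: string;             // role at the time of access
  purpose: string;               // documented purpose of use
  patientIds: string[];          // whose PHI was involved
  phiFieldsAccessed: string[];   // which data elements were input
  modelId: string;               // model name and version
  outputSummary: string;         // what was generated, or a hash/reference
  outputActedOn: boolean | null; // did a clinician act on it (null = not yet known)
  phiRetention: string;          // e.g. "context discarded after response"
}

function buildAuditRecord(
  details: Omit<AiInferenceAuditRecord, "timestamp">,
): AiInferenceAuditRecord {
  // Timestamp is stamped at write time so records cannot be backdated.
  return { timestamp: new Date().toISOString(), ...details };
}
```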
What OCR Investigators Look For
When the Office for Civil Rights investigates a potential HIPAA violation involving AI, they examine several specific areas:
Minimum Necessary Standard. Did the AI system access only the minimum amount of PHI necessary to accomplish the intended purpose [2]? An AI system that ingests entire patient records when it only needs medication lists violates this principle. Your AI implementation should request and process only the specific data elements needed for the task.
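A minimal sketch of enforcing minimum necessary at the data-access layer, assuming a hypothetical per-task field allowlist (the task and field names are invented for illustration):

```typescript
// Each AI task declares up front which PHI fields it is allowed to see.
const TASK_FIELD_ALLOWLIST: Record<string, string[]> = {
  medicationReconciliation: ["medications", "allergies"],
  schedulingAssistant: ["appointmentHistory", "preferredTimes"],
};

// Strip the patient record down to only the fields the task is allowed.
// An unknown task gets an empty allowlist, i.e. no PHI at all (fail closed).
function selectMinimumNecessary(
  task: string,
  fullRecord: Record<string, unknown>,
): Record<string, unknown> {
  const allowed = TASK_FIELD_ALLOWLIST[task] ?? [];
  return Object.fromEntries(
    Object.entries(fullRecord).filter(([field]) => allowed.includes(field)),
  );
}
```

The design choice worth copying is the direction of the filter: the task names what it needs and everything else is dropped, rather than the record being passed whole and trimmed ad hoc downstream.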
Access Controls. Who can query the AI system, and are their access rights consistent with their role-based permissions in the broader clinical environment? A scheduling AI should not have access to clinical notes, even if the same underlying data lake contains both.
Breach Detection. Can the organization detect unauthorized access to PHI through the AI system? This includes detecting prompt injection attacks against LLM-based systems that could cause them to disclose PHI in unexpected contexts, and monitoring for data exfiltration through model outputs.
Architectural Patterns That Work
Healthcare organizations successfully deploying AI under HIPAA constraints follow specific architectural patterns.
Pattern 1 — On-Premises Processing
Run models locally within the covered entity’s infrastructure. PHI never leaves the organization’s control boundary. Higher infrastructure cost but simplest compliance posture. Common for radiology AI and clinical NLP.
Pattern 2 — BAA-Covered Cloud
Use HIPAA-eligible cloud services (AWS GovCloud, Azure Healthcare APIs, Google Cloud Healthcare API) with signed BAAs. PHI is encrypted in transit and at rest. Requires careful configuration — HIPAA eligibility is not automatic.
Pattern 3 — De-Identified Pipeline
De-identify data before it reaches the AI system. The AI operates on non-PHI data, eliminating most HIPAA requirements at the model layer. Requires robust de-identification validated by the Expert Determination method.
Pattern 3 is the most practical for organizations that want to use third-party AI services without the complexity of BAA negotiations with every vendor. But it requires investment in a reliable de-identification pipeline that is validated, monitored, and regularly tested for re-identification risk.
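One way to make Pattern 3 fail closed, sketched with a stand-in detector. In production the toy regexes below would be replaced by your validated de-identification service; the point of the sketch is the gate itself, which blocks any outbound payload the detector flags:

```typescript
// Stand-in identifier detector — two toy checks in place of a validated
// de-identification service. Returns a list of findings, empty if clean.
function detectIdentifiers(text: string): string[] {
  const findings: string[] = [];
  if (/\b\d{3}-\d{2}-\d{4}\b/.test(text)) findings.push("possible SSN");
  if (/\b[\w.+-]+@[\w-]+\.[\w.]+\b/.test(text)) findings.push("possible email");
  return findings;
}

// Gate every payload before it crosses the control boundary to a
// third-party AI service: any finding blocks the request.
function gateOutbound(payload: string): { allowed: boolean; findings: string[] } {
  const findings = detectIdentifiers(payload);
  return { allowed: findings.length === 0, findings };
}
```

Logging the findings from blocked requests also feeds the monitoring the pattern requires: a rising block rate is an early signal that the upstream de-identification step is degrading.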
The Shadow AI Problem
The most pressing HIPAA risk in healthcare AI is not from sanctioned implementations — it is from clinicians and administrators using consumer AI tools for work involving patient information. A physician pasting a clinical note into ChatGPT for a differential diagnosis. A billing specialist uploading a claim to an AI coding assistant. An administrator asking an AI tool to draft a letter referencing patient details.
These uses violate HIPAA because consumer AI tools do not have BAAs with healthcare organizations, and the PHI transmitted to them is outside the covered entity’s control. The fix is not prohibition — it is providing compliant alternatives that are easier to use than consumer tools. Organizations that deploy sanctioned AI tools with proper HIPAA safeguards see shadow AI usage decrease because the compliant option is actually available.
Starting a HIPAA-Compliant AI Project
If your healthcare organization is evaluating AI and wants to avoid the compliance pitfalls that stall most implementations:
1. Classify the data first. Determine whether your use case requires PHI or can work with de-identified data. This single decision determines 80% of your compliance architecture.
2. Get the BAA signed before the POC. Not after. Not during. Before any PHI touches any vendor system. This includes proof-of-concept environments.
3. Build the audit trail from day one. Retrofitting HIPAA-grade audit logging onto a working system is painful and expensive. Design it into the architecture.
4. Involve your privacy officer early. They need to understand what the AI does, what data it accesses, and how outputs are used. If they cannot explain it to OCR, it is not ready for production.
If your team needs help designing an AI implementation that satisfies HIPAA requirements without sacrificing the clinical or operational value that motivated the project, we work with healthcare organizations on this exact problem.
References:
[1] McKinsey & Company. “The State of AI: How Organizations Are Rewiring to Capture Value.” 2025.
[2] U.S. Department of Health and Human Services. “Summary of the HIPAA Privacy Rule.” 45 CFR Parts 160 and 164.
[3] U.S. Department of Health and Human Services. “Guidance Regarding Methods for De-identification of Protected Health Information.” 45 CFR §164.514.
[4] U.S. Department of Health and Human Services. “Summary of the HIPAA Security Rule.” 45 CFR Part 164.