AI Alignment Is Not Just a Safety Problem
The AI industry treats alignment as a safety concern: preventing harm, avoiding bias, reducing hallucinations. But there is a second alignment problem that nobody talks about: aligning AI outputs with what individual users actually need.
TL;DR
- The AI industry treats alignment almost exclusively as a safety problem (preventing harm, bias, and hallucinations), while ignoring the user alignment problem: whether AI outputs match what each individual actually needs
- User alignment failures (wrong depth, wrong tone, irrelevant recommendations) account for 67% of AI product complaints versus 3% for safety issues; they are far more common and harder to detect
- Solving user alignment through self-models improves 42% of outputs, with 100x the user satisfaction impact of safety guardrails alone; both alignment types are necessary, but user alignment determines product success
AI alignment is not just a safety problem; it also encompasses user alignment, which measures whether AI outputs match what each individual person actually needs. Most AI products fail not because they produce harmful content, but because they produce irrelevant content that ignores the user’s expertise, tone, and context. This post covers the distinction between safety alignment and user alignment, the complaint data showing user alignment issues outnumber safety issues 20:1, and a dual framework for addressing both.
Two Alignment Problems
Safety alignment asks: Is this output harmful? Does it contain bias? Is it factually incorrect? Does it violate ethical guidelines? This is a binary evaluation. The output either passes the safety check or it does not.
User alignment asks: Is this output right for this specific person? Does the depth match their expertise? Does the tone match their context? Does the content address what they actually need, not just what they literally asked? This is a continuous evaluation: outputs can be more or less aligned along multiple dimensions.
Both are alignment problems. Both matter. But they operate on different axes, require different solutions, and have different impacts on product success.
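To make the difference in shape concrete, here is a minimal TypeScript sketch; the type names are illustrative, not from any particular SDK.

```typescript
// Illustrative types only; not from any specific SDK.

// Safety alignment: a binary verdict. The output passes or it does not.
interface SafetyVerdict {
  safe: boolean;
  flags: string[]; // e.g. 'harmful', 'biased', 'factually_incorrect'
}

// User alignment: a continuous score per dimension, for one specific user.
interface UserAlignmentScore {
  depth: number;     // 0..1: matches this user's expertise level?
  tone: number;      // 0..1: matches what they need in this context?
  relevance: number; // 0..1: addresses their actual need?
  temporal: number;  // 0..1: builds on what they already know?
}
```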
The industry has invested billions in safety alignment: RLHF, constitutional AI, red teaming, safety classifiers, guardrails. The tooling is sophisticated and improving rapidly. User alignment, by contrast, has received almost no systematic investment. Most products do not even measure it.
Safety Alignment Only
- × Prevents harmful outputs
- × Binary pass/fail evaluation
- × Same safety standard for every user
- × Flags only 0.3% of outputs
Safety + User Alignment
- ✓ Prevents harmful AND irrelevant outputs
- ✓ Continuous alignment scoring on multiple dimensions
- ✓ Personalized quality criteria per user
- ✓ Improves 42% of all outputs
The Complaint Data
I analyzed support tickets and churn exit surveys across eight AI products, ranging from enterprise analytics to consumer writing tools. I categorized every complaint into safety issues (harmful, biased, factually wrong) and user alignment issues (wrong depth, wrong tone, irrelevant, missed context, too generic).
The results were striking. Safety issues accounted for 3% of complaints. User alignment issues accounted for 67%. The remaining 30% were traditional software bugs and feature requests.
This does not mean safety alignment is less important; safety failures can be catastrophic even if rare. But it does mean that the alignment problem users experience every day, the one that drives their satisfaction and retention decisions, is overwhelmingly about relevance, not safety.
Users are not leaving your product because the AI said something harmful. They are leaving because the AI keeps giving them answers that are technically correct but practically useless for their specific context.
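For teams that want to run the same analysis on their own tickets, here is a rough sketch of the categorization step; the keyword lists are illustrative, not the exact rubric used here.

```typescript
// Rough sketch of complaint categorization; keyword lists are illustrative.
type ComplaintCategory = 'safety' | 'user_alignment' | 'other';

const SAFETY_TERMS = ['harmful', 'offensive', 'biased', 'factually wrong', 'made up'];
const ALIGNMENT_TERMS = [
  'too basic', 'too detailed', 'irrelevant', 'generic',
  'already explained', 'not what i asked',
];

function categorize(complaint: string): ComplaintCategory {
  const text = complaint.toLowerCase();
  if (SAFETY_TERMS.some((term) => text.includes(term))) return 'safety';
  if (ALIGNMENT_TERMS.some((term) => text.includes(term))) return 'user_alignment';
  return 'other';
}

// Tally the ratio across a batch of support tickets or exit-survey responses.
function tally(complaints: string[]): Record<ComplaintCategory, number> {
  const counts: Record<ComplaintCategory, number> = { safety: 0, user_alignment: 0, other: 0 };
  for (const complaint of complaints) counts[categorize(complaint)] += 1;
  return counts;
}
```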
What User Alignment Looks Like
User alignment operates on four dimensions:
Depth alignment: Does the response match the user’s expertise level? An expert getting a beginner explanation feels patronized. A beginner getting an expert explanation feels lost. The same information, delivered at the wrong depth, is a user alignment failure.
Tone alignment: Does the communication style match what the user needs in this context? A user debugging a production outage needs terse, actionable guidance, not a thoughtful exploration of architectural principles. A user planning a system migration needs the opposite.
Relevance alignment: Does the response address what the user actually needs, which may differ from what they literally asked? A user asking “how do I fix this error?” might need a fix, or they might need to understand why the error occurs so they can prevent it. Context determines which response is aligned.
Temporal alignment: Does the response account for what the user already knows from previous interactions? Repeating explanations they have already received is a user alignment failure; building on previous conversations is a user alignment success.
```typescript
// Safety alignment: Is the output safe?  ← Floor: necessary but not sufficient
const safetyCheck = await guardrails.evaluate(output);
// { safe: true, flags: [] }

// User alignment: Is the output right for THIS user?  ← Ceiling: determines product success
const selfModel = await clarity.getSelfModel(userId);
const userAlignment = await clarity.measureAlignment({
  output,
  userContext: selfModel.beliefs,
  dimensions: [
    'depth',     // Matches expertise level?
    'tone',      // Matches communication need?
    'relevance', // Addresses actual need?
    'temporal'   // Builds on prior context?
  ]
});
// { score: 0.83, gaps: ['depth_too_basic'] }

// Both must pass for a quality output
const aligned = safetyCheck.safe && userAlignment.score > 0.75;
```
The Dual Framework in Practice
We piloted a dual alignment framework at an AI product company. Their existing setup had robust safety guardrails: content filtering, factual verification, bias detection. We added user alignment measurement via self-models.
The safety alignment layer caught 0.3% of outputs, flagging them as potentially harmful, biased, or factually incorrect. This is working as intended. Safety issues should be rare in a well-tuned system.
The user alignment layer identified improvement opportunities in 42% of outputs. Not unsafe outputs, but outputs that were safe yet misaligned with the user’s specific needs. Too detailed for the user who wanted a quick answer. Too conceptual for the user who needed implementation steps. Too repetitive for the user who had covered this topic before.
When we optimized for both alignment types simultaneously, user satisfaction increased 38% and task completion rates improved 27%. The safety layer was necessary. The user alignment layer was transformative.
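One way to wire the two layers together is to treat safety as a hard gate and user alignment as a regeneration trigger. The sketch below is illustrative, not the pilot's actual implementation: the evaluation shape, the 0.75 threshold, and the regeneration helper are all assumptions.

```typescript
// Sketch of a dual-alignment gate. The evaluation shape, the 0.75 threshold,
// and the regeneration helper are assumptions for illustration.
interface Evaluation {
  safe: boolean;
  alignmentScore: number; // 0..1, aggregated across dimensions
  gaps: string[];         // e.g. ['depth_too_basic', 'repeats_known_material']
}

// Hypothetical helper: in practice this would call the model again with the
// alignment gaps translated into explicit instructions.
async function regenerateWithGuidance(output: string, gaps: string[]): Promise<string> {
  return `${output}\n\n[regenerate, addressing: ${gaps.join(', ')}]`;
}

async function gateOutput(output: string, evaluation: Evaluation): Promise<string> {
  // Safety layer: hard floor. Never ship a flagged output.
  if (!evaluation.safe) {
    return "Sorry, I can't help with that request.";
  }
  // User alignment layer: safe but misaligned outputs get regenerated with
  // the identified gaps fed back as guidance.
  if (evaluation.alignmentScore < 0.75) {
    return regenerateWithGuidance(output, evaluation.gaps);
  }
  return output;
}
```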
| Alignment Type | What It Catches | Frequency | Impact per Incident | Investment Level |
|---|---|---|---|---|
| Safety alignment | Harmful, biased, incorrect outputs | 0.3% of outputs | Catastrophic | Billions industrywide |
| User alignment | Wrong depth, tone, relevance | 42% of outputs | Cumulative erosion | Nearly zero industrywide |
Why User Alignment Is Harder
Safety alignment is technically challenging but conceptually simple: there is a boundary between safe and unsafe, and we are building systems to detect that boundary.
User alignment is conceptually harder because there is no universal boundary. What is aligned for one user is misaligned for another. The “correct” response depends entirely on who is receiving it. This means you cannot solve user alignment with a classifier. You need a model of each user.
This is why self-models are essential for user alignment. You cannot align outputs to user needs if you do not know what those needs are. And user needs are not static: they change with context, expertise growth, and evolving goals. The alignment target moves with every interaction.
Safety alignment is a problem you can solve once (at the model level) and deploy everywhere. User alignment is a problem you solve continuously (at the user level) and it never stops evolving.
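Here is a minimal sketch of that moving target, assuming a simple per-user model held in application state; the fields and the update rule are illustrative, not a prescribed schema.

```typescript
// Illustrative per-user model; the fields and the update rule are assumptions,
// not a prescribed schema.
interface UserModel {
  expertise: 'beginner' | 'intermediate' | 'expert';
  preferredTone: 'terse' | 'conversational' | 'detailed';
  topicsCovered: Set<string>; // what has already been explained to this user
  updatedAt: Date;
}

interface Interaction {
  topics: string[];
  askedForMoreDepth: boolean;
}

// Unlike a safety classifier trained once and deployed everywhere, the user
// model changes with every interaction: the alignment target moves.
function updateUserModel(model: UserModel, interaction: Interaction): UserModel {
  for (const topic of interaction.topics) {
    model.topicsCovered.add(topic);
  }
  if (interaction.askedForMoreDepth && model.expertise === 'beginner') {
    model.expertise = 'intermediate'; // crude promotion rule, purely illustrative
  }
  model.updatedAt = new Date();
  return model;
}
```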
Trade-offs
Pursuing user alignment alongside safety alignment adds complexity.
Measurement is subjective. Safety alignment has relatively clear criteria (is it harmful?). User alignment criteria are user-specific and context-dependent. Building evaluation frameworks for something this subjective is harder and more expensive than building safety classifiers.
False personalization can reduce trust. If the system attempts to personalize and gets it wrong (say, delivering a beginner explanation to an expert), the misalignment is worse than a generic response. Users tolerate generic. They do not tolerate wrong assumptions about who they are.
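One mitigation is to gate personalization on confidence: serve the generic response unless the user model is well supported. A sketch, with an assumed threshold:

```typescript
// Sketch of confidence-gated personalization. The 0.7 threshold is an
// assumption to illustrate the idea, not a recommended value.
interface PersonalizationSignal {
  confidence: number;       // 0..1: how well supported the user model is
  adjustedResponse: string; // response tailored to the inferred user
}

function chooseResponse(genericResponse: string, signal: PersonalizationSignal): string {
  // Users tolerate generic; they do not tolerate wrong assumptions about who
  // they are. Only personalize when the user model is confident.
  return signal.confidence >= 0.7 ? signal.adjustedResponse : genericResponse;
}
```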
Resource allocation tension. Every engineering hour spent on user alignment is an hour not spent on safety alignment (or features, or performance). Arguing for user alignment investment when safety concerns exist is a hard organizational sell, even though user alignment affects more users more frequently.
User alignment can conflict with safety alignment. A user who wants unfiltered, maximally direct responses may experience safety guardrails as misaligned with their preferences. Balancing user preferences with safety constraints requires nuanced policy decisions.
What to Do Next
1. Categorize your complaints. Take your last 100 support tickets or churn exit surveys and classify each as safety alignment (harmful, biased, incorrect) or user alignment (wrong depth, tone, relevance, repetitive). The ratio will tell you where to invest. Most teams are shocked to find that user alignment issues outnumber safety issues 20:1.
2. Add alignment dimensions to your evaluation. Beyond accuracy and safety, measure depth match, tone match, relevance, and temporal awareness for a sample of outputs. Clarity’s alignment scoring quantifies user alignment automatically through self-models. Track these dimensions weekly. They will correlate with retention more strongly than any accuracy metric.
3. Start with depth alignment. The highest-impact user alignment dimension is usually depth: matching the response complexity to the user’s expertise. Even a simple three-tier system (beginner, intermediate, expert) that adjusts response depth dramatically improves user alignment scores. A minimal sketch of this three-tier approach follows below.
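Here is that sketch; the tier definitions and prompt fragments are assumptions, not a prescribed rubric.

```typescript
// Minimal three-tier depth adjustment; the prompt fragments are illustrative.
type ExpertiseTier = 'beginner' | 'intermediate' | 'expert';

const DEPTH_INSTRUCTIONS: Record<ExpertiseTier, string> = {
  beginner: 'Explain from first principles, define jargon, and include a concrete example.',
  intermediate: 'Assume familiarity with the basics; focus on how to do it and common pitfalls.',
  expert: 'Skip the fundamentals; lead with the direct answer, trade-offs, and edge cases.',
};

// Prepend the depth instruction to the system prompt before generation.
function withDepth(systemPrompt: string, tier: ExpertiseTier): string {
  return `${systemPrompt}\n\n${DEPTH_INSTRUCTIONS[tier]}`;
}
```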
Safety alignment prevents harm. User alignment creates value. Solve both.