Two Kinds of AI Alignment You Need
Everyone talks about AI safety alignment. Almost nobody talks about AI user alignment, making sure the AI serves what each individual user actually wants. You need both.
TL;DR
- There are two kinds of AI alignment: safety alignment (ensuring AI does not cause harm) and user alignment (ensuring AI serves each individual user’s goals). The industry has invested billions in the first and almost nothing in the second.
- Products that focus exclusively on safety alignment end up safe but generic, frustrating users whose legitimate requests are blocked by guardrails that lack context about who is asking
- User alignment requires self-models: structured, evolving representations of what each user wants, knows, and values, enabling AI that is both safe and genuinely helpful
AI alignment has two distinct forms: safety alignment that prevents harmful outputs, and user alignment that ensures AI serves each individual’s actual goals. The industry has invested billions in safety alignment through guardrails and content filtering while investing almost nothing in user alignment, producing AI that is safe but generic. This post covers why both forms of alignment are complementary, how self-models provide the user understanding layer that context-aware safety needs, and the organizational challenges of building dual alignment architecture.
Two Alignment Problems
Let me define the terms precisely because the word alignment gets used loosely.
Safety alignment asks: will this AI response cause harm? It is concerned with preventing toxic content, misinformation, bias, privacy violations, and other risks that affect people broadly. Safety alignment operates at the system level. The same guardrails apply to every user.
User alignment asks: does this AI response serve what this specific user wants? It is concerned with understanding individual goals, preferences, context, and values, and ensuring the AI’s output matches them. User alignment operates at the individual level. Different users get different behaviors based on their self-models.
These are not competing objectives. They are complementary layers. A well-designed AI product needs both: safety alignment as the floor (no harmful outputs) and user alignment as the ceiling (maximally helpful outputs for each individual).
The problem is that the industry has treated safety alignment as the entire alignment problem. Billions of dollars have gone into content filtering, red teaming, constitutional AI, RLHF for harmlessness, and automated safety evaluations. These are important investments. They have made AI products meaningfully safer.
But almost no investment has gone into user alignment infrastructure. Products that can tell you whether a response is safe cannot tell you whether it is helpful for the specific human who asked.
Safety Alignment Only
- × Same guardrails for every user
- × AI refuses valid professional requests
- × Optimized to avoid harm, not to help
- × Users fight the AI to do their job

Safety + User Alignment
- ✓ Context-aware guardrails informed by user model
- ✓ AI understands professional context and adapts
- ✓ Optimized to be safe AND maximally helpful
- ✓ Users feel the AI is working with them
Why User Alignment Requires User Models
You cannot align an AI to a user you do not understand.
This seems obvious, but consider how most AI products work. They receive a prompt. They generate a response. They apply safety filters. They return the result. At no point does the system consult a model of who this user is, what they need, or what context they are operating in.
Without a user model, the AI has two modes: generic helpfulness and refusal. It either gives the most broadly acceptable response (safe but generic) or it refuses because the request triggered a safety filter (safe but unhelpful).
With a user model (a self-model that tracks this user’s role, expertise, goals, and communication preferences), the AI gains a third mode: contextually appropriate helpfulness. It can distinguish between a marketing director asking for assertive copy (professional context, should help) and an anonymous user asking for manipulative messaging (no context, should decline).
The user model does not weaken safety alignment. It strengthens it by providing the context that safety decisions need. A guardrail that blocks assertive language for everyone is a blunt instrument. A guardrail that considers the user’s professional context and intent is a precise one.
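Here is a minimal sketch of that difference in code. The `UserModel` shape and the guardrail functions are hypothetical, not Clarity’s actual API; the point is only that the same request resolves differently once the system knows who is asking.

```ts
// Illustrative user model: fields and values are assumptions for this sketch.
interface UserModel {
  role?: string;                          // e.g. 'marketing-director'
  expertise?: 'novice' | 'professional';
  context?: string;                       // e.g. 'b2b-saas'
}

type GuardrailDecision =
  | { action: 'help'; reason: string }
  | { action: 'decline'; reason: string };

// Blunt guardrail: one rule for everyone, no user model consulted.
function bluntGuardrail(requestsAssertiveCopy: boolean): GuardrailDecision {
  return requestsAssertiveCopy
    ? { action: 'decline', reason: 'Assertive persuasion is blocked for all users.' }
    : { action: 'help', reason: 'Passed the generic filter.' };
}

// Context-aware guardrail: the same request, evaluated against the user model.
function contextAwareGuardrail(
  requestsAssertiveCopy: boolean,
  user: UserModel | null
): GuardrailDecision {
  if (!requestsAssertiveCopy) {
    return { action: 'help', reason: 'No sensitive content involved.' };
  }
  if (user !== null && user.expertise === 'professional' && user.role !== undefined) {
    // Professional context is known: help, within normal safety bounds.
    return { action: 'help', reason: `Professional context: ${user.role}.` };
  }
  // No context: cannot distinguish marketing from manipulation, so decline.
  return { action: 'decline', reason: 'No user context for a persuasion-adjacent request.' };
}

// The same request from a marketing director and from an anonymous user:
const director: UserModel = { role: 'marketing-director', expertise: 'professional', context: 'b2b-saas' };
console.log(contextAwareGuardrail(true, director)); // help
console.log(contextAwareGuardrail(true, null));     // decline
```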
The Dual Alignment Architecture
Building both kinds of alignment requires a layered architecture where safety and user alignment work together rather than in tension.
```ts
// Layer 1: User alignment: understand what the user wants (self-model first)
const selfModel = await clarity.getSelfModel(userId);
const intent = await clarity.classifyIntent(selfModel, userMessage);
// { role: 'marketing-director', goal: 'draft-campaign-copy',
//   expertise: 'professional', context: 'b2b-saas' }

// Layer 2: Safety alignment: evaluate the request with user context (informed guardrails)
const safetyCheck = await evaluateSafety(userMessage, {
  userContext: intent,               // marketing pro, not bad actor
  contentType: 'professional-copy',
  safetyPolicy: 'enterprise-standard'
});

// Layer 3: Generate with both alignments (safe AND helpful)
const response = await generate(userMessage, {
  selfModel,                         // personalized to this user
  safetyBounds: safetyCheck,         // within safety limits
  tone: selfModel.preferences.communicationStyle
});
```
The key insight is the order of operations. User alignment comes first: understanding who is asking and why. Safety alignment comes second: evaluating the request in context. Generation comes third: producing an output that is both safe and aligned with the user.
When safety alignment comes first (or operates without user context), it makes decisions in a vacuum. Is this request for marketing copy potentially misleading? Without context, maybe. With the context that a professional marketer is drafting B2B SaaS copy? The risk profile changes entirely.
The Trust Equation
Here is why dual alignment matters for product success: trust.
Users trust AI products that are both safe and helpful. Safe-only products generate a specific kind of frustration (“the AI is protecting me from myself”) that erodes trust as rapidly as harmful outputs would.
When an AI refuses a professional request without understanding the user’s context, the implicit message is: I do not trust you. I do not know you well enough to serve you well. That message, repeated daily, drives users to competitors who offer fewer guardrails (less safe) or better personalization (more aligned).
The products that win long-term will be the ones that solve both alignment problems: safe enough to be trusted and aligned enough to be genuinely useful.
The Organizational Challenge
The reason dual alignment is rare is not technical. It is organizational.
Safety alignment has clear ownership. There are responsible AI teams, trust and safety orgs, compliance functions. They have budgets, headcount, and executive sponsors. Safety alignment is a checkbox on every enterprise RFP.
User alignment has no owner. Product teams think it is a personalization feature. AI teams think it is a recommendation problem. Data teams think it is a profiling challenge. Nobody owns the end-to-end architecture of understanding each user deeply enough to align AI behavior with their individual needs.
Self-models are the missing architectural layer that gives user alignment a home. They provide a structured, maintainable, privacy-respecting representation of each user that can be consumed by any part of the product stack.
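As a sketch of what that layer might store, here is one possible shape. The field names and structure are my assumptions for illustration, not Clarity’s actual schema; the point is a structured, evolving record that any layer of the product (safety, generation, UI) can consume.

```ts
// Illustrative self-model shape: fields are assumptions, not a real schema.
interface SelfModel {
  userId: string;
  role: string;                              // e.g. 'marketing-director'
  expertise: 'novice' | 'intermediate' | 'professional';
  goals: string[];                           // e.g. ['draft-campaign-copy']
  preferences: {
    communicationStyle: 'concise' | 'detailed' | 'assertive';
    language: string;
  };
  // Provenance and freshness keep the model maintainable and auditable.
  lastUpdated: Date;
  sources: Array<'stated' | 'observed' | 'inferred'>;
}

// Example instance for the marketing-director scenario above.
const example: SelfModel = {
  userId: 'u_123',
  role: 'marketing-director',
  expertise: 'professional',
  goals: ['draft-campaign-copy'],
  preferences: { communicationStyle: 'assertive', language: 'en' },
  lastUpdated: new Date(),
  sources: ['stated', 'observed'],
};
```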
The Complementary Relationship
Let me be explicit about something: user alignment does not weaken safety alignment. They are not in tension. A system that understands the user better makes better safety decisions, not worse ones.
Consider: a safety filter that blocks discussion of medications is appropriate for a social media chatbot and inappropriate for a healthcare professional tool. The difference is user context. Without a user model, the safety filter has to use the most restrictive setting for every interaction. With a user model, the safety filter can apply context-appropriate policies.
This is not about lowering safety standards. It is about raising the precision of safety decisions: fewer false positives (blocking legitimate professional requests) and the same or fewer false negatives (letting genuinely harmful content through).
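A sketch of that policy selection, using hypothetical policy names and product types; note that the restrictive default applies only when no user model is available.

```ts
// Hypothetical safety policies, from most to least restrictive.
type SafetyPolicy = 'consumer-strict' | 'enterprise-standard' | 'clinical-professional';

interface UserContext {
  verifiedRole?: 'healthcare-professional' | 'marketing-director' | string;
  product: 'social-chatbot' | 'healthcare-tool' | string;
}

// Pick the safety policy for medication-related content based on user context.
// With no user model at all, fall back to the most restrictive setting.
function medicationPolicyFor(context: UserContext | null): SafetyPolicy {
  if (context === null) return 'consumer-strict';
  if (context.product === 'healthcare-tool' && context.verifiedRole === 'healthcare-professional') {
    return 'clinical-professional';   // detailed medication discussion is appropriate here
  }
  if (context.product === 'social-chatbot') {
    return 'consumer-strict';         // general audience: keep the restrictive default
  }
  return 'enterprise-standard';
}
```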
Trade-offs and Limitations
Dual alignment introduces complexity that single-alignment approaches avoid.
User models can be gamed. A bad actor could build a false professional identity to bypass context-aware safety filters. The system needs integrity checks on user model claims: verification mechanisms for professional roles, cross-referencing stated context with observed behavior, and anomaly detection for rapidly changing self-models.
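Here is one way such integrity checks could look; the thresholds and field names are illustrative, not a production design.

```ts
interface SelfModelClaim {
  field: 'role' | 'expertise' | 'context';
  value: string;
  verified: boolean;          // e.g. confirmed via SSO, company domain, or licence check
  claimedAt: Date;
}

// Flag self-models whose claims change suspiciously fast, or that escalate
// to a professional role without any verification.
function flagSuspiciousClaims(history: SelfModelClaim[]): SelfModelClaim[] {
  const suspicious: SelfModelClaim[] = [];
  for (let i = 1; i < history.length; i++) {
    const prev = history[i - 1];
    const curr = history[i];
    const hoursBetween =
      (curr.claimedAt.getTime() - prev.claimedAt.getTime()) / 3_600_000;
    const rapidChange =
      curr.field === prev.field && curr.value !== prev.value && hoursBetween < 24;
    const unverifiedEscalation = curr.field === 'role' && !curr.verified;
    if (rapidChange || unverifiedEscalation) suspicious.push(curr);
  }
  return suspicious;
}

// Flagged claims should gate access to the more permissive safety policies
// until they are verified, rather than silently unlocking them.
```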
Complexity increases. Maintaining two alignment layers with interactions between them is harder than maintaining one. The safety layer needs to understand user model signals. The user layer needs to respect safety boundaries. The integration testing surface grows significantly.
Privacy tensions. User alignment requires knowing things about the user. Safety alignment requires protecting user privacy. The tension between understanding users deeply and respecting their privacy boundaries requires careful architecture, particularly around what user information is used for safety decisions versus personalization decisions.
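One way to express that separation in the architecture, assuming hypothetical field groupings: the safety layer sees only the minimal context it needs to pick a policy, while richer preference data stays on the personalization side.

```ts
// Simplified self-model for this sketch; fields are illustrative.
interface SelfModel {
  userId: string;
  role: string;
  expertise: string;
  goals: string[];
  preferences: { communicationStyle: string; language: string };
  healthNotes?: string[];     // sensitive: never needed for safety decisions
}

// Minimal view exposed to the safety layer: enough to choose a policy, nothing more.
interface SafetyView {
  role: string;
  expertise: string;
}

// Richer view exposed to generation and personalization.
interface PersonalizationView {
  goals: string[];
  preferences: { communicationStyle: string; language: string };
}

function toSafetyView(m: SelfModel): SafetyView {
  return { role: m.role, expertise: m.expertise };
}

function toPersonalizationView(m: SelfModel): PersonalizationView {
  return { goals: m.goals, preferences: m.preferences };
}
```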
Organizational change is hard. Creating a user alignment function requires cross-team coordination between product, AI, data, and trust/safety teams. This organizational change is often harder than the technical implementation.
What to Do Next
- Map your current alignment architecture. Draw the flow from user request to AI response. Identify where safety checks happen and where user context is (or is not) considered. If user context is absent from safety decisions, you have a single-alignment architecture and your users are likely feeling the friction.
- Interview 5 users about guardrail friction. Ask users: when has the AI refused or hedged on something you legitimately needed? The patterns in their answers reveal where safety alignment is working against user alignment.
- Explore self-models for dual alignment. Self-models provide the user understanding layer that context-aware safety needs. They give your safety system the information to make precise decisions instead of broad ones. See how Clarity enables dual alignment architecture.
Safety alignment protects users from AI. User alignment ensures AI serves users. You need both. Build the dual alignment architecture.