RAG Is Not Personalization
Retrieval-augmented generation gives AI products better answers, not personal ones. True personalization requires self-models that understand the user, not just the query.
TL;DR
- RAG optimizes for query relevance: finding the right document. Personalization optimizes for user relevance: presenting information the right way for the right person.
- Most AI products conflate these two because RAG makes responses feel more relevant. But relevance to a query and relevance to a person are different problems with different architectures.
- Self-models add the missing user layer to RAG, enabling the same retrieved information to be adapted based on who is asking.
RAG is not personalization because it optimizes for query relevance (finding the right document) rather than user relevance (presenting information the right way for each person). Two users asking the same question get identical retrieved documents and identical synthesized responses, regardless of their expertise level, role, or communication preferences. This post covers where RAG stops and personalization starts, the five-level personalization spectrum, and how self-models add the user layer that makes RAG output genuinely personal.
The RAG Architecture
RAG follows a well-understood pipeline, first described by Lewis et al. in 2020 [1]: a user query is embedded, compared against a vector store of documents, and the most semantically similar documents are retrieved and injected into the LLM prompt alongside the query. The LLM synthesizes a response grounded in the retrieved information.
This architecture excels at one thing: answering questions accurately. The original RAG paper demonstrated that combining parametric memory (a pretrained model) with non-parametric memory (a document index) produces “more specific, diverse and factual language” than parametric models alone. For knowledge-base Q&A, document search, and factual queries, RAG is the right tool.
But notice what RAG knows about the user: nothing. The retrieval step is query-based, not user-based. The synthesis step is prompt-based, not preference-based. The entire pipeline from query to response is agnostic to who is asking.
Query: “How do I set up SSO integration?”
- CEO receives: detailed technical implementation steps with API code examples
- The CEO wanted: high-level timeline and resource requirements for their team
- Junior developer receives: the same detailed technical implementation steps
- The developer wanted: exactly this, plus gotchas for their specific framework
Same query. Same retrieval. Same response. Two completely different user needs, both unserved.
| RAG Pipeline Stage | What It Knows | What It Misses |
|---|---|---|
| Embedding | Query semantics | User context, expertise level, goals |
| Retrieval | Document relevance to query | Document relevance to user |
| Synthesis | How to combine sources | How to frame for this specific person |
| Response | Accurate answer | Personalized answer |
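To make the table concrete, here is a minimal sketch of query-only retrieval with toy embeddings (the function names and document IDs are illustrative, not a specific library's API). Note that nothing in the scoring path takes a user into account: two different people with the same query always get the same topK documents.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Query-only retrieval: score docs against the query embedding, keep topK.
function retrieve(queryEmbedding, docs, topK) {
  return docs
    .map(d => ({ ...d, score: cosine(queryEmbedding, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy store: each doc carries a 3-dimensional embedding.
const docs = [
  { id: 'sso-api-guide', embedding: [0.9, 0.1, 0.0] },
  { id: 'pricing-overview', embedding: [0.1, 0.9, 0.0] },
  { id: 'sso-timeline', embedding: [0.7, 0.3, 0.0] },
];

const queryEmbedding = [1, 0, 0]; // stands in for "How do I set up SSO integration?"
const top2 = retrieve(queryEmbedding, docs, 2).map(d => d.id);
// Retrieval depends only on the query, never on the asker.
```

The signature tells the whole story: `retrieve(queryEmbedding, docs, topK)` has no user parameter anywhere.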
The Personalization Layer RAG Lacks
True personalization requires answering a question RAG never asks: who is this person, and what do they need the answer to look like?
A comprehensive survey of personalization from RAG to agents [2] (Li et al., 2025) systematically examines how personalization must operate across all three core stages of RAG: pre-retrieval, retrieval, and generation. The key insight is that personalization requires “customized interactions that align with individual user preferences, contexts, and goals,” something standard RAG pipelines are not designed to provide.
The same information, retrieved by the same RAG pipeline from the same documents, should be presented differently depending on:
Expertise level. A domain expert needs precise, jargon-rich, concise answers. A newcomer needs explanations of foundational concepts, examples, and gentler vocabulary. RAG retrieves the same content for both. Research on dynamic user profiling [3] (Jiang et al., 2025) shows that even frontier LLMs like GPT-4.5 and Gemini-2.0 achieve only around 50% accuracy when tasked with tracking evolving user preferences and generating personalized responses accordingly.
Role and context. A technical implementer needs code examples, API references, and edge cases. A decision-maker needs cost implications, timelines, and risk assessments. Same source material, completely different framing.
Communication style. Some users prefer structured, step-by-step responses. Others prefer conceptual overviews with links for deeper exploration. Some want everything in bullet points. Some want narrative explanation.
Current goals. A user evaluating your product needs different emphasis than a user implementing it. A user troubleshooting needs different context than a user exploring. The same retrieved information supports all of these, but the synthesis should be goal-aware.
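The four dimensions above can be folded into the synthesis step. The sketch below is a hypothetical illustration, not an existing API: the profile fields (`expertise`, `style`, `goal`) and the `buildSynthesisPrompt` helper are assumptions showing how identical sources could be framed differently per user.

```javascript
// Hypothetical sketch: same retrieved documents, user-specific framing
// instructions appended to the synthesis prompt.
function buildSynthesisPrompt(query, docs, profile) {
  const framing = [];
  framing.push(profile.expertise === 'expert'
    ? 'Use precise domain terminology; skip foundational explanations.'
    : 'Explain foundational concepts; avoid unexplained jargon.');
  if (profile.style === 'bullets') framing.push('Answer as bullet points.');
  if (profile.goal === 'evaluation') framing.push('Emphasize cost, timeline, and risk.');
  return [
    `Question: ${query}`,
    `Sources: ${docs.join(' | ')}`,
    `Framing: ${framing.join(' ')}`,
  ].join('\n');
}

const sources = ['sso-setup.md', 'sso-security.md'];
const expertPrompt = buildSynthesisPrompt('How do I set up SSO?', sources,
  { expertise: 'expert', style: 'bullets', goal: 'implementation' });
const novicePrompt = buildSynthesisPrompt('How do I set up SSO?', sources,
  { expertise: 'novice', style: 'narrative', goal: 'evaluation' });
// Same query, same sources, different framing instructions.
```

Same query, same sources; only the framing block changes, which is exactly the layer RAG alone never produces.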
RAG Without a User Layer
- ✗ Same query produces the same response for everyone
- ✗ Expertise level ignored: beginners and experts get identical framing
- ✗ Communication preferences unknown: one-size-fits-all formatting
- ✗ User goals invisible: same emphasis regardless of what they need
RAG With a Self-Model Layer
- ✓ Same query produces user-adapted responses
- ✓ Expertise level shapes vocabulary, depth, and assumed knowledge
- ✓ Communication style matches user preference: bullets, narrative, or code-first
- ✓ User goals inform emphasis: evaluation gets ROI, implementation gets code
Augmenting RAG with Self-Models
The architectural fix is not to replace RAG but to add a user-understanding layer. Self-models sit between the retrieval stage and the synthesis stage, providing user context that shapes how retrieved information is presented. This approach aligns with what researchers call PersonaRAG [4] (Zerhoudi & Granitzer, SIGIR 2024), a framework that incorporates user-centric agents to adapt both retrieval and generation based on real-time user data. Their results show measurable accuracy improvements over baseline RAG by tailoring responses to individual user profiles.
```javascript
// Standard RAG: query-only context (no user awareness)
const docs = await vectorStore.query(userQuery, { topK: 5 });
const response = await llm.generate({ query: userQuery, context: docs });

// User-Augmented RAG: query + user context (self-model layer)
const docs = await vectorStore.query(userQuery, { topK: 5 });
const model = await clarity.getSelfModel(userId);
const userContext = {
  expertise: model.getBeliefs({ context: 'expertise_level' }),
  goals: model.getActiveGoals(),
  style: model.getBeliefs({ context: 'communication_preference' })
};
const response = await llm.generate({
  query: userQuery,
  context: docs,
  userModel: userContext // Same docs, personalized synthesis
});
```
The self-model adds three capabilities to the RAG pipeline:
User-aware retrieval. The self-model can influence retrieval by augmenting the query with user context. A query from a technical user can retrieve more detailed documents. A query from a business user can retrieve higher-level overviews. The same vector store, filtered or re-ranked by user model.
Personalized synthesis. The LLM prompt includes structured user context (expertise level, goals, preferences) that shapes how it synthesizes retrieved documents into a response. Same source material, different framing. The field of user modeling [5] has evolved from simple stereotype-based approaches in the 1970s to modern deep-learning methods using attention mechanisms, graph neural networks, and transformers to capture complex user behaviors (Purificato et al., 2024).
Adaptive depth. The self-model tracks the user’s evolving expertise. Early interactions provide more foundational context. Later interactions assume shared knowledge and go deeper. The product gets smarter about each user over time, not just about the content. This is where the PersonaMem benchmark reveals a key challenge: current LLMs achieve 60-70% accuracy on recalling static user facts but drop to 30-50% when incorporating a user’s latest preferences [6]. External memory modules (like self-models) significantly improve accuracy for both tasks.
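One way the user-aware retrieval described above could work is to blend the query-similarity score RAG already produces with a user-fit score derived from the self-model. The depth scale, expertise scale, and blending weight below are illustrative assumptions, not a prescribed scheme.

```javascript
// Coarse scales for document depth and user expertise (assumptions).
const DEPTH = { overview: 0, guide: 1, reference: 2 };
const EXPERTISE = { novice: 0, intermediate: 1, expert: 2 };

// 1.0 when document depth matches the user's expertise, decaying with distance.
function userFit(doc, profile) {
  const gap = Math.abs(DEPTH[doc.depth] - EXPERTISE[profile.expertise]);
  return 1 - gap / 2;
}

// Re-rank retrieved docs: blend query similarity with user fit.
function rerank(retrieved, profile, alpha = 0.6) {
  return [...retrieved]
    .map(d => ({ ...d, blended: alpha * d.score + (1 - alpha) * userFit(d, profile) }))
    .sort((a, b) => b.blended - a.blended);
}

const retrieved = [
  { id: 'sso-overview', depth: 'overview', score: 0.80 },
  { id: 'sso-api-reference', depth: 'reference', score: 0.78 },
];

const forExec = rerank(retrieved, { expertise: 'novice' }); // overview ranks first
const forDev = rerank(retrieved, { expertise: 'expert' });  // reference ranks first
// Same retrieved set; the ordering now depends on who is asking.
```

The same vector store serves both users; only the re-ranking pass consults the self-model.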
The Personalization Spectrum
It helps to think about personalization as a spectrum rather than a binary:
Level 0: Generic. Every user gets the same response. No personalization at all. This is an AI product without RAG or user models.
Level 1: Query-adapted. Responses are adapted based on what the user asked. RAG operates at this level. The retrieved documents depend on the query, so the response varies by query. But it does not vary by user.
Level 2: Segment-adapted. Responses are adapted based on the user’s segment (enterprise vs. SMB, technical vs. business, new vs. returning). This requires segment classification but not individual understanding.
Level 3: User-adapted. Responses are adapted based on the individual user’s beliefs, goals, expertise, and preferences. This requires a self-model. Two users asking the same question get different responses because they are different people with different needs.
Level 4: Trajectory-adapted. Responses are adapted based not just on who the user is now but on where they are heading. The product anticipates the next question, the next need, the next evolution in the user’s journey.
Most products with RAG are at Level 1 and believe they are at Level 3. The gap between these levels is the gap between “our AI gives relevant answers” and “our AI understands me.”
Why This Matters Now
The RAG architecture has matured rapidly. Vector databases have commoditized significantly [7], with vector search becoming a checkbox feature in major cloud data platforms rather than a standalone differentiator. Retrieval patterns are well-understood. A 2025 industry review [8] captures the current state well: enterprises “cannot live without RAG, yet remain unsatisfied.” The field is shifting from pursuing the smartest model toward building the richest, most accurate, most user-aware context.
The next wave of competitive differentiation in AI products will likely come not from better retrieval but from better understanding of the person on the other end. McKinsey research on AI-powered personalization estimates that personalized customer experiences can improve satisfaction by 15-20% [9] and increase revenue by 5-8%.
Products that add user-understanding layers on top of RAG will feel fundamentally different from those that treat every user identically. The difference between a product that finds the right answer and a product that gives you the right answer for you is the difference between a search engine and an advisor.
The Personalization Equation
RAG alone: Right information, generic delivery
RAG + Self-Models: Right information, personal delivery
Accuracy is table stakes. Personalization is the moat.
Case Study: The Same Question, Three Users
Consider a concrete example. Three users ask the same question of an AI product built on RAG: “How should I think about data security for this integration?”
User A is a CISO evaluating the product for an enterprise deployment. They need to understand the security architecture in depth: encryption at rest, in transit, key management, SOC 2 compliance, and incident response procedures.
User B is a developer implementing the integration. They need to know which authentication protocols to use, how to handle API keys securely, and what security headers to set.
User C is a product manager building a business case. They need a summary of security capabilities to include in a vendor assessment document.
RAG retrieves the same security documentation for all three. The LLM synthesizes a response that tries to be comprehensive, touching on architecture, implementation, and compliance. The result is too deep for User C, too shallow for User A, and incorrectly scoped for User B.
A self-model-augmented system knows User A’s role, expertise, and evaluation context. It knows User B is in implementation mode with a specific tech stack. It knows User C needs executive-friendly summaries. The same retrieved documents produce three completely different responses, each one genuinely useful for the person receiving it.
This is the difference between relevance and personalization. All three responses from RAG alone are relevant to the query. None of them are personal to the user.
User A: CISO
Needs encryption at rest, in transit, key management, SOC 2 compliance, and incident response procedures. With a self-model: gets security architecture deep-dive with compliance mapping.
User B: Developer
Needs authentication protocols, API key handling, and security headers. With a self-model: gets code-first implementation guide with framework-specific examples.
User C: Product Manager
Needs a summary for a vendor assessment document. With a self-model: gets executive-friendly security capabilities overview with comparison table.
The Product Implications
Treating RAG as personalization has specific product consequences that teams often discover too late:
Engagement plateau. Users initially value accurate answers from RAG. But once accuracy becomes expected, engagement plateaus because the product never deepens its relationship with the user. Every interaction feels like the first one, just with better search.
Power user alienation. The most sophisticated users are the most affected by generic responses. They know what they need and find it frustrating when the AI cannot adapt to their level. They are also the most valuable users and the first to leave for a product that understands them.
Support burden. When the AI cannot personalize, users route complex or context-dependent questions to human support. This creates a hidden cost center that grows with the user base because the AI never learns to handle nuanced, user-specific queries.
Trade-offs and Limitations
Adding a self-model layer to RAG is not free, and there are legitimate cases where RAG alone is sufficient.
Added complexity. User-augmented RAG has more moving parts than standard RAG. The self-model service, observation pipeline, and prompt engineering for user context add development and operational overhead. For simple Q&A products with homogeneous users, this overhead may not be justified.
Cold start for new users. The self-model needs observations to build understanding. For new users, the system falls back to standard RAG until sufficient context accumulates. The cold start period needs graceful degradation: progressive personalization rather than a binary switch.
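One possible shape for that graceful degradation, sketched with illustrative thresholds (the ramp length and mode names are assumptions): personalization phases in as observations accumulate rather than switching on all at once.

```javascript
// Weight ramps from 0 (brand-new user) toward 1 as observations accumulate.
function personalizationWeight(observationCount, rampLength = 20) {
  return Math.min(observationCount / rampLength, 1);
}

// Choose a synthesis mode based on how much user signal exists.
function synthesisMode(observationCount) {
  const w = personalizationWeight(observationCount);
  if (w === 0) return 'standard-rag';          // no user signal yet: plain RAG
  if (w < 1) return 'partial-personalization'; // blend generic and personal framing
  return 'full-personalization';
}
```

A brand-new user falls straight back to standard RAG, so the cold start costs nothing relative to the baseline; personalization is a gradual upgrade, not a binary switch.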
Prompt budget competition. Adding user context to the LLM prompt competes for tokens with retrieved documents. More user context means fewer retrieved chunks (or a longer, more expensive prompt). Finding the right balance requires experimentation with your specific use case.
Sycophancy risk. Personalization introduces a real danger worth noting. Research from MIT [10] found that LLMs with access to user profiles become significantly more sycophantic over extended interactions, mirroring users’ viewpoints rather than providing accurate information. Any user-modeling layer must be designed to inform response framing, not to tell the model what the user wants to hear.
Not all queries need personalization. Factual questions with single correct answers (“What is the API rate limit?”) do not benefit from personalization. The self-model layer should be context-aware enough to add user modeling only when it improves the response.
What to Do Next
1. Test the two-user question. Pick your five most common user queries and generate responses for two different user personas: a technical expert and a business leader. If the responses are identical, your product has a personalization gap. Measure how much the response should differ based on who is asking.
2. Instrument user signals. Start collecting the signals that would inform personalization: expertise indicators, communication preferences, stated goals, and role context. Even before building a self-model, this data reveals how much variation exists in your user base and how much personalization is possible.
3. Prototype the user layer. Take your existing RAG pipeline and add a simple user context injection, even if it is manually curated for a test group. Compare satisfaction scores, task completion rates, and engagement metrics between the personalized group and the standard RAG group. Explore Clarity's self-model API to build the user layer.
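The two-user test can be automated with a small harness. In the sketch below, `generate` stands in for your RAG pipeline, and the token-overlap metric is a deliberately crude illustration of "how different are these responses"; both are assumptions, not a specific tool.

```javascript
// Fraction of shared tokens between two responses (1.0 = identical vocabulary).
function tokenOverlap(a, b) {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  const shared = [...ta].filter(t => tb.has(t)).length;
  return shared / Math.max(ta.size, tb.size);
}

// Gap of 0 means both personas got the same response: a personalization gap.
function personalizationGap(generate, query, personaA, personaB) {
  const respA = generate(query, personaA);
  const respB = generate(query, personaB);
  return 1 - tokenOverlap(respA, respB);
}

// A pipeline that ignores the persona scores a gap of exactly 0.
const queryOnly = (q, _persona) => `Generic answer to: ${q}`;
const gap = personalizationGap(queryOnly, 'How do I set up SSO?',
  { role: 'technical expert' }, { role: 'business leader' });
// gap === 0 -> identical responses regardless of who is asking
```

Run the same harness against a user-augmented pipeline and the gap should rise for queries where the personas genuinely need different answers.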
RAG answers the question. Self-models answer the person. Add the missing layer.
References
- [1] Lewis et al. (2020), the original retrieval-augmented generation paper.
- [2] Li et al. (2025), survey of personalization from RAG to agents.
- [3] Jiang et al. (2025), research on dynamic user profiling.
- [4] Zerhoudi & Granitzer (SIGIR 2024), PersonaRAG.
- [5] Purificato et al. (2024), survey of user modeling.
- [6] PersonaMem benchmark on recalling evolving user preferences.
- [7] Industry analysis of vector database commoditization.
- [8] 2025 industry review of enterprise RAG adoption.
- [9] McKinsey research on AI-powered personalization.
- [10] MIT research on LLM sycophancy with user profiles.