
Context Layer Problems Nobody Talks About

Your RAG pipeline retrieves documents. Your agent has a context window. But neither understands the user asking the question. This is the context layer gap that cripples AI personalization.

Robert Ta · CEO & Co-Founder · 7 min read

TL;DR

  • RAG pipelines solve document context but ignore user context: two people asking the same question get the same retrieved chunks and the same answer, regardless of expertise or intent
  • The context layer gap compounds across every downstream system: prompting, evaluation, and personalization all inherit the same blindness to who is asking
  • Self-models add the missing user layer to context engineering, making retrieval user-aware and responses genuinely personalized

Context layer problems in AI systems stem from a fundamental gap: RAG pipelines and context windows handle document retrieval but have zero awareness of the person asking the question. Two users with vastly different expertise levels receive identical responses because the context stack contains no user understanding. This post covers the three context problems most teams ignore, the architecture of the missing user layer, and how to add user context to retrieval, prompting, and evaluation.


Three Context Problems Nobody Discusses

Problem 1: Retrieval Without Identity. Your vector database stores document embeddings. When a user queries, you embed the query and find the nearest documents. The retrieval is content-aware but user-blind.

Consider two people asking “how should I structure the API?” An experienced architect needs reference documentation and design patterns. A junior developer needs a tutorial with examples and explanations of the concepts. The same query embedding retrieves the same documents for both: the senior architect gets unnecessary hand-holding, and the junior developer gets impenetrable reference docs.

The fix is not better embeddings. It is knowing who is asking.
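One way to act on “knowing who is asking” is to re-rank retrieved chunks by how well each chunk’s intended audience matches the user’s expertise. A minimal sketch, assuming hypothetical `Chunk` metadata and a made-up penalty weight; this is not a real retrieval API:

```typescript
// Sketch: same similarity scores, different ranking per user.
// `audience` metadata and the 0.1 penalty weight are assumptions.
type Expertise = "junior" | "intermediate" | "senior";

interface Chunk {
  id: string;
  similarity: number;  // vector similarity to the query
  audience: Expertise; // who the document was written for
}

const LEVELS: Expertise[] = ["junior", "intermediate", "senior"];

// Distance between the user's level and the chunk's intended audience.
function expertiseGap(user: Expertise, chunk: Expertise): number {
  return Math.abs(LEVELS.indexOf(user) - LEVELS.indexOf(chunk));
}

// Sort by similarity minus a penalty for audience mismatch.
function rerank(chunks: Chunk[], user: Expertise): Chunk[] {
  const score = (c: Chunk) => c.similarity - 0.1 * expertiseGap(user, c.audience);
  return [...chunks].sort((a, b) => score(b) - score(a));
}
```

With this sketch, a tutorial chunk at similarity 0.80 outranks a reference chunk at 0.82 for a junior developer, and the ordering flips for a senior architect: identical query, user-aware retrieval.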

Problem 2: Context Window Competition. Context windows are finite. Every token spent on one thing is a token not spent on another. Right now, your context window is allocated entirely to system instructions and retrieved documents. User understanding gets zero tokens.

This allocation is backwards. A paragraph of user context, their role, expertise level, what they already know, what they are trying to accomplish, would improve response quality more than an additional retrieved chunk. But because there is no user layer in the context stack, that paragraph does not exist.
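The arithmetic behind that claim is easy to make concrete. A back-of-the-envelope sketch, where every number is an illustrative assumption, not a measurement:

```typescript
// Illustrative token budget for a context window. The point: a small,
// fixed user-context allocation displaces only a fraction of one chunk.
interface Budget {
  systemInstructions: number;
  userContext: number;
  documents: number;
}

function allocate(windowSize: number, withUserLayer: boolean): Budget {
  const systemInstructions = 500;               // assumed prompt size
  const userContext = withUserLayer ? 200 : 0;  // one paragraph about the user
  return {
    systemInstructions,
    userContext,
    documents: windowSize - systemInstructions - userContext,
  };
}
```

On an 8,000-token window, adding the user layer costs roughly half of one 400-token chunk: 7,300 document tokens instead of 7,500.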

Problem 3: Evaluation Without Baseline. As Hamel Husain’s eval framework [1] makes clear, good evaluation requires knowing what a good response looks like. But good for whom? Without user context, your eval system treats every response as if it were for a generic user. You cannot measure personalization quality if your evaluation layer does not know who the output was personalized for.


Document-Only Context

  • ✗ RAG retrieves the same chunks for everyone
  • ✗ Context window filled with documents only
  • ✗ No user awareness in prompting
  • ✗ Evaluation grades against generic quality
  • ✗ Same response regardless of expertise

Document + User Context

  • ✓ RAG retrieves user-appropriate chunks
  • ✓ Context window includes user understanding
  • ✓ Prompts adapt to user expertise and goals
  • ✓ Evaluation grades against user-specific quality
  • ✓ Responses tailored to the individual

The Architecture of the Missing Layer

The context layer your AI is missing sits between the user and everything else. It answers the question: who is asking?

User context is not a user profile. A profile stores static attributes: name, role, department. User context is dynamic understanding: what does this person know, what are they trying to accomplish, how do they prefer to receive information, and what has changed since their last interaction?

User context is not conversation history. History is a log of what happened. User context is an inference about what the history means. “The user asked three questions about API rate limiting” does not help the AI as much as “this user is debugging a rate limiting issue and has intermediate-level API knowledge.”

User context is a self-model. It is a structured representation of the user’s beliefs, knowledge, goals, and preferences, updated continuously based on interactions, continuously refined as the AI learns more about the person.

context-with-user.ts
// Add the missing user layer to your context stack:
// user context alongside document context
const userContext = await clarity.getSelfModel(userId);

// Enrich retrieval with user awareness: same query, different retrieval
const chunks = await rag.retrieve(query, {
  userExpertise: userContext.expertiseLevel,
  userGoals: userContext.currentGoals,
  preferredDepth: userContext.communicationPreferences.depth,
});

// Build the context window with user understanding
const prompt = buildPrompt({
  systemInstructions: basePrompt,
  userContext: userContext.summary, // 200 tokens of user understanding
  documents: chunks,                // retrieved content
  query: query,
});

Why This Problem Compounds

The context layer gap does not just affect responses. It ripples through your entire AI product stack.

Prompting suffers. Without user context, every prompt is generic. You write one system prompt that tries to serve everyone, and serves no one particularly well. With user context, the prompt adapts: adjust technical depth for this user, reference their previous interactions, focus on their stated goals.

Retrieval suffers. Without user context, your retrieval treats every query identically. The embedding similarity that works for one user is suboptimal for another. With user context, retrieval can prioritize chunks that match the user’s knowledge level and current objective.

Evaluation suffers. As Hamel Husain argues, evaluation must be grounded in what matters. Without user context, you cannot measure the dimension that matters most: did the AI understand this person? Your evals measure generic quality when they should measure personal quality.

Personalization suffers. This one is obvious. You cannot personalize without knowing the person. But it is worth saying explicitly: every AI product that claims to be personalized without a user context layer is performing cosmetic personalization, adjusting tone or format, not understanding.


The Practical Fix

Adding user context to your context layer is not a moonshot. It is an integration.

Step 1: Build the self-model. Capture user interactions, infer expertise level, track goals and preferences, update continuously. This can start as a structured JSON object with five fields: role, expertise level, primary goals, communication preferences, and domain knowledge.
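A sketch of that starting point, with the inference step that turns history into understanding. The field values, the `/rate limit/` heuristic, and the merge-based update are all illustrative assumptions:

```typescript
// The five starter fields as a structured object.
interface UserContext {
  role: string;
  expertiseLevel: "junior" | "intermediate" | "senior";
  primaryGoals: string[];
  communicationPreferences: string;
  domainKnowledge: string[];
}

// History is a log; the self-model is an inference over it.
// Toy heuristic: repeated rate-limiting questions imply a current goal.
function inferFromHistory(queries: string[]): Partial<UserContext> {
  const rateLimitQs = queries.filter((q) => /rate limit/i.test(q)).length;
  return rateLimitQs >= 3
    ? { primaryGoals: ["debugging a rate limiting issue"] }
    : {};
}

// Continuous update: apply what a new interaction revealed, keep the rest.
function update(model: UserContext, observed: Partial<UserContext>): UserContext {
  return { ...model, ...observed };
}
```

In practice the inference step would be an LLM call rather than a regex, but the shape is the same: interactions in, a small structured delta out, merged into the stored model.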

Step 2: Inject it into retrieval. Modify your retrieval pipeline to consider user context alongside query content. This can be as simple as re-ranking retrieved chunks based on the user’s expertise level, or as sophisticated as embedding user context into the query vector.
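The second option can be sketched as a weighted blend of the query vector and a user-context vector before nearest-neighbor search. The blend weight is an assumption, and `blend` is a stand-in for however your embedding pipeline composes vectors:

```typescript
// Nudge the query vector toward the user's context before ANN search.
// alpha = 0.2 is an assumed weight: mostly the query, lightly the user.
function blend(query: number[], userCtx: number[], alpha = 0.2): number[] {
  if (query.length !== userCtx.length) throw new Error("dimension mismatch");
  const v = query.map((q, i) => (1 - alpha) * q + alpha * userCtx[i]);
  // Re-normalize so cosine similarity behaves as before.
  const norm = Math.hypot(...v);
  return v.map((x) => x / norm);
}
```

Because the result is re-normalized, the blended vector drops into the existing similarity search unchanged; only the neighborhood it lands in shifts toward user-appropriate content.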

Step 3: Add it to the prompt. Allocate 100-200 tokens of your context window to user understanding. This small investment of context window space produces outsized improvements in response quality.

Step 4: Use it in evaluation. Include user context in your eval rubric so that human reviewers and LLM judges can assess whether the response matched the person, not just the question.
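The rubric change can be as small as one extra field in the judge prompt. A hypothetical prompt builder; the dimensions and wording are illustrative, not a prescribed eval framework:

```typescript
// Build a judge prompt that grades against the person, not just the question.
interface EvalCase {
  query: string;
  response: string;
  userContext: string; // e.g. "junior developer, learning API design"
}

function judgePrompt(c: EvalCase): string {
  return [
    "You are grading an AI response.",
    `User context: ${c.userContext}`,
    `Query: ${c.query}`,
    `Response: ${c.response}`,
    "Score 1-5 on each dimension:",
    "1. Correctness: is the answer accurate?",
    "2. Fit: do depth and terminology match this user's expertise?",
    "3. Goal alignment: does it advance what this user is trying to do?",
  ].join("\n");
}
```

Without the `User context` line, dimensions 2 and 3 are unanswerable, which is exactly the blindness described in Problem 3.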


Context Layer              | What It Knows                    | What It Misses
System prompt only         | Product rules and guidelines     | User identity, document relevance
RAG (document context)     | Relevant content and facts       | Who needs the content and why
Conversation history       | What was said before             | What it means about the user
Self-model (user context)  | Who the user is, what they need  | Nothing significant at this level

Trade-offs

User context adds latency. Fetching and incorporating the self-model adds a retrieval step. For most architectures, this adds 50-100ms, noticeable only in latency-critical applications.

User context requires storage. Self-models need to be stored, updated, and retrieved efficiently. This is real infrastructure, not a trivial addition. But it is simpler than most teams expect: a structured document per user, updated on every interaction.

Privacy considerations are real. Storing user understanding raises legitimate privacy questions. Self-models should be transparent to users, editable by users, and compliant with data protection regulations. This is not optional.

Cold start problem exists. New users have no self-model. The system needs graceful degradation, serving generic context for new users while building understanding over time. The first few interactions are the user context equivalent of a cold cache.
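Graceful degradation can be a single fallback branch. A sketch in which the warm-up threshold and default wording are assumptions:

```typescript
// Fall back to a generic self-model until enough interactions exist.
interface SelfModelSummary {
  summary: string;
  confidence: "generic" | "inferred";
}

function getSelfModelOrDefault(
  interactionCount: number,
  inferred: string | null,
): SelfModelSummary {
  const MIN_INTERACTIONS = 3; // warm-up threshold: a cold cache at first
  if (interactionCount < MIN_INTERACTIONS || inferred === null) {
    return {
      summary: "No user history yet; assume a general audience.",
      confidence: "generic",
    };
  }
  return { summary: inferred, confidence: "inferred" };
}
```

Tagging the output with a `confidence` field also lets downstream prompts hedge appropriately: a generic summary should not be presented to the model as established fact about the user.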

What to Do Next

  1. Audit your current context stack. Map every piece of context your AI receives: system prompt, retrieved documents, conversation history. Count the tokens. Notice that user understanding gets zero tokens. That gap is your biggest quality lever.

  2. Add a 5-field user context object. Start simple: role, expertise level, primary goals, communication preferences, domain knowledge. Inject it into your prompt. Even a manually maintained user profile improves response quality dramatically.

  3. Plan for dynamic self-models. The 5-field object is the starting point. The end state is a continuously updated self-model that captures the user’s evolving understanding. Clarity provides the self-model API that makes this infrastructure trivial.


Your RAG pipeline retrieves documents. Your agent manages context. But neither knows the person asking the question. Add the missing layer.

References

  1. Hamel Husain’s eval framework
  2. 2016 survey of 2,000 Americans by Reelgood and Learndipity Data Insights
  3. Scientific American explains
  4. cold start problem
  5. Progress Software describes this core tension well
  6. New America analysis of AI agents and memory
