Context Engineering Needs a User Layer
Context engineering has become the dominant paradigm for building with LLMs. But the entire discipline is missing a critical layer: persistent, evolving understanding of the human on the other end.
TL;DR
- Context engineering has solved the knowledge problem. RAG, tool use, and system prompts give AI systems access to the right information at the right time
- But context engineering has completely ignored the user problem. No mainstream approach includes persistent, evolving understanding of the human receiving the output
- Adding a user layer (self-models) to existing context pipelines increased satisfaction 34% in our prototype, with minimal changes to the underlying infrastructure
Context engineering needs a user layer because RAG, tool use, and system prompts solve the knowledge problem but completely ignore the person receiving the output. The result is technically correct responses that miss the mark for each individual user, treating a senior engineer and a junior analyst identically despite radically different needs. This post covers the missing layer in the context engineering stack, how adding self-model context at retrieval and generation stages increased satisfaction 34% with zero RAG changes, and the three integration points for any existing pipeline.
The Missing Layer
Let me map the current context engineering stack:
System prompt: Sets the AI’s role, tone, and constraints. Static. Same for every user.
RAG/retrieval: Provides domain knowledge, documents, data, examples. Dynamic based on the query. Blind to the querier.
Tool use: Gives the AI capabilities: search, compute, API calls. Triggered by intent detection. Does not know who has the intent.
Memory/conversation history: Records what was said in this session. Ephemeral. Resets between sessions.
Evaluation/guardrails: Checks outputs against safety and quality criteria. Applied uniformly. Does not adapt to user context.
Notice what is missing? Every layer operates on the query or the domain. No layer operates on the user. The system knows what was asked and where to find the answer. It does not know who is asking or how to deliver the answer in a way that serves them.
This is the user layer. And its absence explains why so many technically excellent AI products feel generically impersonal.
Context Engineering Without User Layer
- × RAG retrieves based on query semantics only
- × Same response style regardless of user expertise
- × System prompt is static across all users
- × No memory of user across sessions
Context Engineering With User Layer
- ✓ RAG retrieval filtered and ranked by user context
- ✓ Response depth and tone adapted to user self-model
- ✓ System prompt augmented with user-specific beliefs
- ✓ Persistent user understanding that evolves over time
What a User Layer Looks Like
The user layer sits alongside your existing context pipeline, not replacing it. It provides an additional context source: structured understanding of the person making the request.
At the retrieval stage, the user layer influences what gets retrieved. A senior engineer asking about Kubernetes networking does not need the “What is Kubernetes” document. A beginner does. The same query should retrieve different documents, or at minimum, rank them differently, based on who is asking.
At the generation stage, the user layer influences how the response is constructed. The expertise level determines depth. The communication preference determines tone. The stated goals determine what to emphasize. The historical context determines what to skip (“you already know this from our last conversation”).
At the evaluation stage, the user layer influences what counts as a good response. An exhaustive technical deep-dive is a great response for the user who wants depth and a terrible response for the user who wants a quick answer.
```typescript
// Standard context engineering pipeline ← Knowledge layer
const documents = await rag.retrieve(query, { topK: 10 });
const reranked = await reranker.rank(documents, query);

// Add the user layer ← Understanding layer
const selfModel = await clarity.getSelfModel(userId);

// User-aware retrieval: filter by expertise ← Same query, different docs
const userFiltered = await rag.retrieve(query, {
  topK: 10,
  expertiseFilter: selfModel.beliefs.expertise_level,
  excludeKnown: selfModel.beliefs.familiar_topics
});

// User-aware generation ← Same knowledge, different delivery
const response = await llm.generate({
  context: reranked,
  userContext: selfModel.toPromptContext(),
  // Automatically adapts depth, tone, and emphasis
});
```
The Prototype Experiment
I wanted to prove this was not just theory. We took an existing RAG pipeline at an enterprise AI company: 15 data sources, semantic chunking, BM25+vector hybrid retrieval, and cross-encoder reranking. Sophisticated, well-tuned infrastructure.
We added a user layer on top. No changes to the RAG pipeline itself. We just injected self-model context at two points: retrieval reranking and response generation.
The self-model tracked three things per user: expertise level in the domain (beginner, intermediate, expert), communication preference (concise vs. detailed, technical vs. conceptual), and primary use case (learning, decision-making, implementation).
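The three tracked dimensions can be sketched as a plain data type. This is an illustrative shape, not Clarity's actual schema; the field names and the `toPromptContext` helper are assumptions.

```typescript
// Illustrative shape for the per-user self-model described above.
// Field names are hypothetical, not a real Clarity schema.
type ExpertiseLevel = "beginner" | "intermediate" | "expert";

interface SelfModel {
  expertise: ExpertiseLevel;              // domain expertise level
  communication: {
    length: "concise" | "detailed";       // how much to say
    style: "technical" | "conceptual";    // how to say it
  };
  useCase: "learning" | "decision-making" | "implementation";
}

// Render the model as a short prompt fragment for the generation stage.
function toPromptContext(m: SelfModel): string {
  return [
    `Expertise: ${m.expertise}`,
    `Prefers ${m.communication.length}, ${m.communication.style} responses`,
    `Primary use case: ${m.useCase}`,
  ].join("\n");
}
```

Keeping the model this small is deliberate: three well-maintained dimensions beat a sprawling profile that is never accurate.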
Same knowledge base. Same retrieval pipeline. Same LLM. The only difference was that the system now knew who it was talking to.
User satisfaction scores increased 34%. Time-to-resolution decreased 22%. And the most telling metric: users who received user-layer-enhanced responses were 45% less likely to rephrase their query (a signal that the first response was closer to what they actually needed).
Why RAG Alone Is Not Enough
RAG solved the knowledge freshness problem. Your AI system can access up-to-date information from your documents, databases, and APIs. But knowledge freshness is orthogonal to user understanding.
Imagine two users asking the same question about your product’s API authentication. User A is a senior security engineer evaluating your product for compliance. User B is a junior developer building their first integration. They need radically different responses:
| Dimension | User A (Security Engineer) | User B (Junior Developer) |
|---|---|---|
| Depth | Full auth flow with threat model | Step-by-step setup guide |
| Emphasis | Token rotation, scope limitations, audit logs | Getting a working API key quickly |
| Tone | Peer-to-peer technical discussion | Instructional and encouraging |
| Assumed knowledge | OAuth2, JWT, RBAC | Basic HTTP, maybe heard of OAuth |
| Links and references | RFC specs, OWASP guidelines | Getting started tutorial, code samples |
RAG can retrieve the right documents for both. But without a user layer, the system cannot deliver the right response for each. It will split the difference (too basic for User A, too technical for User B) or default to one style that serves neither well.
The Three Integration Points
Adding a user layer to your existing context pipeline happens at three specific points. You do not need to rebuild your infrastructure. You need to add a new signal at each point.
Point 1: Retrieval. Before or after your retrieval step, filter or rerank results based on user context. Exclude documents that are below the user’s expertise level. Promote documents that match the user’s stated goals. This is the simplest integration point and delivers immediate impact.
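A minimal sketch of this reranking step, assuming each retrieved chunk carries a difficulty tag and a base relevance score (both hypothetical metadata; the penalty weight is also an assumption):

```typescript
// Hypothetical sketch: rerank retrieved documents by how well their
// difficulty matches the user's expertise.
type Level = "beginner" | "intermediate" | "expert";

interface Doc {
  id: string;
  level: Level;   // assumed difficulty tag on each chunk
  score: number;  // base relevance score from the retriever
}

const order: Level[] = ["beginner", "intermediate", "expert"];

// Penalize documents far from the user's level; keep base relevance otherwise.
function rerankForUser(docs: Doc[], userLevel: Level): Doc[] {
  const target = order.indexOf(userLevel);
  return [...docs]
    .map(d => ({
      ...d,
      score: d.score - 0.2 * Math.abs(order.indexOf(d.level) - target),
    }))
    .sort((a, b) => b.score - a.score);
}
```

The same query then surfaces the deep-dive doc for an expert and the intro doc for a beginner, without touching the retriever itself.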
Point 2: Generation. Include user context in the generation prompt alongside your retrieved documents. The self-model provides structured context (expertise level, communication preferences, goals) that the LLM uses to calibrate its response. No fine-tuning required. Just better prompting.
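A sketch of what that prompt assembly might look like. No real LLM client or prompt format is assumed; the section headers are illustrative:

```typescript
// Sketch of generation-stage injection: prepend user context to the
// prompt alongside retrieved documents.
interface GenerationInput {
  retrieved: string[];   // reranked document chunks
  userContext: string;   // rendered self-model, e.g. "Expertise: expert ..."
  query: string;
}

function buildPrompt(input: GenerationInput): string {
  return [
    "## About this user",
    input.userContext,
    "## Retrieved context",
    input.retrieved.join("\n---\n"),
    "## Question",
    input.query,
  ].join("\n\n");
}
```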
Point 3: Evaluation. When measuring response quality, include user-specific criteria. A response that is too detailed for one user might be perfect for another. Your evaluation framework should account for alignment with user context, not just factual accuracy.
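One way to make that concrete is a simple alignment score alongside accuracy. The word-count boundary and the 0.7/0.3 weighting below are illustrative starting points, not validated values:

```typescript
// Sketch of a user-alignment check on top of accuracy: score how well a
// response's length matches the user's stated preference.
function lengthAlignment(
  responseWords: number,
  preference: "concise" | "detailed"
): number {
  // Treat ~150 words as the concise/detailed boundary (an assumption).
  const isLong = responseWords > 150;
  const matches = preference === "detailed" ? isLong : !isLong;
  return matches ? 1 : 0;
}

// Combine with an existing accuracy score; weights are a starting point.
function responseQuality(accuracy: number, alignment: number): number {
  return 0.7 * accuracy + 0.3 * alignment;
}
```

The point is not this particular heuristic but the structure: a factually perfect response can still score poorly overall if it ignores who it was written for.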
Trade-offs
Adding a user layer to context engineering is not free.
Cold start requires bootstrapping. New users have thin self-models. Your system needs a graceful degradation path: use whatever user context is available, fall back to population defaults when the model is thin, and improve as the self-model matures. This adds complexity to your generation pipeline.
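That degradation path can be as simple as an evidence threshold. The observation count, cutoff, and default below are all assumptions for illustration:

```typescript
// Sketch of graceful degradation for new users: fall back to a population
// default when the self-model has too little evidence behind it.
interface UserModel {
  observations: number;   // how much evidence backs this model
  expertise?: "beginner" | "intermediate" | "expert";
}

const POPULATION_DEFAULT = { expertise: "intermediate" as const };

function effectiveExpertise(model: UserModel): string {
  // Below 5 observations, trust the population default over the thin model.
  if (model.observations < 5 || !model.expertise) {
    return POPULATION_DEFAULT.expertise;
  }
  return model.expertise;
}
```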
User models can be wrong. If the self-model incorrectly classifies a senior engineer as a beginner, the system will deliver patronizing explanations. You need confidence thresholds, correction mechanisms, and the humility to default to a neutral response when confidence is low.
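Confidence gating can be sketched the same way: only personalize when a belief clears a threshold, otherwise return nothing so the caller stays neutral. The 0.7 threshold is illustrative:

```typescript
// Sketch of confidence-gated personalization: apply a self-model belief
// only when its confidence clears a threshold; otherwise stay neutral.
interface Belief {
  value: string;
  confidence: number; // 0..1
}

function personalizationHint(belief: Belief, threshold = 0.7): string | null {
  // Below threshold, return null so the caller uses a neutral response.
  return belief.confidence >= threshold ? belief.value : null;
}
```

A `null` here is a feature, not a failure: a neutral response is strictly better than a confidently wrong personalized one.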
Latency increases. Fetching self-model context adds a retrieval step. In our experiments, this added 30-50ms. For most applications, this is negligible. For latency-sensitive real-time systems, it requires optimization: caching, preloading, or async enrichment.
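The caching option is straightforward because self-models change slowly relative to request volume. A minimal TTL cache sketch (the fetcher and TTL are assumptions):

```typescript
// Sketch of a TTL cache in front of self-model fetches to keep the added
// 30-50ms off the hot path for repeat requests.
class SelfModelCache<T> {
  private store = new Map<string, { value: T; expires: number }>();
  constructor(private ttlMs: number) {}

  get(userId: string, fetch: (id: string) => T): T {
    const hit = this.store.get(userId);
    if (hit && hit.expires > Date.now()) return hit.value; // hit: skip the fetch
    const value = fetch(userId); // miss: pay the fetch once per TTL window
    this.store.set(userId, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```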
Privacy implications are real. A user layer that tracks expertise, preferences, and goals creates a profile. Users must understand what is tracked, why, and how to modify or delete it. Consent architecture must be built into the user layer from day one.
What to Do Next
1. Map your current context pipeline. Draw every source of context in your current architecture: system prompt, RAG sources, conversation history, tool outputs. For each source, ask: does this know anything about the user? You will likely find that the answer is "no" for every source. That gap is your opportunity.

2. Start with generation-stage injection. The lowest-effort, highest-impact integration point is adding user context to your generation prompt. Even a simple prefix like "The user is an expert in X who prefers concise responses" dramatically improves output relevance. Clarity provides the self-model that generates this context automatically.

3. Measure alignment, not just accuracy. Add a new evaluation dimension: "Was this response well-suited for this specific user?" Track it alongside your existing accuracy metrics. The delta between accuracy and alignment is the value the user layer captures.
Your context pipeline knows everything about the domain and nothing about the user. Add the missing layer.