Context Engineering Needs a User Layer
Context engineering has become the dominant paradigm for building with LLMs. But the entire discipline is missing a critical layer: persistent, evolving understanding of the human on the other end.
TL;DR
- Context engineering has solved the knowledge problem. RAG, tool use, and system prompts give AI systems access to the right information at the right time
- But context engineering has completely ignored the user problem. No mainstream approach includes persistent, evolving understanding of the human receiving the output
- Adding a user layer (self-models) to existing context pipelines increased satisfaction 34% in our prototype, with minimal changes to the underlying infrastructure
Context engineering needs a user layer because RAG, tool use, and system prompts solve the knowledge problem but completely ignore the person receiving the output. The result is technically correct responses that miss the mark for each individual user, treating a senior engineer and a junior analyst identically despite radically different needs. This post covers the missing layer in the context engineering stack, how adding self-model context at retrieval and generation stages increased satisfaction 34% with zero RAG changes, and the three integration points for any existing pipeline.
The Missing Layer
Let me map the current context engineering stack:
System prompt: Sets the AI’s role, tone, and constraints. Static. Same for every user.
RAG/retrieval: Provides domain knowledge, documents, data, examples. Dynamic based on the query. Blind to the querier.
Tool use: Gives the AI capabilities: search, compute, API calls. Triggered by intent detection. Does not know who has the intent.
Memory/conversation history: Records what was said in this session. Ephemeral. Resets between sessions.
Evaluation/guardrails: Checks outputs against safety and quality criteria. Applied uniformly. Does not adapt to user context.
Notice what is missing? Every layer operates on the query or the domain. No layer operates on the user. The system knows what was asked and where to find the answer. It does not know who is asking or how to deliver the answer in a way that serves them.
This is the user layer. And its absence explains why so many technically excellent AI products feel generically impersonal.
Context Engineering Without User Layer
- × RAG retrieves based on query semantics only
- × Same response style regardless of user expertise
- × System prompt is static across all users
- × No memory of user across sessions
Context Engineering With User Layer
- ✓ RAG retrieval filtered and ranked by user context
- ✓ Response depth and tone adapted to user self-model
- ✓ System prompt augmented with user-specific beliefs
- ✓ Persistent user understanding that evolves over time
What a User Layer Looks Like
The user layer sits alongside your existing context pipeline, not replacing it. It provides an additional context source: structured understanding of the person making the request.
At the retrieval stage, the user layer influences what gets retrieved. A senior engineer asking about Kubernetes networking does not need the “What is Kubernetes” document. A beginner does. The same query should retrieve different documents, or at minimum, rank them differently, based on who is asking.
At the generation stage, the user layer influences how the response is constructed. The expertise level determines depth. The communication preference determines tone. The stated goals determine what to emphasize. The historical context determines what to skip (“you already know this from our last conversation”).
At the evaluation stage, the user layer influences what counts as a good response. An exhaustive technical deep-dive is a great response for the user who wants depth and a terrible response for the user who wants a quick answer.
```typescript
// Standard context engineering pipeline ← Knowledge layer
const documents = await rag.retrieve(query, { topK: 10 });
const reranked = await reranker.rank(documents, query);

// Add the user layer ← Understanding layer
const selfModel = await clarity.getSelfModel(userId);

// User-aware retrieval: filter by expertise ← Same query, different docs
const userFiltered = await rag.retrieve(query, {
  topK: 10,
  expertiseFilter: selfModel.beliefs.expertise_level,
  excludeKnown: selfModel.beliefs.familiar_topics
});

// User-aware generation ← Same knowledge, different delivery
const response = await llm.generate({
  context: reranked,
  userContext: selfModel.toPromptContext(),
  // Automatically adapts depth, tone, and emphasis
});
```
The Prototype Experiment
I wanted to prove this was not just theory. We took an existing RAG pipeline at an enterprise AI company: 15 data sources, semantic chunking, BM25+vector hybrid retrieval, and cross-encoder reranking. Sophisticated, well-tuned infrastructure.
We added a user layer on top. No changes to the RAG pipeline itself. We just injected self-model context at two points: retrieval reranking and response generation.
The self-model tracked three things per user: expertise level in the domain (beginner, intermediate, expert), communication preference (concise vs. detailed, technical vs. conceptual), and primary use case (learning, decision-making, implementation).
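The three tracked dimensions can be sketched as a plain data type. This is an illustrative shape, not Clarity's actual schema; the field names and the `toPromptContext` helper are assumptions.

```typescript
// Illustrative shape for the per-user self-model described above.
// Field names are hypothetical, not a real Clarity schema.
type ExpertiseLevel = "beginner" | "intermediate" | "expert";

interface SelfModel {
  expertise: ExpertiseLevel;              // domain expertise level
  communication: {
    length: "concise" | "detailed";       // how much to say
    style: "technical" | "conceptual";    // how to say it
  };
  useCase: "learning" | "decision-making" | "implementation";
}

// Render the model as a short prompt fragment for the generation stage.
function toPromptContext(m: SelfModel): string {
  return [
    `Expertise: ${m.expertise}`,
    `Prefers ${m.communication.length}, ${m.communication.style} responses`,
    `Primary use case: ${m.useCase}`,
  ].join("\n");
}
```

Keeping the model this small is deliberate: three well-maintained dimensions beat a sprawling profile that is never accurate.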
Same knowledge base. Same retrieval pipeline. Same LLM. The only difference was that the system now knew who it was talking to.
User satisfaction scores increased 34%. Time-to-resolution decreased 22%. And the most telling metric: users who received user-layer-enhanced responses were 45% less likely to rephrase their query (a signal that the first response was closer to what they actually needed).
Why RAG Alone Is Not Enough
RAG solved the knowledge freshness problem. Your AI system can access up-to-date information from your documents, databases, and APIs. But knowledge freshness is orthogonal to user understanding.
Imagine two users asking the same question about your product’s API authentication. User A is a senior security engineer evaluating your product for compliance. User B is a junior developer building their first integration. They need radically different responses:
| Dimension | User A (Security Engineer) | User B (Junior Developer) |
|---|---|---|
| Depth | Full auth flow with threat model | Step-by-step setup guide |
| Emphasis | Token rotation, scope limitations, audit logs | Getting a working API key quickly |
| Tone | Peer-to-peer technical discussion | Instructional and encouraging |
| Assumed knowledge | OAuth2, JWT, RBAC | Basic HTTP, maybe heard of OAuth |
| Links and references | RFC specs, OWASP guidelines | Getting started tutorial, code samples |
RAG can retrieve the right documents for both. But without a user layer, the system cannot deliver the right response for each. It will split the difference (too basic for User A, too technical for User B) or default to one style that serves neither well.
The Three Integration Points
Adding a user layer to your existing context pipeline happens at three specific points. You do not need to rebuild your infrastructure. You need to add a new signal at each point.
Point 1: Retrieval. Before or after your retrieval step, filter or rerank results based on user context. Exclude documents that are below the user’s expertise level. Promote documents that match the user’s stated goals. This is the simplest integration point and delivers immediate impact.
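A minimal sketch of this reranking step, assuming each retrieved chunk carries a difficulty tag and a base relevance score (both hypothetical metadata; the penalty weight is also an assumption):

```typescript
// Hypothetical sketch: rerank retrieved documents by how well their
// difficulty matches the user's expertise.
type Level = "beginner" | "intermediate" | "expert";

interface Doc {
  id: string;
  level: Level;   // assumed difficulty tag on each chunk
  score: number;  // base relevance score from the retriever
}

const order: Level[] = ["beginner", "intermediate", "expert"];

// Penalize documents far from the user's level; keep base relevance otherwise.
function rerankForUser(docs: Doc[], userLevel: Level): Doc[] {
  const target = order.indexOf(userLevel);
  return [...docs]
    .map(d => ({
      ...d,
      score: d.score - 0.2 * Math.abs(order.indexOf(d.level) - target),
    }))
    .sort((a, b) => b.score - a.score);
}
```

The same query then surfaces the deep-dive doc for an expert and the intro doc for a beginner, without touching the retriever itself.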
Point 2: Generation. Include user context in the generation prompt alongside your retrieved documents. The self-model provides structured context (expertise level, communication preferences, goals) that the LLM uses to calibrate its response. No fine-tuning required. Just better prompting.
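A sketch of what that prompt assembly might look like. No real LLM client or prompt format is assumed; the section headers are illustrative:

```typescript
// Sketch of generation-stage injection: prepend user context to the
// prompt alongside retrieved documents.
interface GenerationInput {
  retrieved: string[];   // reranked document chunks
  userContext: string;   // rendered self-model, e.g. "Expertise: expert ..."
  query: string;
}

function buildPrompt(input: GenerationInput): string {
  return [
    "## About this user",
    input.userContext,
    "## Retrieved context",
    input.retrieved.join("\n---\n"),
    "## Question",
    input.query,
  ].join("\n\n");
}
```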
Point 3: Evaluation. When measuring response quality, include user-specific criteria. A response that is too detailed for one user might be perfect for another. Your evaluation framework should account for alignment with user context, not just factual accuracy.
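One way to make that concrete is a simple alignment score alongside accuracy. The word-count boundary and the 0.7/0.3 weighting below are illustrative starting points, not validated values:

```typescript
// Sketch of a user-alignment check on top of accuracy: score how well a
// response's length matches the user's stated preference.
function lengthAlignment(
  responseWords: number,
  preference: "concise" | "detailed"
): number {
  // Treat ~150 words as the concise/detailed boundary (an assumption).
  const isLong = responseWords > 150;
  const matches = preference === "detailed" ? isLong : !isLong;
  return matches ? 1 : 0;
}

// Combine with an existing accuracy score; weights are a starting point.
function responseQuality(accuracy: number, alignment: number): number {
  return 0.7 * accuracy + 0.3 * alignment;
}
```

The point is not this particular heuristic but the structure: a factually perfect response can still score poorly overall if it ignores who it was written for.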
Trade-offs
Adding a user layer to context engineering is not free.
Cold start requires bootstrapping. New users have thin self-models. Your system needs a graceful degradation path: use whatever user context is available, fall back to population defaults when the model is thin, and improve as the self-model matures. This adds complexity to your generation pipeline.
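That degradation path can be as simple as an evidence threshold. The observation count, cutoff, and default below are all assumptions for illustration:

```typescript
// Sketch of graceful degradation for new users: fall back to a population
// default when the self-model has too little evidence behind it.
interface UserModel {
  observations: number;   // how much evidence backs this model
  expertise?: "beginner" | "intermediate" | "expert";
}

const POPULATION_DEFAULT = { expertise: "intermediate" as const };

function effectiveExpertise(model: UserModel): string {
  // Below 5 observations, trust the population default over the thin model.
  if (model.observations < 5 || !model.expertise) {
    return POPULATION_DEFAULT.expertise;
  }
  return model.expertise;
}
```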
User models can be wrong. If the self-model incorrectly classifies a senior engineer as a beginner, the system will deliver patronizing explanations. You need confidence thresholds, correction mechanisms, and the humility to default to a neutral response when confidence is low.
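Confidence gating can be sketched the same way: only personalize when a belief clears a threshold, otherwise return nothing so the caller stays neutral. The 0.7 threshold is illustrative:

```typescript
// Sketch of confidence-gated personalization: apply a self-model belief
// only when its confidence clears a threshold; otherwise stay neutral.
interface Belief {
  value: string;
  confidence: number; // 0..1
}

function personalizationHint(belief: Belief, threshold = 0.7): string | null {
  // Below threshold, return null so the caller uses a neutral response.
  return belief.confidence >= threshold ? belief.value : null;
}
```

A `null` here is a feature, not a failure: a neutral response is strictly better than a confidently wrong personalized one.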
Latency increases. Fetching self-model context adds a retrieval step. In our experiments, this added 30-50ms. For most applications, this is negligible. For latency-sensitive real-time systems, it requires optimization: caching, preloading, or async enrichment.
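The caching option is straightforward because self-models change slowly relative to request volume. A minimal TTL cache sketch (the fetcher and TTL are assumptions):

```typescript
// Sketch of a TTL cache in front of self-model fetches to keep the added
// 30-50ms off the hot path for repeat requests.
class SelfModelCache<T> {
  private store = new Map<string, { value: T; expires: number }>();
  constructor(private ttlMs: number) {}

  get(userId: string, fetch: (id: string) => T): T {
    const hit = this.store.get(userId);
    if (hit && hit.expires > Date.now()) return hit.value; // hit: skip the fetch
    const value = fetch(userId); // miss: pay the fetch once per TTL window
    this.store.set(userId, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```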
Privacy implications are real. A user layer that tracks expertise, preferences, and goals creates a profile. Users must understand what is tracked, why, and how to modify or delete it. Consent architecture must be built into the user layer from day one.
What to Do Next
1. Map your current context pipeline. Draw every source of context in your current architecture: system prompt, RAG sources, conversation history, tool outputs. For each source, ask: does this know anything about the user? You will likely find that the answer is "no" for every source. That gap is your opportunity.

2. Start with generation-stage injection. The lowest-effort, highest-impact integration point is adding user context to your generation prompt. Even a simple prefix like "The user is an expert in X who prefers concise responses" dramatically improves output relevance. Clarity provides the self-model that generates this context automatically.

3. Measure alignment, not just accuracy. Add a new evaluation dimension: "Was this response well-suited for this specific user?" Track it alongside your existing accuracy metrics. The delta between accuracy and alignment is the value the user layer captures.
Your context pipeline knows everything about the domain and nothing about the user. Add the missing layer.