
Scaling Agents from 1 to 50 Without Losing Context

Multi-agent systems lose user context at handoff boundaries. Self-models provide a shared understanding layer that scales agent architectures without context loss.

Robert Ta
CEO & Co-Founder · 9 min read

TL;DR

  • Multi-agent systems lose 30-40 percent of user context at each handoff boundary because agents share transcripts, not understanding
  • The standard fixes (larger windows, better retrieval, shared databases) treat the symptom without addressing the structural problem
  • Self-models create a shared understanding layer that every agent reads from and contributes to, enabling coherent personalization at scale

Scaling agents from one to fifty without losing context requires a shared understanding layer, because multi-agent systems lose 30-40 percent of user context at each handoff boundary. Standard fixes like larger context windows and shared databases treat the symptom without addressing the structural problem of agents sharing data rather than understanding. This post covers why handoffs destroy context, the N-squared problem of agent scaling, and how self-models create a shared semantic layer that every agent reads from and contributes to.

30-40% — average context loss per agent handoff
50+ — specialized agents in a typical enterprise deployment
0 — agents that share user understanding natively

Why Handoffs Destroy Context

When Agent A hands off to Agent B, what gets transferred? In most architectures, the answer is one of three things: a conversation transcript, a structured summary, or nothing.

Transcript handoff. Agent B receives the raw conversation history from Agent A. This preserves information but not understanding. Agent B must re-derive the user’s intent, preferences, and goals from unstructured text, a process that loses nuance and adds latency.

Summary handoff. Agent A generates a summary of the user’s context and passes it to Agent B. This is more efficient but introduces lossy compression. The summary reflects Agent A’s interpretation, which may emphasize different aspects than Agent B needs.

No handoff. Agent B starts fresh. The user repeats themselves. This is more common than most teams admit, especially when agents are built by different teams using different frameworks.

All three approaches share the same fundamental flaw: they transfer data rather than understanding. The difference is like handing a new doctor your raw medical chart versus briefing them on your health history, risk factors, and treatment preferences.


| Handoff Method  | Information Preserved | Understanding Preserved | Latency  | Scalability |
|-----------------|-----------------------|-------------------------|----------|-------------|
| Full Transcript | High                  | Near zero               | High     | Poor        |
| Summary         | Moderate              | Low                     | Moderate | Moderate    |
| No Handoff      | None                  | None                    | None     | N/A         |
| Self-Model      | Structured            | High                    | Low      | Excellent   |

The N-Squared Problem

The handoff problem compounds as you add agents. With two agents, you have one handoff boundary. With five agents, you have ten potential boundaries. With fifty agents, you have over a thousand.

Each boundary is a point of context loss. Even if each individual handoff preserves 90 percent of context (an optimistic estimate), after four handoffs you have retained only 65 percent. After eight handoffs, 43 percent. The math is merciless.

This is why large-scale agent deployments feel fragmented even when individual agents are excellent. The architecture creates a system where the whole is less than the sum of its parts.
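A quick back-of-the-envelope calculation, sketched in TypeScript, makes the compounding concrete (the 90 percent per-handoff retention rate is the optimistic estimate above):

```ts
// Handoff boundaries grow quadratically: n agents form n*(n-1)/2 pairs.
const boundaries = (n: number): number => (n * (n - 1)) / 2;

// Context retained after k sequential handoffs at retention rate r.
const retained = (r: number, k: number): number => Math.pow(r, k);

console.log(boundaries(5));    // 10
console.log(boundaries(50));   // 1225 -- "over a thousand"
console.log(retained(0.9, 4)); // ~0.66 -- roughly two-thirds of context left
console.log(retained(0.9, 8)); // ~0.43 -- less than half remains
```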

Without Shared Understanding

  • Sales agent learns user wants quick ROI; support agent does not know
  • Onboarding agent captures user role and team size; product agent starts fresh
  • User tells billing agent about budget constraint; upsell agent ignores it
  • Each agent independently builds a partial, inconsistent user model

With Self-Model Layer

  • Sales agent records quick ROI belief; all agents see it immediately
  • Onboarding captures role and team size; product agent uses it from the first interaction
  • Budget constraint recorded once; respected across all agent touchpoints
  • Single evolving user model, contributed to by all agents, consistent everywhere

Self-Models as Shared Understanding

A self-model solves the multi-agent context problem by providing a shared layer of user understanding that exists independently of any single agent.

Every agent in the system reads from and writes to the same self-model. When the sales agent discovers that a user is motivated by quick time-to-value, that understanding is immediately available to the support agent, the onboarding agent, and the product agent. Not as a transcript. Not as a summary. As a structured belief with a confidence score.

shared-self-model.ts

```ts
// Sales agent records an observation (Agent A contributes)
await clarity.observe(userId, {
  belief: 'Motivated by quick time-to-value over feature depth',
  confidence: 0.82,
  context: 'purchase_motivation',
  source: 'sales_agent'
});

// Support agent queries the shared model (Agent B benefits)
const model = await clarity.getSelfModel(userId);
const motivation = model.getBeliefs({ context: 'purchase_motivation' });
// Returns: 'Motivated by quick time-to-value' (confidence: 0.82)
// Support agent prioritizes fast resolution over comprehensive explanation
```

The architecture has three key properties that make it scale:

Write-anywhere, read-everywhere. Any agent can contribute observations to the self-model. All agents see the updated understanding. There is no need for point-to-point handoff protocols between every pair of agents.

Conflict resolution. When two agents observe contradictory information about the same user, the self-model has explicit mechanisms for resolution: confidence weighting, temporal precedence, and evidence comparison. This prevents the inconsistency that plagues shared-database approaches.

Source attribution. Every belief in the self-model is tagged with its source agent, which makes it possible to reason about observation reliability and lets agents weight beliefs according to the source’s domain expertise.
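As a minimal sketch of how confidence weighting and temporal precedence might combine, consider the following; the Belief shape and field names here are illustrative, not the Clarity API:

```ts
interface Belief {
  statement: string;
  confidence: number; // 0..1
  observedAt: number; // epoch milliseconds
  source: string;     // contributing agent
}

// Resolve two contradictory beliefs about the same attribute:
// a clearly more confident belief wins; near-ties fall back to
// temporal precedence, so the more recent observation prevails.
function resolve(a: Belief, b: Belief): Belief {
  const margin = 0.1; // confidence gap below which recency decides
  if (Math.abs(a.confidence - b.confidence) > margin) {
    return a.confidence > b.confidence ? a : b;
  }
  return a.observedAt >= b.observedAt ? a : b;
}
```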


The Onboarding Advantage

One of the most immediate benefits of shared self-models in multi-agent systems is the onboarding experience for new agents.

When you add a new specialized agent to your system (say, a financial planning agent alongside your existing general assistant and support agents), the new agent does not start from zero. It reads the same self-model that every other agent contributes to. From its very first interaction, the financial planning agent knows the user’s communication preferences, their stated goals, their expertise level, and their history with the product.

Without shared self-models, every new agent starts cold. The user has to rebuild context with each new agent, explaining the same background, restating the same preferences, and re-establishing the same rapport. With shared self-models, new agents feel like they are already part of the team.

This dramatically reduces the friction of scaling agent capabilities. New agents can be deployed and immediately provide value because they inherit the collective understanding of all previous agents. The user experience of a multi-agent system feels seamless rather than fragmented.

Architecture Patterns for Scale

There are three deployment patterns for self-model-based multi-agent systems, each suited to different scale requirements:

Pattern 1: Hub and Spoke. All agents connect to a centralized self-model service. Simple to implement, easy to reason about, single point of consistency. Best for deployments up to 20 agents.

Pattern 2: Federated. Domain-specific self-model instances with synchronization. Each domain cluster has low-latency access to its primary beliefs while maintaining eventual consistency with the global model. Best for 20-100 agents across multiple domains.

Pattern 3: Hierarchical. Nested self-models at user, team, and organization levels. Individual agents contribute to user-level models, which aggregate into team-level patterns, which roll up to organizational understanding. Best for enterprise deployments with complex organizational structures.
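To illustrate the federated pattern, here is a sketch of a read path that serves domain-local beliefs at low latency and falls back to the eventually consistent global model. The BeliefStore interface is an assumption for illustration, not the actual Clarity API:

```ts
interface BeliefStore {
  getBeliefs(userId: string, context: string): Promise<string[]>;
}

// Federated read: consult the domain-local instance first for low
// latency, then fall back to the global model on a local miss.
async function readBeliefs(
  localStore: BeliefStore,
  globalStore: BeliefStore,
  userId: string,
  context: string
): Promise<string[]> {
  const primary = await localStore.getBeliefs(userId, context);
  return primary.length > 0 ? primary : globalStore.getBeliefs(userId, context);
}
```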

Agent Scale Architecture

1-20 Agents: Hub and Spoke

20-100 Agents: Federated Self-Models

100+ Agents: Hierarchical Models

The pattern changes. The principle does not: shared understanding, not shared transcripts.

The Cost of Context Fragmentation

Context fragmentation has measurable business impact that most teams underestimate because it is distributed across multiple symptoms.

Repeated information requests. When agents lack shared context, users repeat themselves. Every repeated explanation is a friction event that degrades satisfaction. In enterprise deployments, we see users spend 15-20 percent of their agent interaction time re-establishing context that a previous agent already had. Multiply that across thousands of users and the wasted time is substantial.

Contradictory recommendations. When different agents hold different models of the same user, they make contradictory recommendations. The sales agent suggests the enterprise plan because it captured the user’s team size. The product agent suggests the individual plan because it only saw solo usage patterns. The user loses confidence in the system’s intelligence.

Organizational blind spots. Without a shared understanding layer, no one in the organization has a complete picture of any individual user’s journey. Sales sees their slice. Support sees theirs. Product sees theirs. The user’s actual experience, the full sequence of interactions across all touchpoints, is invisible. Self-models make this complete picture available to every agent and every team.

Debugging difficulty. When a user reports a bad experience, diagnosing the root cause in a multi-agent system without shared context is exceptionally difficult. Which agent had the wrong understanding? Where did context get lost? With self-models, you can trace every belief to its source agent and evidence, making diagnosis straightforward.

Real-World Failure Modes

Working with enterprise teams scaling agent deployments, we see the same failure modes repeatedly:

The Echo Chamber. When agents share transcripts, they tend to reinforce each other’s misunderstandings. If the onboarding agent misinterprets a user’s role, that misinterpretation propagates through summaries to every downstream agent. Self-models with confidence scoring prevent this: low-confidence beliefs are visible as uncertain, not treated as fact.

The Stale Context. Agents that cache user context locally fall out of sync with the latest understanding. A user updates their preferences with the settings agent, but the recommendation agent is still working with last week’s model. Shared self-models with real-time updates eliminate this class of bug entirely.
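One way to close that gap is to invalidate local caches on every shared-model update. The sketch below uses a plain Node event emitter as a stand-in for whatever update channel the deployment provides; the event name and payload are hypothetical:

```ts
import { EventEmitter } from 'node:events';

// Stand-in update channel; a real deployment might use a message bus
// or a streaming subscription instead (illustrative only).
const selfModelBus = new EventEmitter();

// Local per-agent cache of user models.
const modelCache = new Map<string, object>();

// Invalidate the cached model whenever any agent updates a belief,
// so no agent keeps serving last week's understanding.
selfModelBus.on('belief.updated', (userId: string) => {
  modelCache.delete(userId); // the next read refetches the fresh model
});
```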

The Specialization Trap. Each agent team optimizes its own context handling without coordinating with others. The result is incompatible context formats, duplicate storage, and no shared vocabulary for user understanding. A self-model provides that shared vocabulary.

The Attribution Gap. When a multi-agent system produces a bad outcome, which agent is responsible? Without shared context, it is nearly impossible to trace a bad recommendation to the specific context failure that caused it. Self-models with source attribution make this traceable: you can see which agent contributed which belief and when.


Understanding these failure modes is not academic. Each one produces specific, measurable user experience degradation. The echo chamber increases user correction frequency. The stale context creates inconsistent recommendations. The specialization trap forces users to re-establish context with each new agent. The attribution gap prevents the team from improving. All four are symptoms of the same root cause: agents sharing data instead of sharing understanding.

Trade-offs and Limitations

Shared self-models introduce their own architectural considerations.

Latency overhead. Reading from a centralized self-model adds network latency compared to local context access. For real-time agents, this latency (typically 10-50ms) must be accounted for. Caching strategies and read-ahead patterns mitigate this for most use cases.

Write contention. When multiple agents observe the same user simultaneously, write conflicts can occur. The self-model needs a conflict resolution strategy: typically last-write-wins for independent beliefs and confidence-weighted merging for overlapping beliefs.
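A minimal sketch of what confidence-weighted merging could look like, using a simplified subset of the illustrative Belief shape from earlier (the boost formula is an assumption, not a specified algorithm):

```ts
type Belief = { statement: string; confidence: number };

// Merge overlapping beliefs: agreement from an independent source
// raises confidence; on disagreement, the stronger belief wins.
function merge(a: Belief, b: Belief): Belief {
  const [stronger, weaker] = a.confidence >= b.confidence ? [a, b] : [b, a];
  if (stronger.statement === weaker.statement) {
    const boosted =
      stronger.confidence + (1 - stronger.confidence) * weaker.confidence * 0.5;
    return { ...stronger, confidence: Math.min(boosted, 0.99) };
  }
  return stronger;
}
```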

Model coherence. Different agents may have different standards for what constitutes a meaningful observation. Without governance, the self-model can accumulate low-quality beliefs that degrade rather than enhance understanding. Observation quality scoring and minimum confidence thresholds help maintain coherence.

Migration complexity. Retrofitting self-models onto existing multi-agent architectures requires migrating from heterogeneous context systems to a shared model. This is a non-trivial engineering effort, though the Clarity API is designed to make integration incremental: you can start with one agent and expand.

What to Do Next

  1. Map your handoff boundaries. Diagram every point where user context transfers between agents in your system. For each boundary, document what gets transferred and what gets lost. This map reveals where context fragmentation is worst and where self-models will have the highest impact.

  2. Measure handoff quality. After an agent handoff, test the receiving agent’s understanding of the user. Can it answer basic questions about the user’s goals, preferences, and history? Compare this to the sending agent’s understanding. The delta is your context loss metric; a minimal way to compute it is sketched after this list.

  3. Start with the highest-traffic handoff. Pick the agent pair with the most frequent handoffs and implement a shared self-model between them. Measure context retention, user satisfaction, and repeat-information rates before and after. The improvement will justify broader adoption. Explore the Clarity API to see shared self-models in action.
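For step 2, here is one minimal way to turn that delta into a number, assuming you can extract each agent’s understanding as a set of key user facts (the extraction itself is up to your evaluation harness):

```ts
// Context retention across a handoff: the fraction of the sending
// agent's key user facts the receiving agent can still restate.
function contextRetention(sent: Set<string>, received: Set<string>): number {
  if (sent.size === 0) return 1; // nothing to lose
  let kept = 0;
  for (const fact of sent) {
    if (received.has(fact)) kept++;
  }
  return kept / sent.size; // 1.0 = lossless handoff, 0.0 = total loss
}

// Example: 10 facts sent, 7 restated -> retention 0.7, loss 30 percent.
```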


One agent knows the user. Fifty agents should too. Self-models make it happen. Scale without losing context.

