The Feedback Loop That Compounds
Most AI products collect feedback they never use. Self-models turn every interaction into a compounding asset that makes personalization better over time.
TL;DR
- Most AI products collect feedback that sits in a database table and never reaches the model serving users
- Self-models create a closed loop: every interaction updates the user’s belief structure, which immediately changes the next interaction
- The compounding effect means products with self-models get exponentially better per user over time, not just linearly better
AI feedback loops that compound require closing the loop between user signals and the model serving that same user within the same session. Most AI products collect feedback into analytics tables that never reach the individual user’s experience, creating a graveyard of unused signal. This post covers the anatomy of a compounding feedback loop, how self-models enable structured belief updates from both explicit and implicit signals, and why behavioral tracking decays while belief-level modeling appreciates.
The Feedback Graveyard
Every AI product has a feedback mechanism. ChatGPT has thumbs up and thumbs down. Notion AI has “Was this helpful?” buttons. GitHub Copilot has accept and reject on suggestions. These mechanisms feel productive. They give users a sense of agency. Product teams point to the data in dashboards.
But follow the data. Where does a thumbs-down on a ChatGPT response actually go? Into an RLHF training pipeline that affects the global model months later, averaged across millions of users. Your specific thumbs-down, the one that meant “too verbose for my taste,” gets diluted into a gradient update that slightly adjusts the model’s overall verbosity for everyone.
Your feedback improved the average experience by 0.00001%. It did nothing for your experience.
This is the feedback graveyard: signals collected with good intentions, stored in tables that grow forever, occasionally batch-processed into aggregate insights that optimize the median user experience. The individual user, the person who took the time to click that button, gets nothing back.
| Feedback Pattern | Where Signal Goes | Impact on Individual User | Impact Timeline |
|---|---|---|---|
| Thumbs up/down | RLHF training batch | None directly | Months (global model update) |
| Star ratings | Aggregate analytics | None directly | Never (used for reporting) |
| Usage analytics | Product dashboards | Indirect (feature prioritization) | Quarters |
| Self-model update | User belief structure | Immediate and compounding | Next interaction |
Anatomy of a Compounding Loop
A feedback loop that compounds has four properties that distinguish it from feedback collection.
**Closed:** The signal from the user reaches the model that serves that same user. Not a different model. Not an aggregate. The same model, the same user, the same context.

**Immediate:** The update happens fast enough that the user experiences the difference. If I tell your product I prefer concise answers and my next three responses are still verbose, the loop is not closed. It is delayed past the point of user perception.

**Structured:** The feedback is interpreted at the belief level, not just the behavioral level. “User rejected this response” is behavioral. “User believes concise explanations are more useful than thorough ones in this domain” is structural. The structural interpretation generalizes. The behavioral one does not.

**Composable:** Each belief update interacts with existing beliefs to produce emergent understanding. “Prefers concise” plus “works in healthcare compliance” plus “has 10 years of domain expertise” composes into “give regulatory summaries with citations, skip the background explanations.” That composition is not something you could derive from any single feedback signal.
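These four properties suggest a minimal shape for a belief record. The sketch below is illustrative only: the field names, threshold, and `SelfModel` type are assumptions, not a real schema.

```typescript
// Hypothetical belief record illustrating the four properties.
// Field names and the 0.6 threshold are assumptions for illustration.
interface Belief {
  statement: string;  // structured: a generalizable claim, not a raw event
  confidence: number; // 0..1, raised or lowered by confirming signals
  updatedAt: number;  // immediate: written before the next response is served
}

type SelfModel = Map<string, Belief>;

// Composable: the prompt context is derived from all beliefs together,
// so each new belief changes how every prior one is applied.
function toPromptContext(model: SelfModel): string {
  return Array.from(model.values())
    .filter((b) => b.confidence >= 0.6) // only stable beliefs reach the prompt
    .map((b) => `- ${b.statement} (confidence ${b.confidence.toFixed(2)})`)
    .join("\n");
}
```

The point of the filter is the "structured" property: what reaches the model is a set of claims, not a log of events.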
**Feedback Collection (No Compound)**
- × Thumbs up/down stored in an analytics table
- × Batch-processed into the global model monthly
- × Individual user experience unchanged
- × Signal decays: yesterday's feedback is stale

**Feedback Loop (Compounding)**
- ✓ Every interaction updates the user's belief structure
- ✓ The next response immediately reflects the update
- ✓ Individual experience improves with each use
- ✓ Signal compounds: each belief adds context to all others
Building the Loop With Self-Models
The self-model is the data structure that makes compounding possible. It is not a feature preference store (those are flat and do not compose). It is not a behavioral log (those are temporal and decay). It is a structured representation of beliefs that interact.
Here is what the loop looks like in practice.
```javascript
// 1. User interacts with personalized output (start of the loop)
const response = await generateWithSelfModel(userId, prompt);

// 2. User provides a signal, explicit or implicit
const signal = { action: 'edited_response', diff: edits };

// 3. Signal interpreted at the belief level, not just "rejected"
const beliefUpdate = await clarity.interpretSignal(userId, signal);
// => { belief: 'prefers_active_voice', confidence: 0.82 }

// 4. Self-model updated; composes with existing beliefs (closes the loop)
await clarity.updateSelfModel(userId, beliefUpdate);
// The next response reflects this update plus all prior beliefs
```
The critical step is step 3: signal interpretation. When a user edits a response to shorten it, the naive interpretation is “user did not like that response.” The structural interpretation is “user believes shorter responses are more appropriate in this context.” The structural interpretation transfers to future contexts. The naive one does not.
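One way to sketch that structural interpretation is a rule that maps an edit pattern to a belief rather than a rejection. Everything here is a hypothetical illustration: the `Signal` shape, the 40% cutoff, and the confidence value are assumptions, and a production interpreter would use model inference rather than a hand-written rule.

```typescript
// Hypothetical rule-based interpreter: maps a behavioral signal to a
// belief-level update. The 0.6 length ratio and 0.7 confidence are
// illustrative assumptions, not measured values.
interface Signal {
  action: string;
  originalLength?: number;
  editedLength?: number;
}

interface BeliefUpdate {
  belief: string;
  confidence: number;
}

function interpretSignal(signal: Signal): BeliefUpdate | null {
  if (
    signal.action === "edited_response" &&
    signal.originalLength !== undefined &&
    signal.editedLength !== undefined &&
    signal.editedLength < signal.originalLength * 0.6
  ) {
    // The user cut more than 40% of the text: interpret that structurally,
    // as a belief about length, not as "user rejected this response".
    return { belief: "prefers_concise_responses", confidence: 0.7 };
  }
  return null; // no generalizable belief recoverable from this signal
}
```

Returning `null` matters as much as returning a belief: a signal that does not generalize should not pollute the self-model.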
Why Behavioral Tracking Decays
Behavioral data has a half-life. What a user clicked yesterday is less predictive than what they clicked today. Session-level patterns rarely persist across weeks. This is because behavior is contextual. What someone does depends on their current task, mood, time pressure, and a dozen other transient factors.
Beliefs are different. “I prefer concise communication” is true today and will likely be true next month. “I believe that data-driven decisions are more reliable than intuition” is a stable preference that applies across hundreds of product interactions. Beliefs are the slow-moving variables that explain the fast-moving behavioral data.
This is why products that track behavior hit a ceiling. They are constantly re-learning the user because they are tracking symptoms rather than causes. Products that model beliefs build an asset that appreciates rather than depreciates.
Behavioral Tracking: Depreciates
Half-life measured in days. Session-level patterns rarely persist across weeks. Contextual signals (task, mood, time pressure) create noise. The model is constantly re-learning.
Belief Modeling: Appreciates
Half-life measured in months. Stable preferences apply across hundreds of interactions. Beliefs are the slow-moving variables that explain fast-moving behavior. The asset grows.
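The half-life contrast can be made concrete with a standard exponential-decay weight. The specific half-lives below (3 days for a click, 180 days for a belief) are illustrative assumptions, not measurements:

```typescript
// Weight of a signal observed `ageDays` ago, given its half-life:
// weight = 0.5 ^ (age / halfLife)
function signalWeight(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// A month-old click under a 3-day half-life retains ~0.1% of its weight;
// a month-old belief under a 180-day half-life retains ~89%.
const clickWeight = signalWeight(30, 3);
const beliefWeight = signalWeight(30, 180);
```

The asymmetry is the whole argument: under these assumptions, a behavioral model is effectively starting over every few weeks, while a belief model barely loses signal.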
Consider two products serving the same user for six months:
Product A tracks every click, page view, and hover. After six months, it has millions of behavioral data points. But the user’s recent behavior has diverged from their early behavior (they changed roles, learned new skills, shifted priorities). Product A’s model is confused. Half its data says one thing, half says another.
Product B models beliefs. After six months, it has 40-50 stable beliefs with confidence scores. Some beliefs have been updated as the user grew. The model is not confused. It has a coherent understanding that evolved alongside the user.
The Math of Compounding
Linear feedback improves the experience by a fixed amount per interaction. If each feedback signal improves relevance by 0.1%, then after 1,000 interactions you are 100% better. That sounds decent until you realize that most products deliver thousands of interactions per user per month.
Compounding feedback is different. Each belief interacts with existing beliefs, creating emergent understanding. The tenth belief you learn about a user is more valuable than the first because it composes with the previous nine. The hundredth is more valuable still.
Month 1: Incremental
Each belief stands mostly alone. 10 beliefs with low composition value. The experience improves linearly and modestly.
Month 3: Composing
30 beliefs compose into emergent understanding. “Concise” + “healthcare” + “10yr expertise” = regulatory summaries with citations. Each new belief multiplies value.
Month 6+: Insurmountable Gap
50+ stable beliefs with high confidence create an experience that feels indispensable. Users report the product “just gets them.” The gap with linear products is permanent.
This creates an exponential curve in user experience quality. Products with compounding feedback loops become dramatically more valuable per user over time. After three months, the experience gap between a compounding product and a linear one is noticeable. After a year, it is insurmountable.
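One simple way to model the gap, under an explicit assumption, is to let each belief contribute fixed value plus a smaller amount for every pair of beliefs it can compose with. The pairwise model and the per-unit values below are illustrative, not measured:

```typescript
// Linear: each signal adds a fixed amount of value.
function linearValue(n: number, perSignal: number): number {
  return n * perSignal;
}

// Compounding (assumed model): each belief adds fixed value, plus every
// pair of beliefs adds composition value. Pairs grow as n*(n-1)/2, which
// is what bends the curve upward.
function compoundingValue(n: number, perBelief: number, perPair: number): number {
  return n * perBelief + (n * (n - 1) / 2) * perPair;
}
```

Even with a small per-pair value, the quadratic pair count dominates: at 10 beliefs the two curves are close, and by 50 beliefs the composition term is larger than the linear term itself.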
For product teams, this means the investment in self-models pays off slowly at first and then very quickly. The first month feels incremental. By month six, users are writing in support tickets that your product “just gets them.”
Implicit vs Explicit Signals
The richest feedback is often implicit. When a user edits a generated email to change the tone from casual to formal, that is a stronger signal than any thumbs-up button. When a user consistently skips a feature that you surface prominently, that tells you something about their workflow beliefs.
Self-models should consume both explicit and implicit signals, but weight them differently:
- **Explicit signals** (thumbs up/down, ratings, stated preferences): High initial confidence, but users often say one thing and do another. Use these to seed beliefs, then validate with implicit signals.
- **Implicit signals** (edits, usage patterns, feature avoidance, time spent): Lower initial confidence individually, but extremely reliable in aggregate. These are the signals that refine belief confidence over time.
- **Behavioral contradictions:** When explicit and implicit signals disagree, the implicit signal is almost always more accurate. “I want detailed reports” plus consistently skipping to the summary section means the belief should be “values comprehensiveness in theory but prefers summaries in practice.”
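A minimal sketch of that weighting, assuming a simple multiplicative update rule: explicit signals move confidence in small steps, implicit signals in larger ones, so repeated behavior eventually overrides a stated preference. The step sizes are illustrative assumptions.

```typescript
// Hypothetical confidence update. Implicit evidence moves the belief
// more per signal (0.15 vs 0.05), so behavior wins over stated
// preference in aggregate. Both steps shrink near the 0 and 1 bounds.
function updateConfidence(
  current: number,
  kind: "explicit" | "implicit",
  agrees: boolean,
): number {
  const step = kind === "implicit" ? 0.15 : 0.05;
  return agrees
    ? current + step * (1 - current) // confirming signal raises confidence
    : current - step * current;      // contradicting signal lowers it
}
```

With this rule, a belief seeded at 0.8 by an explicit “I want detailed reports” drops below 0.4 after five contradicting implicit signals, which is exactly the “in theory vs in practice” resolution described above.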
Trade-offs and Limitations
Over-fitting to early signals. A self-model that updates aggressively on the first few interactions can lock into beliefs that were contextual. The user was in a hurry for their first three sessions and now the model thinks they always want terse responses. Mitigation: low initial confidence scores that require multiple confirming signals before beliefs stabilize.
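The mitigation above can be sketched as an activation threshold: a belief starts at low confidence and only begins affecting output after several confirming signals. The starting value, step size, and threshold are illustrative assumptions.

```typescript
// Over-fitting mitigation: beliefs start at low confidence (e.g. 0.3)
// and only become active past a threshold, so one or two contextual
// early sessions cannot lock a belief in.
const ACTIVATION_THRESHOLD = 0.6;

// Diminishing-returns update: each confirming signal closes 10% of the
// remaining gap to certainty.
function confirm(confidence: number): number {
  return confidence + 0.1 * (1 - confidence);
}

function isActive(confidence: number): boolean {
  return confidence >= ACTIVATION_THRESHOLD;
}
```

Under these numbers a belief seeded at 0.3 needs roughly six confirming signals before it activates, which is the intended behavior: stable patterns get through, hurried first sessions do not.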
Feedback fatigue. If every interaction feels like a training session, users disengage. The loop must be invisible. Users should feel like the product is getting better, not like they are teaching it. The best feedback loops extract signal from natural usage without adding friction.
Transparency tension. Users want to know the product is learning from them, but they do not want to see the machinery. The ideal is a “your preferences” page where users can see and correct their self-model, available but not required.
Composability is hard. Belief composition (combining “prefers concise” with “works in healthcare” to produce “give regulatory summaries”) requires inference, not just lookup. This is where the self-model architecture earns its complexity, and where naive key-value preference stores fail.
What to Do Next
- Map your current feedback data flow: Follow a single thumbs-down click from the UI to its final destination. If it never reaches the model serving that specific user, you have a feedback graveyard.
- Identify three implicit signals you already have: Look for user edits, feature skips, and time-on-task patterns. These are belief signals hiding in your existing analytics.
- Try the compounding loop: Explore our API playground to see how self-model updates work in practice. Build a single closed loop on one feature and measure the difference in user satisfaction over 30 days.