
How to Build AI That Passes the Mom Test

Rob Fitzpatrick wrote The Mom Test for customer conversations. The same principles apply to AI products: your AI needs to understand what users actually need, not just what they say they want.

Robert Ta, CEO & Co-Founder · 6 min read

TL;DR

  • The Mom Test principle that people will tell you what you want to hear, not what is true, applies directly to AI products: users configure preferences that do not match their actual behavior, creating a gap that makes AI feel misaligned
  • Stated preferences diverge from revealed preferences by 40-60 percent in AI products, meaning any personalization built solely on user-configured settings is optimizing for the wrong signal
  • Self-models solve this by tracking both stated and revealed preferences with separate confidence scores, building a nuanced understanding that captures what users actually need, not just what they say

AI products that pass the Mom Test infer user needs from behavior patterns rather than trusting stated preferences, because users’ configured settings diverge from their actual behavior by 40 to 60 percent. The stated-versus-revealed preference gap means any personalization built solely on onboarding checkboxes is optimizing for the wrong signal. This post covers the five Mom Test principles adapted for AI products, the dual-track preference model that captures both stated and revealed needs, and how self-models resolve the divergence automatically.


The Stated vs Revealed Gap

In economics, the distinction between stated preferences and revealed preferences is foundational. Stated preferences are what people say they want. Revealed preferences are what their behavior shows they want. And the gap between these two is the space where most market research goes wrong.

AI products have this gap in extreme form because they ask users to state preferences at the exact moment when users are least equipped to provide them, during onboarding, before they have experienced the product, when they have no basis for knowing what they need.

Consider the common onboarding pattern: “What is your experience level?” A user who has been coding for 5 years but just started learning Python might select “intermediate.” The AI then calibrates to intermediate: too advanced for their Python-specific needs, too basic for their general programming knowledge. The user gets frustrated because the AI seems to oscillate between condescending and confusing.

The user did not lie. The question was unanswerable in the format it was asked. Experience is multidimensional. No single checkbox captures it. And yet AI products reduce this complexity to a dropdown and build their entire personalization strategy on the result.
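
To make the multidimensionality concrete, here is a minimal sketch of what the dropdown collapses: a flat enum versus a per-domain profile. The type and function names are illustrative, not part of any real API.

experience-profile.ts
// A flat enum: all the onboarding dropdown can capture.
type StatedLevel = 'beginner' | 'intermediate' | 'advanced';

// What behavior actually reveals: experience varies by domain.
interface ExperienceProfile {
  general: StatedLevel;                  // overall programming experience
  byDomain: Record<string, StatedLevel>; // e.g. { python: 'beginner' }
}

// The user from the example: 5 years of coding, new to Python.
const user: ExperienceProfile = {
  general: 'advanced',
  byDomain: { python: 'beginner' },
};

// Calibrate per topic, not per account: explain Python idioms from
// first principles, but skip the what-is-a-loop material.
function levelFor(profile: ExperienceProfile, domain: string): StatedLevel {
  return profile.byDomain[domain] ?? profile.general;
}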

Preference-Based AI (Fails the Mom Test)

  • Ask users what they want during onboarding
  • Configure AI behavior based on stated preferences
  • Users get output that matches what they said, not what they need
  • Satisfaction declines as the gap between stated and real needs widens

Behavior-Informed AI (Passes the Mom Test)

  • Observe what users actually do with AI output
  • Infer preferences from editing patterns, usage frequency, and engagement depth (see the sketch after this list)
  • Maintain both stated and revealed preference tracks with separate confidence
  • Satisfaction improves as the model converges on real needs through behavior
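
To make the second item concrete, here is a minimal sketch of inferring a revealed style preference from editing patterns. The event shape and thresholds are assumptions for illustration, not a prescribed schema.

revealed-signals.ts
// A minimal behavior event: what the AI produced vs. what the user kept.
interface OutputEvent {
  generatedChars: number; // length of the AI output
  finalChars: number;     // length after the user's edits
  discarded: boolean;     // the user threw the output away entirely
}

// Users who consistently expand output want more detail than they said.
function inferStyle(events: OutputEvent[]): 'concise' | 'detailed' | 'unknown' {
  const kept = events.filter((e) => !e.discarded);
  if (kept.length < 10) return 'unknown'; // not enough behavior data yet
  const expansionRate =
    kept.filter((e) => e.finalChars > e.generatedChars * 1.2).length / kept.length;
  return expansionRate > 0.4 ? 'detailed' : 'concise';
}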

The Five Mom Test Principles for AI

Fitzpatrick’s original Mom Test has rules for customer conversations. Here is how those rules translate to AI product design.

Rule 1: Talk about their life, not your idea. In AI terms: observe user behavior, do not rely on user settings. The user’s editing patterns, session length, return frequency, and engagement depth tell you more about their needs than any preference configuration.

Rule 2: Ask about specifics in the past, not generics about the future. In AI terms: learn from historical interactions, do not predict from static profiles. A user’s last 20 interactions reveal their actual working style far more accurately than a one-time profile questionnaire.

Rule 3: Talk less and listen more. In AI terms: infer more, ask less. Every time your AI asks the user a question to calibrate, it is admitting it does not understand them yet. The goal is an AI that infers needs from behavior, reducing the interrogation burden.

Rule 4: Seek out bad news. In AI terms: track where users override, reject, or modify AI output. These negative signals are more informative than positive signals. A user who consistently edits AI output in a specific way is revealing a preference that the AI has not yet learned.

Rule 5: Deflect compliments. In AI terms: do not optimize for user satisfaction surveys. A user who says they are satisfied but whose alignment score is declining is on their way to churn. Satisfaction surveys are the Mom Test in action: people tell you what you want to hear.
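
Rules 3 and 4 imply instrumentation: the AI can only infer and seek bad news if overrides, rejections, and modifications are recorded somewhere queryable. A minimal sketch of such a tracker, with hypothetical names:

negative-signals.ts
type NegativeSignal = 'override' | 'reject' | 'modify';

// Count negative signals per user and output category, so that
// "this user always restructures my bullet lists" becomes queryable.
class NegativeSignalTracker {
  private counts = new Map<string, number>();

  record(userId: string, category: string, signal: NegativeSignal): void {
    const key = `${userId}:${category}:${signal}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // A consistently repeated edit is a revealed preference the AI has
  // not yet learned; surface it once it crosses a threshold.
  unlearnedPreferences(userId: string, threshold = 5): string[] {
    return [...this.counts.entries()]
      .filter(([key, n]) => key.startsWith(`${userId}:`) && n >= threshold)
      .map(([key]) => key.slice(userId.length + 1));
  }
}

The mom-test-ai.ts example below shows how these revealed signals feed a dual-track preference model.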


mom-test-ai.ts
// Building AI that passes the Mom Test: behavior over stated preferences
const selfModel = await clarity.getSelfModel(userId);

// Track both stated and revealed preferences (dual-track preference model)
// selfModel.stated:   { style: 'concise', level: 'intermediate' }
// selfModel.revealed: { style: 'detailed-with-structure',
//                       level: 'advanced-python-beginner', expandsOutput: 0.47 }

// When stated and revealed diverge, weight revealed higher
const effectivePreferences = selfModel.resolvePreferences({
  statedWeight: 0.3,
  revealedWeight: 0.7,
  minObservations: 10, // need enough behavior data
});

// Result: the AI produces structured, detailed output for a user
// who said 'concise' but consistently wants more

Signal Type | Accuracy for Personalization | Availability | Update Frequency
--- | --- | --- | ---
Stated preferences (settings) | Low (40-50% match with actual needs) | Immediate (onboarding) | Rare (users rarely update)
Survey responses | Low-Medium (50-60% match) | Periodic (quarterly) | Periodic
Implicit behavior (clicks, time) | Medium (60-70% match) | After first session | Every session
Revealed preferences (edit patterns) | High (75-85% match) | After 5-10 interactions | Every interaction
Self-model (stated + revealed + beliefs) | Very High (85-90% match) | After 10-20 interactions | Continuous

Building the Dual-Track Model

The practical implementation is a dual-track preference model. Track one captures what users say they want: configured preferences, feedback, explicit requests. Track two captures what users actually do: editing patterns, usage rhythms, selection behavior.

When the tracks align, you have high confidence. The user who says they want concise output and consistently uses concise output without modification genuinely wants concise output. The self-model confidence for that preference is high.

When the tracks diverge, you have valuable information. The user who says they want concise output but consistently expands it has a revealed preference that contradicts their stated preference. The self-model should weight the revealed preference higher while noting the divergence.

Over time, the dual-track model converges on a rich, nuanced understanding of each user, far more accurate than any set of preference checkboxes, and far more predictive of what the user will actually find useful.
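
A minimal sketch of that convergence logic, assuming the 0.3/0.7 weighting from the code above. The names are illustrative, not the Clarity API:

dual-track.ts
interface PreferenceTrack {
  value: string;        // e.g. 'concise' or 'detailed-with-structure'
  observations: number; // how much evidence backs this track
}

interface Resolved {
  value: string;
  confidence: number; // high when tracks agree, lower on divergence
  diverged: boolean;  // worth logging either way
}

function resolve(stated: PreferenceTrack, revealed: PreferenceTrack): Resolved {
  // Not enough behavior yet: stated preferences are all we have.
  if (revealed.observations < 10) {
    return { value: stated.value, confidence: 0.5, diverged: false };
  }
  // Tracks align: the user means what they say. High confidence.
  if (stated.value === revealed.value) {
    return { value: stated.value, confidence: 0.9, diverged: false };
  }
  // Tracks diverge: weight revealed higher, and note the divergence.
  return { value: revealed.value, confidence: 0.7, diverged: true };
}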


Trade-offs

Behavioral inference can feel intrusive. Users who learn that the AI is tracking their editing patterns might feel surveilled. The mitigation is transparency: make it clear what the AI observes and why, and give users control over what is tracked. Consent-first personalization is not just ethical; it is better product design.

Revealed preferences take time to develop. The dual-track model is unreliable for new users who have not generated enough behavioral data. For the first 5-10 interactions, stated preferences are all you have. The solution is a graduated approach: start with stated preferences, introduce behavioral weighting progressively, and be transparent about the transition.
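
One way to implement that graduated transition, sketched with an illustrative linear ramp (the ramp shape and cutoffs are assumptions):

graduated-weighting.ts
// Revealed-preference weight ramps from 0 (no behavior data) up to
// the 0.7 ceiling once the user has about 20 observed interactions.
function revealedWeight(observations: number, ceiling = 0.7, rampOver = 20): number {
  return Math.min(observations / rampOver, 1) * ceiling;
}

revealedWeight(0);  // 0    -> new user: trust stated preferences entirely
revealedWeight(10); // 0.35 -> mid-ramp: blend the two tracks
revealedWeight(20); // 0.7  -> steady state: the 0.3/0.7 split above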

Some users genuinely want what they say they want. Not all stated preferences are wrong. The dual-track model must be calibrated to recognize when stated and revealed preferences align, not assume they always diverge. The confidence scoring system handles this naturally: high alignment between tracks produces high confidence.


What to Do Next

  1. Run your own Mom Test audit. Pick your most common user preference setting. Compare the stated preference distribution with actual behavior. If more than 30 percent of users behave differently from what they configured (in practice the number is usually 40 to 60 percent), your AI is optimizing for stated preferences that do not reflect reality.

  2. Instrument one behavioral signal. Choose the most informative behavioral signal for your product, usually edit distance (how much users modify AI output) or adoption rate (how often they use versus discard output). Start tracking this signal per user alongside their stated preferences. The divergence patterns will reveal exactly where your AI is misunderstanding users.

  3. Build a preference divergence dashboard. Create a view that shows, for each user, the gap between their stated preferences and their revealed behavior. Users with high divergence are the ones your AI understands least, and targeting personalization improvements at those users will produce the largest satisfaction gains. A minimal audit sketch follows this list.
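
Here is a minimal sketch covering steps 1 and 3, reusing the inferStyle idea from earlier. The data shapes are hypothetical; substitute your own stated settings and behavior events:

divergence-audit.ts
interface UserAudit {
  userId: string;
  statedStyle: 'concise' | 'detailed';
  inferredStyle: 'concise' | 'detailed' | 'unknown'; // from behavior
}

// Step 1: what fraction of users behave differently from their setting?
// Above 0.3 means your AI is failing the Mom Test.
function divergenceRate(audits: UserAudit[]): number {
  const known = audits.filter((a) => a.inferredStyle !== 'unknown');
  if (known.length === 0) return 0;
  return known.filter((a) => a.statedStyle !== a.inferredStyle).length / known.length;
}

// Step 3: the dashboard is this list; these are the users the AI
// understands least, and the highest-leverage personalization targets.
function highDivergenceUsers(audits: UserAudit[]): UserAudit[] {
  return audits.filter(
    (a) => a.inferredStyle !== 'unknown' && a.statedStyle !== a.inferredStyle,
  );
}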

