The AI Product Maturity Assessment
Most AI products are stuck at Level 1: capable but generic. Here is a five-level maturity model that maps the journey from basic AI capability to true user alignment, with concrete criteria for each level.
TL;DR
- AI product maturity is a 5-level model: Capable (works correctly), Adaptive (learns from patterns), Personal (understands individuals), Anticipatory (predicts needs), and Aligned (maintains a living model of each user)
- 78 percent of AI products are stuck at Level 1 because the jump to Level 2 requires user modeling infrastructure that standard ML architectures do not include
- Each level requires specific architectural capabilities and produces measurable improvements in retention, satisfaction, and user lifetime value
Maturity here is measured by depth of user understanding, not model capability. This post covers the five maturity levels with concrete criteria, the architectural requirements for each level, and how to assess and advance your product's maturity.
The Five Levels
Level 1: Capable
The AI works. It produces correct, useful output for generic queries. It handles the common cases well and fails gracefully on edge cases. From a model perspective, it is well-trained, properly evaluated, and performant.
But it treats every user the same. A first-time user and a year-long power user receive identical experiences. The AI has no memory, no preferences, no understanding of who is using it. Every session starts from zero.
78 percent of AI products are at this level. They have invested heavily in model capability and almost nothing in user understanding. Their retention curves show the classic AI pattern: strong first-week engagement, steep drop-off by week 4.
Level 2: Adaptive
The AI adjusts its behavior based on observable patterns. It notices that a user prefers concise responses and starts producing shorter output. It detects that a user frequently asks about Python and starts defaulting to Python examples. The adaptation is behavioral, based on what the AI observes, not what it understands.
Level 2 products use session history, preference tracking, and basic behavioral signals to modify AI behavior. The improvement in user experience is noticeable: the product feels less generic. But the adaptation is shallow. It catches surface-level patterns without understanding underlying needs.
About 15 percent of AI products reach Level 2. The architectural requirement is a preference storage system and the ability to condition AI responses on stored preferences.
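A minimal sketch of those two requirements, preference storage plus conditioning, in generic form. The store, preference keys, and prompt shape here are illustrative assumptions, not any particular product's API:

```javascript
// Level 2 sketch: persist observed preferences, then condition each AI
// request on them. In production the store would be a database keyed by
// user, not an in-memory Map.
const preferenceStore = new Map(); // userId -> { key: value }

function recordPreference(userId, key, value) {
  const prefs = preferenceStore.get(userId) ?? {};
  prefs[key] = value;
  preferenceStore.set(userId, prefs);
}

// Turn stored preferences into a prompt preamble so the model's output
// is conditioned on what has been observed about this user.
function buildConditionedPrompt(userId, query) {
  const prefs = preferenceStore.get(userId) ?? {};
  const lines = Object.entries(prefs).map(([k, v]) => `- ${k}: ${v}`);
  const preamble = lines.length
    ? `Known user preferences:\n${lines.join('\n')}\n\n`
    : '';
  return preamble + query;
}

recordPreference('u42', 'response_style', 'concise');
recordPreference('u42', 'default_language', 'Python');
console.log(buildConditionedPrompt('u42', 'Show me a quicksort example.'));
```

The key property is cross-session persistence: the same preferences condition every future request, so the product stops starting from zero.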
Level 3: Personal
The AI understands each user as an individual. It knows their goals, their constraints, their communication style, their domain expertise, their evolving needs. The understanding is not just behavioral (what they do) but epistemic (what they believe and know).
Level 3 products have self-models: structured representations of each user that capture beliefs, preferences, and context. The AI does not just adapt to patterns; it understands the person behind the patterns.
About 5 percent of AI products reach Level 3. The architectural requirement is a self-model system with belief tracking, confidence scoring, and continuous updating.
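To make "belief tracking with confidence scoring" concrete, here is a minimal sketch of a per-user self-model. The field names and the bounded update rule are assumptions chosen for illustration, not a prescribed algorithm:

```javascript
// Level 3 sketch: a self-model that tracks beliefs about the user with
// confidence scores and updates them continuously as evidence arrives.
function createSelfModel(userId) {
  return { userId, beliefs: {}, updatedAt: null };
}

// Confirming evidence raises confidence toward 1; contradicting evidence
// lowers it toward 0. The update is bounded so confidence stays in [0, 1].
function observe(model, belief, confirms, weight = 0.2) {
  const entry = model.beliefs[belief] ?? { confidence: 0.5, observations: 0 };
  entry.confidence = confirms
    ? Math.min(1, entry.confidence + weight * (1 - entry.confidence))
    : Math.max(0, entry.confidence - weight * entry.confidence);
  entry.observations += 1;
  model.beliefs[belief] = entry;
  model.updatedAt = Date.now();
  return entry.confidence;
}

const model = createSelfModel('u42');
observe(model, 'prefers worked examples over theory', true);
observe(model, 'prefers worked examples over theory', true);
```

The point of the confidence score is that downstream personalization can act strongly on high-confidence beliefs and tentatively on low-confidence ones.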
Level 4: Anticipatory
The AI predicts what users will need before they ask. It notices that a user’s beliefs are shifting and proactively surfaces relevant information. It detects that a project deadline is approaching and adjusts its behavior to match the urgency. It anticipates needs based on deep understanding of the user’s context and trajectory.
Level 4 products use self-models predictively, not just to personalize responses, but to anticipate needs and take proactive action. The user experience shifts from “the AI responds well” to “the AI knows what I need.”
Less than 2 percent of AI products reach Level 4. The architectural requirement is predictive self-model inference and proactive interaction capabilities.
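The deadline example above can be sketched as a predictive rule over the self-model: act proactively only when both confidence and urgency clear a threshold. The fields and thresholds here are illustrative assumptions:

```javascript
// Level 4 sketch: use the self-model predictively. If confidence in a
// tracked goal is high and its deadline is near, surface a proactive
// suggestion instead of waiting for a query.
const DAY_MS = 86_400_000;

function proactiveAction(selfModel, now = Date.now()) {
  const { goal, goalConfidence, deadlineMs } = selfModel;
  const daysLeft = (deadlineMs - now) / DAY_MS;
  if (goalConfidence >= 0.7 && daysLeft > 0 && daysLeft <= 3) {
    return {
      type: 'suggest',
      message: `"${goal}" is due in ${Math.ceil(daysLeft)} day(s); want a prioritized checklist?`,
    };
  }
  return null; // not confident or not urgent enough: stay reactive
}
```

Gating proactive behavior on confidence matters: unsolicited suggestions based on low-confidence beliefs feel intrusive rather than anticipatory.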
Level 5: Aligned
The AI maintains a living model of the user that evolves in dialogue: the user can see, validate, and correct the model. The relationship between the user and the AI is transparent, consensual, and genuinely bilateral. The AI is not just personal; it is aligned: its understanding matches the user's self-understanding.
Level 5 is the aspiration. No current AI product fully achieves it. The architectural requirement is transparent self-models, user-facing model inspection, and bilateral model evolution.
The Assessment Framework
For each level, there are specific capabilities to evaluate. Here is the assessment framework I use when evaluating AI products.
```javascript
// AI Product Maturity Assessment: self-assessment framework
const assessment = {
  level1_capable: {
    criteria: ['Model accuracy > 85%', 'P95 latency < 500ms',
      'Graceful error handling', 'Consistent output quality'],
    score: null, // assess each criterion: 0-1
  },
  level2_adaptive: {
    criteria: ['Session preference tracking', 'Behavioral pattern detection',
      'Response style adaptation', 'Cross-session preference persistence'],
    requires: 'preference storage + conditioning',
  },
  level3_personal: {
    criteria: ['Structured self-models per user', 'Belief tracking with confidence',
      'Goal and context awareness', 'Evolving understanding over time'],
    requires: 'self-model infrastructure (e.g., Clarity API)',
  },
  level4_anticipatory: {
    criteria: ['Predictive need detection', 'Proactive suggestions',
      'Context-aware timing', 'Trajectory-based personalization'],
    requires: 'predictive self-model inference',
  },
};
```
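One way to turn the checklist into a level number is to score each criterion 0 or 1 and take the highest consecutive level where every criterion passes. The all-or-nothing rule is an assumption; adapt the threshold to your own bar:

```javascript
// Score the assessment: a product is "at" the highest consecutive level
// for which every criterion is met. The scoring rule is illustrative.
function maturityLevel(scores) {
  // scores: { level1_capable: [1,1,1,1], level2_adaptive: [...], ... }
  const order = ['level1_capable', 'level2_adaptive',
    'level3_personal', 'level4_anticipatory'];
  let level = 0;
  for (const key of order) {
    const s = scores[key];
    if (s && s.length > 0 && s.every((x) => x === 1)) level += 1;
    else break; // levels are cumulative: a gap caps the score
  }
  return level;
}

maturityLevel({ level1_capable: [1, 1, 1, 1], level2_adaptive: [1, 0, 1, 1] }); // returns 1: stuck at Level 1
```

Treating the levels as cumulative matches the model above: a product cannot meaningfully be Level 3 while failing Level 2 criteria.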
| Level | Key Capability | Infrastructure Required | Retention Impact | Products at This Level |
|---|---|---|---|---|
| 1: Capable | Correct generic output | Standard ML pipeline | 20-30% (90-day) | 78% |
| 2: Adaptive | Pattern-based adjustment | Preference storage + conditioning | 35-45% (90-day) | 15% |
| 3: Personal | Individual understanding | Self-model system | 55-65% (90-day) | 5% |
| 4: Anticipatory | Predictive personalization | Predictive self-model inference | 70-80% (90-day) | Less than 2% |
| 5: Aligned | Transparent bilateral models | User-facing model + bilateral evolution | 85%+ (90-day) | 0% (aspiration) |
Why Teams Get Stuck at Level 1
The jump from Level 1 to Level 2 is the hardest because it requires a fundamentally different architectural mindset.
Level 1 is a model problem. You invest in better training data, better fine-tuning, better prompt engineering. The feedback loop is clear: improve the model, improve the output, improve the metrics.
Level 2 is a user problem. You need to capture, store, and act on individual user information. This requires infrastructure that standard ML pipelines do not include: preference storage, behavioral tracking, conditional response generation. The model is no longer the only variable; the user model matters too.
Most teams try to reach Level 2 by improving their Level 1 capabilities: better models, better prompts, more training data. This does not work because the limitation is not model quality. A Level 1 product with a perfect model is still a generic product. The user does not care how good your model is if it treats them like a stranger.
The architectural shift is conceptual as much as technical. The team must move from “how do we make the AI smarter?” to “how do we make the AI understand each user?” Those are different questions with different answers and different infrastructure requirements.
Trade-offs
Maturity levels are descriptive, not prescriptive. Not every product needs to reach Level 5. A search engine at Level 1 can be perfectly successful. A health coaching app at Level 1 is inadequate. The right maturity target depends on the product’s relationship depth with users.
Higher maturity levels require more data. Level 3 and above require significant per-user data to build effective self-models. Products with low engagement frequency may struggle to accumulate enough observations. The mitigation is focusing self-model investment on high-engagement user segments first.
The assessment can be gamed. A product could technically implement preference tracking (Level 2) without meaningfully improving user experience. The assessment should be validated against outcome metrics (retention, satisfaction, alignment scores), not just feature checkboxes.
What to Do Next
1. Assess your current level. Use the framework above to evaluate your AI product against each level's criteria. Be honest: most teams overestimate their maturity by one level. If you are uncertain between two levels, you are at the lower one.
2. Identify the blocking capability. For your current level plus one, identify which specific capability you lack. The blocker for Level 2 is usually preference persistence; for Level 3, self-model infrastructure; for Level 4, predictive inference. Focusing on the specific blocking capability produces faster maturity improvement than general investment.
3. Set a 6-month maturity target. Advancing one maturity level takes 3-6 months of focused investment. Set a target level, define the specific capabilities needed, allocate engineering resources, and measure progress through the retention improvement each level produces. The retention delta between levels is the ROI justification for the investment.
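To make that ROI case concrete, here is a back-of-envelope calculation using the midpoints of the 90-day retention bands from the table above. The user count and per-user lifetime value are placeholder assumptions:

```javascript
// Retention midpoints per level, taken from the 90-day bands in the
// maturity table (Level 1: 20-30%, Level 2: 35-45%, etc.).
const retentionAt90d = { 1: 0.25, 2: 0.40, 3: 0.60, 4: 0.75 };

function retentionDelta(fromLevel, toLevel) {
  return retentionAt90d[toLevel] - retentionAt90d[fromLevel];
}

// Rough value of the extra retained users; LTV here is a placeholder.
function roiEstimate(users, ltvPerRetainedUser, fromLevel, toLevel) {
  return users * ltvPerRetainedUser * retentionDelta(fromLevel, toLevel);
}

// e.g. 10,000 users at a hypothetical $50 LTV, moving Level 1 -> 3:
console.log(roiEstimate(10000, 50, 1, 3));
```

Even with placeholder numbers, the shape of the argument holds: the retention delta between levels, multiplied by your user base and LTV, is the budget ceiling for the infrastructure investment.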
We build in public. Get Robert's weekly newsletter on building better AI products with Clarity, with a focus on hyper-personalization and digital twin technology. Join 1500+ founders and builders at Self Aligned.
Subscribe to Self Aligned →