
Self-Models vs Fine-Tuning: When Each Makes Sense

Enterprise teams debate fine-tuning vs RAG vs prompting for personalization. Self-models are the missing fourth option: per-user context without retraining.

Robert Ta · CEO & Co-Founder · 8 min read

TL;DR

  • Enterprise AI teams typically evaluate three approaches for personalization: fine-tuning, RAG, and prompt engineering. Self-models are a fourth option that is complementary to all three.
  • Fine-tuning adapts the model to a domain. RAG retrieves relevant documents. Prompt engineering shapes behavior. Self-models adapt the output to a specific user. These solve different problems.
  • Self-models layer on top of the others, providing per-user context injection without model retraining, scaling linearly across your user base.

Self-models vs fine-tuning is a false choice because they solve different problems. As IBM’s comparison of RAG, fine-tuning, and prompt engineering [1] explains, each method optimizes a different dimension of LLM output. Fine-tuning adapts the model to a domain. Self-models adapt the output to a specific user. Enterprise teams that spend months fine-tuning for “personalization” often plateau on per-user relevance because the model improves at domain vocabulary while still treating every user identically. This post covers how the four approaches (fine-tuning, RAG, prompt engineering, and self-models) operate at different layers, when each makes sense, and how to combine them for maximum impact.


The Four Approaches, Decomposed

Each approach operates at a different layer of the stack and solves a different problem. The confusion happens because all four can feel like “personalization,” but they personalize different things.

Fine-Tuning: Domain Adaptation

Fine-tuning modifies the model’s weights to encode domain-specific knowledge, vocabulary, and response patterns. NVIDIA’s guide to LLM customization techniques [2] positions fine-tuning as the highest-effort, highest-accuracy option on its customization spectrum, requiring the most training data and compute but delivering maximum domain specialization. A fine-tuned model for healthcare knows medical terminology. A fine-tuned model for legal knows case citation formats. It applies uniformly to every user.

What it personalizes: The model’s domain expertise. What it cannot personalize: The output per user. A fine-tuned medical AI still gives the same explanation to a cardiologist and a first-year resident.

RAG: Document Retrieval

RAG, first introduced by Lewis et al. in 2020 [3], retrieves relevant documents at query time and injects them into the prompt. It is dynamic (the context changes based on the query) and scales well. But it retrieves based on query similarity, not user relevance.

What it personalizes: The information source per query. What it cannot personalize: How that information is presented per user. Two users asking the same question get the same retrieved documents and the same synthesized response.

Prompt Engineering: Behavior Shaping

Prompt engineering defines the system prompt: the tone, constraints, and instructions that shape model behavior. IBM describes it as the approach that excels in “open-ended situations with a potentially diverse array of outputs” [4] and is the least resource-intensive starting point. You can write a prompt that says “be concise” or “explain like a beginner,” but you cannot write a prompt that dynamically adapts to each user without managing separate prompts per person.

What it personalizes: The model’s behavior globally. What it cannot personalize: The model’s behavior per user, short of writing a separate system prompt for each person, which does not scale.

Self-Models: User Adaptation

Self-models maintain a structured, evolving representation of each user: their expertise, goals, preferences, and beliefs. Research in user modeling and user profiling [5] has established that constructing accurate per-user representations from interaction data enables fundamentally different personalization than model-level or document-level approaches. Self-models inject per-user context into the prompt at inference time, adapting the output to the individual without retraining the model.

What they personalize: The output per user, dynamically. What they cannot personalize: The model’s domain knowledge. You still need fine-tuning or RAG for that.
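To make this concrete, here is a minimal sketch of what a self-model could look like and how it gets injected at inference time. The Belief and SelfModel shapes and buildSystemPrompt are illustrative assumptions, not Clarity’s actual API.

self-model-sketch.ts
// Hypothetical shape of a self-model; field names are illustrative.
interface Belief {
  context: string;    // e.g. 'medical_expertise'
  statement: string;  // e.g. 'Board-certified cardiologist, 12 years in practice'
  confidence: number; // 0..1, strengthens as evidence accumulates
}

interface SelfModel {
  userId: string;
  beliefs: Belief[];
}

// Inject per-user context into the system prompt at inference time:
// no retraining, just prompt augmentation.
function buildSystemPrompt(base: string, model: SelfModel): string {
  const lines = model.beliefs
    .map((b) => `- [${b.context}] ${b.statement} (confidence ${b.confidence.toFixed(2)})`)
    .join('\n');
  return `${base}\n\nWhat we know about this user:\n${lines}`;
}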

The four approaches at a glance:

  • Fine-tuning (domain adaptation): modifies model weights for domain vocabulary and reasoning. Applies uniformly to every user. High compute cost, not per-user.
  • RAG (document retrieval): retrieves relevant documents at query time. Dynamic per query, but not per user: same documents, same synthesis for everyone.
  • Prompt engineering (behavior shaping): defines global behavior through system prompts. Cheapest starting point, but cannot adapt per user without separate prompts per person.
  • Self-models (user adaptation): per-user context injection at inference time. Dynamic, scales linearly, adapts output to the individual without retraining.

| Approach | What It Adapts | Per-User? | Dynamic? | Cost to Scale |
|---|---|---|---|---|
| Fine-tuning | Model weights (domain) | No | No (retrain) | High (compute) |
| RAG | Context (documents) | No | Yes (per query) | Medium (infra) |
| Prompt engineering | Behavior (global) | No | No (manual) | Low (time) |
| Self-models | Output (per user) | Yes | Yes (per request) | Low (API call) |

Without Self-Models

  • Fine-tuned model gives expert-level domain answers, same style for every user
  • RAG retrieves the right documents, presents them identically to everyone
  • Prompt engineering sets a global tone, no per-user calibration
  • Personalization requires manual segmentation that breaks at scale

With Self-Models Layered On

  • Fine-tuned domain expertise + output adapted to each user's level
  • RAG-retrieved documents synthesized differently based on user goals
  • System prompt augmented with per-user context at inference time
  • Personalization scales linearly: one API call per user per request

How Self-Models Layer on Top

The key architectural insight is that self-models are not an alternative to the other three. They are a layer that makes each of the others more effective. OpenAI’s own Context Engineering for Personalization cookbook [6] demonstrates this pattern: separating memory into structured profiles, global preferences, and session-specific overrides, then injecting the right slices at inference time. You do not choose between fine-tuning and self-models. You use both.
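As a rough sketch of that separation (UserMemory and selectContext are hypothetical names, not the cookbook’s code), the pattern looks like this:

memory-slices.ts
// Keep stable profile facts, global preferences, and session-specific
// overrides as distinct slices, then inject only the relevant ones.
interface UserMemory {
  profile: Record<string, string>;     // stable facts: role, expertise
  preferences: Record<string, string>; // global preferences: tone, depth
  session: Record<string, string>;     // session overrides: current goal
}

// Later spreads win, so session overrides preferences, which override profile.
function selectContext(memory: UserMemory, keys: string[]): Record<string, string> {
  const merged = { ...memory.profile, ...memory.preferences, ...memory.session };
  return Object.fromEntries(keys.filter((k) => k in merged).map((k) => [k, merged[k]]));
}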

Layer 1: Fine-Tuned Model (Domain)

The model knows your vertical: medical terminology, legal citation formats, financial compliance vocabulary.

Layer 2: RAG (Documents)

Retrieves current, relevant documents grounded in the specific query context. Provides source attribution.

Layer 3: Self-Model (User)

Injects per-user context: expertise level, communication preferences, current goals. Calibrates the output for the specific human receiving it.

layered-personalization.ts
// vectorStore, clarity, and llm stand in for your vector store, the Clarity
// SDK, and your LLM client; treat them as assumed interfaces, not a specific API.

// Layer 1: Fine-tuned model (domain adaptation): knows your vertical
const model = 'ft:gpt-4:your-org:medical-v3';

// Layer 2: RAG (document retrieval): knows the query
const docs = await vectorStore.query(userQuery, { topK: 5 });

// Layer 3: Self-model (user adaptation): knows the person
const selfModel = await clarity.getSelfModel(userId);
const userContext = {
  expertise: selfModel.getBeliefs({ context: 'medical_expertise' }),
  role: selfModel.getBeliefs({ context: 'clinical_role' }),
  preferences: selfModel.getBeliefs({ context: 'communication_style' })
};

// All three layers combined: domain + documents + user
const response = await llm.generate({
  model,
  context: docs,
  userModel: userContext,
  query: userQuery
});

The fine-tuned model ensures domain-accurate responses. RAG ensures the response is grounded in current, relevant documents. The self-model ensures the response is calibrated for the specific human receiving it. Three layers, three problems, one output.

When to Use Each

The decision matrix is not “which one” but “which combination.”

Fine-tuning makes sense when: You have significant domain-specific vocabulary, formatting, or reasoning patterns that the base model handles poorly. Regulated industries (healthcare, legal, finance) often benefit from fine-tuning because the domain conventions are precise and the cost of getting them wrong is high. As Lakera’s fine-tuning guide [7] notes, the trade-off includes risks of catastrophic forgetting, overfitting on small datasets, and substantial compute requirements.

RAG makes sense when: Your knowledge base changes frequently, you need source attribution, or your domain is too large to encode in model weights. Almost every enterprise AI product should use RAG.

Prompt engineering makes sense always: It is the cheapest, fastest tool. Use it as your baseline. But recognize its ceiling: it cannot adapt per user without becoming unmanageable.

Self-models make sense when: You have repeat users whose needs vary. If every user asks the same question and expects the same answer, you do not need self-models. If different users asking the same question need different responses (different depth, different framing, different emphasis), self-models fill the gap that the other three cannot.

When to use each, at a glance:

  • Fine-tuning: domain-specific vocabulary, formatting, or reasoning; regulated industries where conventions are precise. Justify the compute and retraining cost first.
  • RAG: knowledge base changes frequently, source attribution is required, or the domain is too large for model weights. Almost always the right choice.
  • Prompt engineering: always. It is the cheapest, fastest baseline, but it cannot adapt per user without separate prompts per person.
  • Self-models: repeat users with varying needs, where different users asking the same question need different depth, framing, and emphasis.

For most enterprise AI products, the right architecture is: RAG (always) + prompt engineering (always) + self-models (when users are diverse) + fine-tuning (when domain precision justifies the cost).

The Fine-Tuning Trap

A common mistake in enterprise AI teams is reaching for fine-tuning when the real problem is user adaptation. Research on catastrophic forgetting during continual fine-tuning [8] shows that fine-tuning introduces its own risks: models ranging from 1B to 7B parameters exhibit forgetting of previously acquired knowledge, and the severity can increase with model scale. The investment is significant, and the payoff is domain-level, not user-level.

The pattern looks like this: users complain that the AI “does not understand them.” The team interprets this as a model quality problem. They invest months in fine-tuning on domain data. The fine-tuned model is measurably better on benchmarks. Users still complain that it does not understand them.

The users were never saying the model lacked domain knowledge. They were saying the model did not adapt to their individual context. A fine-tuned model that knows everything about oncology but explains immunotherapy the same way to an oncologist and a patient’s family member has not solved the understanding problem.

The Two Axes of AI Personalization

Fine-tuning → smarter about the domain

Self-models → smarter about the user

Most teams invest on one axis and wonder why the other does not improve.

Trade-offs

Self-models are not free, and there are cases where they add complexity without proportional value.

Cold start exists. New users have thin self-models. The cold start problem [9] is well-documented in recommender systems research: when a system has not yet gathered sufficient information about a user, it cannot make intelligent suggestions. The mitigation is graceful degradation with population-level defaults that improve as individual context accumulates. The first interaction will not be personalized. The tenth will be.
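A minimal sketch of that fallback, reusing the SelfModel shape sketched earlier (the threshold and defaults are illustrative):

cold-start-fallback.ts
// Population-level defaults cover new users until enough individual
// signal accumulates to override them.
const POPULATION_DEFAULTS: Record<string, string> = {
  expertise: 'unknown: define terms on first use',
  communication_style: 'neutral, moderately detailed',
};

function resolveUserContext(model: SelfModel): Record<string, string> {
  const MIN_BELIEFS = 5; // below this, individual signal is too thin to trust
  if (model.beliefs.length < MIN_BELIEFS) return { ...POPULATION_DEFAULTS };
  const individual = Object.fromEntries(
    model.beliefs
      .filter((b) => b.confidence >= 0.6) // only inject beliefs we are fairly sure of
      .map((b) => [b.context, b.statement])
  );
  // Defaults fill the gaps; individual beliefs override them.
  return { ...POPULATION_DEFAULTS, ...individual };
}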

Latency adds up. Fetching user context adds a small overhead per request. For most enterprise applications this is negligible. For real-time streaming use cases, preload the self-model at session start.
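For example, a session-start preload might look like this (clarity is the same assumed client as in the layered example, here resolving to the SelfModel shape above):

session-preload.ts
// Fetch the self-model once per session so the per-request path stays
// synchronous: no network hop while streaming.
const sessionCache = new Map<string, SelfModel>();

async function onSessionStart(userId: string): Promise<void> {
  sessionCache.set(userId, await clarity.getSelfModel(userId));
}

function getCachedSelfModel(userId: string): SelfModel | undefined {
  return sessionCache.get(userId);
}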

Context window competition. User context competes for tokens with RAG-retrieved documents. A 500-token user context summary means 500 fewer tokens of retrieved content. The trade-off is usually favorable (a smaller, better-targeted context outperforms a larger, generic one) but it requires tuning.
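One way to manage that trade-off is an explicit budget split. The 4-characters-per-token estimate and the 15% user-context cap below are illustrative starting points to tune, not recommendations:

context-budget.ts
// Reserve a slice of the context budget for user context, then fill the
// remainder with retrieved documents in relevance order.
function allocateContext(userContext: string, docs: string[], budgetTokens = 4000) {
  const estTokens = (s: string) => Math.ceil(s.length / 4); // crude heuristic
  const userCap = Math.floor(budgetTokens * 0.15);          // ~15% for the user
  const cappedUser = userContext.slice(0, userCap * 4);
  let remaining = budgetTokens - estTokens(cappedUser);
  const kept: string[] = [];
  for (const doc of docs) { // docs assumed ranked by relevance
    if (estTokens(doc) > remaining) break;
    kept.push(doc);
    remaining -= estTokens(doc);
  }
  return { userContext: cappedUser, docs: kept };
}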

Not every product needs it. If your users are homogeneous (same role, same expertise, same goals), per-user adaptation adds complexity without value. Self-models shine when user diversity is high and the same information needs different delivery.

What to Do Next

  1. Audit your personalization axis. Map your current architecture against the four approaches. Where are you investing? Most teams have strong domain adaptation (fine-tuning or RAG) and weak user adaptation (nothing). The imbalance is your opportunity.

  2. Run the two-user test. Pick your top five queries. Generate responses for two personas: a domain expert and a beginner. If the responses are identical, you have a user adaptation gap. Measure how much they should differ (a minimal harness is sketched after this list).

  3. Add a self-model layer to your existing pipeline. You do not need to rebuild your architecture. Inject per-user context into your generation prompt alongside your existing RAG context. Start with Clarity’s self-model API: one API call to get structured user context you can inject at inference time.
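Here is a minimal harness for step 2. The personas are placeholders, llm is the same assumed client as in the layered example, and the strict-equality check is a crude stand-in for a real similarity measure:

two-user-test.ts
const personas = {
  expert: 'The user is a domain expert. Assume full technical vocabulary.',
  beginner: 'The user is new to this domain. Define every term you use.',
};

async function twoUserTest(queries: string[]): Promise<void> {
  for (const query of queries) {
    const [expert, beginner] = await Promise.all([
      llm.generate({ system: personas.expert, query }),
      llm.generate({ system: personas.beginner, query }),
    ]);
    // Identical responses for different personas signal a user adaptation gap.
    console.log({ query, identical: expert === beginner });
  }
}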


Fine-tuning makes your AI expert. Self-models make your AI personal. You need both. See how the user layer works.

References

  1. IBM’s comparison of RAG, fine-tuning, and prompt engineering
  2. NVIDIA’s guide to LLM customization techniques
  3. Lewis et al. (2020), “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”
  4. IBM on prompt engineering in open-ended situations
  5. Research on user modeling and user profiling
  6. OpenAI, Context Engineering for Personalization cookbook
  7. Lakera’s fine-tuning guide
  8. Research on catastrophic forgetting during continual fine-tuning
  9. Cold start problem (recommender systems), Wikipedia: https://en.wikipedia.org/wiki/Cold_start_(recommender_systems)
