Bianca Starling
Essay · AI product design

Designing AI Features That Augment, Not Replace

The AI features that fail are the ones trying to replace human judgment. The ones that succeed make humans better at being human. Here's the design framework I use to tell the difference.

In 2024, there was a brief moment when every product team on earth decided to add a chatbot. Just: take the existing product, add a floating chat window in the corner, integrate GPT-4, call it AI.

Most of these didn’t work. Not because the technology was bad — it was genuinely remarkable — but because the design was wrong. The chatbot was dropped onto a product that hadn’t been redesigned to support it. It replaced the browse experience rather than complementing it. It hallucinated product-specific details that the LLM couldn’t actually know. It created a new interaction surface with no clear mental model for when to use it.

The instinct was correct — AI genuinely does make products better. The execution missed a distinction that I’ve come to think is the most important one in AI product design: augment vs. replace.

The Distinction That Matters

A feature that augments makes the user better at a thing they’re already doing, or want to do. A feature that replaces does the thing instead of the user.

Both have their place. But in most products, in most contexts, augmentation is harder to build and far more durable in terms of user trust and engagement.

Why? Because replacement is brittle. When AI does something for a user and gets it wrong, the user loses confidence in the whole system. When AI does something with a user and makes a suggestion that’s slightly off, the user corrects it, learns something about the product, and keeps going. The failure mode of augmentation is tolerable. The failure mode of replacement is catastrophic.

This isn’t just a philosophical preference. It has design implications.

Three Patterns That Augment Successfully

1. The Informed Starting Point

Instead of generating the full output, AI gives you a high-quality first draft or first option that the user finishes, customizes, and owns.

GitHub Copilot doesn’t write your program — it suggests the next line. You evaluate, accept or modify, and keep writing. The code is still yours. Copilot made you faster. If a suggestion is wrong, you delete it and move on.

In EdTech, this pattern looks like: “Based on your learning history, here’s a recommended study path” — shown as something the user can modify, not a mandate. The user sees the AI’s reasoning, adjusts for things the AI doesn’t know about their schedule or goals, and ends up with a path that’s better than either pure algorithm or pure self-direction would produce.

2. The Ambient Catch

AI runs silently in the background and surfaces something the user almost missed, almost got wrong, or almost forgot.

Grammarly doesn’t rewrite your email. It catches the passive voice you habitually write when nervous, the comma splice you never learned to spot, the sentence that reads fine in isolation but creates ambiguity in context. You decide what to fix.

In a marketplace, this pattern is: “You’ve been searching for X. You might have missed this creator who launched last week and has an unusual take on the topic.” The user still browses, still decides. AI just lowered the cost of serendipity.

3. The Synthesized Insight

The user has too much data to read. AI reads it and surfaces what’s relevant for a specific decision.

This is the most powerful pattern and the hardest to execute. The key is precision of use case — the AI needs to know what decision the user is actually making in order to synthesize toward it. “Summarize all 500 user interviews” is less useful than “summarize the themes most relevant to onboarding friction.”

At Skillshare, we explored AI-synthesized learner feedback for creators — instead of reading hundreds of reviews, a creator could see: “Learners who didn’t complete your course most often mentioned pacing in the middle chapters.” That’s the synthesized insight pattern. It doesn’t replace the creator reading reviews; it makes reading reviews 10x faster.
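The scoping step is what makes this pattern work: filter the data down to the population relevant to the decision before synthesizing. Here is a minimal sketch of that idea in Python. The review structure, the keyword-matching approach, and the function name are all hypothetical; a production version would use an LLM pass for theme extraction rather than keyword counts.

```python
from collections import Counter

def synthesize_for_decision(reviews, decision_themes):
    """Scope synthesis to one decision: keep only reviews from
    non-completers, then count mentions of decision-relevant themes."""
    relevant = [r for r in reviews if not r["completed"]]
    counts = Counter()
    for r in relevant:
        text = r["text"].lower()
        for theme in decision_themes:
            if theme in text:
                counts[theme] += 1
    # Most-mentioned theme first, e.g. "pacing" for the Skillshare example
    return counts.most_common()

# Hypothetical review data for illustration.
reviews = [
    {"completed": False, "text": "The pacing in the middle chapters lost me."},
    {"completed": False, "text": "Pacing felt uneven after chapter 4."},
    {"completed": True,  "text": "Loved the pacing and the projects."},
    {"completed": False, "text": "Audio quality was rough in places."},
]
print(synthesize_for_decision(reviews, ["pacing", "audio"]))
# → [('pacing', 2), ('audio', 1)]
```

The point of the sketch: the decision ("why don’t learners complete?") determines both which reviews count and which themes matter, before any summarization happens.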

Where Replacement Makes Sense

Sometimes replacement is the right call. Customer service triage. Tagging and categorization at scale. Spam detection. Image labeling. These are high-volume, low-stakes, high-repetition tasks where the cost of occasional error is low and the benefit of automation is enormous.

The test: Would a skilled human do this differently in a meaningful way? If yes, augment. If no, automate.

Tagging a video “beginner-level French cuisine” doesn’t require a skilled human’s judgment — it requires pattern recognition across a taxonomy. Replace away.

Deciding whether a learner’s portfolio is ready to present to employers — that requires judgment, context, and accountability. Augment at most.

The Mental Model I Use in Design Reviews

When evaluating an AI feature in a design review, I ask the team three questions:

“When this is wrong, what happens?” Walk through the specific failure scenario. If the AI gets it wrong and takes an irreversible action, or the user is unable to override, or the user loses trust in everything else — that’s replacement gone wrong. If the user corrects it and moves on — that’s augmentation.

“What does the user believe they’re responsible for?” If the answer is “nothing, the AI handles it,” you’ve replaced human agency. That’s fine for some tasks. For most tasks in most products, users want to feel like they’re the author of their outcomes. AI should make them better authors, not ghost-write their life.

“What’s the smallest version of this that proves the value?” AI features are expensive to build and easy to over-scope. The smallest version usually reveals the augment/replace tension clearly. “Show users a suggested next lesson” (augment, easy) vs. “automatically enroll users in the next lesson when the AI predicts they’re ready” (replace, risky). Ship the first. Validate before the second.

The Trust Gradient

Trust in AI features is not binary. It builds over time through repeated accurate predictions.

This means the right strategy for most products is to start with low-stakes, high-visibility augmentations, let users validate them repeatedly, and then (carefully) expand to higher-stakes, less-visible automations once trust is established.

At Skillshare, we started AI-assisted discovery with explicit recommendations in a dedicated “For You” section that users knew was algorithmic. We watched engagement patterns, improved the algorithm, and saw users start treating the recommendations as reliable over time. Only then did we begin surfacing inline recommendations throughout the product.

That’s the trust gradient: visible suggestion → trusted recommendation → ambient guidance → confident automation. Most teams try to skip to automation. Start at the beginning.
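The gradient can be modeled as an ordered progression with a gate between stages: a feature advances one step at a time, and only after users have repeatedly validated the current one. The stage names come from the essay; the acceptance-rate gate and its threshold are hypothetical stand-ins for whatever validation signal a team actually measures.

```python
STAGES = [
    "visible suggestion",
    "trusted recommendation",
    "ambient guidance",
    "confident automation",
]

def next_stage(current, acceptance_rate, threshold=0.8):
    """Advance exactly one stage, and only when users have validated
    the current stage (acceptance_rate is an illustrative signal)."""
    i = STAGES.index(current)
    if acceptance_rate >= threshold and i < len(STAGES) - 1:
        return STAGES[i + 1]
    return current  # not yet earned: stay put

print(next_stage("visible suggestion", 0.85))  # → trusted recommendation
print(next_stage("visible suggestion", 0.55))  # → visible suggestion
```

The one-step-at-a-time rule is the whole design: there is no path from "visible suggestion" straight to "confident automation", which is exactly the skip most teams attempt.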

The Design Principle Worth Tattooing

Always keep a human in the loop until the AI has earned the right to be trusted with that loop.

This isn’t anti-AI. It’s pro-user. The products I’ve seen fail at AI were almost always trying to go faster than the trust they’d earned. The ones that succeeded were methodical about earning it — and then the scale came naturally.

AI is a tool. Design it like one.


These principles shaped my approach to AI product design at Skillshare, including the MCP integration and semantic search strategy. For context, see the Skillshare AI case study.
