The AI Wrapper Playbook: Building a Real Business on Top of Foundation Models
The complete playbook for building defensible AI wrapper businesses — from data moats and model-switching to unit economics and surviving platform risk.
TL;DR: AI wrapper businesses are real, defensible, and investable — but only when you build the right layers of defensibility on top of the model. The five layers are: data flywheel, workflow integration, domain expertise, UX quality, and fine-tuning. Distribution beats technology. Model-agnosticism beats model loyalty. And the businesses that win will be the ones that treat the foundation model as a commodity input, not a competitive advantage.
Every serious investor I've spoken with in the last 18 months has heard some version of this pitch: "We're building an AI-powered [X] that uses GPT-4 / Claude / Gemini to [do thing]." And every serious investor has, at some point, responded with some version of: "Isn't this just a wrapper?"
That dismissal is both completely fair and completely wrong at the same time.
Fair, because most AI startups pitching right now are just wrappers — thin UI layers slapped on top of an API call with no moat, no differentiation, and no plan for when OpenAI ships a native feature that undercuts them. Wrong, because some of the most valuable software companies being built today are, technically speaking, wrappers. Jasper crossed $80M ARR. Harvey is valued at over $3 billion and focuses entirely on AI for legal work. Cursor hit a $100M ARR run rate faster than almost any dev tool in history. Perplexity is valued at over $9 billion. All wrappers. All built on foundation models they don't own.
So the question isn't "should I build an AI wrapper startup?" The question is "what separates a real AI wrapper business from a feature someone ships in a weekend?" That's what this playbook is about.
I've spent the last two years building in this space, talking to founders across every vertical, and watching the category evolve from a curiosity into a legitimate asset class. Here's everything I know.
Let's start with the most honest version of the criticism: "You're just calling an API. OpenAI could ship this in a week. Why would I invest?"
This argument has merit. If your entire value proposition is "we put a better UI on ChatGPT," you're right to be worried. OpenAI ships features constantly. Anthropic is shipping Claude artifacts, Projects, and deep integrations. Google is putting Gemini everywhere. If your differentiation is purely the prompt you're sending to the API, you have no business.
But here's the thing — almost no serious AI wrapper startup today is differentiating on the prompt. The good ones are differentiating on entirely different dimensions, and the "it's just a wrapper" criticism misunderstands what those dimensions are.
Think about it this way: Shopify is a "wrapper" around payment processing, hosting, and inventory management. Stripe is a "wrapper" around banking rails. Twilio is a "wrapper" around telecoms infrastructure. Every B2B SaaS company that built on AWS is, technically, a "wrapper" around compute and storage. Nobody calls these businesses undefensible. Nobody says "AWS will just ship this."
The reason we don't say that about Shopify is because Shopify built genuine value on top of the commodity infrastructure — a distribution network of merchants, an ecosystem of apps, a brand that means something, workflows so deep into merchants' daily operations that switching costs are enormous.
The same logic applies to AI wrappers. The model is the commodity input, like AWS compute. What you build on top of it is the business.
So what makes a wrapper valuable? Cursor is the clearest illustration.
When Cursor launched, GitHub Copilot already existed. Microsoft owned Copilot and was deeply integrated into VS Code. The obvious prediction would have been that Cursor dies within months. Instead, Cursor's IDE-native experience — where the AI understood your entire codebase, not just the file you had open — created an experience so differentiated that developers started paying $20/month for something Microsoft was offering for less. And then developers started paying $40/month. And then Cursor hit $100M ARR.
The "wrapper" framing breaks down the moment you look at what Cursor actually built. It wasn't a wrapper around a code completion API. It was a new IDE built with AI as the primary interaction paradigm, deeply integrated into how developers actually think about code. The model was an input. The product was the IDE.
This distinction — model as input vs. model as product — is the entire game.
There are five layers of defensibility in an AI wrapper business. Most startups have one or two. The great ones have four or five. Think of this as a defensibility stack — each layer adds compounding protection.
Layer 1: The data flywheel. The most powerful moat in AI is proprietary data that makes your model better than anyone else's for a specific task. This isn't about having more data than OpenAI — you won't. It's about having domain-specific data that OpenAI doesn't have, or can't use effectively, or isn't incentivized to collect.
Harvey, the legal AI, has ingested millions of legal documents — contracts, case law, briefs, memos — that have been annotated and validated by actual lawyers. When a lawyer uses Harvey and the output is wrong, they correct it. Those corrections go back into training. After enough cycles, Harvey's model understands legal language, legal reasoning, and legal risk in ways that a general-purpose model cannot replicate without that data.
This is a flywheel: more users → more corrections → better model → more users. The key insight is that the data you're collecting is behavioral data — how domain experts respond to AI outputs — which is far more valuable than the raw text data the foundation model was trained on.
Your data flywheel needs three components, best framed as questions: (1) What signal are you capturing from users as they work? (2) How does that signal make the model or product measurably better? (3) Why can't a competitor collect the same data?
If you can't answer all three, you don't have a data flywheel. You have a data pond — stagnant, not self-reinforcing.
Layer 2: Workflow integration. The deeper your product lives in a user's daily workflow, the harder it is to remove. This is the SaaS switching cost principle applied to AI tools.
The best AI wrappers don't sit outside the user's existing tools — they live inside them. Cursor replaced VS Code. Notion AI lives inside Notion. Jasper integrates with Google Docs, WordPress, HubSpot. When your AI product is where the user already does their work, the activation energy to use it drops to near zero, and the switching cost to leave it rises significantly.
Ask yourself: when a user produces output with your product, does that output live in your system or theirs? If it lives in your system, you have lock-in. If it lives in their system (Google Docs, Salesforce, GitHub), you need a different moat strategy.
Layer 3: Domain expertise. Foundation models are trained to be generally competent across a wide range of tasks. That generality is a strength and a weakness. It means they're never truly expert at any specific domain — they're calibrated to be reasonable, not to be the best possible legal analyst, medical researcher, or software security auditor.
Domain expertise in an AI wrapper means knowing what expert-quality output looks like in your field, encoding that judgment into prompts, evaluations, and guardrails, and hiring people who can tell good output from bad.
Harvey didn't hire engineers first. They hired lawyers who understood what good legal output looked like, and then built engineering around that. The domain expertise came first; the AI layer came second. This is backwards from how most AI startups build.
Layer 4: UX quality. Underrated and under-discussed, UX is a real moat in AI products. Foundation models are increasingly commoditized. The model quality gap between GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro is shrinking. What isn't commoditized is the experience of using an AI product that feels exactly right for your specific workflow.
Perplexity's UX moat is the answer page design — citations inline, follow-up suggestions, source cards that let you dig deeper. You can replicate the underlying model calls in an afternoon. You cannot replicate six months of UX iteration on how users want to consume AI-generated research.
Cursor's UX moat is the multi-file context window, the inline diff view, and the way it integrates with the terminal. None of these are impossible to build. All of them took months of iteration. The accumulation of those micro-decisions is what separates Cursor from "another Copilot."
Layer 5: Fine-tuning. At scale, fine-tuning is the nuclear option for differentiation. A fine-tuned model trained on proprietary data can outperform GPT-4 on domain-specific tasks at a tenth of the inference cost. This is the endgame for most mature AI wrapper businesses.
Writer — the enterprise AI writing platform — fine-tunes models on each customer's brand voice, style guide, and terminology. A Writer-generated piece for Accenture sounds like Accenture. That's not possible with a zero-shot call to a foundation model. It requires fine-tuning, and fine-tuning requires data, and data requires users, which brings us back to the flywheel.
The five layers compound. A business with a data flywheel builds fine-tuning data. Fine-tuning improves UX. Better UX drives workflow integration. Workflow integration generates more data. Domain expertise guides what data to collect. This is the moat stack.
Case study: Jasper
What they wrapped: OpenAI's GPT models, initially GPT-3, later GPT-4.
The differentiation: Jasper built the marketing AI category before most people knew there was one. In 2021, when they launched, GPT-3 was technically accessible but practically unusable for non-technical marketers. Jasper built an interface — templates, brand voice settings, a document editor — that made GPT-3's capabilities accessible to marketing teams.
The real moat wasn't the templates. It was the distribution: a best-in-class affiliate program that paid creators to promote Jasper, building a word-of-mouth flywheel in the marketing community. Jasper didn't win on technology. They won on GTM. By the time ChatGPT made everyone aware of what language models could do, Jasper had tens of thousands of paying customers and a brand synonymous with "AI for marketing."
Current position: Under pressure from ChatGPT's free tier and native AI in Google Docs and Microsoft Word. They've pivoted toward enterprise, adding brand voice controls, multi-user workflows, and compliance features that general tools don't offer. The pivot is the right call — the consumer marketing writer market is increasingly commoditized.
Lesson: First-mover advantage in an emerging AI category can build an enormous distribution lead, but you need to convert that lead into enterprise lock-in before the category commoditizes.
Case study: Harvey
What they wrapped: GPT-4 and Claude, with heavy fine-tuning on legal corpora.
The differentiation: Harvey is the canonical example of domain expertise as a moat. Lawyers are one of the highest-value knowledge workers on earth — their time costs $400-1,000/hour, their output quality has enormous financial and legal stakes, and their work requires precision that general models handle poorly.
Harvey's founders — Winston Weinberg (lawyer) and Gabriel Pereyra (ML researcher at DeepMind) — understood both sides. Weinberg knew what legal work actually looked like. Pereyra knew how to fine-tune models. Together, they built a product that handles bar-caliber legal tasks that general models consistently fail.
Harvey works with over 100 Am Law 100 firms. These aren't marketing experiments — they're deep workflow integrations where Harvey is used for contract analysis, due diligence, legal research, and drafting. The switching cost is enormous because Harvey has trained on each firm's document corpus.
Revenue model: Enterprise SaaS, annual contracts, per-seat + usage pricing. Not disclosed publicly but reportedly $10M+ ARR growing fast.
Lesson: High-stakes professional domains (law, medicine, finance, engineering) are the best markets for AI wrappers because (a) the value of quality AI output is enormous, (b) buyers will pay for compliance and reliability that general tools can't provide, and (c) domain data is proprietary and hard to replicate.
Case study: Cursor
What they wrapped: Claude, GPT-4, and their own fine-tuned cursor-small model.
The differentiation: Cursor is the most technically sophisticated AI wrapper on this list. They didn't just build on top of foundation models — they built their own IDE (a fork of VS Code), their own model routing layer, and their own fine-tuned model for specific tasks like code completion.
Cursor's key insight was that code completion is a fundamentally different problem from code generation. Completion requires understanding the entire codebase — file structure, existing functions, naming conventions, architecture patterns. Context windows in 2023 weren't large enough to fit a full codebase. Cursor built proprietary context retrieval that identified the most relevant code snippets for any given completion request. This — not the underlying model — was the core technical innovation.
The result is an AI coding experience that developers describe as genuinely magical. The model knows your codebase better than a new engineer would after a month of onboarding. It suggests refactors in your style. It catches errors that are only errors given your specific architecture.
Revenue model: $20/month Pro, $40/month Business. Growing through PLG — developers adopt, then bring Cursor into their companies.
Lesson: Deeply understanding the technical problem in your domain (the context problem for code completion) and solving it in ways the foundation model provider doesn't creates genuine technical differentiation, even in a "wrapper" business.
Case study: Perplexity
What they wrapped: Multiple LLMs including GPT-4, Claude, Mistral, with a proprietary retrieval layer.
The differentiation: Search. Perplexity recognized that the web search UX hadn't meaningfully changed in 20 years — ten blue links, click, navigate, read, synthesize, repeat. They asked: what if the synthesis happened for you, with sources, in real time?
The moat isn't the language model. The moat is the retrieval infrastructure, the indexing of the web, the answer-page UX, and increasingly the proprietary data partnerships that give Perplexity access to premium sources (financial data, academic papers, news) that general models don't have.
Perplexity also moved aggressively on distribution — mobile apps, browser extensions, API access — creating touchpoints across every surface where people do research. The result is a product that feels native to search behavior in a way that ChatGPT, with its chat-first paradigm, doesn't.
Revenue model: Freemium with Pro tier at $20/month; enterprise API; exploring advertising revenue.
Lesson: You can take on incumbents with a 20+ year head start (Google Search) if your core insight — that users want synthesized answers, not links — is correct and your execution is fast enough to establish brand before the incumbent reacts.
Case study: Writer
What they wrapped: Originally GPT-4; increasingly their own Palmyra model family.
The differentiation: Enterprise content governance. Writer's insight was that enterprises don't just need AI that generates content — they need AI that generates content consistent with brand guidelines, compliant with legal review, and auditable for the regulatory environments they operate in.
Writer built a full content ops platform: brand voice settings, terminology management, style guides that actually affect model outputs, workflow integration with major enterprise tools (Salesforce, Microsoft 365, Google Workspace), and an enterprise-grade security model (SOC 2 Type II, HIPAA, on-premise deployment options).
Crucially, Writer invested early in developing their own Palmyra model family. This wasn't vanity — it was strategic. Owning the model means they can fine-tune it per customer without exposing customer data to third parties, a critical selling point for regulated industries.
Revenue model: Enterprise SaaS, annual contracts, significant per-seat pricing in the $20-50/seat range with enterprise contracts in the six-to-seven figure range.
Lesson: Enterprise software is about more than the AI quality — it's compliance, security, auditability, and integration depth. Building the enterprise trust infrastructure around an AI product is a genuine moat that pure model providers struggle to replicate quickly.
Here's a principle I hold firmly: your application should be model-agnostic from day one. If switching from GPT-4 to Claude requires more than a config change, you've made a strategic error.
The model landscape changes every three to six months. New models arrive. Pricing drops. Quality leapfrogs. The model that's best for your use case today will likely not be best 18 months from now. If you've hard-coded to a single model provider, every model switch becomes a major engineering project.
The abstraction pattern:
Your Application
↓
AI Abstraction Layer (LiteLLM, your own router, or similar)
↓
Model Routing Logic (cost, latency, task-type)
↓
Model Providers: OpenAI | Anthropic | Google | Mistral | Local
The abstraction layer should expose a single interface regardless of which model is underneath. Your prompt templates may need model-specific tuning, but your application logic should never know which model it's talking to.
Model routing by task type is where this gets powerful:
| Task Type | Recommended Approach |
|---|---|
| Long-form generation | Claude 3.5 Sonnet (best long-context coherence) |
| Code generation | GPT-4o or Claude 3.5 Sonnet |
| Fast completions | GPT-4o-mini or Haiku |
| Image understanding | GPT-4o Vision |
| Embeddings | text-embedding-3-small (cost-efficient) |
| Function calling | GPT-4o (most reliable tool use) |
You route by task type, not by preference. The model that wins each category changes quarterly. Your routing logic adapts; your application doesn't.
The open-source hedge: For any task that doesn't require frontier model quality, consider routing to open-source models (Llama 3.3, Mistral Large, Qwen 2.5) hosted on your own infrastructure or via providers like Together AI, Fireworks, or Groq. The cost difference can be 10-50x, and for high-volume tasks, this is the difference between profitable unit economics and burning cash.
LiteLLM is the standard library for model abstraction in Python. For Node.js/TypeScript, the AI SDK from Vercel provides similar abstractions. Build your routing layer on top of these, not from scratch.
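To make the pattern concrete, here's a minimal sketch of task-type routing on top of LiteLLM. The routing table, `generate` function, and model IDs are illustrative defaults, not recommendations:

```python
# A minimal task-type router on LiteLLM. Assumes provider API keys in the env.
from litellm import completion

TASK_ROUTES = {
    "long_form": "claude-3-5-sonnet-20241022",  # long-context coherence
    "code": "gpt-4o",
    "fast": "gpt-4o-mini",
    "bulk": "gemini/gemini-1.5-flash",          # high-volume, cost-sensitive
}

def generate(task_type: str, messages: list[dict], **kwargs) -> str:
    """Application code declares a task type; it never names a provider."""
    model = TASK_ROUTES.get(task_type, TASK_ROUTES["fast"])
    response = completion(model=model, messages=messages, **kwargs)
    return response.choices[0].message.content

# Usage — swapping models later is a one-line change to TASK_ROUTES.
print(generate("fast", [{"role": "user", "content": "Classify sentiment: 'refund please'"}]))
```

Because LiteLLM exposes the OpenAI-compatible interface for every provider, the `TASK_ROUTES` dict is the only place a vendor name appears.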
Prompt portability matters too. Different models respond differently to the same prompt. Claude tends to be more verbose; GPT-4o tends to follow instructions more literally; Gemini has quirks with structured output. Your prompt templates should be tested on every model you route to. This is non-trivial work but it's a one-time investment that pays dividends every time a new model ships.
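One way to keep prompt portability honest is a small cross-model regression check that runs whenever a template or routed model changes. A sketch, with an illustrative prompt and assertion:

```python
# Cross-model prompt regression check (prompt and assertion are illustrative).
# Assumes provider API keys are set in the environment.
from litellm import completion

ROUTED_MODELS = ["gpt-4o", "claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"]
TEMPLATE = "Answer with exactly one word, positive or negative. Sentiment of: {text}"

def check(text: str, expected: str) -> None:
    for model in ROUTED_MODELS:
        out = completion(
            model=model,
            messages=[{"role": "user", "content": TEMPLATE.format(text=text)}],
        ).choices[0].message.content.strip().lower()
        assert expected in out, f"{model} drifted: {out!r}"

check("I love this product", "positive")
check("This broke on day one", "negative")
```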
The data moat question isn't "how do I collect more data?" It's "what data, generated by my users, makes my product meaningfully better than a product that doesn't have it?"
There are four mechanisms for building a data moat:
Mechanism 1: User feedback loops. Every user interaction with your AI product generates potential training data. The key is capturing the signal — which outputs were good, which were bad, and why.
Practically, this means instrumenting your product to record explicit ratings (thumbs up/down, accept/reject), the edits users make to outputs before keeping them, and the moments they ask for a regeneration.
This data — structured as (input, output, quality_signal) triples — is the raw material for RLHF (reinforcement learning from human feedback), which is how OpenAI aligned GPT and how Harvey built legal-specific alignment.
You don't need to do full RLHF from day one. Start by collecting the data now, even if you don't have the ML capability to use it yet. The data becomes valuable the moment you have the resources to exploit it.
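A minimal capture mechanism can be this simple — the point is to start writing structured triples from day one. The schema and field names here are assumptions, not a standard:

```python
# Log every interaction as an (input, output, quality_signal) triple in JSONL
# so the data is training-ready whenever you're ready to use it.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class FeedbackRecord:
    input: str            # what the user asked for (plus retrieved context, if any)
    output: str           # what the model produced
    quality_signal: str   # "accepted" | "edited" | "regenerated" | "thumbs_down"
    final_text: str = ""  # the user's edited version, when they changed it
    model: str = ""       # which routed model produced the output
    ts: float = 0.0

def log_feedback(record: FeedbackRecord, path: str = "feedback.jsonl") -> None:
    record.ts = record.ts or time.time()
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_feedback(FeedbackRecord(
    input="Summarize clause 7", output="Clause 7 caps liability at ...",
    quality_signal="edited", final_text="Clause 7 caps aggregate liability at ...",
    model="gpt-4o-mini",
))
```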
Mechanism 2: A proprietary knowledge base (RAG). Retrieval-Augmented Generation (RAG) lets you extend a foundation model's knowledge with your own document corpus without fine-tuning. You store documents as vector embeddings, retrieve the most relevant chunks for any given query, and include them in the model's context.
For domain-specific products, the proprietary knowledge base is the moat. Harvey's corpus of annotated legal documents. A medical AI company's database of clinical notes and treatment outcomes. A code assistant trained on your company's internal codebase.
Building a good RAG pipeline is genuinely hard. The common failure modes: chunking that splits semantically related content mid-thought, embedding models that miss domain terminology, retrieval that surfaces plausible-but-irrelevant chunks, and context stuffing — cramming in so many chunks that the model ignores the ones that matter.
Fix retrieval quality first. Everything else depends on it. For domain-specific RAG, you often need to fine-tune the embedding model on domain-specific terminology — a general-purpose embedding model doesn't know that "consideration" means something very specific in contract law.
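For orientation, a stripped-down retrieval sketch using OpenAI embeddings and cosine similarity over an in-memory corpus — real pipelines add chunking, a vector store, and reranking, all omitted here:

```python
# Minimal RAG retrieval: embed chunks once, score by cosine similarity,
# prepend top-k to the prompt. Assumes OPENAI_API_KEY in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

corpus = [
    "In contract law, consideration is the value each party exchanges ...",
    "An indemnification clause shifts specified losses to the other party ...",
]  # in practice: thousands of domain-specific chunks
corpus_vecs = embed(corpus)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed([query])[0]
    sims = corpus_vecs @ q / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

context = "\n\n".join(retrieve("What does consideration mean here?"))
# ...then include `context` in the model prompt alongside the user's question.
```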
Mechanism 3: Implicit behavioral feedback. This is the most underrated data source. When 10,000 users use your product every day, you learn things about what works that no amount of pre-launch testing can reveal.
Build analytics into your AI product that capture regeneration rates (the user asked for another attempt), the edit distance between what the model produced and what the user kept, task abandonment mid-flow, and which query types consistently draw poor ratings.
These metrics tell you where your model is failing before users churn. They also generate a constant stream of "harder" examples — queries where the model struggled — that are exactly the data you need to improve.
Mechanism 4: Synthetic data. If you're in a domain where real data is scarce or expensive to collect (medical, legal, financial), synthetic data generation is increasingly viable. The core approach: use a frontier model to generate (input, output) pairs, have domain experts validate the outputs, and use the validated pairs for fine-tuning.
This sounds circular — using a model to generate training data for a model — but it works because the frontier model's raw outputs aren't the training set: only the expert-validated pairs are kept, and the smaller model needs to master one narrow task, not the frontier model's general ability.
The technique is called "distillation" — training a smaller, domain-specialized model on outputs from a larger frontier model. Cursor's cursor-small model is likely a distillation of larger models on code completion tasks. It's cheaper to run and tuned for the specific patterns that matter in code completion.
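A sketch of the generation half of that loop — a frontier model drafts candidate pairs, which then queue for expert validation. Topics, prompt, and JSON parsing are illustrative and brittle; production pipelines use structured-output modes:

```python
# A frontier model drafts candidate (input, output) pairs; domain experts
# validate before anything is used for fine-tuning.
import json
from litellm import completion

SEED_TOPICS = ["indemnification clauses", "change-of-control provisions"]

def draft_pairs(topic: str, n: int = 5) -> list[dict]:
    resp = completion(
        model="claude-3-5-sonnet-20241022",
        messages=[{"role": "user", "content":
            f"Write {n} realistic contract-review questions about {topic}, each "
            'with an expert-level answer. Return only a JSON list of objects '
            'shaped like {"input": "...", "output": "..."}.'}],
    )
    return json.loads(resp.choices[0].message.content)

# Everything lands in a review queue; nothing trains a model until validated.
pending_expert_review = [p for t in SEED_TOPICS for p in draft_pairs(t)]
```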
The brutal truth about AI wrapper businesses: your cost of goods sold (COGS) is variable in a way that traditional SaaS COGS isn't. Every query costs real money. Getting unit economics right is the difference between a business and a subsidized product.
Current API pricing (March 2026):
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
These prices drop roughly 50% every 12-18 months. But planning your unit economics around future price drops is dangerous — build for today's prices, treat future drops as upside.
A real cost breakdown for a document analysis use case:
Say you're building a contract review product. A typical contract is 10,000 words ≈ 12,500 tokens. Your prompt adds 2,000 tokens. You generate a 1,500-token analysis. Total: ~16,000 tokens.
If you're charging $50/month for up to 50 contract reviews, your blended API cost is somewhere between $0.15 (GPT-4o-mini) and $3.30 (Claude Sonnet) — a 22x range depending on model choice. That's the difference between 99% gross margins and 93% gross margins on that plan.
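The arithmetic behind those numbers, as a quick script you can adapt to your own token counts and the pricing table above:

```python
# Back-of-envelope COGS for the contract-review example. Prices per 1M tokens
# are taken from the pricing table in this article.
PRICES = {"gpt-4o-mini": (0.15, 0.60), "claude-3.5-sonnet": (3.00, 15.00)}

def review_cost(model: str, doc_tokens=12_500, prompt_tokens=2_000, out_tokens=1_500) -> float:
    p_in, p_out = PRICES[model]
    return ((doc_tokens + prompt_tokens) * p_in + out_tokens * p_out) / 1_000_000

for model in PRICES:
    per_review = review_cost(model)
    monthly = per_review * 50          # 50 reviews on the $50/month plan
    margin = 1 - monthly / 50
    print(f"{model}: ${per_review:.4f}/review, ${monthly:.2f}/month, {margin:.1%} margin")
# gpt-4o-mini: ~$0.0031/review, ~$0.15/month, ~99.7% margin
# claude-3.5-sonnet: ~$0.0660/review, ~$3.30/month, ~93.4% margin
```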
The pricing framework for AI wrapper businesses:
Traditional SaaS uses seat-based or flat-rate pricing because COGS are fixed. AI wrappers need to account for variable COGS. The options:
Flat rate with usage cap — easiest for users, requires careful cap setting. If users blow through the cap often, churn rises. If they don't reach it, you're leaving money on the table.
Credit system — users buy credits, each action costs credits (sketched in code below). Allows precise COGS alignment but adds friction. Works well for power users who understand their usage patterns.
Tiered usage — $X for Y queries/month, overage at $Z. Predictable for users, scalable for you. The standard enterprise approach.
Outcome-based pricing — charge per document analyzed, per lead generated, per code review completed. Aligns your revenue with the value delivered. Hard to implement but powerful for high-value domains.
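One way to reason about the credit option concretely: derive each action's credit price from its modeled COGS and a target gross margin. All numbers here are placeholders:

```python
# Deriving credit prices from modeled COGS and a target gross margin.
CREDIT_PRICE_USD = 0.01       # what a user effectively pays per credit
TARGET_MARGIN = 0.80          # keep 80% gross margin on every action

ACTION_COGS = {               # modeled API cost per action (placeholders)
    "quick_summary": 0.003,   # routed to a mini model
    "contract_review": 0.066, # routed to a frontier model (see cost math above)
}

def credits_for(action: str) -> int:
    revenue_needed = ACTION_COGS[action] / (1 - TARGET_MARGIN)
    return max(1, round(revenue_needed / CREDIT_PRICE_USD))

for action in ACTION_COGS:
    print(action, credits_for(action))  # quick_summary: 2, contract_review: 33
```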
The 70% gross margin floor: SaaS businesses typically target 70%+ gross margins. AI wrapper businesses can achieve this, but it requires aggressive model routing (cheap models for cheap tasks), prompt caching for repeated prefixes, usage caps or pricing tied to COGS, and — at scale — fine-tuned smaller models for your highest-volume tasks.
At Cursor's scale ($100M ARR), the difference between routing 20% of completions to their fine-tuned cursor-small model vs. GPT-4o is likely millions of dollars per year.
Prompt caching is an underutilized lever. If 80% of your queries share the same 2,000-token system prompt, you can cache that prefix — Anthropic discounts cached input tokens by roughly 90%, OpenAI by 50%. For a product with a long fixed system prompt, this can cut COGS by 30-40%.
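Here's what marking a fixed system prompt as cacheable looks like with Anthropic's API (the `cache_control` block). Rates, minimum cacheable lengths, and model IDs change, so treat the specifics as examples and check current docs:

```python
# Marking a long, fixed system prompt as cacheable with Anthropic's prompt
# caching. Assumes ANTHROPIC_API_KEY in the environment.
from anthropic import Anthropic

client = Anthropic()
SYSTEM_PROMPT = "You are a contract-review assistant. <2,000 tokens of rules...>"

resp = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1500,
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # this prefix is cached across calls
    }],
    messages=[{"role": "user", "content": "Review the attached limitation-of-liability clause ..."}],
)
print(resp.content[0].text)
```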
Every AI wrapper founder lives with a specific existential dread: what happens when OpenAI ships your feature?
This is real. It has happened. It will happen again. The question is how to build a business that survives it.
Historical examples:
Notion AI (2023): Every AI writing tool built as a standalone product lost significant business when Notion added native AI directly into Notion. Users who were paying for a separate AI writing tool stopped seeing the value of the standalone product when the exact same capability appeared in the tool they were already using. Jasper lost approximately 20% of its smaller customers to this dynamic.
Canva AI: Every AI image generation tool aimed at non-designers lost market share when Canva added Magic Studio — AI image generation, background removal, and content generation — directly into Canva's workflow. Products that were standalone image generation tools had to pivot to serve users who needed capabilities Canva couldn't match.
GitHub Copilot → Copilot Workspace: Every AI code review and refactoring tool lost market narrative (if not customers) when GitHub shipped Copilot Workspace, which handles multi-file edits and issue-to-PR workflows. The standalone "AI code review" category got harder to fundraise in overnight.
The common thread: Platform risk materializes when your product lives adjacent to a platform and the platform ships your exact feature. It's least severe when your product is deeply integrated into the platform (they'd be cannibalizing their own integration) or when your product serves a buyer that the platform doesn't.
The Platform Risk Framework:
Assess your proximity to platform — rate 1-5 how easily your product could be shipped as a feature of OpenAI, Anthropic, Google, or Microsoft. If it's 4 or 5, you need additional moats.
Monitor model capabilities releases — set up alerts for every model provider's changelog, blog, and announcement. Platform risk usually has a 30-90 day public signal before launch.
Build across multiple platforms — if you're only integrated with one platform (say, Notion), your risk is concentrated. Build integrations that span multiple workflow tools.
Move upmarket faster than you think you need to — consumer and SMB segments are the first to commoditize. Enterprise is the last, because enterprise buyers need compliance, SLAs, audit trails, and integrations that a general-purpose tool can't offer quickly.
Own the category narrative — Harvey is "legal AI." Cursor is "AI code editor." Perplexity is "AI search." A category-defining brand makes it harder for a platform to ship a feature that completely replaces you — they'd need to replicate not just the functionality but the market position.
The vertical depth defense — the deeper your product goes in a specific vertical, the less a horizontal platform can replace you. A contract analysis tool for M&A lawyers, trained on deal-specific document types and annotated by M&A partners, is not replaceable by a general AI assistant. The depth of specialization is the defense.
The OpenAI-ships-your-feature survival checklist: Do you own proprietary data the platform can't access? Are you integrated across multiple platforms rather than living inside one? Do you serve enterprise buyers whose compliance, security, and audit requirements a general tool won't meet quickly? Do you own the category narrative? Does your vertical depth go beyond what a horizontal feature can match?
If you can answer yes to three or more of these, you'll survive. If you can only answer yes to one, you're vulnerable.
I'll say this plainly: I have never seen an AI wrapper startup fail because the prompts weren't good enough. I have seen dozens fail because they couldn't get customers.
The model quality gap between the top three foundation models has narrowed to the point where, for most use cases, it's imperceptible to end users. The distribution gap — between companies that know how to acquire and retain customers and companies that don't — is enormous and widening.
This is counterintuitive to technically-minded founders. We believe that if we build something genuinely better, the market will find us. It won't. The market finds the product that appears in front of it at the right moment with the right message.
The five distribution channels that work for AI wrappers:
Channel 1: Content and SEO. The highest-ROI long-term channel for AI tools. Users searching "AI for [use case]" have intent. Capture that intent with content that demonstrates your tool's capability better than a competitor's generic landing page.
Cursor's early growth was driven by developer influencers making YouTube videos comparing Cursor to Copilot. The content wasn't paid — developers genuinely had strong opinions. Perplexity's early growth was driven by tech Twitter where thought leaders started recommending it as an alternative to Google. Neither of these required a marketing budget. Both required a product worth talking about.
The content flywheel for AI tools: (1) build something genuinely better for a specific use case, (2) let the people who care most about that use case discover it, (3) those people create content about it, (4) content drives organic search, (5) organic search compounds.
This takes 12-18 months to build. Start now.
Channel 2: Product-led growth. The best AI wrapper businesses are PLG-native. The product demonstrates its value immediately — users don't need a sales call or a demo to understand what Cursor or Perplexity does; they need to use it for five minutes.
PLG requires a free tier or trial that delivers the product's core value without a sales conversation, time-to-value measured in minutes rather than days, and a self-serve upgrade path priced as an easy expense.
Cursor's PLG: the free tier gives you 2,000 completions. Anyone who uses those 2,000 completions is converted — they've experienced the product's value, and $20/month is a trivial cost to continue.
Channel 3: Community. Developer and creator communities are extremely high-leverage for early-stage AI tools. A single post from a respected developer showing "here's what I built with [tool]" can drive thousands of signups.
Build in the communities where your users already are. For dev tools: GitHub, Hacker News, Reddit (r/programming, r/MachineLearning), Twitter/X, Discord servers for developers. For marketing tools: Product Hunt, LinkedIn, specific marketing communities. For legal/professional tools: bar associations, continuing education platforms, professional forums.
Community distribution is slower to start but more durable than paid channels. Community members are higher-intent buyers who have already self-selected as the type of person who cares deeply about tools for their profession.
Channel 4: Platform marketplaces. Every major platform has a marketplace or integration directory. Salesforce AppExchange, HubSpot App Marketplace, Slack App Directory, Notion's integrations page, Chrome Web Store — these are high-intent, low-cost distribution channels that most AI startups ignore.
Getting featured in a platform marketplace requires building an integration that's genuinely useful within that platform's context, then applying for partner status. The bar is high but the reward is a distribution channel that's self-sustaining — users search the marketplace for AI tools, find yours, and install it without any marketing spend on your part.
Channel 5: Enterprise sales. For AI tools targeting professional buyers (legal, medical, financial, enterprise IT), enterprise sales is the channel. Not PLG. Not SEO. Cold outreach to the right titles, a demo that shows domain-specific capability, a proof-of-concept engagement, a pilot, a contract.
Enterprise sales requires a founder or early hire with domain credibility, security and compliance certifications (SOC 2 at minimum), reference customers in the buyer's peer group, and a repeatable pilot-to-contract motion.
Harvey won enterprise legal by having real Am Law 100 firms as reference customers. "Skadden uses Harvey" is a more powerful close than any feature comparison. Enterprise buyers buy what their peers buy.
Below is a business model canvas filled out specifically for a defensible AI wrapper business. Use this as a template — customize each section for your specific vertical.
Target gross margins: 70%+ for SMB/PLG tier; 75%+ for enterprise (lower COGS relative to ACV)
Q: Is an AI wrapper actually fundable? I keep hearing VCs say they won't invest.
The VC sentiment on wrappers has evolved significantly from 2023 to 2026. In 2023, top-tier VCs were genuinely reluctant to fund wrappers. Today, the ones who passed on Harvey, Cursor, and Perplexity have recalibrated. The question isn't "is it a wrapper?" — it's "does it have a moat?" If you can demonstrate data flywheel, workflow integration, and a clear path to enterprise revenue, it's fundable. Andreessen Horowitz, Sequoia, and Accel have all made significant investments in AI wrapper businesses in the last 18 months.
Q: Should I build on OpenAI or Anthropic?
Neither, exclusively. Build model-agnostic from day one. OpenAI has the largest ecosystem and best function calling. Anthropic's Claude has the best long-context coherence and instruction following for complex tasks. Google's Gemini has the best multimodal capabilities and cost structure. Route by task type, not provider loyalty. For most products, a mix of Claude (long-form generation, complex reasoning) and GPT-4o-mini (quick completions, classification) and Gemini Flash (high-volume, cost-sensitive tasks) is optimal.
Q: How do I think about the open-source vs. closed model question?
For tasks where quality is the primary constraint, use frontier closed models. For high-volume tasks where quality is "good enough," use open-source models hosted on your infrastructure. The rule of thumb: if a user could tell the difference between the frontier model output and the open-source output, use the frontier model. If they can't tell the difference, use the cheaper option. Over time, the quality gap between open-source and closed models narrows. Design your routing layer to swap in new models without application-layer changes.
Q: When does fine-tuning make sense vs. prompt engineering?
Prompt engineering should be exhausted before fine-tuning. Fine-tuning is expensive (compute cost, data cost, ongoing maintenance), slow (model training takes time), and inflexible (hard to update without retraining). Good prompt engineering with few-shot examples can get you 80% of the way to fine-tuned performance. Fine-tune when: (1) you have >10,000 high-quality (input, output, quality_signal) triples, (2) the specific task is well-defined enough to train for, (3) cost or latency requirements can't be met by the frontier model, or (4) privacy requirements prevent sending data to a third-party provider. Don't fine-tune prematurely — it's a trap for teams who want to feel like they're doing "real AI."
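When those conditions are met, the mechanics are straightforward — for example, converting accepted feedback triples into OpenAI's chat-format JSONL and launching a job. `load_accepted_records` is a hypothetical helper over your feedback log, and the base model name is an example:

```python
# Convert accepted (input, output, quality_signal) triples into OpenAI's
# chat-format JSONL and launch a fine-tuning job.
import json
from openai import OpenAI

client = OpenAI()

with open("train.jsonl", "w") as f:
    for rec in load_accepted_records():  # hypothetical: yields dicts from feedback.jsonl
        f.write(json.dumps({"messages": [
            {"role": "user", "content": rec["input"]},
            {"role": "assistant", "content": rec.get("final_text") or rec["output"]},
        ]}) + "\n")

upload = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-4o-mini-2024-07-18",  # example base model; check supported list
)
print(job.id, job.status)
```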
Q: What's the minimum viable data moat?
There's no precise answer, but a practical threshold: your data moat is defensible when (a) it would take a competitor more than 12 months to replicate through normal user acquisition, and (b) you can demonstrate a measurable improvement in model quality attributable to the proprietary data. Early-stage, focus on collection mechanisms rather than moat size. The moat builds over time if the flywheel is running. If you're not actively collecting and structuring behavioral data from day one, you're leaving the most valuable asset of your company uncollected.
Q: How do I prevent users from just copying my prompts?
You don't, and you shouldn't spend energy trying. Prompts are not a moat. If your entire defensibility is a system prompt someone could reverse-engineer in an afternoon, you don't have a business. Build the moats that can't be copy-pasted: proprietary data, workflow integrations, enterprise relationships, brand, and fine-tuned models. Any founder worried about prompt theft is thinking about the wrong problem.
Q: What are the leading indicators that my AI wrapper is becoming defensible?
Watch these metrics: (1) DAU/MAU ratio above 40% — users are coming back every day, which means workflow integration is working; (2) net revenue retention above 120% — existing customers are expanding, which means value delivery is real; (3) organic/word-of-mouth as a significant acquisition channel — means you're building a category, not just a product; (4) time to complete a user task dropping quarter-over-quarter — means your model is improving via feedback loops; (5) enterprise deal size growing — means you're moving upmarket faster than the category commoditizes.
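For clarity on the first two metrics, the standard definitions as code — thresholds in the comments mirror the ones above:

```python
# Standard definitions for the two headline defensibility metrics.
def dau_mau_ratio(dau: int, mau: int) -> float:
    return dau / mau  # above 0.40 suggests daily-workflow stickiness

def net_revenue_retention(start_mrr: float, expansion: float,
                          contraction: float, churn: float) -> float:
    # Measured on the cohort of customers that existed at the period's start.
    return (start_mrr + expansion - contraction - churn) / start_mrr

print(dau_mau_ratio(4_200, 10_000))                          # 0.42
print(net_revenue_retention(100_000, 30_000, 5_000, 5_000))  # 1.2 → 120%
```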
Q: Should I hire ML engineers first or product engineers?
Product engineers first. The most common mistake I see is founding teams that spend 6 months building ML infrastructure and never ship a product. Your first 12 months should be about (1) building a product users love, (2) acquiring users who generate data, and (3) establishing distribution. ML sophistication becomes critical after you've validated the product-market fit and have enough users to generate meaningful training data. Hire ML engineers when you have data. Hire product engineers to get data.
The "AI wrapper" debate is a distraction. What matters isn't whether you're using a foundation model API — everything is, at some level, using a foundation model API. What matters is what you build on top of it.
The businesses that will define the AI application layer are the ones that build data flywheels, deep workflow integration, genuine domain expertise, exceptional UX, and — at scale — fine-tuned models of their own.
The "it's just a wrapper" criticism has a half-life. Every time a so-called wrapper company reaches a billion-dollar valuation, the criticism gets harder to sustain. We're two years into the current AI application cycle, and the evidence is clear: wrappers with genuine defensibility are among the most valuable software companies being built.
Build the moat. Not the prompt.
Found this useful? I write about building in AI, startup strategy, and product thinking at udit.co. If you're building an AI wrapper and want to talk through your moat strategy, reach out directly — uditgoenka.co.