TL;DR: Per-seat pricing assumes a human sits in the seat. AI agents don't sit — they hammer your API at 10,000 requests per minute, execute multi-step workflows autonomously, and never log out. The foundational assumption of SaaS pricing has broken. This guide covers the billing models built for machine customers: credit wallets, task-based billing, agent tiers, metered consumption, rate-limit architectures, and the infrastructure required to build them. With real numbers from companies already doing this, and a framework for designing your own agent-compatible pricing before a competitor does.
Table of Contents
- The End of the Seat as a Pricing Unit
- Who Are Your Machine Customers?
- The Five Agentic Billing Models
- The Credit Wallet Model: Prepaid Consumption for Agents
- Task-Based Billing: Per Completed Workflow
- Agent Tiers: Speed, Priority, and Capability Pricing
- Rate Limiting for Machine Callers
- Designing APIs for Machine Consumers
- Billing Model Comparison Table
- Revenue Forecasting When Consumption Is Unpredictable
- Billing Infrastructure: What You Actually Need to Build
- Abuse Prevention and Cost Floors
- Case Studies: Companies Already Pricing for Agents
- How to Transition Existing Human Customers to Agent-Compatible Plans
- Frequently Asked Questions
The End of the Seat as a Pricing Unit
In 2010, when Salesforce was building the SaaS playbook, the pricing unit was obvious: a seat. One human, one seat, one monthly fee. The human sat at a computer for eight hours, used the software, logged off. The constraint on consumption was biological — humans get tired, go to lunch, sleep. Revenue was predictable because consumption was predictable, because consumption was human.
That constraint no longer exists.
An AI agent running on GPT-5 or Claude Opus 4 doesn't sleep. It doesn't go to lunch. It doesn't need onboarding sessions, it doesn't get confused by the UI, and it doesn't wait for a manager to approve its expense report before it takes action. It receives a task — "research and qualify these 500 leads," "process these 2,000 support tickets," "run these 50 A/B test analyses" — and it executes autonomously, at machine speed, until the task is complete or it hits a wall.
The seat model breaks in at least three ways when the customer is an agent:
The consumption volume problem. One AI agent can execute in an hour what a human user would execute in a month. If you have 10 seats priced at $200/seat/month, a company deploying 10 agents can blow through the equivalent of $20,000 worth of human usage in 48 hours. You've priced yourself out of your own product's value.
The identity problem. Seats map to identities. An agent doesn't have a stable identity the way a human employee does. It might be one agent, or five parallel agents, or a hundred ephemeral agent instances running simultaneously under one API key. "How many seats does an agent count as?" is not a question the seat model can answer gracefully.
The accountability problem. With human seats, you can inspect behavior — audit logs show which human did what, when. With agents, a single "user" might trigger 50,000 downstream API calls, each of which propagates to further API calls, all within a single billing cycle. The question isn't whether to charge for this — you absolutely should — it's whether your billing model can even express it.
I've spoken with founders at a dozen SaaS companies grappling with this right now. The common pattern: a customer deploys an AI agent using their API, consumption spikes 40x in a month, the customer hits rate limits, gets confused, emails support, and the conversation immediately becomes "wait, we're on a 5-seat plan — are agents counted as seats?" Nobody has a good answer. The pricing model wasn't designed for the question.
The SaaS companies that figure this out first will capture disproportionate share of the agentic wave. The ones that don't will watch their best enterprise customers deploy agents, exhaust their plans, hit artificial walls, and switch to a competitor who built for machine customers.
This is related to but distinct from usage-based pricing, which emerged as a response to the limitations of seat pricing for variable consumption. Agent pricing goes a level further — it's not just "charge per use," it's "rearchitect your entire billing model for a non-human consumer that has fundamentally different consumption patterns, identity structures, and value metrics than any human user you've ever priced for."
Who Are Your Machine Customers?
Before building a pricing model for agents, you need to understand who — or what — is actually calling your API. There are at least four distinct categories of machine customer, and they warrant different pricing approaches.
Type 1: Orchestrated Internal Agents
These are agents deployed by your existing customers to automate their own workflows. A marketing team deploys an agent to pull analytics from your platform, generate weekly reports, and push them to Slack. A sales team deploys an agent to update CRM records based on meeting transcripts. The agent is an internal productivity tool for a human organization.
These agents have relatively predictable, bounded consumption. They run on schedules (weekly reports, daily syncs), they operate within defined scopes, and a human somewhere configured them and can be held accountable for their behavior. This is the easiest category to price for because the consumption patterns are knowable.
Type 2: Agentic SaaS Products (AI Agents as the Product)
These are companies whose entire product is an AI agent: Dust, Lindy.ai, and a growing cohort of vertical AI agents — legal research agents, due diligence agents, customer support agents — whose product value is delivered by agents that call dozens of APIs in sequence to complete a task.
These customers are deeply sensitive to your pricing model because your costs flow directly into their cost of goods sold. If you price per API call and they need 200 API calls to complete one customer task, your pricing directly determines whether their unit economics work. They will scrutinize your billing model with more rigor than any human customer, because their CFO models it in a spreadsheet before signing anything.
Type 3: Autonomous Research and Data Agents
Web scrapers, market research agents, competitive intelligence agents, data pipeline agents. These callers are often the most aggressive on volume — they're designed to exhaust sources rapidly, index comprehensively, and move on. They have no patience for human-speed rate limits and no human watching a screen to notice when they've hit a wall.
This category requires the most careful rate-limit architecture. A single misconfigured agent in this category can generate more traffic in six hours than your entire human customer base generates in a month.
Type 4: Multi-Agent Pipelines
The most complex category: agents calling agents. A primary orchestrator agent breaks down a complex task and spawns sub-agents, each of which calls your API independently. What looks like one "customer" in your user database might be a pipeline of 20 coordinated agents, each executing a portion of a workflow. The originating human is two steps removed from the API calls.
This is the frontier. The billing model for multi-agent pipelines doesn't really exist yet, and getting it right — or even approximately right — is worth serious engineering and product investment.
Understanding which of these four types represents your current and future customer mix is the first step in designing a pricing model that works. The answers drive very different billing architectures.
The Five Agentic Billing Models
There is no standard. That's the honest starting point. Every company that's built agent-compatible pricing has invented it more or less from scratch, borrowing from infrastructure pricing, API pricing, and outcome-based pricing traditions. Here are the five models that have emerged as viable:
Model 1: Consumption Credits (Prepaid Wallet)
Customers purchase a block of credits upfront. Each API call, workflow step, token, or resource consumed deducts from the wallet. When the wallet hits zero, consumption stops unless auto-refill is enabled.
Best for: Agent-native products where consumption varies dramatically month to month, companies serving agentic SaaS builders (Type 2), products with well-defined "units" of work.
Failure mode: Credit pricing is confusing if the exchange rate between credits and real work isn't clear. "1,000 credits — what does that actually get me?" is a question you will hear constantly unless you do the translation work for customers.
Model 2: Task-Based Billing
Charge per completed task, not per API call. A "task" is a defined unit of work — one lead qualified, one document analyzed, one ticket resolved. The agent can make 50 API calls internally to complete the task; the customer pays for the task outcome.
Best for: Products with well-defined workflows, companies whose customers think in outcomes not infrastructure, vertical AI agents with clear task types.
Failure mode: Defining what constitutes a "completed" task is harder than it sounds. Partial completions, retries, and edge cases require explicit handling.
Model 3: Tiered Agent Plans
Separate pricing tiers specifically for agent usage, with limits on agent concurrency, throughput (requests per second), and monthly consumption volume. Human users and agent callers are explicitly distinct.
Best for: Products serving both human and agent customers, companies transitioning from human-first to agent-capable pricing.
Failure mode: Tier walls create cliff effects — a customer needing slightly more than the mid tier allows faces a 3x price jump to the enterprise tier.
Model 4: Revenue Share / Percentage of Agent-Generated Value
If your product enables agents to create measurable economic value — closed deals, resolved tickets, generated revenue — charge a percentage. This is closer to outcome-based pricing than infrastructure pricing.
Best for: Very high-value workflows where you can instrument the outcome, products with direct revenue attribution.
Failure mode: Attribution is hard and contested. Customers resist sharing revenue when they feel the agent did the easy part. Requires contract-level sophistication.
Model 5: Metered Subscription with Overage
A base monthly subscription includes an allocation of consumption (API calls, tokens, tasks). Usage beyond the allocation triggers overage charges at a per-unit rate. Agents can consume as much as they need, with predictable base cost and variable overage.
Best for: Enterprises that need budget predictability, customers whose agent consumption is moderate and bounded, hybrid human/agent usage patterns.
Failure mode: Overage shock. An agent that runs amok can generate a bill 10x the base subscription in a week. Customers hate this and it creates trust problems.
The Credit Wallet Model: Prepaid Consumption for Agents
The credit wallet model is the most agent-native billing design I've seen. It solves several problems simultaneously: it gives customers budget control, it eliminates overage shock, and it creates a natural consumption vocabulary that works for both human and machine callers.
Here's how it works at the implementation level:
Credit definition. Define what a credit is worth in concrete terms. Don't be abstract. "1 credit = 1 API call" is clear. "1 credit = 1,000 tokens processed" is clear. "1 credit = 1 workflow step" is clear. The more specific, the less friction in sales conversations and the less confusion in support queues. OpenAI uses tokens as their credit unit — it's granular, consistent, and maps to actual compute cost. Anthropic uses the same model. Most infrastructure-layer AI products do.
Wallet tiers. Sell credits in bundles at volume discounts. Starter: 10,000 credits for $50 ($0.005/credit). Growth: 100,000 credits for $400 ($0.004/credit). Scale: 1,000,000 credits for $3,000 ($0.003/credit). Enterprise: negotiated. The discount at volume incentivizes larger upfront commitments, which improves cash flow and reduces churn risk.
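The bundle math above can be encoded directly. A minimal sketch, using the illustrative tier numbers from this section (the bundle names and prices are examples, not a recommendation):

```python
# Credit bundle tiers from the example pricing above (illustrative numbers):
# (name, credits in bundle, bundle price in dollars)
BUNDLES = [
    ("starter", 10_000, 50.00),      # $0.005/credit
    ("growth", 100_000, 400.00),     # $0.004/credit
    ("scale", 1_000_000, 3_000.00),  # $0.003/credit
]

def per_credit_rate(bundle_name: str) -> float:
    """Effective price per credit for a named bundle."""
    for name, credits, price in BUNDLES:
        if name == bundle_name:
            return price / credits
    raise KeyError(bundle_name)

def cheapest_bundle_for(credits_needed: int) -> str:
    """Pick the smallest bundle that covers the need. Because larger
    bundles always carry a lower per-credit rate, the first bundle
    that fits is the right answer."""
    for name, credits, _ in BUNDLES:
        if credits >= credits_needed:
            return name
    return BUNDLES[-1][0]  # beyond the largest bundle: negotiate
```

The monotonic volume discount is what makes the first-fit lookup correct; if a mid-tier bundle were ever cheaper per credit than a larger one, customers would arbitrage it by stacking small bundles.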
Auto-refill. Offer automatic refill when the wallet drops below a threshold. Set a minimum balance trigger (e.g., 10% remaining) and auto-purchase a pre-configured top-up. This is critical for agent customers — an agent hitting a zero-balance wall at 2am on a Monday is a support nightmare. Auto-refill eliminates it.
Wallet expiry. Credits should expire, but with a long window — 12 months is standard, 24 months is agent-friendly. Expiry creates urgency to use purchased credits and prevents massive stranded balances that become a liability on your books. Be explicit about expiry in your pricing page. Hidden expiry terms create customer trust problems.
Consumption transparency. Build a real-time credit consumption dashboard. Agents consume credits at machine speed — customers need to be able to see, in near-real-time, how fast credits are being depleted. If an agent is misconfigured and consuming 10x the expected rate, the customer needs to catch that before the wallet empties. This isn't nice-to-have; it's table stakes for agent customers.
Rollover and float. Some credit wallet implementations allow unused credits to roll over to the next billing period, up to a cap (e.g., roll over up to 50% of monthly allocation). This is a customer-friendly feature that's relatively low-cost to offer if you've priced your credits correctly.
The credit wallet model also has a clean mapping to enterprise procurement. Enterprise customers can purchase a $50,000 annual credit commitment, which maps to a line item in their budget. The purchasing team doesn't need to model variable monthly bills — it's a prepaid allocation. This simplifies procurement significantly compared to variable metered billing.
Brex, Stripe, and Twilio all use variants of this model for their developer/agent-heavy customer segments. Twilio's credit model in particular is worth studying — they price SMS, voice, and programmable channels as separate credit pools, allowing complex multi-channel agents to budget precisely for each consumption type.
Task-Based Billing: Per Completed Workflow
Task-based billing is the most intuitive model for customers who think in outcomes rather than infrastructure. Instead of asking "how many API calls does this cost?", the customer asks "how many tasks can I complete?"
The challenge is defining "task" with enough precision that the billing is unambiguous. I've seen companies get this right and get this catastrophically wrong.
Getting it right. Zendesk's AI agent product (launched in 2024) charges per resolved ticket — where "resolved" means the ticket was closed without human escalation within a defined window. This is a clean definition. The customer knows exactly what they're paying for, the condition for billing is measurable, and the price maps directly to business value (a resolved support ticket has clear ROI versus a human agent handling it).
Getting it wrong. A research agent product I've followed charged per "research task completed," but their definition of "completed" was opaque — it meant the agent had returned a response, not that the response was useful. Customers ran agents that returned empty results, got charged anyway, and churned. The billing model had no quality gate.
The framework for defining billable tasks correctly:
- Define the output condition. What does the system produce that is objectively verifiable? A closed ticket. A generated document with at least N characters. A database record with all required fields populated. A classification with a confidence score above a threshold. Be specific.
- Define the failure condition. What does non-billable look like? Errors, timeouts, empty responses, sub-threshold quality. Customers accept paying for completed work. They resist paying for failed work.
- Define partial completions. Complex workflows often have intermediate steps. If your task is "qualify a lead," does qualification require all five data fields or does three out of five count? Does a partial completion bill at a reduced rate? You need policy here before you have a billing dispute.
- Build verification into the pipeline. Task-based billing requires a verification layer — code that checks whether the output meets the completion condition before recording a billable event. This is non-trivial engineering, but it's the foundation that makes the billing model trustworthy.
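A verification gate along these lines is small to sketch. This is an illustrative example for a hypothetical "qualify a lead" task: the field names, the five-field requirement, and the 50% partial rate are assumptions, not a real product's policy.

```python
from dataclasses import dataclass

# Hypothetical result of a "qualify a lead" task; the field names and
# thresholds below are illustrative, not from any real product.
@dataclass
class TaskResult:
    status: str            # "ok", "error", or "timeout"
    fields_populated: int  # how many required lead fields were filled
    fields_required: int = 5

def billable_fraction(result: TaskResult) -> float:
    """Return the fraction of the task price to bill: 1.0 for a full
    completion, a reduced rate for partials, 0.0 for failures.
    This encodes the output, failure, and partial-completion policy
    in one place, before any billable event is recorded."""
    if result.status != "ok":
        return 0.0                      # failure condition: never billable
    if result.fields_populated >= result.fields_required:
        return 1.0                      # output condition met in full
    if result.fields_populated >= 3:
        return 0.5                      # partial completion, reduced rate
    return 0.0                          # below the partial threshold
```

The point is that the billing pipeline calls this function before emitting a billable event, so a customer can never be charged for an output that fails the stated completion condition.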
Pricing the task. The math works backwards from value. If your agent automates a workflow that a human would complete in 30 minutes at $30/hour, the displaced labor cost is $15, and the agent task is worth up to roughly $7.50 (50% of the human cost, leaving the customer with clear savings). If your tasks complete in 2 seconds and you need 20 API calls to complete one, the floor is your API cost times 20 plus margin. The range between your floor and the value ceiling is your pricing space.
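The backwards-from-value math generalizes into a simple floor/ceiling calculation. A sketch, with the value share and margin multiplier as tunable assumptions:

```python
def pricing_space(human_minutes: float, human_rate_per_hour: float,
                  api_cost_per_call: float, calls_per_task: int,
                  value_share: float = 0.5, margin: float = 2.0):
    """Floor and ceiling for a per-task price (illustrative defaults).
    value_share: fraction of the displaced human cost you capture.
    margin: multiplier over raw API cost that sets the price floor."""
    human_cost = human_rate_per_hour * human_minutes / 60
    ceiling = human_cost * value_share                   # value-based upper bound
    floor = api_cost_per_call * calls_per_task * margin  # cost-based lower bound
    return floor, ceiling

# Example: a 30-minute human workflow at $30/hr, automated by an agent
# making 20 API calls at $0.0001 each, yields a wide pricing space
# between a fraction of a cent and several dollars per task.
```

Anywhere the floor approaches the ceiling, the model is telling you the task is either too cheap to deliver or not valuable enough to automate at your cost structure.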
For context: Zapier's AI tasks were initially priced at $0.01-$0.05 per task. Salesforce's Agentforce was announced at $2 per conversation. Intercom's Fin AI charges $0.99 per resolved conversation. The variance is wide because these products operate at different layers — Zapier is closer to infrastructure, Agentforce and Fin are closer to delivered business outcomes.
Agent Tiers: Speed, Priority, and Capability Pricing
Agents don't just differ in how much they consume — they differ in the quality of service they need. An internal reporting agent running overnight can tolerate slower response times and lower throughput. A customer-facing agent handling real-time support conversations needs sub-second response times and dedicated capacity. These are fundamentally different products that can carry fundamentally different prices.
Agent tiers price this differentiation explicitly:
Standard Agent Tier
- Rate limit: 100 requests/minute
- Response time SLA: best effort (no guarantee)
- Concurrency: up to 5 parallel requests
- Intended for: batch jobs, overnight processing, non-time-sensitive workflows
- Pricing: lowest cost per call/credit
Professional Agent Tier
- Rate limit: 1,000 requests/minute
- Response time SLA: 99th percentile under 2 seconds
- Concurrency: up to 50 parallel requests
- Priority queue: ahead of standard tier
- Intended for: time-sensitive workflows, moderate-scale automation
- Pricing: 3-5x standard tier
Enterprise Agent Tier
- Rate limit: custom (negotiated, can be 10,000+ requests/minute)
- Response time SLA: contractual, typically 99.9% under 500ms
- Concurrency: custom
- Dedicated compute capacity (no shared queue)
- Intended for: large-scale agent deployments, mission-critical automation
- Pricing: custom contract, typically $5,000-$50,000/month plus consumption
The rationale: speed and capacity are real infrastructure costs. Serving an Enterprise Agent Tier customer requires reserved compute that sits idle unless they're using it. You're selling capacity, not just consumption. The pricing reflects the reservation premium.
There's also a capability dimension to agent tiers. Not all agent callers need access to your full feature surface. A lightweight agent tier might include only the high-volume, low-cost operations. Advanced agent tiers unlock access to compute-intensive capabilities — deep analysis, large context windows, complex multi-step operations. This mirrors how AWS prices Lambda vs. ECS vs. EC2 — the capability tiers are real and the pricing differences are justified.
Capability gating examples:
- Standard: up to 32K context window, basic operations only
- Professional: up to 200K context window, advanced analytics, batch processing endpoints
- Enterprise: unlimited context, custom fine-tuned models, dedicated inference endpoints
One implementation consideration: agent tiers should have explicit capability documentation in your API reference. Agents (and the engineers who configure them) need to know exactly what's available at each tier level. Ambiguity here creates support overhead and pricing disputes.
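In code, the tier and capability gates reduce to a static tier table consulted on every request. A minimal sketch, using the example tier numbers from this section (all values illustrative):

```python
from dataclasses import dataclass

# Tier parameters drawn from the example tiers above; treat the numbers
# as illustrative defaults, not a recommendation.
@dataclass(frozen=True)
class AgentTier:
    name: str
    rpm_limit: int           # requests per minute
    max_concurrency: int
    max_context_tokens: int  # capability gate: largest allowed context
    priority: int            # lower number = served first in the queue

TIERS = {
    "standard": AgentTier("standard", 100, 5, 32_000, 2),
    "professional": AgentTier("professional", 1_000, 50, 200_000, 1),
    "enterprise": AgentTier("enterprise", 10_000, 500, 10**9, 0),
}

def check_request(tier_name: str, context_tokens: int) -> bool:
    """Capability gate: reject requests exceeding the tier's context window."""
    tier = TIERS[tier_name]
    return context_tokens <= tier.max_context_tokens
```

Keeping the whole tier definition in one frozen structure also makes it trivial to publish: the same table that enforces limits can generate the capability documentation in your API reference.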
Rate Limiting for Machine Callers
Rate limiting is infrastructure policy for humans. For agents, it's a billing and cost control mechanism that requires a completely different design philosophy.
Human users hit rate limits accidentally and infrequently — they're working at human speed. When they hit a limit, they see an error, pause, and try again. The limit is a guardrail.
Agents hit rate limits deliberately or as a design constraint. A well-engineered agent has retry logic, exponential backoff, and circuit breakers built in. A poorly engineered agent (and there are many) will slam your API at maximum throughput until it either gets what it needs or crashes. The rate limit is a fence, and some agents are very determined to get over it.
Design principles for agent-compatible rate limits:
1. Separate human and agent rate limits. Don't throttle your agent customers using limits designed for human users. A human user making 100 requests per minute is probably a bot. An agent making 1,000 requests per minute might be entirely legitimate batch processing. Track rate limit budgets separately by caller type, not just by API key.
2. Use a token bucket algorithm, not a fixed window. Fixed windows create "thundering herd" problems — agents learn the window resets every minute and slam the API exactly at reset. Token buckets (where capacity refills continuously at a defined rate) smooth consumption and are fairer to well-behaved agents.
3. Return 429 responses with Retry-After headers. When an agent is rate-limited, the response should include an explicit Retry-After value (ideally a timestamp, not just a delay in seconds). Agents can read this and schedule retries precisely. Without it, agents fall back to blind exponential backoff, which still generates large volumes of rejected requests.
4. Build queue depth visibility. Enterprise agent customers want to know their queue position and estimated wait time when they're throttled, not just a rejection. This is standard in infrastructure products (SQS has it, Kafka has it) and agents expect it.
5. Offer "burst" credit for predictable spikes. Many agents have known high-volume periods — end-of-month processing, nightly batch jobs, weekly reporting runs. Allow customers to pre-announce burst windows and temporarily increase their limits for those windows, either as part of their tier or purchasable as burst credits. This is significantly better UX than having agents fail silently during critical processing runs.
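Principles 2 and 3 combine naturally: a continuously refilling token bucket can report exactly how long a rejected caller should wait, which is the Retry-After value. A minimal single-process sketch (a production limiter would live in shared state such as Redis):

```python
import time

class TokenBucket:
    """Continuously refilling token bucket: at most `capacity` tokens,
    refilled at `refill_rate` tokens per second. Each request consumes
    one token; when empty, the caller is told how long to wait."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.updated = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_rate)
        self.updated = now

    def try_consume(self) -> tuple[bool, float]:
        """Returns (allowed, retry_after_seconds)."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Seconds until one full token accrues: the Retry-After value.
        return False, (1 - self.tokens) / self.refill_rate
```

Because capacity refills continuously rather than resetting on a fixed window, there is no moment for agents to synchronize on, which eliminates the thundering-herd behavior fixed windows invite.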
The cost-floor dimension of rate limiting. There's an economics argument for rate limits that goes beyond stability: very high-volume agent callers with aggressive discounts can be unprofitable at unlimited throughput. Rate limits enforce a cost floor by bounding the maximum consumption in a billing period. If your marginal cost per API call is $0.0001 and you've priced credits at $0.0005, you're fine at 1 million calls but potentially stretched at 100 million calls per customer per day without limits. Rate limits are also pricing architecture.
Designing APIs for Machine Consumers
Most SaaS APIs were designed with human developers in mind — human developers building features for human users. The documentation assumes someone reads it, the error messages assume someone sees them, the authentication flows assume someone clicks buttons. Agents change all of these assumptions.
Structured, machine-readable responses. Agents don't parse prose. If your API returns verbose error messages like "Sorry, we couldn't process your request because the document format you provided doesn't match our expected input structure. Please review the documentation for acceptable formats," an agent will fail to act on it. Agents need structured error codes (INVALID_FORMAT, QUOTA_EXCEEDED, PARTIAL_FAILURE) that map to programmatic retry or fallback logic.
Idempotency keys. Agents retry operations. Retrying a non-idempotent operation creates duplicate data, double charges, or corrupted state. Every write operation in your API should support an idempotency key — a client-provided unique identifier that ensures duplicate requests return the same result without side effects. Stripe has had this right for 15 years. Many SaaS products still don't implement it.
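The server-side mechanics are simple: look up the key, and replay the stored result instead of re-running the side effect. A minimal in-process sketch (a real service would back this with a durable store, e.g. a database unique index, plus a TTL):

```python
import threading

class IdempotencyCache:
    """Minimal in-process idempotency guard. The lock is held across
    the operation, so a given key's side effect runs at most once
    even under concurrent retries."""
    def __init__(self):
        self._results: dict[str, object] = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, operation):
        """Execute `operation` at most once per key; subsequent calls
        with the same key return the stored result without re-running
        the side effect."""
        with self._lock:
            if key not in self._results:
                self._results[key] = operation()
            return self._results[key]
```

The essential contract, which this sketch preserves, is that a retried request with the same key returns the same result as the original, byte for byte, with no second side effect.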
Batch endpoints. An agent processing 10,000 records one at a time is both slow and wasteful. Batch endpoints — submit 1,000 records in a single request, get 1,000 results back — are a standard optimization for agent callers. Batch endpoints also dramatically reduce your per-transaction overhead: fewer HTTP connections, fewer auth validations, better cache utilization. Offer batch endpoints at a discount per unit versus single-record endpoints.
Async with webhooks. For long-running operations (document analysis, complex data processing, multi-step workflows), synchronous request/response is a terrible pattern for agents. An agent waiting on a blocking HTTP connection for 30 seconds is burning resources and blocking concurrency. Design long-running operations as async: submit a job, get a job ID, receive a webhook when complete. Agents are excellent at this pattern — they can submit thousands of jobs, do other work, and process results as webhooks arrive.
Machine-readable billing events. Agents should be able to query their own billing status programmatically — current credit balance, consumption in the current period, projected depletion date, recent consumption events. Build a /billing/status endpoint. Build a /billing/events endpoint with filtering and pagination. Agents monitoring their own consumption can implement self-throttling before hitting limits, which is a better outcome for both parties.
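The self-throttling idea is worth making concrete. A hypothetical agent-side check against a billing-status payload; the field names (`balance`, `burn_rate_per_hour`) are assumptions for illustration, not a real API:

```python
def should_throttle(status: dict, min_hours_of_runway: float = 24.0) -> bool:
    """Agent-side self-throttling: slow down before the wallet empties.
    `status` is a hypothetical /billing/status payload with a remaining
    `balance` (credits) and observed `burn_rate_per_hour` (credits/hour).
    Throttle when remaining runway drops below `min_hours_of_runway`."""
    burn = status["burn_rate_per_hour"]
    if burn <= 0:
        return False  # no consumption: nothing to throttle
    return status["balance"] / burn < min_hours_of_runway
```

An agent polling this before each batch (or reacting to a billing webhook) can degrade gracefully, deferring low-priority work, instead of slamming into a zero-balance wall mid-task.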
Semantic versioning with long deprecation windows. Human developers adapt to breaking changes through documentation and announcement emails. Agents don't read announcements. An agent configured six months ago will keep running the same API version forever unless the configuration is explicitly updated. Breaking changes to your API will silently break agent integrations. Use proper semantic versioning, maintain old versions for at least 12 months, and never change the behavior of existing endpoints without bumping the version.
Agent-specific API keys and permissions. Implement scoped API keys that limit what an agent can do — read-only keys for agents that only need data, scoped write keys for agents that update specific record types. This is both a security measure and a billing control — you can apply different rate limits and pricing to keys of different scopes. This connects to MCP integration patterns where permission scoping is critical for agent authentication.
Billing Model Comparison Table
| Model | Best for | Revenue predictability | Primary failure mode |
|---|---|---|---|
| Consumption credits (prepaid wallet) | Agent-native products with highly variable consumption | High (prepaid) | Unclear credit-to-work exchange rate |
| Task-based billing | Customers who think in outcomes; well-defined workflows | Medium | Ambiguous "completed task" definitions |
| Tiered agent plans | Mixed human/agent customer bases | High | Cliff effects between tiers |
| Revenue share | High-value, instrumentable outcomes | Low | Contested attribution |
| Metered subscription with overage | Enterprises needing budget predictability | Medium-high | Overage shock |

The hybrid recommendation for most SaaS products: Combine a metered subscription base (predictable floor, works for budget planning) with a credit wallet layer for agent-specific consumption (prepaid, controllable, no overage shock) and explicit agent tier pricing for throughput and priority differentiation. This three-layer approach gives enterprise customers predictability, gives technical customers control, and gives you margin protection.
Revenue Forecasting When Consumption Is Unpredictable
This is the part that keeps SaaS CFOs up at night. Human user consumption is relatively predictable — DAU/MAU ratios are stable, seasonal patterns are known, seat counts track headcount which tracks ACV. Agent consumption follows none of these patterns.
An agent-heavy customer can go from 10,000 API calls one month to 10,000,000 the next because they ran their first large batch job, or deployed a new agent workflow, or onboarded five new enterprise clients into their own agentic product. This volatility is incompatible with standard SaaS revenue forecasting models.
Four approaches to managing the unpredictability:
1. Contractual minimums. Enterprise agent customers should sign contracts with minimum annual consumption commitments. A customer who thinks they'll use 5 million credits per year signs a commitment for that floor, with overage rates above. This gives you revenue floor predictability even in variable consumption months. This is how cloud providers (AWS, GCP, Azure) manage their revenue forecasting — committed use discounts anchor the relationship and the floor.
2. Cohort-based consumption modeling. Rather than forecasting at the individual customer level, model consumption by customer cohort — customers who deployed their first agent in month M tend to ramp consumption over months M+1 through M+6 following a predictable curve. Historical data from your first agent customers is the input. Build this model early, before you have many agent customers, because it will become your primary forecasting tool.
3. Leading indicator tracking. Human SaaS uses logins and feature adoption as leading indicators of renewal and expansion. Agent SaaS should track agent deployment events — number of agents configured, number of workflows activated, number of API key types issued, batch job frequency. These leading indicators of consumption ramp predict revenue 60-90 days earlier than the consumption itself.
4. Conservative cash management for prepaid credits. Credit wallets create a liability — you've received cash but haven't yet delivered the service. Standard accounting treats prepaid credits as deferred revenue, not recognized revenue. This is correct but requires cash management discipline: don't spend the prepaid credit cash until the credits are consumed. When consumption spikes, you recognize deferred revenue rapidly, which looks great on paper but requires the infrastructure to actually deliver at scale.
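The cohort-based modeling in approach 2 reduces to a small calculation once you have a fitted ramp curve. A sketch with a hypothetical curve; the multipliers below are placeholders you would replace with your own cohort data:

```python
# Hypothetical ramp curve: average consumption multiplier (relative to
# the month of first agent deployment) for months M through M+5.
# Fit these from your own early cohorts; the numbers are placeholders.
RAMP = [1.0, 1.8, 3.0, 4.5, 5.5, 6.0]

def cohort_forecast(cohorts: dict[int, float], horizon_month: int) -> float:
    """cohorts: {deploy_month: baseline monthly consumption at deployment}.
    Returns total forecast consumption in `horizon_month`, summing each
    cohort's baseline scaled by the ramp multiplier for its age."""
    total = 0.0
    for deploy_month, baseline in cohorts.items():
        age = horizon_month - deploy_month
        if age < 0:
            continue  # this cohort hasn't deployed yet at the horizon
        multiplier = RAMP[min(age, len(RAMP) - 1)]  # plateau after the curve
        total += baseline * multiplier
    return total
```

Because each cohort contributes independently, the same function supports scenario modeling: swap in conservative, moderate, and aggressive ramp curves and compare the totals.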
The honest reality: the first 12-18 months of agent-heavy revenue will be harder to forecast than human SaaS revenue. Build scenario models (conservative: agents ramp slowly, moderate: agents ramp at infrastructure-SaaS norms, aggressive: agents ramp like a viral B2C product) and manage to the conservative case in cash and infrastructure while executing against the aggressive case in sales and product.
Billing Infrastructure: What You Actually Need to Build
Most SaaS companies use Stripe Billing or a similar platform for their billing infrastructure. Stripe works fine for seat-based and simple usage-based billing. It starts to strain under the weight of agent-scale metering. Here's what you actually need to build:
High-throughput metering pipeline. If agents are making 1,000 API calls per second across your customer base, your metering system needs to record, aggregate, and attribute every one of those calls without adding latency to the critical path. This means:
- Async event emission from your API handlers (don't wait for metering to complete before returning a response)
- A high-throughput event queue (Kafka, Kinesis, or Pub/Sub) that buffers events before aggregation
- A real-time aggregation service that rolls up raw events into billing-period totals
- A periodic reconciliation job that writes aggregated totals back to your billing platform (Stripe, custom database)
Doing this wrong — synchronous metering on every API call — will add 50-200ms latency to every request and become a bottleneck at scale. The architecture is not optional.
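The async-emission pattern can be sketched in a few lines. This in-process version stands in for the real architecture (Kafka/Kinesis plus a durable aggregation service); the point it demonstrates is that the handler's call never blocks on aggregation:

```python
import collections
import queue
import threading

# In-memory stand-ins for the event queue and the aggregated totals;
# production would use a durable queue and a persistent store.
events: "queue.Queue[tuple[str, int]]" = queue.Queue()
totals: dict[str, int] = collections.defaultdict(int)

def record_usage(customer_id: str, units: int) -> None:
    """Called from the API handler: enqueue and return immediately,
    keeping metering off the request's critical path."""
    events.put_nowait((customer_id, units))

def aggregator(stop: threading.Event) -> None:
    """Background worker: rolls raw events up into billing-period totals,
    draining any remaining events after shutdown is signaled."""
    while not stop.is_set() or not events.empty():
        try:
            customer_id, units = events.get(timeout=0.1)
        except queue.Empty:
            continue
        totals[customer_id] += units  # periodically flushed to billing
```

The aggregated totals, not the raw events, are what get reconciled to the billing platform, which is also how you stay under per-meter throughput limits downstream.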
Stripe metering objects (if using Stripe). Stripe's Billing Meter API (launched 2024) is purpose-built for high-throughput usage metering. You define a meter, emit events to it, and Stripe aggregates them for billing. It handles the hard parts — deduplication, aggregation, period rollover — but it has throughput limits (around 1,000 events per second per meter, though this is increasing). For very high-volume agent customers, you'll need pre-aggregation before sending to Stripe.
Credit wallet service. If you're building a credit wallet model, you need a transactional credit ledger — a database that records every debit (consumption) and credit (purchase, refund) to a customer's balance with ACID guarantees. This is not a feature Stripe handles natively at the granularity you need. Build this as a separate service:
- Balance table: current balance per customer
- Transaction log: every debit/credit with timestamp, amount, description, reference ID
- Debit endpoint: atomic decrement with guard against negative balance
- Auto-refill trigger: checks balance after each debit, triggers purchase if below threshold
- Balance API: queryable by customers and their agents
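A minimal sketch of the debit endpoint and transaction log, using SQLite for the ACID guarantees. Table names, the `ref` column, and the in-memory database are illustrative choices, not a prescribed schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE balances (customer_id TEXT PRIMARY KEY, balance INTEGER)")
db.execute("""CREATE TABLE ledger (
    customer_id TEXT, amount INTEGER, kind TEXT,
    ref TEXT, ts TEXT DEFAULT CURRENT_TIMESTAMP)""")

def credit(customer_id: str, amount: int, ref: str) -> None:
    with db:  # one transaction: balance update plus ledger entry
        db.execute(
            "INSERT INTO balances VALUES (?, ?) "
            "ON CONFLICT(customer_id) DO UPDATE SET balance = balance + ?",
            (customer_id, amount, amount))
        db.execute("INSERT INTO ledger (customer_id, amount, kind, ref) "
                   "VALUES (?, ?, 'credit', ?)", (customer_id, amount, ref))

def debit(customer_id: str, amount: int, ref: str) -> bool:
    """Atomic decrement guarded against negative balance: the UPDATE
    only fires when the current balance covers the debit."""
    with db:
        cur = db.execute(
            "UPDATE balances SET balance = balance - ? "
            "WHERE customer_id = ? AND balance >= ?",
            (amount, customer_id, amount))
        if cur.rowcount == 0:
            return False  # insufficient funds; nothing is written
        db.execute("INSERT INTO ledger (customer_id, amount, kind, ref) "
                   "VALUES (?, ?, 'debit', ?)", (customer_id, amount, ref))
        return True

credit("cust_a", 100, "purchase_001")
print(debit("cust_a", 30, "task_1"))  # True
print(debit("cust_a", 90, "task_2"))  # False: would go negative
print(db.execute("SELECT balance FROM balances WHERE customer_id='cust_a'")
      .fetchone()[0])                 # 70
```

The guard condition in the `WHERE` clause is the important part: the balance check and the decrement happen in one statement, so two concurrent debits can never race each other into a negative balance.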
Real-time alerts. Customers need to know when their consumption is anomalous. Build threshold-based alerts: notify when 50% of wallet consumed, 80% consumed, 95% consumed. Notify when consumption rate in the last hour is 3x above the 7-day average. These alerts prevent the "my agent ran away and emptied my account" support tickets.
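The fire-once threshold logic is a few lines of code. The 50/80/95 percentages come from the text; the `already_fired` set is an illustrative stand-in for persisted per-customer alert state:

```python
THRESHOLDS = (0.50, 0.80, 0.95)

def check_thresholds(consumed: int, wallet_size: int,
                     already_fired: set) -> list:
    """Return the thresholds newly crossed at this consumption level,
    recording them in `already_fired` so each alert fires only once."""
    fired = []
    if wallet_size <= 0:
        return fired
    frac = consumed / wallet_size
    for t in THRESHOLDS:
        if frac >= t and t not in already_fired:
            already_fired.add(t)
            fired.append(t)  # in a real system: enqueue an email/webhook here
    return fired

state: set = set()
print(check_thresholds(400, 1000, state))  # [] : below 50%
print(check_thresholds(520, 1000, state))  # [0.5]
print(check_thresholds(960, 1000, state))  # [0.8, 0.95]
print(check_thresholds(990, 1000, state))  # [] : already fired
```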
Billing event webhooks. Your billing system should emit webhooks to customers when billing events occur — credit purchase confirmed, credits consumed (daily summary), low balance alert, refill triggered, period invoice generated. Agents can subscribe to these webhooks to self-manage their consumption. This is a significant DX improvement over customers needing to poll your API for billing status.
Metering observability. Build internal dashboards that track consumption patterns across your agent customer base. Alert internally when a single customer's consumption exceeds X% of total platform consumption — a signal that one runaway agent could materially impact your infrastructure costs. Alert when consumption patterns deviate from the customer's historical norm, which can indicate either legitimate scaling (opportunity to upsell) or a misconfigured agent (risk of support cost and customer trust damage).
Abuse Prevention and Cost Floors
Agents introduce abuse vectors that don't exist in human SaaS. A human can scrape data manually for maybe 8 hours a day. An agent can scrape indefinitely. A human can spam your API with invalid requests only for as long as they're watching the screen. An agent can maintain that behavior for weeks. The financial exposure from abusive or misconfigured agent callers is real and requires explicit defensive architecture.
Cost floors and minimum viable billing. Every agent API key should have a minimum monthly spend or minimum credit consumption requirement before it can operate at meaningful throughput. Free tier agents running at 10 requests per minute with 1,000 credits per month are low-risk. The moment an agent starts hitting high-throughput endpoints, it should be on a paid plan that covers your cost of serving it.
Anomaly detection. Track per-customer consumption baselines. When a customer's hourly consumption exceeds 5x their 30-day average, flag the account and optionally throttle or require confirmation before continuing. This protects customers from their own misconfigured agents as much as it protects you.
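The 5x-over-baseline rule translates directly to code. This is an illustrative sketch, with the 30-day baseline passed in as a list of hourly totals rather than read from a metrics store:

```python
def is_anomalous(hourly_consumption: float,
                 baseline_hourly: list[float],
                 multiplier: float = 5.0) -> bool:
    """Flag when the last hour's consumption exceeds `multiplier` times
    the customer's historical hourly average (the 5x rule from the text)."""
    if not baseline_hourly:
        return False  # no baseline yet; don't flag brand-new customers
    baseline = sum(baseline_hourly) / len(baseline_hourly)
    return hourly_consumption > multiplier * baseline

history = [100, 120, 90, 110]      # illustrative hourly totals; average = 105
print(is_anomalous(300, history))  # False: under the 5x baseline of 525
print(is_anomalous(600, history))  # True: runaway-agent territory
```

On a `True` result, the defensible default is to throttle and notify rather than hard-block, since the spike may be legitimate scaling.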
Spend caps. Allow customers to set hard spending caps — a maximum dollar amount (or credit amount) that can be consumed in a billing period. When the cap is reached, consumption stops until the customer manually increases the cap or the period resets. Some customers will resist this as a limitation; others — especially those deploying agents they don't fully control — will actively want it.
Authentication hygiene. Agents often have long-lived API keys that get embedded in configuration files, environment variables, and repositories. Leaked keys can result in unauthorized agents consuming your platform at the customer's expense. Implement key rotation reminders, key-last-used tracking, and notifications for access from new IP ranges or regions. This is a security feature that also protects your billing integrity.
Rate limit fingerprinting. Some bad-faith callers attempt to circumvent rate limits by rotating API keys, rotating IP addresses, or using distributed agent networks. Fingerprint consumption patterns (user-agent strings, request patterns, timing distributions) to detect distributed abuse and apply rate limits at the fingerprint level, not just the key level.
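Fingerprint-level limiting can be sketched as an ordinary sliding-window limiter keyed by a hash of coarse request traits instead of the API key. The specific traits hashed here, and the window and cap values, are illustrative:

```python
import hashlib
import time
from collections import defaultdict, deque

WINDOW_S = 60
MAX_REQUESTS = 100

def fingerprint(user_agent: str, key_prefix: str, timing_bucket: int) -> str:
    """Hash coarse behavioral traits so rotated keys or IPs with the
    same signature share one rate-limit bucket."""
    raw = f"{user_agent}|{key_prefix}|{timing_bucket}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

windows: dict[str, deque] = defaultdict(deque)

def allow(fp: str, now: float) -> bool:
    """Sliding-window limit applied at the fingerprint level."""
    w = windows[fp]
    while w and now - w[0] > WINDOW_S:
        w.popleft()          # expire timestamps outside the window
    if len(w) >= MAX_REQUESTS:
        return False
    w.append(now)
    return True

fp = fingerprint("agent-sdk/2.1", "sk_live_ab", timing_bucket=3)
results = [allow(fp, t * 0.1) for t in range(150)]  # 150 calls in 15 seconds
print(results.count(True))  # 100: the remaining 50 are throttled
```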
Case Studies: Companies Already Pricing for Agents
Anthropic's API Pricing
Anthropic publishes straightforward token-based pricing for its Claude API. Claude Sonnet 4 is priced at $3 per million input tokens and $15 per million output tokens. This is pure consumption metering — no seats, no tiers by user type, just tokens consumed. For context, a typical customer support conversation is roughly 2,000-5,000 tokens, meaning Anthropic charges approximately $0.006-$0.075 per resolved conversation at the API level.
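A quick sanity check of those per-conversation figures, using the per-million-token rates from the text (the token splits are illustrative):

```python
INPUT_PER_M = 3.00    # $ per million input tokens (from the text)
OUTPUT_PER_M = 15.00  # $ per million output tokens

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# A 2,000-token all-input exchange and a 5,000-token all-output one
# bracket the $0.006-$0.075 range quoted in the text.
print(round(conversation_cost(2000, 0), 4))    # 0.006
print(round(conversation_cost(0, 5000), 4))    # 0.075
print(round(conversation_cost(3000, 1000), 4)) # 0.024, a more typical mix
```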
The interesting design choice: Anthropic's pricing makes no distinction between human and agent callers. An agent making 100,000 API calls per hour pays exactly the same per-token rate as a human developer making 10 calls per hour. There are no agent tiers, no priority queues, no agent-specific features in the base API. This is appropriate for a foundational model API, but companies building higher-level products on top of it need to add the agent-specific billing layer themselves.
Salesforce Agentforce
Agentforce launched in late 2024 at $2 per conversation. A conversation is defined as a single interaction between a customer and the AI agent, up to a generous context-length cap. This is task-based billing at the product layer — the underlying Salesforce infrastructure costs are abstracted, and customers pay for the outcome unit (a handled conversation) rather than the infrastructure unit (tokens, API calls).
The $2 price point was chosen carefully. The fully-loaded cost of a human support interaction (fully-burdened labor, training, tools) is typically $8-$15. Agentforce at $2 per conversation positions as roughly 75-87% cheaper than human cost, which is an easy ROI calculation for any enterprise customer with a large support organization.
By Q1 2026, Salesforce had disclosed over $100 million in Agentforce ARR, with an average agent customer running thousands of conversations per month. The task-based billing model is working at scale.
Stripe's Agent Toolkit
Stripe launched its Agent Toolkit (a set of tools for AI agents to interact with Stripe APIs) with explicit rate limits and metering for programmatic callers. Agent API keys are issued separately from human API keys, have dedicated rate limit buckets (10,000 requests per minute vs. 1,000 for human keys), and Stripe is building usage-metered billing specifically for agent-scale consumption.
Stripe's design philosophy — instrument everything, charge for consumption, make the billing transparent and auditable — is the right model for any infrastructure-adjacent product serving agent customers.
Zapier's AI Task Pricing
Zapier introduced AI task pricing at $0.01-$0.05 per AI task, layered on top of their existing Zap-based pricing. This created a de facto agent tier: Zap execution (human-configured automation) is included in the subscription, AI task execution (agent-driven reasoning and action) is metered separately.
The differentiation is telling. Zapier recognized that AI agent tasks are fundamentally different from rule-based automations — they're more compute-intensive, more variable, and more valuable. Pricing them separately is both economically rational and pedagogically useful: it teaches customers to think about agent consumption as a distinct category from workflow automation.
Intercom's Fin Per-Resolution Pricing
Intercom's AI agent, Fin, uses per-resolution pricing at $0.99 per resolved conversation. Resolution is defined rigorously: the conversation closed without human handoff, the customer confirmed satisfaction or left without further contact within 24 hours. This quality gate is critical — customers pay only for outcomes, not attempts.
Intercom has reported that Fin resolves over 50% of conversations without human intervention, with CSAT scores comparable to human agents. At $0.99 per resolution vs. the true cost of a human agent interaction, the value proposition is compelling and the billing model is trusted because the definition of "resolved" is clear and verifiable.
How to Transition Existing Human Customers to Agent-Compatible Plans
If you have existing human customers on seat-based or simple subscription plans, and you're adding agent-compatible pricing, you face a transition problem. Your best customers — the ones who will deploy agents most aggressively — are exactly the customers who will blow through their existing plan caps first. Handle this transition wrong and you'll create churn precisely among your most engaged customers.
Phase 1: Identify agent usage before it happens. Before you announce agent pricing, instrument your API to identify which customers are already exhibiting agent-like patterns — high request volumes, programmatic user agents, API key usage without corresponding UI session activity. These customers are your first cohort for the agent pricing conversation.
Phase 2: Add agent plans alongside existing plans, not replacing them. Don't remove the plans your human customers are on. Add new agent-compatible plans as an option and let customers self-select. This avoids the "forced migration" narrative that creates customer anger, even when the new plan is objectively better for their use case.
Phase 3: Grandfather early adopters. Your first agent customers are taking a risk on a new pricing model. Honor that with a grandfathering commitment — they get their initial agent plan pricing for life, or for at least 24 months. This builds the trust that makes case studies possible and creates vocal advocates for the new model.
Phase 4: Set a sunset date for incompatible legacy plans. You can't maintain legacy seat plans forever alongside agent plans. Set a 12-18 month timeline for legacy plan sunset and communicate it clearly. Give customers the tools, migration guides, and support to transition. Offer migration incentives — the first three months on an agent plan at legacy pricing, for example.
The pricing framing that works. When presenting agent plans to existing customers, frame it as capability expansion, not cost increase. "Your current plan was designed for your team members. We've built new plans designed for your AI agents. Here's what changes." This frames the new pricing as a product upgrade, not a price hike — even if the actual cost for an agent-heavy customer will increase.
This connects to the broader theme in why seat-based pricing has broken down — the transition away from seats is inevitable, and the companies that manage it proactively and transparently will retain more customers than those who wait until customers hit walls and start asking hard questions.
Frequently Asked Questions
Q: My SaaS is small — do I need to think about agent pricing now, or is this a future problem?
If you have any API surface and any customers deploying automation, it's a now problem. You don't need to build the full infrastructure immediately, but you should start tracking which customers exhibit agent-like consumption patterns and ensure your current pricing doesn't create perverse incentives (like agents blowing past all your plan limits and you either losing money or creating bad customer experiences). Start with a simple "Agent Add-On" — additional API capacity at a defined per-call rate, sold as a line item add-on to any plan. That buys you time to build the full model.
Q: How do I handle multi-agent pipelines where one customer runs dozens of coordinated agents?
Define the billing entity at the API key level, not the agent level. One API key = one billing account, regardless of how many agents use it. This simplifies attribution significantly. If a customer needs to break out agent-specific consumption for internal chargeback purposes, offer sub-key metering (API keys that inherit billing from a parent account but have separate consumption reports) as an enterprise feature.
Q: What's the right price for a credit?
Work backwards from cost. Calculate your fully-loaded cost per unit of consumption (per API call, per token, per workflow step). Gross it up to your target margin: at a typical 70-80% SaaS gross margin, that means pricing at roughly 3-5x unit cost, since price = cost / (1 - margin). That's your floor. Calculate the value delivered to the customer per unit of consumption (what would it cost them to do this without your product?). That's your ceiling. Set your price in the lower third of the range between floor and ceiling. Start lower than you think you should — you can always raise credit prices with sufficient notice, but lowering them is hard.
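The floor/ceiling arithmetic, sketched with illustrative numbers (the $0.002 unit cost, 75% target margin, and $0.10 of delivered value per credit are assumptions for the example, not recommendations):

```python
def credit_price_range(unit_cost: float, target_margin: float,
                       customer_value: float) -> tuple[float, float, float]:
    """Floor = cost grossed up to the target margin; ceiling = value
    delivered per credit; suggested price sits in the lower third."""
    floor = unit_cost / (1 - target_margin)  # 75% margin means 4x cost
    ceiling = customer_value
    suggested = floor + (ceiling - floor) / 3
    return floor, ceiling, suggested

floor, ceiling, suggested = credit_price_range(0.002, 0.75, 0.10)
print(round(floor, 4), round(ceiling, 2), round(suggested, 4))
# 0.008 0.1 0.0387
```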
Q: Should I offer unlimited plans for agents?
No. Never. Unlimited plans have no place in agent pricing. The whole point of agents is that they can and will consume unlimited resources if given the opportunity. An "unlimited" agent plan is a financial liability. If customers push back on credit or consumption limits, sell them larger buckets at better rates — but always with a cap.
Q: How do I handle an agent that goes rogue and empties a customer's credit wallet maliciously or accidentally?
Two defenses: hard spend caps (the customer sets a maximum, you enforce it mechanically) and anomaly detection alerts (when consumption rate spikes, pause and notify before the wallet is empty). For customers who are victims of their own misconfigured agents, a refund policy for demonstrable errors (agent clearly looping, agent calling wrong endpoint, documented misconfiguration) builds trust and is worth the cost. Make this a published policy, not a case-by-case exception.
Q: How should I think about pricing for the Model Context Protocol (MCP) specifically?
MCP standardizes how agents connect to external tools and data sources, which means your product may be called via MCP from agents you didn't anticipate and didn't price for. The MCP integration question is partly a pricing question: when an MCP client calls your tool, it should authenticate with a valid API key that has appropriate agent-tier permissions and credit balance. Treat MCP calls identically to direct API calls from a billing perspective — same metering, same rate limits, same credit consumption. Don't create separate MCP pricing; it's unnecessary complexity.
Q: What happens to my existing per-seat pricing for human customers — does agent pricing replace it?
No. Maintain seat pricing for human users and add agent pricing as a separate track. Many of your customers will have both human users (seats) and agents (agent plans) running simultaneously. The billing should be additive: seats for the humans, credits or agent tier for the agents. Tying them together into a single "AI-enhanced seat" creates pricing confusion and makes it harder for customers to understand what they're paying for.
Q: How do other AI agent companies handle refunds for failed tasks?
The consensus is: don't charge for clearly failed tasks (errors, timeouts, explicit failures where the system reports non-completion). Do charge for completed tasks even if the customer later argues the output quality wasn't what they expected — that's a product quality issue, not a billing issue. Define your quality floor explicitly in your terms, and build an audit trail (logs, output records, completion events) that lets you verify whether a disputed task actually completed. Zendesk, Intercom, and Salesforce all maintain detailed completion event logs for exactly this reason.
The window for first-mover advantage in agent billing design is open, but it won't stay open long. The companies shipping agent-compatible pricing models now — credit wallets, task-based billing, agent tiers, machine-readable APIs — are building infrastructure that becomes a competitive moat. The companies waiting for the industry to standardize will inherit whatever their competitors built.
Your AI agent customers are already there. They're hitting your rate limits, exhausting your per-seat plans, and emailing your support team with questions your pricing page doesn't answer. The question isn't whether to price for agents — it's how fast you can build the model that works.
Start with the simplest viable version: an agent API key type, a credit wallet with a clear exchange rate, and a basic rate limit tier above your human user limits. Ship it. Learn from the first 10 agent customers. Iterate. The infrastructure you're building now will serve you for the next decade of SaaS, in a world where most of your customers are machines.
Related reading: