Apple rebuilds Siri from scratch with LLM power in iOS 26.4
Apple ships a fully rebuilt Siri in iOS 26.4, replacing its 13-year-old intent engine with large language models for real conversational AI.
TL;DR: Apple is shipping a completely rebuilt Siri in iOS 26.4, replacing the 13-year-old intent-classification architecture with large language models. The new assistant handles multi-step tasks, holds conversational context across turns, and plugs into third-party apps via the App Intents framework. This is Apple's biggest platform bet since it announced Apple Intelligence at WWDC 2024.
The original Siri launched in 2011 as a voice-first interface to a narrow set of phone functions. Apple acquired it from SRI International and shipped it as a centerpiece of the iPhone 4S launch. For years it worked well enough: set a timer, send a text, call mom.
But the architecture underneath never scaled gracefully. Siri ran on a hand-crafted intent-classification pipeline, a system that tried to match what you said against a finite list of pre-defined actions. If your request didn't fit a known pattern, it failed. The system had no model of conversational context, no capacity for follow-up, and no way to reason across information from multiple sources at once.
By 2024 that looked catastrophically inadequate. Google Assistant had been rebuilt around Gemini. Samsung shipped Galaxy AI on every flagship. OpenAI's ChatGPT voice mode started conversations that felt genuinely natural. Siri, by contrast, still stumbled on "remind me to call John when I get home," flattening a location-triggered reminder into a plain calendar entry.
Bloomberg's Mark Gurman, who has tracked Apple's AI struggles closely, reported in late 2024 that Apple's AI teams were under intense pressure and that the Siri LLM project had become a top executive priority. The decision to throw out the old codebase and start fresh was not taken lightly. But the gap had grown too wide to patch.
iOS 26.4 represents the payoff. The old intent graph is gone. In its place is a model-driven approach where Siri reads your intent, constructs a plan, and executes it step by step.
The core change is architectural. Old Siri matched utterances to intents using a combination of natural language processing rules and statistical classifiers trained on labeled data. When you said "play jazz," the system looked up a "play music" intent, extracted the genre slot, and fired off a MediaPlayer API call.
New Siri treats requests as prompts to a large language model. The model reads the full input, considers available context (location, current app, recent messages, calendar state), and generates a structured action plan. That plan is then executed via a task-dispatch layer that calls native iOS APIs or App Intents exposed by third-party developers.
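This plan-then-dispatch flow can be sketched as data: the model emits a structured plan, and a dispatch layer walks it step by step. Apple has not published the real plan schema, so the `ActionStep`, `ActionPlan`, and `execute` names below are illustrative, not Apple API:

```swift
import Foundation

// Hypothetical shape of a model-generated action plan.
// Apple has not published the actual schema; this is illustrative only.
struct ActionStep {
    let tool: String                  // a native API or a declared App Intent
    let arguments: [String: String]
}

struct ActionPlan {
    let steps: [ActionStep]
}

// A toy dispatch layer: executes each step in order and records a trace.
func execute(_ plan: ActionPlan) -> [String] {
    var trace: [String] = []
    for step in plan.steps {
        let args = step.arguments
            .sorted { $0.key < $1.key }
            .map { "\($0.key)=\($0.value)" }
            .joined(separator: ", ")
        trace.append("call \(step.tool)(\(args))")
    }
    return trace
}

// "Play jazz" becomes a one-step plan rather than a hand-matched intent slot.
let plan = ActionPlan(steps: [
    ActionStep(tool: "MediaPlayer.play", arguments: ["genre": "jazz"])
])
let trace = execute(plan)
```

The point of the sketch is the inversion: the model produces the plan, and the dispatch layer is dumb plumbing, which is the opposite of the old architecture where the matching logic carried all the intelligence.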
The model running on-device is not the same as what powers ChatGPT. Apple has trained its own foundation models optimized for the Apple Neural Engine. The on-device variant handles most everyday requests: composing messages, adjusting settings, answering questions from personal data in Apple apps. Complex queries, especially those requiring broader world knowledge, get routed to Apple's Private Cloud Compute infrastructure, which runs on Apple Silicon servers.
This bifurcation is intentional. Apple's privacy argument is that data processed on your device never leaves it. Even when queries escalate to Private Cloud Compute, Apple's published technical documentation claims those servers process requests ephemerally, with no persistent logging and no ability for Apple employees to inspect the data.
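The routing decision behind this bifurcation can be sketched as a simple heuristic. Apple has not published its escalation criteria, so the signals and threshold below (`requiresWorldKnowledge`, a token budget) are assumptions for illustration only:

```swift
import Foundation

enum ComputeTier { case onDevice, privateCloudCompute }

// Hypothetical routing heuristic. Apple's actual escalation logic is
// unpublished; this sketch assumes a crude capability check.
func route(requiresWorldKnowledge: Bool, estimatedTokens: Int) -> ComputeTier {
    // Everyday requests stay on the Apple Neural Engine.
    if !requiresWorldKnowledge && estimatedTokens < 2_000 {
        return .onDevice
    }
    // Broader reasoning escalates to Apple Silicon servers.
    return .privateCloudCompute
}

let settingsTweak = route(requiresWorldKnowledge: false, estimatedTokens: 40)
let essayQuestion = route(requiresWorldKnowledge: true, estimatedTokens: 40)
```

Whatever the real criteria are, the privacy argument depends on this branch: anything that takes the first path never leaves the device.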
The model itself was trained in-house. Apple has quietly built one of the larger ML training operations in the industry, with significant investment in JAX-based training infrastructure and custom model architectures designed to fit the memory and power budgets of iPhone-class devices.
This is where the difference becomes tangible. Ask old Siri to "find a good sushi restaurant near my hotel and add it to my trip itinerary in Notes." You get silence, an error, or a web search. The request spans location awareness, restaurant discovery, and a write operation to a notes document. Old Siri cannot chain those steps.
New Siri can. The model breaks the request into a sequence: (1) identify the user's current or upcoming hotel from Calendar, (2) run a Maps search filtered by cuisine and proximity, (3) pick a result based on rating signals, (4) open the relevant Notes document, and (5) append the restaurant detail. Each step is a tool call. The model manages the chain.
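The chain above can be sketched as a pipeline of tool calls, each consuming the previous step's output. The tool names and canned results below are illustrative stand-ins, not Apple API:

```swift
import Foundation

// Each tool takes the prior step's output and returns its own result.
typealias Tool = (String) -> String

// Canned stand-ins for the real steps (Calendar lookup, Maps search,
// ranking, Notes write). Real tools would call iOS APIs or App Intents.
let findHotel: Tool    = { _ in "Hotel Okura" }
let searchMaps: Tool   = { hotel in "Sushi Saito near \(hotel)" }
let pickByRating: Tool = { results in results }   // toy: keep the top result
let appendToNote: Tool = { detail in "Appended: \(detail)" }

// The model manages the chain; here it's a simple left-to-right pipeline.
func runChain(_ tools: [Tool], input: String) -> String {
    tools.reduce(input) { acc, tool in tool(acc) }
}

let result = runChain(
    [findHotel, searchMaps, pickByRating, appendToNote],
    input: "sushi near my hotel"
)
```

A real execution is messier than a linear fold — steps can fail, branch, or need user confirmation — but the structural idea is the same: output of one tool call feeds the next.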
Context retention across conversation turns is equally important. If you ask "what's the weather this weekend" and then say "and what about next weekend in Paris," Siri now understands "what" refers to weather and "Paris" overrides the location context from the prior turn. This sounds trivial. It has not worked reliably in Siri for 13 years.
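One way to picture this carry-over: each turn contributes only the slots it mentions, and new values override carried ones. Apple has not documented Siri's internal context representation, so the slot names below are illustrative:

```swift
import Foundation

// Sketch of per-turn context merging: the new turn supplies only the
// slots it mentions, and those override the carried-over values.
// Slot names ("topic", "location", "timeframe") are hypothetical.
func mergeTurn(context: [String: String], turn: [String: String]) -> [String: String] {
    context.merging(turn) { _, new in new }
}

var context: [String: String] = [:]

// Turn 1: "what's the weather this weekend"
context = mergeTurn(context: context,
                    turn: ["topic": "weather", "timeframe": "this weekend"])

// Turn 2: "and what about next weekend in Paris" — the topic carries
// over; the timeframe is overridden and a location is added.
context = mergeTurn(context: context,
                    turn: ["timeframe": "next weekend", "location": "Paris"])
```

After the second turn, the context still knows the topic is weather even though the user never repeated the word.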
9to5Mac coverage from early developer testing showed that the context window holds across a typical interaction session of several minutes. Longer gaps, say leaving Siri idle for an hour, reset the context. That is a reasonable tradeoff given the memory constraints of running a conversational model continuously in the background.
One of the most significant shifts for the iOS developer community is how new Siri connects to third-party apps. The mechanism is the App Intents framework, which Apple introduced in iOS 16 and has been expanding ever since.
App Intents lets developers declare structured actions their app can perform: "book a ride," "add item to cart," "start a workout." New Siri uses the LLM to map user language onto these declared intents, which means developers do not need to anticipate every possible phrasing. If your app declares a "create invoice" intent, Siri can invoke it even if the user says "bill that client for the hours I logged."
The quality of integration depends entirely on how well a developer has described their intents. Apple's WWDC sessions on App Intents have stressed writing descriptive intent summaries, parameter labels, and result types. Apps that have invested in this work get a dramatically more capable Siri integration. Apps that haven't declared any intents remain invisible to the new system, just as they were to the old one.
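In the real framework an intent conforms to the `AppIntent` protocol and marks its inputs with `@Parameter`; the dependency-free sketch below mirrors only that shape, plus a toy keyword matcher standing in for the LLM mapping step (a real model matches semantically, not by keyword overlap):

```swift
import Foundation

// Dependency-free stand-in for an App Intents declaration. The real
// framework uses the `AppIntent` protocol with `@Parameter` properties;
// this simplified shape just shows what the model maps language onto.
struct DeclaredIntent {
    let identifier: String
    let summary: String      // the descriptive text the model reads
    let parameters: [String]
}

let createInvoice = DeclaredIntent(
    identifier: "CreateInvoice",
    summary: "Creates an invoice for a client from logged hours.",
    parameters: ["client", "hours"]
)

// Toy stand-in for the LLM mapping step: score declared intents against
// the user's phrasing by word overlap with the intent summary.
func score(_ intent: DeclaredIntent, _ utterance: String) -> Int {
    let summaryWords = Set(intent.summary.lowercased().split(separator: " ").map(String.init))
    let spokenWords  = Set(utterance.lowercased().split(separator: " ").map(String.init))
    return summaryWords.intersection(spokenWords).count
}

func bestIntent(for utterance: String, among intents: [DeclaredIntent]) -> DeclaredIntent? {
    intents.max { score($0, utterance) < score($1, utterance) }
}

let match = bestIntent(
    for: "bill that client for the hours I logged",
    among: [createInvoice]
)
```

This is also why the intent summary matters so much: in both the toy matcher and the real system, the descriptive text is the surface the model matches against.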
Apple has also exposed Siri's new conversational capabilities to third-party apps through the Siri Intent UI framework, which lets apps render custom result views inside the Siri interface. A travel app, for example, can display an interactive booking card rather than a text response.
Apple's privacy positioning hinges on where computation happens. The on-device model runs locally on the Apple Neural Engine. For an iPhone 15 Pro or later, this is a purpose-built matrix multiplication unit with enough throughput to run a model in the 3-7 billion parameter range at reasonable latency.
When a request exceeds the on-device model's capability, it routes to Private Cloud Compute. These are Apple Silicon-based servers, reportedly running M-series chips at Apple's data centers, that handle more complex inference tasks. Apple's security documentation for Private Cloud Compute describes a design where each request is processed in a stateless container with no persistent memory and no network egress path to external services.
Independent verification of these privacy claims is difficult. Apple has invited security researchers to audit the system, but the audit scope is limited. That said, the on-device-first architecture is meaningfully different from Google's approach, where most Assistant queries leave the device. For users who prioritize privacy, new Siri's architecture is credibly stronger.
The practical implication is latency. On-device requests return faster. Private Cloud Compute adds a round-trip to Apple's servers, which introduces 200-600ms of additional latency depending on network conditions. For simple requests this is imperceptible. For complex multi-step tasks the user typically waits longer regardless, so the extra cloud latency is less noticeable.
The scale of Apple's AI investment has come into focus through a combination of job postings, supplier reports, and analyst estimates. Apple has not published a number, but multiple analysts estimate the company has allocated over $5 billion to AI infrastructure build-out across hardware procurement, data center expansion, and model training capacity.
This is consistent with Apple's capital expenditure patterns. The company announced in early 2025 that it would spend $500 billion in the United States over five years, with a significant portion directed at AI and advanced manufacturing. Apple has also been hiring aggressively: roles in machine learning research, model training infrastructure, and on-device inference optimization have grown substantially in Apple's job listings over the past two years.
The Private Cloud Compute infrastructure represents a particularly large spend. Apple Silicon server chips are expensive to produce, and building out the capacity to handle billions of Siri queries per day at acceptable latency requires a substantial server footprint. Apple's approach of using its own silicon rather than NVIDIA GPUs or cloud provider infrastructure is both a cost decision and a vertical integration strategy.
Whether this investment produces returns depends on whether developers and users actually adopt the new capabilities. Apple is betting that tight hardware-software integration, combined with a credible privacy story, will give it an edge that pure-cloud AI assistants cannot match.
The voice assistant market has consolidated around a few credible players. Here is how new Siri compares on the dimensions that matter most.
| Feature | New Siri (iOS 26.4) | Google Gemini Assistant | Samsung Galaxy AI | ChatGPT Voice Mode |
|---|---|---|---|---|
| LLM-based responses | ✓ | ✓ | ✓ | ✓ |
| On-device processing | ✓ | Partial | Partial | ✗ |
| Third-party app actions | ✓ (App Intents) | ✓ (Google plugins) | Limited | ✗ |
| Multi-step task chaining | ✓ | ✓ | Partial | ✓ |
| Conversational context | ✓ | ✓ | ✓ | ✓ |
| Privacy: no cloud logging | ✓ (Apple PCC) | ✗ | ✗ | ✗ |
| Cross-platform availability | ✗ (Apple only) | ✓ | ✗ (Samsung only) | ✓ |
| Voice naturalness | Improving | Strong | Moderate | Strong |
| Proactive suggestions | Limited | ✓ | Partial | ✗ |
Apple's clearest differentiator is privacy combined with deep OS integration. No other assistant can read your local messages, calendar, notes, and photos on-device without sending that data to a cloud provider. That matters to a specific segment of users, particularly in enterprise and regulated industries.
Where Apple trails is proactive intelligence. Google Gemini can surface relevant information before you ask. New Siri is still primarily reactive: you invoke it, it responds. Truly ambient AI assistance, where the assistant monitors context and surfaces suggestions unprompted, remains a Google strength.
The numbers are not flattering. Siri held roughly 35% of voice assistant usage in the United States in 2022, according to estimates from research firms tracking smart speaker and mobile assistant interactions. By early 2026, multiple estimates place Siri under 25%, while Google Assistant and ChatGPT-based interfaces have grown.
The decline reflects a few forces. First, Google has been more aggressive about embedding Gemini across Android and its web surfaces. Second, the ChatGPT app's voice mode attracted significant usage from users who wanted a more capable conversational experience. Third, the smart speaker category, where Siri powers HomePod, has remained a niche relative to Amazon Echo and Google Nest.
Apple's response to this has been two-pronged. The iOS 26.4 Siri rebuild addresses capability. Separately, Apple has reportedly explored deeper integrations with outside AI providers, building on the ChatGPT integration that shipped in iOS 18, which lets users optionally route requests to OpenAI's models.
The market share numbers are worth watching closely through the second half of 2026. If the rebuilt Siri genuinely handles the tasks users complained about, the install base advantage kicks in hard. Over 1.2 billion active iPhones means any meaningful improvement in quality has enormous reach.
If you build iOS apps, the rebuilt Siri creates both an opportunity and a risk. The opportunity: if your app does useful things, declaring them as App Intents gives Siri a way to invoke them in response to natural language. Users who would never navigate to your app's interface might trigger its core functionality through Siri without opening the app at all.
The risk: if your competitor's app has well-declared App Intents and yours doesn't, Siri will route users to them. Voice and AI-surface discoverability will become a real distribution channel, similar to how App Store search optimization emerged as a discipline.
Apple's developer documentation on App Intents is comprehensive. The key investment areas are: writing clear, descriptive intent summaries, declaring parameter types precisely so the model can extract them from natural language, and handling result types gracefully when Siri renders them inline.
For apps in categories like productivity, travel, finance, and health, the near-term action is an App Intents audit: what can users do in your app, and have you declared all of it? WWDC 2026 will almost certainly bring additional App Intents capabilities, so this is also a foundation to build on.
Several things could go wrong.
Accuracy and hallucination are the obvious concerns. LLM-based systems can confidently execute the wrong action. If Siri misunderstands a multi-step task and books the wrong flight or deletes the wrong message, user trust collapses quickly. Apple will need strong confirmation patterns for destructive or irreversible actions.
Privacy claims need ongoing scrutiny. Apple's Private Cloud Compute architecture is well-documented and appears thoughtfully designed, but the security community has only begun to audit it. Any breach or discovered logging behavior would be a significant reputational problem given how prominently Apple markets privacy.
App Intents adoption will be uneven. The most popular apps will invest quickly. The long tail of smaller apps will take time. Until the intent catalog is broad, the new Siri will be powerful for a subset of tasks and frustrating for the rest.
Battery and performance impact is not yet clear from early reports. Running a larger on-device model continuously in the background has implications for iPhone battery life. Apple has likely optimized for this carefully, but real-world impact will only become clear after wide deployment.
Finally, user habits are sticky. Many iPhone owners have learned not to bother with Siri for complex requests. Rebuilding that expectation and getting users to actually try the new capabilities is a marketing and UX challenge, not just a technology one.
Apple is replacing Siri's 13-year-old intent-classification engine with a large language model architecture. The new system can handle multi-step requests, maintain conversational context across turns, and invoke third-party app actions via App Intents. This is a ground-up rebuild, not an incremental update.
Full on-device LLM capabilities require Apple Neural Engine hardware from iPhone 15 Pro and later. Older devices may receive a subset of features that rely more heavily on server-side processing. Apple has not published a definitive compatibility matrix as of March 2026.
Simple requests are processed entirely on-device. Complex queries that require broader reasoning are routed to Apple's Private Cloud Compute infrastructure, which Apple describes as stateless and non-logging. Apple's security documentation provides technical details on this architecture.
Developers declare structured actions in their apps using the App Intents framework. Siri's LLM maps natural language requests to these declared actions, meaning users can invoke third-party app features without knowing the exact command syntax. Apps without declared intents remain inaccessible to Siri.
New Siri and Google Gemini both use LLMs and can handle multi-step tasks. Google Gemini has stronger proactive capabilities and broader cross-platform reach. New Siri's advantages are on-device privacy processing and deeper iOS system integration, including access to local data that Google cannot read without cloud transmission.
The ChatGPT integration from iOS 18 remains as an optional fallback for users who want to route complex queries to OpenAI. The new Siri LLM is Apple's own model. Users can still choose to forward requests to ChatGPT, but Apple's native model is now the default first responder.
Apple has not confirmed a specific date. Bloomberg's reporting suggested a March 2026 timeframe, consistent with Apple's pattern of shipping point releases in the first quarter. Developers with access to the beta have been testing builds since early 2026.
LLM-based speech understanding generally handles accent variation better than narrow intent classifiers trained on limited speaker data. Early reports from developers testing international builds suggest improvement, but Apple has not published data on accent performance specifically.
Only apps that have implemented App Intents work with the new Siri. The framework is opt-in for developers. Apple provides default intents for its own apps (Mail, Messages, Calendar, Notes, Maps, etc.), so those work out of the box. Third-party app coverage will grow as developers update their apps.
Private Cloud Compute is Apple's server infrastructure for handling AI queries that exceed on-device capabilities. Apple says requests are processed ephemerally with no persistent logging and no ability for employees to access query data. The infrastructure runs on Apple Silicon hardware in Apple-controlled data centers.
Siri held roughly 35% of U.S. voice assistant usage in 2022 and has since fallen to under 25%, according to multiple analyst estimates. The causes include Google's aggressive Gemini integration, the rise of ChatGPT voice mode, and Siri's known limitations with complex or conversational requests. The iOS 26.4 rebuild is Apple's answer to this trend.
The on-device model handles many requests without a network connection. Tasks requiring real-time data (weather, web search, restaurant recommendations) still need connectivity. Core productivity tasks like composing messages, setting reminders, and controlling device settings can work offline.
Apple has not published full training details. The company has described using data from opt-in Siri feedback programs, synthetic data generation, and licensed datasets. Apple trains on its own infrastructure using custom ML frameworks. The models are then quantized and optimized for deployment on the Apple Neural Engine.
Apple has not announced additional AI provider integrations. The existing ChatGPT partnership was notable because it was the first time Apple formally embedded an external AI model into iOS. Whether Apple pursues similar arrangements with Anthropic, Google, or others is an open question.
Apple's developer tools include an App Intents simulator that lets developers test how Siri interprets their declared intents before submitting to the App Store. The Xcode debugger also includes Siri intent logging to help developers diagnose cases where the model misroutes a request.
Apple has not published battery impact data. Running a larger on-device model requires more compute, but the Apple Neural Engine is power-efficient relative to the CPU or GPU. Real-world battery data will emerge after wide public deployment. Beta testers have not reported dramatic regressions, but systematic testing is limited.
Apple Intelligence is the umbrella brand for Apple's AI features, which includes image generation, writing tools, and smart replies. New Siri is the conversational AI component of Apple Intelligence. They share the same underlying model infrastructure but serve different use cases.
Health data accessed through HealthKit stays on-device. Siri can answer questions about your health data using the on-device model without transmitting that data to any server. This is a specific area where Apple's on-device-first architecture provides a concrete privacy benefit over cloud-native alternatives.
The same LLM architecture is expected to roll out to HomePod and Apple Watch over 2026, though the on-device model tier will differ based on hardware capability. HomePod with its limited compute will likely rely more on Private Cloud Compute. Apple Watch Series 10 and Ultra 2 have Apple Silicon chips capable of running smaller on-device models.
Mark Gurman's Power On newsletter at Bloomberg covers Apple AI developments closely. 9to5Mac provides detailed iOS beta teardowns. Apple's own developer documentation is the authoritative technical reference for App Intents and Siri capabilities.