Apple replacing Core ML with Core AI at WWDC 2026 changes everything for iOS 27
Apple is replacing Core ML with a new Core AI framework at WWDC 2026. Third-party model support, Gemini integration, and what iOS 27 developers need to know.
TL;DR: Apple is set to introduce Core AI at WWDC 2026 as a full replacement for Core ML, the machine learning framework it has shipped since 2017. The new framework brings third-party model support, a cleaner Swift-native API, and a structured split between on-device inference and Private Cloud Compute. Google Gemini is expected to run inside the Core AI stack via PCC. For developers who have built on Core ML, a migration path exists -- but the architecture underneath is fundamentally different.
Core ML launched in 2017 as Apple's answer to a simple problem: how do you run a trained neural network on an iPhone without requiring developers to write low-level GPU kernels? It worked. For nine years, Core ML handled image classification, natural language processing, and sound recognition on hundreds of millions of Apple devices. You could drop a .mlpackage file into Xcode, call a few lines of Swift, and have a model running on the Neural Engine with surprisingly little friction.
The problem is not that Core ML failed. The problem is that the world it was designed for no longer exists.
When Core ML shipped, the state of the art was a 200-million-parameter model that needed heavy quantization just to fit in 2GB of RAM. Today, Apple ships devices with 24GB of unified memory, a 38-trillion-operations-per-second Neural Engine, and a user population that now expects conversational AI, real-time image synthesis, and multi-step reasoning to happen on the device in their pocket. Core ML's original design assumptions -- fixed model files, controlled model formats, and a closed Apple-only model pipeline -- are not wrong; they are just too narrow.
Core AI is Apple's architectural reset. Rather than patching Core ML to support modern workflows, Apple has rebuilt the layer from the ground up with three goals: run any model (not just Apple-trained ones), split inference intelligently between on-device silicon and Private Cloud Compute, and present a unified API that developers use whether the computation happens locally or in the cloud.
The name change matters symbolically. Apple is not calling this "Core ML 2" or "Core ML Pro." It is a new framework with a new identity, which signals that Apple expects developers to treat it as a new starting point rather than an incremental update to existing code.
"Core AI represents the most significant change to how Apple silicon processes intelligence tasks since we introduced the Neural Engine in 2017."
The announcement is expected at WWDC 2026 in June, alongside iOS 27, macOS 16, and a hardware cycle that sources indicate includes updated MacBook Pro and iPhone 17 models with further Neural Engine improvements.
Third-party model support is the single biggest architectural change in Core AI and the one with the most immediate implications for developers.
Core ML required you to either use Apple's own models or convert third-party models into Apple's .mlpackage format using a tool called coremltools. The conversion process worked reasonably well for standard architectures like ResNet or MobileNet, but it broke frequently for cutting-edge models, introduced precision loss during quantization, and required you to stay on top of every new model architecture yourself. If you wanted to run a model that had not been ported to Core ML format, your options were limited.
Core AI removes that constraint. The framework is expected to support model loading from standard open formats including GGUF and SafeTensors, which are the two dominant formats used by the open-source AI community. This means a developer can take a model directly from Hugging Face, load it into Core AI without a conversion step, and have it running on the Neural Engine with hardware-accelerated matrix operations.
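What "direct load without a conversion step" means in practice is that the loader can recognize the two container formats on disk. The sketch below shows how that recognition works for real GGUF and SafeTensors files: GGUF begins with the ASCII magic `GGUF`, while SafeTensors begins with a little-endian 64-bit header length followed by a JSON tensor index. The function is illustrative only; Core AI's actual loader is unannounced, and real loaders also validate versions and tensor tables.

```python
import json
import struct

def detect_model_format(blob: bytes) -> str:
    """Best-effort sniff of the two open formats Core AI is expected to load.

    GGUF files begin with the ASCII magic 'GGUF'. SafeTensors files begin
    with a little-endian u64 giving the byte length of a JSON header that
    immediately follows. Neither check is exhaustive.
    """
    if blob[:4] == b"GGUF":
        return "gguf"
    if len(blob) >= 8:
        (header_len,) = struct.unpack("<Q", blob[:8])
        header = blob[8 : 8 + header_len]
        if len(header) == header_len:
            try:
                json.loads(header)
                return "safetensors"
            except ValueError:
                pass
    return "unknown"
```

A framework that can sniff the container this way never needs the developer to declare a format up front, which is part of why the conversion step disappears.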
The practical implications of this are significant:
| Scenario | Core ML | Core AI |
|---|---|---|
| Run a Hugging Face model | Requires coremltools conversion | Direct load (GGUF/SafeTensors) |
| Use latest open-source LLM | Often unsupported architectures | Supported via standard loaders |
| Fine-tuned models | Re-conversion required | Load directly |
| Model updates | Re-package and re-submit | Swap model file at runtime |
| Third-party model hosting | Not supported | Via Private Cloud Compute |
The shift to open format support also means Apple is, for the first time, explicitly building a framework that embraces the broader AI ecosystem rather than requiring everything to flow through Apple's own pipeline. That is a significant cultural change for a company that has historically preferred tight vertical integration.
There is nuance here worth noting. Apple is not abandoning its own models or its own .mlpackage format. First-party Apple Intelligence models will still run through optimized Apple pipelines. Core AI simply adds a second path for developers who want to bring external models into the system without the friction of the old conversion workflow.
The most unexpected element of the Core AI story is Google Gemini.
Apple introduced Private Cloud Compute (PCC) with Apple Intelligence in iOS 18. The premise was straightforward: tasks too large or too complex to run on-device get routed to Apple's own cloud servers, which run on Apple silicon, execute requests inside a secure enclave, and are architecturally designed so that Apple itself cannot inspect your data. PCC is a privacy-preserving cloud inference system, and it earned significant praise from security researchers when Apple published its technical specification last year.
With Core AI, Apple is extending PCC to support third-party model providers. Google is confirmed as the first partner. Gemini will be callable from within the Core AI framework via PCC, meaning your request goes to Google's models through Apple's privacy layer rather than directly to Google's infrastructure.
The privacy architecture of this arrangement is worth understanding carefully.
Whether this arrangement holds up under scrutiny from privacy researchers remains to be seen. The promise -- Google's model capability without Google's data collection -- is compelling. The implementation details Apple releases at WWDC will matter enormously.
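One property the PCC design promises can be illustrated in a few lines: no stable identifier ties a device to its requests on the provider's side. The sketch below is not Apple's protocol, which rests on hardware attestation and end-to-end encryption far beyond this; it only demonstrates the unlinkability idea with a fresh per-request nonce. All names here are hypothetical.

```python
import hashlib
import os

def anonymize_request(device_id: str, payload: str) -> dict:
    """Build an outbound request keyed by a fresh per-request nonce.

    The device_id is deliberately unused in the output: nothing derived
    from it leaves this function, so two requests from the same device
    are unlinkable on the receiving side.
    """
    nonce = os.urandom(16).hex()
    return {
        "request_id": hashlib.sha256((nonce + payload).encode()).hexdigest(),
        "payload": payload,
    }
```

Two identical requests from the same device produce different `request_id` values, which is the behavior the "no persistent identifier" claim requires.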
For developers, the Gemini integration means you can call Gemini models from Swift code through the same Core AI API you use for on-device inference. You do not need a separate Google SDK. You do not need to manage API keys inside your app. The capability is surfaced as a system resource, similar to how you call the camera or microphone.
Apple is betting that developers will prefer a unified API with privacy guarantees over a direct Google SDK integration, even if the underlying model is the same.
One of the more technically interesting aspects of Core AI is its handling of the on-device versus cloud decision. In Core ML, this decision was entirely manual: you either ran a model locally or you wrote your own networking code to call a cloud API. There was no framework-level intelligence about which approach to use.
Core AI introduces an inference routing layer. When you make an inference request through Core AI, the framework evaluates several factors before deciding where to run it:
Model size and device capability. If the model fits in available memory and the task complexity is within the Neural Engine's throughput for the current battery and thermal state, it runs on-device. Apple's Neural Engine performance figures suggest modern iPhones can handle models up to approximately 7B parameters at practical latencies.
Privacy sensitivity. Core AI exposes a privacy flag in the request API. If you mark a request as privacy-sensitive, it will not be routed to PCC even if on-device would otherwise be slower. This gives developers explicit control over the routing decision.
Network availability and latency. PCC routing requires a network connection. If the device is offline, Core AI falls back to the best available on-device model for the task.
Task type. Apple is expected to expose predefined task types (text generation, image understanding, structured extraction) that the framework uses to select the optimal model for the request automatically, rather than requiring the developer to specify a model file for every call.
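The routing factors above can be sketched as a decision function. The factor names and the ~7B-parameter ceiling come from the description in this section; the API shape itself is hypothetical, since Core AI's routing layer has not been published.

```python
from dataclasses import dataclass

ON_DEVICE_PARAM_LIMIT = 7_000_000_000  # ~7B-parameter practical ceiling cited above

@dataclass
class InferenceRequest:
    model_params: int        # parameter count of the model the task needs
    privacy_sensitive: bool  # developer-set privacy flag
    network_available: bool  # is PCC reachable at all?

def route(req: InferenceRequest) -> str:
    """Evaluate the routing factors in the order the text describes."""
    if req.privacy_sensitive:
        # The privacy flag pins the request to local silicon, even if slower.
        return "on-device"
    if not req.network_available:
        # Offline: fall back to the best available on-device model.
        return "on-device"
    if req.model_params <= ON_DEVICE_PARAM_LIMIT:
        # Fits the local memory/throughput budget: keep it on the Neural Engine.
        return "on-device"
    return "private-cloud-compute"
```

Note the ordering: the privacy flag and offline state are hard constraints that short-circuit the size check, which is only a preference.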
This routing architecture is the clearest expression of what Core AI is trying to be: not just a model runner, but an intelligence infrastructure layer that makes smart decisions about compute placement on your behalf.
The split also addresses a real tension in the current Apple Intelligence implementation. Tasks like real-time transcription, on-device writing suggestions, and personal context queries work well locally. Tasks like complex reasoning chains, large document summarization, and multi-modal generation benefit from server-side compute. Core ML had no vocabulary for this distinction. Core AI makes it a first-class API concern.
Apple is not deprecating Core ML on day one. Based on what is expected at WWDC, the transition plan mirrors how Apple has handled other major framework transitions: a parallel availability period where both frameworks are supported, with Core ML entering a maintenance-only mode and Core AI receiving all new features going forward.
If you have existing Core ML code, the migration path looks roughly like this:
For model inference calls, Core AI will ship a compatibility shim that accepts .mlpackage files. Your existing model loading code will not break immediately. Apple's tools team is expected to ship an Xcode migration assistant similar to what was provided during the Swift 6 concurrency migration.
For custom model pipelines, the work is heavier. If your app uses coremltools to convert and optimize models as part of a deployment pipeline, you will want to evaluate whether switching to GGUF or SafeTensors format makes sense. For teams that maintain their own model training infrastructure, the answer is almost certainly yes.
For Apple Intelligence integrations, nothing changes in the short term. The Writing Tools, Image Playground, and other system-level Apple Intelligence features will continue to work exactly as before. Core AI is a developer-facing framework change, not an end-user feature change.
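The shape of the migration is essentially the adapter pattern: old and new loaders behind one call site. The toy classes below are stand-ins, not Core AI or Core ML types; they only show why a compatibility shim lets existing `.mlpackage` code keep working while new code takes the direct-load path.

```python
class LegacyCoreMLModel:
    """Stand-in for an existing .mlpackage-based model."""
    def predict(self, features: dict) -> dict:
        return {"label": "cat", "source": "mlpackage"}

class OpenFormatModel:
    """Stand-in for a model loaded directly from GGUF/SafeTensors."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

class UnifiedModel:
    """One call site regardless of which path loaded the model."""
    def __init__(self, backend):
        self._backend = backend

    def run(self, payload):
        # The shim dispatches to whichever legacy or modern backend it wraps.
        if isinstance(self._backend, LegacyCoreMLModel):
            return self._backend.predict(payload)
        return self._backend.generate(payload)
```

Code written against the unified interface today would not need to change when the legacy backend is eventually retired.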
The key developer actions to take now, before WWDC:

- Plan for an async/await refactor. Core AI is expected to be async/await-native throughout, unlike Core ML's mixed synchronous/callback design.

Apple is expected to provide a detailed migration guide in the WWDC session materials and in the updated developer documentation at developer.apple.com.
Apple's Core AI does not exist in a vacuum. Google and Microsoft have both shipped framework-level AI infrastructure, and the comparison reveals where Core AI is ahead and where it is catching up.
Android ML Kit has supported a broader range of model formats, and for longer, than Core ML. Google's on-device AI story is, in some ways, more mature: ML Kit has had MediaPipe task APIs, LiteRT (formerly TensorFlow Lite) support, and Gemini Nano on-device since Android 14. Where Core AI is likely to pull ahead is in the coherence of the on-device and cloud split. Android's on-device vs. cloud story is fragmented across multiple APIs; Core AI promises a single API surface for both.
Windows Copilot Runtime, introduced with Windows 11 24H2, is Microsoft's answer to the same problem. It supports ONNX models, DirectML acceleration, and Phi-3 on-device inference through a unified SDK. Like Core AI, it exposes AI features as system resources rather than per-app installations. The key difference is hardware: Apple's Neural Engine is purpose-built for Apple silicon and consistently outperforms Qualcomm's NPU in sustained inference workloads on comparable hardware. Microsoft's Copilot+ PC initiative is constrained by the fragmented Windows hardware ecosystem in ways Apple simply does not face.
| Capability | Core AI (iOS 27) | Android ML Kit | Windows Copilot Runtime |
|---|---|---|---|
| Open model format support | GGUF, SafeTensors | LiteRT, MediaPipe | ONNX |
| On-device LLM | Up to ~7B params | Gemini Nano (~2B) | Phi-3 (~3.8B) |
| Cloud inference routing | PCC (privacy-preserving) | Vertex AI (direct) | Azure OpenAI (direct) |
| Third-party cloud model | Gemini via PCC | Gemini natively | Multiple via Azure |
| Unified API (on+cloud) | Yes | Partial | Yes |
| Developer migration path | Core ML compat shim | N/A | ONNX Runtime |
Apple's competitive advantage here is the combination of hardware and privacy architecture. No other platform can route cloud inference requests through a hardware-attested, auditable privacy layer the way PCC does. That is a genuine differentiator, particularly as enterprise customers apply more scrutiny to where employee data goes during AI inference.
Apple Intelligence launched in iOS 18 as a relatively cautious debut: writing tools, a smarter Siri that could act within apps, image generation via Image Playground, and the beginnings of PCC for cloud requests. It was real technology, but clearly version one of a longer roadmap.
iOS 26 (the current release) accelerated the timeline with expanded Siri proactivity, broader app integration via SiriKit extensions, and the first wave of third-party app integrations through the Apple Intelligence API. The foundation was getting more solid.
Core AI at iOS 27 is the infrastructure layer that enables the next phase:
Multi-model reasoning chains. Today, a single Siri request dispatches to a single model. With Core AI's routing architecture, a complex request can chain multiple models: an on-device model for personal context retrieval, a cloud model for reasoning, and a second on-device model for output formatting. This is how modern AI agents work, and Core AI provides the infrastructure for Apple to build this natively.
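The three-stage chain described above can be made concrete with a toy pipeline. The stage functions are placeholders for models, and nothing here reflects an actual Core AI chaining API; the point is the data flow from local retrieval through cloud reasoning back to local formatting.

```python
def retrieve_context(query: str) -> str:
    """Stage 1 (on-device): pull personal context relevant to the query."""
    return f"context-for({query})"

def reason(query: str, context: str) -> str:
    """Stage 2 (cloud): heavy reasoning over the query plus its context."""
    return f"answer({query}|{context})"

def format_output(raw: str) -> str:
    """Stage 3 (on-device): shape the raw answer for display."""
    return raw.upper()

def answer(query: str) -> str:
    # Each stage could route to different silicon; the chain is the product.
    return format_output(reason(query, retrieve_context(query)))
```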
Developer-accessible reasoning. Right now, third-party apps can call Apple Intelligence features through a limited API. Core AI significantly widens that surface. Developers will be able to invoke models directly, control routing, and build custom intelligence pipelines that sit within Apple's privacy architecture rather than working around it via their own cloud backends.
Persistent model sessions. The current Core ML inference model is stateless: each call is independent. Core AI is expected to introduce session primitives that allow context to persist across multiple inference calls, enabling conversational interfaces and multi-turn reasoning without the developer managing state manually.
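The difference between stateless calls and a session primitive is easy to show in miniature. `ChatSession` below is a hypothetical shape, not Core AI's actual API: it simply accumulates turns so that context travels with every new call instead of being resent by the developer.

```python
class ChatSession:
    """Toy session: prior turns persist across inference calls."""

    def __init__(self):
        self._history: list[str] = []

    def send(self, message: str) -> str:
        # Session primitive: the framework, not the developer, carries state.
        self._history.append(message)
        # A real model would condition on the full history; we just report it.
        return f"reply#{len(self._history)} (context: {len(self._history)} turns)"
```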
Siri as a coordinator. Apple's longer-term vision, consistent with everything the company has communicated since the Apple Intelligence launch, is Siri becoming a coordinator across models and apps rather than a single model trying to do everything. Core AI provides the routing and session infrastructure that makes a coordinator model practical.
The shift from Core ML to Core AI is, in the end, a shift in Apple's bet about where AI value lives. Core ML was a bet that the value was in running Apple's models efficiently on Apple silicon. Core AI is a bet that the value is in the infrastructure: the privacy architecture, the routing logic, the developer API, and the model ecosystem that all sit on top of that silicon. That is a more ambitious bet, and WWDC 2026 will show whether Apple has the execution to back it up.
Existing Core ML code will not break on day one: Apple is not deprecating Core ML at the iOS 27 launch. Existing .mlpackage files and Core ML inference calls will continue to work through a compatibility layer in Core AI. Apple is expected to provide a migration period of at least two major iOS releases before Core ML enters end-of-life. That said, new features will only be added to Core AI, so migration is worth planning now even if it is not urgent.
Third-party models will be supported more fully than at any prior point in Apple's history. Support for GGUF and SafeTensors formats means developers can load models directly from the open-source ecosystem without going through Apple's conversion tools. Apple is not endorsing any specific third-party model, but it is building the infrastructure to run them. This is a significant philosophical shift for a company that has traditionally kept its model pipeline tightly controlled.
Gemini runs through Apple's Private Cloud Compute infrastructure, which acts as a privacy-preserving intermediary between your device and Google's model. Your requests are encrypted on-device, processed through Apple's attested servers, and forwarded to Gemini without a persistent identifier linking your device to the request on Google's side. Apple has published the technical architecture of PCC for external security research, and that audit trail is the main privacy guarantee. Whether that architecture adequately protects user data is a question security researchers will scrutinize carefully once the implementation is available.
Apple Intelligence is the consumer-facing brand for AI features in iOS, macOS, and iPadOS. Core AI is the developer framework that powers those features. The Apple Intelligence API gives developers access to specific, curated features like Writing Tools and Image Playground. Core AI is lower-level and more flexible: it lets developers load and run arbitrary models, control inference routing, and build custom AI pipelines that are not limited to Apple's predefined feature set.
The beta SDK is expected alongside the iOS 27 and macOS 16 developer betas at WWDC 2026 in June. A public beta is typically available a few weeks after WWDC. General availability follows with the iOS 27 and macOS 16 fall release, expected September 2026.