Apple replacing Core ML with Core AI at WWDC 2026 changes everything for iOS 27
Apple is replacing Core ML with a new Core AI framework at WWDC 2026. Third-party model support, Gemini integration, and what iOS 27 developers need to know.
TL;DR: Apple is set to introduce Core AI at WWDC 2026 as a full replacement for Core ML, the machine learning framework it has shipped since 2017. The new framework brings third-party model support, a cleaner Swift-native API, and a structured split between on-device inference and Private Cloud Compute. Google Gemini is expected to run inside the Core AI stack via PCC. For developers who have built on Core ML, a migration path exists -- but the architecture underneath is fundamentally different.
Core ML launched in 2017 as Apple's answer to a simple problem: how do you run a trained neural network on an iPhone without requiring developers to write low-level GPU kernels? It worked. For nine years, Core ML handled image classification, natural language processing, and sound recognition on hundreds of millions of Apple devices. You could drop a .mlpackage file into Xcode, call a few lines of Swift, and have a model running on the Neural Engine with surprisingly little friction.
The problem is not that Core ML failed. The problem is that the world it was designed for no longer exists.
When Core ML shipped, the state of the art was a 200-million-parameter model that needed heavy quantization just to fit in 2GB of RAM. Today, Apple ships devices with 24GB of unified memory, a 38-trillion-operations-per-second Neural Engine, and a user population that now expects conversational AI, real-time image synthesis, and multi-step reasoning to happen on the device in their pocket. Core ML's original design assumptions -- fixed model files, controlled model formats, and a closed Apple-only model pipeline -- are not wrong; they are just too narrow.
Core AI is Apple's architectural reset. Rather than patching Core ML to support modern workflows, Apple has rebuilt the layer from the ground up with three goals: run any model (not just Apple-trained ones), split inference intelligently between on-device silicon and Private Cloud Compute, and present a unified API that developers use whether the computation happens locally or in the cloud.
The name change matters symbolically. Apple is not calling this "Core ML 2" or "Core ML Pro." It is a new framework with a new identity, which signals that Apple expects developers to treat it as a new starting point rather than an incremental update to existing code.
"Core AI represents the most significant change to how Apple silicon processes intelligence tasks since we introduced the Neural Engine in 2017."
The announcement is expected at WWDC 2026 in June, alongside iOS 27, macOS 16, and a hardware cycle that sources indicate includes updated MacBook Pro and iPhone 17 models with further Neural Engine improvements.
Third-party model support is the single biggest architectural change in Core AI and the one with the most immediate implications for developers.
Core ML required you to either use Apple's own models or convert third-party models into Apple's .mlpackage format using a tool called coremltools. The conversion process worked reasonably well for standard architectures like ResNet or MobileNet, but it broke frequently for cutting-edge models, introduced precision loss during quantization, and required you to stay on top of every new model architecture yourself. If you wanted to run a model that had not been ported to Core ML format, your options were limited.
Core AI removes that constraint. The framework is expected to support model loading from standard open formats including GGUF and SafeTensors, which are the two dominant formats used by the open-source AI community. This means a developer can take a model directly from Hugging Face, load it into Core AI without a conversion step, and have it running on the Neural Engine with hardware-accelerated matrix operations.
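What "direct load without a conversion step" means in practice is that the loader can recognize the two container formats on disk. The sketch below shows how that recognition works for real GGUF and SafeTensors files: GGUF begins with the ASCII magic `GGUF`, while SafeTensors begins with a little-endian 64-bit header length followed by a JSON tensor index. The function is illustrative only; Core AI's actual loader is unannounced, and real loaders also validate versions and tensor tables.

```python
import json
import struct

def detect_model_format(blob: bytes) -> str:
    """Best-effort sniff of the two open formats Core AI is expected to load.

    GGUF files begin with the ASCII magic 'GGUF'. SafeTensors files begin
    with a little-endian u64 giving the byte length of a JSON header that
    immediately follows. Neither check is exhaustive.
    """
    if blob[:4] == b"GGUF":
        return "gguf"
    if len(blob) >= 8:
        (header_len,) = struct.unpack("<Q", blob[:8])
        header = blob[8 : 8 + header_len]
        if len(header) == header_len:
            try:
                json.loads(header)
                return "safetensors"
            except ValueError:
                pass
    return "unknown"
```

A framework that can sniff the container this way never needs the developer to declare a format up front, which is part of why the conversion step disappears.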
The practical implications of this are significant:
| Scenario | Core ML | Core AI |
|---|---|---|
| Run a Hugging Face model | Requires coremltools conversion | Direct load (GGUF/SafeTensors) |
| Use latest open-source LLM | Often unsupported architectures | Supported via standard loaders |
| Fine-tuned models | Re-conversion required | Load directly |
| Model updates | Re-package and re-submit | Swap model file at runtime |
| Third-party model hosting | Not supported | Via Private Cloud Compute |
The shift to open format support also means Apple is, for the first time, explicitly building a framework that embraces the broader AI ecosystem rather than requiring everything to flow through Apple's own pipeline. That is a significant cultural change for a company that has historically preferred tight vertical integration.
There is nuance here worth noting. Apple is not abandoning its own models or its own .mlpackage format. First-party Apple Intelligence models will still run through optimized Apple pipelines. Core AI simply adds a second path for developers who want to bring external models into the system without the friction of the old conversion workflow.
The most unexpected element of the Core AI story is Google Gemini.
Apple introduced Private Cloud Compute (PCC) with Apple Intelligence in iOS 18. The premise was straightforward: tasks too large or too complex to run on-device get routed to Apple's own cloud servers, which run on Apple silicon, execute requests inside a secure enclave, and are architecturally designed so that Apple itself cannot inspect your data. PCC is a privacy-preserving cloud inference system, and it earned significant praise from security researchers when Apple published its technical specification last year.
With Core AI, Apple is extending PCC to support third-party model providers. Google is confirmed as the first partner. Gemini will be callable from within the Core AI framework via PCC, meaning your request goes to Google's models through Apple's privacy layer rather than directly to Google's infrastructure.
The privacy architecture of this arrangement is worth understanding carefully.
Whether this arrangement holds up under scrutiny from privacy researchers remains to be seen. The promise -- Google's model capability without Google's data collection -- is compelling. The implementation details Apple releases at WWDC will matter enormously.
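One property the PCC design promises can be illustrated in a few lines: no stable identifier ties a device to its requests on the provider's side. The sketch below is not Apple's protocol, which rests on hardware attestation and end-to-end encryption far beyond this; it only demonstrates the unlinkability idea with a fresh per-request nonce. All names here are hypothetical.

```python
import hashlib
import os

def anonymize_request(device_id: str, payload: str) -> dict:
    """Build an outbound request keyed by a fresh per-request nonce.

    The device_id is deliberately unused in the output: nothing derived
    from it leaves this function, so two requests from the same device
    are unlinkable on the receiving side.
    """
    nonce = os.urandom(16).hex()
    return {
        "request_id": hashlib.sha256((nonce + payload).encode()).hexdigest(),
        "payload": payload,
    }
```

Two identical requests from the same device produce different `request_id` values, which is the behavior the "no persistent identifier" claim requires.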
For developers, the Gemini integration means you can call Gemini models from Swift code through the same Core AI API you use for on-device inference. You do not need a separate Google SDK. You do not need to manage API keys inside your app. The capability is surfaced as a system resource, similar to how you call the camera or microphone.
Apple is betting that developers will prefer a unified API with privacy guarantees over a direct Google SDK integration, even if the underlying model is the same.
One of the more technically interesting aspects of Core AI is its handling of the on-device versus cloud decision. In Core ML, this decision was entirely manual: you either ran a model locally or you wrote your own networking code to call a cloud API. There was no framework-level intelligence about which approach to use.
Core AI introduces an inference routing layer. When you make an inference request through Core AI, the framework evaluates several factors before deciding where to run it:
Model size and device capability. If the model fits in available memory and the task complexity is within the Neural Engine's throughput for the current battery and thermal state, it runs on-device. Apple's Neural Engine performance figures suggest modern iPhones can handle models up to approximately 7B parameters at practical latencies.
Privacy sensitivity. Core AI exposes a privacy flag in the request API. If you mark a request as privacy-sensitive, it will not be routed to PCC even if on-device would otherwise be slower. This gives developers explicit control over the routing decision.
Network availability and latency. PCC routing requires a network connection. If the device is offline, Core AI falls back to the best available on-device model for the task.
Task type. Apple is expected to expose predefined task types (text generation, image understanding, structured extraction) that the framework uses to select the optimal model for the request automatically, rather than requiring the developer to specify a model file for every call.
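The routing factors above can be sketched as a decision function. The factor names and the ~7B-parameter ceiling come from the description in this section; the API shape itself is hypothetical, since Core AI's routing layer has not been published.

```python
from dataclasses import dataclass

ON_DEVICE_PARAM_LIMIT = 7_000_000_000  # ~7B-parameter practical ceiling cited above

@dataclass
class InferenceRequest:
    model_params: int        # parameter count of the model the task needs
    privacy_sensitive: bool  # developer-set privacy flag
    network_available: bool  # is PCC reachable at all?

def route(req: InferenceRequest) -> str:
    """Evaluate the routing factors in the order the text describes."""
    if req.privacy_sensitive:
        # The privacy flag pins the request to local silicon, even if slower.
        return "on-device"
    if not req.network_available:
        # Offline: fall back to the best available on-device model.
        return "on-device"
    if req.model_params <= ON_DEVICE_PARAM_LIMIT:
        # Fits the local memory/throughput budget: keep it on the Neural Engine.
        return "on-device"
    return "private-cloud-compute"
```

Note the ordering: the privacy flag and offline state are hard constraints that short-circuit the size check, which is only a preference.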
This routing architecture is the clearest expression of what Core AI is trying to be: not just a model runner, but an intelligence infrastructure layer that makes smart decisions about compute placement on your behalf.
The split also addresses a real tension in the current Apple Intelligence implementation. Tasks like real-time transcription, on-device writing suggestions, and personal context queries work well locally. Tasks like complex reasoning chains, large document summarization, and multi-modal generation benefit from server-side compute. Core ML had no vocabulary for this distinction. Core AI makes it a first-class API concern.
Apple is not deprecating Core ML on day one. Based on what is expected at WWDC, the transition plan mirrors how Apple has handled other major framework transitions: a parallel availability period where both frameworks are supported, with Core ML entering a maintenance-only mode and Core AI receiving all new features going forward.
If you have existing Core ML code, the migration path looks roughly like this:
For model inference calls, Core AI will ship a compatibility shim that accepts .mlpackage files. Your existing model loading code will not break immediately. Apple's tools team is expected to ship an Xcode migration assistant similar to what was provided during the Swift 6 concurrency migration.
For custom model pipelines, the work is heavier. If your app uses coremltools to convert and optimize models as part of a deployment pipeline, you will want to evaluate whether switching to GGUF or SafeTensors format makes sense. For teams that maintain their own model training infrastructure, the answer is almost certainly yes.
For Apple Intelligence integrations, nothing changes in the short term. The Writing Tools, Image Playground, and other system-level Apple Intelligence features will continue to work exactly as before. Core AI is a developer-facing framework change, not an end-user feature change.
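The shape of the migration is essentially the adapter pattern: old and new loaders behind one call site. The toy classes below are stand-ins, not Core AI or Core ML types; they only show why a compatibility shim lets existing `.mlpackage` code keep working while new code takes the direct-load path.

```python
class LegacyCoreMLModel:
    """Stand-in for an existing .mlpackage-based model."""
    def predict(self, features: dict) -> dict:
        return {"label": "cat", "source": "mlpackage"}

class OpenFormatModel:
    """Stand-in for a model loaded directly from GGUF/SafeTensors."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

class UnifiedModel:
    """One call site regardless of which path loaded the model."""
    def __init__(self, backend):
        self._backend = backend

    def run(self, payload):
        # The shim dispatches to whichever legacy or modern backend it wraps.
        if isinstance(self._backend, LegacyCoreMLModel):
            return self._backend.predict(payload)
        return self._backend.generate(payload)
```

Code written against the unified interface today would not need to change when the legacy backend is eventually retired.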
The key developer actions to take now, before WWDC:

- Plan for an async/await refactor. Core AI is expected to be async/await-native throughout, unlike Core ML's mixed synchronous/callback design.

Apple is expected to provide a detailed migration guide in the WWDC session materials and in the updated developer documentation at developer.apple.com.
Apple's Core AI does not exist in a vacuum. Google and Microsoft have both shipped framework-level AI infrastructure, and the comparison reveals where Core AI is ahead and where it is catching up.
Android ML Kit has supported a broader range of model formats, and for longer, than Core ML. Google's on-device AI story is, in some ways, more mature: ML Kit has had MediaPipe task APIs, LiteRT (formerly TensorFlow Lite) support, and Gemini Nano on-device since Android 14. Where Core AI is likely to pull ahead is in the coherence of the on-device and cloud split. Android's on-device vs. cloud story is fragmented across multiple APIs; Core AI promises a single API surface for both.
Windows Copilot Runtime, introduced with Windows 11 24H2, is Microsoft's answer to the same problem. It supports ONNX models, DirectML acceleration, and Phi-3 on-device inference through a unified SDK. Like Core AI, it exposes AI features as system resources rather than per-app installations. The key difference is hardware: Apple's Neural Engine is purpose-built for Apple silicon and consistently outperforms Qualcomm's NPU in sustained inference workloads on comparable hardware. Microsoft's Copilot+ PC initiative is constrained by the fragmented Windows hardware ecosystem in ways Apple simply does not face.
| Capability | Core AI (iOS 27) | Android ML Kit | Windows Copilot Runtime |
|---|---|---|---|
| Open model format support | GGUF, SafeTensors | LiteRT, MediaPipe | ONNX |
| On-device LLM | Up to ~7B params | Gemini Nano (~2B) | Phi-3 (~3.8B) |
| Cloud inference routing | PCC (privacy-preserving) | Vertex AI (direct) | Azure OpenAI (direct) |
| Third-party cloud model | Gemini via PCC | Gemini natively | Multiple via Azure |
| Unified API (on+cloud) | Yes | Partial | Yes |
| Developer migration path | Core ML compat shim | N/A | ONNX Runtime |
Apple's competitive advantage here is the combination of hardware and privacy architecture. No other platform can route cloud inference requests through a hardware-attested, auditable privacy layer the way PCC does. That is a genuine differentiator, particularly as enterprise customers apply more scrutiny to where employee data goes during AI inference.
Apple Intelligence launched in iOS 18 as a relatively cautious debut: writing tools, a smarter Siri that could act within apps, image generation via Image Playground, and the beginnings of PCC for cloud requests. It was real technology, but clearly version one of a longer roadmap.
iOS 26 (the current release) accelerated the timeline with expanded Siri proactivity, broader app integration via SiriKit extensions, and the first wave of third-party app integrations through the Apple Intelligence API. The foundation was getting more solid.
Core AI at iOS 27 is the infrastructure layer that enables the next phase:
Multi-model reasoning chains. Today, a single Siri request dispatches to a single model. With Core AI's routing architecture, a complex request can chain multiple models: an on-device model for personal context retrieval, a cloud model for reasoning, and a second on-device model for output formatting. This is how modern AI agents work, and Core AI provides the infrastructure for Apple to build this natively.
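The three-stage chain described above can be made concrete with a toy pipeline. The stage functions are placeholders for models, and nothing here reflects an actual Core AI chaining API; the point is the data flow from local retrieval through cloud reasoning back to local formatting.

```python
def retrieve_context(query: str) -> str:
    """Stage 1 (on-device): pull personal context relevant to the query."""
    return f"context-for({query})"

def reason(query: str, context: str) -> str:
    """Stage 2 (cloud): heavy reasoning over the query plus its context."""
    return f"answer({query}|{context})"

def format_output(raw: str) -> str:
    """Stage 3 (on-device): shape the raw answer for display."""
    return raw.upper()

def answer(query: str) -> str:
    # Each stage could route to different silicon; the chain is the product.
    return format_output(reason(query, retrieve_context(query)))
```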
Developer-accessible reasoning. Right now, third-party apps can call Apple Intelligence features through a limited API. Core AI significantly widens that surface. Developers will be able to invoke models directly, control routing, and build custom intelligence pipelines that sit within Apple's privacy architecture rather than working around it via their own cloud backends.
Persistent model sessions. The current Core ML inference model is stateless: each call is independent. Core AI is expected to introduce session primitives that allow context to persist across multiple inference calls, enabling conversational interfaces and multi-turn reasoning without the developer managing state manually.
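The difference between stateless calls and a session primitive is easy to show in miniature. `ChatSession` below is a hypothetical shape, not Core AI's actual API: it simply accumulates turns so that context travels with every new call instead of being resent by the developer.

```python
class ChatSession:
    """Toy session: prior turns persist across inference calls."""

    def __init__(self):
        self._history: list[str] = []

    def send(self, message: str) -> str:
        # Session primitive: the framework, not the developer, carries state.
        self._history.append(message)
        # A real model would condition on the full history; we just report it.
        return f"reply#{len(self._history)} (context: {len(self._history)} turns)"
```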
Siri as a coordinator. Apple's longer-term vision, consistent with everything the company has communicated since the Apple Intelligence launch, is Siri becoming a coordinator across models and apps rather than a single model trying to do everything. Core AI provides the routing and session infrastructure that makes a coordinator model practical.
The shift from Core ML to Core AI is, in the end, a shift in Apple's bet about where AI value lives. Core ML was a bet that the value was in running Apple's models efficiently on Apple silicon. Core AI is a bet that the value is in the infrastructure: the privacy architecture, the routing logic, the developer API, and the model ecosystem that all sit on top of that silicon. That is a more ambitious bet, and WWDC 2026 will show whether Apple has the execution to back it up.
Existing Core ML code will not break on day one: Apple is not deprecating Core ML at the iOS 27 launch. Existing .mlpackage files and Core ML inference calls will continue to work through a compatibility layer in Core AI. Apple is expected to provide a migration period of at least two major iOS releases before Core ML enters end-of-life. That said, new features will only be added to Core AI, so migration is worth planning now even if it is not urgent.
Third-party models will be supported more fully than at any prior point in Apple's history. Support for GGUF and SafeTensors formats means developers can load models directly from the open-source ecosystem without going through Apple's conversion tools. Apple is not endorsing any specific third-party model, but it is building the infrastructure to run them. This is a significant philosophical shift for a company that has traditionally kept its model pipeline tightly controlled.
Gemini runs through Apple's Private Cloud Compute infrastructure, which acts as a privacy-preserving intermediary between your device and Google's model. Your requests are encrypted on-device, processed through Apple's attested servers, and forwarded to Gemini without a persistent identifier linking your device to the request on Google's side. Apple has published the technical architecture of PCC for external security research, and that audit trail is the main privacy guarantee. Whether that architecture adequately protects user data is a question security researchers will scrutinize carefully once the implementation is available.
Apple Intelligence is the consumer-facing brand for AI features in iOS, macOS, and iPadOS. Core AI is the developer framework that powers those features. The Apple Intelligence API gives developers access to specific, curated features like Writing Tools and Image Playground. Core AI is lower-level and more flexible: it lets developers load and run arbitrary models, control inference routing, and build custom AI pipelines that are not limited to Apple's predefined feature set.
The beta SDK is expected alongside the iOS 27 and macOS 16 developer betas at WWDC 2026 in June. A public beta is typically available a few weeks after WWDC. General availability follows with the iOS 27 and macOS 16 fall release, expected September 2026.