HP has drawn a clear line in the on-device AI race. At HP Imagine 2026 on March 24, the company unveiled HP IQ — a workplace intelligence layer for HP devices that ships with a locally-running 20 billion parameter language model, making it one of the most capable on-device AI systems to reach commercial enterprise hardware. No internet connection required. No prompts routed through a third-party data center. The model runs directly on the PC, using the silicon already inside it.
The announcement positions HP squarely against Microsoft's cloud-tethered Copilot+ strategy and signals a broader industry argument about where enterprise AI inference should actually live. For IT departments that have spent three years telling employees not to paste sensitive documents into ChatGPT, HP IQ represents a fundamentally different proposition: all of that capability, behind the firewall, on the endpoint.
Early access is targeted for Spring 2026 on select AI PCs. Full shipping begins Fall 2026, with a broader rollout across the HP portfolio expected in the second half of 2026.
What HP IQ Actually Is
HP IQ is not a chatbot bolted onto a taskbar. According to HP's official announcement, it is a three-layer system: a local 20B-parameter model at the foundation, a set of specialized tools that handle discrete task categories above it, and an orchestration layer that routes user requests to the appropriate component and stitches results together.
The base model is built on OpenAI's gpt-oss-20b — a variant from OpenAI's open-weight model series that HP has licensed and optimized for on-device deployment. At 20 billion parameters, it is substantially larger than most on-device models shipped to date. Microsoft's Phi-4 family tops out around 14 billion parameters in its on-device configurations. Apple's on-device models for Apple Intelligence are estimated at roughly 3 billion parameters for the primary inference model, with Apple's servers handling heavier tasks. HP is claiming the largest locally-resident general-purpose model in commercial enterprise hardware, and the parameter count appears to support that claim.
But parameter count is only part of the story. The specialized tools layer is where HP IQ differentiates itself from a raw model deployment. HP has built task-specific modules for document analysis, calendar and meeting intelligence, device diagnostics, and cross-device workflow coordination. The orchestrator decides whether a given request is best handled by the base model, routed to one of these specialized tools, or — when enterprise policy explicitly permits it — sent to a cloud service for tasks that exceed local capability.
That last clause matters. Local by default is the explicit design principle. Cloud escalation is an opt-in behavior gated by enterprise IT policy, not the default fallback. For most queries on a properly configured enterprise deployment, the inference chain never leaves the device.
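HP has not published HP IQ's orchestration API, but the local-by-default routing rule described above can be sketched in a few lines. Every name below is hypothetical, invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Route(Enum):
    LOCAL_MODEL = auto()   # base 20B model on the NPU
    LOCAL_TOOL = auto()    # specialized task module
    CLOUD = auto()         # opt-in escalation only

@dataclass
class EnterprisePolicy:
    cloud_escalation_allowed: bool = False  # off by default

# Task categories the announcement says have dedicated local tools.
LOCAL_TOOLS = {"document_analysis", "calendar", "diagnostics", "workflow"}

def route_request(task_category: str, exceeds_local_capability: bool,
                  policy: EnterprisePolicy) -> Route:
    """Local by default; cloud only when enterprise policy explicitly permits."""
    if task_category in LOCAL_TOOLS:
        return Route.LOCAL_TOOL
    if exceeds_local_capability and policy.cloud_escalation_allowed:
        return Route.CLOUD
    # Oversized requests stay local when policy forbids escalation:
    # the answer degrades rather than the data leaving the device.
    return Route.LOCAL_MODEL
```

The key design choice the sketch captures: cloud is a conditional branch behind a policy flag, not the `else` case.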
The Privacy Architecture
HP IQ's privacy-first design is the feature HP is leading with in its enterprise sales pitch, and for good reason. The enterprise AI market has a trust problem that no amount of Microsoft compliance certification has fully resolved. When a financial services firm's analyst pastes a client portfolio summary into a cloud AI assistant, that data traverses infrastructure the firm does not control, passes through model serving systems subject to their own data retention policies, and potentially contributes to training pipelines the firm has no visibility into.

HP IQ's architecture short-circuits that entire concern by design. The gpt-oss-20b model weights are stored locally on the device. Inference computation runs on the device's NPU (neural processing unit), which is now standard in HP's AI PC lineup. The context window for any given conversation — the document being summarized, the meeting notes being analyzed, the code being reviewed — never leaves the machine unless the user or IT policy explicitly routes it to a cloud service.
For regulated industries, this is potentially transformative. Healthcare organizations operating under HIPAA constraints have been largely locked out of cloud AI assistants for anything touching patient data. Legal firms cannot send privileged communications through third-party inference infrastructure. Financial institutions under FINRA and SEC oversight have built elaborate data loss prevention systems specifically to catch employees attempting to use AI tools with sensitive data. HP IQ does not eliminate those compliance obligations — but it makes a large class of AI-assisted tasks compliant by default rather than requiring active prevention.
HP has been deliberate about framing the IT control story as well. Enterprise deployments can configure which capabilities are available, set the threshold at which local inference escalates to cloud, and determine which data types are permitted to flow outside the device. The model itself can be updated and managed through existing endpoint management tools — HP's own BIOS-level management suite and standard MDM platforms. This is not a consumer AI feature that IT must scramble to block; it is an enterprise system built to be administered through existing governance infrastructure.
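The control surface HP describes — capability toggles, cloud-escalation gating, rules about which data types may leave the device — might reduce to something like the following sketch. The schema is invented for illustration; HP has not published one:

```python
from dataclasses import dataclass, field

@dataclass
class IQPolicy:
    """Hypothetical IT-managed policy object; not a real HP schema."""
    enabled_capabilities: set = field(
        default_factory=lambda: {"summarize", "transcribe", "diagnose"})
    cloud_escalation: bool = False
    # Data classifications permitted to leave the device on escalation.
    egress_allowlist: set = field(default_factory=set)  # empty = nothing leaves

def may_egress(policy: IQPolicy, data_classification: str) -> bool:
    """Context may leave the device only if escalation is enabled AND the
    data classification is explicitly allowlisted — a deny-by-default check."""
    return policy.cloud_escalation and data_classification in policy.egress_allowlist
```

The deny-by-default shape matters: an unconfigured deployment keeps everything local, which matches HP's stated design principle.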
On-Device at 20 Billion Parameters: The Technical Bet
Running a 20 billion parameter model locally on a laptop is not trivial. The engineering choices required to make this work constrain what devices can support it and what performance users should realistically expect.
The binding constraint is memory. A 20 billion parameter model occupies roughly 80 GB at full fp32 precision, and about 40 GB even at the fp16 half precision most models ship in — far beyond what any consumer or enterprise laptop carries as system RAM. HP IQ's deployment relies on aggressive quantization — reducing the numerical precision of model weights from 32-bit or 16-bit floating point to 4-bit or 8-bit integer representations. Well-implemented quantization can reduce a 20B parameter model's memory footprint to roughly 10-20 GB while preserving most of the model's performance on typical task benchmarks, a tradeoff that on-device AI researchers have spent years optimizing.
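The arithmetic behind those figures is simple: weight storage is parameter count times bytes per weight. A quick check, ignoring activations, KV cache, and quantization scale overhead:

```python
def model_memory_gb(params_b: float, bits_per_weight: int) -> float:
    """Approximate weight-storage footprint in GB (1 GB = 1e9 bytes).
    Ignores activations, KV cache, and per-group quantization scales."""
    return params_b * bits_per_weight / 8

for bits in (32, 16, 8, 4):
    print(f"20B model at {bits:>2}-bit: {model_memory_gb(20, bits):.0f} GB")
# 32-bit: 80 GB, 16-bit: 40 GB, 8-bit: 20 GB, 4-bit: 10 GB
```

Which is exactly the 10-20 GB range the 4-bit and 8-bit deployments land in.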
The compute side of the equation runs on the NPU — the dedicated neural processing hardware that Intel, Qualcomm, and AMD now build into their AI PC chips. HP's select AI PCs, the hardware cohort getting early access, are built on Qualcomm Snapdragon X Elite, Intel Core Ultra 200V series, and AMD Ryzen AI 300 series chips, all of which ship with NPUs rated at 40-50+ TOPS (tera-operations per second). That is the compute budget HP IQ is designed to run within.
The Register's coverage notes that HP has not disclosed specific tokens-per-second benchmarks for HP IQ's local inference, which is the metric that matters most for user experience. At 40 TOPS and aggressive quantization, a 20B parameter model should be capable of producing several tokens per second — fast enough for background tasks like document summarization and meeting transcription, potentially slow enough to feel laggy for interactive chat interfaces compared to cloud inference speeds. HP's choice to frame HP IQ as a workplace intelligence layer rather than a conversational chatbot suggests the company is optimizing for async task completion rather than real-time dialogue speed.
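A back-of-envelope estimate supports that expectation. Single-stream LLM decoding is usually memory-bandwidth bound — each generated token streams the full weight set through memory — so throughput is capped by bandwidth divided by weight size. The 100 GB/s figure below is a generic laptop-class LPDDR5X assumption, not a disclosed HP spec:

```python
def decode_tokens_per_sec(params_b: float, bits_per_weight: int,
                          mem_bandwidth_gbs: float) -> float:
    """Rough upper bound for bandwidth-bound decode: every token requires
    one full pass over the weights, so rate <= bandwidth / weight size."""
    weight_gb = params_b * bits_per_weight / 8
    return mem_bandwidth_gbs / weight_gb

# 20B model at 4-bit (10 GB of weights) on assumed ~100 GB/s memory:
rate = decode_tokens_per_sec(20, 4, 100)
print(f"~{rate:.0f} tokens/sec upper bound")  # ~10 tokens/sec
```

A ceiling in the high single digits to low tens of tokens per second is comfortable for background summarization and laggy for interactive chat — consistent with HP's async-first framing.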
Proximity-Based Multi-Device Connectivity
One of HP IQ's more architecturally interesting features is what HP calls proximity-based multi-device connectivity — the ability for HP IQ to recognize and coordinate with other HP devices in the physical vicinity without requiring cloud intermediation.
The practical scenario HP is describing: a user walks into a conference room with their HP laptop. HP IQ detects the HP display, HP conference camera, and HP speakerphone in the room and automatically establishes a coordinated session. Meeting audio goes through HP IQ's local transcription pipeline on the laptop. The display mirrors or extends intelligently based on the meeting context. Device control inputs — raising the camera, muting the microphone — can be issued through HP IQ's interface rather than through separate hardware controls.
This is local, proximity-based orchestration. The coordination happens over local wireless protocols, not cloud relay. For enterprise environments with HP hardware throughout meeting rooms — a significant installed base — the feature could meaningfully reduce the friction of context-switching between personal and shared devices.
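The coordination implied by that scenario can be sketched as capability-based role assignment: nearby devices advertise what they can do, and the laptop maps session roles onto them while keeping transcription local. Device names, roles, and the matching rule below are invented for illustration, not HP's protocol:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NearbyDevice:
    name: str
    capabilities: frozenset  # e.g. {"display", "camera", "speakerphone"}

# Session roles the meeting-room scenario needs -> required capability.
SESSION_ROLES = {"screen": "display", "video": "camera", "audio": "speakerphone"}

def build_session(devices: list) -> dict:
    """Assign each role to the first nearby device that can fill it.
    Transcription always stays on the laptop's local NPU pipeline."""
    session = {"transcription": "laptop (local NPU pipeline)"}
    for role, needed in SESSION_ROLES.items():
        for dev in devices:
            if needed in dev.capabilities:
                session[role] = dev.name
                break
    return session
```

Roles with no matching device are simply absent from the session — the laptop degrades to its own hardware rather than relaying through a cloud service.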
The deeper technical ambition here is more significant than the meeting room demo suggests. HP is building toward a model where the intelligence layer on a user's laptop can serve as the orchestration hub for an ecosystem of HP devices — not just PCs and peripherals, but printers, displays, workstations, and eventually whatever HP-branded hardware becomes relevant in hybrid work environments. The local model becomes the persistent context engine that knows about the user's work regardless of which device they are operating at any given moment.
This is HP's answer to the ecosystem lock-in that Apple has built with Handoff, Continuity, and the broader Apple Intelligence integration across iPhone, iPad, and Mac. Where Apple's ecosystem requires the cloud (or at least Apple's servers) to coordinate, HP is betting that local proximity protocols and on-device inference can replicate that seamless handoff experience within an enterprise context — without the compliance headaches that come with routing work context through consumer cloud infrastructure.
The Competitive Landscape
HP IQ enters a market that every major PC manufacturer and every major cloud AI provider is competing for simultaneously, which makes the on-device-first positioning meaningful rather than merely defensive.
Microsoft's Copilot+ PC initiative, launched in 2024, established the marketing category of AI PCs with neural processing hardware. But Microsoft's AI features are substantially cloud-dependent. Copilot itself routes through Microsoft's Azure OpenAI Service. The local NPU is used primarily for specific tasks like real-time video enhancement and Windows Studio Effects — not for running a general-purpose language model locally. Microsoft has been deliberately cautious about fully local LLM deployment, partly to protect Azure revenue and partly because local inference at meaningful model sizes remained an engineering challenge.
Dell's AI PC strategy has leaned into the hardware specs — NPU TOPS ratings, memory bandwidth, thermal design — without yet delivering a coherent on-device AI software story comparable to what HP is announcing. Lenovo's ThinkPad AI initiative has enterprise security integrations but has similarly not committed to a locally-running LLM of HP IQ's scale.
On the pure software side, Google's Gemini Nano is running on-device on Pixel phones and is beginning to appear in Chromebook configurations, but at a fraction of HP IQ's parameter count. Mistral has been pushing its smaller models as enterprise-deployable on-device options, and companies like Ollama have built developer ecosystems around running open-weight models locally — but none of these are integrated into a commercial enterprise PC at the hardware level the way HP IQ is.
The closest analog is Apple Intelligence on Apple Silicon Macs. Apple runs a local 3B parameter model for on-device tasks and routes heavier requests to Private Cloud Compute — a privacy-preserving cloud architecture Apple has detailed extensively. Apple's on-device model is substantially smaller than HP's 20B, but Apple's integrated silicon gives it efficiency advantages that make direct parameter count comparisons misleading. The more relevant distinction is the enterprise software story: Apple Intelligence is designed around consumer workflows, while HP IQ is built from the ground up for IT-managed enterprise environments.
What Enterprise IT Should Know
For enterprise IT organizations evaluating HP IQ, several practical considerations will determine whether this is a meaningful deployment or a footnote.
Hardware requirements will gate adoption aggressively in the near term. The 20B parameter local model requires the NPU and memory headroom found in HP's AI PC portfolio — devices built on 2024 and 2025 silicon, not the installed base most enterprises have been running for the past three years. Early access is limited to select AI PCs, and full rollout targets Fall 2026. Organizations with hardware refresh cycles aligned to that window are well-positioned; those with 2022-2023 hardware that has not yet cycled are not.
Management integration is the critical enterprise enabler. HP's framing — that HP IQ can be administered through existing endpoint management infrastructure — needs to be validated in practice. The specifics of how IT teams will control model update cadences, configure cloud escalation policies, audit what tasks the local model is performing, and integrate HP IQ's outputs with existing SIEM and DLP tooling will determine whether large enterprise deployments are feasible or require significant additional integration work.
The OpenAI model licensing arrangement introduces a dependency worth noting. HP IQ's base model is built on OpenAI's gpt-oss-20b. That is an unusual architecture choice — most on-device deployments use fully open-weight models like Llama or Mistral variants specifically to avoid licensing dependencies. HP presumably has an enterprise licensing agreement covering the commercial deployment of gpt-oss-20b on HP hardware, but the terms of that arrangement, including how model updates are handled as OpenAI improves the underlying model, are not yet public. Enterprise procurement teams will want clarity on this dependency before committing to HP IQ at scale.
The proximity-based multi-device features are compelling but require HP hardware throughout the environment to deliver their full value. Organizations with mixed-vendor hardware ecosystems — which is most large enterprises — will get a partial implementation of the vision HP is pitching at Imagine 2026.
Why This Moment Is Right for On-Device at Scale
The timing of HP IQ is not accidental. Several converging forces have made Spring 2026 the moment when a locally-running 20B parameter enterprise AI system becomes commercially viable rather than aspirational.
NPU silicon maturity is the enabling factor. The Qualcomm Snapdragon X Elite, Intel Core Ultra 200V, and AMD Ryzen AI 300 chips shipping in HP's current AI PC lineup represent the third generation of silicon built with AI workloads as a primary design consideration. Their NPU performance is roughly 10x what shipped in enterprise laptops two years ago, and the software toolchains for deploying quantized LLMs on these NPUs — ONNX Runtime, Qualcomm's AI Hub, Intel's OpenVINO — have matured to the point where hardware vendors can build reliable production deployments on top of them.
Model compression research has tracked the hardware curve almost exactly. Quantization techniques that reduce 20B parameter models to practical on-device sizes without significant quality degradation have advanced rapidly. The combination of GPTQ, AWQ, and more recent post-training quantization methods means that gpt-oss-20b deployed at 4-bit quantization on an HP AI PC can deliver inference quality that would have required a 7B or 8B parameter model at higher precision two years ago. The effective intelligence-per-watt of on-device inference has improved substantially faster than raw TOPS numbers alone suggest.
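GPTQ and AWQ add error-correcting machinery on top, but the core post-training idea — store low-bit integers plus a per-group scale — can be shown with plain round-to-nearest symmetric 4-bit quantization (a sketch, not any vendor's pipeline):

```python
def quantize_4bit(weights: list, group_size: int = 4):
    """Symmetric round-to-nearest 4-bit quantization with per-group scales.
    Each group stores one float scale plus integers clamped to [-7, 7]."""
    groups = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid scale = 0
        q = [max(-7, min(7, round(w / scale))) for w in group]
        groups.append((scale, q))
    return groups

def dequantize(groups) -> list:
    """Recover approximate weights: integer times its group's scale."""
    return [scale * q for scale, qs in groups for q in qs]
```

The worst-case per-weight error is half the group scale; GPTQ and AWQ earn their keep by choosing rounding and scales that minimize the *output* error of each layer rather than the per-weight error, which is what makes 4-bit viable at 20B scale.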
Enterprise AI anxiety has been building for three years without resolution. Cloud AI service providers have made significant commitments around enterprise data governance — Microsoft's data processing addendums, Google's Vertex AI enterprise agreements, Anthropic's API terms — but the fundamental architecture of cloud inference means that data leaves the enterprise network. As AI capability has expanded and enterprises have wanted to apply AI to more sensitive workloads, the governance gap has widened rather than narrowed. HP IQ addresses that gap structurally, not contractually.
What Comes Next
HP's roadmap from Spring 2026 early access to Fall 2026 full launch to broader 2H 2026 portfolio rollout covers roughly six months of market development — a compressed timeline for an enterprise software product of this complexity.
The early access phase will be critical for establishing whether HP IQ's local inference performance is fast enough for the use cases HP is marketing. Document summarization, meeting intelligence, and device diagnostics are all async workflows where a few seconds of inference latency is invisible to the user. If HP pushes HP IQ toward real-time use cases — a conversational assistant, a code completion tool, a live document editor — the performance characteristics of local inference at 20B parameters will become much more visible.
The model update question will surface quickly. OpenAI improves its models continuously, and enterprises will want HP IQ's base model to improve over time as well. Whether HP ships model updates through standard Windows Update channels, through HP's own device management platform, or through some other mechanism — and how frequently those updates ship — will determine whether HP IQ maintains a competitive capability level relative to cloud alternatives that update transparently and continuously.
The broader HP portfolio rollout in 2H 2026 will test whether the HP AI PC hardware base is large enough to make HP IQ a meaningful market position rather than a premium differentiator for a small device cohort. HP ships tens of millions of PCs annually. If HP IQ reaches even a fraction of that installed base with strong enterprise uptake, the scale of locally-deployed AI inference it represents would be significant — both for HP's competitive position and for the broader industry question of whether the cloud remains the default inference venue for enterprise AI.
The announcement at Imagine 2026 is HP's clearest statement yet that it believes the answer to that question is no.
FAQ
What is HP IQ and how does it differ from existing AI assistants on PCs?
HP IQ is a workplace intelligence layer built into HP devices that runs a 20 billion parameter language model locally on the device, rather than routing queries to cloud servers. Unlike Microsoft's Copilot+ features, which rely primarily on Azure OpenAI Service for language model inference, HP IQ performs its core AI tasks on the device's neural processing unit using a locally-stored model based on OpenAI's gpt-oss-20b. The result is that sensitive work data — documents, meeting notes, code — can be processed by AI without leaving the machine.
Which devices will support HP IQ?
Early access begins Spring 2026 on select HP AI PCs equipped with Qualcomm Snapdragon X Elite, Intel Core Ultra 200V series, or AMD Ryzen AI 300 series processors with integrated NPUs. Full shipping is targeted for Fall 2026, with a broader rollout across the HP portfolio in the second half of 2026. Older HP hardware without dedicated NPU silicon will not support the full HP IQ feature set.
Why is a 20B parameter model significant for on-device AI?
Most on-device AI deployments to date have used models in the 1B-8B parameter range, with Apple Intelligence's primary on-device model estimated at approximately 3B parameters. HP IQ's 20B parameter local model is substantially more capable at complex reasoning, document analysis, and multi-step task execution. The tradeoff is higher hardware requirements — the model requires a capable NPU and sufficient system memory, which is why it is limited to HP's AI PC hardware tier.
How does HP IQ handle privacy and enterprise data governance?
HP IQ is designed with local-by-default inference, meaning AI processing happens on the device rather than in the cloud. For enterprise deployments, IT administrators can configure whether and when the system is permitted to escalate queries to cloud services, what data types can leave the device, and how the model integrates with existing data loss prevention and endpoint management infrastructure. For regulated industries operating under HIPAA, FINRA, or similar data governance requirements, this architecture makes a broad category of AI-assisted tasks compliant by default.
What is the relationship between HP IQ and OpenAI?
HP IQ's base model is built on gpt-oss-20b, a model from OpenAI's open-weight model series that HP has licensed for commercial deployment on HP hardware. HP has optimized the model for on-device inference using quantization techniques that reduce its memory footprint while preserving most of its capability. The specific commercial terms of HP's licensing arrangement with OpenAI for enterprise deployment of this model have not been publicly disclosed.