Every AI conversation eventually hits the hardware layer. Someone asks "what GPU do I need?" and the answer used to be simple: NVIDIA, pick a tier, done. In 2026 that answer is incomplete. There are five distinct chip categories handling AI work, each optimized for a different part of the pipeline, and picking the wrong one wastes either money or performance.
Here's what each one actually does.
GPU: the training workhorse
Graphics Processing Units handle the parallel math that neural network training depends on. NVIDIA's H100 and the newer B200 dominate this space. The CUDA software ecosystem is the real moat, as nearly every ML framework, training pipeline, and optimization tool was built to run on CUDA first.
For training large models from scratch, GPUs remain the only practical option. The hardware is expensive (H100 cloud instances run $2-4 per hour), but the software tooling is mature, debuggable, and well-documented. If you're fine-tuning models or running training jobs, this is still where you land.
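Here's a minimal sketch of where the GPU actually sits in a training loop, using PyTorch. The model and data are toy placeholders, but the device-selection pattern is the standard one:

```python
import torch

# Pick the best available device; fall back to CPU on machines
# without an NVIDIA GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step on dummy data: the forward pass, backward pass,
# and optimizer update all run on-device.
x = torch.randn(32, 1024, device=device)
loss = model(x).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```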
For inference (running a trained model to get predictions), GPUs work but they're increasingly over-specified. Paying for training-grade hardware to serve predictions is like renting a moving truck for a grocery run.
NPU: the laptop chip
Neural Processing Units are dedicated AI accelerators built into consumer CPUs. They're designed for on-device inference: running small models locally without touching the cloud.
The 2026 laptop NPU landscape:
| Chip | TOPS | Best for |
|---|---|---|
| Qualcomm X2 | 85 | Highest raw NPU throughput, Arm-native apps |
| AMD XDNA | 60 | x86 compatibility, broad software support |
| Intel Lunar Lake | 48 | OpenVINO ecosystem, developer tooling |
| Apple M4 | 38 | Unified memory architecture, macOS-native |
TOPS (tera operations per second) is the headline number, but it's not the whole story. Qualcomm leads on raw throughput but requires Arm-native applications. AMD and Intel run on x86, which means better compatibility with existing Windows and Linux development tools. Apple's lower TOPS number is partially offset by unified memory, which eliminates the data-transfer bottleneck between CPU and NPU.
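To make a TOPS number concrete, here's a rough back-of-envelope conversion to token throughput. The 2-operations-per-weight-per-token rule is the standard approximation for transformer inference; the 3B parameter count is an assumed example, and real throughput lands well below peak because memory bandwidth dominates:

```python
# Back-of-envelope: peak tokens/sec from a TOPS rating.
# Assumes the NPU sustains its peak rate, which real workloads don't.
tops = 48                    # Intel Lunar Lake's headline rating
params = 3e9                 # hypothetical 3B-parameter local model
ops_per_token = 2 * params   # ~2 operations per weight per generated token

tokens_per_sec = (tops * 1e12) / ops_per_token
print(f"~{tokens_per_sec:,.0f} tokens/sec at theoretical peak")  # ~8,000
```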
For local AI on your development machine (running small language models, image recognition, voice transcription without a cloud API), NPUs are the right chip. They use a fraction of the power that a discrete GPU would, which matters for battery life and thermal management on laptops.
Intel's OpenVINO toolkit has matured significantly in 2026, making it the strongest developer ecosystem for NPU-targeted inference. If you're building applications that need to run on user devices without cloud dependency, OpenVINO is the framework to learn.
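A minimal sketch of what NPU-targeted inference looks like with OpenVINO, assuming you've already converted a model to OpenVINO's IR format ("model.xml" below is a placeholder path) and that the dummy input shape matches it:

```python
import numpy as np
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'] on a recent laptop

# "model.xml" is a placeholder for a model you've converted to IR format.
model = core.read_model("model.xml")
device = "NPU" if "NPU" in core.available_devices else "CPU"
compiled = core.compile_model(model, device)

# Dummy input; the shape is an assumption for a typical vision model.
x = np.zeros((1, 3, 224, 224), dtype=np.float32)
result = compiled(x)
```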
TPU: Google's inference engine
Tensor Processing Units are Google's custom ASICs, available only through Google Cloud. They're optimized for both training and inference on TensorFlow and JAX workloads. TPU v5e instances are cost-competitive with NVIDIA GPUs for inference at scale, and Google uses them to serve Gemini internally.
The limitation is ecosystem lock-in. Your code has to target the XLA-based TPU runtime, which in practice means TensorFlow or JAX. If you're already on Google Cloud running those frameworks, TPUs can cut your inference costs. If you're on any other cloud or standardized on PyTorch, they're not a practical option.
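On a Cloud TPU VM with JAX installed, the hardware is nearly invisible in the code; the same jit-compiled function runs on TPU, GPU, or CPU depending on what `jax.devices()` reports. A minimal sketch:

```python
import jax
import jax.numpy as jnp

print(jax.devices())  # on a TPU VM: [TpuDevice(id=0), ...]

@jax.jit  # XLA compiles this for whichever backend is present
def predict(w, x):
    return jnp.dot(x, w)

w = jnp.ones((1024, 1024))
x = jnp.ones((8, 1024))
print(predict(w, x).shape)  # (8, 1024)
```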
Intel Gaudi: the budget data-center play
Intel's Habana Gaudi accelerators (Gaudi 2 and Gaudi 3) target data-center AI workloads at a lower price point than NVIDIA GPUs. Intel claims Gaudi 3 can outperform the H100 on longer-output LLM inference in some configurations.
The real positioning is as a "good enough for cheaper" option. For enterprise ML tasks like recommendation systems, NLP inference, and batch processing, Gaudi can deliver comparable performance at a lower cost per inference. The software stack is less mature than CUDA, but for organizations willing to invest in the tooling, the cost savings are substantial.
A paper debunking the "CUDA myth" showed that Gaudi accelerators (the paper calls them NPUs, though they're data-center parts, not the laptop chips above) can match GPU performance for AI model serving when the software is properly optimized. The gap isn't in hardware capability; it's in ecosystem maturity and developer familiarity.
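For a sense of what that tooling investment looks like, here's a minimal sketch of a PyTorch forward pass on a Gaudi device. It assumes Intel's Gaudi software stack (the `habana_frameworks` PyTorch bridge) is installed, which is exactly the piece you don't get for free relative to CUDA:

```python
import torch
import habana_frameworks.torch.core as htcore  # Intel's Gaudi PyTorch bridge

device = torch.device("hpu")  # Gaudi devices register as "hpu" in PyTorch

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(32, 1024, device=device)

out = model(x)
htcore.mark_step()  # in lazy mode, flushes the accumulated graph to the device
print(out.shape)
```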
ASICs and FPGAs: the specialist tier
Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs) handle niche AI workloads where power efficiency or latency requirements are extreme. Self-driving cars, real-time video processing, and edge sensor networks use custom ASICs because general-purpose chips can't meet the power or latency constraints.
For most developers and small businesses, this tier doesn't apply. It's mentioned here for completeness, and because the trend toward heterogeneous architectures (mixing chip types within a single pipeline) means these components show up in the systems you depend on even if you never configure one directly.
What this means for your AI tool stack
If you're running Claude Code, Codex, or Gemini CLI on your laptop, the cloud handles the heavy lifting. Your local hardware barely matters. But if you're:
- Self-hosting open-weight models (DeepSeek V4, Kimi K2.6, Llama): GPU is the default. Consumer cards like the RTX 4090 handle quantized models up to 30-40B parameters; the sizing sketch after this list shows the math. Larger models mean cloud GPU instances, or Gaudi if cost is the priority.
- Running local inference on a laptop (Whisper, small Llama variants, code completion): your NPU matters. Check your laptop's TOPS rating and pick a model that fits within its throughput.
- Serving inference at scale (production API, thousands of users): TPU if you're on Google Cloud with TF/JAX. Gaudi for cost-sensitive inference on Intel hardware. NVIDIA for everything else.
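For the self-hosting case above, the sizing math is simple enough to sketch. The 20% overhead factor for activations and KV cache is a ballpark assumption, not a measured constant:

```python
def vram_gb(params_billion: float, bytes_per_param: float,
            overhead: float = 1.2) -> float:
    """Rough VRAM needed for inference: weights plus ~20% headroom
    for activations and KV cache (the overhead is an assumption)."""
    return params_billion * bytes_per_param * overhead

print(vram_gb(30, 2.0))  # fp16:  ~72 GB -> multi-GPU or cloud territory
print(vram_gb(30, 0.5))  # 4-bit: ~18 GB -> fits a 24 GB RTX 4090
```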
The right chip depends on where in the pipeline you're working. Training is still GPU territory. Inference is where the alternatives compete. Edge and local inference is where NPUs matter most, and where they'll keep gaining ground.
If you're building a web-based business and want the whole map of how these infrastructure pieces fit together, The $97 Launch covers the stack from hosting to deployment to cost management. Search "The $97 Launch" on Amazon Kindle.
Related reading
- DeepSeek V4 vs Kimi K2.6 vs Claude vs GPT — the models that run on this hardware
- Top AI CLIs and how to use them with our generators — the CLI tools that sit on top of the hardware layer
- Static site generators in 2026 — the build layer that sits above hosting and inference
Fact-check notes and sources
- NPU TOPS ratings (Qualcomm X2: 85, AMD XDNA: 60, Intel Lunar Lake: 48, Apple M4: 38): localaimaster.com and compute-market.com.
- Intel Gaudi 3 vs H100 inference claims: xpu.pub and arxiv.org/html/2501.00210v1.
- CUDA myth debunking paper: arxiv.org/html/2501.00210v1 — "Debunking the CUDA Myth: Evaluation of Intel's Gaudi NPU for AI Model Serving."
- TPU v5e and Google Cloud availability: cloud.google.com.
- OpenVINO maturity in 2026: localaimaster.com.
This post is informational, not hardware-purchasing advice. Mentions of NVIDIA, Intel, AMD, Qualcomm, Apple, and Google are nominative fair use. No affiliation is implied.