Today’s AI is dominated by five hardware architectures, each making different trade-offs between flexibility, parallelism, and memory access.


CPU: A general-purpose design with a few powerful cores that excel at complex logic, branch-heavy control flow, and system-level tasks. Deep caches sit in front of off-chip DRAM (main memory), which suits operating systems, databases, and similar workloads, but the CPU is less efficient at the repetitive matrix multiplications neural networks demand.
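Those "repetitive matrix multiplications" boil down to one pattern: long chains of multiply-accumulate (MAC) operations. A minimal sketch of what a single CPU core must grind through serially, one MAC per step (the function and matrices here are illustrative, not taken from any library):

```python
def matmul(a, b):
    """Naive matrix multiply: every output element is a chain of MACs,
    executed one at a time on a single CPU core."""
    n, k = len(a), len(b)
    m = len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate
            out[i][j] = acc
    return out

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

Every architecture below is, in one way or another, an attempt to parallelize exactly this inner loop.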
GPU: Instead of a few powerful cores, thousands of smaller cores execute the same instruction simultaneously on different data (SIMD-style parallelism). This maps directly onto the mathematics of neural networks, which is why GPUs dominate AI training.
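A toy illustration of the SIMD model, under the simplification that each "lane" stands in for one GPU core: a single instruction is issued once and every lane applies it to its own element in lockstep (the names here are made up for the sketch):

```python
def simd_op(op, lanes_a, lanes_b):
    """Broadcast one instruction across all lanes at once.
    Each lane plays the role of one GPU core."""
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]

activations = [1.0, 2.0, 3.0, 4.0]
weights     = [0.5, 0.5, 0.5, 0.5]

# One 'instruction' (multiply), issued once, applied to every lane.
products = simd_op(lambda a, b: a * b, activations, weights)
print(products)  # [0.5, 1.0, 1.5, 2.0]
```

The speedup comes from width, not cleverness: the same multiply that a CPU core issues four times is issued once across four (or four thousand) lanes.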
TPU (designed by Google): A further specialization. Its core is a systolic array, a grid of multiply-accumulate (MAC) units through which data flows in a "wave": weights enter from one side, activations from the other, and partial results propagate directly from unit to unit without being written back to memory at each step. The entire execution is orchestrated by a compiler rather than hardware scheduling, optimized specifically for neural-network workloads.
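The key property of the wave is that the partial sum travels with the data instead of round-tripping through memory. A logical sketch of a weight-stationary systolic row (this abstracts away the cycle-by-cycle timing; the function is illustrative):

```python
def systolic_matvec(W, x):
    """Weight-stationary systolic sketch: each processing element (PE)
    holds one weight. Activations stream across each row, and the partial
    sum is handed from PE to PE, never touching memory until it exits
    at the edge of the array."""
    y = []
    for row in W:
        acc = 0.0                 # partial sum entering the row
        for w, a in zip(row, x):
            acc = acc + w * a     # one MAC inside one PE, result passed on
        y.append(acc)             # result exits at the array edge
    return y

print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))  # [17.0, 39.0]
```

In real hardware all rows (and diagonals) are active at once, so an N×N array completes N² MACs per cycle with no intermediate memory traffic.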
NPU (Neural Processing Unit): A version optimized for edge devices. It builds in a neural compute engine (large MAC arrays plus on-chip SRAM) but pairs it with low-power system memory instead of high-bandwidth HBM. The goal is to run inference at single-digit wattage on smartphones, wearables, and IoT devices (Apple's Neural Engine and Intel's NPU fall into this category).
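One common way edge silicon hits single-digit wattage is low-precision integer math: quantize weights and activations to int8, run all-integer MACs, and dequantize once at the end. A minimal sketch, with the scale values chosen arbitrarily for illustration:

```python
def quantize(vals, scale):
    """Map floats to int8 with a simple symmetric scheme."""
    return [max(-128, min(127, round(v / scale))) for v in vals]

def int8_dot(a_f, b_f, a_scale=0.1, b_scale=0.1):
    """All-integer MAC loop, one dequantization at the end.
    Integer MACs are far cheaper in silicon area and energy than
    floating-point ones."""
    a_q = quantize(a_f, a_scale)
    b_q = quantize(b_f, b_scale)
    acc = sum(x * y for x, y in zip(a_q, b_q))  # integer multiply-accumulates
    return acc * a_scale * b_scale              # back to float, once

print(int8_dot([0.5, 1.0], [2.0, 3.0]))  # ~4.0, matching 0.5*2.0 + 1.0*3.0
```

The trade-off is a small quantization error in exchange for a large drop in power and memory bandwidth, which is exactly the edge-inference bargain.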
LPU (Language Processing Unit, introduced by Groq): The newest member. It eliminates off-chip memory entirely: all weights live in on-chip SRAM, and execution is fully deterministic, scheduled by the compiler, with no cache misses and no runtime scheduling overhead. The trade-off is limited on-chip capacity, so serving a large model requires interconnecting hundreds of chips, but the latency advantage is substantial.
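A back-of-envelope calculation shows why "hundreds of chips" follows from the all-SRAM design. The 230 MB-per-chip figure and model sizes below are illustrative assumptions, not vendor specifications:

```python
import math

SRAM_PER_CHIP_MB = 230  # assumed on-chip SRAM per LPU chip (illustrative)

def chips_needed(params_billions, bytes_per_param):
    """Minimum chips required to hold all weights in on-chip SRAM."""
    weight_mb = params_billions * 1e9 * bytes_per_param / 1e6
    return math.ceil(weight_mb / SRAM_PER_CHIP_MB)

for params in (7, 70):
    print(f"{params}B params @ int8: {chips_needed(params, 1)} chips")
```

Under these assumptions a 7B-parameter model at int8 already needs a few dozen chips and a 70B model roughly three hundred, consistent with the "hundreds of chips" trade-off, while every weight stays one SRAM access away from the MAC units.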