Implementing a Transformer as a pure hardware circuit: 50k tokens per second without a GPU

CryptoWorld News: Developers Luthira Abeykoon and Krish Chhajer have ported Karpathy's MicroGPT (only 4,192 parameters) to an FPGA in SystemVerilog, reaching generation speeds of over 50k tokens per second. The project, Talos-V2 (Tensor Accelerated Logic for On-Chip Systems), is open source on GitHub and runs on a DE1-SoC, an educational-grade Intel Cyclone V FPGA, with the weights stored in on-chip ROM in Q4.12 fixed-point format. The model's matrix-vector multiplication is implemented as a 16-channel systolic array; the Q/K/V projections, the MLP, and the LM head share this unit and run sequentially. The attention mechanism is implemented as a sequence of eight steps. The authors say the project aims to turn each step of Transformer inference into visible hardware: memories, counters, state machines, and lookup tables.
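To make the Q4.12 format concrete, here is a minimal software sketch of the arithmetic the article describes: 16-bit signed values with 4 integer and 12 fractional bits, and a row-wise multiply-accumulate, which is the operation a systolic array pipelines in hardware. The function names and values are illustrative, not taken from the Talos-V2 source.

```python
SCALE = 1 << 12                                # Q4.12: 12 fractional bits
INT_MIN, INT_MAX = -(1 << 15), (1 << 15) - 1   # 16-bit signed range

def to_q412(x: float) -> int:
    """Quantize a float to Q4.12, saturating at the 16-bit signed limits."""
    return max(INT_MIN, min(INT_MAX, round(x * SCALE)))

def from_q412(q: int) -> float:
    """Convert a Q4.12 integer back to a float."""
    return q / SCALE

def q412_mul(a: int, b: int) -> int:
    """Fixed-point multiply: the wide product is shifted back by 12 bits."""
    return (a * b) >> 12

def matvec_q412(W, x):
    """Row-wise multiply-accumulate over Q4.12 values -- the core
    matrix-vector operation that the 16-channel systolic array performs,
    and that Q/K/V, MLP, and LM-head projections would share."""
    return [sum(q412_mul(w, xi) for w, xi in zip(row, x)) for row in W]

# Hypothetical 2x2 weight matrix and input vector for illustration.
W = [[to_q412(0.5), to_q412(-1.25)],
     [to_q412(2.0), to_q412(0.75)]]
x = [to_q412(1.0), to_q412(2.0)]
y = [from_q412(v) for v in matvec_q412(W, x)]
# y == [-2.0, 3.5], matching 0.5*1.0 - 1.25*2.0 and 2.0*1.0 + 0.75*2.0
```

In an FPGA, each `q412_mul` would map to a hardened DSP multiplier and the shift to simple wiring, which is what makes this format cheap compared with floating point.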
