gateGPT Review: A Full Transformer Built from Scratch in Hardware

Review of gateGPT

★ 4/5 · Updated 2026-06-17


What is gateGPT?

gateGPT is an extraordinary educational project: a full Transformer architecture implemented in RTL (register-transfer-level hardware description) that runs in simulation or on an FPGA. It's not a production tool; it's a 'what if we built GPT from logic gates up' demo. If you're interested in AI hardware or just want to understand Transformers at the lowest level, this is gold.

The problem it solves

Most AI engineers treat the Transformer as a black box. You call `torch.nn.Transformer`, you train it, you deploy it. The hardware underneath is somebody else's problem. gateGPT takes the opposite approach: it implements a full Transformer (the microGPT variant) directly in RTL, the register-transfer-level description that chip designers write in languages such as Verilog and VHDL. The result is a working GPT that runs as digital logic (in simulation or on an FPGA) rather than on a GPU. It's slow, it's tiny, and it's one of the most educational AI projects on GitHub in 2026.

What you actually get

A working Transformer that generates names (the canonical microGPT demo task). Verilog and VHDL source code. Simulation support for Icarus Verilog, Vivado, or ModelSim. A testbench with sample inputs and expected outputs. Documentation that walks through every gate. It's not ChatGPT, but watching a Transformer generate names from raw logic gates is a humbling experience.

How it works

The repo implements the microGPT architecture: a single-layer Transformer with 4 attention heads, 64-dimensional embeddings, and a context window of 8 tokens. The total parameter count is about 50,000. The entire model is implemented in 1,200 lines of Verilog. The design uses fixed-point arithmetic (8-bit weights, 16-bit activations) to keep the hardware simple. A single forward pass takes 10,000 clock cycles. On a 100MHz FPGA, that's 0.1ms per token.
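The headline numbers can be sanity-checked with a little arithmetic. The Python sketch below assumes a 27-token vocabulary (26 letters plus one special token), a 4x MLP expansion, no bias terms and an untied output projection; the review states none of these, so read the breakdown as an estimate rather than gateGPT's actual parameter layout.

```python
# Back-of-envelope check of the microGPT figures quoted in this review.
# ASSUMPTIONS (not confirmed by the gateGPT repo): 27-token vocabulary,
# MLP hidden size of 4 * D_MODEL, no biases, untied output projection.

D_MODEL = 64               # embedding width (from the review)
CONTEXT = 8                # context window in tokens (from the review)
VOCAB = 27                 # assumption: a-z plus one special token
MLP_HIDDEN = 4 * D_MODEL   # assumption: standard 4x expansion


def param_count() -> dict[str, int]:
    """Approximate parameter count per component of a one-layer model."""
    return {
        "token_embedding": VOCAB * D_MODEL,
        "position_embedding": CONTEXT * D_MODEL,
        # Q, K, V and output projections. The 4 heads split D_MODEL between
        # them, so the head count does not change this total.
        "attention": 4 * D_MODEL * D_MODEL,
        "mlp": D_MODEL * MLP_HIDDEN + MLP_HIDDEN * D_MODEL,
        "output_head": D_MODEL * VOCAB,
    }


def latency_ms(cycles: int, clock_hz: float) -> float:
    """Time for one forward pass, in milliseconds."""
    return cycles / clock_hz * 1e3


if __name__ == "__main__":
    counts = param_count()
    for name, n in counts.items():
        print(f"{name:>18}: {n:,}")
    print(f"{'total':>18}: {sum(counts.values()):,}")
    ms = latency_ms(10_000, 100e6)   # 10,000 cycles at 100 MHz
    print(f"latency: {ms:.3f} ms/token ({1e3 / ms:,.0f} tokens/s)")
```

Under those assumptions the total lands around 53,000, consistent with "about 50,000", and the latency is simply 10,000 cycles divided by 100 million cycles per second.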

What you learn

By working through gateGPT, you learn: how matrix multiplication is implemented in hardware (multiply-accumulate units built from shifters and adders), how attention is implemented (comparators, multipliers, accumulators), how layer normalization works (subtract the mean, divide by the standard deviation), how token embeddings are looked up (ROM), and how the output projection works (another matrix multiply). You also learn about the practical constraints of hardware: limited precision, fixed clock cycles, no dynamic memory allocation. The same constraints shape how GPUs and dedicated AI accelerators are designed.
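To make the shift-and-add idea concrete, here is a minimal Python model of a sequential shift-and-add multiplier feeding a multiply-accumulate loop. The widths (signed 8-bit weights, signed 16-bit activations) follow the review; the wide accumulator, the rescaling by an arithmetic right shift, the saturation back to 16 bits and all helper names are assumptions for illustration, not gateGPT's actual Verilog.

```python
# Sketch: a multiplier built only from shifts and adds, and a matrix-vector
# product expressed as a chain of multiply-accumulates (MACs).
# ASSUMPTIONS: int8 weights, int16 activations, an unbounded accumulator,
# rescaling by arithmetic right shift, then saturation to 16 bits.


def shift_add_mul(a: int, b: int) -> int:
    """Signed multiply using shifts and adds, like a sequential
    shift-and-add multiplier, with a sign fix-up at the end."""
    negative = (a < 0) != (b < 0)
    a, b = abs(a), abs(b)
    product = 0
    while b:
        if b & 1:          # multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1            # shift multiplicand left by one bit
        b >>= 1            # move on to the next multiplier bit
    return -product if negative else product


def saturate(x: int, bits: int) -> int:
    """Clamp x to the signed range of a `bits`-wide register."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


def matvec_int(weights: list[list[int]], x: list[int], shift: int) -> list[int]:
    """y = W @ x: one accumulator per output row, rescaled and saturated."""
    out = []
    for row in weights:
        acc = 0                                   # accumulator register
        for w, a in zip(row, x):
            assert -128 <= w <= 127 and -32768 <= a <= 32767
            acc += shift_add_mul(w, a)            # one MAC step
        out.append(saturate(acc >> shift, 16))    # >> floors, like an arithmetic shift
    return out


if __name__ == "__main__":
    print(matvec_int([[3, -2], [127, -128]], [1000, 250], shift=4))
```

Running it prints `[156, 5937]`. In silicon, the loop inside `shift_add_mul` becomes either one adder reused over several clock cycles or a tree of adders that finishes in a single cycle, which is a classic area-versus-latency trade-off.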

Who should care

Hardware engineers exploring AI accelerators. AI engineers who want to understand what their models are actually doing on the metal. Educators teaching Transformers or computer architecture. Curious hackers who want to see 'the bottom' of the AI stack. Students learning about AI hardware. Anyone who has ever wondered 'what is a tensor core, really?'.

Installation and usage

Clone the repo: `git clone https://github.com/fguzman82/gateGPT`. Install Icarus Verilog (free, runs on macOS, Linux and Windows). From the repo root, run the simulation: `cd sim && iverilog -o test testbench.v gateGPT.v && vvp test`. You'll see the model generate 5 names, one token at a time. To run on real hardware, you'll need an FPGA board (inexpensive boards start around $50) and a bitstream built for that board's FPGA; the repo includes a bitstream for the iCE40 FPGA.

Comparison with alternatives

tinygrad: software-only, but teaches the same concepts at a higher level. Building GPT from scratch (Andrej Karpathy's videos): software, but teaches the math. Verilog AI tutorials: scattered, not as cohesive. Open-source AI accelerators (Gemmini, Scale-Out): production-scale, much more complex. gateGPT is the only project I know of that walks through every gate of a real Transformer. As a learning artifact, it's unmatched.

Educational value

I spent two weekends with gateGPT. Here's what I learned: how a matrix multiply maps to shift-and-add operations, why fixed-point arithmetic matters for inference, how the softmax function is approximated in hardware, why attention becomes the most expensive operation as the context grows, and why modern GPUs have tensor cores. The 'aha' moment was when I realized that a tensor core is essentially an array of multiply-accumulate circuits that computes a small matrix product in parallel. gateGPT made that concrete.
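One common way to approximate softmax in hardware is sketched below: subtract the row maximum, rewrite e^x as 2^(x * log2 e), turn the integer part of the exponent into a right shift, and approximate the fractional part with 2^f ~= 1 + f. This is a generic technique, not necessarily the approximation gateGPT uses, and the Q8.8 input and Q0.16 output formats are assumptions.

```python
# Hardware-friendly softmax sketch: subtract the max, rewrite e^x as
# 2^(x * log2(e)), use a right shift for the integer part of the exponent
# and the linear approximation 2^f ~= 1 + f for the fractional part.
# ASSUMPTIONS: Q8.8 input scores (value = raw / 256), Q0.16 outputs
# (65536 == 1.0). A generic technique, not necessarily gateGPT's.
import math

FRAC_BITS = 8
ONE = 1 << FRAC_BITS                          # 1.0 in Q8.8
LOG2E_Q8 = round(math.log2(math.e) * ONE)     # log2(e) ~= 1.4427 -> 369


def exp2_approx(x_q8: int) -> int:
    """Approximate 2^x for x <= 0 given in Q8.8; returns Q0.16."""
    i = x_q8 >> FRAC_BITS        # integer part: floor(x), always <= 0 here
    f = x_q8 & (ONE - 1)         # fractional part in [0, 1), as Q0.8
    mantissa = (ONE + f) << 8    # (1 + f) rescaled to 16 fraction bits
    return mantissa >> -i        # multiply by 2^i with a right shift


def softmax_q(scores_q8: list[int]) -> list[int]:
    """Approximate softmax of Q8.8 scores, returning Q0.16 probabilities."""
    m = max(scores_q8)
    # (s - m) <= 0; scale by log2(e), then drop the extra 8 fraction bits
    exps = [exp2_approx(((s - m) * LOG2E_Q8) >> FRAC_BITS) for s in scores_q8]
    total = sum(exps)            # >= 65536, since the max score gives 1.0
    return [(e << 16) // total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]
    probs = softmax_q([round(s * ONE) for s in scores])
    ref = [math.exp(s) for s in scores]
    for p, r in zip(probs, ref):
        print(f"approx {p / 65536:.4f}  exact {r / sum(ref):.4f}")
```

For scores of 2.0, 1.0 and 0.0 the result stays within about 0.02 of the floating-point softmax. The only costly step left is the final division, which can be reduced to one reciprocal per row followed by multiplies.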

Limitations

Not a production tool. The microGPT generates names, not conversations. The model has 50K parameters, so it can't do anything useful beyond the demo task. The hardware is slow: 0.1 ms per token on a 100 MHz FPGA vs 0.01 ms per token on an A100 GPU. The fixed-point arithmetic loses precision: with 8-bit weights, each weight can take only one of 256 distinct values. The documentation assumes familiarity with Verilog and digital design.
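The precision limit is easy to see by quantizing a few weights. The sketch below uses symmetric per-tensor scaling with round-to-nearest, an assumption since the review doesn't describe gateGPT's weight format. This particular scheme maps the largest magnitude to plus or minus 127, so it uses 255 of the 256 available codes.

```python
# Symmetric per-tensor int8 quantization sketch.
# ASSUMPTIONS: one scale per tensor, round-to-nearest (Python's round()
# breaks exact ties to even), clamp to [-128, 127]. The review does not
# describe gateGPT's actual weight format or scaling.


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to 8-bit codes; returns (codes, scale) with w ~= code * scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0      # largest magnitude maps to +/-127
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from their 8-bit codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.8, -0.31, 0.0049, 0.0051, -1.27]
    codes, scale = quantize_int8(w)
    err = max(abs(a - b) for a, b in zip(w, dequantize(codes, scale)))
    print(codes)
    print(f"max error {err:.4f} vs half a step {scale / 2:.4f}")
```

Here 0.0049 collapses to 0 while 0.0051 rounds up to one step, and every weight lands within half a step (scale / 2) of its original value.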

Community

Tiny but high-quality: 30 contributors, 1,000+ stars, mostly hardware engineers and AI researchers. The Discord has 100+ members. The maintainer (fguzman82) is responsive and adds new sections based on community questions. The repo is a labor of love: no monetization, no roadmap, just pure education.

Pricing

Free and open source under the MIT license. The repo includes sample bitstreams for popular FPGAs (iCE40, Xilinx, Altera). The total cost of getting started: $0 if you use the simulator, $50 for a cheap FPGA board. The value is priceless if you're trying to understand AI hardware.

Pros

Unmatched educational value: understand Transformers at the gate level. Runs as pure digital logic (in simulation or on an FPGA), not on a GPU. MIT licensed: you can fork and modify freely. Active maintainer with regular updates. Well-documented with diagrams. Includes both simulation and FPGA bitstreams. Small enough to read in a weekend. The 1,200 lines of Verilog are heavily commented.

Cons

Not a production tool: microGPT generates names, not conversations. Requires hardware/simulation setup to run. Tiny community outside of niche hardware-AI circles. Documentation assumes familiarity with RTL/Verilog. Fixed-point arithmetic loses precision vs floating-point. The model is too small to be useful for real tasks.

Who should use gateGPT?

Hardware engineers exploring AI. AI engineers who want to understand what their models do on the metal. Educators teaching Transformers or computer architecture. Curious hackers who want to see the bottom of the AI stack. Students learning about AI hardware. Anyone who's ever wondered what a tensor core really is.

Bottom line

Not a production tool. Not even a useful model. But as a learning artifact, gateGPT is unmatched. Spend a weekend with it and you'll understand Transformers at a level most AI engineers never reach. If you teach AI or computer architecture, gateGPT should be required reading. The repo is small enough to read in a weekend and deep enough to teach you for years.
