Introduction
I’ve finally been able to find some time to try my hand at something I’ve wanted to build for a while: a real-time Mandelbrot renderer running entirely on an FPGA. Instead of floating-point math and shaders, this hardware perspective would allow me to build everything myself: fixed-point multipliers, squarers, escape-time logic, pixel-to-complex mapping, and eventually a VGA/HDMI output pipeline.
It’s basically turning a mathematical toy into a tiny hardware graphics engine. There are tons of software Mandelbrot explorers already (like https://math.hws.edu/eck/js/mandelbrot/MB.html), and a few FPGA attempts scattered across GitHub, but most of them are either super minimal proofs-of-concept or extremely optimized designs with almost no explanation. I wanted something in between — a clean, modular, fully understandable Mandelbrot engine that I can tweak, analyze, and eventually integrate into my larger graphics projects.
This first post focuses on the idea and sets up the groundwork:
- what the Mandelbrot set actually is
- how the escape-time algorithm works
- how to translate the math into fixed-point hardware
- possible hiccups I’m wary of running into
Later posts will get into VGA/HDMI timings, fixed-point decisions, pipelining strategies, and my full Mandelbrot core.
The Idea Behind the Mandelbrot Engine
Before touching Verilog, it’s worth understanding what the FPGA actually needs to compute. The Mandelbrot set is defined by a very simple iteration:

$$z_{n+1} = z_n^2 + c, \qquad z_0 = 0$$

Here, each pixel on the screen corresponds to a complex value $c$. To determine its color, you iterate the formula and check whether the value “escapes.”
Breaking it into real and imaginary parts
Let $z_n = x_n + i\,y_n$ and $c = c_r + i\,c_i$.

Expanding:

$$z_n^2 = (x_n + i\,y_n)^2 = x_n^2 - y_n^2 + 2i\,x_n y_n$$

So the FPGA no longer sees complex numbers; it just implements these two scalar recurrences:

$$x_{n+1} = x_n^2 - y_n^2 + c_r, \qquad y_{n+1} = 2\,x_n y_n + c_i$$

The escape condition becomes:

$$x_n^2 + y_n^2 > 4$$
That’s it.
All Mandelbrot rendering boils down to: multiplying, squaring, adding, and checking a threshold.
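To make the target concrete, here’s a simulation-only reference model of that loop in behavioral Verilog (using `real` arithmetic purely for readability; the synthesizable core will be fixed-point, as discussed below). The module and function names are just placeholders:

```verilog
// Simulation-only reference model of the escape-time loop.
// `real` arithmetic is used for clarity; the synthesizable core will use fixed-point.
module mandelbrot_ref;

    function automatic integer escape_count;
        input real    c_re, c_im;   // the pixel's c
        input integer max_iter;     // iteration cap
        real x, y, x_next;
        begin
            x = 0.0;
            y = 0.0;
            escape_count = 0;
            while ((x*x + y*y <= 4.0) && (escape_count < max_iter)) begin
                x_next       = x*x - y*y + c_re;  // x_{n+1} = x_n^2 - y_n^2 + c_r
                y            = 2.0*x*y + c_im;    // y_{n+1} = 2*x_n*y_n + c_i
                x            = x_next;
                escape_count = escape_count + 1;
            end
        end
    endfunction

    initial begin
        // Spot checks: 0+0i never escapes (hits the cap), 2+2i escapes after one step.
        $display("c = 0+0i -> %0d", escape_count(0.0, 0.0, 100));
        $display("c = 2+2i -> %0d", escape_count(2.0, 2.0, 100));
    end
endmodule
```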
Why Fixed-Point (and not floating point)
Most small FPGAs don’t have built-in floating-point hardware. So instead of floats, everything is stored in a fixed-point format like Q3.13:
- 1 sign bit
- 2 integer bits
- 13 fractional bits
This keeps the multipliers simple and deterministic. When you multiply two Q-format numbers (see the sketch after this list):
- the product grows in bit-width
- you shift right to restore the original fixed-point scale
- and the pipeline stays synchronous
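Here’s a minimal sketch of that multiply-and-rescale step for the Q3.13 format I’m leaning toward (module and signal names are placeholders):

```verilog
// Q3.13 multiply sketch: two 16-bit Q3.13 operands produce a 32-bit Q6.26
// product, and an arithmetic right-shift by 13 restores the Q3.13 scale.
module q3_13_mul (
    input  wire signed [15:0] a,   // Q3.13
    input  wire signed [15:0] b,   // Q3.13
    output wire signed [15:0] p    // Q3.13 (upper bits dropped, i.e. wraps on overflow)
);
    wire signed [31:0] full = a * b;   // Q6.26, maps onto a 16x16 -> 32 DSP multiply
    assign p = full >>> 13;            // rescale; the assignment truncates back to 16 bits
endmodule
```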
Mapping screen pixels to the complex plane
For a screen coordinate $(p_x, p_y)$, the FPGA computes:

$$c_r = x_{\min} + p_x \cdot \frac{x_{\max} - x_{\min}}{W}, \qquad c_i = y_{\min} + p_y \cdot \frac{y_{\max} - y_{\min}}{H}$$

where:
- $[x_{\min}, x_{\max}]$ is the visible horizontal range
- $[y_{\min}, y_{\max}]$ is the visible vertical range
- $W \times H$ is the screen resolution
All of these constants are stored as fixed-point values too.
Choosing these ranges matters since they determine how many integer bits the fixed-point format needs, which affects multiplier widths, DSP usage, and timing.
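As a rough sketch of that mapping in hardware: the step sizes $(x_{\max}-x_{\min})/W$ and $(y_{\max}-y_{\min})/H$ are precomputed constants, so each axis only needs one multiply and one add per pixel. The window constants below are illustrative Q3.13 values for roughly a $[-2.5, 1.0] \times [-1.3, 1.3]$ view at 640x480, not final numbers:

```verilog
// Pixel-to-complex mapping sketch (Q3.13). X_MIN/Y_MIN/X_STEP/Y_STEP are
// illustrative placeholder constants, not final values.
module pixel_to_c (
    input  wire        [9:0]  px,     // 0..639
    input  wire        [9:0]  py,     // 0..479
    output wire signed [15:0] c_re,   // Q3.13
    output wire signed [15:0] c_im    // Q3.13
);
    localparam signed [15:0] X_MIN  = -16'sd20480;  // -2.5  * 2^13
    localparam signed [15:0] Y_MIN  = -16'sd10650;  // -1.3  * 2^13
    localparam signed [15:0] X_STEP =  16'sd45;     // ~(3.5/640) * 2^13
    localparam signed [15:0] Y_STEP =  16'sd44;     // ~(2.6/480) * 2^13

    // The pixel index has no fractional bits, so index * step is already Q3.13
    // and no rescale shift is needed.
    wire signed [15:0] sx = {6'b000000, px};
    wire signed [15:0] sy = {6'b000000, py};

    assign c_re = X_MIN + sx * X_STEP;
    assign c_im = Y_MIN + sy * Y_STEP;
endmodule
```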
Design Choices: Fixed-Point Format, Pipeline Layout, and Hardware
Picking a Fixed-Point Format
Every value in the Mandelbrot engine must fit inside a signed fixed-point format. The key question is: how many integer bits vs. how many fractional bits?
Since the Mandelbrot set mostly lives inside $|z| \le 2$ (roughly $-2$ to $2$ on both axes), and the escape test compares $x_n^2 + y_n^2$ against $4$, we need enough integer range to represent values up to about $\pm 4$. That suggests 2–3 integer bits are enough.
Fractional precision determines image clarity and how “smooth” the boundaries appear in deep zooms.
| Format | Total Bits | Range | Fraction Resolution | Notes |
|---|---|---|---|---|
| Q3.13 | 16 | ~±4 | $2^{-13} \approx 0.00012$ | Great balance, fits small FPGAs |
| Q4.12 | 16 | ~±8 | $2^{-12} \approx 0.00024$ | Slightly more headroom |
| Q2.14 | 16 | ±2 | $2^{-14} \approx 0.00006$ | Higher precision, narrow range |
| Q5.11 | 16 | ~±16 | $2^{-11} \approx 0.00049$ | Extra headroom for wide windows, least fractional precision |
I figured Q3.13 provides a good range for the standard Mandelbrot view, high enough precision for reasonable zooming, and multiplies that fit nicely into 16x16 -> 32-bit DSP operations.
After each multiplication, we simply arithmetic-shift the product right by 13 bits to restore the scaling.
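As a quick worked example of that rescale: $0.5$ is stored as $0.5 \cdot 2^{13} = 4096$; squaring gives $4096 \times 4096 = 16{,}777{,}216$, which is $0.25$ at the doubled $2^{26}$ scale, and shifting right by 13 bits leaves $2048 = 0.25 \cdot 2^{13}$, the correct Q3.13 result.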
Overflow and Saturation Strategies
From playing around with various simulators and other implementations online, I learned that multipliers will overflow internally if not controlled, and considered the following solutions:
- Wrap-around (default)
  - Cheaper in logic
  - Works fine for Mandelbrot because escape decisions happen long before wrap-around causes issues
- Saturation
  - Clamps outputs on overflow
  - Uses more LUTs, not usually needed
Most HDL Mandelbrot cores simply use the wrap-around method, since the escape threshold (4.0 on $x_n^2 + y_n^2$) is small and deterministic; a quick sketch of both options follows.
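For reference, here’s a hedged sketch of what the two options look like for the adder that computes $x^2 - y^2 + c_r$ (module name, signal names, and widths are placeholders):

```verilog
// Wrap vs. saturate sketch for the adder computing x^2 - y^2 + c_r.
// Two guard bits so the true sum of three 16-bit Q3.13 operands always fits.
module wrap_vs_saturate (
    input  wire signed [15:0] xx,     // x^2 in Q3.13
    input  wire signed [15:0] yy,     // y^2 in Q3.13
    input  wire signed [15:0] c_re,   // Q3.13
    output wire signed [15:0] x_next_wrap,
    output wire signed [15:0] x_next_sat
);
    wire signed [17:0] sum = xx - yy + c_re;   // full-precision intermediate

    // Option 1: wrap-around -- just drop the guard bits (cheapest).
    assign x_next_wrap = sum[15:0];

    // Option 2: saturation -- clamp to the most positive / most negative Q3.13 value.
    assign x_next_sat = (sum >  18'sd32767) ? 16'sh7FFF :   // +3.9998...
                        (sum < -18'sd32768) ? 16'sh8000 :   // -4.0
                                              sum[15:0];
endmodule
```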
Tang Nano 9k
I had a Tang Nano 9k on hand (from Alibaba, these are around $10!), and I figured it’d be a good fit for my purposes. It has:
- 8 DSP blocks
- 9k LUTs
- a built-in HDMI transmitter (via DVI over PMOD), which has an internal serializer that lets me output TMDS signals without needing a separate chip!
- Runs around 25-40 MHz! I plan to target 640x480 at 60 fps
- Small enough that it forces me to think about pipelining a bit more (can’t just parallelize everything)
Iteration Pipeline Architecture
Each pixel needs multiple multiplications and additions per iteration. A naïve design would take many cycles per iteration, but I plan to pipeline the datapath so the multiplies and the combine/escape-check logic sit in separate register stages.
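As a rough sketch of what one fully registered iteration step could look like (the staging, widths, and names are placeholders, not a final design):

```verilog
// Sketch of one pipelined iteration step, Q3.13 throughout.
// Stage 1 registers the three products (one DSP each);
// stage 2 rescales, adds c, and evaluates the escape test.
module iter_step (
    input  wire               clk,
    input  wire signed [15:0] x, y,          // current z (Q3.13)
    input  wire signed [15:0] c_re, c_im,    // pixel's c (Q3.13)
    output reg  signed [15:0] x_next, y_next,
    output reg                escaped
);
    // ---- Stage 1: multiplies ----
    reg signed [31:0] xx, yy, xy;     // Q6.26 products
    reg signed [15:0] cr_d, ci_d;     // c delayed to stay aligned with the products
    always @(posedge clk) begin
        xx   <= x * x;
        yy   <= y * y;
        xy   <= x * y;
        cr_d <= c_re;
        ci_d <= c_im;
    end

    // ---- Stage 2: rescale, add c, escape test ----
    wire signed [32:0] mag2 = xx + yy;                // x^2 + y^2, wide enough not to overflow
    always @(posedge clk) begin
        x_next  <= (xx >>> 13) - (yy >>> 13) + cr_d;  // x^2 - y^2 + c_r
        y_next  <= (xy >>> 12) + ci_d;                // 2xy + c_i (shift one bit less to double)
        escaped <= mag2 > (33'sd4 <<< 26);            // x^2 + y^2 > 4.0
    end
endmodule
```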
Rendering Strategies: Iteration-Budget per Pixel
I have two main design strategies in mind:
- Per-pixel iteration engine (simple)
- For each pixel, run the whole iteration loop
- Store the escape count
- Move to the next pixel
Pro: simplest to code
Con: cannot reach real-time frame rates unless iteration count is low (<64)
- Multiple pixels in flight (pipelined engine)
- A new pixel enters the pipeline every few cycles
- Earlier pixels continue iterating in deeper pipeline stages
Pro: high throughput
Con: harder finite-state machine & memory flow (but we got it eyy)
I want to be able to output 640x480 at 60 Hz (which is pretty standard). Standard VGA timing for that mode uses a pixel clock of about 25.175 MHz.
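To put rough numbers on that budget (using my own assumptions of a 64-iteration cap and the Tang Nano’s 25-40 MHz range): 640 × 480 = 307,200 pixels per frame, so 60 fps needs roughly 18.4 million pixel results per second. A strictly serial engine that spends even one cycle per iteration tops out around 40 MHz / 64 ≈ 0.6 million pixels per second, which is exactly why the pipelined, multiple-pixels-in-flight approach is the one I’m leaning toward.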