
Mandy, An FPGA-Based Mandelbrot Engine (Initial Thoughts)

Published: at 08:48 PM (5 min read)


Introduction

I’ve finally been able to find some time to try my hand at something I’ve wanted to build for a while: a real-time Mandelbrot renderer running entirely on an FPGA. Instead of floating-point math and shaders, this hardware perspective would allow me to build everything myself: fixed-point multipliers, squarers, escape-time logic, pixel-to-complex mapping, and eventually a VGA/HDMI output pipeline.

It’s basically turning a mathematical toy into a tiny hardware graphics engine. There are tons of software Mandelbrot explorers already (like https://math.hws.edu/eck/js/mandelbrot/MB.html), and a few FPGA attempts scattered across GitHub, but most of them are either super minimal proofs-of-concept or extremely optimized designs with almost no explanation. I wanted something in between — a clean, modular, fully understandable Mandelbrot engine that I can tweak, analyze, and eventually integrate into my larger graphics projects.

This first post focuses on the idea and sets up the mathematical groundwork.

Later posts will get into VGA/HDMI timings, fixed-point decisions, pipelining strategies, and my full Mandelbrot core.

The Idea Behind the Mandelbrot Engine

Before touching Verilog, it’s worth understanding what the FPGA actually needs to compute. The Mandelbrot set is defined by a very simple iteration:

z_{n+1} = z_n^2 + c,\quad z_0 = 0

Here, each pixel on the screen corresponds to a complex value c. To determine its color, you iterate the formula and check whether the value “escapes.”

Breaking it into real and imaginary parts

Let z_n = x_n + i y_n and c = a + i b.
Expanding:

z_{n+1} = (x_n^2 - y_n^2 + a) + i(2 x_n y_n + b)

So the FPGA no longer sees complex numbers — it just implements these two scalar recurrences:

x_{n+1} = x_n^2 - y_n^2 + a
y_{n+1} = 2 x_n y_n + b

The escape condition becomes:

x_n^2 + y_n^2 > 4

That’s it.
All Mandelbrot rendering boils down to: multiplying, squaring, adding, and checking a threshold.
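
As a sanity check, the whole per-pixel computation fits in a few lines of Python. This is a software model of the two scalar recurrences above, not the HDL; the function name and the 64-iteration cap are my own choices:

```python
# Escape-time iteration for one pixel, using only the two scalar
# recurrences above: no complex-number type needed.
def mandel_iters(a, b, max_iters=64):
    """Return how many iterations (a, b) survives before escaping."""
    x, y = 0.0, 0.0                    # z_0 = 0
    for n in range(max_iters):
        if x * x + y * y > 4.0:        # escape condition
            return n
        x, y = x * x - y * y + a, 2.0 * x * y + b
    return max_iters                   # never escaped: treated as inside the set

print(mandel_iters(0.0, 0.0))   # origin never escapes -> 64
print(mandel_iters(2.0, 2.0))   # far outside the set -> 1
```

The hardware version does exactly this loop, just with fixed-point operands and the iteration count becoming the pixel's color index.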

Why Fixed-Point (and not floating point)

Most small FPGAs don’t have built-in floating-point hardware. So instead of floats, everything is stored in a fixed-point format like Q3.13: a 16-bit signed value with 3 integer bits (sign included) and 13 fractional bits.

This keeps the multipliers simple and deterministic: when you multiply two Q3.13 numbers, the 16x16 multiply produces a 32-bit product in Q6.26 format, which is shifted back down to Q3.13.
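
Concretely, a software model of the Q3.13 arithmetic looks like this (the helper names are mine; the hardware equivalent is a 16x16 DSP multiply followed by an arithmetic right shift):

```python
FRAC = 13  # Q3.13: 16-bit signed, 13 fractional bits

def to_q313(v):
    """Encode a real value as a Q3.13 integer."""
    return int(round(v * (1 << FRAC)))

def from_q313(q):
    """Decode a Q3.13 integer back to a real value."""
    return q / (1 << FRAC)

def q_mul(p, q):
    """Multiply two Q3.13 values: the raw product is Q6.26, so an
    arithmetic shift right by 13 returns it to Q3.13."""
    return (p * q) >> FRAC

x = to_q313(1.5)
print(from_q313(q_mul(x, x)))   # 1.5 * 1.5 -> 2.25
```

Python's `>>` on negative integers is an arithmetic shift, so the model matches signed hardware behavior.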

Mapping screen pixels to the complex plane

For a screen coordinate (p_x, p_y), the FPGA computes:

a = x_\text{min} + \frac{p_x}{W}(x_\text{max} - x_\text{min})
b = y_\text{min} + \frac{p_y}{H}(y_\text{max} - y_\text{min})

where W and H are the screen width and height, and [x_min, x_max] x [y_min, y_max] is the window of the complex plane being viewed.

All of these constants are stored as fixed-point values too.

Choosing these ranges matters since they determine how many integer bits the fixed-point format needs, which affects multiplier widths, DSP usage, and timing.
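
To make the mapping concrete, here is a plain-float Python sketch (the W, H, and window constants are illustrative choices of mine, not final values; on the FPGA the per-pixel step sizes would be precomputed fixed-point constants so each pixel only needs an add):

```python
# Map a screen coordinate to a point (a, b) in the complex plane.
W, H = 640, 480
X_MIN, X_MAX = -2.5, 1.0
Y_MIN, Y_MAX = -1.3125, 1.3125   # chosen so the step matches the 4:3 aspect ratio

def pixel_to_c(px, py):
    a = X_MIN + px / W * (X_MAX - X_MIN)
    b = Y_MIN + py / H * (Y_MAX - Y_MIN)
    return a, b

print(pixel_to_c(0, 0))       # (-2.5, -1.3125), top-left corner
print(pixel_to_c(320, 240))   # (-0.75, 0.0), centre of the window
```

With these ranges both axes step by the same amount per pixel, so circles in the complex plane stay circular on screen.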

Design Choices: FP format, Pipeline Layout, and Hardware

Picking a Fixed-Point Format

Every value in the Mandelbrot engine must fit inside a signed fixed-point format. The key question is how many integer bits versus how many fractional bits to use.

The Mandelbrot set lives inside the disk |c| <= 2 (the standard view spans roughly x in [-2.5, 1] and y in [-1.25, 1.25]), so we need enough integer range to represent about ±3. That suggests 2–3 integer bits are enough.

Fractional precision determines image clarity and how “smooth” the boundaries appear in deep zooms.

| Format | Total Bits | Range | Fraction Resolution | Notes |
| ------ | ---------- | ----- | ------------------- | ----- |
| Q3.13  | 16-bit     | ~±4   | 2^-13               | Great balance, fits small FPGAs |
| Q4.12  | 16-bit     | ~±8   | 2^-12               | Slightly more headroom |
| Q2.14  | 16-bit     | ±2    | 2^-14               | Higher precision, narrow range |
| Q5.11  | 16-bit     | ~±16  | 2^-11               | For extreme zooms or wide windows |

I figured Q3.13 provides a good range for the standard Mandelbrot view, high enough precision for reasonable zooming, and products that fit nicely into 16x16 -> 32-bit DSP operations.

After each multiplication, an arithmetic right shift by 13 bits restores the Q3.13 scaling.

Overflow and Saturation Strategies

From playing around with various simulators and other implementations online, I learned that multipliers will overflow internally if not controlled. There are two common ways to handle it:

  1. Wrap-around (default)

    • Cheaper in logic
    • Works fine for Mandelbrot because escape decisions happen long before wrap-around causes issues
  2. Saturation

    • Clamps outputs on overflow
    • Uses more LUTs, not usually needed

Most HDL Mandelbrot cores simply use the wrap-around method since the escape radius (4.0) is small and deterministic.
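
A tiny Python model makes the difference visible. The `wrap`/`saturate` helper names are mine; both operate on the raw 16-bit signed integer representation of a Q3.13 value:

```python
BITS, FRAC = 16, 13
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1   # -32768 .. 32767

def wrap(v):
    """Wrap-around: keep the low 16 bits, reinterpret as signed."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v >= 1 << (BITS - 1) else v

def saturate(v):
    """Saturation: clamp to the representable range."""
    return max(LO, min(HI, v))

big = 2 * int(3.9 * (1 << FRAC))      # 2 * 3.9 overflows Q3.13 (~±4)
print(wrap(big) / (1 << FRAC))        # wraps to a nonsense negative value
print(saturate(big) / (1 << FRAC))    # clamps just below +4
```

For Mandelbrot specifically, the escape comparison fires while values are still well inside the representable range, which is why wrap-around is usually safe.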

Tang Nano 9k

I had a Tang Nano 9k on hand (from Alibaba, these are around $10!), and I figured it’d be pretty good for my uses.

Iteration Pipeline Architecture

Each pixel needs multiple multiplications and additions per iteration. A naïve design would take many cycles per iteration, but I plan to pipeline…

A typical pipeline would stage each iteration roughly as: first compute the products x_n^2, y_n^2, and x_n y_n; then form x_{n+1} = x_n^2 - y_n^2 + a and y_{n+1} = 2 x_n y_n + b; and finally compare x_n^2 + y_n^2 against 4 for the escape test.

Rendering Strategies: Iteration-Budget per Pixel

I have two main design strategies:

  1. Per-pixel iteration engine (simple)

Pro: simplest to code
Con: cannot reach real-time frame rates unless iteration count is low (<64)

  2. Multiple pixels in flight (pipelined engine)

Pro: high throughput
Con: harder finite-state machine & memory flow (but we got it eyy)

I want to be able to output 640x480 at 60Hz (which is pretty standard). Standard VGA timing uses a pixel clock of about

f_\text{pixel} \approx 25.175\ \text{MHz}
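
A quick back-of-the-envelope check of what that implies for the iteration engine (the 64-iteration cap here is an assumption, not a final design choice):

```python
# Throughput required for 640x480 @ 60 Hz.
W, H, FPS = 640, 480, 60
pixels_per_sec = W * H * FPS
print(pixels_per_sec)          # 18,432,000 pixels/s: roughly one result
                               # per cycle of a ~25 MHz pixel clock

# Worst case, every pixel runs the full iteration budget.
MAX_ITERS = 64
iters_per_sec = pixels_per_sec * MAX_ITERS
print(iters_per_sec / 1e6)     # ~1180 M iterations/s worst case
```

A single-pixel engine running one iteration per clock at ~25 MHz falls short by over an order of magnitude, which is exactly why the pipelined, multiple-pixels-in-flight design is the one worth building.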