FPGA-Based Mandelbrot Engine

Published: at 08:48 PM (5 min read)

Recently I made a lightweight HDMI driver for the Gowin Tang Nano 9K. I'm now working on a Mandelbrot (and more general fractal) engine. I found the idea on Bruce Land's Cornell ECE 5760 lab site and thought it would be fun to make.

Limitations

Because I'm targeting the Tang Nano and not the standard DE-10 Nano board, I'm running into some fun challenges.

Specifically, the Mandelbrot math requires calculating $Z^2 = (Re + Im \cdot i)^2$. In hardware, the expansion $(Re^2 - Im^2) + (2 \cdot Re \cdot Im)i$ requires 3 multipliers per iteration.
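To make the multiplier count concrete, here is the expansion written out step by step, using the same $Re$ and $Im$ symbols as above:

```latex
\begin{aligned}
Z^2 &= (Re + Im \cdot i)^2 \\
    &= Re^2 + 2 \cdot Re \cdot Im \cdot i + Im^2 \cdot i^2 \\
    &= (Re^2 - Im^2) + (2 \cdot Re \cdot Im)\,i
\end{aligned}
```

The three products are $Re^2$, $Im^2$ and $Re \cdot Im$. The factor of 2 costs no multiplier, since doubling is a one-bit left shift (the `z_mix <<< 1` in the module).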

The fundamental limitations are that

Baby’s First Steps

Just to illustrate the output, I wrote mandelbrot.v, a first pass at a pipelined (spatially unrolled) Mandelbrot solver.

module mandelbrot #(
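    parameter MAX_ITER = 5,   // Pipeline stages; 3 multipliers each, so 6 is the most that fits the Tang Nano 9K's 20 DSP multipliers
    parameter WIDTH = 10,     // Bit width of the x pixel coordinate
    parameter HEIGHT = 10     // Bit width of the y pixel coordinate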
)(
    input wire clk,
    input wire rst_n,
    input wire [WIDTH-1:0] x,
    input wire [HEIGHT-1:0] y,
    input wire active,
    output reg [7:0] color
);

    // Q3.14 Format: 1 sign, 3 integer, 14 fraction.
    // 1.0 = 16384
    reg signed [17:0] c_re_init, c_im_init;
    reg active_init;

    // X Scale: (3.0 / 640) * 16384 = 76.8  -> Use 77
    // Y Scale: (2.4 / 480) * 16384 = 81.92 -> Use 82
    // X Offset: -2.0 * 16384 = -32768
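    // Y Offset: -1.2 * 16384 = -19660.8 -> Use -19661
    // x and y are unsigned, so these expressions are evaluated as unsigned;
    // the 18-bit wraparound still produces the correct two's-complement value.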
    always @(posedge clk) begin
        c_re_init <= (x * 18'sd77) - 18'sd32768;
        c_im_init <= (y * 18'sd82) - 18'sd19661;
        active_init <= active;
    end

    reg signed [17:0] z_re [0:MAX_ITER];
    reg signed [17:0] z_im [0:MAX_ITER];
    reg signed [17:0] c_re [0:MAX_ITER];
    reg signed [17:0] c_im [0:MAX_ITER];
    reg [4:0] iter_count [0:MAX_ITER];
    reg valid [0:MAX_ITER];


    always @(posedge clk or negedge rst_n) begin
        if(!rst_n) begin
            z_re[0] <= 0; z_im[0] <= 0;
            c_re[0] <= 0; c_im[0] <= 0;
            iter_count[0] <= 0; valid[0] <= 0;
        end else begin
            z_re[0] <= 0;
            z_im[0] <= 0;
            c_re[0] <= c_re_init;
            c_im[0] <= c_im_init;
            iter_count[0] <= 0;
            valid[0] <= active_init;
        end
    end


    genvar i;
    generate
        for (i = 0; i < MAX_ITER; i = i + 1) begin : stage
            // 18-bit * 18-bit = 36-bit Result (Q6.28 Format)
            wire signed [35:0] z_re_sq = z_re[i] * z_re[i];
            wire signed [35:0] z_im_sq = z_im[i] * z_im[i];
            wire signed [35:0] z_mix   = z_re[i] * z_im[i];

            // Divergence Check: (Re^2 + Im^2) >= 4.0
            // 4.0 in Q6.28 format is (4 * 2^28) = 1,073,741,824
            wire divergence = (z_re_sq + z_im_sq) >= 36'sd1073741824;

            always @(posedge clk or negedge rst_n) begin
                if (!rst_n) begin
                    z_re[i+1] <= 0; z_im[i+1] <= 0;
                    c_re[i+1] <= 0; c_im[i+1] <= 0;
                    iter_count[i+1] <= 0; valid[i+1] <= 0;
                end else begin
                    // Pass C and Valid down the pipe
                    c_re[i+1] <= c_re[i];
                    c_im[i+1] <= c_im[i];
                    valid[i+1] <= valid[i];

                    if (!divergence) begin
                        // Z_next = Z^2 + C
                        // Shift Q6.28 down to Q3.14 (>>> 14)
                        z_re[i+1] <= ((z_re_sq - z_im_sq) >>> 14) + c_re[i];
                        z_im[i+1] <= ((z_mix <<< 1)       >>> 14) + c_im[i];
                        iter_count[i+1] <= iter_count[i] + 1;
                    end else begin
                        // Lock values if diverged
                        z_re[i+1] <= z_re[i];
                        z_im[i+1] <= z_im[i];
                        iter_count[i+1] <= iter_count[i];
                    end
                end
            end
        end
    endgenerate

    always @(posedge clk) begin
        // Output result
        color <= valid[MAX_ITER] ? {iter_count[MAX_ITER], 3'b000} : 8'h00;
    end

endmodule

Coordinate Mapping

c_re_init <= (x * 18'sd77) - 18'sd32768;

This line converts my screen x coordinate (0 to 639) into the real axis of the complex plane (roughly -2.0 to 1.0); the c_im_init line does the same for y (roughly -1.2 to 1.2). It uses fixed-point math (Q3.14), where 18'sd77 is the scaling factor (zoom level) and 18'sd32768 is the offset (panning to the center of the image).
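As a sanity check on those constants, here is a small Python model of the mapping. It is not part of the design, and the helper names (`to_fixed`, `to_float`, `pixel_to_c`) are made up for this sketch. It assumes the 640x480 frame and the 3.0 x 2.4 window from the comments in mandelbrot.v.

```python
FRAC_BITS = 14              # Q3.14: 14 fractional bits
ONE = 1 << FRAC_BITS        # 1.0 is stored as 16384


def to_fixed(value):
    """Convert a real number to Q3.14, rounding to the nearest step."""
    return round(value * ONE)


def to_float(fixed):
    """Convert a Q3.14 integer back to a real number."""
    return fixed / ONE


def pixel_to_c(x, y):
    """Mirror the c_re_init / c_im_init assignments for one pixel."""
    c_re = x * 77 - 32768
    c_im = y * 82 - 19661
    return c_re, c_im


# Re-derive the scale and offset constants from the window size.
x_scale = to_fixed(3.0 / 640)    # 76.8  rounds to 77
y_scale = to_fixed(2.4 / 480)    # 81.92 rounds to 82
x_offset = to_fixed(-2.0)        # -32768
y_offset = to_fixed(-1.2)        # -19661

# Corners of the visible frame, in Q3.14 and as real numbers.
for x, y in [(0, 0), (639, 479)]:
    c_re, c_im = pixel_to_c(x, y)
    print((x, y), (c_re, c_im), (to_float(c_re), to_float(c_im)))
```

Because 76.8 is rounded up to 77, the right edge lands at about 1.003 instead of exactly 1.0, which is why the range above is only "roughly" -2.0 to 1.0.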

Hardware Unrolling (generate block)

This is the part that currently kills my resources.

genvar i;
generate
    for (i = 0; i < MAX_ITER; i = i + 1) begin : stage
        // ... instantiates hardware ...
    end
endgenerate

This tells the synthesizer to replicate the logic inside the block MAX_ITER times. If MAX_ITER is 5, it creates stage 0 through stage 4. Each stage requires 3 multipliers, so the total cost is 15 multipliers. The Tang Nano 9K has 20 multipliers, so 6 stages (18 multipliers) fit but 7 stages (21) do not, which makes MAX_ITER = 6 my current upper limit.
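The budget arithmetic can be sketched in a few lines of Python. The constant and function names are made up for this example. It assumes that only the three per-stage products use DSP multipliers, and that the constant multiplies in the coordinate mapping do not, which depends on the synthesizer.

```python
DSP_MULTIPLIERS = 20     # figure quoted above for the Tang Nano 9K
MULTS_PER_STAGE = 3      # z_re*z_re, z_im*z_im, z_re*z_im


def multipliers_needed(stages, per_stage=MULTS_PER_STAGE):
    """Multipliers used by a fully unrolled pipeline with `stages` stages."""
    return stages * per_stage


def max_stages(available, per_stage=MULTS_PER_STAGE):
    """Largest stage count whose multipliers fit in `available`."""
    return available // per_stage


for stages in (5, 6, 7):
    used = multipliers_needed(stages)
    verdict = "fits" if used <= DSP_MULTIPLIERS else "does not fit"
    print(f"MAX_ITER={stages}: {used} multipliers, {verdict}")
```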

Math Engine

Inside each stage, I perform the standard Mandelbrot operation $Z_{n+1} = Z_n^2 + C$. After each multiplication I arithmetic-shift the product right by 14 bits (>>> 14) to bring it back to Q3.14, since Q3.14 * Q3.14 yields Q6.28.
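One way to check the fixed-point behaviour without synthesizing is a software model of a single stage. The Python sketch below is my approximation of the logic in the generate block, not a verified bit-exact model. The names `wrap`, `stage` and `iterate` are made up for this example, and it assumes the 36-bit intermediate values never overflow.

```python
FRAC_BITS = 14                      # Q3.14 fractional bits
WORD_BITS = 18                      # width of z_re / z_im / c_re / c_im
ESCAPE = 4 << (2 * FRAC_BITS)       # 4.0 in Q6.28 = 1,073,741,824


def wrap(value, bits=WORD_BITS):
    """Keep the low `bits` bits and reinterpret them as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def stage(z_re, z_im, c_re, c_im, count):
    """Model one pipeline stage: divergence check, then Z_next = Z^2 + C."""
    re_sq = z_re * z_re             # Q6.28
    im_sq = z_im * z_im             # Q6.28
    mix = z_re * z_im               # Q6.28
    if re_sq + im_sq >= ESCAPE:
        return z_re, z_im, count    # diverged: lock the values
    # Python's >> on ints is an arithmetic shift, like >>> on signed values.
    next_re = wrap(((re_sq - im_sq) >> FRAC_BITS) + c_re)
    next_im = wrap(((mix << 1) >> FRAC_BITS) + c_im)
    return next_re, next_im, count + 1


def iterate(c_re, c_im, max_iter=5):
    """Run `max_iter` stages for one point and return its iteration count."""
    z_re = z_im = count = 0
    for _ in range(max_iter):
        z_re, z_im, count = stage(z_re, z_im, c_re, c_im, count)
    return count


# C = 0 and C = -1 never escape; C = 1 and C = 2 escape early.
for c_real in (0.0, -1.0, 1.0, 2.0):
    c_re = round(c_real * (1 << FRAC_BITS))
    print(c_real, iterate(c_re, 0))
```

With MAX_ITER = 5, a point that never escapes gets a count of 5, which the output stage turns into a color value of 5 << 3 = 40.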

Output
