💾 Archived View for thebird.nl › blog › 2023 › zig-matrix-multiplication.gmi captured on 2023-03-20 at 17:40:28. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Zig matrix multiplication

In this page we are going to explore matrix multiplication a little with Zig. In

zig-pointers-and-c.gmi

we explored building a matrix retrieved from lmdb through a C ABI.

We are going to compute a float matrix by doing a multiplication of a byte matrix with its transpose. This is one way to create a kinship matrix from genotypes. Normally we could use the GSL and/or openblas are used to multiply. I have a hunch we can outperform those because of our special case. First of all, we compute a matrix for every chromosome of a species. So it is 20+ in 1 go. Also we are coming from byte-sized genotype data and, as the result is symmetric, we need half the computations and we can use as many threads. Also we can make use of SIMD support in Zig:

https://zig.news/michalz/fast-multi-platform-simd-math-library-in-zig-2adn

Multiply one matrix

Let's start the dumb way, computing one matrix starting from the immutable genotypes.

test "compute matrix with its transpose" {
    const g = [3][2]u8{
        [_]u8{ 1, 4 },
        [_]u8{ 2, 5 },
        [_]u8{ 3, 6 },
    };
    assert(g[0][0]==1);
    assert(g[1][0]==2);
    var k = [2][2]f32{
        [_]f32{ 0,0 },
        [_]f32{ 0,0 },
    };
    var k2 = try multiply_matrix_with_transpose(g, &k);
    var check = [2][2]f32{
        [_]f32{ 14,32 },
        [_]f32{ 32,77 },
    };
    assert(k2[1][0]==check[1][0]);
    assert(k2[1][0]==32);
    assert(k[1][0]==k[0][1]);
    assert(k[1][1]==77);

First implementation

fn multiply_matrix_with_transpose(g: anytype, k: anytype) anyerror!@TypeOf(k.*) {
    var j: usize = 0;
    const inds = k.len;
    const nmarkers = g.len;
    // walk the kinship matrix
    while (j < inds) : (j += 1) {
        var i: usize = 0;
        while (i < inds) : (i += 1) {
            var compute_point: f32 = 0;
            if (i >= j) {
                // since k is symmetric we only compute once
                var x: usize = 0;
                while (x < nmarkers) : (x +=1 ) {
                    const f1 : f32 = @intToFloat(f32,g[x][i]);
                    const f2 : f32 = @intToFloat(f32,g[x][j]);
                    compute_point += f1*f2;
                }
                k.*[j][i] = compute_point;
                k.*[i][j] = compute_point;
            }
        }
    }
    return k.*;
}

Compiling

zig build -Drelease-fast=true && zig test mgamma.zig && time ./zig-out/bin/mgamma
Test [0/2] test "lmdb-setup"... Test [1/2] test "compute matrix with its transpose"... All 2 tests passed.
mgamma loading test/data/bxd...

	lmdb.struct_MDB_txn

nrows,ncols = 7321,236
top left value is 0, bottom right is 2
top left value is 0.0e+00, bottom right is 0.0e+00
top left value is 3.126e+03, top right is 4.528e+03, bottom left is 4.528e+03, bottom right is 2.0119e+04

real	0m0.347s
user	0m0.347s
sys	0m0.000s

Not bad for a 236 inds x 7321 markers dataset.

There are two small optimization here. First the indexing is done once for each 'compute_point'. Also, since the result is symmetric we compute half the points. There are more optimations possible. It will be particularly interesting to use SIMD instructions on vector g[x]. We'll keep that for later, because the real gains will be in IO for 20+ leave one out (LOCO) chromosomes computed together.