💾 Archived View for thebird.nl › blog › 2023 › zig-matrix-multiplication.gmi captured on 2023-03-20 at 17:40:28. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
In this page we are going to explore matrix multiplication a little with Zig. In
we explored building a matrix retrieved from lmdb through a C ABI.
We are going to compute a float matrix by doing a multiplication of a byte matrix with its transpose. This is one way to create a kinship matrix from genotypes. Normally we could use the GSL and/or openblas are used to multiply. I have a hunch we can outperform those because of our special case. First of all, we compute a matrix for every chromosome of a species. So it is 20+ in 1 go. Also we are coming from byte-sized genotype data and, as the result is symmetric, we need half the computations and we can use as many threads. Also we can make use of SIMD support in Zig:
https://zig.news/michalz/fast-multi-platform-simd-math-library-in-zig-2adn
Let's start the dumb way, computing one matrix starting from the immutable genotypes.
test "compute matrix with its transpose" { const g = [3][2]u8{ [_]u8{ 1, 4 }, [_]u8{ 2, 5 }, [_]u8{ 3, 6 }, }; assert(g[0][0]==1); assert(g[1][0]==2); var k = [2][2]f32{ [_]f32{ 0,0 }, [_]f32{ 0,0 }, }; var k2 = try multiply_matrix_with_transpose(g, &k); var check = [2][2]f32{ [_]f32{ 14,32 }, [_]f32{ 32,77 }, }; assert(k2[1][0]==check[1][0]); assert(k2[1][0]==32); assert(k[1][0]==k[0][1]); assert(k[1][1]==77);
First implementation
fn multiply_matrix_with_transpose(g: anytype, k: anytype) anyerror!@TypeOf(k.*) { var j: usize = 0; const inds = k.len; const nmarkers = g.len; // walk the kinship matrix while (j < inds) : (j += 1) { var i: usize = 0; while (i < inds) : (i += 1) { var compute_point: f32 = 0; if (i >= j) { // since k is symmetric we only compute once var x: usize = 0; while (x < nmarkers) : (x +=1 ) { const f1 : f32 = @intToFloat(f32,g[x][i]); const f2 : f32 = @intToFloat(f32,g[x][j]); compute_point += f1*f2; } k.*[j][i] = compute_point; k.*[i][j] = compute_point; } } } return k.*; }
Compiling
zig build -Drelease-fast=true && zig test mgamma.zig && time ./zig-out/bin/mgamma Test [0/2] test "lmdb-setup"... Test [1/2] test "compute matrix with its transpose"... All 2 tests passed. mgamma loading test/data/bxd...
Not bad for a 236 inds x 7321 markers dataset.
There are two small optimization here. First the indexing is done once for each 'compute_point'. Also, since the result is symmetric we compute half the points. There are more optimations possible. It will be particularly interesting to use SIMD instructions on vector g[x]. We'll keep that for later, because the real gains will be in IO for 20+ leave one out (LOCO) chromosomes computed together.