💾 Archived View for thebird.nl › blog › 2021 › 07 › gemma-on-hpc-part1.gmi captured on 2023-03-20 at 17:41:28. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Running GEMMA on a cluster

by Pjotr Prins - 2021-07-03

Gemini

My first gemini-style blog post! Gemini is focussed on writing text. Perfect for a blog.

My gemini (Pjotr Prins)

See gemini markdown

Octopus & GEMMA

Last fall we installed the Octopus cluster at UTHSC. A small cluster we own and that can run GNU Guix and anything we want.

Octopus HPC powered by Debian an GNU Guix

GeneNetwork itself, meanwhile, runs GEMMA on a single multi-core server. Performance is decent enough, but we want to go faster and with Octopus we should be able to leverage those 264 cores - particularly for split computations, such as LOCO and permutations. The plan is to offer a GN3 REST endpoint for anyone to use. Later when we have a million RISC-V cores we can do something amazing. More on that soon.

Describe generic execution engine (sheepdog)

For some time we have been discussing remote execution mechanisms - working name 'sheepdog'. Key properties are:

Initiate remote connection with ssh (may change)
Invoke remote job using Guix deployment - to warrant control of the software stack
Remote and local jobs can run at the same time
Different job control mechanisms can be used (simple, slurm, GNU parallel)
Output of jobs is captured and fed back as it arrives to the server over message passing
Data can be fetched over http(s)

Firewalling is often restrictive. For above setup we need

The remote server should be accessible over ssh
The remote server should be able to reach the message server (redis)
The remote server should be able to fetch http(s)

With stunnel message passing (we'll use redis) can be forwarded over https. And if we use polling for job management we can even reduce the requirements to

The remote server should be able to fetch http(s)

Provided we can set up an stunnel (requires root on the remote machine - perhaps in a container). For setups that don't offer Guix it will be hard to support reproducible runs. Restricted environments must be for other purposes than reproducible research. For more information see my writeup:

Creating a reproducible workflow

Initiate remote connection with ssh

The first step is to connect to a remote server using ssh. In the future we may allow for setups that don't use ssh, using a polling/drop-style technique, but at this point we assume that key-based ssh'ing into a server, jump host or container is generally possible.

Invoke remote job using Guix deployment - to warrant control of the software stack

Once a connection is established we can check for the Guix deployment on the remote machine and make sure it is aligned. Multiple possibilities exist, including sending Guix 'closures', but for now we'll keep it simple:

Check if the correct environment exists - using Guix hash values
Fire up the (remote) handler in the environment

Different job control mechanisms can be used (simple, slurm, GNU parallel)

Note that ssh simply acts as a remote shell. This behaviour can be mirrored on a local machine using a standard shell. We will need some abstraction for simple runs, slurm runs and other tools. Say we want to run GEMMA with LOCO - essentially run GEMMA multiple times. That can be captured as one remote execution command for sheepdog with a serial and a parallel component. Note that if workflows get any fancier than this we should really consider using CWL instead. CWL we'll also invoke with sheepdog, but that is later.

In the case of LOCO we'll write a script that sets up the environment

Copy input files to remote
Computes GRMs for LOCO
Computes GWA for LOCO
Returns results

Initially we'll use gemma-wrapper which also deals with caching results. Gemma-wrapper already has minimal slurm support.

Gemma-wrapper

Gemma-wrapper slurm support

Output of jobs is captured and fed back as it arrives to the server over message passing

While a job is running (say with slurm) output for stdout and stderr gets written to a file and can be fed back to a message server (in our case redis). This will allow showing progress in the browser. Also, when there are problems, the output can be used for troubleshooting.

File formats

And, while we are at it, we should think of using gzipped files by default - which speeds transfers up - and we should use R/qtl2 file formats which are better at identifying issues. GEMMA is not great at reporting errors, but we can run my new gemma2lib tools for reporting data problems after GEMMA fails.

R/qtl2 input formats

Data transfer can be handled over http(s) and IPFS

The first version simply copies files. That is unecessary if files have been copied before. Provided data is on IPFS we can check the (pre-computed) Hash value and only copy locally when it does not exist. IPFS can do that automatically, but requires special ports open which tend to be closed on our firewalls. Worth checking though: if we can use IPFS natively it takes care of caching automatically.

Opportunistic computations

Even though an HPC environment can accelerate computations it can still be slower than local compute, particularly for smaller jobs. Therefore we want a system which can fire up a job in different setups an the same time - and may the best one win. This is part of the design where we can mix different hardware setups and (simply) fire up across different setups.

Caching and hashes

See above on IPFS

Implement: Invoke gemma-wrapper locally

GN2 runs gemma-wrapper straight from the web-server. This is inconvenient because there are no progress updates and browsers/nginx can time out. We need to introduce something similar to flask-executor - though we'll roll our own based on a GN3 endpoint (below). We also have the opportunity to use the faster Kinship computation in my gemma2lib and to improve GEMMA parallel processing on a single machine. Even on a single host we should be able to speed up GEMMA for LOCO in this step.

Flask executor

Implemented parallel in gemma-wrapper locally

Implement: Invoke gemma-wrapper remotely on Octopus

In the next phase we'll introduce running GEMMA on Octopus with 264 cores. This will be particularly interesting for permutations to compute significance and for global pre-compute or 'GEMMA full vector output with polarity' as Rob coined it.

nyi

Implement: GEMMA permutations (a gemma-wrapper-wrapper!?)

Here we update running the permutations in a new tool. Key features is reporting as permutations are progressing and ability to run on slurm (see below).

In progress:

Implementing permutations

Implement: Slurm runner - part of sheepdog

nyi

Implement: GN3 endpoint

Next we'll create an endpoint that will allow invoking gemma-wrapper locally and remotely. This endpoint will collect stdout/stderr and give a mechanism for feeding data to the browser. We'll use that information to display CI-style progress in the browser which will be convenient for power users and trouble shooting.

nyi

Implement: GN2 invokes GN3 endpoint and shows progress bar

The GN2 web interface will be rejigged to make use of the GN3 endpoint using web-sockets.

nyi

Implement: Show output as it arrives (power users)

We will use above output from stderr/stdout to display it as it comes in using web-sockets.

nyi

Implement: Invoke local and remotes at the same time

Finally, as a coup-de-grace, we'll allow opportunistic use of resources and fire up computations on several hosts - may the fastest one win. The current idea is to complete all processes so they can reuse the cache later. Any dataset that gets run is likely to run again and disk space comes cheap these days.

nyi