by Pjotr Prins - 2021-07-03
My first gemini-style blog post! Gemini is focussed on writing text. Perfect for a blog.
Last fall we installed the Octopus cluster at UTHSC - a small cluster we own that can run GNU Guix and anything else we want.
Octopus HPC powered by Debian and GNU Guix
GeneNetwork itself, meanwhile, runs GEMMA on a single multi-core server. Performance is decent enough, but we want to go faster, and with Octopus we should be able to leverage those 264 cores - particularly for split computations, such as LOCO and permutations. The plan is to offer a GN3 REST endpoint for anyone to use. Later, when we have a million RISC-V cores, we can do something amazing. More on that soon.
For some time we have been discussing remote execution mechanisms - working name 'sheepdog'. Key properties are:
Firewalling is often restrictive. For the above setup we need:
With stunnel, message passing (we'll use redis) can be forwarded over https. And if we use polling for job management we can even reduce the requirements to:
That reduction holds provided we can set up an stunnel (which requires root on the remote machine - perhaps in a container); see the sketch below. For setups that don't offer Guix it will be hard to support reproducible runs - restricted environments are better suited to purposes other than reproducible research. For more information see my writeup:
Creating a reproducible workflow
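As promised above, a minimal sketch of the redis-over-stunnel idea, in Python with the redis-py client. The local port and channel name are illustrative, not the actual sheepdog configuration.

```
# Minimal sketch: reach a remote redis through a local stunnel
# endpoint. stunnel wraps the TCP connection in TLS and forwards it
# over an https-friendly port, so only outgoing https is required.
import redis

# stunnel is assumed to listen locally and forward to the HPC side
r = redis.Redis(host="127.0.0.1", port=6379)
r.publish("sheepdog:jobs", "run gemma-wrapper on the cluster")
```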
The first step is to connect to a remote server using ssh. In the future we may allow for setups that don't use ssh, using a polling/drop-style technique, but at this point we assume that key-based ssh'ing into a server, jump host or container is generally possible.
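A minimal sketch of such a remote run, assuming key-based ssh access to a host called "octopus". The remote_run helper is hypothetical and gets reused in later sketches.

```
# Run a command on a remote host over ssh and return its stdout.
# BatchMode makes ssh fail instead of prompting for a password.
import subprocess

def remote_run(host: str, command: str) -> str:
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, check=True)
    return result.stdout

print(remote_run("octopus", "uname -a"))
```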
Once a connection is established we can check for the Guix deployment on the remote machine and make sure it is aligned. Multiple possibilities exist, including sending Guix 'closures', but for now we'll keep it simple:
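One simple alignment check, sketched below, is to compare `guix describe` output on both machines (building on the hypothetical remote_run helper above):

```
# Compare the Guix channel state locally and remotely; a mismatch
# means the two deployments are not aligned.
# remote_run is the ssh helper sketched earlier.
import subprocess

def local_run(command: str) -> str:
    return subprocess.run(command, shell=True, capture_output=True,
                          text=True, check=True).stdout

local = local_run("guix describe --format=channels")
remote = remote_run("octopus", "guix describe --format=channels")
if local != remote:
    print("Guix deployments differ - run `guix pull` to align them")
```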
Note that ssh simply acts as a remote shell. This behaviour can be mirrored on a local machine using a standard shell. We will need some abstraction for simple runs, slurm runs and other tools. Say we want to run GEMMA with LOCO - essentially run GEMMA multiple times. That can be captured as one remote execution command for sheepdog with a serial and a parallel component (see the sketch below). Note that if workflows get any fancier than this we should really consider using CWL instead. We'll invoke CWL with sheepdog as well, but that comes later.
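As an illustration, a LOCO run could be described to sheepdog roughly as follows - one serial component followed by a parallel fan-out, one GEMMA run per left-out chromosome. The schema and command names are made up for this sketch; sheepdog may end up looking quite different.

```
# Hypothetical sheepdog job description: a serial step followed by a
# parallel fan-out (mouse chromosomes 1-19 plus X).
chromosomes = [str(c) for c in range(1, 20)] + ["X"]

job = {
    "name": "gemma-loco",
    "serial": ["prepare-genotypes"],           # must finish first
    "parallel": [f"run-gemma --drop-chr {c}"   # one run per chromosome
                 for c in chromosomes],
}

for step in job["serial"]:
    remote_run("octopus", step)
for step in job["parallel"]:                   # candidates for slurm
    remote_run("octopus", step)                # array jobs later
```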
In the case of LOCO we'll write a script that sets up the environment.
Initially we'll use gemma-wrapper, which also deals with caching results. Gemma-wrapper already has minimal slurm support.
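For example, a LOCO kinship run through gemma-wrapper could be kicked off as below. The flags follow my reading of the gemma-wrapper README and the file names are placeholders, so treat this as a sketch.

```
# Sketch: invoke gemma-wrapper for a LOCO run; everything after the
# "--" separator is passed straight through to GEMMA itself.
import subprocess

subprocess.run(
    ["gemma-wrapper", "--loco", "1,2,3,4", "--",
     "-g", "geno.txt.gz", "-p", "pheno.txt", "-gk"],
    check=True)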
While a job is running (say with slurm), output for stdout and stderr gets written to a file and can be fed back to a message server (in our case redis). This will allow showing progress in the browser. Also, when there are problems, the output can be used for troubleshooting.
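A sketch of that feedback loop, with a redis channel name of our own choosing and placeholder gemma-wrapper arguments:

```
# Stream a running job's output lines into redis pub/sub so the web
# front end can show live progress.
import subprocess
import redis

r = redis.Redis()
proc = subprocess.Popen(
    ["gemma-wrapper", "--loco", "1,2,3,4", "--", "-gk"],  # placeholder
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
for line in proc.stdout:
    r.publish("sheepdog:gemma:log", line.rstrip())
```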
And while we are at it, we should think of using gzipped files by default - which speeds up transfers - and we should use R/qtl2 file formats, which are better at identifying issues. GEMMA is not great at reporting errors, but we can run my new gemma2lib tools to report data problems after GEMMA fails.
The first version simply copies files. That is unnecessary if files have been copied before. Provided data is on IPFS we can check the (pre-computed) hash value and only copy locally when it does not exist. IPFS can do that automatically, but it requires special ports to be open, and these tend to be closed on our firewalls. Worth checking though: if we can use IPFS natively it takes care of caching automatically.
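The idea in miniature - a real IPFS CID is a multihash, but a plain sha256 stands in for it here:

```
# Copy a file into a content-addressed cache only when its hash is
# not present yet; a second run with the same data copies nothing.
import hashlib, pathlib, shutil

def content_hash(path: str) -> str:
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def copy_if_missing(src: str, cache_dir: str) -> pathlib.Path:
    dest = pathlib.Path(cache_dir) / content_hash(src)
    if not dest.exists():
        shutil.copy(src, dest)
    return dest
```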
Even though an HPC environment can accelerate computations, it can still be slower than local compute, particularly for smaller jobs. Therefore we want a system which can fire up a job in different setups at the same time - and may the best one win. This is part of the design where we can mix different hardware setups and simply fire up jobs across them.
See above on IPFS
GN2 runs gemma-wrapper straight from the web-server. This is inconvenient because there are no progress updates and browsers/nginx can time out. We need to introduce something similar to flask-executor - though we'll roll our own based on a GN3 endpoint (see below). We also have the opportunity to use the faster kinship computation in my gemma2lib and to improve GEMMA parallel processing on a single machine. Even on a single host we should be able to speed up GEMMA for LOCO in this step.
Implemented parallel execution in gemma-wrapper locally
In the next phase we'll introduce running GEMMA on Octopus with 264 cores. This will be particularly interesting for permutations to compute significance and for global pre-compute or 'GEMMA full vector output with polarity' as Rob coined it.
nyi (not yet implemented)
Here we move running the permutations into a new tool. Key features are progress reporting while permutations run and the ability to run on slurm (see below).
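On the slurm side, permutations map naturally onto an array job; a sketch follows, where the script name and array size are placeholders:

```
# Submit 1000 permutations as a slurm array job; each array task can
# pick its own shuffle seed from SLURM_ARRAY_TASK_ID.
import subprocess

subprocess.run(
    ["sbatch", "--array=1-1000", "--job-name=gemma-perm",
     "run_one_permutation.sh"],
    check=True)
```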
In progress:
nyi
Next we'll create an endpoint that will allow invoking gemma-wrapper locally and remotely. This endpoint will collect stdout/stderr and provide a mechanism for feeding that data to the browser. We'll use the information to display CI-style progress in the browser, which will be convenient for power users and troubleshooting.
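A minimal sketch of such an endpoint in Flask (GN3 is Flask-based; the routes, the in-memory log store and the gemma-wrapper arguments below are all made up for illustration):

```
# Start gemma-wrapper in the background, hand back a run id at once,
# and expose the collected stdout/stderr for polling.
import subprocess, threading, uuid
from flask import Flask, jsonify

app = Flask(__name__)
logs = {}  # run id -> list of output lines

def run_job(run_id, cmd):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        logs[run_id].append(line.rstrip())

@app.route("/gemma/run", methods=["POST"])
def start():
    run_id = str(uuid.uuid4())
    logs[run_id] = []
    cmd = ["gemma-wrapper", "--", "-gk"]  # placeholder arguments
    threading.Thread(target=run_job, args=(run_id, cmd)).start()
    return jsonify(id=run_id)

@app.route("/gemma/log/<run_id>")
def log(run_id):
    return jsonify(lines=logs.get(run_id, []))
```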
nyi
The GN2 web interface will be rejigged to make use of the GN3 endpoint using web-sockets.
nyi
We will take the stderr/stdout output described above and display it in the browser as it comes in, using web-sockets.
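A sketch of the push side, using the Python `websockets` library and redis pub/sub (one possible stack - GN2/GN3 may pick something else):

```
# Relay log lines from a redis channel to the browser over a
# web-socket (recent websockets versions pass the connection as a
# single handler argument).
import asyncio
import redis.asyncio
import websockets

async def handler(ws):
    r = redis.asyncio.Redis()
    pubsub = r.pubsub()
    await pubsub.subscribe("sheepdog:gemma:log")
    async for message in pubsub.listen():
        if message["type"] == "message":
            await ws.send(message["data"].decode())

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```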
nyi
Finally, as a coup de grâce, we'll allow opportunistic use of resources and fire up computations on several hosts - may the fastest one win. The current idea is to let all processes run to completion so the cache can be reused later. Any dataset that gets run is likely to run again, and disk space comes cheap these days.
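Sketched below with the hypothetical remote_run helper from the ssh sketch earlier: submit the same command to several hosts, report the first to finish, and let the rest run to completion so every cache gets warmed. Host names are made up.

```
# "May the fastest one win": return the first host to complete, but
# do not cancel the others - they keep running and fill their caches.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def race(command, hosts=("octopus", "hpc2", "localhost")):
    pool = ThreadPoolExecutor(max_workers=len(hosts))
    futures = {pool.submit(remote_run, h, command): h for h in hosts}
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # losers continue in the background
    return futures[done.pop()]
```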
nyi