# Setup of BIIGLE with GPU computing
Some BIIGLE modules, such as `biigle/maia`, may require GPU computing resources. This guide describes how to set up a BIIGLE instance with access to GPU computing resources, using a working instance with the `biigle/maia` module as an example.
## Prerequisites
You need the NVIDIA GPU drivers, Docker, `nvidia-docker2` and Docker Compose installed on the GPU machine(s) that should perform GPU computing. You can find example installation instructions here.
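To verify that the Docker GPU stack works, you can run `nvidia-smi` inside a GPU-enabled container; a quick check, assuming a recent CUDA base image is available (the image tag is only an example):

```bash
# Should print the nvidia-smi device table if the drivers and the
# NVIDIA container runtime are installed correctly.
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
```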
## Basic Concepts
BIIGLE can be flexibly configured to use GPU computing resources in various scenarios. The GPU can be located on the same machine on which the BIIGLE instance is running (Same Host GPU) or the GPU(s) can be located on (an)other machine(s) (Remote Host GPU).
### Same Host GPU
This is the simplest setup. In this scenario, the GPU is available on the same host that runs the BIIGLE instance. BIIGLE uses special queues for GPU jobs and their responses to distinguish these from regular jobs. This requires a dedicated queue worker for GPU jobs.
The `gpu` branch of `biigle/biigle` includes the configuration for `biigle/maia` and same host GPU computing. Use this branch to install your new BIIGLE instance. If you already have a BIIGLE instance set up, merge this branch into your existing configuration and then perform the update steps (including database migrations).
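If you merge the branch into an existing setup, the commands might look like the following; a sketch, assuming your instance is a Git clone of `biigle/biigle` with the default `origin` remote:

```bash
cd /path/to/your/biigle   # your existing instance directory (example path)
git fetch origin          # fetch the gpu branch from the repository
git merge origin/gpu      # merge the GPU configuration into your setup
# Then rebuild the images and run the database migrations as in a regular update.
```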
### Remote Host GPU
In this scenario, the GPU(s) are available on machines other than the host that runs the BIIGLE instance. Start with the `gpu` branch of `biigle/biigle` as described in the same host GPU setup.
**Important:** The remote GPU machines need to be able to access the image and video files in the same way as the main BIIGLE instance. This could be achieved with a shared filesystem, an (S)FTP server, a cloud object storage service or with the exclusive use of remote volumes in your BIIGLE instance. You must update the `build/config/filesystems.php` file for this.
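As an illustration of the shared-filesystem option, the storage directory could be exported over NFS from the BIIGLE host and mounted at the same path on each GPU machine; a minimal sketch, where the hostname `biigle-host` and the paths are placeholders:

```bash
# On a GPU machine: mount the storage directory exported by the main host.
sudo mount -t nfs biigle-host:/path/to/biigle/storage ./storage
```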
#### 1. Update the cache configuration
The job queue is managed by the `cache` service. In order for other machines to access the cache, it needs to be accessible outside of the private network that is established between the service Docker containers of BIIGLE. Update the `cache` service in the `docker-compose.yaml` file as follows:

```yaml
  cache:
    image: redis:3.0-alpine
    restart: always
    command: redis-server --requirepass $REDIS_PASSWORD
    environment:
      - "REDIS_PASSWORD=${REDIS_PASSWORD}"
    ports:
      - 6379:6379
```
Now the cache can be accessed from other machines.
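You can check the remote access from one of the GPU machines with `redis-cli`; the IP address below is a placeholder for the BIIGLE host:

```bash
# Expects the reply "PONG" if the cache is reachable and the password matches.
redis-cli -h 203.0.113.10 -p 6379 -a mypassword ping
```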
#### 2. Update the environment
Add `REDIS_PASSWORD=mypassword` to the `.env` file and update the respective line in the `build/.env` file. Replace `mypassword` with a strong password of your choosing (e.g. generated with `pwgen 30 1`). In addition, set the `REDIS_HOST` variable in `build/.env` to the IP address of the host that runs the BIIGLE instance.
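The relevant lines of `build/.env` might then look like this; the password and IP address are placeholders:

```bash
# IP address of the host that runs the BIIGLE instance.
REDIS_HOST=203.0.113.10
# Strong password, e.g. generated with `pwgen 30 1`.
REDIS_PASSWORD=aiNgeih8ahqueeThooLuvohjahno4e
```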
#### 3. Configure a GPU worker
Since in this scenario the `gpu-worker` services run on machines other than the BIIGLE main application, remove the service from the `docker-compose.yaml` file. Next, create a new `docker-compose.gpu.yaml` file with the following contents:

```yaml
services:
  gpu-worker:
    image: biigle/gpu-worker-dist
    user: ${USER_ID}:${GROUP_ID}
    restart: always
    deploy:
      replicas: 1
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - ./storage:/var/www/storage
    init: true
    command: "php -d memory_limit=1G artisan queue:work --queue=gpu --sleep=5 --tries=1 --timeout=0"
```
#### 4. Deploy a GPU worker
To deploy a new GPU worker, first install and build (or update) your main BIIGLE instance (with the modifications described above). Then perform these steps on the new GPU machine:
1. Transfer the `biigle/gpu-worker-dist` Docker image to the new GPU machine (e.g. using `docker save` and `docker load`; see the sketch after this list).
2. Copy the files of your `biigle/biigle` setup to the GPU machine (including the `.env` file). You can exclude the `build` and `certificate` directories if you like.
3. Update the `USER_ID` and `GROUP_ID` variables in `.env` if necessary.
4. Run `docker compose -f docker-compose.gpu.yaml up -d` to start the GPU worker.
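For the image transfer in step 1, the image can be streamed over SSH without an intermediate registry; a sketch, assuming the GPU machine is reachable via SSH as `gpu-host`:

```bash
# Export the image on the build machine and load it on the GPU machine.
docker save biigle/gpu-worker-dist | gzip | ssh gpu-host 'gunzip | docker load'
```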
This can be done with multiple GPU machines to enable parallel processing of GPU jobs.