AllianceCan: usage notes

Build `venv` via `SSH`

In a terminal: ssh user_name@rorqual.alliancecan.ca
Then copy-paste the following commands:

cd /home/user_name/links/projects/def-user_name-ab/user_name
module reset
module load StdEnv/2023 python/3.11

python3.11 -m venv py311
source py311/bin/activate

pip install --no-index --upgrade pip

# Clear the pip cache first
pip cache purge

# Point pip's temporary files at a custom temp directory on /scratch
mkdir -p /scratch/$USER/tmp
export TMPDIR=/scratch/$USER/tmp

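# Packages you'll need (scikit-learn is the real package name; "sklearn" is a deprecated alias that fails to install)
pip install torch
pip install ipython tqdm transformers optuna triton scikit-learn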

pip install --no-cache-dir xgboost
pip install --no-cache-dir scikit-survival



# Required to use this environment as a kernel in JupyterHub
pip install ipykernel

# Register the environment as a Jupyter kernel (listed as "py311" in JupyterHub)
python -m ipykernel install --user --name myenv311 --display-name "py311"
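The steps above only need to be run once. In every later SSH session, the same modules have to be loaded again before the venv is activated. A minimal sketch of a helper for that (the name `activate_py311` is hypothetical; it assumes you pass the full path of the venv directory created above and that the `module` command only exists on the cluster):

```shell
# Bash helper (hypothetical), e.g. for ~/.bashrc: re-activate a venv built as above.
# Assumption: $1 is the full path of the venv directory, e.g. .../py311
activate_py311() {
    local venv_dir="$1"
    if [ -z "$venv_dir" ] || [ ! -f "$venv_dir/bin/activate" ]; then
        echo "usage: activate_py311 VENV_DIR (no venv found at '$venv_dir')" >&2
        return 1
    fi
    # Load the same modules the venv was built with; skipped where `module` is absent
    if command -v module >/dev/null 2>&1; then
        module reset
        module load StdEnv/2023 python/3.11
    fi
    # shellcheck disable=SC1091
    source "$venv_dir/bin/activate"
}
```

Usage: `activate_py311 /home/user_name/links/projects/def-user_name-ab/user_name/py311`. Loading the modules before activating keeps the venv's Python pointing at the interpreter it was created with.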

Optionally:

Sockeye: usage notes for working in the ARC (Advanced Research Computing) environment

One-time setup
Frequent Linux commands
Modules
Conda environments
Running an interactive job (with GPU)
Running an offline job in Slurm

One-time setup

Apply for Sockeye allocation: https://flex.redcap.ubc.ca/surveys/?s=7MKJT898LK
Set up multi-factor authentication. This is a mandatory step: without it you will not be able to SSH in.
Install myVPN:
- Setup guide for Mac users
- Windows users may need to email and request a OneDrive link to download the installer, as Lisa did in April 2024

Frequent Linux commands

`print_members`

Output:

#################################### Allocation members for st-username-1 ####################################
sli (Sears Li) 
moak1 (Maya Oak)
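When the output looks like the sample above, the usernames can be pulled out for use with other commands. A minimal sketch, assuming every member line starts with the username followed by ` (Full Name)` (the helper name `member_usernames` is hypothetical):

```shell
# Hypothetical helper: read print_members output on stdin and print one username per line.
# Assumption: member lines look like "sli (Sears Li)"; header lines start with '#'.
member_usernames() {
    awk '/^[A-Za-z0-9_.-]+ \(/ { print $1 }'
}
```

For example, `squeue -u "$(print_members | member_usernames | paste -sd, -)"` lists the queued and running jobs of every allocation member, since `squeue -u` accepts a comma-separated list of users.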

Modules

To find out which module(s) must be loaded to make a piece of software available, query the module system with `module spider`.

For instance, to use Git, one would issue:

$ module spider git

Which gives an output like this:

  For detailed information about a specific "git" package (including how to load the modules) use the module's full name. Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider git/2.41.0

Running the suggested query (`module spider git/2.41.0`) shows that a version of the gcc module has to be loaded first.

Hence, to use Git, one would issue a command like this:

 module load gcc/5.5.0 git/2.41.0

Conda environments

New Python environment with `ipython`

conda create -n "py3.12" python=3.12 ipython  
conda activate py3.12

Replicate the exact environment described in an `environment.yml` file

conda env create -f environment.yml
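`conda env create -f` names the new environment after the `name:` key in the file, so the name to pass to `conda activate` afterwards can be read from it. A minimal sketch (the helper name is hypothetical, and it only handles a simple top-level `name: value` line):

```shell
# Hypothetical helper: print the environment name from the `name:` key of an environment.yml.
# Returns non-zero if the file has no top-level name: line.
env_name_from_yml() {
    local file="${1:-environment.yml}"
    awk '/^name:/ {
             sub(/^name:[ \t]*/, "")   # drop the key
             gsub(/"/, "")             # drop double quotes around the value
             sub(/[ \t]+$/, "")        # drop trailing whitespace
             print; found = 1; exit
         }
         END { if (!found) exit 1 }' "$file"
}
```

Usage: `conda env create -f environment.yml && conda activate "$(env_name_from_yml environment.yml)"`.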

Running an interactive job

salloc --time=10:0:0 --mem=3G --nodes=1 --ntasks=2 --account=st-username-1-gpu --gpus=1
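This requests 10 hours, 3 GB of memory, 2 tasks on one node and one GPU, charged to the GPU allocation. Once the session starts, it is worth confirming that a GPU was actually granted. On many Slurm GPU setups the scheduler exports `CUDA_VISIBLE_DEVICES` with the allocated device indices; the sketch below assumes that and just counts the entries (`check_gpus` is a hypothetical name):

```shell
# Hypothetical helper: print how many GPUs the current job can see.
# Assumption: Slurm sets CUDA_VISIBLE_DEVICES (e.g. "0" or "0,1") for GPU jobs.
check_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        echo "No GPUs visible (CUDA_VISIBLE_DEVICES is unset or empty)" >&2
        return 1
    fi
    # "0,1" -> one index per line -> count of 2
    printf '%s\n' "$devices" | tr ',' '\n' | grep -c .
}
```

Usage: `check_gpus && nvidia-smi -L`, where `nvidia-smi -L` lists the GPU models if the NVIDIA tools are on the PATH.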

Running an offline job in Slurm

Create a job specification file (a batch script)
Submit it to the job queue with `sbatch`
Wait for the job to start and finish; its log file(s) are written to the paths given by the `--output` and `--error` options.
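Steps 2 and 3 can be wrapped in a small helper. Slurm does not create missing directories in the `--output`/`--error` paths, so a job whose `logs/` directory does not exist typically fails without writing any log. A minimal sketch, assuming the batch script writes its logs into `./logs` as in the example job specification below (`submit_job` is a hypothetical name):

```shell
# Hypothetical helper: create ./logs, submit a batch script, print the job ID.
submit_job() {
    local script="$1" job_id
    if [ ! -f "$script" ]; then
        echo "job script not found: $script" >&2
        return 1
    fi
    # Slurm will not create the log directory itself
    mkdir -p logs
    # --parsable makes sbatch print only the job ID (plus ";cluster" on multi-cluster setups)
    job_id="$(sbatch --parsable "$script")" || return 1
    printf '%s\n' "$job_id"
}
```

Usage: `jid=$(submit_job job.sh)`, then `squeue -j "$jid"` to watch it; once it finishes, the logs are under `logs/`.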

Here's an example job specification:

#!/bin/bash
#SBATCH --account=st-username-1           # Allocation code (as shown by print_members)
#SBATCH --nodes=1                         # Number of nodes for each sub-job.
#SBATCH --ntasks-per-node=1               # Number of tasks per node for each sub-job.
#SBATCH --time=X:00:00                    # Estimating X hours of runtime, e.g. X=3 (the job is killed if it runs longer than X)
#SBATCH --mem=YG                          # Estimating Y GB of memory needed, e.g. Y=8 (the job is killed if it needs more than Y)
#SBATCH --array=0-Z                       # Runs Z+1 sub-jobs with indices 0..Z, e.g. Z=9 (gives %a in the log names its meaning)
#SBATCH --output=logs/array_%A_%a.out     # [optional] Redirects output to a unique file for each sub-job (logs/ must already exist).
#SBATCH --error=logs/array_%A_%a.err      # [optional] Redirects error logs to a unique file for each sub-job.
#SBATCH --mail-user=your_email_addr@ca    # [optional] Email address for job notifications
#SBATCH --job-name=nps_job_array          # [optional] Job name
#SBATCH --mail-type=ALL                   # [optional] Email notifications for ALL job events [other options include BEGIN, END, FAIL]

# Reset language/locale settings to the plain C locale
export LC_ALL=C; unset LANGUAGE

# Load necessary modules
module load gcc/5.5.0
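If the job is submitted as an array (with `--array`), Slurm runs one sub-job per index and sets `SLURM_ARRAY_TASK_ID` in each. A common pattern is to pick one line of a parameter file per index. A sketch of the rest of the job script under that assumption; `params.txt`, `param_for_task` and the commented-out `python my_script.py` line are hypothetical placeholders for your own files and program:

```shell
# Hypothetical helper: print the line of FILE that belongs to array index INDEX.
param_for_task() {
    local file="$1" index="$2"
    # Index must be a non-negative integer
    [[ "$index" =~ ^[0-9]+$ ]] || return 1
    # Array index 0 -> line 1, index 1 -> line 2, ... (10# avoids octal parsing of "08")
    sed -n "$((10#$index + 1))p" "$file"
}

# Only run the payload inside a Slurm array job
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    params="$(param_for_task params.txt "$SLURM_ARRAY_TASK_ID")"
    echo "Task $SLURM_ARRAY_TASK_ID running with: $params"
    # python my_script.py $params   # hypothetical program
fi
```

With `--array=0-Z`, `params.txt` needs at least Z+1 lines; each sub-job's `echo` ends up in its own `logs/array_%A_%a.out` file.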

How much time left before job ends

Inside a running job, where Slurm sets $SLURM_JOB_ID:

squeue -h -j "$SLURM_JOB_ID" -o %L

Otherwise, look up the job ID first; for instance, squeue -u $USER shows the status of your jobs:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
3951233       gpu interact   xxx    R    9:19:53      1 se353

Then query the time left for that job (%L is the remaining time, in days-hours:minutes:seconds):

squeue -h -j 3951233 -o %L
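To act on the remaining time from inside a job (for example, to save a checkpoint before the limit is hit), the `%L` string can be converted to seconds. A sketch that handles the `D-HH:MM:SS`, `H:MM:SS` and `M:SS` forms and rejects non-numeric values such as `UNLIMITED` or `NOT_SET` (the function name is hypothetical):

```shell
# Hypothetical helper: convert squeue's %L time-left string to seconds.
slurm_time_to_seconds() {
    local t="$1" days=0 h=0 m=0 s=0 part
    local -a parts
    # Optional "D-" prefix when a day or more is left
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"
        t="${t#*-}"
    fi
    IFS=: read -r -a parts <<< "$t"
    # Reject anything that is not all digits (UNLIMITED, NOT_SET, INVALID, ...)
    for part in "$days" "${parts[@]}"; do
        [[ "$part" =~ ^[0-9]+$ ]] || return 1
    done
    case "${#parts[@]}" in
        3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
        2) m=${parts[0]}; s=${parts[1]} ;;
        *) return 1 ;;
    esac
    # 10# forces base 10 so fields like "08" are not read as octal
    echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}
```

Usage inside a job: `left=$(slurm_time_to_seconds "$(squeue -h -j "$SLURM_JOB_ID" -o %L)")`, then e.g. `[ "$left" -lt 600 ] && echo "less than 10 minutes left"`. For the example above, `9:19:53` converts to 33593 seconds.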

Misc.

Graham has several types of GPUs, some of which are available with less wait:

GPUs  Type  Per node  Memory  Notes
 320  p100  2         12GB    original
  70  v100  8         16GB    newer, about 50% faster than P100 and with tensor cores
 144  t4    4         16GB    newer, about half a V100, for compute & AI except much slower FP64
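To ask for a specific type from the list above, Slurm's generic-resource syntax `--gres=gpu:TYPE:COUNT` can be used. A sketch that builds that option and checks the count against the per-node numbers listed; it assumes the scheduler's type names are the lowercase labels `p100`, `v100` and `t4`, which is worth confirming in the cluster's own documentation (`graham_gpu_opt` is a hypothetical name):

```shell
# Hypothetical helper: print a --gres option for one of the Graham GPU types listed above.
# Assumption: scheduler type names are p100 / v100 / t4; COUNT defaults to 1.
graham_gpu_opt() {
    local type="$1" count="${2:-1}" max
    # GPUs per node, from the list above
    case "$type" in
        p100) max=2 ;;
        v100) max=8 ;;
        t4)   max=4 ;;
        *) echo "unknown GPU type: $type" >&2; return 1 ;;
    esac
    if ! [[ "$count" =~ ^[0-9]+$ ]] || [ "$count" -lt 1 ] || [ "$count" -gt "$max" ]; then
        echo "count must be between 1 and $max for $type" >&2
        return 1
    fi
    printf '%s\n' "--gres=gpu:${type}:${count}"
}
```

In a batch script the printed option goes on its own directive line, e.g. `#SBATCH --gres=gpu:t4:1`.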

Python/Git Workshop

Notes created for RADD
