Remote Dispatch (SSH & SLURM)
Remote dispatch lets you send a job to an SSH host or a SLURM cluster from your local machine without manually SSHing in.
Prerequisites: ~/.theseus.yaml
You need a dispatch config at ~/.theseus.yaml that describes your infrastructure. Copy examples/dispatch.yaml from the repo to that path as a starting point, then edit it to match your clusters.
A minimal plain-SSH example:
clusters:
  mybox:
    root: /data/theseus   # output folder
    work: /tmp/theseus    # temporary directory where code is copied
hosts:
  mybox:
    ssh: mybox            # alias in ~/.ssh/config
    cluster: mybox
    type: plain
    chips:
      h100: 4
    uv_groups: [cuda12]
priority:
  - mybox
A minimal SLURM example:
clusters:
  hpc:
    root: /mnt/data/theseus   # output folder
    work: /scratch/theseus    # temporary directory where code is copied
hosts:
  hpc-login:
    ssh: hpc                  # alias in ~/.ssh/config
    cluster: hpc
    type: slurm
    partitions: [gpu]
    account: myproject
    uv_groups: [cuda12]
priority:
  - hpc-login
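Both examples cross-reference names: each host names a cluster, and priority lists host names. The sketch below checks those references on a config that has already been loaded into a dict. It is a hypothetical helper, not part of Theseus, and it assumes that priority entries are always host names, which is inferred only from the two examples above.

```python
# Hypothetical helper, not part of Theseus: sanity-check a dispatch config
# that has already been loaded into a plain dict (for example with a YAML loader).

def check_dispatch_config(config: dict) -> list:
    """Return a list of problems found; an empty list means the references line up."""
    problems = []
    clusters = config.get("clusters") or {}
    hosts = config.get("hosts") or {}

    # Each host should point at a cluster defined under `clusters`.
    for host_name, host in hosts.items():
        cluster = host.get("cluster")
        if cluster not in clusters:
            problems.append(f"host '{host_name}' references unknown cluster '{cluster}'")

    # Assumption: `priority` lists host names, as in the examples above.
    for entry in config.get("priority") or []:
        if entry not in hosts:
            problems.append(f"priority entry '{entry}' is not a defined host")

    return problems


# The SLURM example, written as the dict a YAML loader would produce.
slurm_config = {
    "clusters": {"hpc": {"root": "/mnt/data/theseus", "work": "/scratch/theseus"}},
    "hosts": {
        "hpc-login": {
            "ssh": "hpc",
            "cluster": "hpc",
            "type": "slurm",
            "partitions": ["gpu"],
            "account": "myproject",
            "uv_groups": ["cuda12"],
        }
    },
    "priority": ["hpc-login"],
}

print(check_dispatch_config(slurm_config))  # prints []
```

Running a check like this before submitting can catch a misspelled host or cluster name locally, before anything is shipped to a remote machine.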
Step 1: Generate a config (same as local)
Step 2: Submit
Theseus reads ~/.theseus.yaml, finds the first host in priority that can satisfy the hardware request, ships your code, and either SSHes in to run it directly (plain host) or submits an sbatch job (SLURM host).
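The "first host in priority that can satisfy the hardware request" rule can be sketched as a simple first-fit scan. This is an illustration, not Theseus's actual implementation: the function name pick_host, the request format (chip name to count), the exclude parameter, and the second host named small are all assumptions. Only hosts that declare a chips mapping are compared; how SLURM hosts, which list partitions instead, are matched is not described here and is not modeled.

```python
# Illustrative first-fit selection; names and request format are assumptions.

def pick_host(config, request, exclude=()):
    """Return the first host name in `priority` whose chips cover `request`, or None."""
    for name in config.get("priority", []):
        if name in exclude:
            continue
        host = config["hosts"][name]
        available = host.get("chips", {})
        # The host qualifies if it has at least the requested count of every chip type.
        if all(available.get(chip, 0) >= count for chip, count in request.items()):
            return name
    return None


config = {
    "hosts": {
        "small": {"type": "plain", "chips": {"h100": 2}},  # hypothetical extra host
        "mybox": {"type": "plain", "chips": {"h100": 4}},
    },
    "priority": ["small", "mybox"],
}

print(pick_host(config, {"h100": 4}))                     # prints mybox
print(pick_host(config, {"h100": 1}))                     # prints small
print(pick_host(config, {"h100": 4}, exclude=["mybox"]))  # prints None
```

Because the scan stops at the first match, the order of the priority list decides where a job lands when several hosts qualify.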
Override hardware at submit time if you didn't bake it into the config:
Pin to a specific cluster, or exclude one:
theseus submit my-gpt-run run.yaml --cluster hpc-login
theseus submit my-gpt-run run.yaml --exclude-cluster cloud
By default, Theseus ships your working tree, including uncommitted changes (--dirty). To ship only committed code:
Monitoring Jobs
Job and log naming
The job name and log file path are derived from your submit arguments.
SLURM jobs are named {project}-{group}-{name} and log to:
where %j is the SLURM job ID. project defaults to "general" and group
defaults to "default" if not specified. For example:
theseus submit my_run run.yaml --project myproj --group exp1
# -> SLURM job name: myproj-exp1-my_run
# -> log file: /scratch/theseus/myproj-exp1-my_run-12345678.out
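The naming rule and its defaults can be written out as a small sketch. The function names here are hypothetical, and the log location (the cluster's work directory, with the job ID appended before .out) is inferred from the example above rather than taken from a documented pattern.

```python
import posixpath


def slurm_job_name(name, project="general", group="default"):
    """Build the {project}-{group}-{name} job name, applying the documented defaults."""
    return f"{project}-{group}-{name}"


def slurm_log_path(work_dir, job_name, job_id):
    """Assumed layout, inferred from the example: <work_dir>/<job_name>-<job_id>.out"""
    return posixpath.join(work_dir, f"{job_name}-{job_id}.out")


job = slurm_job_name("my_run", project="myproj", group="exp1")
print(job)                                                  # prints myproj-exp1-my_run
print(slurm_log_path("/scratch/theseus", job, "12345678"))  # prints /scratch/theseus/myproj-exp1-my_run-12345678.out
print(slurm_job_name("my_run"))                             # prints general-default-my_run
```

Treat the path printed at submit time as authoritative; the sketch is only useful for predicting what to look for in the log directory.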
SSH (plain) jobs log to:
For example:
The exact path is printed when the job is submitted.
SLURM
# Check job status:
squeue -u $USER
# Tail the log (find the path in the submit output, or list the log dir):
tail -f /scratch/theseus/myproj-exp1-my_run-12345678.out
# Cancel a job:
scancel <job-id>