config

Remote dispatch configuration schema.

See examples/dispatch.yaml for a complete example.

JuiceFSMount(redis_url: str, mount_point: str, cache_size: str | None = None, cache_dir: str | None = None) dataclass

JuiceFS mount configuration.

ClusterConfig(root: str, work: str, log: str | None = None, share: str | None = None, mount: str | None = None, cache_size: str | None = None, cache_dir: str | None = None) dataclass

Configuration for a compute cluster's paths.

PartitionConfig(name: str, default: bool = False, constraint: str | None = None) dataclass

Configuration for a SLURM partition.

PlainHostConfig(ssh: str, cluster: str, type: Literal['plain'] = 'plain', chips: dict[str, int] = dict(), uv_groups: list[str] = list()) dataclass

Configuration for a plain SSH host (no scheduler).

SlurmHostConfig(ssh: str, cluster: str, type: Literal['slurm'] = 'slurm', partitions: list[PartitionConfig] = list(), account: str | None = None, qos: str | None = None, mem: str | None = None, exclude: list[str] = list(), uv_groups: list[str] = list(), chips: dict[str, int] | None = None, cpu_partitions: list[str] = list(), annotations: dict[str, str] = dict()) dataclass

Configuration for a SLURM cluster login node.

DispatchConfig(mount: str | None = None, proxy: str | None = None, clusters: dict[str, ClusterConfig] = dict(), hosts: dict[str, PlainHostConfig | SlurmHostConfig] = dict(), priority: list[str] = list(), gres_mapping: dict[str, str] = dict()) dataclass

Top-level dispatch configuration.
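As a rough illustration of how these dataclasses nest, the sketch below builds a small configuration in Python. The classes are trimmed local stand-ins that copy only some fields from the signatures above, so the block runs on its own; they are not the real classes. The host names, SSH aliases, paths, and chip counts are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal


# Trimmed stand-ins mirroring the documented signatures (not the real classes).
@dataclass
class ClusterConfig:
    root: str
    work: str
    log: str | None = None


@dataclass
class PartitionConfig:
    name: str
    default: bool = False
    constraint: str | None = None


@dataclass
class PlainHostConfig:
    ssh: str
    cluster: str
    type: Literal["plain"] = "plain"
    chips: dict[str, int] = field(default_factory=dict)


@dataclass
class SlurmHostConfig:
    ssh: str
    cluster: str
    type: Literal["slurm"] = "slurm"
    partitions: list[PartitionConfig] = field(default_factory=list)
    account: str | None = None


@dataclass
class DispatchConfig:
    clusters: dict[str, ClusterConfig] = field(default_factory=dict)
    hosts: dict[str, PlainHostConfig | SlurmHostConfig] = field(default_factory=dict)


# Hypothetical values: one cluster, one plain host, one SLURM login node.
config = DispatchConfig(
    clusters={"lab": ClusterConfig(root="/data/lab", work="/scratch/lab")},
    hosts={
        "workstation": PlainHostConfig(ssh="ws1", cluster="lab", chips={"a100": 2}),
        "login": SlurmHostConfig(
            ssh="hpc-login",
            cluster="lab",
            partitions=[PartitionConfig(name="gpu", default=True)],
        ),
    },
)
```

Each host refers to its cluster by name through the `cluster` field, and the `type` field distinguishes the two host kinds.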

RemoteInventory(config: DispatchConfig)

Resolves dispatch config into usable objects.

get_cluster(name: str) -> Cluster

Get or create a Cluster object by name.

get_host(name: str) -> PlainHostConfig | SlurmHostConfig

Get host configuration by name.

get_chip(name: str) -> Chip

Get a Chip by name from SUPPORTED_CHIPS.

plain_hosts() -> list[str]

List all plain SSH hosts.

slurm_hosts() -> list[str]

List all SLURM hosts.

hosts_with_chip(chip: str | Chip) -> list[str]

Find plain hosts that have a specific chip type.

total_chips(chip: str | Chip) -> int

Get total count of a chip type across all plain hosts.
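The two chip queries above can be pictured as a filter and a sum over per-host chip counts. The sketch below is a standalone approximation, not the library's implementation: it works on a plain dictionary shaped like `PlainHostConfig.chips`, accepts only string chip names (the real methods also accept a `Chip`), and uses hypothetical host names and counts.

```python
from __future__ import annotations

# Hypothetical chip inventories keyed by plain-host name, shaped like
# PlainHostConfig.chips (chip name -> count).
plain_host_chips: dict[str, dict[str, int]] = {
    "ws1": {"a100": 2},
    "ws2": {"a100": 4, "t4": 1},
    "ws3": {"t4": 8},
}


def hosts_with_chip(chip: str) -> list[str]:
    """Return names of hosts that report at least one chip of this type."""
    return [name for name, chips in plain_host_chips.items() if chips.get(chip, 0) > 0]


def total_chips(chip: str) -> int:
    """Sum the count of one chip type across all hosts."""
    return sum(chips.get(chip, 0) for chips in plain_host_chips.values())


print(hosts_with_chip("a100"))  # ['ws1', 'ws2']
print(total_chips("a100"))      # 6
```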

default_partition(host: str) -> PartitionConfig | None

Get the default partition for a SLURM host.
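One plausible reading of this lookup is "return the first partition whose `default` flag is set, otherwise `None`". The sketch below implements that reading against a local stand-in for `PartitionConfig` and a plain list of partitions. Whether the real method falls back to some other partition when none is flagged is not stated in this reference, so treat the `None` branch as an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


# Local stand-in mirroring the documented PartitionConfig signature.
@dataclass
class PartitionConfig:
    name: str
    default: bool = False
    constraint: str | None = None


def default_partition(partitions: list[PartitionConfig]) -> PartitionConfig | None:
    """Return the first partition flagged as default, or None if none is flagged."""
    return next((p for p in partitions if p.default), None)


# Hypothetical partition names.
partitions = [PartitionConfig("cpu"), PartitionConfig("gpu", default=True)]
print(default_partition(partitions).name)  # gpu
print(default_partition([PartitionConfig("cpu")]))  # None
```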

load_dispatch_config(path: str | Path) -> DispatchConfig

Load dispatch configuration from a YAML file.

Parameters:

Name  Type        Description               Default
path  str | Path  Path to YAML config file  required

Returns:

Type            Description
DispatchConfig  Parsed DispatchConfig

parse_dispatch_config(cfg: DictConfig) -> DispatchConfig

Parse dispatch configuration from OmegaConf.

Parameters:

Name  Type        Description                                        Default
cfg   DictConfig  OmegaConf config with 'clusters' and 'hosts' keys  required

Returns:

Type            Description
DispatchConfig  Parsed DispatchConfig
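Because `hosts` may hold either `PlainHostConfig` or `SlurmHostConfig`, a parser has to choose a class for each entry. The sketch below shows one way that could work, by dispatching on the `type` discriminator. It is an illustration only: it takes a plain `dict` rather than an OmegaConf `DictConfig`, uses trimmed stand-in dataclasses, and the helper name `parse_host` is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Trimmed stand-ins for the documented host dataclasses.
@dataclass
class PlainHostConfig:
    ssh: str
    cluster: str
    chips: dict[str, int] = field(default_factory=dict)


@dataclass
class SlurmHostConfig:
    ssh: str
    cluster: str
    account: str | None = None


def parse_host(raw: dict[str, Any]) -> PlainHostConfig | SlurmHostConfig:
    """Build a host config from a mapping, choosing the class by its 'type' key."""
    fields = {key: value for key, value in raw.items() if key != "type"}
    kind = raw.get("type")
    if kind == "plain":
        return PlainHostConfig(**fields)
    if kind == "slurm":
        return SlurmHostConfig(**fields)
    raise ValueError(f"unknown host type: {kind!r}")


# Hypothetical entries as they might appear under the 'hosts' key.
plain = parse_host({"type": "plain", "ssh": "ws1", "cluster": "lab", "chips": {"a100": 2}})
slurm = parse_host({"type": "slurm", "ssh": "hpc-login", "cluster": "lab", "account": "proj"})
```

Rejecting an unknown `type` with an error, rather than guessing, keeps typos in the YAML from silently producing the wrong host kind.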

discover_plain_host(ssh_alias: str, timeout: float = 30.0) -> dict[str, int]

Discover chips on a plain SSH host by querying nvidia-smi.

Parameters:

Name       Type   Description       Default
ssh_alias  str    SSH config alias  required
timeout    float  SSH timeout       30.0

Returns:

Type            Description
dict[str, int]  Dict mapping chip name -> count
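To make the returned mapping concrete, the sketch below counts GPUs per model from `nvidia-smi` output. The exact command this function runs is not documented here; the sketch assumes `nvidia-smi --query-gpu=name --format=csv,noheader`, which prints one model name per line. The helper names `count_gpu_names` and `query_gpu_names` are hypothetical, and the sketch keeps the raw model strings rather than mapping them to the library's chip names.

```python
from __future__ import annotations

import subprocess
from collections import Counter


def count_gpu_names(output: str) -> dict[str, int]:
    """Count GPUs per model name from one-name-per-line nvidia-smi output."""
    names = [line.strip() for line in output.splitlines() if line.strip()]
    return dict(Counter(names))


def query_gpu_names(ssh_alias: str, timeout: float = 30.0) -> str:
    """Run nvidia-smi on the remote host over ssh and return its stdout.

    Raises subprocess.TimeoutExpired if the command exceeds `timeout` seconds
    and subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        ["ssh", ssh_alias, "nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


# Sample output for a hypothetical host with three GPUs.
sample = "NVIDIA A100-SXM4-80GB\nNVIDIA A100-SXM4-80GB\nTesla T4\n"
print(count_gpu_names(sample))  # {'NVIDIA A100-SXM4-80GB': 2, 'Tesla T4': 1}
```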

discover_slurm_partitions(ssh_alias: str, timeout: float = 30.0) -> list[PartitionConfig]

Discover SLURM partitions on a host.

Parameters:

Name       Type   Description       Default
ssh_alias  str    SSH config alias  required
timeout    float  SSH timeout       30.0

Returns:

Type                   Description
list[PartitionConfig]  List of discovered PartitionConfig
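As an illustration of what discovery might involve, the sketch below turns `sinfo` output into `PartitionConfig` objects. It assumes the partitions are listed with `sinfo -h -o "%P"`, where SLURM appends `*` to the default partition's name; the command actually used by this function is not documented here. `PartitionConfig` is a local stand-in, `parse_sinfo_partitions` is a hypothetical helper, and the partition names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


# Local stand-in mirroring the documented PartitionConfig signature.
@dataclass
class PartitionConfig:
    name: str
    default: bool = False
    constraint: str | None = None


def parse_sinfo_partitions(output: str) -> list[PartitionConfig]:
    """Parse `sinfo -h -o "%P"` output into unique partitions, in first-seen order."""
    seen: dict[str, PartitionConfig] = {}
    for line in output.splitlines():
        token = line.strip()
        if not token:
            continue
        is_default = token.endswith("*")  # sinfo marks the default partition with '*'
        name = token.rstrip("*")
        if name not in seen:  # sinfo can repeat a partition once per node state
            seen[name] = PartitionConfig(name=name, default=is_default)
    return list(seen.values())


sample = "gpu*\ngpu*\ncpu\nlong\n"
for partition in parse_sinfo_partitions(sample):
    print(partition.name, partition.default)
```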