config
Remote dispatch configuration schema.
See examples/dispatch.yaml for a complete example.
JuiceFSMount(redis_url: str, mount_point: str, cache_size: str | None = None, cache_dir: str | None = None)
dataclass
JuiceFS mount configuration.
ClusterConfig(root: str, work: str, log: str | None = None, share: str | None = None, mount: str | None = None, cache_size: str | None = None, cache_dir: str | None = None)
dataclass
Configuration for a compute cluster's paths.
PartitionConfig(name: str, default: bool = False, constraint: str | None = None)
dataclass
Configuration for a SLURM partition.
PlainHostConfig(ssh: str, cluster: str, type: Literal['plain'] = 'plain', chips: dict[str, int] = dict(), uv_groups: list[str] = list())
dataclass
Configuration for a plain SSH host (no scheduler).
SlurmHostConfig(ssh: str, cluster: str, type: Literal['slurm'] = 'slurm', partitions: list[PartitionConfig] = list(), account: str | None = None, qos: str | None = None, mem: str | None = None, exclude: list[str] = list(), uv_groups: list[str] = list(), chips: dict[str, int] | None = None, cpu_partitions: list[str] = list(), annotations: dict[str, str] = dict())
dataclass
Configuration for a SLURM cluster login node.
DispatchConfig(mount: str | None = None, proxy: str | None = None, clusters: dict[str, ClusterConfig] = dict(), hosts: dict[str, PlainHostConfig | SlurmHostConfig] = dict(), priority: list[str] = list(), gres_mapping: dict[str, str] = dict())
dataclass
Top-level dispatch configuration.
RemoteInventory(config: DispatchConfig)
Resolves dispatch config into usable objects.
get_cluster(name: str) -> Cluster
Get or create a Cluster object by name.
get_host(name: str) -> PlainHostConfig | SlurmHostConfig
Get host configuration by name.
get_chip(name: str) -> Chip
Get a Chip by name from SUPPORTED_CHIPS.
plain_hosts() -> list[str]
List all plain SSH hosts.
slurm_hosts() -> list[str]
List all SLURM hosts.
hosts_with_chip(chip: str | Chip) -> list[str]
Find plain hosts that have a specific chip type.
total_chips(chip: str | Chip) -> int
Get total count of a chip type across all plain hosts.
default_partition(host: str) -> PartitionConfig | None
Get the default partition for a SLURM host.
load_dispatch_config(path: str | Path) -> DispatchConfig
Load dispatch configuration from a YAML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to YAML config file | required |
Returns:
| Type | Description |
|---|---|
| `DispatchConfig` | Parsed DispatchConfig |
parse_dispatch_config(cfg: DictConfig) -> DispatchConfig
Parse dispatch configuration from an OmegaConf `DictConfig`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cfg` | `DictConfig` | OmegaConf config with 'clusters' and 'hosts' keys | required |
Returns:
| Type | Description |
|---|---|
| `DispatchConfig` | Parsed DispatchConfig |
discover_plain_host(ssh_alias: str, timeout: float = 30.0) -> dict[str, int]
Discover chips on a plain SSH host by querying nvidia-smi.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ssh_alias` | `str` | SSH config alias | required |
| `timeout` | `float` | SSH timeout | `30.0` |
Returns:
| Type | Description |
|---|---|
| `dict[str, int]` | Dict mapping chip name -> count |
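One plausible way to produce such a mapping is to ask `nvidia-smi` for one GPU name per line and count the repeats. The sketch below assumes the query form `nvidia-smi --query-gpu=name --format=csv,noheader` and keeps the raw product names as keys; how the real function normalizes names against `SUPPORTED_CHIPS` is not shown here. The helper names `count_chips` and `query_plain_host` are hypothetical.

```python
import subprocess
from collections import Counter


def count_chips(nvidia_smi_output: str) -> dict[str, int]:
    """Count GPUs per product name from one-name-per-line output."""
    names = [line.strip() for line in nvidia_smi_output.splitlines() if line.strip()]
    return dict(Counter(names))


def query_plain_host(ssh_alias: str, timeout: float = 30.0) -> dict[str, int]:
    """Run nvidia-smi over SSH and count the reported GPUs.

    Raises subprocess.TimeoutExpired if the command exceeds `timeout`
    and subprocess.CalledProcessError if it exits non-zero.
    """
    result = subprocess.run(
        ["ssh", ssh_alias, "nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return count_chips(result.stdout)
```

Keeping the parsing in its own function means it can be exercised with captured output, without an SSH connection.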
discover_slurm_partitions(ssh_alias: str, timeout: float = 30.0) -> list[PartitionConfig]
Discover SLURM partitions on a host.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ssh_alias` | `str` | SSH config alias | required |
| `timeout` | `float` | SSH timeout | `30.0` |
Returns:
| Type | Description |
|---|---|
| `list[PartitionConfig]` | List of discovered PartitionConfig |