Skip to content

volcano

volcano

Kubernetes Volcano dispatch utilities.

Provides kubectl wrappers for submitting Volcano Jobs, shipping code to PVCs, and rendering the volcano_job.yaml template.

get_queue_availability(queue_name: str, gpu_resource_key: str = 'nvidia.com/gpu', kubeconfig: str | None = None, context: str | None = None, timeout: float = 30.0) -> tuple[int, int] | None

Return (available, capacity) GPU count for a Volcano queue.

Fetches spec.capability and status.allocated from the Queue CRD. Returns None on error (queue not found, kubectl failure, etc.).

apply_job(yaml_content: str, namespace: str = 'default', kubeconfig: str | None = None, context: str | None = None, timeout: float = 60.0) -> RunResult

Submit a Volcano Job via kubectl apply -f -.

delete_job(job_name: str, namespace: str = 'default', kubeconfig: str | None = None, context: str | None = None, timeout: float = 60.0) -> RunResult

Delete a Volcano Job.

get_job_status(job_name: str, namespace: str = 'default', kubeconfig: str | None = None, context: str | None = None, timeout: float = 30.0) -> dict[str, Any] | None

Get Volcano Job status as a dict, or None if not found.

get_pod_logs(job_name: str, namespace: str = 'default', kubeconfig: str | None = None, context: str | None = None, timeout: float = 30.0, tail_lines: int = 100) -> RunResult

Get logs from pods belonging to a Volcano Job.

wait_completed(job_name: str, namespace: str = 'default', kubeconfig: str | None = None, context: str | None = None, poll_interval: float = 10.0, timeout: float | None = None) -> str | None

Poll until a Volcano Job reaches a terminal state.

Returns the final state string ("Completed", "Failed", "Aborted", etc.) or None on timeout.

render_volcano_job(job_name: str, host_config: VolcanoHostConfig, bootstrap_command: str, n_chips: int | None = None, cluster_env: dict[str, str] | None = None) -> str

Render the volcano_job.yaml template with concrete values.

Returns the rendered YAML string ready for kubectl apply.

ship_and_write_to_pvc(tarball: bytes, script: str, bootstrap_pys: dict[str, str], pvc_name: str, dest_path: str, queue: str, namespace: str = 'default', pvc_mount_path: str = '/workspace', kubeconfig: str | None = None, context: str | None = None, timeout: float = 120.0, helper_resources: dict[str, str] | None = None) -> RunResult

Ship code + bootstrap files to a PVC via a temporary Volcano Job.

Uses a single Volcano Job (vcjob) as the helper instead of a plain Pod, so it goes through the Volcano scheduler like real workloads. Code is streamed via tar | kubectl exec -i … tar -xpf - (no intermediate files). Bootstrap scripts are written via kubectl exec -i … cat. The helper vcjob is cleaned up on exit.