services

Service layer for the web UI.

Services handle reading data from the filesystem and transforming it into the models used by the API.

StatusService(status_dir: Path)

Service for reading job status from the filesystem.

When a JobCache is attached (via set_cache), list operations are served from the in-memory cache instead of walking the filesystem on every request. Single-job lookups still hit the filesystem for maximum freshness.

set_cache(cache) -> None

Attach a JobCache instance to serve list queries from memory.

list_all_jobs(project: Optional[str] = None, group: Optional[str] = None, status: Optional[JobStatus] = None, limit: int = 100) -> list[JobMetadata]

List all jobs, optionally filtered by project/group/status.

Jobs are returned sorted by start_time (most recent first).
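The filtering and ordering described above can be sketched roughly as follows; `JobMetadata` here is a hypothetical stand-in with only the fields the query touches, not the real model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class JobMetadata:
    # Hypothetical stand-in for the real model.
    project: str
    group: str
    status: str
    start_time: datetime

def list_all_jobs(
    jobs: list[JobMetadata],
    project: Optional[str] = None,
    group: Optional[str] = None,
    status: Optional[str] = None,
    limit: int = 100,
) -> list[JobMetadata]:
    # Apply each filter only when it is given.
    selected = [
        j for j in jobs
        if (project is None or j.project == project)
        and (group is None or j.group == group)
        and (status is None or j.status == status)
    ]
    # Most recent first, truncated to `limit`.
    selected.sort(key=lambda j: j.start_time, reverse=True)
    return selected[:limit]
```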

get_job(project: str, group: str, name: str, run_id: str) -> Optional[JobMetadata]

Get a specific job by its identifiers.

Always reads from the filesystem for maximum freshness on detail pages, falling back to the cache if the read fails or the file is missing.
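The filesystem-first, cache-fallback behavior can be sketched as below. The directory layout (`<status_dir>/<project>/<group>/<name>/<run_id>.json`) and the use of plain dicts instead of `JobMetadata` are assumptions for illustration:

```python
import json
from pathlib import Path
from typing import Any, Optional

def get_job(
    status_dir: Path,
    cache: dict[str, dict[str, Any]],
    project: str, group: str, name: str, run_id: str,
) -> Optional[dict[str, Any]]:
    # Hypothetical layout: <status_dir>/<project>/<group>/<name>/<run_id>.json
    path = status_dir / project / group / name / f"{run_id}.json"
    try:
        return json.loads(path.read_text())  # freshest data wins
    except (OSError, json.JSONDecodeError):
        # Read failed or file is mid-write: fall back to the cached copy.
        return cache.get(f"{project}/{group}/{name}/{run_id}")
```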

get_job_runs(project: str, group: str, name: str) -> list[JobMetadata]

Get all runs for a specific job name.

list_projects() -> list[ProjectSummary]

List all projects with summary stats.

get_running_jobs() -> list[JobMetadata]

Get all currently running jobs, excluding stale ones (no heartbeat for more than 5 minutes).

get_recent_jobs(hours: int = 24, limit: int = 50) -> list[JobMetadata]

Get jobs started within the last N hours.

get_dashboard_stats() -> DashboardStats

Get aggregated stats for the dashboard.

delete_job(project: str, group: str, name: str, run_id: str) -> bool

Delete a specific job run. Returns True if successful.

CheckpointService(checkpoints_dir: Path)

Service for browsing checkpoints.

list_all_checkpoints(project: Optional[str] = None, group: Optional[str] = None, job_name: Optional[str] = None, limit: int = 100) -> list[CheckpointInfo]

List all checkpoints, optionally filtered.

Returns checkpoints sorted by creation time (most recent first).

get_checkpoint(project: str, group: str, job_name: str, suffix: str) -> Optional[CheckpointInfo]

Get a specific checkpoint.

get_latest_checkpoint(project: str, group: str, job_name: str) -> Optional[CheckpointInfo]

Get the latest checkpoint for a job.

list_job_checkpoints(project: str, group: str, job_name: str) -> list[CheckpointInfo]

List all checkpoints for a specific job.

get_checkpoint_config(project: str, group: str, job_name: str, suffix: str) -> Optional[dict[str, Any]]

Read the config from a checkpoint (tries config.yaml then config.json).

get_checkpoint_job_spec(project: str, group: str, job_name: str, suffix: str) -> Optional[dict[str, Any]]

Read the job.json from a checkpoint.

count_checkpoints() -> int

Count total number of checkpoints.

get_total_size() -> int

Get total size of all checkpoints in bytes.
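A minimal sketch of the size computation, assuming checkpoints are ordinary files under the checkpoints tree:

```python
from pathlib import Path

def get_total_size(checkpoints_dir: Path) -> int:
    # Sum the size of every regular file under the checkpoints tree.
    return sum(p.stat().st_size for p in checkpoints_dir.rglob("*") if p.is_file())
```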

format_size(size_bytes: int) -> str

Format bytes as human-readable string.
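A typical implementation walks up the units until the value fits; the exact unit names (KiB vs. KB) and precision here are assumptions, not the service's confirmed output format:

```python
def format_size(size_bytes: int) -> str:
    # Walk up binary units until the value drops below 1024.
    # Unit names (KiB vs KB) are an assumption about the real formatter.
    size = float(size_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```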

delete_job_checkpoints(project: str, group: str, job_name: str) -> bool

Delete all checkpoints for a job. Returns True if successful.

LogService(status_dir: Path)

Service for reading and streaming job logs.

read_log(project: str, group: str, name: str, run_id: str, offset: int = 0, limit: Optional[int] = None) -> tuple[str, int]

Read log file contents.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| project | str | Project name | required |
| group | str | Group name | required |
| name | str | Job name | required |
| run_id | str | Run ID | required |
| offset | int | Byte offset to start reading from | 0 |
| limit | Optional[int] | Maximum bytes to read (None for all) | None |

Returns:

| Type | Description |
| --- | --- |
| tuple[str, int] | Tuple of (content, new_offset) |
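The offset-based contract can be sketched as below; this standalone version takes the log path directly rather than resolving it from job identifiers:

```python
from pathlib import Path
from typing import Optional

def read_log(path: Path, offset: int = 0, limit: Optional[int] = None) -> tuple[str, int]:
    # Read from `offset`, up to `limit` bytes, and return the decoded
    # content plus the new offset to pass on the next call.
    with path.open("rb") as f:
        f.seek(offset)
        data = f.read() if limit is None else f.read(limit)
    return data.decode("utf-8", errors="replace"), offset + len(data)
```

Feeding the returned offset back in lets a client page through the log incrementally without re-reading bytes it has already seen.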

tail_log(project: str, group: str, name: str, run_id: str, lines: int | None = None) -> str

Read the last N lines of a log file, or entire file if lines=None.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lines | int \| None | Number of lines to return from end. None = entire file. | None |

stream_log(project: str, group: str, name: str, run_id: str, poll_interval: float = 1.0) -> AsyncGenerator[str, None] async

Stream log file contents as they're written.

Yields new content whenever the file is updated. Currently uses polling (TODO: consider inotify/fsevents for efficiency).
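The polling loop can be sketched as an async generator that watches the file size and yields any appended bytes; this standalone version takes the log path directly:

```python
import asyncio
from pathlib import Path
from typing import AsyncGenerator

async def stream_log(path: Path, poll_interval: float = 1.0) -> AsyncGenerator[str, None]:
    # Minimal polling loop: watch the file size and yield any bytes
    # appended since the last read.
    offset = 0
    while True:
        if path.exists() and path.stat().st_size > offset:
            with path.open("rb") as f:
                f.seek(offset)
                chunk = f.read()
            offset += len(chunk)
            yield chunk.decode("utf-8", errors="replace")
        await asyncio.sleep(poll_interval)
```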

get_log_size(project: str, group: str, name: str, run_id: str) -> int

Get the size of a log file in bytes.

log_exists(project: str, group: str, name: str, run_id: str) -> bool

Check if a log file exists.

JobCache(status_dir: Path, mode: str = 'polling', poll_interval: float = 10.0)

In-memory cache of job metadata with background refresh.

start() -> None async

Start the cache: do initial full scan, then begin background refresh.

stop() -> None async

Stop background refresh.
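The lifecycle above (initial full scan, background refresh loop, cancellation on stop) can be sketched as follows. The flat `*.json` status layout and dict-valued jobs are assumptions for illustration, not the real on-disk format:

```python
import asyncio
import json
from pathlib import Path

class JobCache:
    # Minimal sketch: periodically rescan a status directory of *.json files.
    def __init__(self, status_dir: Path, poll_interval: float = 10.0):
        self.status_dir = status_dir
        self.poll_interval = poll_interval
        self.jobs: dict[str, dict] = {}
        self._task: asyncio.Task | None = None

    def _scan(self) -> None:
        jobs = {}
        for f in self.status_dir.glob("*.json"):
            try:
                jobs[f.stem] = json.loads(f.read_text())
            except (OSError, json.JSONDecodeError):
                continue  # skip files that are mid-write
        self.jobs = jobs

    async def _refresh_loop(self) -> None:
        while True:
            await asyncio.sleep(self.poll_interval)
            self._scan()

    async def start(self) -> None:
        self._scan()  # initial full scan so reads work immediately
        self._task = asyncio.create_task(self._refresh_loop())

    async def stop(self) -> None:
        if self._task:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
```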

list_all_jobs(project: Optional[str] = None, group: Optional[str] = None, status: Optional[JobStatus] = None, limit: int = 100) -> list[JobMetadata]

List all jobs from cache, with optional filtering.

get_job(project: str, group: str, name: str, run_id: str) -> Optional[JobMetadata]

Get a single job from cache.

get_job_runs(project: str, group: str, name: str) -> list[JobMetadata]

Get all runs for a specific job name from cache.

list_projects() -> list[ProjectSummary]

List all projects with summary stats, from cache.

get_running_jobs() -> list[JobMetadata]

Get all running, non-stale jobs from cache.

get_recent_jobs(hours: int = 24, limit: int = 50) -> list[JobMetadata]

Get jobs started within the last N hours, from cache.

get_dashboard_stats() -> DashboardStats

Get aggregated stats from cache.

delete_job(project: str, group: str, name: str, run_id: str) -> None

Remove a job from the cache (called after filesystem delete).

invalidate(project: str, group: str, name: str, run_id: str) -> None

Force re-read a specific job on next incremental scan.