app.ml_utils module

Machine learning utilities for models and training-log queries.

This module provides small, focused utilities shared by the training and visualization layers: a minimal PyTorch regression network, typed structures for training results, and helpers that retrieve and format the latest and historical training scores from the database. It keeps shared ML logic isolated from the FastAPI views and the standalone training job.

See Also

app.ml_train

Standalone Slurm-executed training pipeline that writes logs.

app.slurm_job_trigger

Dispatches training jobs into the Slurm cluster.

app.database

Engine and session factories (SessionLocal).

app.models.TrainingLog

ORM model consumed by the query helpers here.

Notes

  • Primary role: define lightweight ML helpers and expose read-only queries for app.models.TrainingLog suitable for dashboards and APIs.

  • Key dependencies: a reachable database via app.database.SessionLocal and an optional writable shared volume at /data for model artifacts.

  • Invariants: the database schema for training_logs must be present.

Examples

>>> # Fetch latest scores grouped by horizon (requires DB)
>>> from app.ml_utils import get_latest_training_logs
>>> latest = get_latest_training_logs()
>>> isinstance(latest, dict)
True
>>> # Create a tiny regression net (no DB required)
>>> import torch
>>> from app.ml_utils import SimpleRegressionNet
>>> net = SimpleRegressionNet(input_dim=4)
>>> y = net(torch.randn(2, 4))
>>> y.shape
torch.Size([2, 1])

class app.ml_utils.SimpleRegressionNet(*args: Any, **kwargs: Any)

Bases: Module

A minimal fully connected network for regression tasks.

Parameters:
input_dim : int

Number of input features; must be a positive integer.

Examples

>>> import torch
>>> net = SimpleRegressionNet(input_dim=3)
>>> out = net(torch.randn(2, 3))
>>> out.shape
torch.Size([2, 1])

forward(x: torch.Tensor) -> torch.Tensor

Compute predictions for a batch of inputs.

Parameters:
x : torch.Tensor

Input feature tensor with shape (batch_size, input_dim).

Returns:
torch.Tensor

Output tensor with shape (batch_size, 1).

class app.ml_utils.TrainingLogDetails

Bases: TypedDict

Structured details for a single training log entry.

This typed mapping captures the essential fields used by the UI and reporting layers when presenting the most recent score per horizon.

Attributes:
timestamp : datetime | None

Completion time of the training run in UTC.

sklearn_score : float

R^2 score of the Scikit-learn model for this run.

pytorch_score : float

R^2 score of the PyTorch model for this run.

data_count : int

Number of samples used for training and evaluation.

coord_latitude : float | None

Coordinate latitude associated with the run, if any.

coord_longitude : float | None

Coordinate longitude associated with the run, if any.

horizon_label : str | None

Human-friendly horizon label (e.g., "5min", "1h"), if set.

horizon_display_name : str

Preformatted string suitable for charts/legends.

coord_latitude: float | None
coord_longitude: float | None
data_count: int
horizon_display_name: str
horizon_label: str | None
pytorch_score: float
sklearn_score: float
timestamp: datetime | None
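
As a rough illustration of the mapping described above, the following sketch re-declares the documented fields as a TypedDict and builds one entry. It is not the module's source; every value in example_entry is hypothetical and chosen only to show the expected types.

```python
from datetime import datetime, timezone
from typing import Optional, TypedDict


class TrainingLogDetails(TypedDict):
    """Illustrative re-declaration of the documented fields (a sketch)."""

    timestamp: Optional[datetime]
    sklearn_score: float
    pytorch_score: float
    data_count: int
    coord_latitude: Optional[float]
    coord_longitude: Optional[float]
    horizon_label: Optional[str]
    horizon_display_name: str


# Hypothetical values, for illustration only.
example_entry: TrainingLogDetails = {
    "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    "sklearn_score": 0.91,
    "pytorch_score": 0.88,
    "data_count": 1200,
    "coord_latitude": None,   # optional fields may be None
    "coord_longitude": None,
    "horizon_label": "1h",
    "horizon_display_name": "1 hour",
}
```

A TypedDict is an ordinary dict at runtime, so an entry like this can be serialized or passed to templates without conversion; the field types are enforced only by static type checkers.
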

app.ml_utils.assert_positive_input_dim(input_dim: int) -> None

Validate that input_dim is a positive integer.

Parameters:
input_dim : int

The number of input features expected by the model. Must be > 0.

Raises:
ValueError

If input_dim is not a positive integer.

Examples

>>> assert_positive_input_dim(4)
>>> assert_positive_input_dim(0)
Traceback (most recent call last):
...
ValueError: input_dim must be a positive integer, but was 0 (type: <class 'int'>).
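
A validator with this documented behavior could look roughly like the sketch below. The function name assert_positive_input_dim_sketch is hypothetical, and rejecting bool values is an assumption of this sketch rather than something the documentation states; only the error message format is taken from the example above.

```python
def assert_positive_input_dim_sketch(input_dim: int) -> None:
    """Raise ValueError unless input_dim is a positive integer (a sketch)."""
    # Assumption: bool is rejected even though it is a subclass of int.
    is_int = isinstance(input_dim, int) and not isinstance(input_dim, bool)
    if not is_int or input_dim <= 0:
        raise ValueError(
            f"input_dim must be a positive integer, but was {input_dim} "
            f"(type: {type(input_dim)})."
        )


# Valid input returns None silently; invalid input raises.
assert_positive_input_dim_sketch(4)
try:
    assert_positive_input_dim_sketch(0)
except ValueError as exc:
    message = str(exc)
```

Validating before the layers are built means a bad configuration fails with a readable message instead of an opaque shape error later in training.
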

app.ml_utils.get_historical_scores() -> Dict[str, Dict[str, Any]]

Fetch historical scores grouped by horizon.

Returns time-ordered scores for every distinct horizon key found in the database. Any exception raised while querying is logged, and an empty mapping is returned so that callers stay resilient.

Returns:
dict[str, dict[str, Any]]

Mapping from horizon key to a dictionary with keys "timestamps", "sklearn_scores", "pytorch_scores", and "display_name".

Notes

  • All exceptions are caught and logged; on any error this function returns an empty dictionary.
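
To make the returned shape concrete, here is a minimal sketch that groups rows into the documented structure. It works on plain tuples rather than ORM objects; the helper name group_scores_by_horizon, the row layout, and the ascending sort order are assumptions for illustration, not the module's implementation.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Tuple

# Hypothetical row layout:
# (horizon_key, display_name, timestamp, sklearn_score, pytorch_score)
Row = Tuple[str, str, datetime, float, float]


def group_scores_by_horizon(rows: Iterable[Row]) -> Dict[str, Dict[str, Any]]:
    """Group rows by horizon key, oldest first within each group."""
    grouped: Dict[str, Dict[str, Any]] = {}
    # Sort by timestamp so each per-horizon series is time-ordered.
    for key, display_name, ts, sk_score, pt_score in sorted(rows, key=lambda r: r[2]):
        entry = grouped.setdefault(
            key,
            {
                "timestamps": [],
                "sklearn_scores": [],
                "pytorch_scores": [],
                "display_name": display_name,
            },
        )
        entry["timestamps"].append(ts)
        entry["sklearn_scores"].append(sk_score)
        entry["pytorch_scores"].append(pt_score)
    return grouped


# Hypothetical sample data.
sample_rows = [
    ("1h", "1 hour", datetime(2024, 1, 2), 0.8, 0.7),
    ("1h", "1 hour", datetime(2024, 1, 1), 0.6, 0.5),
    ("5min", "5 minutes", datetime(2024, 1, 1), 0.9, 0.85),
]
history = group_scores_by_horizon(sample_rows)
```

Keeping the three lists parallel means a chart can plot history[key]["timestamps"] against either score list directly.
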


app.ml_utils.get_latest_training_logs() -> Dict[str, TrainingLogDetails]

Fetch the latest training log per horizon.

Iterates over the distinct horizon keys in training_logs and returns the most recent entry for each. Any exception raised while querying is logged, and an empty mapping is returned so that callers stay resilient.

Returns:
dict[str, TrainingLogDetails]

Mapping from horizon key to latest log details.

Notes

  • All exceptions are caught and logged; on any error this function returns an empty dictionary.
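
The "latest entry per horizon, empty mapping on failure" behavior can be sketched without a database as follows. The names latest_per_horizon and fetch_rows, and the "horizon_key" field, are hypothetical; the sketch also assumes every row has a non-None timestamp, which the real data may not guarantee.

```python
import logging
from datetime import datetime
from typing import Any, Callable, Dict, Iterable

logger = logging.getLogger(__name__)


def latest_per_horizon(
    fetch_rows: Callable[[], Iterable[Dict[str, Any]]],
) -> Dict[str, Dict[str, Any]]:
    """Return the newest row per horizon key, or {} if fetching fails."""
    try:
        latest: Dict[str, Dict[str, Any]] = {}
        for row in fetch_rows():
            key = row["horizon_key"]
            current = latest.get(key)
            # Keep the row with the most recent timestamp for each key.
            if current is None or row["timestamp"] > current["timestamp"]:
                latest[key] = row
        return latest
    except Exception:
        # Mirror the documented contract: log the error, return an empty mapping.
        logger.exception("Failed to fetch latest training logs")
        return {}


def failing_fetch() -> Iterable[Dict[str, Any]]:
    """Stand-in for a database call that fails."""
    raise RuntimeError("database unreachable")


# Hypothetical rows, for illustration only.
rows = [
    {"horizon_key": "1h", "timestamp": datetime(2024, 1, 1), "sklearn_score": 0.6},
    {"horizon_key": "1h", "timestamp": datetime(2024, 1, 3), "sklearn_score": 0.8},
]
newest = latest_per_horizon(lambda: rows)
fallback = latest_per_horizon(failing_fetch)
```

Because a failure and a genuinely empty table both yield an empty dictionary, callers that need to tell the two apart have to consult the logs.
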