vipr_reflectometry.flow_models.postprocess.cluster.model_selection package

Submodules

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.algorithms module

Model Selection Algorithms.

Provides hyperparameter sweep functions for determining optimal clustering parameters.

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.algorithms.gmm_model_selection_curves(samples: ndarray, k_min: int = 1, k_max: int = 10, *, whiten: bool = True, reg_covar: float = 1e-06, n_init: int = 10, seed: int = 42, logger=None) → Dict[str, Any]

Calculate BIC, AIC and Silhouette scores over a range of K values for GMM.

Parameters:
  • samples – Parameter samples (num_samples, num_params)

  • k_min – Minimum K to test

  • k_max – Maximum K to test

  • whiten – Apply StandardScaler for stable BIC

  • reg_covar – Covariance regularization

  • n_init – Number of GMM initializations

  • seed – Random seed for reproducible GMM fitting and silhouette sampling

  • logger – Optional logger

Returns:
  Dict with keys ‘K’, ‘bic’, ‘aic’, ‘silhouette’

Return type:
  Dict[str, Any]
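To make the idea of a K sweep concrete, here is a minimal sketch built on scikit-learn. It is not the package's implementation: the function name sketch_gmm_curves, the 2000-point cap on silhouette sampling, and the choice to report NaN where the silhouette is undefined (a single cluster) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def sketch_gmm_curves(samples, k_min=1, k_max=10, *, whiten=True,
                      reg_covar=1e-6, n_init=10, seed=42):
    """Hypothetical stand-in: fit one GMM per K and record BIC, AIC, silhouette."""
    X = np.asarray(samples, dtype=float)
    if whiten:
        # Standardize each parameter to zero mean and unit variance.
        X = StandardScaler().fit_transform(X)

    curves = {"K": [], "bic": [], "aic": [], "silhouette": []}
    for k in range(k_min, k_max + 1):
        gmm = GaussianMixture(n_components=k, reg_covar=reg_covar,
                              n_init=n_init, random_state=seed).fit(X)
        labels = gmm.predict(X)
        if len(np.unique(labels)) > 1:
            # Assumption: subsample at most 2000 points to keep this cheap.
            sil = silhouette_score(X, labels, sample_size=min(len(X), 2000),
                                   random_state=seed)
        else:
            # Silhouette is undefined for a single cluster.
            sil = float("nan")
        curves["K"].append(k)
        curves["bic"].append(float(gmm.bic(X)))
        curves["aic"].append(float(gmm.aic(X)))
        curves["silhouette"].append(float(sil))
    return curves


# Toy data: two well-separated 2-D blobs (made up for this example).
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(-5.0, 0.5, (150, 2)),
                  rng.normal(5.0, 0.5, (150, 2))])
curves = sketch_gmm_curves(demo, 1, 4, n_init=2)
best_k_by_bic = curves["K"][int(np.argmin(curves["bic"]))]
```

Lower BIC and AIC indicate a better fit/complexity trade-off, while a higher silhouette indicates better-separated clusters, so the three curves are usually read together rather than optimized individually.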

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.algorithms.hdbscan_sweep(samples: ndarray, min_cluster_size_values: List[int], *, min_samples: int = 10, whiten: bool = True, seed: int = 42, logger=None) → Dict[str, Any]

Sweep over min_cluster_size values for HDBSCAN.

Returns number of clusters and silhouette score for each configuration.

Parameters:
  • samples – Parameter samples (num_samples, num_params)

  • min_cluster_size_values – List of min_cluster_size values to test

  • min_samples – HDBSCAN min_samples parameter

  • whiten – Apply StandardScaler

  • seed – Random seed for silhouette sampling (HDBSCAN itself is deterministic)

  • logger – Optional logger

Returns:
  Dict with keys ‘min_cluster_size’, ‘n_clusters’, ‘silhouette’

Return type:
  Dict[str, Any]
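One way to read a sweep result is to pick the min_cluster_size with the highest silhouette among configurations that found at least two clusters. The sketch below assumes the three keys hold equal-length sequences aligned by index, which the documentation does not state explicitly; the helper pick_min_cluster_size and the toy values are hypothetical.

```python
import math


def pick_min_cluster_size(sweep):
    """Return the min_cluster_size with the highest finite silhouette, or None."""
    best_mcs, best_sil = None, -math.inf
    for mcs, n_clusters, sil in zip(sweep["min_cluster_size"],
                                    sweep["n_clusters"],
                                    sweep["silhouette"]):
        # Skip configurations where the silhouette is missing or meaningless.
        if n_clusters < 2 or sil is None or math.isnan(sil):
            continue
        if sil > best_sil:
            best_mcs, best_sil = mcs, sil
    return best_mcs


# Illustrative values only, not output from a real run.
toy_sweep = {
    "min_cluster_size": [20, 40, 60, 80],
    "n_clusters": [5, 3, 3, 1],
    "silhouette": [0.31, 0.58, 0.55, float("nan")],
}
chosen = pick_min_cluster_size(toy_sweep)
```

A plateau in n_clusters across neighbouring min_cluster_size values (here, 3 clusters at both 40 and 60) is often a more robust signal than the single best silhouette value.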

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.hook module

VIPR Hook for Cluster Model Selection / Diagnostics.

Performs hyperparameter sweeps to help determine optimal clustering parameters. Does not modify data - only generates diagnostic plots.

class vipr_reflectometry.flow_models.postprocess.cluster.model_selection.hook.ClusterDiagnosticsHook(app: VIPR)

Bases: object

Hook for clustering model selection and diagnostics.

Performs hyperparameter sweeps (GMM K-range or HDBSCAN min_cluster_size) and generates diagnostic plots without modifying the data.

class vipr_reflectometry.flow_models.postprocess.cluster.model_selection.hook.ClusterDiagnosticsHookParams(*, method: str = 'gmm', k_min: int = 1, k_max: int = 10, whiten: bool = True, hdbscan_mcs_min: int = 20, hdbscan_mcs_max: int = 200, hdbscan_mcs_step: int = 20, hdbscan_min_samples: int = 10, seed: int = 42, n_init: int = 10)

Bases: BaseModel

Configuration for clustering model-selection diagnostics.

hdbscan_mcs_max: int
hdbscan_mcs_min: int
hdbscan_mcs_step: int
hdbscan_min_samples: int
k_max: int
k_min: int
method: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_init: int
seed: int
whiten: bool
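The effect of model_config = {'extra': 'forbid'} can be illustrated with a small stand-in model. SweepParamsSketch below is a hypothetical class that copies only a few of the documented fields and defaults. How the hook turns hdbscan_mcs_min, hdbscan_mcs_max and hdbscan_mcs_step into a list of values is not documented here, so the inclusive range is an assumption.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class SweepParamsSketch(BaseModel):
    """Hypothetical stand-in mirroring a subset of ClusterDiagnosticsHookParams."""
    model_config = ConfigDict(extra="forbid")

    method: str = "gmm"
    hdbscan_mcs_min: int = 20
    hdbscan_mcs_max: int = 200
    hdbscan_mcs_step: int = 20


params = SweepParamsSketch(method="hdbscan", hdbscan_mcs_max=100)

# Assumption: the sweep grid includes the upper bound.
grid = list(range(params.hdbscan_mcs_min,
                  params.hdbscan_mcs_max + 1,
                  params.hdbscan_mcs_step))

# With extra="forbid", a misspelled field name raises instead of being ignored.
try:
    SweepParamsSketch(method="gmm", kmax=5)
    typo_rejected = False
except ValidationError:
    typo_rejected = True
```

Forbidding extra fields means a typo in a configuration key fails validation immediately rather than silently falling back to the default.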

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.visualization module

Model Selection Visualization.

Creates plots for BIC/AIC/Silhouette curves and HDBSCAN sweeps.

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.visualization.create_gmm_model_selection_plot(app, curves: Dict[str, Any], spectrum_idx: int)

Create BIC/AIC and Silhouette vs K plots for GMM.

Parameters:
  • app – VIPR application instance

  • curves – Dict from gmm_model_selection_curves()

  • spectrum_idx – Spectrum index

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.visualization.create_hdbscan_sweep_plot(app, sweep: Dict[str, Any], spectrum_idx: int)

Create cluster count and Silhouette vs min_cluster_size plots for HDBSCAN.

Parameters:
  • app – VIPR application instance

  • sweep – Dict from hdbscan_sweep()

  • spectrum_idx – Spectrum index
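For orientation, a two-panel figure of this kind could be produced with matplotlib roughly as follows. This is only a sketch: sketch_gmm_plot is a hypothetical function, it takes no app argument, and how the real functions store or register the figure with the VIPR application is not covered here. The curve values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def sketch_gmm_plot(curves, spectrum_idx):
    """Hypothetical stand-in: BIC/AIC on the left panel, silhouette on the right."""
    fig, (ax_ic, ax_sil) = plt.subplots(1, 2, figsize=(9, 3.5))

    ax_ic.plot(curves["K"], curves["bic"], marker="o", label="BIC")
    ax_ic.plot(curves["K"], curves["aic"], marker="s", label="AIC")
    ax_ic.set_xlabel("K")
    ax_ic.set_ylabel("Information criterion")
    ax_ic.legend()

    # NaN entries (e.g. K = 1) are simply left out of the line by matplotlib.
    ax_sil.plot(curves["K"], curves["silhouette"], marker="o")
    ax_sil.set_xlabel("K")
    ax_sil.set_ylabel("Silhouette")

    fig.suptitle(f"GMM model selection (spectrum {spectrum_idx})")
    fig.tight_layout()
    return fig


# Illustrative values only.
toy_curves = {
    "K": [1, 2, 3, 4],
    "bic": [1800.0, 950.0, 975.0, 1001.0],
    "aic": [1780.0, 910.0, 915.0, 920.0],
    "silhouette": [float("nan"), 0.91, 0.62, 0.48],
}
fig = sketch_gmm_plot(toy_curves, spectrum_idx=0)
```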

Module contents

Model Selection Module for Clustering.

Provides hyperparameter sweep tools and visualization for determining optimal clustering parameters before execution.

class vipr_reflectometry.flow_models.postprocess.cluster.model_selection.ClusterDiagnosticsHook(app: VIPR)

Bases: object

Hook for clustering model selection and diagnostics.

Performs hyperparameter sweeps (GMM K-range or HDBSCAN min_cluster_size) and generates diagnostic plots without modifying the data.

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.create_gmm_model_selection_plot(app, curves: Dict[str, Any], spectrum_idx: int)

Create BIC/AIC and Silhouette vs K plots for GMM.

Parameters:
  • app – VIPR application instance

  • curves – Dict from gmm_model_selection_curves()

  • spectrum_idx – Spectrum index

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.create_hdbscan_sweep_plot(app, sweep: Dict[str, Any], spectrum_idx: int)

Create cluster count and Silhouette vs min_cluster_size plots for HDBSCAN.

Parameters:
  • app – VIPR application instance

  • sweep – Dict from hdbscan_sweep()

  • spectrum_idx – Spectrum index

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.gmm_model_selection_curves(samples: ndarray, k_min: int = 1, k_max: int = 10, *, whiten: bool = True, reg_covar: float = 1e-06, n_init: int = 10, seed: int = 42, logger=None) → Dict[str, Any]

Calculate BIC, AIC and Silhouette scores over a range of K values for GMM.

Parameters:
  • samples – Parameter samples (num_samples, num_params)

  • k_min – Minimum K to test

  • k_max – Maximum K to test

  • whiten – Apply StandardScaler for stable BIC

  • reg_covar – Covariance regularization

  • n_init – Number of GMM initializations

  • seed – Random seed for reproducible GMM fitting and silhouette sampling

  • logger – Optional logger

Returns:
  Dict with keys ‘K’, ‘bic’, ‘aic’, ‘silhouette’

Return type:
  Dict[str, Any]

vipr_reflectometry.flow_models.postprocess.cluster.model_selection.hdbscan_sweep(samples: ndarray, min_cluster_size_values: List[int], *, min_samples: int = 10, whiten: bool = True, seed: int = 42, logger=None) → Dict[str, Any]

Sweep over min_cluster_size values for HDBSCAN.

Returns number of clusters and silhouette score for each configuration.

Parameters:
  • samples – Parameter samples (num_samples, num_params)

  • min_cluster_size_values – List of min_cluster_size values to test

  • min_samples – HDBSCAN min_samples parameter

  • whiten – Apply StandardScaler

  • seed – Random seed for silhouette sampling (HDBSCAN itself is deterministic)

  • logger – Optional logger

Returns:
  Dict with keys ‘min_cluster_size’, ‘n_clusters’, ‘silhouette’

Return type:
  Dict[str, Any]