vipr_reflectometry.flow_models.postprocess.cluster.model_selection package
Submodules
vipr_reflectometry.flow_models.postprocess.cluster.model_selection.algorithms module
Model Selection Algorithms.
Provides hyperparameter sweep functions for determining optimal clustering parameters.
- vipr_reflectometry.flow_models.postprocess.cluster.model_selection.algorithms.gmm_model_selection_curves(samples: ndarray, k_min: int = 1, k_max: int = 10, *, whiten: bool = True, reg_covar: float = 1e-06, n_init: int = 10, seed: int = 42, logger=None) → Dict[str, Any]
Calculate BIC, AIC and Silhouette scores over a range of K values for GMM.
- Parameters:
samples – Parameter samples (num_samples, num_params)
k_min – Minimum K to test
k_max – Maximum K to test
whiten – Apply StandardScaler for stable BIC
reg_covar – Covariance regularization
n_init – Number of GMM initializations
seed – Random seed for reproducible GMM fitting and silhouette sampling
logger – Optional logger
- Returns:
Dict with keys ‘K’, ‘bic’, ‘aic’, ‘silhouette’
- Return type:
Dict[str, Any]
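As a sketch of how the returned curves might be read, the snippet below picks the K with the lowest BIC and the K with the highest silhouette score. It assumes each key maps to an equal-length sequence and that the silhouette entry for K = 1 is NaN or None, since silhouette is undefined for a single cluster; the actual container types and placeholder value are not specified here. `pick_k` is a hypothetical helper, not part of the package, and the numbers are made up for illustration.

```python
import math
from typing import Any, Dict, Optional


def pick_k(curves: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Return the K minimizing BIC and the K maximizing silhouette.

    Hypothetical helper: assumes equal-length sequences under each key.
    """
    ks = list(curves["K"])
    bic = list(curves["bic"])
    sil = list(curves["silhouette"])

    # Lower BIC is better.
    best_bic_idx = min(range(len(ks)), key=lambda i: bic[i])

    # Skip entries where silhouette is undefined (assumed NaN or None).
    valid = [
        i for i in range(len(ks))
        if sil[i] is not None and not math.isnan(sil[i])
    ]
    best_sil_k = ks[max(valid, key=lambda i: sil[i])] if valid else None

    return {"bic": ks[best_bic_idx], "silhouette": best_sil_k}


# Made-up values, only to show the assumed layout of the dict.
example_curves = {
    "K": [1, 2, 3, 4],
    "bic": [1520.0, 1310.5, 1298.2, 1305.9],
    "aic": [1500.0, 1270.1, 1237.4, 1224.7],
    "silhouette": [float("nan"), 0.41, 0.55, 0.38],
}

choice = pick_k(example_curves)
print(choice)
```

When the two criteria disagree, the diagnostic plots are the better guide; this helper only reports the two candidates.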
- vipr_reflectometry.flow_models.postprocess.cluster.model_selection.algorithms.hdbscan_sweep(samples: ndarray, min_cluster_size_values: List[int], *, min_samples: int = 10, whiten: bool = True, seed: int = 42, logger=None) → Dict[str, Any]
Sweep over min_cluster_size values for HDBSCAN.
Returns number of clusters and silhouette score for each configuration.
- Parameters:
samples – Parameter samples (num_samples, num_params)
min_cluster_size_values – List of min_cluster_size values to test
min_samples – HDBSCAN min_samples parameter
whiten – Apply StandardScaler
seed – Random seed for silhouette sampling (HDBSCAN itself is deterministic)
logger – Optional logger
- Returns:
Dict with keys ‘min_cluster_size’, ‘n_clusters’, ‘silhouette’
- Return type:
Dict[str, Any]
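One common way to read such a sweep is to look for a range of min_cluster_size values over which the cluster count stays constant. The sketch below finds the longest such plateau. This heuristic is an illustration and is not taken from the package; `longest_plateau` is a hypothetical helper, the dict layout (equal-length sequences under each key) is an assumption, and the values are made up.

```python
from itertools import groupby
from typing import Any, Dict, List, Optional, Tuple


def longest_plateau(sweep: Dict[str, Any]) -> Optional[Tuple[int, List[int]]]:
    """Return (n_clusters, sizes) for the longest run of equal cluster counts.

    Hypothetical helper; ties are resolved in favor of the earliest run.
    Returns None for an empty sweep.
    """
    sizes = list(sweep["min_cluster_size"])
    counts = list(sweep["n_clusters"])

    best: Optional[Tuple[int, List[int]]] = None
    start = 0
    # groupby yields one group per run of consecutive equal values.
    for n_clusters, group in groupby(counts):
        length = len(list(group))
        if best is None or length > len(best[1]):
            best = (n_clusters, sizes[start:start + length])
        start += length
    return best


# Made-up values, only to show the assumed layout of the dict.
example_sweep = {
    "min_cluster_size": [20, 40, 60, 80, 100],
    "n_clusters": [5, 3, 3, 3, 1],
    "silhouette": [0.22, 0.47, 0.45, 0.44, float("nan")],
}

plateau = longest_plateau(example_sweep)
print(plateau)
```

The silhouette values can then be compared within the plateau to settle on a single min_cluster_size.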
vipr_reflectometry.flow_models.postprocess.cluster.model_selection.hook module
VIPR Hook for Cluster Model Selection / Diagnostics.
Performs hyperparameter sweeps to help determine optimal clustering parameters. Does not modify data - only generates diagnostic plots.
- class vipr_reflectometry.flow_models.postprocess.cluster.model_selection.hook.ClusterDiagnosticsHook(app: VIPR)
Bases: object
Hook for clustering model selection and diagnostics.
Performs hyperparameter sweeps (GMM K-range or HDBSCAN min_cluster_size) and generates diagnostic plots without modifying the data.
- class vipr_reflectometry.flow_models.postprocess.cluster.model_selection.hook.ClusterDiagnosticsHookParams(*, method: str = 'gmm', k_min: int = 1, k_max: int = 10, whiten: bool = True, hdbscan_mcs_min: int = 20, hdbscan_mcs_max: int = 200, hdbscan_mcs_step: int = 20, hdbscan_min_samples: int = 10, seed: int = 42, n_init: int = 10)
Bases: BaseModel
Configuration for clustering model-selection diagnostics.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict. With 'extra': 'forbid', validation fails if unknown fields are passed.
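The three hdbscan_mcs_* fields presumably describe the grid of min_cluster_size values passed to hdbscan_sweep(). A minimal sketch of how such a grid could be built is shown below. It assumes the upper bound is inclusive, which is not stated here; `mcs_grid` is a hypothetical helper, not part of the package.

```python
from typing import List


def mcs_grid(mcs_min: int = 20, mcs_max: int = 200, mcs_step: int = 20) -> List[int]:
    """Build a list of min_cluster_size values from min/max/step.

    Hypothetical helper; treats mcs_max as inclusive (an assumption).
    Defaults mirror the hdbscan_mcs_* defaults in the signature above.
    """
    if mcs_step <= 0:
        raise ValueError("mcs_step must be positive")
    if mcs_min > mcs_max:
        raise ValueError("mcs_min must not exceed mcs_max")
    return list(range(mcs_min, mcs_max + 1, mcs_step))


grid = mcs_grid()
print(grid)
```

With the default values this gives ten candidates from 20 to 200; if the hook treats the upper bound as exclusive, the last value would be dropped.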
vipr_reflectometry.flow_models.postprocess.cluster.model_selection.visualization module
Model Selection Visualization.
Creates plots for BIC/AIC/Silhouette curves and HDBSCAN sweeps.
- vipr_reflectometry.flow_models.postprocess.cluster.model_selection.visualization.create_gmm_model_selection_plot(app, curves: Dict[str, Any], spectrum_idx: int)
Create BIC/AIC and Silhouette vs K plots for GMM.
- Parameters:
app – VIPR application instance
curves – Dict from gmm_model_selection_curves()
spectrum_idx – Spectrum index
- vipr_reflectometry.flow_models.postprocess.cluster.model_selection.visualization.create_hdbscan_sweep_plot(app, sweep: Dict[str, Any], spectrum_idx: int)
Create cluster count and Silhouette vs min_cluster_size plots for HDBSCAN.
- Parameters:
app – VIPR application instance
sweep – Dict from hdbscan_sweep()
spectrum_idx – Spectrum index
Module contents
Model Selection Module for Clustering.
Provides hyperparameter sweep tools and visualization for determining optimal clustering parameters before execution.
- class vipr_reflectometry.flow_models.postprocess.cluster.model_selection.ClusterDiagnosticsHook(app: VIPR)
Bases: object
Hook for clustering model selection and diagnostics.
Performs hyperparameter sweeps (GMM K-range or HDBSCAN min_cluster_size) and generates diagnostic plots without modifying the data.
- vipr_reflectometry.flow_models.postprocess.cluster.model_selection.create_gmm_model_selection_plot(app, curves: Dict[str, Any], spectrum_idx: int)
Create BIC/AIC and Silhouette vs K plots for GMM.
- Parameters:
app – VIPR application instance
curves – Dict from gmm_model_selection_curves()
spectrum_idx – Spectrum index
- vipr_reflectometry.flow_models.postprocess.cluster.model_selection.create_hdbscan_sweep_plot(app, sweep: Dict[str, Any], spectrum_idx: int)
Create cluster count and Silhouette vs min_cluster_size plots for HDBSCAN.
- Parameters:
app – VIPR application instance
sweep – Dict from hdbscan_sweep()
spectrum_idx – Spectrum index
- vipr_reflectometry.flow_models.postprocess.cluster.model_selection.gmm_model_selection_curves(samples: ndarray, k_min: int = 1, k_max: int = 10, *, whiten: bool = True, reg_covar: float = 1e-06, n_init: int = 10, seed: int = 42, logger=None) → Dict[str, Any]
Calculate BIC, AIC and Silhouette scores over a range of K values for GMM.
- Parameters:
samples – Parameter samples (num_samples, num_params)
k_min – Minimum K to test
k_max – Maximum K to test
whiten – Apply StandardScaler for stable BIC
reg_covar – Covariance regularization
n_init – Number of GMM initializations
seed – Random seed for reproducible GMM fitting and silhouette sampling
logger – Optional logger
- Returns:
Dict with keys ‘K’, ‘bic’, ‘aic’, ‘silhouette’
- Return type:
Dict[str, Any]
- vipr_reflectometry.flow_models.postprocess.cluster.model_selection.hdbscan_sweep(samples: ndarray, min_cluster_size_values: List[int], *, min_samples: int = 10, whiten: bool = True, seed: int = 42, logger=None) → Dict[str, Any]
Sweep over min_cluster_size values for HDBSCAN.
Returns number of clusters and silhouette score for each configuration.
- Parameters:
samples – Parameter samples (num_samples, num_params)
min_cluster_size_values – List of min_cluster_size values to test
min_samples – HDBSCAN min_samples parameter
whiten – Apply StandardScaler
seed – Random seed for silhouette sampling (HDBSCAN itself is deterministic)
logger – Optional logger
- Returns:
Dict with keys ‘min_cluster_size’, ‘n_clusters’, ‘silhouette’
- Return type:
Dict[str, Any]