vipr_reflectometry.flow_models.postprocess.cluster.clustering package¶
Subpackages¶
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.visualization package
- Submodules
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.visualization.centroids module
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.visualization.corner module
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.visualization.interactive module
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.visualization.marginals module
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.visualization.parallel_coordinates module
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.visualization.utils module
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.visualization.validation module
- Module contents
Submodules¶
vipr_reflectometry.flow_models.postprocess.cluster.clustering.algorithms module¶
Clustering Algorithms and Validation Metrics.
Simplified version without excessive scaling/whitening/merging complexity.
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.algorithms.apply_bayesian_gmm(samples: ndarray, n_components: int = 10, logger=None, *, seed: int | None = None, weight_concentration_prior: float | None = None, weight_concentration_prior_type: str = 'dirichlet_process', covariance_type: str = 'full', reg_covar: float = 1e-06, n_init: int = 1, whiten: bool = True) tuple[ndarray, Dict[str, Any]]¶
Bayesian GMM clustering with Dirichlet Process prior. Automatically determines the effective number of components.
- Parameters:
samples – Parameter samples (N, D)
n_components – Upper bound on number of components
logger – Logger instance
seed – Random seed for reproducible fitting
weight_concentration_prior – Dirichlet concentration (None=auto, low=few clusters)
weight_concentration_prior_type – ‘dirichlet_process’ or ‘dirichlet_distribution’
covariance_type – Covariance structure (‘full’, ‘tied’, ‘diag’, ‘spherical’)
reg_covar – Covariance regularization
n_init – Number of initializations
whiten – Apply StandardScaler for equal parameter weighting
- Returns:
labels: Cluster labels array (0 to K-1, never -1)
gmm_params: Dict with ‘weights’, ‘means’, ‘covariances’ (the k Gaussians)
- Return type:
Tuple of (labels, gmm_params)
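The Dirichlet-process behavior described above can be sketched with scikit-learn's `BayesianGaussianMixture` directly. This is an illustrative stand-in for `apply_bayesian_gmm`, not the package's own code; the toy `samples` array and the 1e-2 weight cutoff are assumptions for the sketch:

```python
# Sketch: Dirichlet-process Bayesian GMM with whitening, analogous to
# apply_bayesian_gmm. Pure scikit-learn; toy data stands in for posterior samples.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs standing in for posterior samples of shape (N, D).
samples = np.vstack([
    rng.normal(0.0, 0.3, size=(200, 2)),
    rng.normal(5.0, 0.3, size=(200, 2)),
])

# whiten=True: standardize so every parameter contributes equally to distances.
X = StandardScaler().fit_transform(samples)

bgmm = BayesianGaussianMixture(
    n_components=10,                  # upper bound; the DP prior prunes extras
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    reg_covar=1e-6,
    random_state=42,
)
labels = bgmm.fit_predict(X)          # hard labels in 0..K-1, never -1

# "Effective" number of components: those carrying non-negligible weight
# (the 1e-2 threshold is an arbitrary choice for this sketch).
effective_k = int(np.sum(bgmm.weights_ > 1e-2))
```

A small `weight_concentration_prior` pushes the posterior toward fewer active components, which is the knob the docstring's "low=few clusters" refers to.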
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.algorithms.apply_gmm(samples: ndarray, n_components: int | None = None, logger=None, *, seed: int | None = None, k_max: int = 8, reg_covar: float = 1e-06, n_init: int = 3, covariance_type: str = 'full', whiten: bool = True) tuple[ndarray, Dict[str, Any]]¶
Pure GMM clustering without pruning or noise detection.
- K selection via BIC (if n_components is None)
- Hard assignment via gmm.predict()
- No -1/noise labels, no min_cluster_size filtering
- Optional standardization for stable BIC-based K selection
- Parameters:
samples – Parameter samples (N, D)
n_components – Fixed K (None = auto via BIC)
logger – Logger instance
seed – Random seed for reproducible GMM fitting
k_max – Maximum K to test for BIC
reg_covar – Covariance regularization
n_init – GMM initializations
whiten – Apply StandardScaler for equal parameter weighting
- Returns:
labels: Cluster labels array (0 to K-1, never -1)
gmm_params: Dict with ‘weights’, ‘means’, ‘covariances’ (the k Gaussians)
- Return type:
Tuple of (labels, gmm_params)
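The BIC-driven K selection that `apply_gmm` describes can be sketched with plain scikit-learn: fit one `GaussianMixture` per candidate K and keep the fit with the lowest BIC. Illustrative only; the three-blob toy data is hypothetical:

```python
# Sketch: BIC-based K selection followed by hard assignment, as in apply_gmm.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
samples = np.vstack([
    rng.normal(-4.0, 0.4, size=(150, 2)),
    rng.normal(0.0, 0.4, size=(150, 2)),
    rng.normal(4.0, 0.4, size=(150, 2)),
])
X = StandardScaler().fit_transform(samples)   # whiten for stable BIC comparison

k_max = 8
fits = {
    k: GaussianMixture(n_components=k, covariance_type="full",
                       reg_covar=1e-6, n_init=3, random_state=42).fit(X)
    for k in range(1, k_max + 1)
}
best_k = min(fits, key=lambda k: fits[k].bic(X))   # lowest BIC wins
labels = fits[best_k].predict(X)                   # hard labels, never -1
```

With clearly separated modes the BIC minimum lands on the true component count; on real posterior samples the curve is flatter, which is why a fixed `n_components` is also accepted.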
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.algorithms.apply_hdbscan(samples: ndarray, min_cluster_size: int, min_samples: int, logger=None, *, seed: int | None = None, whiten: bool = True) ndarray¶
HDBSCAN clustering with optional standardization.
- Parameters:
samples – Array of shape (num_samples, num_params)
min_cluster_size – Minimum number of samples per cluster
min_samples – Minimum number of samples for core points
logger – Optional logger for debug messages
seed – Random seed for reproducibility (unused - HDBSCAN is deterministic)
whiten – Apply StandardScaler for equal parameter weighting
- Returns:
Array of cluster labels (-1 for noise)
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.algorithms.calculate_validation_metrics(samples: ndarray, cluster_labels: ndarray, logger=None) Dict[str, Any]¶
Calculate validation metrics on scaled features.
Metrics are calculated on standardized features to match the clustering feature space when whiten=True was used.
- Parameters:
samples – Parameter samples (num_samples, num_params)
cluster_labels – Cluster assignments
logger – Optional logger for warnings
- Returns:
Dict with validation metrics and quality assessment
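The kind of quantities such a metrics dict typically contains can be sketched with scikit-learn's standard internal-validation scores, computed on standardized features to match the whitened clustering space. The exact keys and any quality thresholds used by `calculate_validation_metrics` are not specified here, so this is only an illustrative shape:

```python
# Sketch: internal cluster-validation metrics on standardized features.
import numpy as np
from sklearn.metrics import (
    silhouette_score, davies_bouldin_score, calinski_harabasz_score,
)
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
samples = np.vstack([
    rng.normal(0.0, 0.3, size=(100, 2)),
    rng.normal(4.0, 0.3, size=(100, 2)),
])
labels = np.repeat([0, 1], 100)               # pretend cluster assignments

X = StandardScaler().fit_transform(samples)   # match the whitened feature space
metrics = {
    "silhouette": silhouette_score(X, labels),          # closer to 1 is better
    "davies_bouldin": davies_bouldin_score(X, labels),  # closer to 0 is better
    "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
}
```

Computing these on the raw, unscaled samples would let wide-ranged parameters dominate the distances, which is exactly the mismatch the docstring warns about.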
vipr_reflectometry.flow_models.postprocess.cluster.clustering.clustering module¶
Clustering Module - Orchestration Layer.
Pure orchestrator that coordinates the clustering workflow. Delegates algorithm execution to the algorithms module.
- class vipr_reflectometry.flow_models.postprocess.cluster.clustering.clustering.ClusterProcessor(app)¶
Bases: object
Main clustering orchestrator that coordinates the workflow.
Responsibilities:
- Handle batch processing (multiple spectra)
- Coordinate algorithm execution (delegates to the algorithms module)
- Coordinate visualization and simulation
This class is a pure orchestrator - it doesn’t implement algorithms itself.
- process(data: Dict[str, Any], method: str = 'gmm', n_components: int | None = 5, min_cluster_size: int = 50, min_samples: int = 10, seed: int | None = None, n_init: int = 10, covariance_type: str = 'full', weight_concentration_prior: float | None = None, weight_concentration_prior_type: str = 'dirichlet_process', **_) Dict[str, Any]¶
Process clustering for a single spectrum.
- Parameters:
data – Prediction results
method – Clustering method (‘hdbscan’ or ‘gmm’)
n_components – Number of GMM components (None = automatic selection)
min_cluster_size – Minimum HDBSCAN cluster size
min_samples – Minimum HDBSCAN samples
seed – Random seed for reproducible fitting
n_init – Number of GMM initializations
covariance_type – GMM covariance structure (‘full’, ‘tied’, ‘diag’, ‘spherical’)
weight_concentration_prior – Dirichlet concentration for the Bayesian GMM (None = auto)
weight_concentration_prior_type – ‘dirichlet_process’ or ‘dirichlet_distribution’
- Returns:
Enriched data with cluster information
vipr_reflectometry.flow_models.postprocess.cluster.clustering.hook module¶
VIPR Filter Adapter for Clustering.
Thin adapter between VIPR filter hook and clustering service.
- class vipr_reflectometry.flow_models.postprocess.cluster.clustering.hook.ClusterHook(app: VIPR)¶
Bases: object
Thin adapter between VIPR filter hook and clustering processor.
Responsibilities:
- Register as a VIPR filter
- Validate input data
- Delegate to the service layer
- class vipr_reflectometry.flow_models.postprocess.cluster.clustering.hook.ClusterHookParams(*, method: str = 'gmm', n_components: int | None = 2, min_cluster_size: int = 50, min_samples: int = 10, seed: int = 42, n_init: int = 10, covariance_type: str = 'full', weight_concentration_prior: float | None = None, weight_concentration_prior_type: str = 'dirichlet_process', polish_centroids: bool = False)¶
Bases: BaseModel
Configuration for posterior-sample clustering.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}¶
Configuration for the model; should be a dictionary conforming to pydantic’s ConfigDict.
vipr_reflectometry.flow_models.postprocess.cluster.clustering.simulation module¶
Forward Simulation Module.
Handles centroid calculation and Reflectorch forward simulation. Contains only simulation logic - visualization is in visualization.py.
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.simulation.simulate_centroids(app, samples: ndarray, cluster_labels: ndarray, cluster_sizes: List[Tuple[int, int]], spectrum_idx: int = 0, polish_centroids: bool = False) Tuple[List[Dict] | None, Any, Any]¶
Calculate cluster centroids and perform forward simulation.
This is a pure data transformation function that returns simulation results. The orchestrator (clustering.py) handles visualization.
- Parameters:
app – VIPR application instance
samples – Parameter samples (num_samples, num_params)
cluster_labels – Cluster assignments
cluster_sizes – List of (label, size) tuples sorted by size
spectrum_idx – Spectrum index (always 0 in single-spectrum mode)
polish_centroids – Whether to refine the centroids before forward simulation
- Returns:
Tuple of (centroid_results, q_values, original_data):
- centroid_results: List of dicts with centroid info and simulated curves
- q_values: Q-values for plotting
- original_data: Original experimental data (if available)
Returns (None, None, None) if simulation fails.
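The centroid step can be sketched with plain numpy: a per-cluster mean of the posterior samples, skipping any HDBSCAN noise label (-1). The forward-simulation half requires the Reflectorch model and is omitted; the toy arrays below are hypothetical:

```python
# Sketch: per-cluster centroids of posterior samples, as simulate_centroids
# computes them before handing each centroid to the forward model.
import numpy as np

rng = np.random.default_rng(4)
samples = np.vstack([
    rng.normal(1.0, 0.1, size=(60, 3)),
    rng.normal(5.0, 0.1, size=(40, 3)),
])
labels = np.repeat([0, 1], [60, 40])

# (label, size) tuples sorted by size, noise (-1) excluded -- the shape
# the cluster_sizes argument is documented to have.
cluster_sizes = sorted(
    ((int(lab), int(np.sum(labels == lab))) for lab in set(labels.tolist())
     if lab != -1),
    key=lambda t: t[1], reverse=True,
)

# Each centroid is one parameter vector; in the real function it would be
# passed to the Reflectorch forward model to simulate a reflectivity curve.
centroids = {lab: samples[labels == lab].mean(axis=0) for lab, _ in cluster_sizes}
```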
Module contents¶
Clustering Module.
Provides clustering algorithms, orchestration, and visualization for posterior samples.
- class vipr_reflectometry.flow_models.postprocess.cluster.clustering.ClusterHook(app: VIPR)¶
Bases: object
Thin adapter between VIPR filter hook and clustering processor.
Responsibilities:
- Register as a VIPR filter
- Validate input data
- Delegate to the service layer
- class vipr_reflectometry.flow_models.postprocess.cluster.clustering.ClusterProcessor(app)¶
Bases: object
Main clustering orchestrator that coordinates the workflow.
Responsibilities:
- Handle batch processing (multiple spectra)
- Coordinate algorithm execution (delegates to the algorithms module)
- Coordinate visualization and simulation
This class is a pure orchestrator - it doesn’t implement algorithms itself.
- process(data: Dict[str, Any], method: str = 'gmm', n_components: int | None = 5, min_cluster_size: int = 50, min_samples: int = 10, seed: int | None = None, n_init: int = 10, covariance_type: str = 'full', weight_concentration_prior: float | None = None, weight_concentration_prior_type: str = 'dirichlet_process', **_) Dict[str, Any]¶
Process clustering for a single spectrum.
- Parameters:
data – Prediction results
method – Clustering method (‘hdbscan’ or ‘gmm’)
n_components – Number of GMM components (None = automatic selection)
min_cluster_size – Minimum HDBSCAN cluster size
min_samples – Minimum HDBSCAN samples
seed – Random seed for reproducible fitting
n_init – Number of GMM initializations
covariance_type – GMM covariance structure (‘full’, ‘tied’, ‘diag’, ‘spherical’)
weight_concentration_prior – Dirichlet concentration for the Bayesian GMM (None = auto)
weight_concentration_prior_type – ‘dirichlet_process’ or ‘dirichlet_distribution’
- Returns:
Enriched data with cluster information
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.apply_gmm(samples: ndarray, n_components: int | None = None, logger=None, *, seed: int | None = None, k_max: int = 8, reg_covar: float = 1e-06, n_init: int = 3, covariance_type: str = 'full', whiten: bool = True) tuple[ndarray, Dict[str, Any]]¶
Pure GMM clustering without pruning or noise detection.
- K selection via BIC (if n_components is None)
- Hard assignment via gmm.predict()
- No -1/noise labels, no min_cluster_size filtering
- Optional standardization for stable BIC-based K selection
- Parameters:
samples – Parameter samples (N, D)
n_components – Fixed K (None = auto via BIC)
logger – Logger instance
seed – Random seed for reproducible GMM fitting
k_max – Maximum K to test for BIC
reg_covar – Covariance regularization
n_init – GMM initializations
whiten – Apply StandardScaler for equal parameter weighting
- Returns:
labels: Cluster labels array (0 to K-1, never -1)
gmm_params: Dict with ‘weights’, ‘means’, ‘covariances’ (the k Gaussians)
- Return type:
Tuple of (labels, gmm_params)
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.apply_hdbscan(samples: ndarray, min_cluster_size: int, min_samples: int, logger=None, *, seed: int | None = None, whiten: bool = True) ndarray¶
HDBSCAN clustering with optional standardization.
- Parameters:
samples – Array of shape (num_samples, num_params)
min_cluster_size – Minimum number of samples per cluster
min_samples – Minimum number of samples for core points
logger – Optional logger for debug messages
seed – Random seed for reproducibility (unused - HDBSCAN is deterministic)
whiten – Apply StandardScaler for equal parameter weighting
- Returns:
Array of cluster labels (-1 for noise)
- vipr_reflectometry.flow_models.postprocess.cluster.clustering.calculate_validation_metrics(samples: ndarray, cluster_labels: ndarray, logger=None) Dict[str, Any]¶
Calculate validation metrics on scaled features.
Metrics are calculated on standardized features to match the clustering feature space when whiten=True was used.
- Parameters:
samples – Parameter samples (num_samples, num_params)
cluster_labels – Cluster assignments
logger – Optional logger for warnings
- Returns:
Dict with validation metrics and quality assessment