# Filter Documentation

Filters transform data during the inference pipeline. They are activated in YAML configurations and can be chained together.

## Quick Start for Reflectometry

**For most users: No action needed!**

- ✅ Standard workflow uses **Reflectorch Interpolation** (enabled by default)
- ✅ No normalization required (Reflectorch has built-in scaling)
- ✅ Just run: `vipr --config @vipr_reflectometry/reflectorch/examples/configs/Ni500.yaml inference run`

**Optional: Clean noisy data**

Add `NeutronDataCleaner` to remove high-error points:

```yaml
filters:
  INFERENCE_PREPROCESS_PRE_FILTER:
    - class: vipr_reflectometry.shared.preprocessing.neutron_data_cleaner.NeutronDataCleaner
      enabled: true
      weight: -10
      parameters:
        error_threshold: 0.5
```

**Discover available filters:**

```bash
vipr discovery filters
```

📖 *For detailed information, see the sections below.*

---

## Available Filters

### Normalization Filters (INFERENCE_NORMALIZE_PRE_FILTER)

**Execution Order:** After data loading, before preprocessing

**Note:** Normalization is typically **not required** for reflectometry data in the example configs, as the data is already in the correct format for the models. These filters are provided for other use cases or custom workflows.

#### MinMaxNormalizer

Scales intensity values to the [0, 1] range.

**Formula:**

```
y_norm = (y - min(y)) / (max(y) - min(y))
dy_norm = dy / (max(y) - min(y))
```

**Default:** Disabled (`enabled_in_config=False`)

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_NORMALIZE_PRE_FILTER:
        - class: vipr.plugins.normalizers.minmax_normalizer.MinMaxNormalizer
          enabled: true
          method: normalize_filter
          weight: 0
```

#### ZScoreNormalizer

Standardizes data to mean 0 and standard deviation 1.
**Formula:**

```
y_norm = (y - mean(y)) / std(y)
dy_norm = dy / std(y)
```

**Default:** Disabled (`enabled_in_config=False`)

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_NORMALIZE_PRE_FILTER:
        - class: vipr.plugins.normalizers.zscore_normalizer.ZScoreNormalizer
          enabled: true
          method: normalize_filter
          weight: 0
```

#### LogNormalizer

Applies a logarithmic transformation to intensity values.

**Formula:**

```
y_norm = log(y + offset)     # offset applied if y ≤ 0
dy_norm = dy / (y + offset)
```

**Default:** Disabled (`enabled_in_config=False`)

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_NORMALIZE_PRE_FILTER:
        - class: vipr.plugins.normalizers.log_normalizer.LogNormalizer
          enabled: true
          method: normalize_filter
          weight: 0
```

---

### Preprocessing Filters (INFERENCE_PREPROCESS_PRE_FILTER)

**Execution Order:** After normalization, before prediction

#### NeutronDataCleaner

Cleans experimental neutron reflectometry data.

**Functions:**

1. Removes points with negative intensity (R < 0)
2. Filters/truncates curves at consecutive high-error points

**Default:** Disabled (`enabled_in_config=False`)

**Parameters:**

- `error_threshold` (float, default=0.5): Relative error threshold dR/R (range: 0.0-1.0)
- `consecutive_errors` (int, default=3): Number of consecutive high-error points that triggers truncation (minimum: 1)
- `remove_single_errors` (bool, default=false): Remove isolated high-error points before truncation

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_PREPROCESS_PRE_FILTER:
        - class: vipr_reflectometry.shared.preprocessing.neutron_data_cleaner.NeutronDataCleaner
          enabled: true
          method: clean_experimental_data
          weight: -10
          parameters:
            error_threshold: 0.5
            consecutive_errors: 3
            remove_single_errors: false
```

**Note:** `weight: -10` ensures execution before interpolation.

#### Reflectorch Interpolation

Interpolates experimental curves to the model Q-grid.
**Functions:**

- Q-grid interpolation (logarithmic for reflectivity)
- Propagates Q-resolution (dQ) and intensity errors (dR)
- Batch processing

**Default:** **Enabled** (`enabled_in_config=True`) - standard for Reflectorch workflows

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_PREPROCESS_PRE_FILTER:
        - class: vipr_reflectometry.reflectorch.reflectorch_extension.Reflectorch
          enabled: true
          method: _preprocess_interpolate
          weight: 0
```

#### FlowPreprocessor

Preprocessing for flow models (CINN, NSF, MAF).

**Functions:**

- Q-grid interpolation
- Flow-specific curve scaling
- Tensor formatting for inverse sampling

**Default:** Disabled (`enabled_in_config=False`)

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_PREPROCESS_PRE_FILTER:
        - class: vipr_reflectometry.flow_models.flow_preprocessor.FlowPreprocessor
          enabled: true
          method: _preprocess_flow
          weight: 0
```

---

## Filter Chaining

Filters are executed in order of `weight` (lower values run first):

```yaml
vipr:
  inference:
    filters:
      INFERENCE_PREPROCESS_PRE_FILTER:
        # 1. First: data cleaning (weight: -10)
        - class: vipr_reflectometry.shared.preprocessing.neutron_data_cleaner.NeutronDataCleaner
          enabled: true
          method: clean_experimental_data
          weight: -10
          parameters:
            error_threshold: 0.5
            consecutive_errors: 3
        # 2. Then: interpolation (weight: 0)
        - class: vipr_reflectometry.reflectorch.reflectorch_extension.Reflectorch
          enabled: true
          method: _preprocess_interpolate
          weight: 0
```

---

## Practical Examples

### Standard Workflow (interpolation only)

```bash
vipr --config @vipr_reflectometry/reflectorch/examples/configs/Ni500.yaml inference run
```

The config uses only Reflectorch Interpolation (the default).

### Quality Filtering + Interpolation

```bash
vipr --config @vipr_reflectometry/reflectorch/examples/configs/D17_SiO.yaml inference run
```

The config uses:

1. NeutronDataCleaner (removes problematic points)
2.
Reflectorch Interpolation (interpolates cleaned data)

### Filter Discovery

Show all available filters:

```bash
vipr discovery filters
```

---

## Best Practices

### When to use NeutronDataCleaner?

- For noisy experimental data
- When curves have high error bars at the end
- When the data contains negative intensity values

### Parameter Tuning for NeutronDataCleaner

- **error_threshold=0.5**: Standard for neutron reflectometry (50% relative error)
- **consecutive_errors=3**: Balance between noise tolerance and data loss
- **remove_single_errors=false**: Preserves curve structure; removes only consecutive issues

### Normalization

- **Not required** for reflectometry data with Reflectorch models
- **Why no normalization for Reflectorch?**
  - Reflectorch has **built-in scaling** (`LogAffineCurvesScaler`) that applies `log10(R + eps) * weight + bias`
  - Models expect **raw reflectivity values** as measured experimentally (typically from ~1 down to the measurement sensitivity limit)
  - The **absolute scale contains physical information** needed for accurate predictions
  - External normalization would interfere with the model's trained scaling transformation
- **Use cases for normalization filters:**
  - **LogNormalizer**: For custom models or other domains with data spanning multiple orders of magnitude
  - **MinMax/ZScore**: When adapting VIPR to other domains or training custom ML models

### Uncertainty Propagation

Filters that transform data values also transform the measurement uncertainties (dQ for Q-values, dR for reflectivity) using standard error propagation formulas.

---

## Notes

- **enabled_in_config**: Determines whether a filter is enabled by default in generated standard configs
- **weight**: Determines execution order (lower values = earlier execution)
- **DataSet**: Filters work with immutable DataSet objects (Pydantic)
- **Batch Processing**: All filters support batch processing of multiple spectra
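## Appendix: Illustrative Sketches

The normalization formulas above, together with their error propagation, can be sketched in NumPy. This is an illustrative standalone sketch under the formulas given in this document; the function names are not part of the VIPR API:

```python
import numpy as np

def minmax_normalize(y, dy):
    """Scale intensities to [0, 1]; errors scale by the same factor."""
    span = y.max() - y.min()
    return (y - y.min()) / span, dy / span

def zscore_normalize(y, dy):
    """Standardize to mean 0 and std 1; errors scale by 1 / std."""
    return (y - y.mean()) / y.std(), dy / y.std()

def log_normalize(y, dy, offset=1e-10):
    """Log transform; dy_norm = dy / y follows from first-order
    error propagation d(log y) = dy / y."""
    shift = offset if (y <= 0).any() else 0.0  # offset only if y ≤ 0 occurs
    return np.log(y + shift), dy / (y + shift)

# Reflectivity spanning two orders of magnitude, with absolute errors
y = np.array([1.0, 0.1, 0.01])
dy = np.array([0.05, 0.01, 0.005])
y_mm, dy_mm = minmax_normalize(y, dy)  # y_mm lies in [0, 1]
```

Note that all three transforms rescale `dy` by the derivative of the transform, which is the standard first-order error propagation mentioned under Best Practices.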
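The NeutronDataCleaner behavior can be illustrated with a minimal standalone sketch (hypothetical code, not the actual VIPR implementation): negative intensities are dropped first, then the curve is truncated at the first run of `consecutive_errors` points whose relative error dR/R exceeds `error_threshold`:

```python
import numpy as np

def clean_reflectivity(q, r, dr, error_threshold=0.5, consecutive_errors=3):
    """Sketch of the two cleaning steps:
    1. remove points with negative intensity (R < 0); points with R == 0
       are also dropped here to avoid division by zero in dR/R,
    2. truncate at the first run of `consecutive_errors` points whose
       relative error dR/R exceeds `error_threshold`."""
    keep = r > 0
    q, r, dr = q[keep], r[keep], dr[keep]

    high = (dr / r) > error_threshold  # flag high-relative-error points
    run = 0
    for i, flagged in enumerate(high):
        run = run + 1 if flagged else 0
        if run == consecutive_errors:
            cut = i - consecutive_errors + 1  # truncate before the run starts
            return q[:cut], r[:cut], dr[:cut]
    return q, r, dr
```

On a curve whose tail has `consecutive_errors` points with dR/R above the threshold, only the leading low-error portion is kept, which mirrors the "high error bars at the end" use case listed under Best Practices.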