# Filter Documentation

Filters transform data during the inference pipeline. They are activated in YAML configurations and can be chained together.

## Quick Start for Reflectometry

**For most users: No action needed!**

- ✅ Standard workflow uses **Reflectorch Interpolation** (enabled by default)
- ✅ No normalization required (Reflectorch has built-in scaling)
- ✅ Just run: `vipr --config @vipr_reflectometry/reflectorch/examples/configs/Ni500.yaml inference run`

**Optional: Clean noisy data**

Add `NeutronDataCleaner` to remove high-error points:

```yaml
filters:
  INFERENCE_PREPROCESS_PRE_FILTER:
    - class: vipr_reflectometry.shared.preprocessing.neutron_data_cleaner.NeutronDataCleaner
      enabled: true
      weight: -10
      parameters:
        error_threshold: 0.5
```

**Discover available filters:**

```bash
vipr discovery filters
```

📖 *For detailed information, see the sections below.*

---

## Available Filters

### Normalization Filters (INFERENCE_NORMALIZE_PRE_FILTER)

**Execution Order:** After data loading, before preprocessing

**Note:** Normalization is typically **not required** for reflectometry data in the example configs, as the data is already in the correct format for the models. These filters are provided for other use cases or custom workflows.

#### MinMaxNormalizer

Scales intensity values to the [0, 1] range.

**Formula:**

```
y_norm = (y - min(y)) / (max(y) - min(y))
dy_norm = dy / (max(y) - min(y))
```

**Default:** Disabled (`enabled_in_config=False`)

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_NORMALIZE_PRE_FILTER:
        - class: vipr.plugins.normalizers.minmax_normalizer.MinMaxNormalizer
          enabled: true
          method: normalize_filter
          weight: 0
```

#### ZScoreNormalizer

Standardizes data to mean 0 and standard deviation 1.
**Formula:**

```
y_norm = (y - mean(y)) / std(y)
dy_norm = dy / std(y)
```

**Default:** Disabled (`enabled_in_config=False`)

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_NORMALIZE_PRE_FILTER:
        - class: vipr.plugins.normalizers.zscore_normalizer.ZScoreNormalizer
          enabled: true
          method: normalize_filter
          weight: 0
```

#### LogNormalizer

Applies a logarithmic transformation to intensity values.

**Formula:**

```
y_norm = log(y + offset)     # offset applied if y ≤ 0
dy_norm = dy / (y + offset)
```

**Default:** Disabled (`enabled_in_config=False`)

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_NORMALIZE_PRE_FILTER:
        - class: vipr.plugins.normalizers.log_normalizer.LogNormalizer
          enabled: true
          method: normalize_filter
          weight: 0
```

---

### Preprocessing Filters (INFERENCE_PREPROCESS_PRE_FILTER)

**Execution Order:** After normalization, before prediction

#### NeutronDataCleaner

Cleans experimental neutron reflectometry data.

**Functions:**

1. Removes points with negative intensity (R < 0)
2. Filters/truncates curves at consecutive high-error points

**Default:** Disabled (`enabled_in_config=False`)

**Parameters:**

- `error_threshold` (float, default=0.5): Relative error threshold dR/R (range: 0.0-1.0)
- `consecutive_errors` (int, default=3): Number of consecutive high-error points that triggers truncation (minimum: 1)
- `remove_single_errors` (bool, default=false): Remove isolated high-error points before truncation

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_PREPROCESS_PRE_FILTER:
        - class: vipr_reflectometry.shared.preprocessing.neutron_data_cleaner.NeutronDataCleaner
          enabled: true
          method: clean_experimental_data
          weight: -10
          parameters:
            error_threshold: 0.5
            consecutive_errors: 3
            remove_single_errors: false
```

**Note:** `weight: -10` ensures execution before interpolation.

#### Reflectorch Interpolation

Interpolates experimental curves to the model Q-grid.
**Functions:**

- Q-grid interpolation (logarithmic for reflectivity)
- Propagates Q-resolution (dQ) and intensity errors (dR)
- Batch processing

**Default:** **Enabled** (`enabled_in_config=True`) - standard for Reflectorch workflows

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_PREPROCESS_PRE_FILTER:
        - class: vipr_reflectometry.reflectorch.reflectorch_extension.Reflectorch
          enabled: true
          method: _preprocess_interpolate
          weight: 0
```

#### FlowPreprocessor

Preprocessing for flow models (CINN, NSF, MAF).

**Functions:**

- Q-grid interpolation
- Flow-specific curve scaling
- Tensor formatting for inverse sampling

**Default:** Disabled (`enabled_in_config=False`)

**YAML Configuration:**

```yaml
vipr:
  inference:
    filters:
      INFERENCE_PREPROCESS_PRE_FILTER:
        - class: vipr_reflectometry.flow_models.flow_preprocessor.FlowPreprocessor
          enabled: true
          method: _preprocess_flow
          weight: 0
```

---

## Filter Chaining

Filters are executed in order of `weight` (lower values run first):

```yaml
vipr:
  inference:
    filters:
      INFERENCE_PREPROCESS_PRE_FILTER:
        # 1. First: data cleaning (weight: -10)
        - class: vipr_reflectometry.shared.preprocessing.neutron_data_cleaner.NeutronDataCleaner
          enabled: true
          method: clean_experimental_data
          weight: -10
          parameters:
            error_threshold: 0.5
            consecutive_errors: 3
        # 2. Then: interpolation (weight: 0)
        - class: vipr_reflectometry.reflectorch.reflectorch_extension.Reflectorch
          enabled: true
          method: _preprocess_interpolate
          weight: 0
```

---

## Practical Examples

### Standard Workflow (interpolation only)

```bash
vipr --config @vipr_reflectometry/reflectorch/examples/configs/Ni500.yaml inference run
```

The config uses only Reflectorch Interpolation (the default).

### Quality Filtering + Interpolation

```bash
vipr --config @vipr_reflectometry/reflectorch/examples/configs/D17_SiO.yaml inference run
```

The config uses:

1. NeutronDataCleaner (removes problematic points)
2.
Reflectorch Interpolation (interpolates cleaned data)

### Filter Discovery

Show all available filters:

```bash
vipr discovery filters
```

---

## Best Practices

### When to use NeutronDataCleaner?

- For noisy experimental data
- When curves have high error bars at the end
- When the data contains negative intensity values

### Parameter Tuning for NeutronDataCleaner

- **error_threshold=0.5**: Standard for neutron reflectometry (50% relative error)
- **consecutive_errors=3**: Balance between noise tolerance and data loss
- **remove_single_errors=false**: Preserves curve structure; removes only consecutive issues

### Normalization

- **Not required** for reflectometry data with Reflectorch models
- **Why no normalization for Reflectorch?**
  - Reflectorch has **built-in scaling** (`LogAffineCurvesScaler`) that applies `log10(R + eps) * weight + bias`
  - Models expect **raw reflectivity values** as measured experimentally (typically from ~1 down to the measurement sensitivity limit)
  - The **absolute scale contains physical information** needed for accurate predictions
  - External normalization would interfere with the model's trained scaling transformation
- **Use cases for normalization filters:**
  - **LogNormalizer**: For custom models or other domains with data spanning multiple orders of magnitude
  - **MinMax/ZScore**: When adapting VIPR to other domains or training custom ML models

### Uncertainty Propagation

Filters that transform data values also transform the measurement uncertainties (dQ for Q-values, dR for reflectivity) using standard error propagation formulas.

---

## Notes

- **enabled_in_config**: Determines whether a filter is enabled by default in generated standard configs
- **weight**: Determines execution order (lower values = earlier execution)
- **DataSet**: Filters work with immutable DataSet objects (Pydantic)
- **Batch Processing**: All filters support batch processing of multiple spectra
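## Appendix: Illustrative Sketches

The normalization formulas above, together with their error propagation, can be sketched in NumPy. This is an illustrative standalone sketch under the formulas given in this document; the function names are not part of the VIPR API:

```python
import numpy as np

def minmax_normalize(y, dy):
    """Scale intensities to [0, 1]; errors scale by the same factor."""
    span = y.max() - y.min()
    return (y - y.min()) / span, dy / span

def zscore_normalize(y, dy):
    """Standardize to mean 0 and std 1; errors scale by 1 / std."""
    return (y - y.mean()) / y.std(), dy / y.std()

def log_normalize(y, dy, offset=1e-10):
    """Log transform; dy_norm = dy / y follows from first-order
    error propagation d(log y) = dy / y."""
    shift = offset if (y <= 0).any() else 0.0  # offset only if y ≤ 0 occurs
    return np.log(y + shift), dy / (y + shift)

# Reflectivity spanning two orders of magnitude, with absolute errors
y = np.array([1.0, 0.1, 0.01])
dy = np.array([0.05, 0.01, 0.005])
y_mm, dy_mm = minmax_normalize(y, dy)  # y_mm lies in [0, 1]
```

Note that all three transforms rescale `dy` by the derivative of the transform, which is the standard first-order error propagation mentioned under Best Practices.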
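The NeutronDataCleaner behavior can be illustrated with a minimal standalone sketch (hypothetical code, not the actual VIPR implementation): negative intensities are dropped first, then the curve is truncated at the first run of `consecutive_errors` points whose relative error dR/R exceeds `error_threshold`:

```python
import numpy as np

def clean_reflectivity(q, r, dr, error_threshold=0.5, consecutive_errors=3):
    """Sketch of the two cleaning steps:
    1. remove points with negative intensity (R < 0); points with R == 0
       are also dropped here to avoid division by zero in dR/R,
    2. truncate at the first run of `consecutive_errors` points whose
       relative error dR/R exceeds `error_threshold`."""
    keep = r > 0
    q, r, dr = q[keep], r[keep], dr[keep]

    high = (dr / r) > error_threshold  # flag high-relative-error points
    run = 0
    for i, flagged in enumerate(high):
        run = run + 1 if flagged else 0
        if run == consecutive_errors:
            cut = i - consecutive_errors + 1  # truncate before the run starts
            return q[:cut], r[:cut], dr[:cut]
    return q, r, dr
```

On a curve whose tail has `consecutive_errors` points with dR/R above the threshold, only the leading low-error portion is kept, which mirrors the "high error bars at the end" use case listed under Best Practices.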