vipr_reflectometry.shared.load_data.readers package

Submodules

vipr_reflectometry.shared.load_data.readers.experimental_data_manager module

class vipr_reflectometry.shared.load_data.readers.experimental_data_manager.ExperimentalDataManager(file_path, dataset_name, num_params=8, _preloaded_full_file_dict=None)

Bases: object

dataset: AbstractExperimentalDataset | None

extract_fit_parameters()

Extracts and processes the fit parameters.

get_q_model(q_generator)

Generates or retrieves the model q values (q_model) using the given q_generator.

Parameters:

q_generator (QGenerator) – Generator for model q values.

Returns:

Array of q_model values.

Return type:

torch.Tensor

get_reshaped_experiment_data()

Retrieves and reshapes experimental q values and curves.

Returns:

(q_exp, curve_exp, curve_errors, q_errors) where:
  • q_exp is as stored in the dataset.

  • curve_exp is reshaped to 2D if necessary.

  • curve_errors is reshaped to 2D if available, None otherwise.

  • q_errors is reshaped to 2D if available, None otherwise.

Return type:

tuple
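The "reshaped to 2D if necessary" rule above can be pictured as promoting a single curve to a batch of one. The following is only a sketch under that assumption, not the package's implementation: it uses nested lists instead of arrays, and `ensure_2d` is a hypothetical helper name.

```python
from typing import List, Optional, Sequence


def ensure_2d(values: Optional[Sequence]) -> Optional[List[List[float]]]:
    """Return values as a list of rows; None passes through unchanged."""
    if values is None:
        # Optional inputs such as curve_errors stay None when absent.
        return None
    rows = list(values)
    if rows and isinstance(rows[0], (int, float)):
        # A flat sequence is a single curve: wrap it as a batch of one.
        return [list(rows)]
    # Already two-dimensional: copy each row into a plain list.
    return [list(row) for row in rows]
```

With this helper, a flat curve and a batch of curves both end up with the same (batch, points) layout, while missing error arrays remain None.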

get_scaled_data(q_generator, prior_sampler, curves_scaler)

Retrieves scaled experimental parameters and curves.

Parameters:
  • q_generator (QGenerator) – Object for generating q values.

  • prior_sampler (PriorSampler) – Handles parameter scaling and sampling.

  • curves_scaler (CurvesScaler) – Scales experimental curves for model training.

Returns:

(scaled_params, scaled_curves, q_values, unscaled_params, unscaled_curves).

Return type:

tuple

interpolate_experimental_curves(q_generator, curves_scaler, device)

Interpolates experimental reflectivity curves.

Parameters:
  • q_generator (QGenerator) – Object for generating q values.

  • curves_scaler (CurvesScaler) – Object to scale interpolated curves.

  • device (torch.device) – Device where the tensors are stored (e.g., ‘cpu’ or ‘cuda’).

Returns:

(unscaled_curve, scaled_curve) where:
  • unscaled_curve (torch.Tensor) – The interpolated experimental curve (raw, unscaled).

  • scaled_curve (torch.Tensor) – The experimental curve after scaling.

Return type:

tuple

static interpolate_reflectivity(q_model, q_exp, curve_exp)

Interpolates reflectivity curves between experimental and model q values.

Parameters:
  • q_model (np.ndarray) – Model q values.

  • q_exp (np.ndarray) – Experimental q values.

  • curve_exp (np.ndarray) – Experimental curve data.

Returns:

Interpolated curve.

Return type:

np.ndarray
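As an illustration of resampling an experimental curve onto the model q grid, here is a minimal dependency-free sketch. It assumes plain linear interpolation with the end values held constant outside the measured range and q_exp sorted in ascending order; the actual method operates on NumPy arrays and may interpolate differently (for example in log space). `interpolate_curve` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_curve(q_model: Sequence[float],
                      q_exp: Sequence[float],
                      curve_exp: Sequence[float]) -> List[float]:
    """Linearly interpolate curve_exp (sampled at q_exp) onto q_model."""
    if len(q_exp) != len(curve_exp) or len(q_exp) < 2:
        raise ValueError("q_exp and curve_exp need equal length >= 2")
    out = []
    for q in q_model:
        if q <= q_exp[0]:
            out.append(curve_exp[0])    # below the measured range
        elif q >= q_exp[-1]:
            out.append(curve_exp[-1])   # above the measured range
        else:
            hi = bisect_right(q_exp, q)  # first experimental point beyond q
            lo = hi - 1
            t = (q - q_exp[lo]) / (q_exp[hi] - q_exp[lo])
            out.append(curve_exp[lo] + t * (curve_exp[hi] - curve_exp[lo]))
    return out
```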

prepare_unscaled_params(device)

Prepares the unscaled fit parameters.

Parameters:

device (torch.device) – Device to store the unscaled parameters.

Returns:

Tensor of unscaled parameters.

Return type:

torch.Tensor

split_dataset(q_generator, prior_sampler, curves_scaler, test_size=0.2, random_seed=None, allow_single_sample=False)

Splits the experimental dataset into training and testing subsets.

Parameters:
  • q_generator (QGenerator) – Generates q values for reflectivity calculations.

  • prior_sampler (PriorSampler) – Handles parameter scaling and sampling.

  • curves_scaler (CurvesScaler) – Scales experimental curves for model training.

  • test_size (float) – Proportion of the dataset to include in the test split (e.g., 0.2 for 20% test).

  • random_seed (int or None) – Random seed for reproducibility. If None, no seed is set.

  • allow_single_sample (bool) – If True and the dataset contains only one sample, return self as the test manager and None as the train manager.

Returns:

(train_manager, test_manager) where each manager’s processed_data contains:
  • scaled_params, scaled_curves, q_values,

  • unscaled_params, unscaled_curves.

Return type:

tuple
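The roles of test_size, random_seed and allow_single_sample can be sketched at the level of sample indices. This is an illustrative assumption about the splitting logic, not the package's code: `split_indices` is a hypothetical helper, and raising an error for a single sample when allow_single_sample is False is a guess at the intended behaviour.

```python
import random
from typing import List, Optional, Tuple


def split_indices(n_samples: int,
                  test_size: float = 0.2,
                  random_seed: Optional[int] = None,
                  allow_single_sample: bool = False
                  ) -> Tuple[Optional[List[int]], List[int]]:
    """Return (train_indices, test_indices) for n_samples items."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    if n_samples == 1:
        if allow_single_sample:
            # Documented special case: no train part, the sample is the test set.
            return None, [0]
        raise ValueError("cannot split a single sample")  # assumed behaviour
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(n_samples))
    # A private Random instance leaves the global RNG state untouched.
    random.Random(random_seed).shuffle(indices)
    # Keep at least one sample on each side of the split.
    n_test = min(n_samples - 1, max(1, round(n_samples * test_size)))
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

Calling the helper twice with the same integer seed yields the same partition, which is the reproducibility that random_seed is documented to provide.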

vipr_reflectometry.shared.load_data.readers.experimental_datasets module

class vipr_reflectometry.shared.load_data.readers.experimental_datasets.AbstractExperimentalDataset(file_path: str, dataset_name: str, preloaded_raw_data_for_this_dataset: Dict[str, Any] | None = None)

Bases: ABC

Defines the interface for experimental dataset access.

ensure_batch_shape(value, batch_size)

Ensures that a value has the correct batch shape.

abstract extract_fit_parameters() → Dict[str, Any]

Extracts and returns the fit parameters.

abstract get_experiment_data() → Dict[str, Any]

Returns the experiment data (containing ‘q’ and ‘data’).

raw_data: Dict[str, Any]

class vipr_reflectometry.shared.load_data.readers.experimental_datasets.MariaExperimentalDataset(file_path: str, dataset_name: str, preloaded_raw_data_for_this_dataset: Dict[str, Any] | None = None)

Bases: AbstractExperimentalDataset

extract_fit_parameters()

Extracts and returns the fit parameters.

get_experiment_data()

Returns the experiment data (containing ‘q’ and ‘data’).

class vipr_reflectometry.shared.load_data.readers.experimental_datasets.XrrExperimentalDataset(file_path: str, dataset_name: str, preloaded_raw_data_for_this_dataset: Dict[str, Any] | None = None)

Bases: AbstractExperimentalDataset

extract_fit_parameters()

Extracts and returns the fit parameters.

get_experiment_data()

Returns the experiment data (containing ‘q’ and ‘data’).
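The dataset classes above share one interface: each format supplies its own extract_fit_parameters() and get_experiment_data(). The sketch below shows that pattern with stand-in classes so it runs without the package. `DatasetInterface`, `InMemoryDataset`, and the "fit" and "thickness" keys are hypothetical; only the two method names and the 'q' and 'data' keys come from the documentation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class DatasetInterface(ABC):
    """Stand-in for the documented abstract interface (hypothetical name)."""

    def __init__(self, raw_data: Dict[str, Any]):
        self.raw_data = raw_data

    @abstractmethod
    def extract_fit_parameters(self) -> Dict[str, Any]:
        """Return the fit parameters."""

    @abstractmethod
    def get_experiment_data(self) -> Dict[str, Any]:
        """Return a dict containing 'q' and 'data'."""


class InMemoryDataset(DatasetInterface):
    """Hypothetical format whose raw data is already a flat dict."""

    def extract_fit_parameters(self) -> Dict[str, Any]:
        # The "fit" key is an assumption made for this example only.
        return dict(self.raw_data.get("fit", {}))

    def get_experiment_data(self) -> Dict[str, Any]:
        return {"q": self.raw_data["q"], "data": self.raw_data["data"]}
```

Because both methods are abstract, the base class cannot be instantiated directly; only subclasses that implement both can be.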

vipr_reflectometry.shared.load_data.readers.experimental_datasets.detect_format(file_path, dataset_name, preloaded_full_file_dict=None)

vipr_reflectometry.shared.load_data.readers.experimental_datasets.discover_experimental_datasets(file_path)

Scans an HDF5 file and identifies groups that likely represent experimental datasets compatible with ExperimentalDataManager.

Parameters:

file_path (str) – Path to the HDF5 file.

Returns:

A list of names of the discovered dataset groups.

Return type:

list[str]

vipr_reflectometry.shared.load_data.readers.experimental_datasets.load_experimental_dataset(file_path: str, dataset_name: str, preloaded_full_file_dict: Dict[str, Any] | None = None) → AbstractExperimentalDataset

vipr_reflectometry.shared.load_data.readers.hdf5_cache module

HDF5 Caching for Reflectometry Data

Process-persistent caching functionality for HDF5 files using VIPR’s cache infrastructure. This module is co-located with HDF5-related logic in the flow_models plugin, following the same pattern as streaming_handler.py.

vipr_reflectometry.shared.load_data.readers.hdf5_cache.clear_hdf5_cache(file_path: str | None = None)

Clear HDF5 cache entries.

Parameters:

file_path – Optional specific file to clear. If None, clears all HDF5 cache entries.

vipr_reflectometry.shared.load_data.readers.hdf5_cache.get_hdf5_cache_info()

Get information about current HDF5 cache entries.

Returns:

Dict with cache statistics

vipr_reflectometry.shared.load_data.readers.hdf5_cache.get_or_load_hdf5_data(file_path: str)

Load HDF5 file data with process-persistent caching.

This function provides intelligent caching for HDF5 files by:
  • Using file modification time for automatic cache invalidation

  • Leveraging VIPR’s existing process cache infrastructure

  • Working across both VIPR-Core and FastAPI contexts

Parameters:

file_path – Path to the HDF5 file

Returns:

Loaded HDF5 data (dict-like structure from nxtodict)

Raises:

vipr_reflectometry.shared.load_data.readers.spectra_reader module

Spectra Reader Adapter for Reflectometry Data

Provides a unified API for different reflectometry data formats using the adapter pattern:
  • HDF5 files (using ExperimentalDataManager)

  • CSV/DAT files (Maria format)

Architecture: SpectraReader (manager) delegates to format-specific adapters (HDF5Adapter/CSVAdapter). Adapters create lightweight SpectrumProxy objects that load data explicitly via the resolve() method.

Usage:

    from vipr_reflectometry.shared.load_data.readers.spectra_reader import SpectraReader

    reader = SpectraReader("path/to/data")
    proxies = reader.list()   # Get lightweight proxies
    proxy = proxies[0]        # Select first spectrum
    data = proxy.resolve()    # Explicit data loading with caching
    q = data.q                # Direct attribute access
    I = data.I                # Direct attribute access
    dI = data.dI              # Direct attribute access (can be None)

class vipr_reflectometry.shared.load_data.readers.spectra_reader.BaseDataHandle(*, format: Literal['hdf5', 'csv'])

Bases: BaseModel

Base model shared by all data handles.

format: Literal['hdf5', 'csv']
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class vipr_reflectometry.shared.load_data.readers.spectra_reader.CSVAdapter(file_path: str, column_mapping: dict | None = None)

Bases: SpectraAdapter

Adapter for single CSV/DAT/TXT files with configurable column mapping.

datasets()

CSV files have no datasets/groups; returns an empty list.

fetch(data_handle: CSVDataHandle) → SpectrumData

Optimized fetch that reads the file once (and caches it), then extracts all required data columns using a clean helper method.

list(dataset=None)

Create SpectrumProxy object for the single spectrum.

size(dataset=None)

Number of spectra - always 1 for single file.
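The documentation does not specify the format of column_mapping. As one plausible reading, the sketch below assumes it maps field names to zero-based column indices and that rows are whitespace- or comma-separated with "#" comment lines. `read_columns` and the default mapping are hypothetical and are not taken from CSVAdapter.

```python
from typing import Dict, List, Optional


def read_columns(text: str,
                 column_mapping: Optional[Dict[str, int]] = None
                 ) -> Dict[str, Optional[List[float]]]:
    """Parse whitespace- or comma-separated numeric rows into named columns."""
    # Assumed default layout: q, intensity, intensity error.
    mapping = column_mapping or {"q": 0, "I": 1, "dI": 2}
    columns: Dict[str, List[float]] = {name: [] for name in mapping}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.replace(",", " ").split()
        for name, index in mapping.items():
            if index < len(fields):
                columns[name].append(float(fields[index]))
    # Columns that never appeared (e.g. no error column) are reported as None.
    return {name: (vals or None) for name, vals in columns.items()}
```

Reporting an absent column as None matches the documented SpectrumData fields, where dI and dQ are optional.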

class vipr_reflectometry.shared.load_data.readers.spectra_reader.CSVDataHandle(*, format: Literal['csv'] = 'csv')

Bases: BaseDataHandle

Specific model for CSV data handles (no additional fields needed).

format: Literal['csv']
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class vipr_reflectometry.shared.load_data.readers.spectra_reader.HDF5Adapter(file_path: str)

Bases: SpectraAdapter

datasets()

List of all samples/datasets.

fetch(data_handle: HDF5DataHandle) → SpectrumData

Optimized fetch that reads the entire dataset batch once (and caches it), then extracts the data for the requested spectrum index.

list(dataset=None)

Create SpectrumProxy objects for available spectra.

size(dataset=None)

Total count or count within a sample.

class vipr_reflectometry.shared.load_data.readers.spectra_reader.HDF5DataHandle(*, format: Literal['hdf5'] = 'hdf5', dataset_name: str, spectrum_index: int)

Bases: BaseDataHandle

Specific model for HDF5 data handles.

dataset_name: str
format: Literal['hdf5']
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

spectrum_index: int

class vipr_reflectometry.shared.load_data.readers.spectra_reader.SpectraAdapter

Bases: ABC

Abstract base class for all spectra adapters.

abstract datasets() → list[str]

Return all available datasets/groups.

abstract fetch(data_handle: HDF5DataHandle | CSVDataHandle) → SpectrumData

Fetch all spectrum data (q, I, dI, dQ) in a single optimized call.

This method replaces the individual get_q/get_I/get_dI/get_dQ methods and performs all data loading in one operation to minimize I/O overhead.

Parameters:

data_handle – Data handle containing spectrum reference information

Returns:

Container with all spectrum data (q, I, dI, dQ)

Return type:

SpectrumData

abstract list(dataset: str | None = None) → list[SpectrumProxy]

Return all spectra as SpectrumProxy objects.

abstract size(dataset: str | None = None) → int

Return number of spectra (total or for specific dataset).
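To make the four-method adapter contract concrete, here is a simplified stand-in that keeps spectra in memory. It is a sketch only: `AdapterSketch` and `InMemoryAdapter` are hypothetical names, handles are plain (dataset, index) tuples rather than the documented data-handle models, and spectra are plain dicts rather than SpectrumData.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

Handle = Tuple[str, int]            # (dataset name, spectrum index)
Spectrum = Dict[str, List[float]]   # e.g. {"q": [...], "I": [...]}


class AdapterSketch(ABC):
    """Mirrors the documented method names with simplified types."""

    @abstractmethod
    def datasets(self) -> List[str]: ...

    @abstractmethod
    def size(self, dataset: Optional[str] = None) -> int: ...

    @abstractmethod
    def list(self, dataset: Optional[str] = None) -> List[Handle]: ...

    @abstractmethod
    def fetch(self, handle: Handle) -> Spectrum: ...


class InMemoryAdapter(AdapterSketch):
    def __init__(self, store: Dict[str, List[Spectrum]]):
        self._store = store

    def datasets(self) -> List[str]:
        return sorted(self._store)

    def size(self, dataset: Optional[str] = None) -> int:
        # Total count, or the count within one dataset.
        names = [dataset] if dataset is not None else self.datasets()
        return sum(len(self._store[name]) for name in names)

    def list(self, dataset: Optional[str] = None) -> List[Handle]:
        names = [dataset] if dataset is not None else self.datasets()
        return [(name, i) for name in names for i in range(len(self._store[name]))]

    def fetch(self, handle: Handle) -> Spectrum:
        # One call returns every array of the spectrum at once.
        name, index = handle
        return self._store[name][index]
```

Listing returns cheap handles only; the data itself is touched solely in fetch(), which is the separation the adapter interface is built around.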

class vipr_reflectometry.shared.load_data.readers.spectra_reader.SpectraReader(data_path: str, column_mapping: dict | None = None)

Bases: object

Adapter for various reflectometry data formats.

datasets() → list[str]

HDF5 only: List of samples/datasets; empty list for CSV.

get(dataset: str, index: int) → SpectrumProxy

Direct access to spectrum proxy.

list(dataset: str | None = None) → list[SpectrumProxy]

size(dataset: str | None = None) → int

class vipr_reflectometry.shared.load_data.readers.spectra_reader.SpectrumData(q: ndarray, I: ndarray, dI: ndarray | None = None, dQ: ndarray | None = None)

Bases: object

Pure data container for spectrum data.

This dataclass holds the actual loaded spectrum data and provides direct attribute access to Q values, intensities, and errors.

I: ndarray
dI: ndarray | None = None
dQ: ndarray | None = None
q: ndarray

class vipr_reflectometry.shared.load_data.readers.spectra_reader.SpectrumProxy(adapter: SpectraAdapter, data_handle: HDF5DataHandle | CSVDataHandle)

Bases: object

Lightweight proxy for a single spectrum with explicit loading.

This proxy doesn’t contain actual spectrum data, but knows how to fetch it explicitly when resolve() is called. The loaded data is cached for performance. This design makes expensive I/O operations explicit and avoids hidden costs.

resolve() → SpectrumData

Load the actual spectrum data from the source.

This is the primary method to explicitly load data. The result is cached so subsequent calls return the same SpectrumData object without reloading.

Returns:

Container with q, I, dI, dQ arrays

Return type:

SpectrumData
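The load-once behaviour of resolve() can be illustrated with a small proxy that stores the result of its first load. This is a sketch of the general pattern, not the class above: `SpectrumSketch`, `LazyProxy`, and the `loader` callback are hypothetical stand-ins for SpectrumData, SpectrumProxy, and the adapter's fetch call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SpectrumSketch:
    """Simplified container with list-based fields."""
    q: List[float]
    I: List[float]
    dI: Optional[List[float]] = None


class LazyProxy:
    """Holds only a way to load the data until resolve() is called."""

    def __init__(self, loader: Callable[[], SpectrumSketch]):
        self._loader = loader
        self._cached: Optional[SpectrumSketch] = None

    def resolve(self) -> SpectrumSketch:
        if self._cached is None:
            # First call: run the (potentially expensive) load exactly once.
            self._cached = self._loader()
        return self._cached
```

Creating many proxies is therefore cheap, and the I/O cost is paid only for the spectra that are actually resolved.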

Module contents