utils

check_observations(observations, required_columns, allow_overlapping=False, allow_unsorted=False)

Checks that the observations contain the required columns and, unless explicitly allowed, are sorted by ‘start’ and do not overlap.

Parameters:
  • observations (DataFrame) – The observations to check.

  • required_columns (Iterable[str]) – The columns that are required in the observations.

  • allow_overlapping (bool, default: False) – Whether overlapping intervals are allowed.

  • allow_unsorted (bool, default: False) – Whether unsorted intervals are allowed.

Return type:

DataFrame

Returns:

The checked observations.

Raises:
  • ValueError – If the observations are missing required columns.

  • ValueError – If the observations are not sorted by ‘start’.

  • ValueError – If the observations are overlapping.
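The three checks listed above can be illustrated with a minimal sketch. It is not the library's implementation: `check_rows` is a hypothetical name, plain dictionaries stand in for DataFrame rows so the example has no dependencies, and it assumes each row has 'start' and 'stop' keys with an inclusive 'stop'.

```python
from typing import Iterable


def check_rows(rows, required_columns: Iterable[str],
               allow_overlapping=False, allow_unsorted=False):
    """Hypothetical stand-in for check_observations on a list of dict rows."""
    required = set(required_columns)
    for row in rows:
        missing = required - set(row)
        if missing:
            raise ValueError(f"missing required columns: {sorted(missing)}")
    starts = [row["start"] for row in rows]
    if not allow_unsorted and starts != sorted(starts):
        raise ValueError("observations are not sorted by 'start'")
    if not allow_overlapping:
        # Assumption: 'stop' is inclusive, so an interval that starts on the
        # previous interval's 'stop' counts as overlapping.
        ordered = sorted(rows, key=lambda row: row["start"])
        for previous, current in zip(ordered, ordered[1:]):
            if current["start"] <= previous["stop"]:
                raise ValueError("observations are overlapping")
    return rows


rows = [
    {"start": 0, "stop": 4, "category": "a"},
    {"start": 5, "stop": 9, "category": "b"},
]
checked = check_rows(rows, ["start", "stop", "category"])
```

Returning the input on success mirrors the documented return value and lets the check be chained with other calls.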

ensure_matching_index_columns(observations, reference_observations, index_columns)

Validates that two sets of observations have matching index columns.

Parameters:
  • observations (DataFrame) – The first set of observations.

  • reference_observations (DataFrame) – The second set of observations.

  • index_columns (tuple[str, ...]) – The columns to use as index.

Return type:

tuple[DataFrame, DataFrame]

Returns:

The validated observations.

ensure_single_index(observations, *, index_columns, drop=True)

Ensure that the observations DataFrame has a single index key combination.

Parameters:
  • observations (DataFrame) – The observations to validate.

  • index_columns (tuple[str, ...]) – The columns to use as the index.

  • drop (bool, default: True) – Whether to drop the index columns from the DataFrame.

Return type:

DataFrame

Returns:

The validated observations.
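As a rough illustration of what "a single index key combination" means, the sketch below collects the distinct value tuples of the index columns and accepts the input only if there is exactly one. The name `single_index_rows`, the use of dict rows instead of a DataFrame, and the choice of `ValueError` are all assumptions; the documentation above does not state which exception the real function raises.

```python
def single_index_rows(rows, *, index_columns, drop=True):
    """Hypothetical stand-in for ensure_single_index on a list of dict rows."""
    # Every distinct combination of index values found in the rows.
    keys = {tuple(row[column] for column in index_columns) for row in rows}
    if len(keys) != 1:
        # Assumption: the real function may signal this differently.
        raise ValueError(f"expected one index key combination, found {len(keys)}")
    if not drop:
        return rows
    # Remove the index columns, keeping all other columns untouched.
    return [
        {name: value for name, value in row.items() if name not in index_columns}
        for row in rows
    ]


rows = [
    {"group": 1, "actor": "x", "start": 0, "stop": 4},
    {"group": 1, "actor": "x", "start": 5, "stop": 9},
]
without_index = single_index_rows(rows, index_columns=("group", "actor"))
```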

infill_observations(observations, observation_stop=None, *, background_category='none')

Infill observations with intervals of the background category.

Parameters:
  • observations (DataFrame) – The observations to infill.

  • observation_stop (int | None, default: None) – The stop time of the observations. If None, the maximum stop time of the observations is used.

  • background_category (str, default: 'none') – The category to use for the background intervals.

Return type:

DataFrame

Returns:

The infilled observations.
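Infilling can be pictured as walking through the intervals in order and inserting a background interval wherever a gap appears. The sketch below is illustrative only: `infill_rows` is a hypothetical name, dict rows stand in for a DataFrame, and it assumes sorted, non-overlapping intervals with integer timestamps, an inclusive 'stop', and a recording that begins at 0.

```python
def infill_rows(rows, observation_stop=None, *, background_category="none"):
    """Hypothetical stand-in for infill_observations on a list of dict rows."""
    if observation_stop is None:
        # Fall back to the latest stop among the given intervals.
        observation_stop = max((row["stop"] for row in rows), default=-1)
    filled = []
    cursor = 0  # first timestamp not yet covered by any interval
    for row in rows:
        if row["start"] > cursor:
            filled.append({"start": cursor, "stop": row["start"] - 1,
                           "category": background_category})
        filled.append(dict(row))
        cursor = row["stop"] + 1
    if cursor <= observation_stop:
        # Trailing gap between the last interval and the end of the recording.
        filled.append({"start": cursor, "stop": observation_stop,
                       "category": background_category})
    return filled


rows = [
    {"start": 2, "stop": 4, "category": "a"},
    {"start": 7, "stop": 8, "category": "b"},
]
infilled = infill_rows(rows, observation_stop=10)
```

With these inputs the result covers every timestamp from 0 to 10 exactly once, alternating between background and the original categories.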

remove_overlapping_observations(observations, *, index_columns, priority_function, max_allowed_overlap, drop_overlapping=True, drop_overlapping_column=True)

Removes overlapping observations.

Parameters:
  • observations (DataFrame) – The set of observations.

  • index_columns (tuple[str, ...]) – The columns to use as index.

  • priority_function (Callable[[DataFrame], Iterable[float]]) – A function that assigns a priority to each observation; lower values indicate higher priority.

  • max_allowed_overlap (float) – The maximum allowed overlap between observations.

  • drop_overlapping (bool, default: True) – Whether to drop overlapping observations.

  • drop_overlapping_column (bool, default: True) – Whether to drop the overlapping column.

Return type:

DataFrame

Returns:

Non-overlapping observations.
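One plausible reading of the priority-based removal is a greedy pass: visit observations from highest to lowest priority and keep each one only if it does not overlap an already kept observation by more than the allowed amount. The sketch below implements that reading and nothing more. It is an assumption about the behaviour, not the library's algorithm: `remove_overlapping_rows` is a hypothetical name, dict rows replace the DataFrame, overlap is measured as the number of shared timestamps with an inclusive 'stop', and the `index_columns` and `drop_overlapping*` options are not modelled.

```python
def remove_overlapping_rows(rows, *, priority_function, max_allowed_overlap):
    """Greedy sketch: keep higher-priority rows, drop rows that overlap them."""
    priorities = list(priority_function(rows))
    # Lower priority value means higher priority, so visit in ascending order.
    order = sorted(range(len(rows)), key=lambda i: priorities[i])
    kept = []
    for i in order:
        candidate = rows[i]
        overlaps = (
            min(candidate["stop"], other["stop"])
            - max(candidate["start"], other["start"]) + 1
            for other in kept
        )
        # Disjoint intervals yield a value <= 0 and therefore always pass.
        if all(overlap <= max_allowed_overlap for overlap in overlaps):
            kept.append(candidate)
    return sorted(kept, key=lambda row: row["start"])


rows = [
    {"start": 0, "stop": 9, "probability": 0.9},
    {"start": 5, "stop": 14, "probability": 0.6},
    {"start": 15, "stop": 19, "probability": 0.8},
]


def by_probability(observations):
    # Hypothetical priority: more probable observations win.
    return [-row["probability"] for row in observations]


strict = remove_overlapping_rows(
    rows, priority_function=by_probability, max_allowed_overlap=0
)
```

Negating the probability turns "higher is better" into the "lower value, higher priority" convention described for `priority_function`.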

to_observations(y, category_names, drop=None, timestamps=None)

Convert a 1D array of category labels to a DataFrame of observations.

Parameters:
  • y (ndarray) – A 1D array of category labels.

  • category_names (Iterable[str]) – Category names.

  • drop (Iterable[str] | None, default: None) – Categories that should be dropped from the resulting observations.

  • timestamps (ndarray | None, default: None) – Timestamps that correspond to the category labels. If not provided, timestamps start from 0.

Return type:

DataFrame

Returns:

Observations with columns “start”, “stop”, and “category”.
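Conceptually, this conversion is a run-length encoding: each run of identical labels becomes one interval. The sketch below shows the idea with the standard library only. `labels_to_rows` is a hypothetical name, a plain list replaces the ndarray, and two details are assumptions: labels are integer indices into `category_names`, and 'stop' is the timestamp of the last sample in a run (inclusive).

```python
from itertools import groupby


def labels_to_rows(y, category_names, drop=None, timestamps=None):
    """Hypothetical stand-in for to_observations using plain Python lists."""
    names = list(category_names)
    dropped = set(drop or ())
    if timestamps is None:
        timestamps = range(len(y))  # default timestamps start from 0
    rows = []
    position = 0  # index of the first sample of the current run
    for label, run in groupby(y):
        length = len(list(run))
        category = names[label]
        if category not in dropped:
            rows.append({
                "start": timestamps[position],
                "stop": timestamps[position + length - 1],
                "category": category,
            })
        position += length
    return rows


y = [0, 0, 1, 1, 1, 0]
observations = labels_to_rows(y, ["none", "attack"], drop=["none"])
```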

to_y(observations, *, start=0, stop=None, dtype=str)

Convert observations to a 1D array of category labels.

Parameters:
  • observations (DataFrame) – Observations, requires columns “start”, “stop”, and “category”.

  • start (int, default: 0) – Start timestamp.

  • stop (int | None, default: None) – Stop timestamp.

  • dtype (type, default: str) – Data type of the output array.

Return type:

ndarray

Returns:

A 1D array of category labels.
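The reverse conversion expands each interval back into one label per timestamp. The sketch below is a hedged illustration, not the library's code: `rows_to_labels` is a hypothetical name, it returns a plain list instead of an ndarray, it ignores `dtype`, it assumes integer timestamps with an inclusive 'stop', and it fills uncovered timestamps with a hypothetical `background` label, which the documentation above does not specify.

```python
def rows_to_labels(rows, *, start=0, stop=None, background="none"):
    """Hypothetical stand-in for to_y on a list of dict rows."""
    if stop is None:
        stop = max(row["stop"] for row in rows)
    # Assumption: timestamps not covered by any interval get `background`.
    labels = [background] * (stop - start + 1)
    for row in rows:
        # Clip each interval to the requested [start, stop] window.
        first = max(row["start"], start)
        last = min(row["stop"], stop)
        for timestamp in range(first, last + 1):
            labels[timestamp - start] = row["category"]
    return labels


rows = [{"start": 2, "stop": 4, "category": "attack"}]
labels = rows_to_labels(rows, stop=5)
```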

with_duration(func)

Decorator to add a ‘duration’ column to the output of a function that returns a DataFrame.

Parameters:
  • func (Callable[[P], DataFrame]) – The function to decorate; it must return a DataFrame.

Return type:

Callable[[P], DataFrame]
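The decorator pattern itself can be sketched without any dependencies. In the example below, `with_duration_rows` and `load_rows` are hypothetical names, a list of dict rows stands in for the returned DataFrame, and the duration formula assumes integer timestamps with an inclusive 'stop'; the real decorator may compute the column differently.

```python
from functools import wraps


def with_duration_rows(func):
    """Hypothetical stand-in for with_duration on functions returning dict rows."""
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        rows = func(*args, **kwargs)
        # Assumption: 'stop' is inclusive, so a single timestamp lasts 1.
        return [
            {**row, "duration": row["stop"] - row["start"] + 1}
            for row in rows
        ]
    return wrapper


@with_duration_rows
def load_rows():
    """Return example observations (hypothetical data)."""
    return [{"start": 2, "stop": 4, "category": "attack"}]


rows = load_rows()
```

Because the wrapper forwards `*args` and `**kwargs` unchanged, the decorated function keeps its original call signature, which is what the `P` parameter specification in the type annotation expresses.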