postprocessing

class ParameterSuggestionFunction(*args, **kwargs)

Bases: Protocol

Protocol for parameter suggestion functions.

Parameters:
  • trial (Trial) – The trial object.

  • *args – Additional arguments.

  • categories (Iterable[str]) – Category names.

  • **kwargs – Additional keyword arguments.

Returns:

Parameters for postprocessing.

Return type:

PostprocessingParameters
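A minimal sketch of a function that matches this protocol. It assumes that `categories` is passed by keyword (its position after `*args` suggests it is keyword-only) and that one decision threshold is suggested per category. `TrialLike`, `MidpointTrial`, `suggest_thresholds`, and the `min_duration` key are hypothetical. With Optuna, a real `optuna.trial.Trial` would be passed instead, and its `suggest_float(name, low, high)` method has the signature used here.

```python
from typing import Any, Iterable, Protocol, TypedDict


class PostprocessingParameters(TypedDict):
    # Mirrors the two keys documented for PostprocessingParameters.
    decision_thresholds: Iterable[float]
    postprocessing_parameters: dict[str, Any]


class TrialLike(Protocol):
    # Hypothetical stand-in for the part of optuna.trial.Trial used here.
    def suggest_float(self, name: str, low: float, high: float) -> float: ...


def suggest_thresholds(
    trial: TrialLike, *args: Any, categories: Iterable[str], **kwargs: Any
) -> PostprocessingParameters:
    # One decision threshold per category, searched in [0, 1].
    thresholds = [
        trial.suggest_float(f"threshold_{category}", 0.0, 1.0)
        for category in categories
    ]
    # "min_duration" is a hypothetical postprocessing setting.
    min_duration = trial.suggest_float("min_duration", 0.0, 5.0)
    return {
        "decision_thresholds": thresholds,
        "postprocessing_parameters": {"min_duration": min_duration},
    }


class MidpointTrial:
    # Hypothetical fake trial that always suggests the midpoint of the range.
    def suggest_float(self, name: str, low: float, high: float) -> float:
        return (low + high) / 2


params = suggest_thresholds(MidpointTrial(), categories=["a", "b"])
```

Any object with a compatible `suggest_float` method works here, which keeps the suggestion logic testable without running a full study.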

class PostprocessingFunction(*args, **kwargs)

Bases: Protocol, Generic

Protocol for postprocessing functions.

Parameters:
Returns:

The result after postprocessing.

Return type:

(ClassificationResult | GroupClassificationResult | DatasetClassificationResult)

See also

PostprocessingParameters, which holds the two required keyword arguments (decision_thresholds and postprocessing_parameters) and can be passed as **postprocessing_parameters.
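A hypothetical postprocessing function, sketched under the assumption that the classification result is the first positional argument. A plain list of per-category scores stands in for ClassificationResult, and `threshold_postprocessing` is an illustrative name, not part of this module. The point is the call pattern: both required keyword arguments arrive together from a PostprocessingParameters dictionary unpacked with `**`.

```python
from typing import Any, Iterable


def threshold_postprocessing(
    scores: list[list[float]],
    *,
    decision_thresholds: Iterable[float],
    postprocessing_parameters: dict[str, Any],
) -> list[list[int]]:
    """Hypothetical postprocessing function: per-category thresholding."""
    # `scores` stands in for a classification result: one row of
    # per-category scores per timestamp.
    thresholds = list(decision_thresholds)
    # Further settings would be read from `postprocessing_parameters`;
    # this minimal sketch does not use any.
    return [
        [int(score >= threshold) for score, threshold in zip(row, thresholds)]
        for row in scores
    ]


params = {"decision_thresholds": [0.5, 0.3], "postprocessing_parameters": {}}
# Dict unpacking supplies both required keyword arguments at once.
labels = threshold_postprocessing([[0.7, 0.2], [0.4, 0.6]], **params)
```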

class PostprocessingParameters

Bases: TypedDict

Typed dictionary that holds parameters for postprocessing.

decision_thresholds: Iterable[float]
postprocessing_parameters: dict[str, Any]
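Because PostprocessingParameters is a TypedDict, instances are ordinary dictionaries at runtime and the annotations only guide type checkers. The sketch below re-declares the class from the two fields listed above; the `min_duration` entry is a hypothetical example setting. Both keys are required because TypedDict classes are total by default.

```python
from typing import Any, Iterable, TypedDict


class PostprocessingParameters(TypedDict):
    decision_thresholds: Iterable[float]
    postprocessing_parameters: dict[str, Any]


params: PostprocessingParameters = {
    "decision_thresholds": (0.5, 0.4),
    "postprocessing_parameters": {"min_duration": 1.0},  # hypothetical setting
}

# Nothing is validated at runtime, so a manual check for required keys
# can be useful before unpacking the dict into a postprocessing function.
missing_keys = PostprocessingParameters.__required_keys__ - set(params)
```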

optimize_postprocessing_parameters(dataset, extractor, classifier, postprocessing_function, suggest_postprocessing_parameters_function, *, num_trials, k, sampling_function, balance_sample_weights=True, experiment, optimize_across_runs=False, parallel_optimization=False, log=None)

Perform a k-fold prediction experiment, then use its results to optimize the postprocessing parameters.


Parameters:
  • dataset (AnnotatedDataset) – The dataset to use.

  • extractor (BaseExtractor[TypeVar(F, bound= Shaped)]) – The extractor to use for feature extraction.

  • classifier (Any) – The classifier to use for classification.

  • postprocessing_function (PostprocessingFunction) – The postprocessing function to use.

  • suggest_postprocessing_parameters_function (ParameterSuggestionFunction) – A callable that suggests postprocessing parameters.

  • num_trials (int) – The number of trials to perform in the Optuna optimization study.

  • k (int) – The number of folds to use for the k-fold experiment.

  • sampling_function (SamplingFunction) – The sampling function to use during k-fold prediction.

  • balance_sample_weights (bool, default: True) – Whether to balance the sample weights during model fitting.

  • experiment (Experiment) – The experiment configuration (specifies the number of runs and the random state).

  • optimize_across_runs (bool, default: False) – Whether to optimize postprocessing parameters across experiment runs, or for each run individually.

  • parallel_optimization (bool, default: False) – Whether to perform the Optuna optimization study/studies in parallel.

  • log (Logger | None, default: None) – The Loguru logger to use for the experiment.

Return type:

list[Study] | Study

Returns:

A list of Optuna studies, one per experiment run, if optimize_across_runs=False; otherwise a single study.
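The difference between the two settings of optimize_across_runs can be sketched with a deterministic grid search standing in for the Optuna study and accuracy standing in for the actual score. Both substitutes are hypothetical, as is the assumption that cross-run optimization averages the per-run scores.

```python
from statistics import fmean
from typing import Callable, Sequence, Union

# Hypothetical stand-in: one "run result" is a list of (score, label) pairs.
Run = Sequence[tuple[float, int]]
CANDIDATES = [i / 10 for i in range(11)]  # thresholds 0.0, 0.1, ..., 1.0


def accuracy(threshold: float, run: Run) -> float:
    return fmean(int(score >= threshold) == label for score, label in run)


def best_threshold(objective: Callable[[float], float]) -> float:
    # Grid search in place of an Optuna study; ties go to the lowest threshold.
    return max(CANDIDATES, key=objective)


def optimize(runs: Sequence[Run], *, optimize_across_runs: bool) -> Union[float, list[float]]:
    if optimize_across_runs:
        # A single optimization whose objective averages the score over all runs.
        return best_threshold(lambda t: fmean(accuracy(t, run) for run in runs))
    # One independent optimization per run.
    return [best_threshold(lambda t, run=run: accuracy(t, run)) for run in runs]


runs = [[(0.9, 1), (0.2, 0)], [(0.6, 1), (0.4, 0)]]
per_run = optimize(runs, optimize_across_runs=False)
across = optimize(runs, optimize_across_runs=True)
```

Per-run optimization returns one result per run (here thresholds 0.3 and 0.5), while the cross-run variant returns a single compromise value that works for both runs.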

optuna_parameter_optimization(classification_result, *, num_trials, random_state, postprocessing_function, suggest_postprocessing_parameters_function, parallel_optimization, experiment, log)

Perform a parameter optimization study using Optuna.

Parameters:
Return type:

Study

Returns:

The Optuna study.
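How random_state is used internally is not documented here. In Optuna, a seed is usually given to the sampler (for example `optuna.samplers.TPESampler(seed=...)`), which makes a study reproducible. The stdlib-only sketch below mimics that trial loop with hypothetical `UniformTrial` and `MiniStudy` classes that sample uniformly at random rather than with TPE.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


class UniformTrial:
    """Hypothetical stand-in for optuna.trial.Trial (uniform random sampling)."""

    def __init__(self, rng: random.Random) -> None:
        self._rng = rng
        self.params: dict[str, float] = {}

    def suggest_float(self, name: str, low: float, high: float) -> float:
        value = self._rng.uniform(low, high)
        self.params[name] = value
        return value


@dataclass
class MiniStudy:
    """Hypothetical stand-in for a maximizing Optuna study."""

    random_state: int
    trials: list[tuple[dict[str, float], float]] = field(default_factory=list)

    def optimize(self, objective: Callable[[UniformTrial], float], n_trials: int) -> None:
        # A single seeded generator makes the whole sequence of trials reproducible.
        rng = random.Random(self.random_state)
        for _ in range(n_trials):
            trial = UniformTrial(rng)
            value = objective(trial)
            self.trials.append((trial.params, value))

    @property
    def best_params(self) -> dict[str, float]:
        return max(self.trials, key=lambda item: item[1])[0]


def objective(trial: UniformTrial) -> float:
    # Toy score that peaks at threshold == 0.7.
    threshold = trial.suggest_float("threshold", 0.0, 1.0)
    return -((threshold - 0.7) ** 2)


first, second = MiniStudy(random_state=42), MiniStudy(random_state=42)
first.optimize(objective, n_trials=200)
second.optimize(objective, n_trials=200)
```

Two studies with the same seed explore identical parameters and therefore report the same best parameters.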

optuna_score_postprocessing_trial(classification_result, trial, *, postprocessing_function, postprocessing_parameters, loop_log)

Run a single Trial to evaluate postprocessing parameters for a specified postprocessing function.

Parameters:
Return type:

float

Returns:

The score of the postprocessed classification result, calculated as the average of ‘timestamp’, ‘annotation’ and ‘prediction’ macro F1 scores.
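The scoring rule can be sketched as follows. `macro_f1` computes the unweighted mean of per-class F1 scores over the classes present in either label list (the same convention as scikit-learn's `average="macro"`). How the 'timestamp', 'annotation', and 'prediction' label sequences are derived is not shown here, so `macro_f1`, `trial_score`, and their inputs are hypothetical.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    # Unweighted mean of per-class F1 over all classes seen in either list.
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0.
        denominator = 2 * tp + fp + fn
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


def trial_score(timestamp_f1: float, annotation_f1: float, prediction_f1: float) -> float:
    # Plain average of the three macro F1 scores.
    return (timestamp_f1 + annotation_f1 + prediction_f1) / 3


f1 = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])  # (2/3 + 4/5) / 2
score = trial_score(0.9, 0.6, 0.75)
```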

run_k_fold_experiment(dataset, extractor, classifier, *, k, sampling_function, balance_sample_weights=True, experiment, log=None, cache=True)

Perform a k-fold prediction experiment on the given dataset.

Parameters:
  • dataset (AnnotatedDataset) – The dataset to perform the experiment on.

  • extractor (BaseExtractor[TypeVar(F, bound= Shaped)]) – The extractor to use for feature extraction.

  • classifier (Any) – The classifier to use for classification.

  • k (int) – The number of folds to use.

  • sampling_function (SamplingFunction) – The sampling function to use.

  • balance_sample_weights (bool, default: True) – Whether to balance sample weights for model fitting.

  • experiment (Experiment) – The experiment to run; this can also be a DistributedExperiment.

  • log (Logger | None, default: None) – The logger to use.

  • cache (bool, default: True) – Whether to cache the results.

Return type:

list[DatasetClassificationResult] | list[str]

Returns:

The results of the experiment (as a list of cache file paths if cache=True).
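In a k-fold prediction experiment, every sample is predicted exactly once by a model trained on the other k - 1 folds. How this module assigns samples to folds (for example by group) is not documented here; the sketch assumes a seeded shuffle of sample indices, and `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n_samples: int, k: int, random_state: int) -> list[tuple[list[int], list[int]]]:
    """Hypothetical helper returning (train, test) index lists for each fold."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the fold assignment is reproducible.
    random.Random(random_state).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [index for j, fold in enumerate(folds) if j != i for index in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


splits = k_fold_indices(n_samples=10, k=3, random_state=0)
```

Concatenating the predictions from the k test folds yields one out-of-fold prediction per sample, which is what the later optimization step can score.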

summarize_experiment(studies, *, results_file='optimization-results.yaml', summary_file='optimization-summary.yaml', trials_file='optimization-trials.csv', log=None)

Summarize one or more Optuna studies resulting from an optimization experiment.

Parameters:
  • studies (list[Study] | Study) – One or more Optuna studies to summarize.

  • results_file (str, default: 'optimization-results.yaml') – Path to the file where the results will be saved.

  • summary_file (str, default: 'optimization-summary.yaml') – Path to the file where the summary will be saved.

  • trials_file (str, default: 'optimization-trials.csv') – Path to the file where the trials will be saved.

  • log (Logger | None, default: None) – Loguru logger to use for logging.

Returns:

None
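A rough, stdlib-only sketch of the kind of output this function produces: all trial records are written as CSV, and the best value of each study is aggregated. The record fields, the mean and standard deviation summary, and the assumption that studies maximize their score are all hypothetical; the real function also writes YAML results and summary files, which are omitted here.

```python
import csv
import io
from statistics import fmean, pstdev


def summarize_trials(trials: list[dict], file) -> dict[str, float]:
    """Hypothetical summary: write all trials as CSV, aggregate each study's best value."""
    writer = csv.DictWriter(file, fieldnames=list(trials[0]))
    writer.writeheader()
    writer.writerows(trials)
    # Keep the highest value per study (assuming studies maximize the score).
    best: dict[int, float] = {}
    for row in trials:
        best[row["study"]] = max(row["value"], best.get(row["study"], float("-inf")))
    values = list(best.values())
    return {"best_value_mean": fmean(values), "best_value_std": pstdev(values)}


# Made-up trial records for two studies (e.g. two experiment runs).
trials = [
    {"study": 0, "trial": 0, "threshold": 0.4, "value": 0.61},
    {"study": 0, "trial": 1, "threshold": 0.6, "value": 0.70},
    {"study": 1, "trial": 0, "threshold": 0.5, "value": 0.66},
    {"study": 1, "trial": 1, "threshold": 0.3, "value": 0.64},
]
buffer = io.StringIO()
summary = summarize_trials(trials, buffer)
```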