postprocessing

class ParameterSuggestionFunction(*args, **kwargs)

Bases: Protocol

Protocol for parameter suggestion functions.

Parameters:

trial (Trial) – The trial object.
*args – Additional arguments.
categories (Iterable[str]) – Category names.
**kwargs – Additional keyword arguments.

Returns:

Parameters for postprocessing.

Return type:

PostprocessingParameters
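
A minimal sketch of a function that would satisfy this protocol, assuming it is called with an Optuna trial and the category names as a keyword argument. The parameter name smoothing_window, the search ranges, and the StubTrial stand-in (included only so the snippet runs without Optuna installed) are illustrative assumptions; suggest_float and suggest_int mirror the methods of the same name on Optuna's Trial.

```python
from typing import Any, Iterable


class StubTrial:
    """Hypothetical stand-in for optuna.trial.Trial; always returns the lower bound."""

    def suggest_float(self, name: str, low: float, high: float) -> float:
        return low

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return low


def suggest_parameters(trial: Any, *, categories: Iterable[str]) -> dict[str, Any]:
    """Suggest one decision threshold per category plus one illustrative parameter."""
    thresholds = [
        trial.suggest_float(f"threshold-{category}", 0.0, 1.0)
        for category in categories
    ]
    return {
        "postprocessing_parameters": {
            # 'smoothing_window' is an assumed name, not part of the documented API
            "smoothing_window": trial.suggest_int("smoothing_window", 1, 31),
        },
        "decision_thresholds": thresholds,
    }


suggested = suggest_parameters(StubTrial(), categories=("attack", "none"))
```

The returned dictionary carries the two keys listed for PostprocessingParameters, so it can be unpacked directly into a postprocessing function.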

class PostprocessingFunction(*args, **kwargs)

Bases: Protocol, Generic

Protocol for postprocessing functions.

Parameters:

result (ClassificationResult | GroupClassificationResult | DatasetClassificationResult) – The result to be postprocessed.
*args – Additional arguments.
postprocessing_parameters (dict[str, Any]) – Parameters for postprocessing.
decision_thresholds (Iterable[float]) – Category-specific decision thresholds.
**kwargs – Additional keyword arguments.

Returns:

The result after postprocessing.

Return type:

(ClassificationResult | GroupClassificationResult | DatasetClassificationResult)

See also

PostprocessingParameters to specify the two required keyword arguments with **postprocessing_parameters.
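
The following sketch shows what a conforming postprocessing function might look like. StubResult is a hypothetical stand-in for the library's result classes, and the thresholding rule and the fallback_category parameter are assumptions chosen for illustration, not the library's behavior.

```python
from dataclasses import dataclass, replace
from typing import Any, Iterable


@dataclass(frozen=True)
class StubResult:
    """Hypothetical stand-in for a classification result (not the library's class)."""

    probabilities: list[list[float]]  # one row per sample, one column per category
    predictions: list[int]


def threshold_postprocessing(
    result: StubResult,
    *,
    postprocessing_parameters: dict[str, Any],
    decision_thresholds: Iterable[float],
) -> StubResult:
    """Re-predict each sample, ignoring categories below their threshold."""
    thresholds = list(decision_thresholds)
    # 'fallback_category' is an assumed parameter used when no category passes
    fallback = postprocessing_parameters.get("fallback_category", 0)
    predictions = []
    for row in result.probabilities:
        passing = [i for i, (p, t) in enumerate(zip(row, thresholds)) if p >= t]
        predictions.append(max(passing, key=lambda i: row[i]) if passing else fallback)
    # Return a new result and leave the input untouched
    return replace(result, predictions=predictions)


raw = StubResult(probabilities=[[0.6, 0.4], [0.2, 0.8]], predictions=[0, 1])
processed = threshold_postprocessing(
    raw,
    postprocessing_parameters={"fallback_category": 0},
    decision_thresholds=[0.5, 0.9],
)
```

In this sketch the second sample falls back to category 0 because neither probability reaches its threshold.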

class PostprocessingParameters

Bases: TypedDict

Typed dictionary that holds parameters for postprocessing.

decision_thresholds: Iterable[float]

postprocessing_parameters: dict[str, Any]
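
As a rough illustration of how such a typed dictionary can be built and unpacked into keyword arguments, the sketch below defines a local mirror of the two documented fields so that it runs on its own. The class name PostprocessingParametersSketch, the describe function, and the smoothing_window entry are hypothetical.

```python
from typing import Any, Iterable, TypedDict


class PostprocessingParametersSketch(TypedDict):
    """Local mirror of the documented fields, defined here for self-containment."""

    postprocessing_parameters: dict[str, Any]
    decision_thresholds: Iterable[float]


def describe(
    *,
    postprocessing_parameters: dict[str, Any],
    decision_thresholds: Iterable[float],
) -> str:
    """Hypothetical consumer that takes the two keys as keyword arguments."""
    num_thresholds = len(list(decision_thresholds))
    return f"{num_thresholds} thresholds, {len(postprocessing_parameters)} parameters"


parameters: PostprocessingParametersSketch = {
    "postprocessing_parameters": {"smoothing_window": 5},
    "decision_thresholds": [0.5, 0.7, 0.3],
}
# Unpacking with ** supplies both required keyword arguments at once
summary = describe(**parameters)
```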

optimize_postprocessing_parameters(dataset, extractor, classifier, postprocessing_function, suggest_postprocessing_parameters_function, *, num_trials, k, sampling_function, balance_sample_weights=True, experiment, optimize_across_runs=False, parallel_optimization=False, log=None)

Sequentially perform a k-fold prediction experiment and use the results to optimize postprocessing parameters.

See also

run_k_fold_experiment() to perform a k-fold prediction experiment.
optuna_parameter_optimization() to optimize postprocessing parameters on existing classification results.

Parameters:

dataset (AnnotatedDataset) – The dataset to use.
extractor (BaseExtractor[TypeVar(F, bound= Shaped)]) – The extractor to use for feature extraction.
classifier (Any) – The classifier to use for classification.
postprocessing_function (PostprocessingFunction) – The postprocessing function to use.
suggest_postprocessing_parameters_function (ParameterSuggestionFunction) – A callable that suggests postprocessing parameters.
num_trials (int) – The number of trials to perform in the Optuna optimization study.
k (int) – The number of folds to use for the k-fold experiment.
sampling_function (SamplingFunction) – The sampling function to use during k-fold prediction.
balance_sample_weights (bool, default: True) – Whether to balance the sample weights during model fitting.
experiment (Experiment) – The experiment configuration to use (specifies the number of runs and the random state).
optimize_across_runs (bool, default: False) – Whether to optimize postprocessing parameters across experiment runs, or for each run individually.
parallel_optimization (bool, default: False) – Whether to perform the Optuna optimization study/studies in parallel.
log (Logger | None, default: None) – The Loguru logger to use for the experiment.

Return type:

list[Study] | Study

Returns:

A list of Optuna studies, one per run (optimize_across_runs=False), or a single study (optimize_across_runs=True).

optuna_parameter_optimization(classification_result, *, num_trials, random_state, postprocessing_function, suggest_postprocessing_parameters_function, parallel_optimization, experiment, log)

Perform a parameter optimization study using Optuna.

Parameters:

classification_result (TypeVar(T, bound= ClassificationResult | GroupClassificationResult | DatasetClassificationResult) | str | list[TypeVar(T, bound= ClassificationResult | GroupClassificationResult | DatasetClassificationResult)] | list[str]) – The classification result(s) to postprocess. Can also be passed as a string or a list of strings to read the result(s) from cache files.
num_trials (int) – The number of Optuna trials to run.
random_state (int | Generator | None) – The random state to use for the optimization.
postprocessing_function (PostprocessingFunction) – The postprocessing function to use.
suggest_postprocessing_parameters_function (ParameterSuggestionFunction) – A callable that suggests postprocessing parameters for the specified postprocessing function.
parallel_optimization (bool) – Whether to run the Optuna study in parallel.
experiment (Experiment | None) – The experiment to use; should be specified in a distributed setting.
log (Logger) – The Loguru logger to use for logging.

Return type:

Study

Returns:

The Optuna study.

optuna_score_postprocessing_trial(classification_result, trial, *, postprocessing_function, postprocessing_parameters, loop_log)

Run a single Optuna trial to evaluate postprocessing parameters for a specified postprocessing function.

Parameters:

classification_result (TypeVar(T, bound= ClassificationResult | GroupClassificationResult | DatasetClassificationResult) | str | list[TypeVar(T, bound= ClassificationResult | GroupClassificationResult | DatasetClassificationResult)] | list[str]) – The classification result(s) to postprocess. Can also be passed as a string or a list of strings to read the result(s) from cache files.
trial (Trial) – One Optuna trial of the current Optuna study.
postprocessing_function (PostprocessingFunction[TypeVar(T, bound= ClassificationResult | GroupClassificationResult | DatasetClassificationResult)]) – The postprocessing function to use.
postprocessing_parameters (PostprocessingParameters | ParameterSuggestionFunction) – The postprocessing parameters to evaluate. Can also be a callable that returns a dictionary of parameters given the Optuna trial.
loop_log (tuple[Logger, str] | tuple[tuple[dict[str, Any], int], str]) – The logger and log name to use for logging. In a multiprocessing environment, the logger should be passed as a tuple of logger parameters (dict) and log level (int).

Return type:

float

Returns:

The score of the postprocessed classification result, calculated as the average of ‘timestamp’, ‘annotation’ and ‘prediction’ macro F1 scores.
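
The averaging step can be sketched as follows, assuming the "average" is an unweighted arithmetic mean and that the three macro F1 scores have already been computed elsewhere. The function name combined_score and the input values are hypothetical.

```python
from statistics import fmean


def combined_score(timestamp_f1: float, annotation_f1: float, prediction_f1: float) -> float:
    """Unweighted mean of the three macro F1 scores (an assumed interpretation)."""
    return fmean([timestamp_f1, annotation_f1, prediction_f1])


# Made-up example values, not results from any experiment
score = combined_score(0.9, 0.6, 0.75)
```

Under this assumption, each of the three evaluation levels contributes equally to the objective, regardless of how many items it was computed over.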

run_k_fold_experiment(dataset, extractor, classifier, *, k, sampling_function, balance_sample_weights=True, experiment, log=None, cache=True)

Perform a k-fold prediction experiment on the given dataset.

Parameters:

dataset (AnnotatedDataset) – The dataset to perform the experiment on.
extractor (BaseExtractor[TypeVar(F, bound= Shaped)]) – The extractor to use for feature extraction.
classifier (Any) – The classifier to use for classification.
k (int) – The number of folds to use.
sampling_function (SamplingFunction) – The sampling function to use.
balance_sample_weights (bool, default: True) – Whether to balance sample weights for model fitting.
experiment (Experiment) – The experiment to run; can also be a DistributedExperiment.
log (Logger | None, default: None) – The logger to use.
cache (bool, default: True) – Whether to cache the results.

Return type:

list[DatasetClassificationResult] | list[str]

Returns:

The results of the experiment (as a list of cache files if cache=True).

summarize_experiment(studies, *, results_file='optimization-results.yaml', summary_file='optimization-summary.yaml', trials_file='optimization-trials.csv', log=None)

Summarize one or more Optuna studies resulting from an optimization experiment.

See also

optimize_postprocessing_parameters()

Parameters:

studies (list[Study] | Study) – One or more Optuna studies to summarize.
results_file (str, default: 'optimization-results.yaml') – Path to the file where the results will be saved.
summary_file (str, default: 'optimization-summary.yaml') – Path to the file where the summary will be saved.
trials_file (str, default: 'optimization-trials.csv') – Path to the file where the trials will be saved.
log (Logger | None, default: None) – Loguru logger to use for logging.

Returns:

None