postprocessing
- class ParameterSuggestionFunction(*args, **kwargs)
Bases: Protocol
Protocol for parameter suggestion functions.
- Parameters:
- Returns:
Parameters for postprocessing.
- Return type:
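A suggestion function presumably maps an Optuna trial to a set of parameters; the description of optuna_score_postprocessing_trial calls it "a callable that returns a dictionary of parameters given the optuna trial". The sketch below assumes that shape. suggest_threshold_parameters, its search space, and the returned keys are hypothetical, and MidpointTrial is a dependency-free stand-in for optuna.trial.Trial, whose suggest_float(name, low, high) method the sketch relies on.

```python
from typing import Any, Protocol


class TrialLike(Protocol):
    """Hypothetical stand-in for the part of optuna.trial.Trial used below."""

    def suggest_float(self, name: str, low: float, high: float) -> float: ...


def suggest_threshold_parameters(trial: TrialLike) -> dict[str, Any]:
    """Hypothetical suggestion function: sample a smoothing window and two thresholds."""
    window = trial.suggest_float("window", 0.5, 5.0)
    thresholds = [trial.suggest_float(f"threshold_{i}", 0.1, 0.9) for i in range(2)]
    # Assumed keys: the two required keyword arguments of a postprocessing function.
    return {
        "postprocessing_parameters": {"window": window},
        "decision_thresholds": thresholds,
    }


class MidpointTrial:
    """Deterministic fake trial that always suggests the midpoint of the range."""

    def suggest_float(self, name: str, low: float, high: float) -> float:
        return (low + high) / 2


params = suggest_threshold_parameters(MidpointTrial())
print(params)
```

In real use, the trial would come from Optuna itself, for example inside the objective passed to Study.optimize.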
- class PostprocessingFunction(*args, **kwargs)
Protocol for postprocessing functions.
- Parameters:
result (ClassificationResult | GroupClassificationResult | DatasetClassificationResult) – The result to be postprocessed.
*args – Additional positional arguments.
postprocessing_parameters (dict[str, Any]) – Parameters for postprocessing.
decision_thresholds (Iterable[float]) – Category-specific decision thresholds.
**kwargs – Additional keyword arguments.
- Returns:
The result after postprocessing.
- Return type:
ClassificationResult | GroupClassificationResult | DatasetClassificationResult
See also
PostprocessingParameters, to specify the two required keyword arguments (postprocessing_parameters and decision_thresholds) by unpacking with **postprocessing_parameters.
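Because PostprocessingFunction is a Protocol, any callable with a matching signature conforms to it structurally. The sketch below restates the documented signature as a typing.Protocol and implements a toy thresholding function. To keep it self-contained, the result is simplified to a plain dictionary of per-category scores instead of a ClassificationResult; PostprocessingFunctionSketch, threshold_postprocessing, and the force_one parameter are all hypothetical.

```python
from typing import Any, Iterable, Protocol


class PostprocessingFunctionSketch(Protocol):
    # Mirrors the documented signature; the result type is simplified to a dict.
    def __call__(
        self,
        result: dict[str, Any],
        *args: Any,
        postprocessing_parameters: dict[str, Any],
        decision_thresholds: Iterable[float],
        **kwargs: Any,
    ) -> dict[str, Any]: ...


def threshold_postprocessing(
    result: dict[str, Any],
    *args: Any,
    postprocessing_parameters: dict[str, Any],
    decision_thresholds: Iterable[float],
    **kwargs: Any,
) -> dict[str, Any]:
    """Toy postprocessing: predict a category when its score meets its threshold."""
    thresholds = list(decision_thresholds)
    # Hypothetical parameter: force at least one predicted category per sample.
    force_one = postprocessing_parameters.get("force_one", False)
    predictions = []
    for scores in result["scores"]:
        labels = [score >= t for score, t in zip(scores, thresholds)]
        if force_one and not any(labels):
            best = max(range(len(scores)), key=lambda i: scores[i])
            labels[best] = True
        predictions.append(labels)
    return {**result, "predictions": predictions}


postprocess: PostprocessingFunctionSketch = threshold_postprocessing
out = postprocess(
    {"scores": [[0.7, 0.2], [0.1, 0.3]]},
    postprocessing_parameters={"force_one": True},
    decision_thresholds=[0.5, 0.5],
)
print(out["predictions"])  # [[True, False], [False, True]]
```

A static type checker such as mypy would flag the annotated assignment if the function's signature drifted from the protocol; at runtime, the annotation is not enforced.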
- class PostprocessingParameters
Bases: TypedDict
Typed dictionary that holds parameters for postprocessing.
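Assuming, from the See also note under PostprocessingFunction, that a PostprocessingParameters dictionary holds exactly the two required keyword arguments, it could look like the sketch below. The key names are inferred from the PostprocessingFunction signature rather than taken from the class definition, and count_positives is a hypothetical postprocessing function.

```python
from typing import Any, Iterable, TypedDict


class PostprocessingParametersSketch(TypedDict):
    # Assumed keys: the two required keyword arguments of a PostprocessingFunction.
    postprocessing_parameters: dict[str, Any]
    decision_thresholds: Iterable[float]


def count_positives(
    result: list[float],
    *,
    postprocessing_parameters: dict[str, Any],
    decision_thresholds: Iterable[float],
) -> int:
    # Hypothetical postprocessing: count scaled scores that meet their threshold.
    scale = postprocessing_parameters.get("scale", 1.0)
    return sum(score * scale >= t for score, t in zip(result, decision_thresholds))


params: PostprocessingParametersSketch = {
    "postprocessing_parameters": {"scale": 2.0},
    "decision_thresholds": [0.5, 0.5, 0.5],
}
# Unpacking the typed dict supplies both required keyword arguments at once.
print(count_positives([0.1, 0.3, 0.4], **params))  # 2
```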
- optimize_postprocessing_parameters(dataset, extractor, classifier, postprocessing_function, suggest_postprocessing_parameters_function, *, num_trials, k, sampling_function, balance_sample_weights=True, experiment, optimize_across_runs=False, parallel_optimization=False, log=None)
Perform a k-fold prediction experiment, then use its results to optimize postprocessing parameters.
See also
run_k_fold_experiment() to perform a k-fold prediction experiment.
optuna_parameter_optimization() to optimize postprocessing parameters on existing classification results.
- Parameters:
dataset (AnnotatedDataset) – The dataset to use.
extractor (BaseExtractor[TypeVar(F, bound=Shaped)]) – The extractor to use for feature extraction.
classifier (Any) – The classifier to use for classification.
postprocessing_function (PostprocessingFunction) – The postprocessing function to use.
suggest_postprocessing_parameters_function (ParameterSuggestionFunction) – A callable that suggests postprocessing parameters.
num_trials (int) – The number of trials to perform in the Optuna optimization study.
k (int) – The number of folds to use for the k-fold experiment.
sampling_function (SamplingFunction) – The sampling function to use during k-fold prediction.
balance_sample_weights (bool, default: True) – Whether to balance the sample weights during model fitting.
experiment (Experiment) – The experiment configuration (specifies the number of runs and the random state).
optimize_across_runs (bool, default: False) – Whether to optimize postprocessing parameters across all experiment runs or for each run individually.
parallel_optimization (bool, default: False) – Whether to run the Optuna optimization study (or studies) in parallel.
log (Logger | None, default: None) – The Loguru logger to use for the experiment.
- Return type:
- Returns:
A list of Optuna studies, one per experiment run (optimize_across_runs=False), or a single study (optimize_across_runs=True).
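The optimize_across_runs flag determines whether one study is run per experiment run or a single study covers all runs. The sketch below models only that control flow, under the assumption that optimizing across runs means pooling every run's results into one study; optimize_runs is hypothetical, and len stands in for running an Optuna study on a list of classification results.

```python
from __future__ import annotations

from typing import Callable, TypeVar

R = TypeVar("R")  # one classification result (e.g. one fold of one run)
S = TypeVar("S")  # whatever the optimization returns (a study, in the real API)


def optimize_runs(
    results_per_run: list[list[R]],
    optimize: Callable[[list[R]], S],
    *,
    optimize_across_runs: bool = False,
) -> list[S] | S:
    if optimize_across_runs:
        # Pool the results of every run and optimize them in a single study.
        pooled = [result for run in results_per_run for result in run]
        return optimize(pooled)
    # Otherwise optimize each run separately, yielding one study per run.
    return [optimize(run) for run in results_per_run]


runs = [["run0-fold0", "run0-fold1"], ["run1-fold0", "run1-fold1"]]
print(optimize_runs(runs, len))  # [2, 2]
print(optimize_runs(runs, len, optimize_across_runs=True))  # 4
```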
- optuna_parameter_optimization(classification_result, *, num_trials, random_state, postprocessing_function, suggest_postprocessing_parameters_function, parallel_optimization, experiment, log)
Perform a parameter optimization study using Optuna.
- Parameters:
classification_result (TypeVar(T, bound=ClassificationResult | GroupClassificationResult | DatasetClassificationResult) | str | list[TypeVar(T, bound=ClassificationResult | GroupClassificationResult | DatasetClassificationResult)] | list[str]) – The classification result(s) to postprocess; these can also be passed as a string or a list of strings to read from cache files.
num_trials (int) – The number of Optuna trials to run.
random_state (int | Generator | None) – The random state to use for the optimization.
postprocessing_function (PostprocessingFunction) – The postprocessing function to use.
suggest_postprocessing_parameters_function (ParameterSuggestionFunction) – A callable that suggests postprocessing parameters for the specified postprocessing function.
parallel_optimization (bool) – Whether to run the Optuna study in parallel.
experiment (Experiment | None) – The experiment; should be specified in a distributed setting.
log (Logger) – The Loguru logger to use for logging.
- Return type:
- Returns:
The Optuna study.
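To illustrate what num_trials and random_state control, the sketch below replaces an Optuna study with plain random search over a single decision threshold. This is a deliberate simplification: Optuna's default TPE sampler proposes candidates adaptively rather than uniformly, and the objective here is a made-up function rather than a postprocessing score. random_search and objective are hypothetical.

```python
from __future__ import annotations

import random
from typing import Callable


def random_search(
    objective: Callable[[float], float],
    *,
    num_trials: int,
    random_state: int | None,
    low: float = 0.0,
    high: float = 1.0,
) -> tuple[float, float]:
    """Return the best (parameter, score) pair found by uniform random sampling."""
    rng = random.Random(random_state)  # a fixed seed makes the search reproducible
    best_param, best_score = low, float("-inf")
    for _ in range(num_trials):
        candidate = rng.uniform(low, high)  # one "trial"
        score = objective(candidate)
        if score > best_score:
            best_param, best_score = candidate, score
    return best_param, best_score


def objective(threshold: float) -> float:
    # Hypothetical score that peaks at a threshold of 0.3.
    return 1.0 - abs(threshold - 0.3)


first = random_search(objective, num_trials=50, random_state=42)
second = random_search(objective, num_trials=50, random_state=42)
print(first == second)  # True: same seed, same trials
```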
- optuna_score_postprocessing_trial(classification_result, trial, *, postprocessing_function, postprocessing_parameters, loop_log)
Run a single Trial to evaluate postprocessing parameters for a specified postprocessing function.
- Parameters:
classification_result (TypeVar(T, bound=ClassificationResult | GroupClassificationResult | DatasetClassificationResult) | str | list[TypeVar(T, bound=ClassificationResult | GroupClassificationResult | DatasetClassificationResult)] | list[str]) – The classification result(s) to postprocess; these can also be passed as a string or a list of strings to read from cache files.
trial (Trial) – One Optuna trial of the current Optuna study.
postprocessing_function (PostprocessingFunction[TypeVar(T, bound=ClassificationResult | GroupClassificationResult | DatasetClassificationResult)]) – The postprocessing function to use.
postprocessing_parameters (PostprocessingParameters | ParameterSuggestionFunction) – The postprocessing parameters to evaluate; this can also be a callable that returns a dictionary of parameters given the Optuna trial.
loop_log (tuple[Logger, str] | tuple[tuple[dict[str, Any], int], str]) – The logger and log name to use for logging. In a multiprocessing environment, the logger should be passed as a tuple of logger parameters (dict) and log level (int).
- Return type:
- Returns:
The score of the postprocessed classification result, calculated as the average of ‘timestamp’, ‘annotation’ and ‘prediction’ macro F1 scores.
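As a worked example of the returned score: macro F1 is the unweighted mean of per-class F1 scores, where each class's F1 equals 2TP / (2TP + FP + FN), and the trial score is the mean of three such values. The labels below are hypothetical; how the library derives labels at the ‘timestamp’, ‘annotation’ and ‘prediction’ levels is not described here, and macro_f1 is an illustrative helper.

```python
from collections.abc import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in either sequence."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical (true, predicted) labels at the three evaluation levels.
levels = {
    "timestamp": (["a", "a", "b", "b"], ["a", "b", "b", "b"]),
    "annotation": (["a", "b"], ["a", "b"]),
    "prediction": (["a", "b", "b"], ["a", "a", "b"]),
}
per_level = {name: macro_f1(t, p) for name, (t, p) in levels.items()}
score = sum(per_level.values()) / len(per_level)
print(round(score, 3))  # 0.8
```

Here the timestamp level scores (2/3 + 4/5) / 2 = 11/15, the annotation level scores 1, and the prediction level scores 2/3, so the trial score is (11/15 + 1 + 2/3) / 3 = 0.8.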
- run_k_fold_experiment(dataset, extractor, classifier, *, k, sampling_function, balance_sample_weights=True, experiment, log=None, cache=True)
Perform a k-fold prediction experiment on the given dataset.
- Parameters:
dataset (AnnotatedDataset) – The dataset to perform the experiment on.
extractor (BaseExtractor[TypeVar(F, bound=Shaped)]) – The extractor to use for feature extraction.
classifier (Any) – The classifier to use for classification.
k (int) – The number of folds to use.
sampling_function (SamplingFunction) – The sampling function to use.
balance_sample_weights (bool, default: True) – Whether to balance sample weights for model fitting.
experiment (Experiment) – The experiment to run; this can also be a DistributedExperiment.
log (Logger | None, default: None) – The logger to use.
cache (bool, default: True) – Whether to cache the results.
- Return type:
- Returns:
The results of the experiment (as a list of cache files if cache=True).
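In a k-fold prediction experiment, each of the k folds is held out once while the model is fit on the remaining k - 1 folds, so every sample receives exactly one out-of-fold prediction. The library's sampling_function and any grouping of samples are not documented here, so the following is only a generic sketch of seeded, sample-level k-fold partitioning; k_fold_indices is a hypothetical helper.

```python
from __future__ import annotations

import random


def k_fold_indices(n_samples: int, k: int, seed: int | None = None) -> list[tuple[list[int], list[int]]]:
    """Return (train, test) index lists for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducible folds
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(10, k=3, seed=0)
print([len(test) for _, test in folds])  # [4, 3, 3]
```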
- summarize_experiment(studies, *, results_file='optimization-results.yaml', summary_file='optimization-summary.yaml', trials_file='optimization-trials.csv', log=None)
Summarize one or more Optuna studies resulting from an optimization experiment.
See also
- Parameters:
studies (list[Study] | Study) – One or more Optuna studies to summarize.
results_file (str, default: 'optimization-results.yaml') – Path to the file where the results will be saved.
summary_file (str, default: 'optimization-summary.yaml') – Path to the file where the summary will be saved.
trials_file (str, default: 'optimization-trials.csv') – Path to the file where the trials will be saved.
log (Logger | None, default: None) – Loguru logger to use for logging.
- Returns:
None
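The trials are saved to trials_file, a CSV file by default. Its exact columns are not documented here; as a hypothetical sketch, trial records held as plain dictionaries could be flattened into CSV rows with the standard csv module, using column names that loosely follow the params_<name> convention of Optuna's Study.trials_dataframe(). write_trials_csv and the record layout are assumptions.

```python
import csv
import io
from typing import Any, TextIO


def write_trials_csv(trials: list[dict[str, Any]], stream: TextIO) -> dict[str, Any]:
    """Write one CSV row per trial and return the trial with the highest value."""
    param_names = sorted({name for trial in trials for name in trial["params"]})
    fieldnames = ["number", "value", *(f"params_{name}" for name in param_names)]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for trial in trials:
        row = {"number": trial["number"], "value": trial["value"]}
        # Flatten parameters into params_<name> columns; missing ones stay empty.
        row.update({f"params_{name}": value for name, value in trial["params"].items()})
        writer.writerow(row)
    return max(trials, key=lambda trial: trial["value"])


# Hypothetical trial records, e.g. extracted from an Optuna study.
trials = [
    {"number": 0, "value": 0.71, "params": {"threshold": 0.4}},
    {"number": 1, "value": 0.78, "params": {"threshold": 0.3}},
]
buffer = io.StringIO()  # a real summary would write to trials_file instead
best = write_trials_csv(trials, buffer)
print(buffer.getvalue())
print("best trial:", best["number"])  # best trial: 1
```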