pydgn.evaluation

evaluation.config

class pydgn.evaluation.config.Config(config_dict: dict)

Bases: object

Simple class to manage the configuration dictionary as a Python object with fields.

Parameters: config_dict (dict) – the configuration dictionary

get(key: str, default: object)

Returns the key from the dictionary if present, otherwise the default value specified

Parameters

key (str) – the key to look up in the dictionary
default (object) – the default object

Returns

a value from the dictionaryu

items()

Invokes the items() method of the configuration dictionary

Returns: a list of (key, value) pairs

keys()

Invokes the keys() method of the configuration dictionary

Returns: the set of keys in the dictionary

evaluation.evaluator

class pydgn.evaluation.evaluator.RiskAssesser(outer_folds: int, inner_folds: int, experiment_class: Callable[[...], pydgn.experiment.experiment.Experiment], exp_path: str, splits_filepath: str, model_configs: Union[pydgn.evaluation.grid.Grid, pydgn.evaluation.random_search.RandomSearch], final_training_runs: int, higher_is_better: bool, gpus_per_task: float, base_seed: int = 42)

Bases: object

Class implementing a K-Fold technique to do Risk Assessment (estimate of the true generalization performances) and K-Fold Model Selection (select the best hyper-parameters for each external fold

Parameters

outer_folds (int) – The number K of outer TEST folds. You should have generated the splits accordingly
outer_folds – The number K of inner VALIDATION folds. You should have generated the splits accordingly
experiment_class – (Callable[…, Experiment]): the experiment class to be instantiated
exp_path (str) – The folder in which to store all results
splits_filepath (str) – The splits filepath with additional meta information
model_configs – (Union[Grid, RandomSearch]): an object storing all possible model configurations, e.g., config.base.Grid
final_training_runs (int) – no of final training runs to mitigate bad initializations
higher_is_better (bool) – whether or not the best model for each external fold should be selected by higher or lower score values
gpus_per_task (float) – Number of gpus to assign to each experiment. Can be < 1.
base_seed (int) – Seed used to generate experiments seeds. Used to replicate results. Default is 42

model_selection(kfold_folder: str, outer_k: int, debug: bool)

Performs model selection.

Parameters

kfold_folder – The root folder for model selection
outer_k – the current outer fold to consider
debug – if True, sequential execution is performed and logs are printed to screen

process_config(config_folder: str, config: pydgn.evaluation.config.Config)

Computes the best configuration for each external fold and stores it into a file.

Parameters

config_folder (str) –
config (Config) – the configuration object

process_final_runs(outer_k: int)

Computes the average scores for the final runs of a specific outer fold

Parameters: outer_k (int) – id of the outer fold from 0 to K-1

process_inner_results(folder: str, outer_k: int, no_configurations: int)

Chooses the best hyper-parameters configuration using the HIGHEST validation mean score.

Parameters

folder (str) – a folder which holds all configurations results after K INNER folds
outer_k (int) – the current outer fold to consider
no_configurations (int) – number of possible configurations

process_outer_results(): Aggregates Outer Folds results and compute Training and Test mean/std

risk_assessment(debug: bool)

Performs risk assessment to evaluate the performances of a model.

Parameters: debug – if True, sequential execution is performed and logs are printed to screen

run_final_model(outer_k: int, debug: bool)

Performs the final runs once the best model for outer fold outer_k has been chosen.

Parameters

outer_k (int) – the current outer fold to consider
debug (bool) – if True, sequential execution is performed and logs are printed to screen

wait_configs(): Waits for configurations to terminate and updates the state of the progress manager

pydgn.evaluation.evaluator.run_test(experiment_class: Callable[[...], pydgn.experiment.experiment.Experiment], dataset_getter: Callable[[...], pydgn.data.provider.DataProvider], best_config: dict, outer_k: int, i: int, final_run_exp_path: str, final_run_torch_path: str, exp_seed: int, logger: pydgn.log.logger.Logger) → Tuple[int, int, float]

Ray job that performs a risk assessment run and returns bookkeeping information for the progress manager.

Parameters

experiment_class – (Callable[…, Experiment]): the class of the experiment to instantiate
dataset_getter – (Callable[…, DataProvider]): the class of the data provider to instantiate
best_config (dict) – the best configuration to use for this specific outer fold
i (int) – the id of the final run (for bookkeeping reasons)
final_run_exp_path (str) – path of the experiment root folder
final_run_torch_path (str) – path where to store the results of the experiment
exp_seed (int) – seed of the experiment
logger (Logger) – a logger to log information in the appropriate file

Returns

a tuple with outer fold id, final run id, and time elapsed

pydgn.evaluation.evaluator.run_valid(experiment_class: Callable[[...], pydgn.experiment.experiment.Experiment], dataset_getter: Callable[[...], pydgn.data.provider.DataProvider], config: dict, config_id: int, fold_exp_folder: str, fold_results_torch_path: str, exp_seed: int, logger: pydgn.log.logger.Logger) → Tuple[int, int, int, float]

Ray job that performs a model selection run and returns bookkeeping information for the progress manager.

Parameters

experiment_class – (Callable[…, Experiment]): the class of the experiment to instantiate
dataset_getter – (Callable[…, DataProvider]): the class of the data provider to instantiate
config (dict) – the configuration of this specific experiment
config_id (int) – the id of the configuration (for bookkeeping reasons)
fold_exp_folder (str) – path of the experiment root folder
fold_results_torch_path (str) – path where to store the results of the experiment
exp_seed (int) – seed of the experiment
logger (Logger) – a logger to log information in the appropriate file

Returns

a tuple with outer fold id, inner fold id, config id, and time elapsed

pydgn.evaluation.evaluator.send_telegram_update(bot_token: str, bot_chat_ID: str, bot_message: str)

Sends a message using Telegram APIs. Markdown can be used.

Parameters

bot_token (str) – token of the user’s bot
bot_chat_ID (str) – identifier of the chat where to write the message
bot_message (str) – the message to be sent

evaluation.grid

class pydgn.evaluation.grid.Grid(configs_dict: dict)

Bases: object

Class that implements grid-search. It computes all possible configurations starting from a suitable config file.

Parameters: configs_dict (dict) – the configuration dictionary specifying the different configurations to try

_gen_configs() → List[dict]

Takes a dictionary of key:list pairs and computes all possible combinations.

Returns: A list of al possible configurations in the form of dictionaries

_gen_helper(cfgs_dict: dict) → dict: Helper generator that yields one possible configuration at a time.

_list_helper(values: object) → object: Recursively parses lists of possible options for a given hyper-parameter.

property exp_name: str

Computes the name of the root folder

Returns: the name of the root folder as made of EXP-NAME_DATASET-NAME

property num_configs: int

Computes the number of configurations to try during model selection

Returns: the number of configurations

evaluation.random_search

class pydgn.evaluation.random_search.RandomSearch(configs_dict: dict)

Bases: pydgn.evaluation.grid.Grid

Class that implements random-search. It computes all possible configurations starting from a suitable config file.

Parameters: configs_dict (dict) – the configuration dictionary specifying the different configurations to try

_dict_helper(configs)

Recursively parses a dictionary

Returns: A dictionary

_gen_helper(cfgs_dict)

Takes a dictionary of key:list pairs and computes all possible combinations.

Returns: A list of al possible configurations in the form of dictionaries

_sampler_helper(configs)

Samples possible hyperparameter(s) and returns it (them, in this case as a dict)

Returns:
A dictionary

evaluation.util

class pydgn.evaluation.util.ProgressManager(outer_folds, inner_folds, no_configs, final_runs, show=True)

Bases: object

Class that is responsible for drawing progress bars.

Parameters

outer_folds (int) – number of external folds for model assessment
inner_folds (int) – number of internal folds for model selection
no_configs (int) – number of possible configurations in model selection
final_runs (int) – number of final runs per outer fold once the best model has been selected
show (bool) – whether to show the progress bar or not. Default is True

_init_assessment_pbar(i: int)

Initializes the progress bar for risk assessment

Parameters: i (int) – the id of the outer fold (from 0 to outer folds - 1)

_init_selection_pbar(i: int, j: int)

Initializes the progress bar for model selection

Parameters

i (int) – the id of the outer fold (from 0 to outer folds - 1)
j (int) – the id of the inner fold (from 0 to inner folds - 1)

refresh(): Refreshes the progress bar

show_footer(): Prints the footer of the progress bar

show_header(): Prints the header of the progress bar

update_state(msg: dict)

Updates the state of the progress bar (different from showing it on screen, see refresh()) once a message is received

Parameters: msg (dict) – message with updates to be parsed

pydgn.evaluation.util.choice(*args): Implements a random choice between a list of values

pydgn.evaluation.util.clear_screen(): Clears the CLI interface.

pydgn.evaluation.util.filter_experiments(config_list: List[dict], logic: bool = 'AND', parameters: dict = {})

Filters the list of configurations returned by the method retrieve_experiments according to a dictionary. The dictionary contains the keys and values of the configuration files you are looking for.

If you specify more then one key/value pair to look for, then the logic parameter specifies whether you want to filter using the AND or OR rule.

For a key, you can specify more than one possible value you are interested in by passing a list as the value, for instance {‘device’: ‘cpu’, ‘lr’: [0.1, 0.01]}

Parameters

config_list – The list of configuration files
logic – if AND, a configuration is selected iff all conditions are satisfied. If OR, a config is selected when at least one of the criteria is met.
parameters – dictionary with parameters used to filter the configurations

Returns

a list of filtered configurations like the one in input

pydgn.evaluation.util.instantiate_data_provider_from_config(config: dict, splits_filepath: str, n_outer_folds: int, n_inner_folds: int) → pydgn.data.provider.DataProvider: Instantiate a data provider from a configuration file. :param config (dict): the configuration file :param splits_filepath (str): the path to data splits file :param n_outer_folds (int): the number of outer folds :param n_inner_folds (int): the number of inner folds :return: an instance of DataProvider, i.e., the data provider

pydgn.evaluation.util.instantiate_dataset_from_config(config: dict) → pydgn.data.dataset.DatasetInterface: Instantiate a dataset from a configuration file. :param config (dict): the configuration file :return: an instance of DatasetInterface, i.e., the dataset

pydgn.evaluation.util.instantiate_model_from_config(config: dict, dataset: pydgn.data.dataset.DatasetInterface, config_type: str = 'supervised_config') → pydgn.model.interface.ModelInterface

Instantiate a model from a configuration file. :param config (dict): the configuration file :param dataset (DatasetInterface): the dataset used in the experiment :param config_type (str): the type of model in [“supervised_config”, “unsupervised_config”],

as written on the YAML experiment configuration file. Defaults to “supervised_config”

Returns: an instance of ModelInterface, i.e., the model

pydgn.evaluation.util.load_checkpoint(checkpoint_path: str, model: pydgn.model.interface.ModelInterface, device: torch.device): Load a checkpoint from a checkpoint file into a model. :param checkpoint_path: the checkpoint file path :param model (ModelInterface): the model :param device (torch.device): the device, e.g, “cpu” or “cuda”

pydgn.evaluation.util.loguniform(*args)

Performs a log-uniform random selection.

Parameters: *args – a tuple of (log min, log max, [base]) to use. Base 10 is used if the third argument is not available.
Returns: a randomly chosen value

pydgn.evaluation.util.normal(*args): Implements a univariate normal sampling given its parameters

pydgn.evaluation.util.randint(*args): Implements a random integer sampling in an interval

pydgn.evaluation.util.retrieve_best_configuration(model_selection_folder) → dict

Once the experiments are done, retrieves the winning configuration from a specific model selection folder, and returns it as a dictionaries

Parameters: model_selection_folder – path to the folder of a model selection, that is, your_results_path/…./MODEL_SELECTION/
Returns: a dictionary with info about the best configuration

pydgn.evaluation.util.retrieve_experiments(model_selection_folder, skip_results_not_found: bool = False) → List[dict]

Once the experiments are done, retrieves the config_results.json files of all configurations in a specific model selection folder, and returns them as a list of dictionaries

Parameters

model_selection_folder – path to the folder of a model selection, that is, your_results_path/…./MODEL_SELECTION/
skip_results_not_found – whether to skip an experiment if a config_results.json file has not been produced yet. Useful when analyzing experiments while others still run.

Returns

a list of dictionaries, one per configuration, each with an extra key “exp_folder” which identifies the config folder.

pydgn.evaluation.util.return_class_and_args(config: dict, key: str, return_class_name: bool = False) → Tuple[Callable[[...], object], dict]

Returns the class and arguments associated to a specific key in the configuration file.

Parameters

config (dict) – the configuration dictionary
key (str) – a string representing a particular class in the configuration dictionary
return_class_name (bool) – if True, returns the class name as a string rather than the class object

Returns

a tuple (class, dict of arguments), or (None, None) if the key is not present in the config dictionary

pydgn.evaluation.util.uniform(*args): Implements a uniform sampling given an interval