Relationsdiscover documentation¶

class logflow.relationsdiscover.Cardinality.Cardinality(cardinality: int, path_list_classes: str, path_w2v, size=-1, one_model=False, set_cardinalities=())[source]¶

A cardinality describes the number of examples per pattern. Each cardinality corresponds to a power of 10 of examples. For example, the cardinality 7 contains all the patterns with 10^7 examples. It extends the dataloader class of PyTorch in order to provide the data to the PyTorch deep learning model.

Parameters:

cardinality (int) – the value of the cardinality
path_list_classes (str) – path to the data. The data is the list of patterns to learn.
path_w2v ([type]) – path to the word2vec model. The word2vec model is used to turn a pattern into a vector.
size (int, optional) – number of examples to use. Defaults to -1.
one_model (bool, optional) – use one global model instead of one model per cardinality.
set_cardinalities (set, optional) – use several cardinalities for the learning step of one model. Must be used with one_model.

compute_position()[source]¶: Compute the position of each example in the initial list of patterns. Each cardinality learns to predict only the patterns with that specific cardinality, so the index of each matching pattern in the initial data is stored and provided to the model during the learning step.
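
The bucketing and position lookup described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper names `cardinality_of` and `positions_per_cardinality` are hypothetical, and it assumes that a pattern with n examples falls in the bucket floor(log10(n)).

```python
from collections import Counter, defaultdict


def cardinality_of(count: int) -> int:
    """Return the power-of-ten bucket of a pattern seen `count` times.

    Assumption: a pattern with n examples belongs to bucket floor(log10(n)).
    """
    if count < 1:
        raise ValueError("count must be a positive integer")
    # Number of digits minus one is an integer-only floor(log10(count)).
    return len(str(count)) - 1


def positions_per_cardinality(patterns):
    """Map each cardinality to the indices of its patterns in `patterns`."""
    counts = Counter(patterns)
    positions = defaultdict(list)
    for index, pattern in enumerate(patterns):
        positions[cardinality_of(counts[pattern])].append(index)
    return dict(positions)


# "a" appears 13 times (cardinality 1), "b" appears 3 times (cardinality 0).
patterns = ["a"] * 12 + ["b"] * 3 + ["a"]
positions = positions_per_cardinality(patterns)
```

A model dedicated to one cardinality would then only read the examples listed under its own key.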

load_files()[source]¶: Load the data and the word2vec model.

class logflow.relationsdiscover.Dataset.Dataset(path_model='', path_data='', name_dataset='', size=-1, one_model=False)[source]¶

Load the files and create the cardinalities

Parameters:

path_model (str, optional) – path to the word2vec model. Defaults to “”.
path_data (str, optional) – path to the data (list of patterns). Defaults to “”.
name_dataset (str, optional) – name of the dataset to load. Defaults to “”.
size (int, optional) – number of examples to load. Defaults to -1.
one_model (bool, optional) – use one global model instead of one model per cardinality.

Raises:

Exception – model file is not found
Exception – data file is not found

creating_cardinalities(min_cardinality=0, max_cardinality=inf)[source]¶

Create the cardinality object for the learning step.

Parameters:

min_cardinality (int, optional) – minimum value of cardinality to be selected. Defaults to 0.
max_cardinality (float, optional) – maximum value of cardinality to be selected. Defaults to float(“+inf”).
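
As a rough illustration of how the two bounds could select cardinalities, the sketch below filters a set of cardinality values. The function `select_cardinalities` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def select_cardinalities(available, min_cardinality=0, max_cardinality=float("inf")):
    """Keep the cardinality values inside [min_cardinality, max_cardinality].

    Assumption: both bounds are inclusive.
    """
    return sorted(c for c in available if min_cardinality <= c <= max_cardinality)


everything = select_cardinalities({0, 1, 3, 5})
middle = select_cardinalities({0, 1, 3, 5}, min_cardinality=1, max_cardinality=3)
```

With the default bounds every cardinality is kept, which is why the upper bound defaults to infinity rather than to a fixed integer.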

loading_files()[source]¶: Load the data, the word2vec and the counter file.

run() → List[logflow.relationsdiscover.Cardinality.Cardinality][source]¶

Start the workflow for the multithreading implementation

Returns:	list of the cardinalities created
Return type:	List[Cardinality]

class logflow.relationsdiscover.Model.LSTMCell(input_size: int, hidden_size: int)[source]¶

LSTM cell to be connected to the attention layer

Parameters:

input_size (int) – size of input
hidden_size (int) – size of hidden layer

forward(x, hidden, att)[source]¶

Forward through the attention layer

Parameters:

x – input value
hidden – hidden layer
att – attention layer
Returns:	return the hidden value of the layer
Return type:	torch

class logflow.relationsdiscover.Model.LSTMLayer(num_classes: int, input_size=20, hidden_size=50, num_layers=1, batch_size=128, length_sentence=30, unidirectional=True, test=False)[source]¶

Deep learning model

Parameters:

num_classes (int) – number of classes
input_size (int, optional) – size of the embedding vector. Defaults to 20.
hidden_size (int, optional) – size of the hidden layer. Defaults to 50.
num_layers (int, optional) – number of layers. Defaults to 1.
batch_size (int, optional) – size of the batch. Defaults to 128.
length_sentence (int, optional) – size of the window. Defaults to 30.
unidirectional (bool, optional) – Unidirectional or BiDirectional LSTM. Defaults to True.
test (bool, optional) – Testing or training step. During the training step, the value of the attention layer is not returned for performance maximization. Defaults to False.

forward(x)[source]¶

Forward through the deep learning network

Parameters:	x – list of Tensors to forward through the neural network
Returns:	in test mode, the predictions and the values of the attention layer; in training mode, only the predictions
Return type:	torch.Tensor

class logflow.relationsdiscover.Result.Result(cardinality: logflow.relationsdiscover.Cardinality.Cardinality, condition='Train', subsample=False)[source]¶

Compute the results based on the predictions and the ground truth

Parameters:

cardinality (Cardinality) – cardinality object.
condition (str, optional) – Testing or Training results. Used only for results display. Defaults to “Train”.
subsample (bool, optional) – Results computed on subsample or not. Used only for results display. Defaults to False.

computing_result(progress=0, reinit=True, printing=True)[source]¶

Compute the results

Parameters:

progress (int, optional) – value of the progression. Used only for display. Defaults to 0.
reinit (bool, optional) – reset the matrix value. Defaults to True.
printing (bool, optional) – print the result. Defaults to True.

print_results(progress=0)[source]¶

Print the result

Parameters:	progress (int, optional) – value of the progression. Defaults to 0.

update(preds: torch.Tensor, labels: torch.Tensor)[source]¶

Update the confusion matrix according to the new predictions and labels

Parameters:

preds (torch.Tensor) – predictions provided by the model
labels (torch.Tensor) – labels provided by the dataloader
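
The confusion matrix update, and the macro F1 value that can be derived from it, can be sketched as follows. This is an illustration only: it uses plain Python lists instead of torch.Tensor so that it runs without PyTorch, and the helpers `update_confusion` and `macro_f1` are hypothetical names, not part of the documented class.

```python
def update_confusion(matrix, preds, labels):
    """Add one batch to `matrix`, where matrix[label][pred] counts occurrences."""
    for pred, label in zip(preds, labels):
        matrix[label][pred] += 1
    return matrix


def macro_f1(matrix):
    """Average the per-class F1 scores computed from a confusion matrix."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, label differs
        fn = sum(matrix[k]) - tp                       # label k, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


num_classes = 3
matrix = [[0] * num_classes for _ in range(num_classes)]
update_confusion(matrix, preds=[0, 1, 2, 1], labels=[0, 1, 2, 2])
score = macro_f1(matrix)
```

Accumulating counts batch by batch keeps memory constant, and the metric can be recomputed at any point from the matrix alone.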

class logflow.relationsdiscover.Result.Results(path_model: str, name_model: str)[source]¶

Compute the results based on the results saved during the learning step by each cardinality.

compute_results(condition='Test')[source]¶

Compute the results

Parameters:	condition (str, optional) – Compute the results for the testing or training step. Only “Train” and “Test” are accepted. Defaults to “Test”.

load_files()[source]¶: Load the associated files

print_results()[source]¶: Print the result

class logflow.relationsdiscover.Saver.Saver(name_model: str, path_model: str, cardinality=-1, lock=-1)[source]¶

Save and load the model from a file.

The file is saved as follows: file[“LSTM”][cardinality] = model

Parameters:

name_model (str) – name of the dataset
path_model (str) – path of the model to save
cardinality (int, optional) – cardinality to save. Defaults to -1.
lock (int, optional) – lock for the file. Defaults to -1.
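
The file[“LSTM”][cardinality] layout and the role of the lock can be illustrated with a small sketch. Everything here is an assumption made for illustration: the serialization format (pickle), the helper names `save_entry` and `load_entry`, and the placeholder model state are not taken from the library.

```python
import os
import pickle
import tempfile
import threading


def save_entry(path, cardinality, model_state, lock):
    """Store `model_state` under file["LSTM"][cardinality], keeping other entries."""
    with lock:  # one writer at a time, since all cardinalities share the file
        if os.path.exists(path):
            with open(path, "rb") as handle:
                content = pickle.load(handle)
        else:
            content = {}
        content.setdefault("LSTM", {})[cardinality] = model_state
        with open(path, "wb") as handle:
            pickle.dump(content, handle)


def load_entry(path, cardinality):
    """Read back the state stored for one cardinality."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    with open(path, "rb") as handle:
        return pickle.load(handle)["LSTM"][cardinality]


lock = threading.Lock()
path = os.path.join(tempfile.mkdtemp(), "example.model")
save_entry(path, 3, {"weights": [0.1, 0.2]}, lock)  # placeholder states
save_entry(path, 4, {"weights": [0.3]}, lock)
```

The read-modify-write sequence is done under the lock so that two workers saving different cardinalities cannot overwrite each other's entry.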

load(model: logflow.relationsdiscover.Model.LSTMLayer) → logflow.relationsdiscover.Model.LSTMLayer[source]¶

Load the model. Note that the model must be created before. This function loads only the parameters inside the model.

Parameters:	model (LSTMLayer) – object to use for loading the model.
Raises:	`Exception` – the file is not found
Returns:	the loaded model
Return type:	LSTMLayer

save(model: logflow.relationsdiscover.Model.LSTMLayer, result: logflow.relationsdiscover.Result.Result, condition='Test')[source]¶

Save the model

Parameters:

model (LSTMLayer) – model to save
result (Result) – result to save
condition (str) – Test or train results to save

class logflow.relationsdiscover.StoppingCondition.StoppingCondition(method='earlystopping', condition_value=0.005, condition_step=3, duration=60, condition_epoch=3)[source]¶

Condition to stop the learning. Three conditions can be selected:

Early stopping. The learning process stops once the increase of the F1 value stays below 0.005 for more than 3 steps.
A timer. If the duration of the training is longer than the timer, the training step is stopped.
Number of epochs. The learning step runs for a fixed number of epochs.

Please note that the timer excludes the duration of the testing step.

Parameters:

method (str, optional) – 3 options: “earlystopping”, “timer”, “epoch”. Earlystopping uses the increase of the macro F1 value across multiple steps, timer uses a timer, and epoch uses a number of epochs. Defaults to “earlystopping”.
condition_value (float, optional) – value of the increase. Defaults to 0.005.
condition_step (int, optional) – number of steps. Defaults to 3.
duration (int, optional) – duration of the learning step in seconds. Defaults to 60.
condition_epoch (int, optional) – number of epochs to be done. Defaults to 3.

stop() → bool[source]¶

Compute the condition

Returns:	If the stopping condition is reached return True, else return False
Return type:	bool

update(metric=0.1)[source]¶

Update the new value of the metric and compute the number of increase steps.

Parameters:	metric (float, optional) – value of the metric. Should only be used with the earlystopping method.
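
The three stopping rules can be sketched as a small stand-alone class. This is a hypothetical re-implementation for illustration (the name `StoppingSketch` is not part of the library). It assumes that the low-increase steps must be consecutive and that update() is called once per epoch, and it does not model the exclusion of the testing time from the timer.

```python
import time


class StoppingSketch:
    """Hypothetical sketch of the earlystopping, timer and epoch rules."""

    def __init__(self, method="earlystopping", condition_value=0.005,
                 condition_step=3, duration=60, condition_epoch=3):
        if method not in ("earlystopping", "timer", "epoch"):
            raise ValueError(f"unknown method: {method}")
        self.method = method
        self.condition_value = condition_value
        self.condition_step = condition_step
        self.duration = duration
        self.condition_epoch = condition_epoch
        self.previous_metric = 0.0
        self.small_steps = 0  # consecutive updates with a small increase
        self.epochs = 0
        self.start = time.monotonic()

    def update(self, metric=0.1):
        """Record one finished epoch and its metric."""
        self.epochs += 1
        if metric - self.previous_metric < self.condition_value:
            self.small_steps += 1
        else:
            self.small_steps = 0  # assumption: a real improvement resets the count
        self.previous_metric = metric

    def stop(self):
        """Return True when the selected condition is reached."""
        if self.method == "earlystopping":
            return self.small_steps > self.condition_step
        if self.method == "timer":
            return time.monotonic() - self.start >= self.duration
        return self.epochs >= self.condition_epoch


early = StoppingSketch(condition_step=3)
for f1 in [0.5, 0.6, 0.601, 0.602, 0.603]:
    early.update(f1)
stopped_after_three = early.stop()  # three small steps so far
early.update(0.604)
stopped_after_four = early.stop()   # a fourth small step exceeds condition_step
```

A training loop would call update() after each evaluation and leave the loop as soon as stop() returns True.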

class logflow.relationsdiscover.Worker.Worker(list_cardinalities: List[logflow.relationsdiscover.Cardinality.Cardinality], batch_size=128, multithreading=True, path_model='', name_dataset='', cardinalities_choosen=[-1], one_model=False, exclude_test=False, stoppingcondition='earlystopping', condition_value=0.005, condition_step=3, duration=5, condition_epoch=3)[source]¶

Handle the learning and the testing of each worker_per_cardinality in a multithreading way.

Parameters:

list_cardinalities (List[Cardinality]) – list of the cardinality objects to be used.
batch_size (int, optional) – size of the batch. Defaults to 128.
multithreading (bool, optional) – use a multithreading implementation. Sequential implementation is not available yet. Defaults to True.
path_model (str, optional) – path to the model to save. Defaults to “”.
name_dataset (str, optional) – name of the dataset. Defaults to “”.
cardinalities_choosen (List[int], optional) – list of cardinalities to use. This list contains only the value of cardinalities to be used. [-1] means all cardinalities. Defaults to [-1].
one_model (bool, optional) – use one global model instead of one model per cardinality.
exclude_test (boolean, optional) – exclude the testing step during the learning step. Can be used with the timer as stopping condition to have an exact duration.
stoppingcondition (str, optional) – 3 options: “earlystopping”, “timer”, “epoch”. Earlystopping uses the increase of the macro F1 value across multiple steps, timer uses a timer, and epoch uses a number of epochs. Defaults to “earlystopping”.
condition_value (float, optional) – value of the increase. Defaults to 0.005.
condition_step (int, optional) – number of steps. Defaults to 3.
duration (int, optional) – duration of the learning step in seconds. Defaults to 5.
condition_epoch (int, optional) – number of epochs to be done. Defaults to 3.

static execute_test(i, *args)[source]¶

Execute the test function for the multithreading implementation

Parameters:

i (int) – value of the cardinality selected
args (List[Worker_single]) – list of all the cardinalities

static execute_train(i, *args)[source]¶

Execute the training function for the multithreading implementation

Parameters:

i (int) – value of the cardinality selected
args (List[Worker_single]) – list of all the cardinalities
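
The pattern of a static function that receives a selector and the list of workers can be sketched with the standard library. This is an assumption-laden illustration: `WorkerStub` is a hypothetical stand-in for Worker_single, `i` is assumed to index the list of workers, and the thread pool from concurrent.futures may differ from the mechanism the library actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkerStub:
    """Hypothetical stand-in for a per-cardinality worker."""

    def __init__(self, cardinality):
        self.cardinality = cardinality
        self.trained = False

    def train(self):
        self.trained = True
        return self.cardinality


def execute_train(i, *args):
    """Train the i-th worker of the list passed through *args."""
    workers = args[0]
    return workers[i].train()


workers = [WorkerStub(c) for c in (2, 3, 5)]
with ThreadPoolExecutor(max_workers=len(workers)) as pool:
    futures = [pool.submit(execute_train, i, workers) for i in range(len(workers))]
    results = [future.result() for future in futures]
```

Passing the selector plus the full list keeps the function free of instance state, which is convenient when it is handed to a pool.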

test()[source]¶: Start the testing

train(resume=False)[source]¶

Start the training

Parameters:	resume (bool, optional) – resume from a previous training. Not implemented yet. Defaults to False.

class logflow.relationsdiscover.Worker_per_cardinality.Worker_single(cardinality: logflow.relationsdiscover.Cardinality.Cardinality, lock: _thread.allocate_lock, batch_size=128, path_model='', name_dataset='', batch_result=20000, exclude_test=False, stoppingcondition='earlystopping', condition_value=0.005, condition_step=3, duration=5, condition_epoch=3)[source]¶

A single worker is responsible for the creation of the dataloader, the learning/testing step and for saving files of one cardinality.

Parameters:

cardinality (Cardinality) – the cardinality object containing the data.
lock (threading.Lock) – lock used for saving files in the same file for all cardinalities.
batch_size (int, optional) – size of the batch. Defaults to 128.
path_model (str, optional) – path to the model to save. Defaults to “”.
name_dataset (str, optional) – name of the dataset. Defaults to “”.
batch_result (int, optional) – show results every batch_result batches. Defaults to 20000.
exclude_test (boolean, optional) – exclude the testing step during the learning step. Can be used with the timer as stopping condition to have an exact duration.
stoppingcondition (str, optional) – condition to stop the learning step (timer, earlystopping, epoch). Defaults to earlystopping.
condition_value (float, optional) – stoppingcondition option. Value of the increase. Defaults to 0.005.
condition_step (int, optional) – stoppingcondition option. Number of steps. Defaults to 3.
duration (int, optional) – stoppingcondition option. Duration of the learning step in minutes. Defaults to 5.
condition_epoch (int, optional) – stoppingcondition option. Number of epochs to be done. Defaults to 3.

create_dataloader(validation_split=0.6, condition='Test', subsample=False, subsample_split=0.01) → torch.utils.data.dataloader.DataLoader[source]¶

Create the dataloader for the learning/testing step.

Parameters:

validation_split (float, optional) – ratio between the learning and the testing set. Defaults to 0.6.
condition (str, optional) – if “Test”, the dataloader contains the test data. Otherwise it contains the learning data. Defaults to “Test”.
subsample (bool, optional) – use only a subsample of the data. Can be used for the learning and/or the testing step. Defaults to False.
subsample_split (float, optional) – ratio of the data to use. Defaults to 0.01.
Returns:	PyTorch dataloader corresponding to the previous features.
Return type:	DataLoader
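
The interaction between validation_split, condition and the subsample options can be sketched on plain indices. The function `split_indices` is hypothetical. It assumes that the first validation_split fraction of the examples is the learning set and the remainder the testing set, and that subsampling keeps the first subsample_split fraction of the selected part; the library may split or sample differently.

```python
def split_indices(n_examples, validation_split=0.6, condition="Test",
                  subsample=False, subsample_split=0.01):
    """Return the example indices a dataloader would read under these options."""
    indices = list(range(n_examples))
    cut = int(n_examples * validation_split)
    # Assumption: learning data comes first, testing data after the cut.
    selected = indices[cut:] if condition == "Test" else indices[:cut]
    if subsample:
        keep = max(1, int(len(selected) * subsample_split))
        selected = selected[:keep]
    return selected


test_part = split_indices(10)
train_part = split_indices(10, condition="Train")
small_test_part = split_indices(1000, subsample=True)
```

In PyTorch, index lists like these are typically handed to a sampler so that the same dataset object can serve both steps.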

load_model()[source]¶

Load the learned model from a previous state

Raises:	`Exception` – the file is not found

test(validation_split=0.6, subsample=False, subsample_split=0.01)[source]¶

Test the model

Parameters:

validation_split (float, optional) – ratio between testing and learning set. Defaults to 0.6.
subsample (bool, optional) – if False, use all the available data; if True, use only a ratio of the data (subsample_split). Defaults to False.
subsample_split (float, optional) – ratio of the data to use. Defaults to 0.01.

train(validation_split=0.6, resuming=False)[source]¶

Train the model

Parameters:

validation_split (float, optional) – ratio between testing and learning set. Defaults to 0.6.
resuming (bool, optional) – resume the learning from a previous step. Not implemented yet. Defaults to False.