Relationsdiscover documentation
-
class
logflow.relationsdiscover.Cardinality.Cardinality(cardinality: int, path_list_classes: str, path_w2v, size=-1, one_model=False, set_cardinalities=()) A cardinality describes the number of examples per pattern. Each cardinality covers one power of 10: for example, cardinality 7 contains all the patterns with on the order of 10^7 examples. It extends the PyTorch dataloader class so that it can provide the data to the PyTorch deep learning model.
Parameters: - cardinality (int) – the value of the cardinality
- path_list_classes (str) – path to the data. The data is the list of patterns to learn.
- path_w2v ([type]) – path to the word2vec model. The word2vec model is used to turn a pattern into a vector.
- size (int, optional) – number of examples to use. Defaults to -1.
- one_model (bool, optional) – use one global model instead of one model per cardinality.
- set_cardinalities (set, optional) – use several cardinalities for the learning step of one model. Must be used with one_model.
-
compute_position() Compute the position of each example in the initial list of patterns. A cardinality learns to predict only the patterns with that specific cardinality, so the index of each matching pattern in the initial data is stored and used to provide the examples to the model during the learning step.
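The position bookkeeping described above can be illustrated with a small standalone sketch. It does not use logflow itself: the helper names `cardinality_of` and `compute_positions` are hypothetical, and the rule mapping an example count to its power-of-10 bucket is an assumption made for illustration.

```python
from collections import Counter


def cardinality_of(count: int) -> int:
    """Power-of-10 bucket of a pattern seen `count` times (assumed rule).

    len(str(count)) - 1 is an exact integer floor(log10(count)) for count >= 1.
    """
    return len(str(count)) - 1


def compute_positions(patterns, cardinality):
    """Indices, in the initial list, of the patterns that belong to `cardinality`."""
    counts = Counter(patterns)
    return [i for i, p in enumerate(patterns)
            if cardinality_of(counts[p]) == cardinality]


# Pattern 3 appears 13 times (cardinality 1); patterns 5 and 8 appear
# fewer than 10 times (cardinality 0).
patterns = [3] * 12 + [5] * 2 + [3] + [8]
positions_low = compute_positions(patterns, 0)
positions_high = compute_positions(patterns, 1)
```

Storing indices rather than copies keeps a single shared list of patterns while each cardinality only iterates over its own examples.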
-
class
logflow.relationsdiscover.Dataset.Dataset(path_model='', path_data='', name_dataset='', size=-1, one_model=False) Load the files and create the cardinalities.
Parameters: - path_model (str, optional) – path to the word2vec model. Defaults to “”.
- path_data (str, optional) – path to the data (list of patterns). Defaults to “”.
- name_dataset (str, optional) – name of the dataset to load. Defaults to “”.
- size (int, optional) – number of examples to load. Defaults to -1.
- one_model (bool, optional) – use one global model instead of one model per cardinality.
Raises: - Exception – model file is not found
- Exception – data file is not found
-
creating_cardinalities(min_cardinality=0, max_cardinality=inf) Create the cardinality objects for the learning step.
Parameters: - min_cardinality (int, optional) – minimum value of cardinality to be selected. Defaults to 0.
- max_cardinality (float, optional) – maximum value of cardinality to be selected. Defaults to float(“+inf”).
-
run() → List[logflow.relationsdiscover.Cardinality.Cardinality] Start the workflow for the multithreading implementation.
Returns: list of the cardinalities created
Return type: List[Cardinality]
-
class
logflow.relationsdiscover.Model.LSTMCell(input_size: int, hidden_size: int) LSTM cell to be connected to the attention layer
Parameters: - input_size (int) – size of input
- hidden_size (int) – size of hidden layer
-
class
logflow.relationsdiscover.Model.LSTMLayer(num_classes: int, input_size=20, hidden_size=50, num_layers=1, batch_size=128, length_sentence=30, unidirectional=True, test=False) Deep learning model
Parameters: - num_classes (int) – number of classes
- input_size (int, optional) – size of the embedding vector. Defaults to 20.
- hidden_size (int, optional) – size of the hidden layer. Defaults to 50.
- num_layers (int, optional) – number of layers. Defaults to 1.
- batch_size (int, optional) – size of the batch. Defaults to 128.
- length_sentence (int, optional) – size of the window. Defaults to 30.
- unidirectional (bool, optional) – Unidirectional or BiDirectional LSTM. Defaults to True.
- test (bool, optional) – Testing or training step. During the training step, the value of the attention layer is not returned for performance maximization. Defaults to False.
-
class
logflow.relationsdiscover.Result.Result(cardinality: logflow.relationsdiscover.Cardinality.Cardinality, condition='Train', subsample=False) Compute the results based on the predictions and the ground truth
Parameters: - cardinality (Cardinality) – cardinality object.
- condition (str, optional) – Testing or Training results. Use only for results display. Defaults to “Train”.
- subsample (bool, optional) – Results computed on subsample or not. Use only for results display. Defaults to False.
-
computing_result(progress=0, reinit=True, printing=True) Compute the results
Parameters: - progress (int, optional) – value of the progression. Only for display task. Defaults to 0.
- reinit (bool, optional) – reset the matrix value. Defaults to True.
- printing (bool, optional) – print the result. Defaults to True.
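As an illustration of computing results from predictions and ground truth, the sketch below derives a macro-averaged F1 score, the metric that the early-stopping condition in this module refers to. This page does not list the exact metrics produced by `Result`, so the function `macro_f1` is a hypothetical standalone helper, not the library's implementation.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = set(y_true) | set(y_pred)
    if not classes:
        return 0.0
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class receives a false positive
            fn[truth] += 1  # true class receives a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Macro averaging gives every class the same weight regardless of how many examples it has, which matters when pattern frequencies differ by orders of magnitude.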
-
class
logflow.relationsdiscover.Result.Results(path_model: str, name_model: str) Compute the results based on the results saved during the learning step by each cardinality.
-
class
logflow.relationsdiscover.Saver.Saver(name_model: str, path_model: str, cardinality=-1, lock=-1) Save and load the model from a file.
The file is saved as follows: file[“LSTM”][cardinality] = model
Parameters: - name_model (str) – name of the dataset
- path_model (str) – path of the model to save
- cardinality (int, optional) – cardinality to save. Defaults to -1.
- lock (int, optional) – lock for the file. Defaults to -1.
-
load(model: logflow.relationsdiscover.Model.LSTMLayer) → logflow.relationsdiscover.Model.LSTMLayer Load the model. Note that the model must be created beforehand. This function only loads the parameters into the model.
Parameters: model (LSTMLayer) – object to use for loading the model.
Raises: Exception – the file is not found
Returns: the loaded model
Return type: LSTMLayer
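The file[“LSTM”][cardinality] layout and the role of the lock can be sketched without logflow or PyTorch. Everything below is an assumption made for illustration: the helper names `save_entry` and `load_entry` are hypothetical, `pickle` stands in for whatever serialization the library really uses, and a plain dictionary stands in for the model parameters.

```python
import os
import pickle
import tempfile
import threading


def save_entry(path, cardinality, state, lock):
    """Store `state` under file["LSTM"][cardinality], one writer at a time."""
    with lock:  # several cardinalities share the same file
        content = {"LSTM": {}}
        if os.path.exists(path):
            with open(path, "rb") as handle:
                content = pickle.load(handle)
        content.setdefault("LSTM", {})[cardinality] = state
        with open(path, "wb") as handle:
            pickle.dump(content, handle)


def load_entry(path, cardinality):
    """Return the parameters saved for `cardinality`; fail if the file is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    with open(path, "rb") as handle:
        return pickle.load(handle)["LSTM"][cardinality]


lock = threading.Lock()
path = os.path.join(tempfile.mkdtemp(), "demo_model.bin")
save_entry(path, 3, {"weights": [0.1, 0.2]}, lock)
save_entry(path, 4, {"weights": [0.3]}, lock)
```

The read-modify-write cycle happens entirely inside the lock, so a worker saving cardinality 4 cannot overwrite the entry that another worker just wrote for cardinality 3.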
-
class
logflow.relationsdiscover.StoppingCondition.StoppingCondition(method='earlystopping', condition_value=0.005, condition_step=3, duration=60, condition_epoch=3) Condition to stop the learning. Three conditions can be selected:
- Early stopping. The learning process stops when the increase of the F1 value stays below 0.005 for more than 3 steps.
- A timer. If the duration of the training is longer than the timer, the training step is stopped.
- Number of epochs. The learning step runs for a fixed number of epochs.
Please note that the timer excludes the duration of the testing step.
Parameters: - method (str, optional) – 3 options: “earlystopping”, “timer”, “epoch”. Earlystopping uses the increase of the macro F1 value across multiple steps, timer uses a timer, and epoch uses a number of epochs. Defaults to “earlystopping”.
- condition_value (float, optional) – threshold on the increase of the macro F1 value. Defaults to 0.005.
- condition_step (int, optional) – number of steps. Defaults to 3.
- duration (int, optional) – duration of the learning step in seconds. Defaults to 60.
- condition_epoch (int, optional) – number of epochs to be done. Defaults to 3.
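The three stopping rules can be sketched as a small standalone class. This is not the library's implementation: `StoppingSketch`, `update` and `should_stop` are hypothetical names, and the strict "more than `condition_step` steps" comparison is one reading of the description above. Training time is passed in explicitly so that time spent testing is never counted.

```python
class StoppingSketch:
    """Hypothetical stand-in for the three documented stopping rules."""

    def __init__(self, method="earlystopping", condition_value=0.005,
                 condition_step=3, duration=60, condition_epoch=3):
        self.method = method
        self.condition_value = condition_value
        self.condition_step = condition_step
        self.duration = duration
        self.condition_epoch = condition_epoch
        self.last_f1 = 0.0
        self.stalled_steps = 0
        self.train_seconds = 0.0
        self.epochs_done = 0

    def update(self, macro_f1=None, train_seconds=0.0, epoch_finished=False):
        """Record progress; only training time is accumulated."""
        self.train_seconds += train_seconds
        if epoch_finished:
            self.epochs_done += 1
        if macro_f1 is not None:
            if macro_f1 - self.last_f1 < self.condition_value:
                self.stalled_steps += 1  # increase too small
            else:
                self.stalled_steps = 0   # real progress resets the count
            self.last_f1 = macro_f1

    def should_stop(self):
        if self.method == "earlystopping":
            return self.stalled_steps > self.condition_step
        if self.method == "timer":
            return self.train_seconds > self.duration
        if self.method == "epoch":
            return self.epochs_done >= self.condition_epoch
        raise ValueError(f"unknown method: {self.method}")


early = StoppingSketch()
history = []
for f1 in (0.5, 0.6, 0.601, 0.602, 0.603, 0.604):
    early.update(macro_f1=f1)
    history.append(early.should_stop())
```

With the default values, `history` only becomes true after the fourth consecutive step whose F1 increase is below 0.005.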
-
class
logflow.relationsdiscover.Worker.Worker(list_cardinalities: List[logflow.relationsdiscover.Cardinality.Cardinality], batch_size=128, multithreading=True, path_model='', name_dataset='', cardinalities_choosen=[-1], one_model=False, exclude_test=False, stoppingcondition='earlystopping', condition_value=0.005, condition_step=3, duration=5, condition_epoch=3) Handle the learning and the testing of each worker_per_cardinality in a multithreaded way.
Parameters: - list_cardinalities (List[Cardinality]) – list of the cardinality objects to be used.
- batch_size (int, optional) – size of the batch. Defaults to 128.
- multithreading (bool, optional) – use a multithreading implementation. Sequential implementation is not available yet. Defaults to True.
- path_model (str, optional) – path to the model to save. Defaults to “”.
- name_dataset (str, optional) – name of the dataset. Defaults to “”.
- cardinalities_choosen (List[int], optional) – list of cardinalities to use. This list contains only the value of cardinalities to be used. [-1] means all cardinalities. Defaults to [-1].
- one_model (bool, optional) – use one global model instead of one model per cardinality.
- exclude_test (boolean, optional) – exclude the testing step during the learning step. Can be used with the timer as stopping condition to have an exact duration.
- stoppingcondition (str, optional) – 3 options: “earlystopping”, “timer”, “epoch”. Earlystopping uses the increase of the macro F1 value across multiple steps, timer uses a timer, and epoch uses a number of epochs. Defaults to “earlystopping”.
- condition_value (float, optional) – value of the increase. Defaults to 0.005.
- condition_step (int, optional) – number of steps. Defaults to 3.
- duration (int, optional) – duration of the learning step in seconds. Defaults to 5.
- condition_epoch (int, optional) – number of epochs to be done. Defaults to 3.
-
static
execute_test(i, *args) Execute the test function for the multithreading implementation
Parameters: - i (int) – value of the cardinality selected
- args (List[Worker_single]) – list of all the cardinalities
-
static
execute_train(i, *args) Execute the training function for the multithreading implementation
Parameters: - i (int) – value of the cardinality selected
- args (List[Worker_single]) – list of all the cardinalities
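The dispatch pattern behind `execute_train(i, *args)` and `execute_test(i, *args)` can be sketched with the standard library alone. This is only an illustration under stated assumptions: `SingleWorkerSketch` is a hypothetical stand-in for `Worker_single`, `i` is treated as an index into `args`, and a thread pool is used here without any claim that the library schedules its threads the same way.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleWorkerSketch:
    """Hypothetical worker: one per cardinality, all sharing one lock."""

    def __init__(self, cardinality, lock, results):
        self.cardinality = cardinality
        self.lock = lock
        self.results = results

    def train(self):
        with self.lock:  # shared output is written by one thread at a time
            self.results[self.cardinality] = "trained"

    def test(self):
        with self.lock:
            self.results[self.cardinality] += "+tested"


def execute_train(i, *args):
    """Run the training step of the i-th worker in args."""
    args[i].train()


def execute_test(i, *args):
    """Run the testing step of the i-th worker in args."""
    args[i].test()


lock = threading.Lock()
results = {}
workers = [SingleWorkerSketch(c, lock, results) for c in (3, 4, 5)]
with ThreadPoolExecutor(max_workers=len(workers)) as pool:
    # list(...) waits for every training task before testing starts.
    list(pool.map(lambda i: execute_train(i, *workers), range(len(workers))))
    list(pool.map(lambda i: execute_test(i, *workers), range(len(workers))))
```

Passing the whole list through `*args` and selecting by `i` keeps the task functions static, so they can be handed to a pool without binding them to a particular worker object.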
-
class
logflow.relationsdiscover.Worker_per_cardinality.Worker_single(cardinality: logflow.relationsdiscover.Cardinality.Cardinality, lock: _thread.allocate_lock, batch_size=128, path_model='', name_dataset='', batch_result=20000, exclude_test=False, stoppingcondition='earlystopping', condition_value=0.005, condition_step=3, duration=5, condition_epoch=3) A single worker is responsible for creating the dataloader, running the learning/testing step, and saving the files of one cardinality.
Parameters: - cardinality (Cardinality) – the cardinality object containing the data.
- lock (threading.Lock) – lock used for saving files in the same file for all cardinalities.
- batch_size (int, optional) – size of the batch. Defaults to 128.
- path_model (str, optional) – path to the model to save. Defaults to “”.
- name_dataset (str, optional) – name of the dataset. Defaults to “”.
- batch_result (int, optional) – show the results every batch_result batches. Defaults to 20000.
- exclude_test (boolean, optional) – exclude the testing step during the learning step. Can be used with the timer as stopping condition to have an exact duration.
- stoppingcondition (str, optional) – condition to stop the learning step (timer, earlystopping, epoch). Defaults to earlystopping.
- condition_value (float, optional) – stoppingcondition option. Value of the increase. Defaults to 0.005.
- condition_step (int, optional) – stoppingcondition option. Number of steps. Defaults to 3.
- duration (int, optional) – stoppingcondition option. Duration of the learning step in minutes. Defaults to 5.
- condition_epoch (int, optional) – stoppingcondition option. Number of epochs to be done. Defaults to 3.
-
create_dataloader(validation_split=0.6, condition='Test', subsample=False, subsample_split=0.01) → torch.utils.data.dataloader.DataLoader Create the dataloader for the learning/testing step.
Parameters: - validation_split (float, optional) – ratio between the learning and the testing set. Defaults to 0.6.
- condition (str, optional) – if “Test”, the dataloader contains the test data; otherwise it contains the learning data. Defaults to “Test”.
- subsample (bool, optional) – use only a subsample of the data. Can be used for the learning and/or the testing step. Defaults to False.
- subsample_split (float, optional) – ratio of the data to use. Defaults to 0.01.
Returns: PyTorch dataloader corresponding to the previous features.
Return type: DataLoader
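How `validation_split`, `condition`, `subsample` and `subsample_split` could interact is sketched below on plain indices, without PyTorch. The function `split_indices` is hypothetical, and two points are assumptions rather than documented behavior: the first `validation_split` fraction of the data is taken as the learning set, and the subsample is drawn at random from the selected part.

```python
import random


def split_indices(n_examples, validation_split=0.6, condition="Test",
                  subsample=False, subsample_split=0.01, seed=0):
    """Return the example indices a dataloader would iterate over."""
    indices = list(range(n_examples))
    cut = round(validation_split * n_examples)
    # Assumption: learning data first, test data after the cut.
    selected = indices[cut:] if condition == "Test" else indices[:cut]
    if subsample and selected:
        keep = max(1, round(subsample_split * len(selected)))
        selected = random.Random(seed).sample(selected, keep)
    return selected


test_part = split_indices(1000)
train_part = split_indices(1000, condition="Train")
train_sample = split_indices(1000, condition="Train", subsample=True)
```

In the real class such an index list would typically be wrapped in a sampler or subset before being handed to the PyTorch `DataLoader`; that wiring is left out here.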
-
test(validation_split=0.6, subsample=False, subsample_split=0.01) Test the model
Parameters: - validation_split (float, optional) – ratio between testing and learning set. Defaults to 0.6.
- subsample (bool, optional) – if False, use all the available data, if True, use only a ratio of the data (subsample_split*data). Defaults to False.
- subsample_split (float, optional) – ratio of the data to use. Defaults to 0.01.