4. pahelix.utils¶
4.1. compound_tools¶
- class pahelix.utils.compound_tools.CompoundConstants[source]¶
Constants of atom and bond properties.
- pahelix.utils.compound_tools.atom_numeric_feat(n, allowable, to_one_hot=True)[source]¶
Restrict the numeric feature to [0, max_n].
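A hedged sketch of what this behaviour looks like (the function name and body below are stand-ins, not the library's code): the value is clamped to the allowable range and optionally one-hot encoded.

```python
# Hypothetical sketch of atom_numeric_feat's behaviour (not the library's code):
# clamp a numeric feature into the allowable range, optionally one-hot encode it.
def atom_numeric_feat_sketch(n, allowable, to_one_hot=True):
    max_n = max(allowable)
    n = min(max(n, 0), max_n)  # restrict to [0, max_n]
    if to_one_hot:
        return [1 if n == v else 0 for v in allowable]
    return n
```

For example, with `allowable = [0, 1, 2, 3, 4]`, an out-of-range value of 7 is clamped to 4 before encoding.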
- pahelix.utils.compound_tools.check_smiles_validity(smiles)[source]¶
Check whether the SMILES string can be converted to an RDKit mol object.
- pahelix.utils.compound_tools.create_standardized_mol_id(smiles)[source]¶
Generate a standardized molecule id (InChI) from a SMILES sequence.
- Parameters
smiles – smiles sequence
- Returns
inchi
- pahelix.utils.compound_tools.get_gasteiger_partial_charges(mol, n_iter=12)[source]¶
Calculates a list of Gasteiger partial charges for each atom in the mol object.
- Parameters
mol – rdkit mol object
n_iter (int) – number of iterations. Default 12
- Returns
list of computed partial charges for each atom.
- pahelix.utils.compound_tools.get_largest_mol(mol_list)[source]¶
Given a list of RDKit mol objects, returns the mol object containing the largest number of atoms. If multiple mol objects tie for the largest number of atoms, the first one is returned.
- Parameters
mol_list (list) – a list of rdkit mol object.
- Returns
the largest mol.
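The selection logic can be sketched with stand-in objects (a real call would pass RDKit mol objects; `GetNumAtoms()` is the RDKit method name):

```python
# Stand-in for an RDKit mol, exposing only the method the sketch needs.
class FakeMol:
    def __init__(self, num_atoms):
        self._n = num_atoms
    def GetNumAtoms(self):
        return self._n

def get_largest_mol_sketch(mol_list):
    # max() keeps the first element on ties, matching the documented behaviour
    return max(mol_list, key=lambda m: m.GetNumAtoms())
```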
- pahelix.utils.compound_tools.mol_to_graph_data(mol, add_self_loop=True)[source]¶
- Converts an RDKit mol object to graph data, a dict of numpy ndarrays. NB: uses simplified atom and bond features, represented as indices.
- Parameters
mol – rdkit mol object.
add_self_loop – whether to add self loop or not.
- Returns
a dict of numpy ndarrays for the graph data. It consists of atom attributes, edge attributes and edge index.
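To illustrate the kind of dict returned, here is a hypothetical graph for a three-atom molecule. The key names below are assumptions for illustration, not the library's exact keys:

```python
import numpy as np

# Hypothetical illustration of a graph-data dict: per-atom feature indices,
# per-edge feature indices, and a bidirectional edge index.
graph = {
    "atom_type": np.array([6, 6, 8]),        # one feature index per atom
    "bond_type": np.array([0, 0, 1, 1]),     # one feature index per edge
    "edges": np.array([[0, 1], [1, 0],       # each bond stored in both
                       [1, 2], [2, 1]]),     # directions
}
```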
4.2. data_utils¶
- pahelix.utils.data_utils.get_part_files(data_path, trainer_id, trainer_num)[source]¶
Split the files in data_path so that each trainer can train from different examples.
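One way to picture this is a round-robin partition of the file list; this is a sketch of the idea, and the library's actual partitioning strategy may differ:

```python
# Sketch: assign every trainer_num-th file to a trainer, starting at its id.
def get_part_files_sketch(files, trainer_id, trainer_num):
    return files[trainer_id::trainer_num]
```

With 5 files and 2 trainers, trainer 0 sees files 0, 2, 4 and trainer 1 sees files 1, 3, so every file is trained on exactly once.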
4.3. language_model_tools¶
4.4. paddle_utils¶
4.5. protein_tools¶
Tools for protein features.
- class pahelix.utils.protein_tools.ProteinTokenizer[source]¶
Protein Tokenizer.
- convert_token_to_id(token)[source]¶
Converts a token to an id.
- Parameters
token – Token.
- Returns
The id of the input token.
- Return type
id
- convert_tokens_to_ids(tokens)[source]¶
Convert multiple tokens to ids.
- Parameters
tokens – The list of tokens.
- Returns
The id list of the input tokens.
- Return type
ids
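A minimal sketch of token-to-id lookup, assuming a hypothetical 20-amino-acid vocabulary; `ProteinTokenizer`'s real vocabulary and special tokens differ:

```python
# Toy tokenizer illustrating convert_token_to_id / convert_tokens_to_ids.
# The vocabulary here is an assumption for illustration.
class TinyProteinTokenizer:
    def __init__(self):
        self.vocab = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
        self.unk_id = len(self.vocab)  # id for unknown tokens

    def convert_token_to_id(self, token):
        return self.vocab.get(token, self.unk_id)

    def convert_tokens_to_ids(self, tokens):
        return [self.convert_token_to_id(t) for t in tokens]
```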
4.6. splitters¶
- class pahelix.utils.splitters.RandomSplitter[source]¶
Random splitter.
- split(dataset, frac_train=None, frac_valid=None, frac_test=None, seed=None)[source]¶
- Parameters
dataset (InMemoryDataset) – the dataset to split.
frac_train (float) – the fraction of data to be used for the train split.
frac_valid (float) – the fraction of data to be used for the valid split.
frac_test (float) – the fraction of data to be used for the test split.
seed (int|None) – the random seed.
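The split can be sketched over dataset indices as below, assuming the three fractions sum to 1; the real method splits an `InMemoryDataset` rather than raw indices:

```python
import random

# Sketch: shuffle indices with a seeded RNG, then cut by fraction.
def random_split_sketch(n, frac_train, frac_valid, frac_test, seed=None):
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(frac_train * n)
    n_valid = int(frac_valid * n)
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])
```

Passing the same seed reproduces the same split, which is useful for comparing models on identical data partitions.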
- class pahelix.utils.splitters.IndexSplitter[source]¶
Split datasets that have already been ordered. The first frac_train proportion is used for the train set, the next frac_valid for the valid set, and the final frac_test for the test set.
- split(dataset, frac_train=None, frac_valid=None, frac_test=None)[source]¶
- Parameters
dataset (InMemoryDataset) – the dataset to split.
frac_train (float) – the fraction of data to be used for the train split.
frac_valid (float) – the fraction of data to be used for the valid split.
frac_test (float) – the fraction of data to be used for the test split.
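Since the dataset is assumed already ordered, the split is just three contiguous slices, as this sketch over indices shows:

```python
# Sketch: contiguous front-to-back split of an ordered dataset's indices.
def index_split_sketch(n, frac_train, frac_valid, frac_test):
    n_train = int(frac_train * n)
    n_valid = int(frac_valid * n)
    train = list(range(0, n_train))
    valid = list(range(n_train, n_train + n_valid))
    test = list(range(n_train + n_valid, n))
    return train, valid, test
```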
- class pahelix.utils.splitters.ScaffoldSplitter[source]¶
Adapted from https://github.com/deepchem/deepchem/blob/master/deepchem/splits/splitters.py
Split dataset by Bemis-Murcko scaffolds
- split(dataset, frac_train=None, frac_valid=None, frac_test=None)[source]¶
- Parameters
dataset (InMemoryDataset) – the dataset to split. Make sure each element in the dataset has key “smiles” which will be used to calculate the scaffold.
frac_train (float) – the fraction of data to be used for the train split.
frac_valid (float) – the fraction of data to be used for the valid split.
frac_test (float) – the fraction of data to be used for the test split.
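The idea behind a scaffold split can be sketched as follows: group molecules by their Bemis-Murcko scaffold, then fill train/valid/test with whole groups so no scaffold is shared across splits. The scaffold strings below are precomputed stand-ins; the real splitter derives them from each element's "smiles" key via RDKit:

```python
from collections import defaultdict

# Sketch of a DeepChem-style scaffold split over precomputed scaffold strings.
def scaffold_split_sketch(scaffolds, frac_train, frac_valid):
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # place larger scaffold groups first
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(scaffolds)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)       # whole group goes to one split
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because whole scaffold groups are assigned to a single split, the test set contains scaffolds never seen in training, giving a harder, more realistic generalization estimate than a random split.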
- class pahelix.utils.splitters.RandomScaffoldSplitter[source]¶
Split dataset by Bemis-Murcko scaffolds
- split(dataset, frac_train=None, frac_valid=None, frac_test=None, seed=None)[source]¶
- Parameters
dataset (InMemoryDataset) – the dataset to split. Make sure each element in the dataset has key “smiles” which will be used to calculate the scaffold.
frac_train (float) – the fraction of data to be used for the train split.
frac_valid (float) – the fraction of data to be used for the valid split.
frac_test (float) – the fraction of data to be used for the test split.
seed (int|None) – the random seed.
4.7. Helpful Link¶
Please refer to our GitHub repo to see the whole module.