4. pahelix.utils¶
4.1. basic_utils¶
4.2. compound_tools¶
- class pahelix.utils.compound_tools.Compound3DKit[source]¶
The 3D toolkit for compounds.
- pahelix.utils.compound_tools.check_smiles_validity(smiles)[source]¶
Check whether the SMILES string can be converted to an rdkit mol object.
- pahelix.utils.compound_tools.create_standardized_mol_id(smiles)[source]¶
- Parameters:
smiles – SMILES string.
- Returns:
InChI.
- pahelix.utils.compound_tools.get_gasteiger_partial_charges(mol, n_iter=12)[source]¶
Calculates the list of Gasteiger partial charges for each atom in the mol object.
- Parameters:
mol – rdkit mol object.
n_iter (int) – number of iterations. Default 12.
- Returns:
list of computed partial charges for each atom.
- pahelix.utils.compound_tools.get_largest_mol(mol_list)[source]¶
Given a list of rdkit mol objects, returns the mol object containing the largest number of atoms. If several mol objects share the largest number of atoms, picks the first one.
- Parameters:
mol_list (list) – a list of rdkit mol objects.
- Returns:
the largest mol.
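The tie-breaking rule can be illustrated without RDKit. What follows is a minimal sketch, not the library's implementation: `pick_largest` is a hypothetical helper, and plain lists of atom symbols stand in for rdkit mol objects.

```python
def pick_largest(items, size=len):
    """Return the item with the greatest size; the first one wins on ties."""
    if not items:
        raise ValueError("items must not be empty")
    sizes = [size(item) for item in items]
    # list.index returns the first position of the maximum,
    # so ties resolve to the earliest item in the list.
    return items[sizes.index(max(sizes))]


# Lists of atom symbols stand in for rdkit mol objects (assumption).
fragments = [["Na"], ["C", "C", "O"], ["C", "N", "O"]]
largest = pick_largest(fragments)
```

Here two fragments have three atoms each, and the earlier one is returned.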
- pahelix.utils.compound_tools.mol_to_geognn_graph_data(mol, atom_poses, dir_type)[source]¶
- Parameters:
mol – rdkit molecule.
dir_type – direction type for the bond_angle graph.
- pahelix.utils.compound_tools.mol_to_graph_data(mol)[source]¶
- Parameters:
atom_features – Atom features.
edge_features – Edge features.
morgan_fingerprint – Morgan fingerprint.
functional_groups – Functional groups.
- pahelix.utils.compound_tools.new_mol_to_graph_data(mol)[source]¶
mol_to_graph_data
- Parameters:
atom_features – Atom features.
edge_features – Edge features.
morgan_fingerprint – Morgan fingerprint.
functional_groups – Functional groups.
- pahelix.utils.compound_tools.new_smiles_to_graph_data(smiles, **kwargs)[source]¶
Convert a SMILES string to graph data.
- pahelix.utils.compound_tools.rdchem_enum_to_list(values)[source]¶
Example input: values = {0: rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED, 1: rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CW, 2: rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CCW, 3: rdkit.Chem.rdchem.ChiralType.CHI_OTHER}
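The entry above documents only the shape of the input. Assuming the function turns such a mapping into a list ordered by its integer keys (the name suggests this, but the page does not say so), the idea might be sketched as follows. `enum_dict_to_list` is a hypothetical name, and strings stand in for the rdkit enum members.

```python
def enum_dict_to_list(values):
    """Turn a dict keyed by 0..n-1 into a list ordered by key."""
    # Indexing by range(len(values)) assumes the keys are exactly 0..n-1;
    # a missing key raises KeyError.
    return [values[i] for i in range(len(values))]


# Strings stand in for rdkit.Chem.rdchem.ChiralType members (assumption).
chiral_types = {
    0: "CHI_UNSPECIFIED",
    1: "CHI_TETRAHEDRAL_CW",
    2: "CHI_TETRAHEDRAL_CCW",
    3: "CHI_OTHER",
}
ordered = enum_dict_to_list(chiral_types)
```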
4.3. data_utils¶
- pahelix.utils.data_utils.get_part_files(data_path, trainer_id, trainer_num)[source]¶
Split the files in data_path so that each trainer can train from different examples.
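One common way to give each trainer different examples is round-robin assignment of files. The sketch below illustrates that idea only; `shard_files` is a hypothetical helper that works on a list of file names, and the ordering and assignment rule used by `get_part_files` itself may differ.

```python
def shard_files(filenames, trainer_id, trainer_num):
    """Return the subset of filenames assigned to one trainer."""
    if not 0 <= trainer_id < trainer_num:
        raise ValueError("trainer_id must be in [0, trainer_num)")
    # Sort first so every trainer sees the same order, then take
    # every trainer_num-th file starting at trainer_id.
    ordered = sorted(filenames)
    return ordered[trainer_id::trainer_num]


files = ["part-%d.npz" % i for i in range(7)]  # hypothetical file names
shards = [shard_files(files, tid, 3) for tid in range(3)]
```

Each file lands in exactly one shard, so the trainers together cover the whole directory without overlap.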
4.4. language_model_tools¶
4.5. protein_tools¶
- class pahelix.utils.protein_tools.ProteinTokenizer[source]¶
Protein Tokenizer.
- convert_token_to_id(token)[source]¶
Converts a token to an id.
- Parameters:
token – Token.
- Returns:
The id of the input token.
- Return type:
id
- convert_tokens_to_ids(tokens)[source]¶
Convert multiple tokens to ids.
- Parameters:
tokens – The list of tokens.
- Returns:
The id list of the input tokens.
- Return type:
ids
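The two methods above amount to a vocabulary lookup applied to one token or to a list of tokens. The following toy class sketches that pattern. It is not the `ProteinTokenizer` implementation: the vocabulary, the `<unk>` token, and the id numbering are all assumptions made for illustration.

```python
class ToyProteinTokenizer:
    """Toy tokenizer with one token per amino-acid letter (hypothetical)."""

    def __init__(self, tokens, unk_token="<unk>"):
        # Reserve id 0 for unknown tokens, then number the rest in order.
        self.unk_token = unk_token
        self.vocab = {unk_token: 0}
        for token in tokens:
            self.vocab.setdefault(token, len(self.vocab))

    def convert_token_to_id(self, token):
        # Fall back to the unknown-token id for tokens outside the vocabulary.
        return self.vocab.get(token, self.vocab[self.unk_token])

    def convert_tokens_to_ids(self, tokens):
        return [self.convert_token_to_id(token) for token in tokens]


tokenizer = ToyProteinTokenizer("ACDEFGHIKLMNPQRSTVWY")
ids = tokenizer.convert_tokens_to_ids(list("MKV"))
```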
4.6. splitters¶
- class pahelix.utils.splitters.RandomSplitter[source]¶
Random splitter.
- split(dataset, frac_train=None, frac_valid=None, frac_test=None, seed=None)[source]¶
- Parameters:
dataset (InMemoryDataset) – the dataset to split.
frac_train (float) – the fraction of data to be used for the train split.
frac_valid (float) – the fraction of data to be used for the valid split.
frac_test (float) – the fraction of data to be used for the test split.
seed (int|None) – the random seed.
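A random split with these parameters can be pictured as shuffling the indices and cutting the shuffled list at two points. This is a minimal sketch using only the standard library; `random_split_indices` is a hypothetical helper that returns index lists, and the actual splitter may use a different random generator and return dataset objects.

```python
import random


def random_split_indices(n, frac_train, frac_valid, frac_test, seed=None):
    """Shuffle range(n) and cut it into train/valid/test index lists."""
    if abs(frac_train + frac_valid + frac_test - 1.0) > 1e-8:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    # A dedicated Random instance keeps the split reproducible per seed.
    random.Random(seed).shuffle(indices)
    train_cutoff = int(frac_train * n)
    valid_cutoff = int((frac_train + frac_valid) * n)
    return (indices[:train_cutoff],
            indices[train_cutoff:valid_cutoff],
            indices[valid_cutoff:])


train_idx, valid_idx, test_idx = random_split_indices(8, 0.5, 0.25, 0.25, seed=0)
```

Passing the same seed again reproduces the same three index lists.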
- class pahelix.utils.splitters.IndexSplitter[source]¶
Splits datasets that have already been ordered. The first frac_train proportion is used for the train set, the next frac_valid for the valid set, and the final frac_test for the test set.
- split(dataset, frac_train=None, frac_valid=None, frac_test=None)[source]¶
- Parameters:
dataset (InMemoryDataset) – the dataset to split.
frac_train (float) – the fraction of data to be used for the train split.
frac_valid (float) – the fraction of data to be used for the valid split.
frac_test (float) – the fraction of data to be used for the test split.
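Because the data is assumed to be ordered already, the split reduces to two cut points on the index range. The sketch below shows this under that assumption; `index_split` is a hypothetical helper that returns index lists, not the splitter's actual return type.

```python
def index_split(n, frac_train, frac_valid, frac_test):
    """Cut range(n) into consecutive train/valid/test index lists."""
    if abs(frac_train + frac_valid + frac_test - 1.0) > 1e-8:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    train_cutoff = int(frac_train * n)
    valid_cutoff = int((frac_train + frac_valid) * n)
    # No shuffling: the original order decides which split an item joins.
    return (indices[:train_cutoff],
            indices[train_cutoff:valid_cutoff],
            indices[valid_cutoff:])


ordered_train, ordered_valid, ordered_test = index_split(8, 0.5, 0.25, 0.25)
```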
- class pahelix.utils.splitters.ScaffoldSplitter[source]¶
Adapted from https://github.com/deepchem/deepchem/blob/master/deepchem/splits/splitters.py
Split dataset by Bemis-Murcko scaffolds
- split(dataset, frac_train=None, frac_valid=None, frac_test=None)[source]¶
- Parameters:
dataset (InMemoryDataset) – the dataset to split. Make sure each element in the dataset has key “smiles” which will be used to calculate the scaffold.
frac_train (float) – the fraction of data to be used for the train split.
frac_valid (float) – the fraction of data to be used for the valid split.
frac_test (float) – the fraction of data to be used for the test split.
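A scaffold split keeps all molecules that share a scaffold in the same subset. The sketch below follows the greedy scheme used by the DeepChem splitter this class is adapted from: scaffold groups are sorted from largest to smallest and assigned to train, then valid, then test. It is an illustration under assumptions, not this library's code. `scaffold_split_indices` is a hypothetical helper, and the scaffold keys are precomputed strings rather than values derived from the "smiles" key.

```python
from collections import defaultdict


def scaffold_split_indices(scaffolds, frac_train, frac_valid, frac_test):
    """Assign whole scaffold groups to train, valid, or test."""
    if abs(frac_train + frac_valid + frac_test - 1.0) > 1e-8:
        raise ValueError("fractions must sum to 1")
    n = len(scaffolds)
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest groups first; ties broken by the first index for determinism.
    ordered_groups = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered_groups:
        if len(train) + len(group) > train_cutoff:
            if len(train) + len(valid) + len(group) > valid_cutoff:
                test.extend(group)
            else:
                valid.extend(group)
        else:
            train.extend(group)
    return train, valid, test


# Hypothetical scaffold labels, one per molecule.
scaffold_keys = ["A", "A", "A", "A", "B", "B", "C", "D"]
sc_train, sc_valid, sc_test = scaffold_split_indices(scaffold_keys, 0.5, 0.25, 0.25)
```

Since groups are never divided, the realized split sizes only approximate the requested fractions when group sizes do not line up with the cut points.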
- class pahelix.utils.splitters.RandomScaffoldSplitter[source]¶
Split dataset by Bemis-Murcko scaffolds
- split(dataset, frac_train=None, frac_valid=None, frac_test=None, seed=None)[source]¶
- Parameters:
dataset (InMemoryDataset) – the dataset to split. Make sure each element in the dataset has key “smiles” which will be used to calculate the scaffold.
frac_train (float) – the fraction of data to be used for the train split.
frac_valid (float) – the fraction of data to be used for the valid split.
frac_test (float) – the fraction of data to be used for the test split.
seed (int|None) – the random seed.
4.7. Helpful Link¶
Please refer to our GitHub repo to see the whole module.