1. pahelix.featurizers

1.1. featurizer

Compound datasets from pretrain-gnn.

class pahelix.featurizers.featurizer.Featurizer

This is an abstract class for feature extraction.

It works in two steps:

first, gen_features converts a single raw_data sample into a single data item;

second, collate_fn aggregates a list of data items into a batch.
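The two-step contract can be illustrated with a minimal, dependency-free sketch. The `Featurizer` base class below is a stand-in that only mirrors the documented method names; it is not the pahelix class. `CharCodeFeaturizer` and its `"codes"`/`"lengths"` keys are hypothetical, and plain Python lists stand in for the numpy ndarrays the real featurizers return.

```python
from abc import ABC, abstractmethod


class Featurizer(ABC):
    """Stand-in mirroring the documented two-step interface (not the pahelix class)."""

    @abstractmethod
    def gen_features(self, raw_data):
        """Convert one raw sample into one data item, or return None on failure."""

    @abstractmethod
    def collate_fn(self, batch_data_list):
        """Aggregate a list of data items into one batch dict."""


class CharCodeFeaturizer(Featurizer):
    """Hypothetical featurizer: turns a string into a list of character codes."""

    def gen_features(self, raw_data):
        if not isinstance(raw_data, str) or not raw_data:
            return None  # signal failure, as the interface prescribes
        return {"codes": [ord(c) for c in raw_data]}

    def collate_fn(self, batch_data_list):
        # Pad every sequence to the longest one so the batch is rectangular.
        max_len = max(len(d["codes"]) for d in batch_data_list)
        padded = [d["codes"] + [0] * (max_len - len(d["codes"])) for d in batch_data_list]
        lengths = [len(d["codes"]) for d in batch_data_list]
        return {"codes": padded, "lengths": lengths}


featurizer = CharCodeFeaturizer()
data_list = [featurizer.gen_features(s) for s in ["CC", "CCO"]]  # step 1: per sample
batch = featurizer.collate_fn(data_list)                         # step 2: per batch
print(batch)  # {'codes': [[67, 67, 0], [67, 67, 79]], 'lengths': [2, 3]}
```

Keeping per-sample work in gen_features and cross-sample work (padding, stacking) in collate_fn means the expensive extraction can run once per sample, while batching decisions stay cheap and flexible.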

collate_fn(batch_data_list)

Aggregate batch_data_list into a single batch.

Parameters

batch_data_list (list) – a list of data items generated by gen_features.

Returns

a dict of numpy ndarrays.

Return type

batch_data (dict)

gen_features(raw_data)

Convert raw_data into a single data item; this step is usually where feature extraction happens. Return None on failure.

Parameters

raw_data – can be any type.

Returns

a single data item of any user-defined type, or None on failure.
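Because gen_features may return None, a caller has to drop failed samples before collate_fn sees them. The sketch below shows one way to do that; `DigitFeaturizer` and `iterate_batches` are hypothetical names used only for illustration, not part of pahelix.

```python
class DigitFeaturizer:
    """Hypothetical featurizer: parses an integer string, returns None on failure."""

    def gen_features(self, raw_data):
        try:
            return {"value": int(raw_data)}
        except (TypeError, ValueError):
            return None

    def collate_fn(self, batch_data_list):
        return {"value": [d["value"] for d in batch_data_list]}


def iterate_batches(featurizer, raw_data_list, batch_size):
    """Hypothetical driver: featurize, skip failures (None), and yield collated batches."""
    buffer = []
    for raw in raw_data_list:
        data = featurizer.gen_features(raw)
        if data is None:
            continue  # drop samples whose feature extraction failed
        buffer.append(data)
        if len(buffer) == batch_size:
            yield featurizer.collate_fn(buffer)
            buffer = []
    if buffer:
        yield featurizer.collate_fn(buffer)  # last, possibly smaller, batch


batches = list(iterate_batches(DigitFeaturizer(), ["1", "x", "2", "3", None, "4"], 2))
print(batches)  # [{'value': [1, 2]}, {'value': [3, 4]}]
```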

1.2. pretrain_gnn_featurizer

Featurizers for pretrain-gnn.

class pahelix.featurizers.pretrain_gnn_featurizer.PreGNNAttrMaskFeaturizer(graph_wrapper, atom_type_num=None, mask_ratio=None)

Featurizer for the attribute-masking model of pretrain-gnn.

collate_fn(batch_data_list)

Aggregate a list of graph data into a single batch.

gen_features(raw_data)

Convert a SMILES string into graph data.

Returns

a dict of numpy ndarrays containing the graph features.

Return type

data (dict)
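The role of atom_type_num and mask_ratio can be sketched as follows: a fraction mask_ratio of the atoms has its type replaced by a mask id, and the original types become the prediction targets. This is an illustrative sketch, not the pahelix implementation; the helper name `mask_atom_types`, the toy vocabulary, and the assumption that the mask id equals atom_type_num (one past the last real type) are all hypothetical, and where the real featurizer applies masking (gen_features or collate_fn) is not stated here.

```python
import random


def mask_atom_types(atom_types, mask_ratio, mask_token, seed=None):
    """Hypothetical helper: hide a fraction of atom types behind a mask token.

    Returns the masked copy, the masked indices, and their original labels,
    which serve as prediction targets for an attribute-masking model.
    """
    if not atom_types:
        return [], [], []
    rng = random.Random(seed)
    num_atoms = len(atom_types)
    num_masked = max(1, int(num_atoms * mask_ratio))  # mask at least one atom
    masked_idx = sorted(rng.sample(range(num_atoms), num_masked))
    masked_labels = [atom_types[i] for i in masked_idx]
    masked_types = list(atom_types)  # leave the caller's list untouched
    for i in masked_idx:
        masked_types[i] = mask_token
    return masked_types, masked_idx, masked_labels


ATOM_TYPE_NUM = 10  # hypothetical vocabulary size; assumed to double as the mask id
atom_types = [5, 5, 7, 5, 7, 5]  # hypothetical atom-type ids for a 6-atom molecule
masked, idx, labels = mask_atom_types(atom_types, mask_ratio=0.5,
                                      mask_token=ATOM_TYPE_NUM, seed=0)
print(masked, idx, labels)
```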

class pahelix.featurizers.pretrain_gnn_featurizer.PreGNNContextPredFeaturizer(substruct_graph_wrapper, context_graph_wrapper, k, l1, l2)

Featurizer for the context-prediction model of pretrain-gnn.

collate_fn(batch_data_list)

Aggregate a list of graph data into a single batch.

gen_features(raw_data)

Convert a SMILES string into graph data.

Returns

a dict of numpy ndarrays containing the graph features.

Return type

data (dict)

class pahelix.featurizers.pretrain_gnn_featurizer.PreGNNSupervisedFeaturizer(graph_wrapper)

Featurizer for the supervised model of pretrain-gnn.

collate_fn(batch_data_list)

Aggregate a list of graph data into a single batch.

gen_features(raw_data)

Convert a SMILES string into graph data.

Returns

a dict of numpy ndarrays containing the graph features.

Return type

data (dict)
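Aggregating "a list of graph data into a single batch" typically means merging the small molecular graphs into one disjoint graph: node features are concatenated, each graph's edge indices are shifted by the number of nodes that precede it, and a per-node graph id records which molecule each node came from. The sketch below illustrates that generic technique only; the real featurizers work through a graph_wrapper object, and the `collate_graphs` name and the `"atom_type"`, `"edges"` and `"graph_id"` keys are hypothetical.

```python
def collate_graphs(graph_list):
    """Hypothetical collate: merge several small graphs into one disjoint graph.

    Each graph is a dict with "atom_type" (one entry per node) and "edges"
    (a list of (src, dst) node-index pairs local to that graph).
    """
    atom_type, edges, graph_id = [], [], []
    offset = 0
    for gid, graph in enumerate(graph_list):
        num_nodes = len(graph["atom_type"])
        atom_type.extend(graph["atom_type"])
        # Shift local node indices so they point into the concatenated node list.
        edges.extend((src + offset, dst + offset) for src, dst in graph["edges"])
        graph_id.extend([gid] * num_nodes)  # which graph each node came from
        offset += num_nodes
    return {"atom_type": atom_type, "edges": edges, "graph_id": graph_id}


g1 = {"atom_type": [5, 7], "edges": [(0, 1), (1, 0)]}
g2 = {"atom_type": [5, 5, 7], "edges": [(0, 1), (1, 0), (1, 2), (2, 1)]}
batch = collate_graphs([g1, g2])
print(batch["edges"])     # [(0, 1), (1, 0), (2, 3), (3, 2), (3, 4), (4, 3)]
print(batch["graph_id"])  # [0, 0, 1, 1, 1]
```

The graph ids let a readout step pool node embeddings back into one vector per molecule, which is what a supervised, per-molecule prediction needs.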

pahelix.featurizers.pretrain_gnn_featurizer.graph_data_obj_to_nx_simple(data)

Converts a graph data object into a NetworkX graph object.

NB: uses simplified atom and bond features, represented as indices.

NB: relative stereochemistry may not be recapitulated correctly, since the edges in the nx object are unordered.

Parameters

data (dict) – a dict of numpy ndarrays containing the graph features.

Returns

a NetworkX graph object

Return type

G

pahelix.featurizers.pretrain_gnn_featurizer.nx_to_graph_data_obj_simple(G)

Converts an nx graph into graph data. Assumes node indices are numbered from 0 to num_nodes - 1.

NB: uses simplified atom and bond features, represented as indices.

NB: relative stereochemistry may not be recapitulated correctly, since the edges in the nx object are unordered.

Parameters

G – a NetworkX graph object.

Returns

a dict of numpy ndarrays containing the graph features.

Return type

data (dict)
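The "edges are unordered" caveat of these two conversions can be seen in a small round trip. In the sketch below, plain dicts stand in for the NetworkX graph so that it runs without dependencies; the function names, the `"atom_type"`, `"edges"` and `"bond_type"` keys, and the assumption that each bond is stored as two directed edges in the graph data are all illustrative, not the pahelix internals.

```python
def data_to_undirected(data):
    """Hypothetical stand-in for graph_data_obj_to_nx_simple.

    `data` stores each bond twice, as (i, j) and (j, i); the undirected graph
    keeps one unordered entry per bond, keyed by frozenset({i, j}).
    """
    nodes = {i: {"atom_type": t} for i, t in enumerate(data["atom_type"])}
    edges = {}
    for (src, dst), bond in zip(data["edges"], data["bond_type"]):
        edges[frozenset((src, dst))] = {"bond_type": bond}  # direction is lost here
    return {"nodes": nodes, "edges": edges}


def undirected_to_data(graph):
    """Hypothetical stand-in for nx_to_graph_data_obj_simple.

    Assumes node ids are 0..num_nodes-1; every unordered bond is written back
    as two directed edges, smaller index first.
    """
    num_nodes = len(graph["nodes"])
    atom_type = [graph["nodes"][i]["atom_type"] for i in range(num_nodes)]
    edges, bond_type = [], []
    for pair, attrs in graph["edges"].items():
        i, j = min(pair), max(pair)
        edges += [(i, j), (j, i)]
        bond_type += [attrs["bond_type"]] * 2
    return {"atom_type": atom_type, "edges": edges, "bond_type": bond_type}


data = {"atom_type": [5, 5, 7],
        "edges": [(1, 0), (0, 1), (1, 2), (2, 1)],
        "bond_type": [0, 0, 0, 0]}
graph = data_to_undirected(data)
print(len(graph["edges"]))  # 2
restored = undirected_to_data(graph)
print(restored["edges"])    # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

The bonds survive the round trip, but the original (1, 0)-first edge order does not, and any attribute that differed between the two directions of a bond would be overwritten; this is the kind of information loss the stereochemistry note warns about.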

pahelix.featurizers.pretrain_gnn_featurizer.reset_idxes(G)

Resets node indices so that they are numbered from 0 to num_nodes - 1.

Parameters

G – a NetworkX graph object.

Returns

a copy of G with relabelled node indices, and the mapping from old to new node indices.

Return type

new_G
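Relabelling matters because a subgraph cut out of a larger molecule keeps its original node ids, while nx_to_graph_data_obj_simple assumes ids run from 0 to num_nodes - 1. The sketch below shows the idea on a plain adjacency dict `{node: [neighbours]}`; `reset_idxes_sketch` is a hypothetical stand-in, not the NetworkX-based pahelix function, and it assigns new ids in the dict's iteration order.

```python
def reset_idxes_sketch(adjacency):
    """Hypothetical stand-in for reset_idxes on a plain adjacency dict.

    Returns a relabelled copy whose nodes are 0..num_nodes-1 (in the dict's
    iteration order) together with the old-to-new mapping.
    """
    mapping = {old: new for new, old in enumerate(adjacency)}
    new_adjacency = {
        mapping[old]: [mapping[nbr] for nbr in nbrs] for old, nbrs in adjacency.items()
    }
    return new_adjacency, mapping


sub = {4: [7], 7: [4, 9], 9: [7]}  # e.g. a 3-node subgraph cut out of a larger molecule
new_sub, mapping = reset_idxes_sketch(sub)
print(mapping)  # {4: 0, 7: 1, 9: 2}
print(new_sub)  # {0: [1], 1: [0, 2], 2: [1]}
```

Returning the mapping alongside the new graph lets callers translate other node references (such as a chosen root node) into the new numbering.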

pahelix.featurizers.pretrain_gnn_featurizer.transform_contextpred(data, k, l1, l2)

Randomly selects a root node from the data object and adds attributes describing two subgraphs: the substructure formed by the k-hop neighbourhood rooted at that node, and the context substructure, i.e. the subgraph lying between l1 and l2 hops away from the root node.
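The node selection described above can be sketched with a breadth-first search that records each node's hop distance from the root. This is an illustration on a plain adjacency dict, not the pahelix code: the function names are hypothetical, and reading "between l1 and l2 hops" as l1 < distance <= l2 is an assumption (other boundary conventions are possible). Nodes that fall in both views are reported as an overlap, since the two sets can intersect whenever l1 < k.

```python
import random
from collections import deque


def hop_distances(adjacency, root, max_hops):
    """Breadth-first search: shortest hop count from root, up to max_hops."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def split_substruct_context(adjacency, k, l1, l2, root=None, seed=None):
    """Hypothetical sketch of the node selection behind transform_contextpred.

    Substructure: nodes within k hops of the root.
    Context: nodes with l1 < distance <= l2 (assumed boundary convention).
    """
    if root is None:
        root = random.Random(seed).choice(sorted(adjacency))  # random root node
    dist = hop_distances(adjacency, root, max(k, l2))
    substruct = sorted(n for n, d in dist.items() if d <= k)
    context = sorted(n for n, d in dist.items() if l1 < d <= l2)
    overlap = sorted(set(substruct) & set(context))  # nodes shared by both views
    return root, substruct, context, overlap


# A 6-atom chain 0-1-2-3-4-5 as an adjacency dict.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
result = split_substruct_context(path, k=2, l1=1, l2=4, root=0)
print(result)  # (0, [0, 1, 2], [2, 3, 4], [2])
```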