Tutorials

Background

Machine learning (ML), especially deep learning (DL), is playing an increasingly important role in the pharmaceutical industry and bioinformatics. For instance, DL-based methods have been found to predict drug-target interactions and molecular properties with reasonable precision and low computational cost, whereas these properties could previously be obtained only through in vivo or in vitro experiments or computationally expensive simulations (such as molecular dynamics). As another example, in silico RNA folding and protein folding are becoming increasingly feasible with the help of deep neural models. Using ML and DL can greatly improve efficiency and thus reduce the cost of drug discovery, vaccine design, and related tasks.

Despite the power of DL methods, a key challenge in applying them in the drug industry is the tension between their demand for large amounts of training data and the limited availability of annotated data. Recently, self-supervised learning has achieved tremendous success in natural language processing and computer vision, showing that a large corpus of unlabeled data can benefit learning across a wide range of tasks. Molecular representation learning faces a similar situation: a large amount of unlabeled data is available, including protein sequences (over 100 million) and compounds (over 50 million), but relatively little annotated data. It is therefore promising to apply DL-based pre-training techniques to the representation learning of chemical compounds, proteins, RNA, etc.

PaddleHelix is a high-performance ML-based bio-computing framework. It features large-scale representation learning and easy-to-use APIs, giving pharmaceutical and biological researchers and engineers convenient access to up-to-date, state-of-the-art AI tools.

Tutorials

Run tutorials locally

The tutorials are written as Jupyter Notebooks and are designed to run smoothly on your own machine. If you don’t have Jupyter installed, please install it first by following the official Jupyter installation instructions. Please also install PaddleHelix before proceeding (see the Installation guide).

After installing Jupyter, please go through the following steps:

  1. Clone this repository to your own machine

  2. Change the working directory of your shell to path_to_your_repo/PaddleHelix/tutorials/

  3. Open JupyterLab with the command jupyter-lab, and wait for your web browser to open

  4. All the tutorials should now appear in the File Browser. Click one to open it and enjoy!
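
The steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the repository URL, the `WORKDIR` variable, and the helper names `tutorials_dir` and `run_tutorials` are not part of this guide, so substitute the values that match your own setup. The script only defines functions, and nothing is cloned or launched until `run_tutorials` is called.

```shell
# Assumed upstream repository URL; replace it if you use a fork or mirror.
REPO_URL="https://github.com/PaddlePaddle/PaddleHelix.git"
# Hypothetical clone location; defaults to your home directory.
WORKDIR="${WORKDIR:-$HOME}"

# Print the tutorials directory for a clone placed under the given parent directory.
tutorials_dir() {
    printf '%s/PaddleHelix/tutorials\n' "$1"
}

run_tutorials() {
    # Step 1: clone the repository, unless a clone is already present.
    [ -d "$WORKDIR/PaddleHelix" ] || git clone "$REPO_URL" "$WORKDIR/PaddleHelix" || return 1
    # Step 2: change the working directory to the tutorials folder.
    cd "$(tutorials_dir "$WORKDIR")" || return 1
    # Step 3: start JupyterLab; a browser tab should open automatically.
    jupyter-lab
}
```

After sourcing the script, running `run_tutorials` performs steps 1 to 3; step 4 (opening a notebook from the File Browser) is done in the browser.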