About

This book is intended for a graduate course on machine learning and artificial intelligence for drug discovery and development. It can also serve a graduate course such as “Machine Learning/Artificial Intelligence and its Application.” It targets two audiences:

- Graduate students (e.g., MS and PhD) or advanced undergraduates majoring in computer science, engineering, medicine, chemistry, biology, medical informatics, or biostatistics.
- Data scientists in the pharmaceutical and biotech industries.

Our book gives a comprehensive overview of most of the essential topics in AI for drug discovery and development, starting from basic concepts in machine learning and pharmaceutical science. It is therefore accessible to readers without extensive background knowledge in either field.

We plan to organize the book around three aspects: data, methods, and applications.

Data: We will cover the data types relevant to drug discovery and development, including drug targets, genomics data, small molecules (SMILES strings, molecular graphs, fingerprints), biologics (amino acid sequences), clinical trials (drugs, disease codes, text-based features), patient data (electronic health records, medical claims, wearables), and scientific literature.
Machine Learning Methods (with a special focus on deep learning): We will cover basic machine learning methods (supervised and unsupervised learning, model optimization, and training), representation learning with various deep learning models (MLP, RNN, CNN, attention models, transformers, graph neural networks, autoencoders), deep generative models (variational autoencoders, generative adversarial networks), and combinatorial methods (reinforcement learning, genetic algorithms, Bayesian optimization).
Drug Discovery Applications: We will introduce the main drug discovery and development tasks and show how deep learning models can help. Specifically, we will cover DNA/RNA-protein binding; target identification; small-molecule design (de novo small-molecule design, lead optimization, property prediction such as ADMET and QSAR, adverse drug effect (side effect) prediction, virtual screening, retrosynthesis, drug combination (synergy) prediction, drug-drug interaction prediction, and drug-target interaction prediction); and large-molecule design (protein sequence learning, biologics property prediction, epitope/paratope prediction, protein amino acid sequence prediction, protein 3D structure prediction, and antibody design).

Table of Contents (tentative)

Data
- Target protein
- Small-molecule drug
- Biologics
- Clinical trial data
- Literature data
Machine learning basics
- Supervised learning
- Unsupervised learning
- Numerical optimization
- Data split
- Hyperparameter
- Ensemble methods
Deep learning methods
- Multilayer perceptron (MLP)
- Convolutional neural network (CNN)
- Recurrent neural network (RNN)
- Graph neural network (GNN)
- Embedding
- Attention mechanism
- Transformer
- Memory network
Advanced machine learning methods
- Variational autoencoder (VAE)
- Generative adversarial network (GAN)
- Normalizing flow model
- Reinforcement learning (RL)
- Genetic algorithm (GA)
- Bayesian optimization (BO)
- Self-supervised learning and pretraining
Small-molecule drug discovery
- Virtual screening and high-throughput screening
- Drug property prediction: ADMET
- De novo drug design
- Lead optimization
Large-molecule drug discovery
- Protein property prediction
- Protein design
Drug development
- Clinical trial basics
- Clinical trial outcome prediction
- Drug repurposing
- Drug combination
- Patient recruitment and patient-trial matching
- Survival analysis
- Clinical trial site selection

Follow us on Twitter for the latest news: Tianfan, Danica, Jimeng.

Tentative Release Dates

2023

Authors

Tianfan Fu
Assistant Professor at Rensselaer Polytechnic Institute (RPI)

Cao (Danica) Xiao
VP of AI at GE Healthcare

Jimeng Sun
Professor at University of Illinois Urbana-Champaign

Suggestions

Any feedback, suggestions, and comments to improve our book are warmly welcome! Please feel free to reach out to us via the Google form.

Textbook: Machine Learning for Drug Discovery and Development
