
HOB (human oral bio-availability) describes the fraction of an orally administered drug that reaches circulation. Predicting this property effectively, along with other ADMET properties, can have a tremendous impact on the drug discovery and development process, reducing failed experiments, time, and costs. Traditional methods often struggle to generalize across diverse datasets. To tackle this challenge, we apply transfer learning using the GWEN platform and explore its impact on adapting to unseen bio-chemical spaces.
For a drug to reach the circulatory system and perform its intended effect, it must successfully navigate a complex journey through the human body. This involves crossing several semi-permeable cell membranes and interacting with a variety of biochemical players, such as enzymes, efflux transporters, and metabolites. These interactions can alter the drug's structure and concentration, often reducing its ability to reach its target, making the prediction of oral bio-availability a challenging but critical aspect of pharmaceutical development.
Predicting ADMET properties such as HOB is an area of drug development where AI can have a significant impact, particularly through advanced machine learning (ML) models like GWEN. The existing landscape of HOB prediction models, largely based on Quantitative Structure-Property Relationships (QSPR) and standard ML techniques, faces considerable challenges due to the complexity of the molecular space.

Apart from the complexity of the biological system involved, another challenge in HOB prediction is the limited availability and quality of training data. Several promising works have attempted to predict HOB, but they rely on small public datasets. The diversity and quality of training data largely determine the performance and generalization ability of ML models.
To explore the generalization ability of current solutions, we gathered HOB-labeled datasets from related works in the literature. Our benchmarks include the work of [KNIME], which curated the currently largest publicly available HOB dataset with ~1400 molecules, as well as the works of [ICDrug] and [AdmetSAR]. These works utilize molecular fingerprints such as MACCS, Morgan, and AtomPairs, and ML algorithms including random forest (RF), support vector machine (SVM), and k-nearest neighbors (kNN).
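The baseline pipeline used by these works can be sketched as follows. This is a minimal illustration, not the referenced authors' code: the binary feature matrix is a synthetic stand-in for fingerprint bits (167 columns mirrors the length of a MACCS key; real features would come from a cheminformatics toolkit such as RDKit), and the label rule is artificial.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for molecular fingerprints: each row is a bit
# vector the length of a MACCS key (167 bits). In practice these would
# be MACCS, Morgan, or atom-pair fingerprints computed from SMILES.
n_molecules, n_bits = 400, 167
X = rng.integers(0, 2, size=(n_molecules, n_bits)).astype(float)

# Artificial binary HOB label (e.g. bio-availability above or below
# some threshold), chosen so the task is learnable in this sketch.
y = (X[:, :20].sum(axis=1) > 10).astype(int)

# The three model families reported in the benchmarked works.
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.2f}")
```

Cross-validation within a single dataset is exactly the setting where such models look strong; the generalization gap only appears when the held-out molecules come from a different chemical space.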
Reflecting on the success of large language models (LLMs) like GPT, which have revolutionized the NLP domain, we can observe the paradigm shift that brought forth tremendous capabilities through foundational models and Transfer Learning. Moving from specialized model development to fine-tuning of general models enabled separate problems in the NLP domain to be tackled by solving the common problem of language comprehension.
In order to predict a molecular property such as bio-availability, given only a small subset of the entire molecular space, we must first learn the language of molecules. Following the lessons learned in NLP, we address this challenge with Transfer Learning: we fine-tune the foundational small-molecule model GWEN on the specific task of HOB prediction. GWEN is pretrained on a large unlabeled dataset of molecules and learns their language, enabling our fine-tuned HOB model to generalize to out-of-sample predictions.
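The fine-tuning step above can be sketched in miniature. GWEN itself is not publicly exposed here, so this hypothetical sketch uses a frozen random projection as a stand-in for the pretrained encoder; the point is only the shape of transfer learning: the backbone stays fixed while a small task head is trained on the scarce labeled HOB data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Stand-in for a frozen pretrained backbone: a fixed nonlinear map
# from raw molecular features to an embedding space. In the real
# setting this would be the pretrained GWEN encoder.
d_in, d_emb = 167, 32
W_frozen = rng.normal(size=(d_in, d_emb))  # not updated during fine-tuning

def pretrained_embed(features):
    # Frozen backbone forward pass; only the head below is trained.
    return np.tanh(features @ W_frozen)

# Small labeled HOB dataset (synthetic placeholder values).
X_raw = rng.integers(0, 2, size=(200, d_in)).astype(float)
y = rng.integers(0, 2, size=200)

# Fine-tuning reduces to fitting a lightweight classification head
# on top of the frozen embeddings.
head = LogisticRegression(max_iter=1000)
head.fit(pretrained_embed(X_raw), y)
print("train accuracy:", head.score(pretrained_embed(X_raw), y))
```

Because the backbone already encodes general molecular structure, the head needs far fewer labeled examples than training a model from scratch, which is what makes this approach attractive for data-scarce tasks like HOB.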