Compound solubility prediction in medicinal chemistry and drug discovery

8 May 2023
Svitlana Kondovych
Senior Researcher

Compound solubility is a crucial factor in medicinal chemistry and drug discovery, as it significantly influences the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of potential drug candidates. Poor solubility can limit a drug's bioavailability and ultimately lead to its failure in clinical trials. Thus, accurate prediction of compound solubility is vital for successful drug discovery [1-3].

In response to this challenge, computational, or in silico, methods [4] have emerged as an important tool for predicting and calculating compound solubility, offering a cost- and time-efficient alternative to experimental methods. They can broadly be divided into analytical [5,6] and numerical [7] methods, which can be combined to enhance the reliability of predictions [8].

Computational methods are based on a range of statistical and thermodynamic approaches (Fig. 1). The most widely used analytical methods for solubility prediction are quantitative structure-property relationships (QSPR) and thermodynamics-based methods, which calculate the solvation free energy and solve the corresponding equations. Numerical methods, in turn, comprise molecular dynamics (MD) simulations, quantum mechanics-based models, and machine learning.

Figure 1. Basic computational methods for solubility prediction.

QSPR models are built on the analysis of a large set of compounds with known solubility. In these models, mathematical equations show the relationship between the structural properties of the compounds and their solubility [5,9]. QSPR models have been applied to predict the solubility of various classes of compounds, including small molecules, peptides, and polymers. However, the accuracy of QSPR models depends on the quality and size of the training set used to develop the model.
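As a minimal illustration of the QSPR idea, the sketch below fits a straight line relating a single descriptor to log S by ordinary least squares. It is not the model used in [5] or [9]: the choice of logP as the only descriptor, the function names `fit_line` and `predict`, and all numbers are assumptions made for the example, and the training values are synthetic rather than measured.

```python
# Minimal single-descriptor QSPR sketch: fit logS = slope * descriptor + intercept
# by ordinary least squares. All numbers below are synthetic, not measurements.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training set generated from logS = -1.0 * logP + 0.5
train_logp = [0.0, 1.0, 2.0, 3.0, 4.0]
train_logs = [0.5, -0.5, -1.5, -2.5, -3.5]

model = fit_line(train_logp, train_logs)
print(predict(model, 2.5))
```

Real QSPR models use many descriptors and far larger training sets, which is where the dependence on training-set quality mentioned above becomes decisive.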

Another analytical approach uses the General Solubility Equation (GSE) or similar thermodynamics-based methods, which relate the solubility of a compound to its molecular structure and properties [6]. The GSE rests on the thermodynamic picture that solubility reflects the balance between the enthalpy of dissolution and the entropy of mixing. In its usual form, it needs only two inputs, the melting point and the octanol-water partition coefficient, to estimate the aqueous solubility of organic compounds.
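In the form commonly quoted from [6], the GSE reads log S = 0.5 - 0.01 (MP - 25) - log Kow, where S is the molar aqueous solubility, MP is the melting point in °C, and Kow is the octanol-water partition coefficient. The short sketch below evaluates this expression; the function name `gse_log_s` and the input values are illustrative assumptions, not data from the article.

```python
def gse_log_s(melting_point_c, log_kow):
    """Estimate log10 of molar aqueous solubility with the General Solubility Equation.

    log S = 0.5 - 0.01 * (MP - 25) - log Kow
    """
    # Compounds that are liquid at 25 °C carry no melting-point penalty,
    # so the melting-point term is clamped at zero.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_kow


# Hypothetical compound: melting point 125 °C, log Kow = 2.0
print(gse_log_s(125.0, 2.0))
```

The melting-point term stands in for the energy needed to break up the crystal, and log Kow for the compound's preference for a non-aqueous environment.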

Molecular mechanics-based methods, such as the generalized Born solvation model [10], are popular in drug discovery due to their simplicity and speed. These methods use empirical force fields to describe the interactions between atoms and molecules and to calculate the energy of a molecule in a given solvent environment, allowing for the prediction of the solvation free energy and, from it, the solubility.
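The step from a free energy to a solubility can be sketched with the standard relation S = c° exp(-ΔG / RT). This is a simplified sketch under stated assumptions: an ideal dilute solution, a 1 mol/L standard state, and a ΔG that covers the whole transfer from the pure solid into solution. For a crystalline compound that means the solvation free energy plus a solid-state (sublimation) contribution, as in the thermodynamic cycle of [7]; the solvation term alone is not sufficient. The function names and the input value are hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def solubility_mol_per_l(delta_g_kj_per_mol, temperature_k=298.15):
    """Solubility in mol/L from a dissolution free energy (1 mol/L standard state)."""
    return math.exp(-delta_g_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))


def log_solubility(delta_g_kj_per_mol, temperature_k=298.15):
    """log10 of the solubility in mol/L for the same free energy."""
    return -delta_g_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k * math.log(10))


# Hypothetical dissolution free energy of +20 kJ/mol at 25 °C
print(log_solubility(20.0))
```

At room temperature, each additional 5.7 kJ/mol or so of free energy lowers the predicted solubility by one log unit, so small errors in the computed energy translate into large errors in solubility.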

MD simulations analyze the molecular interactions between the solute and solvent molecules by modeling the time evolution of a molecular system [11]. They can provide detailed information on the solubility of a compound, including the thermodynamic properties of the solvation process. However, MD simulations require significant computational resources and expertise, which puts them out of reach for many medicinal chemistry research groups.

Quantum mechanics-based methods [7], such as density functional theory, quantum Monte Carlo, or the polarizable continuum model, can be more accurate than force-field approaches because they treat the electronic structure of the molecule explicitly and account for its solvent environment. However, these methods are computationally expensive and may not be practical for large-scale screening of compound libraries.

Machine learning-based methods [12,13], such as Support Vector Machines, Random Forest, or Deep Learning algorithms, are gaining popularity in drug discovery due to their ability to handle large datasets and provide accurate predictions with high throughput. These methods require a training set of experimentally measured solubility data to learn the relationship between molecular descriptors and solubility, which can then be applied to predict the solubility of new compounds.
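The train-then-predict workflow described above can be sketched without any external library. The example below uses k-nearest-neighbour regression as a deliberately simple stand-in; it is not one of the algorithms named above, which in practice come from dedicated machine learning libraries. The two descriptors, the function name `knn_predict`, and all values are invented for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a value as the mean response of the k nearest training points."""
    # Pair each training response with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy training set (synthetic): descriptors are (logP, molecular weight / 100),
# responses are log S values.
train_x = [(0.5, 1.2), (1.0, 1.5), (2.0, 2.0), (3.0, 2.8), (4.0, 3.5)]
train_y = [-0.3, -1.0, -2.1, -3.2, -4.4]

print(knn_predict(train_x, train_y, (2.0, 2.0), k=3))
```

Whatever the algorithm, the prediction can only interpolate within the chemical space covered by the training data, which is why the composition of that data matters so much.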

Overall, despite the promise of computational methods for predicting compound solubility, many issues of reliability and accuracy remain to be addressed. Chief among them is the need for accurate and diverse training sets of solubility data, since predictions depend on the quality and representativeness of these sets. Moreover, factors such as crystal packing, polymorphism, and solubility-enhancing excipients can complicate solubility prediction, which makes careful validation of computational methods against experimental data a top priority.
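Validation against experimental data usually comes down to a few error metrics computed on compounds the model has not seen. The sketch below computes two common ones, the root-mean-square error and the coefficient of determination, for predicted versus measured log S values. The function names and the numbers are hypothetical and serve only to show the calculation.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(predicted, observed):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out set: measured and predicted log S values
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.2, -1.9, -3.4, -3.8]

print(rmse(predicted, measured), r_squared(predicted, measured))
```

Reporting such metrics on a held-out set, rather than on the training data, gives a more honest picture of how a model will behave on new compounds.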

At Life Chemicals, we successfully apply both thermodynamic and kinetic HTS solubility measurement methods. This service is available on request together with an array of complementary in vitro ADMET tests and customizable quality assurance services.

Additionally, we offer an off-the-shelf collection of soluble fragment-like molecules (Fig. 2):

Please contact us for any additional information and price quotations.

Visit our Website for a detailed product description.

Download SD files with compound structures directly from our Downloads section

Custom compound selection based on specific parameters can be performed on request, with competitive pricing and the most convenient terms provided.


  1. Savjani, K. T., Gajjar, A. K., & Savjani, J. K. (2012). Drug solubility: importance and enhancement techniques. ISRN pharmaceutics, 2012, 195727. DOI: 10.5402/2012/195727
  2. Di, L., Fish, P. V., & Mano, T. (2012). Bridging solubility between drug discovery and development. Drug Discovery Today, 17(9–10), 486-495. DOI: 10.1016/j.drudis.2011.11.007
  3. Coltescu, A. R., Butnariu, M., & Sarac, I. (2020). The Importance of Solubility for New Drug Molecules. Biomed Pharmacol J, 13(2). DOI: 10.13005/bpj/1920
  4. Das, T., Mehta, C. H., & Nayak, U. Y. (2020). Multiple approaches for achieving drug solubility: an in silico perspective. Drug discovery today, 25(7), 1206-1212. DOI: 10.1016/j.drudis.2020.04.016
  5. Gao H, Shanmugasundaram V, Lee P. (2002). Estimation of aqueous solubility of organic compounds with QSPR approach. Pharm Res. 19(4):497-503. DOI: 10.1023/a:1015103914543
  6. Ran Y., Jain N., and Yalkowsky S. H.. (2001). Prediction of Aqueous Solubility of Organic Compounds by the General Solubility Equation (GSE). Journal of Chemical Information and Computer Sciences 41 (5), 1208-1217. DOI: 10.1021/ci010287z
  7. Palmer, D. S.; McDonagh, J. L.; Mitchell, J. B. O.; van Mourik, T.; Fedorov, M. V. (2012). First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. Journal of Chemical Theory and Computation. 8 (9): 3322–3337. DOI: 10.1021/ct300345m
  8. McDonagh, J. L.; Nath, N.; De Ferrari, L.; van Mourik, T.; Mitchell, J. B. O. (2014). Uniting Cheminformatics and Chemical Theory To Predict the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. Journal of Chemical Information and Modeling. 54 (3): 844–856. DOI: 10.1021/ci4005805
  9. Cheng, A., & Merz, K. M. (2003). Prediction of aqueous solubility of a diverse set of compounds using quantitative structure-property relationships. Journal of medicinal chemistry, 46(17), 3572-3580. DOI: 10.1021/jm020266b
  10. Tsui, V., & Case, D. A. (2000). Theory and applications of the generalized Born solvation model in macromolecular simulations. Biopolymers: Original Research on Biomolecules, 56(4), 275-291. DOI:;2-E
  11. Hossain, S., Kabedev, A., Parrow, A., Bergström, C. A., & Larsson, P. (2019). Molecular simulation as a computational pharmaceutics tool to predict drug solubility, solubilization processes, and partitioning. European Journal of Pharmaceutics and Biopharmaceutics, 137, 46-55. DOI: 10.1016/j.ejpb.2019.02.007
  12. Ye, Z., Ouyang, D. Prediction of small-molecule compound solubility in organic solvents by machine learning algorithms. J Cheminform 13, 98 (2021). DOI: 10.1186/s13321-021-00575-3
  13. Boobier, S., Hose, D.R.J., Blacker, A.J. et al. Machine learning with physicochemical relationships: solubility prediction in organic solvents and water. Nat Commun 11, 5753 (2020). DOI: 10.1038/s41467-020-19594-z
