Quantitative structure-activity relationship (QSAR) modeling is a computational approach that relates the chemical structure of compounds to their measured biological activities or physicochemical properties. It rests on the "similarity principle": structurally similar molecules are likely to exhibit similar biological effects. QSAR models are typically expressed as:
Activity = f(Descriptors) + error
where Activity is the biological response (e.g., inhibition constant, toxicity), Descriptors are numerical representations of molecular features, f is a mathematical function (a regression, classification, or machine-learning model), and error is the part of the response that the model does not explain.
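As a minimal sketch of the relationship above, the following fits the simplest possible f: a straight line through a single descriptor using ordinary least squares. The data values and the names `logp`, `activity`, `fit_linear_qsar`, and `predict` are invented for illustration; real QSAR work uses many descriptors and measured data.

```python
# Toy data, invented for illustration only: one descriptor (logP) per compound
# and a made-up activity value (for example, a pIC50-like number).
logp = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [5.1, 5.9, 7.2, 7.8, 9.0]


def fit_linear_qsar(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_linear_qsar(logp, activity)


def predict(x_new):
    """Apply the fitted f to a new descriptor value."""
    return slope * x_new + intercept


# The "error" term: what the fitted f leaves unexplained for each compound.
residuals = [yi - predict(xi) for xi, yi in zip(logp, activity)]

print(f"Activity = {slope:.2f} * logP + {intercept:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

Here `predict` plays the role of f and `residuals` the role of the error term; swapping the straight line for a multivariate or nonlinear model changes f but not the overall structure.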
Descriptors encode molecular features such as hydrophobicity, electronic properties, shape, and topology, and serve as the inputs for training statistical or machine-learning algorithms. A trained model can then predict key properties such as potency, selectivity, toxicity, solubility, and metabolic stability.
QSAR models can be:
1. Regression models: Predict continuous outcomes (e.g., IC₅₀, logP).
2. Classification models: Predict categorical outcomes (e.g., toxic vs. non-toxic, active vs. inactive).
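The two model types can be illustrated with one toy predictor. The sketch below assumes each compound is represented by a set of "on" fingerprint bits and uses a k-nearest-neighbour rule with Tanimoto similarity, which is only one of many possible choices and is picked here because it applies the similarity principle directly. The training data, the bit sets, and the names `TRAINING`, `nearest`, `predict_regression`, and `predict_class` are hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical training set: (fingerprint bits, continuous activity, class label).
TRAINING = [
    ({1, 2, 3, 4}, 7.5, "active"),
    ({1, 2, 3, 5}, 7.1, "active"),
    ({2, 3, 4, 6}, 6.8, "active"),
    ({7, 8, 9}, 4.2, "inactive"),
    ({7, 8, 10}, 4.6, "inactive"),
]


def nearest(query, k=3):
    """Return the k training compounds most similar to the query."""
    ranked = sorted(TRAINING, key=lambda row: tanimoto(query, row[0]), reverse=True)
    return ranked[:k]


def predict_regression(query, k=3):
    """Regression: average the continuous activity of the nearest neighbours."""
    neighbours = nearest(query, k)
    return sum(row[1] for row in neighbours) / len(neighbours)


def predict_class(query, k=3):
    """Classification: majority vote over the neighbours' class labels."""
    votes = Counter(row[2] for row in nearest(query, k))
    return votes.most_common(1)[0][0]


query = {1, 2, 3, 6}
predicted_activity = predict_regression(query)
predicted_label = predict_class(query)
print(round(predicted_activity, 2), predicted_label)
```

Both predictors share the same neighbours; only the final step differs, returning a number in the regression case and a category in the classification case.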
QSAR approaches are also commonly grouped by the descriptors and algorithms they use:
| QSAR type | Description |
|---|---|
| 2D-QSAR | Uses simple descriptors (logP, molecular weight, fingerprints) |
| 3D-QSAR | Incorporates spatial and steric fields |
| ML-based QSAR | Uses algorithms such as random forests, support vector machines (SVMs), and neural networks, often to improve predictive power |