  • Research article
  • Open access

Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries

Abstract

Background

Src plays various roles in tumour progression, invasion, metastasis, angiogenesis and survival. It is one of the multiple targets of multi-target kinase inhibitors in clinical uses and trials for the treatment of leukemia and other cancers. These successes and appearances of drug resistance in some patients have raised significant interest and efforts in discovering new Src inhibitors. Various in-silico methods have been used in some of these efforts. It is desirable to explore additional in-silico methods, particularly those capable of searching large compound libraries at high yields and reduced false-hit rates.

Results

We evaluated support vector machines (SVM) as virtual screening tools for searching Src inhibitors from large compound libraries. SVM trained and tested on 1,703 inhibitors and 63,318 putative non-inhibitors correctly identified 93.53%~95.01% of inhibitors and 99.81%~99.90% of non-inhibitors in 5-fold cross-validation studies. SVM trained on 1,703 inhibitors reported before 2011 and 63,318 putative non-inhibitors correctly identified 70.45% of the 44 inhibitors reported since 2011, and predicted as inhibitors 44,843 (0.33%) of 13.56M PubChem compounds, 1,496 (0.89%) of 168K MDDR compounds, and 719 (7.73%) of 9,305 MDDR compounds similar to the known inhibitors.

Conclusions

SVM showed comparable yield and reduced false-hit rates in searching large compound libraries compared to the similarity-based and other machine-learning VS methods developed from the same set of training compounds and molecular descriptors. We tested three virtual hits of the same novel scaffold, drawn from in-house chemical libraries and not previously reported as Src inhibitors, one of which showed moderate activity. SVM may be explored for searching Src inhibitors from large compound libraries at low false-hit rates.

Background

Src promotes tumour invasion and metastasis, facilitates VEGF-mediated angiogenesis and survival in endothelial cells, and enhances growth factor driven proliferation in fibroblasts [1]. It is one of the multiple kinase targets of a number of multi-target kinase inhibitors effective in the clinical treatment of leukemia and in clinical trials of other cancers [2–4]. The successes and problems of these inhibitors have raised significant interest and efforts in discovering new Src inhibitors [5–7]. Several in-silico methods have been used for facilitating the search and design of Src inhibitors, which include pharmacophore [8], Quantitative Structure Activity Relationship (QSAR) [9], and molecular docking [6].

While these in-silico methods have shown impressive capability in the identification of potential Src inhibitors, their applications may be affected by such problems as the vastness and sparse nature of the chemical space to be searched, complexity and flexibility of target structures, difficulties in accurately estimating binding affinity and solvation effects on molecular binding, and limited representativeness of training active compounds [10–12]. It is desirable to explore other in-silico methods that complement these methods by expanded coverage of chemical space, increased screening speed, and reduced false-hit rates without necessarily relying on the modelling of target structural flexibility, binding affinity and solvation effects.

Support vector machines (SVM) have recently been explored as a promising ligand-based virtual screening (VS) method that produces high yields and low false-hit rates in searching active agents of single and multiple mechanisms from large compound libraries [13] and in identifying active agents of diverse structures [13–17]. Good VS performance can also be achieved by SVM trained from sparsely distributed active compounds [18]. SVM classifies active compounds based on the separation of active and inactive compounds in a hyperspace constructed from their physicochemical properties rather than structural similarity to active compounds per se, which has the advantage of not relying on the accurate computation of structural flexibility, activity-related features, binding affinity and solvation effects. Moreover, the fast speed of SVM enables efficient search of vast chemical space. Therefore, SVM may be a useful VS tool to complement other in-silico methods for searching Src inhibitors from large libraries.

In this work, we developed an SVM VS model for identifying Src inhibitors, and evaluated its performance by both a 5-fold cross-validation test and a large compound database screening test. In the 5-fold cross-validation test, a dataset of Src inhibitors and non-inhibitors was randomly divided into 5 groups of approximately equal size, with 4 groups used for training an SVM VS tool and 1 group used for testing it, and the test process was repeated for all 5 possible compositions to derive an average VS performance. In the large database screening test, an SVM VS tool was developed by using Src inhibitors published before 2011. Its yield (percent of known inhibitors identified as virtual-hits) was estimated by using Src inhibitors reported since 2011 and not included in the training datasets. Its virtual-hit rate and false-hit rate in searching large libraries were evaluated by using 13.56M PubChem and 168K MDDR compounds, and an additional set of 9,305 MDDR compounds similar in structural and physicochemical properties to the known Src inhibitors.
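The 5-fold partitioning described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `five_fold_splits`, the fixed random seed and the round-robin assignment of shuffled indices are assumptions made here for the example.

```python
import random

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each fold of a random partition."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        # The other four folds together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, test

# Hypothetical toy dataset of 23 compounds (inhibitors and non-inhibitors mixed).
splits = list(five_fold_splits(23))
```

Each compound appears in exactly one test group, so averaging a performance measure over the five (train, test) pairs gives the average VS performance referred to in the text.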

Moreover, the VS performance of SVM was compared to those of two similarity-based VS methods, Tanimoto similarity searching and k nearest neighbour (kNN), and an alternative and equally popular machine learning method, the probabilistic neural network (PNN) method, based on the same training and testing datasets (same sets of PubChem and MDDR compounds) and molecular descriptors. In a study that compared the performance of SVM to 16 classification methods and 9 regression methods, it was reported that SVM shows mostly good performance on both classification and regression tasks, but other methods proved to be very competitive [19]. Therefore, it is useful to evaluate the VS performance of SVM in searching large compound libraries by comparison with those of both similarity-based approaches and another typical machine learning method.

PubChem and MDDR contain high percentages of inactive compounds significantly different from the known Src inhibitors, and the easily distinguishable features may make VS enrichments artificially good [20]. Therefore, VS performance may be more strictly tested by using subsets of compounds that resemble the physicochemical properties of the known Src inhibitors so that enrichment is not simply a separation of trivial physicochemical features [21]. To further evaluate whether our SVM VS tool predicts Src inhibitors and non-inhibitors rather than membership of certain compound families, the distribution of the predicted active and inactive compounds in the compound families was analyzed.

Materials and methods

Compound collections and construction of training and testing datasets

We collected 1,703 Src inhibitors reported before 2011, with IC50<10 μM, from the literature [22–26] and the BindingDB database [27]. The inhibitor selection criterion of IC50<10 μM was used because it covers most of the reported HTS and VS hits [28, 29]. The structures of representative Src inhibitors are shown in Figure 1. As few non-inhibitors have been reported, putative non-inhibitors were generated by using our method for generating putative inactive compounds [13, 18]. This method requires no knowledge of known inactive compounds and active compounds of other target classes, which enables more expanded coverage of the “non-inhibitor” chemical space. Although the yet-to-be-discovered inhibitors are likely distributed in some of these “non-inhibitor” families, a substantial percentage of these inhibitors are expected to be identified as inhibitors rather than non-inhibitors even though representatives of their families are putatively assigned as non-inhibitors [13]. The 13.56M PubChem and 168K MDDR compounds were grouped into 8,423 compound families by clustering them in the chemical space defined by their molecular descriptors [30, 31]. The number of generated families is consistent with the 12,800 compound-occupying neurons (regions of topologically close structures) for 26.4 million compounds of up to 11 atoms [32], and the 2,851 clusters for 171,045 natural products [33].

Figure 1

The structures of representative c-Src inhibitors.

Our collected Src inhibitors are distributed in 493 families. Because of the extensive efforts in searching kinase inhibitors from known compound libraries, the number of undiscovered Src inhibitor families in the PubChem and MDDR databases is expected to be relatively small, most likely no more than several hundred families. The ratio of the discovered and undiscovered inhibitor families (hundreds) to the families that contain no known Src inhibitor (8,423 based on the current versions of PubChem and MDDR) is expected to be <15%. Therefore, a putative non-inhibitor training dataset can be generated by extracting a few representative compounds from each of those families that contain no known inhibitor, with a maximum possible “wrong” classification rate of <15% even when all of the undiscovered inhibitors are misplaced into the non-inhibitor class. The noise level generated by up to 15% “wrong” negative family representation is expected to be substantially smaller than the maximum 50% false-negative noise level tolerated by SVM [16]. Based on earlier studies [13, 18] and this work, it is expected that a substantial percentage of the undiscovered inhibitors in the putative “non-inhibitor” families can be classified as inhibitors even though their family representatives are placed into the non-inhibitor training sets.

In the database screening test, 60.1% of the families that contain Src inhibitors reported since 2011 [34–39] are not covered by the Src inhibitor training dataset (inhibitors reported before 2011). The representative compounds of these families, none of which happens to be a Src inhibitor, were deliberately placed into the inactive training sets because the inhibitors in these families are not supposed to be known in our study. As shown in earlier studies [13, 18] and in this work, a substantial percentage of the inhibitors in these misplaced inhibitor-containing “non-inhibitor” families were predicted as inhibitors by our SVM VS tool. Moreover, although a small percentage of the compounds in these putative non-inhibitor datasets are expected to be unreported and undiscovered inhibitors, their presence in these datasets is not expected to significantly affect the estimated false-hit rate of SVM.

Molecular descriptors

Molecular descriptors are quantitative representations of structural and physicochemical features of molecules, which have been extensively used in deriving structure-activity relationships [40, 41], quantitative structure activity relationships [42, 43] and VS tools [44–51]. A total of 98 1D and 2D descriptors derived by using our software [52] were used in this work. These descriptors and the relevant references are given in Table 1. They include 18 descriptors in the class of simple molecular properties, 3 descriptors in the class of chemical properties, 35 descriptors in the class of molecular connectivity and shape, and 42 descriptors in the class of electro-topological state.

Table 1 Molecular descriptors used in this work

Support vector machines method

The process of training and using an SVM VS model for screening compounds based on their molecular descriptors is schematically illustrated in Figure 2. SVM is based on the structural risk minimization principle of statistical learning theory [57, 58]. It consistently shows outstanding classification performance, is less penalized by sample redundancy, and has lower risk of over-fitting [59, 60]. In linearly separable cases, SVM constructs a hyper-plane to separate active and inactive classes of compounds with a maximum margin. A compound is represented by a vector $\mathbf{x}_i$ composed of its molecular descriptors. The hyper-plane is constructed by finding another vector $\mathbf{w}$ and a parameter $b$ that minimize $\lVert \mathbf{w} \rVert^2$ and satisfy the following conditions:

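$$\mathbf{w} \cdot \mathbf{x}_i + b \ge +1, \quad \text{for } y_i = +1 \ \text{(Class 1, active)} \tag{1}$$

$$\mathbf{w} \cdot \mathbf{x}_i + b \le -1, \quad \text{for } y_i = -1 \ \text{(Class 2, inactive)} \tag{2}$$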

where $y_i$ is the class index, $\mathbf{w}$ is a vector normal to the hyperplane, $|b| / \lVert \mathbf{w} \rVert$ is the perpendicular distance from the hyperplane to the origin and $\lVert \mathbf{w} \rVert^2$ is the squared Euclidean norm of $\mathbf{w}$. Based on $\mathbf{w}$ and $b$, a given vector $\mathbf{x}$ can be classified by $f(\mathbf{x}) = \mathrm{sign}[(\mathbf{w} \cdot \mathbf{x}) + b]$. A positive or negative $f(\mathbf{x})$ value indicates that the vector $\mathbf{x}$ belongs to the active or inactive class respectively.
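As a minimal illustration of the linear case, the following Python sketch evaluates the decision function and the margin constraints of Eqs. 1 and 2 for a given hyperplane. The function names and the toy values of w and b are hypothetical and chosen only for the example; no training is performed here.

```python
def decision_value(w, b, x):
    """Return w . x + b for a descriptor vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 (active) or -1 (inactive) from the sign of the decision value."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def satisfies_margin(w, b, x, y):
    """Check y * (w . x + b) >= 1, which combines Eqs. 1 and 2 into one condition."""
    return y * decision_value(w, b, x) >= 1

# Hypothetical hyperplane in a two-descriptor space.
w, b = (1.0, -1.0), 0.0
label_a = classify(w, b, (2.0, 0.0))   # decision value  2.0
label_b = classify(w, b, (0.0, 3.0))   # decision value -3.0
```

A training compound that lies inside the margin, such as (0.5, 0.0) with class +1 in this toy setting, fails `satisfies_margin` even though it is classified on the correct side.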

Figure 2

The process of training and using an SVM VS model for screening compounds. Schematic diagram illustrating the process of training a prediction model and using it for predicting active compounds of a compound class from their structurally-derived properties (molecular descriptors) by using support vector machines. A, B, E, F and (hj, pj, vj,…) represent such structural and physicochemical properties as hydrophobicity, volume, polarizability, etc.

In nonlinearly separable cases, which almost always occur in classifying compounds of diverse structures [14–17, 50, 61–63], SVM maps the input vectors into a higher dimensional feature space by using a kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$. We used the RBF kernel $K(\mathbf{x}_i, \mathbf{x}_j) = e^{-\lVert \mathbf{x}_j - \mathbf{x}_i \rVert^2 / (2\sigma^2)}$, which has been extensively used and has consistently shown better performance than other kernel functions [64–66]. Linear SVM can then be applied to this feature space based on the following decision function: $f(\mathbf{x}) = \mathrm{sign}\left(\sum_{i=1}^{l} \alpha_i^0 y_i K(\mathbf{x}, \mathbf{x}_i) + b\right)$, where the coefficients $\alpha_i^0$ and $b$ are determined by maximizing the following Lagrangian expression: $\sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$ under the conditions $\alpha_i \ge 0$ and $\sum_{i=1}^{l} \alpha_i y_i = 0$. A positive or negative $f(\mathbf{x})$ value indicates that the vector $\mathbf{x}$ belongs to the active or inactive class respectively.
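The kernelized decision function can be sketched as follows. This is an illustrative sketch only: the support vectors, labels, coefficients and bias are hypothetical toy values standing in for the result of the Lagrangian maximization, which is not performed here.

```python
import math

def rbf_kernel(xi, xj, sigma):
    """RBF kernel exp(-||xj - xi||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, labels, alphas, b, sigma):
    """Return the sign of sum_i alpha_i * y_i * K(x, x_i) + b as +1 or -1."""
    total = sum(
        alpha * y * rbf_kernel(x, sv, sigma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if total + b >= 0 else -1

# Hypothetical toy model; the alphas satisfy sum_i alpha_i * y_i = 0.
support_vectors = [(0.0, 0.0), (3.0, 3.0)]
labels = [1, -1]
alphas = [1.0, 1.0]
bias = 0.0
sigma = 1.2  # kernel width reported in this work

near_active = svm_decision((0.5, 0.5), support_vectors, labels, alphas, bias, sigma)
near_inactive = svm_decision((2.5, 3.0), support_vectors, labels, alphas, bias, sigma)
```

Because the kernel value decays with distance, each query vector is dominated by the support vectors closest to it, and σ controls how quickly that influence falls off.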

In developing our SVM VS tool, a hard-margin setting of c=100,000 was used. The margin parameter c is a penalty parameter that controls the trade-off between the training errors and sample separation. Increasing c imposes a higher penalty for training errors, and our chosen value corresponds to a very high penalty. The performance of SVM was evaluated by a 5-fold cross-validation test. Table 2 shows the results of the 5-fold cross-validation of SVM VS models of Src inhibitors and putative non-inhibitors. Based on the average VS performance in the 5-fold cross-validation, a σ value of 1.2 was selected for model development. The performance indicators can be derived from the numbers of true positives TP (true inhibitors), true negatives TN (true non-inhibitors), false positives FP (false inhibitors), and false negatives FN (false non-inhibitors). Src inhibitor and non-inhibitor prediction accuracies are given by sensitivity SE=TP/(TP+FN)*100 and specificity SP=TN/(TN+FP)*100 respectively. Prediction accuracies have also been frequently measured by overall prediction accuracy (Q) and Matthews correlation coefficient (C) [67]:

$$Q = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$C = \frac{TP \cdot TN - FN \cdot FP}{\sqrt{(TP + FN)(TP + FP)(TN + FN)(TN + FP)}} \tag{4}$$

In the large database screening tests, the yield and false-hit rate are given by TP/(TP+FN) and FP/(TP+FP) respectively.
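The performance indicators defined above translate directly into code. The sketch below is illustrative: the function names and the toy confusion-matrix counts are assumptions for the example and are not results from this study. Q is returned as a fraction, following Eq. 3, whereas SE and SP are returned in percent.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Return SE and SP in percent, Q as a fraction, and the Matthews coefficient C."""
    se = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    q = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    # C is undefined when a row or column of the confusion matrix is empty;
    # returning 0.0 in that case is a convention assumed for this sketch.
    c = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "Q": q, "C": c}

def screening_rates(tp, fp, fn):
    """Return the yield TP/(TP+FN) and the false-hit rate FP/(TP+FP)."""
    return {"yield": tp / (tp + fn), "false_hit_rate": fp / (tp + fp)}

# Hypothetical counts, not taken from Tables 2 to 5.
toy = classification_metrics(tp=90, tn=980, fp=20, fn=10)
rates = screening_rates(tp=90, fp=20, fn=10)
```

Note that the false-hit rate is normalised by the number of virtual hits (TP+FP), not by the number of non-inhibitors, so it is not simply 100% minus the specificity.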

Table 2 Performance of SVM for identifying Src inhibitors and non-inhibitors evaluated by 5-fold cross validation study

Tanimoto similarity searching method

Compounds similar to at least one known Src inhibitor in a training dataset can be identified by using the Tanimoto coefficient sim(i,j) [68]:

$$\mathrm{sim}(i,j) = \frac{\sum_{d=1}^{l} x_{di} x_{dj}}{\sum_{d=1}^{l} x_{di}^2 + \sum_{d=1}^{l} x_{dj}^2 - \sum_{d=1}^{l} x_{di} x_{dj}} \tag{5}$$

where l is the number of molecular descriptors. A compound i is considered to be similar to a known active j in the active dataset if the corresponding sim(i,j) value is greater than a cut-off value. In this work, the similarity search was conducted for MDDR compounds. Therefore, in computing sim(i,j), the molecular descriptor vectors $\mathbf{x}_i$ were scaled with respect to all of the MDDR compounds. The cut-off values for identifying similar compounds are typically in the range of 0.8 to 0.9 [21, 69]. The stricter cut-off value of 0.9 was used in this study.
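A minimal sketch of the similarity search follows. The function names and the toy descriptor vectors are hypothetical, and the descriptor scaling step mentioned in the text is assumed to have been applied beforehand.

```python
def tanimoto(xi, xj):
    """Tanimoto coefficient of two descriptor vectors (Eq. 5)."""
    dot = sum(a * b for a, b in zip(xi, xj))
    return dot / (sum(a * a for a in xi) + sum(b * b for b in xj) - dot)

def max_similarity(compound, actives):
    """Return the highest Tanimoto coefficient between a compound and any known active."""
    return max(tanimoto(compound, active) for active in actives)

# Hypothetical scaled descriptor vectors.
actives = [(1.0, 2.0, 2.9), (3.0, 0.0, 0.0)]
query = (1.0, 2.0, 3.0)
best = max_similarity(query, actives)
is_virtual_hit = best >= 0.9  # 0.9 cut-off used in this study
```

Taking the maximum over all known actives implements the "similar to at least one known Src inhibitor" criterion.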

K-nearest neighbour method

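kNN measures the Euclidean distance $D = \sqrt{\lVert \mathbf{x} - \mathbf{x}_i \rVert^2}$ between a compound $\mathbf{x}$ and each individual inhibitor or non-inhibitor $\mathbf{x}_i$ in the training set [70]. A total of k vectors nearest to the vector $\mathbf{x}$ are used to determine the decision function $f(\mathbf{x})$:

$$\hat{f}(\mathbf{x}) \leftarrow \underset{v \in V}{\arg\max} \sum_{i=1}^{k} \delta(v, f(\mathbf{x}_i)) \tag{6}$$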

where δ(a,b)=1 if a=b and δ(a,b)=0 if a≠b, argmax denotes the argument v that maximizes the sum, V is the finite set of possible classes {v1,…,vs} and $\hat{f}(\mathbf{x})$ is an estimate of $f(\mathbf{x})$. Here the estimate refers to the class of the majority compound group (i.e. inhibitors or non-inhibitors) among the k nearest neighbours. The performance of kNN was evaluated by 5-fold cross-validation in the same manner as for SVM, and Table 3 shows the 5-fold cross-validation results of the kNN model. After the 5-fold cross-validation, the parameter k=1 was found to give the best performance in this work.
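The majority vote of Eq. 6 can be sketched as follows. The function name and the toy training set are hypothetical; ties are resolved in favour of the class of the nearest neighbour, which is an assumption of this sketch rather than something stated in the text.

```python
import math
from collections import Counter

def knn_classify(x, training_vectors, training_labels, k=1):
    """Return the majority class among the k training vectors nearest to x."""
    distances = sorted(
        (math.dist(x, xi), label)
        for xi, label in zip(training_vectors, training_labels)
    )
    # Count the classes of the k nearest neighbours (the sum over delta in Eq. 6).
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-descriptor training set: +1 = inhibitor, -1 = non-inhibitor.
training_vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
training_labels = [1, 1, -1, -1]

pred_k1 = knn_classify((0.2, 0.3), training_vectors, training_labels, k=1)
pred_k3 = knn_classify((5.0, 4.0), training_vectors, training_labels, k=3)
```

With k=1, the setting selected in this work, the prediction is simply the class of the single nearest training compound.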

Table 3 Performance of kNN for identifying Src inhibitors and non-inhibitors evaluated by 5-fold cross validation study

Probabilistic neural network method

PNN is a form of neural network that classifies objects based on Bayes’ optimal decision rule [71], $h_i c_i f_i(\mathbf{x}) > h_j c_j f_j(\mathbf{x})$, where $h_i$ and $h_j$ are the prior probabilities, $c_i$ and $c_j$ are the costs of misclassification and $f_i(\mathbf{x})$ and $f_j(\mathbf{x})$ are the probability density functions for classes i and j respectively. A compound $\mathbf{x}$ is classified into class i if the product of all three terms is greater for class i than for any other class j (not equal to i). In most applications, the prior probabilities and costs of misclassification are treated as being equal. The probability density function for each class in the univariate case can be estimated by using Parzen’s nonparametric estimator [72]:

$$g(x) = \frac{1}{n\sigma} \sum_{i=1}^{n} W\left(\frac{x - x_i}{\sigma}\right) \tag{7}$$

where n is the sample size, σ is a scaling parameter which defines the width of the bell curve that surrounds each sample point, W(d) is a weight function which has its largest value at d = 0 and $(x - x_i)$ is the distance between the unknown vector and a vector in the training set. Parzen’s nonparametric estimator was later expanded by Cacoullos [73] for the multivariate case:

$$g(x_1, \ldots, x_p) = \frac{1}{n \sigma_1 \cdots \sigma_p} \sum_{i=1}^{n} W\left(\frac{x_1 - x_{1,i}}{\sigma_1}, \ldots, \frac{x_p - x_{p,i}}{\sigma_p}\right) \tag{8}$$

The Gaussian function is frequently used as the weight function because it is well behaved, easily calculated and satisfies the conditions required by Parzen’s estimator. Thus the probability density function for the multivariate case becomes

$$g(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \exp\left(-\sum_{j=1}^{p} \left(\frac{x_j - x_{ij}}{\sigma_j}\right)^2\right) \tag{9}$$

The network architecture of a PNN is determined by the number of compounds and descriptors in the training set. There are 4 layers in a PNN. The input layer provides input values to all neurons in the pattern layer and has as many neurons as the number of descriptors in the training set. The number of pattern neurons is determined by the total number of compounds in the training set. Each pattern neuron computes a distance measure between the input and the training case represented by that neuron and then subjects the distance measure to Parzen’s nonparametric estimator. The summation layer has a neuron for each class, and each of these neurons sums the outputs of all the pattern neurons belonging to its class to obtain the estimated probability density function for that class. The single neuron in the output layer then estimates the class of the unknown compound $\mathbf{x}$ by comparing the probability density functions from the summation neurons and choosing the class with the highest value. The performance of PNN was validated by 5-fold cross-validation in the same manner as in SVM model development. Table 4 shows the results of the 5-fold cross-validation of the PNN model. After the 5-fold cross-validation, the parameter of the developed PNN models was chosen as 0.02.
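The four-layer procedure can be condensed into a short sketch. It assumes equal prior probabilities and misclassification costs, as described above, and, as a simplification made for this example, a single scaling parameter σ shared by all descriptors instead of one σ per descriptor as in Eq. 9. The function names and toy data are hypothetical.

```python
import math

def parzen_density(x, class_vectors, sigma):
    """Gaussian Parzen estimate of the class density at x (Eq. 9 with one shared sigma)."""
    total = 0.0
    for xi in class_vectors:  # one pattern neuron per training compound
        exponent = sum(((a - b) / sigma) ** 2 for a, b in zip(x, xi))
        total += math.exp(-exponent)
    return total / len(class_vectors)  # summation neuron for this class

def pnn_classify(x, vectors_by_class, sigma):
    """Output neuron: choose the class with the highest estimated density."""
    densities = {
        label: parzen_density(x, vectors, sigma)
        for label, vectors in vectors_by_class.items()
    }
    return max(densities, key=densities.get)

# Hypothetical two-descriptor training data.
training = {
    "inhibitor": [(0.0, 0.0), (0.1, 0.1)],
    "non-inhibitor": [(1.0, 1.0), (1.2, 0.9)],
}
pred_low = pnn_classify((0.05, 0.0), training, sigma=0.5)
pred_high = pnn_classify((1.1, 1.0), training, sigma=0.5)
```

A smaller σ makes each training compound influence only its immediate neighbourhood, so the classifier behaves more like a nearest-neighbour method as σ decreases.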

Table 4 Performance of PNN for identifying Src inhibitors and non-inhibitors evaluated by 5-fold cross validation study

Results and discussion

Performance of SVM, kNN and PNN identification of Src inhibitors based on 5-fold cross validation test

The parameters of our SVM, kNN and PNN models were determined by 5-fold cross-validation studies of Src inhibitors and non-inhibitors. The results of these tests for SVM, kNN and PNN are shown in Tables 2, 3 and 4 respectively, and in Figure 3. Overall, the sensitivity of SVM, kNN and PNN is in the range of 93.53%~95.01%, 88.56%~92.94% and 93.53%~97.06%, the specificity in the range of 99.81%~99.90%, 99.57%~99.77% and 97.76%~98.03%, and the overall accuracy Q in the range of 99.67%~99.76%, 99.35%~99.48% and 97.69%~97.91% respectively. The inhibitor accuracies of our SVM are comparable to or better than the reported accuracies of 58.3%~67.3% for protein kinase C inhibitors by SVM-RBF and CKD methods [74], 83% for Lck inhibitors by the SVM method [75], and 74%~87% for inhibitors of any of 8 kinases (3 Ser/Thr and 5 Tyr kinases) by SVM, ANN, GA/kNN, and RP methods [76]. The non-inhibitor accuracies are comparable to the value of 99.9% for Lck inhibitors [75] and substantially better than the typical values of 77%~96% of other studies [74, 76]. Caution needs to be exercised in comparing these results directly, which might be misleading because the outcome of VS strongly depends on the datasets and molecular descriptors used. Based on these rough comparisons, SVM appears to show good capability in identifying Src inhibitors at low false-hit rates.

Figure 3

Performance for identifying Src inhibitors evaluated by 5-fold cross validation study across methods. The figure illustrates the 5-fold cross-validation studies of Src inhibitors across methods, showing the averaged sensitivity together with the respective error bars.

Virtual screening performance of SVM in searching Src inhibitors from large compound libraries

As outlined in the methods section, an SVM VS tool for searching Src inhibitors from large compound libraries was developed by using Src inhibitors reported before 2011. The VS performance of SVM in identifying Src inhibitors reported since 2011 and in searching the MDDR and PubChem databases is summarised in Table 5. The yield in searching Src inhibitors reported since 2011 is 70.45%, which is comparable to the reported 50%~94% yields of various VS tools [77]. Strictly speaking, direct comparison of the reported performances of these VS tools is inappropriate because of the differences in the type, composition and diversity of compounds screened, and in the molecular descriptors, VS tools and their parameters used. The comparison cannot go beyond the statistics of accuracies.

Table 5 Virtual screening performance of support vector machines for identifying Src inhibitors from large compound libraries

We also evaluated virtual-hit rates and false-hit rates of SVM in screening compounds that resemble the structural and physicochemical properties of the known Src inhibitors by using 9,305 MDDR compounds similar to a Src inhibitor in the training dataset. Similarity was defined by a Tanimoto similarity coefficient ≥0.9 between an MDDR compound and its closest inhibitor [18]. This stricter similarity metric was used for conducting a stricter test of our SVM model. SVM identified 719 virtual-hits from these 9,305 MDDR similarity compounds (virtual-hit rate 7.73%), which suggests that SVM has some level of capability in distinguishing Src inhibitors from non-inhibitor similarity compounds. Significantly lower virtual-hit rates and thus false-hit rates were found in screening the large libraries of 168K MDDR and 13.56M PubChem compounds. The number of virtual-hits and the virtual-hit rate in screening 168K MDDR compounds are 1,496 and 0.89% respectively. The number of virtual-hits and the virtual-hit rate in screening 13.56M PubChem compounds are 44,843 and 0.33% respectively.
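The quoted virtual-hit rates follow from the hit counts and library sizes given above. As a quick arithmetic check, taking the rounded library sizes of 168K and 13.56M at face value (the exact compound counts are not stated here):

```latex
\begin{aligned}
\text{MDDR similarity compounds:}\quad & \frac{719}{9{,}305} \approx 0.0773 = 7.73\% \\
\text{MDDR:}\quad & \frac{1{,}496}{168{,}000} \approx 0.0089 = 0.89\% \\
\text{PubChem:}\quad & \frac{44{,}843}{13{,}560{,}000} \approx 0.0033 = 0.33\%
\end{aligned}
```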

Substantial percentages of the MDDR virtual-hits belong to the classes of antineoplastic, tyrosine-specific protein kinase inhibitors, signal transduction inhibitors, antiangiogenic, and antiarthritic (Table 6, details in next section). As some of these virtual-hits may be true Src inhibitors, the false-hit rate of our SVM is at most equal to and likely less than the virtual-hit rate. Hence the false-hit rate is <7.73% in screening 9,305 MDDR similarity compounds, <0.89% in screening 168K MDDR compounds, and <0.33% in screening 13.56M PubChem compounds. These rates are comparable to, and in some cases better than, the reported false-hit rates of 0.0054%~8.3% of SVM [18, 78], 0.08%~3% of structure-based methods, 0.1%~5% by other machine learning methods, 0.16%~8.2% by clustering methods, and 1.15%~26% by pharmacophore models [77].

Table 6 MDDR classes that contain higher percentage (≥3%) of SVM virtual-hits and the percentage values

Experimental test of an SVM-identified virtual hit

Three virtual hits sharing the same novel scaffold, selected from in-house libraries and not found among the known Src inhibitors, were evaluated for inhibitory activity against Src. Src kinase was incubated with substrates, compounds and ATP in a final buffer of 25 mM HEPES (pH 7.4), 10 mM MgCl2, 0.01% Triton X-100, 100 μg/mL BSA and 2.5 mM DTT in a 384-well plate with a total volume of 10 μl. The assay plate was incubated at 30°C for 1 h and the reaction was stopped by the addition of an equal volume of Kinase-Glo Plus reagent. The luminescence was read on an Envision plate reader. The signal was correlated with the amount of ATP present in the reaction and was inversely correlated with the kinase activity. One of the three virtual hits, shown in Figure 4, was found to inhibit Src at a moderate rate of 4.85% at 20 μM.

Figure 4

Virtual hit inhibiting Src at a moderate rate of 4.85% at 20 μM.

Evaluation of SVM identified MDDR virtual-hits

SVM-identified MDDR virtual-hits were evaluated based on the known biological or therapeutic target classes specified in MDDR. Table 6 gives the MDDR classes that contain higher percentages (≥3%) of SVM virtual-hits and the percentage values. We found that 623 (41.6%) of the 1,496 virtual-hits belong to the antineoplastic class, which represents 2.9% of the 21,557 MDDR compounds in the class. In particular, 231 (15.4%) of the virtual-hits belong to the tyrosine-specific protein kinase inhibitor class, which represents 19.6% of the 1,181 MDDR compounds in the class. Moreover, 194 (13.0%) and 75 (5.0%) of the virtual-hits belong to the signal transduction inhibitor and antiangiogenic classes, representing 9.5% and 4.6% of the 2,037 and 1,629 members in these classes respectively. Therefore, many of the SVM virtual-hits are antineoplastic compounds that inhibit tyrosine kinases and possibly other kinases involved in signal transduction and angiogenesis pathways. While some of these kinase inhibitors might be true Src inhibitors, a significant percentage of them are expected to arise from false selection of inhibitors of other kinases.
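The two percentages quoted for each class in the paragraph above have different denominators: the share of all 1,496 virtual-hits that fall in the class, and the share of the class that was selected as virtual-hits. The short sketch below recomputes both from the counts given in the text; the variable names are chosen here for illustration.

```python
TOTAL_HITS = 1496  # SVM virtual-hits among the MDDR compounds

# class name -> (virtual-hits in class, MDDR compounds in class), as quoted in the text
class_counts = {
    "antineoplastic": (623, 21557),
    "tyrosine-specific protein kinase inhibitor": (231, 1181),
    "signal transduction inhibitor": (194, 2037),
    "antiangiogenic": (75, 1629),
}

# For each class: (percent of all virtual-hits, percent of the class selected)
shares = {
    name: (round(100 * hits / TOTAL_HITS, 1), round(100 * hits / size, 1))
    for name, (hits, size) in class_counts.items()
}
```

The tyrosine-specific protein kinase inhibitor class stands out in the second measure: nearly a fifth of its members are selected, compared with under 3% of the much larger antineoplastic class.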

A total of 176 (11.8%) SVM virtual-hits belong to the antiarthritic class. A primary feature of rheumatoid arthritis in synovial tissues is the abnormal stimulation of fibrin deposition, angiogenesis and proinflammatory processes, which are promoted by thrombin-increased IL-6 production via the PAR1 receptor/PI-PLC/PKC alpha/c-Src/NF-kappaB and p300 signaling pathways [79]. Therefore, Src inhibitors may have some effects against arthritis via interference with some of these processes. Moreover, several other kinases have been implicated in arthritis. The Abl inhibitor Gleevec has been reported to be effective in the treatment of arthritis, which is probably due to its inhibition of other related kinases such as c-kit and PDGFR [80]. EGFR-like receptor stimulates synovial cells and its elevated activities may be involved in the pathogenesis of rheumatoid arthritis [78]. VEGF has been related to such autoimmune diseases as systemic lupus erythematosus, rheumatoid arthritis, and multiple sclerosis [81]. FGFR may partly mediate osteoarthritis [82]. PDGF-like factors stimulate the proliferative and invasive phenotype of rheumatoid arthritis synovial connective tissue cells [83]. Lck inhibition leads to immunosuppression and has been explored for the treatment of rheumatoid arthritis and asthma [84]. Therefore, some of the SVM virtual-hits in the antiarthritic class may be inhibitors of these kinases or of kinase-like proteins capable of producing antiarthritic activities.

Moreover, 83 (5.5%), 76 (5.1%), 55 (3.7%) and 49 (3.3%) of the SVM virtual hits are in the antiallergic/antiasthmatic, antihypertensive, osteoporosis treatment and antidepressant classes respectively. Src or Src family kinases have been implicated in these diseases, and the respective inhibitors have shown observable effects against them. For instance, Src family kinases and lipid mediators have been found to partly control allergic inflammation [85]. Inhibition of Src family kinase-dependent signaling cascades in mast cells may exert anti-allergic activity [86]. Up-regulation of Src signaling has been suggested to be important in the profibrotic and proinflammatory actions of aldosterone in a genetic model of hypertension, which can be significantly reduced by mineralocorticoid receptor blocker and Src inhibitor [87]. Src signalling pathways play critical roles in osteoclasts and osteoblasts, and Src inhibitors have been developed as therapeutic agents for bone diseases [88, 89]. Src-family protein tyrosine kinases negatively regulate cerebellar long-term depression, which can be recovered by the application of Src-family protein tyrosine kinase inhibitors [90]. Therefore, some of the SVM virtual hits in these four MDDR classes may be Src inhibitors or Src family kinase inhibitors capable of regulating allergic inflammation, hypertension, osteoporosis and depression respectively.

Comparison of virtual screening performance of SVM with those of other virtual screening methods

To evaluate the level of performance of SVM and whether the performance is due to the SVM classification models or to the molecular descriptors used, SVM results were compared with those of three other VS methods based on the same molecular descriptors, training dataset of Src inhibitors reported before 2011, and testing dataset of Src inhibitors reported since 2011 and 168K MDDR compounds. The three other VS methods include two similarity-based methods, Tanimoto-based similarity searching and kNN, and an alternative machine learning method, PNN. As shown in Table 7, the yield and maximum possible false-hit rate of the Tanimoto-based similarity searching, kNN and PNN methods are 36.84% and 5.54%, 38.64% and 2.49%, and 50.00% and 2.60% respectively. Compared to these results, the yield of SVM is better than those of the similarity-based VS methods, and the false-hit rate of SVM is reduced by 6.22, 2.80, and 2.92 fold respectively. This suggests that SVM performance is due primarily to the SVM classification models rather than the molecular descriptors used, and that SVM is capable of achieving comparable yield at substantially reduced false-hit rate as compared to both the similarity-based approaches and the alternative machine learning method. Our results are consistent with the report that SVM shows mostly good performance on both classification and regression tasks, but other classification and regression methods proved to be very competitive [19].
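The fold reductions quoted above are the ratios of each method's maximum possible false-hit rate to the corresponding SVM value of 0.89% for the 168K MDDR compounds:

```latex
\begin{aligned}
\text{Tanimoto similarity searching:}\quad & \frac{5.54\%}{0.89\%} \approx 6.22 \\
\text{kNN:}\quad & \frac{2.49\%}{0.89\%} \approx 2.80 \\
\text{PNN:}\quad & \frac{2.60\%}{0.89\%} \approx 2.92
\end{aligned}
```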

Table 7 Comparison of virtual screening performance of SVM with those of other methods
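
The fold reductions quoted in this comparison follow directly from the false-hit rates. The short sketch below reproduces that arithmetic, assuming an SVM false-hit rate of 0.89% on the 168K MDDR compounds (the figure given in the abstract). The function and variable names are illustrative only and are not part of the original study.

```python
# Maximum possible false-hit rates (%) on the 168K MDDR compounds,
# as quoted in the text for the three comparison methods.
other_methods = {
    "Tanimoto similarity": 5.54,
    "kNN": 2.49,
    "PNN": 2.60,
}

# Assumption: SVM false-hit rate taken from the abstract
# (1,496 of 168K MDDR compounds predicted as inhibitors, 0.89%).
svm_false_hit_rate = 0.89


def fold_reduction(other_rate, svm_rate):
    """Return how many times lower the SVM false-hit rate is."""
    return other_rate / svm_rate


for name, rate in other_methods.items():
    fold = fold_reduction(rate, svm_false_hit_rate)
    print(f"{name}: {fold:.2f}-fold lower false-hit rate with SVM")
```

Rounded to two decimals, the three ratios match the 6.22, 2.80 and 2.92 fold values stated in the text.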

Does SVM select Src inhibitors or membership of compound families?

To further evaluate whether SVM identifies Src inhibitors rather than membership of certain compound families, the compound family distribution of the identified Src inhibitors and non-inhibitors was analyzed. Of the identified inhibitors, 34.1% belong to families that contain no known Src inhibitors. For those families that contain at least one known Src inhibitor, >70% of the compounds (>90% in the majority of cases) in each of these families were predicted as non-inhibitors by SVM. These results suggest that SVM identifies Src inhibitors rather than membership of certain compound families. Some of the identified inhibitors that are not in the families of known inhibitors may serve as potential “novel” Src inhibitors. Therefore, as shown by earlier studies [13], SVM has a certain capacity for identifying novel active compounds from sparse as well as regular-sized active datasets.
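
The two statistics used in this family analysis can be sketched as follows. This is a minimal illustration on made-up data, assuming each compound carries a family label and a binary SVM prediction. The function names and the toy family labels are hypothetical and do not reproduce the clustering or the data used in the study.

```python
from collections import Counter


def novel_family_fraction(hit_families, known_families):
    """Fraction of predicted inhibitors whose family holds no known inhibitor."""
    if not hit_families:
        return 0.0
    novel = sum(1 for fam in hit_families if fam not in known_families)
    return novel / len(hit_families)


def non_inhibitor_fraction_by_family(compound_families, predictions, known_families):
    """For each family with a known inhibitor, the fraction of its
    members that were predicted as non-inhibitors."""
    totals, negatives = Counter(), Counter()
    for fam, is_hit in zip(compound_families, predictions):
        if fam in known_families:
            totals[fam] += 1
            if not is_hit:
                negatives[fam] += 1
    return {fam: negatives[fam] / totals[fam] for fam in totals}


# Toy data (hypothetical): family label and SVM prediction per compound.
families = ["A", "A", "A", "A", "B", "B", "C", "C"]
predicted = [True, False, False, False, False, True, True, False]
known = {"A", "B"}  # families containing at least one known inhibitor

hits = [fam for fam, is_hit in zip(families, predicted) if is_hit]
print(novel_family_fraction(hits, known))
print(non_inhibitor_fraction_by_family(families, predicted, known))
```

A high value of the first statistic, together with high per-family non-inhibitor fractions, is the pattern the text interprets as selection of inhibitors rather than of family membership.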

Conclusions

Our study suggests that SVM is capable of identifying Src inhibitors at a yield comparable to, and in many cases a false-hit rate substantially lower than, those of typical VS tools reported in the literature. It can be used for searching large compound libraries of sizes comparable to the 13.56 M PubChem and 168 K MDDR compounds at low false-hit rates. The performance of SVM is substantially better than that of several other VS methods based on the same datasets and molecular descriptors, suggesting that the VS performance of SVM is primarily due to the SVM classification models rather than the molecular descriptors used. Three SVM virtual hits of the same novel scaffold were experimentally tested, one of which showed a moderate Src inhibition rate. Because of its high computing speed and its generalization capability for covering a highly diverse spectrum of compounds, SVM can potentially be explored to develop useful VS tools that complement other VS methods or that are used as part of integrated VS tools in facilitating the discovery of Src inhibitors and other active compounds [91–93].

References

  1. Brunton VG, Frame MC: Src and focal adhesion kinase as therapeutic targets in cancer. Curr Opin Pharmacol. 2008, 8: 427-432. 10.1016/j.coph.2008.06.012.

  2. Gill AL, Verdonk M, Boyle RG, Taylor R: A comparison of physicochemical property profiles of marketed oral drugs and orally bioavailable anti-cancer protein kinase inhibitors in clinical development. Curr Top Med Chem. 2007, 7: 1408-1422. 10.2174/156802607781696819.

  3. Lee D, Gautschi O: Clinical development of SRC tyrosine kinase inhibitors in lung cancer. Clin Lung Cancer. 2006, 7: 381-384. 10.3816/CLC.2006.n.020.

  4. Hiscox S, Nicholson RI: Src inhibitors in breast cancer therapy. Expert Opin Ther Targets. 2008, 12: 757-767. 10.1517/14728222.12.6.757.

  5. Lin LG, Xie H, Li HL, Tong LJ, Tang CP, Ke CQ, Liu QF, Lin LP, Geng MY, Jiang H, et al: Naturally occurring homoisoflavonoids function as potent protein tyrosine kinase inhibitors by c-Src-based high-throughput screening. J Med Chem. 2008, 51: 4419-4429. 10.1021/jm701501x.

  6. Lee K, Kim J, Jeong KW, Lee KW, Lee Y, Song JY, Kim MS, Lee GS, Kim Y: Structure-based virtual screening of Src kinase inhibitors. Bioorg Med Chem. 2009, 17: 3152-3161. 10.1016/j.bmc.2009.02.054.

  7. Farard J, Lanceart G, Loge C, Nourrisson MR, Cruzalegui F, Pfeiffer B, Duflos M: Design, synthesis and evaluation of new 6-substituted-5-benzyloxy-4-oxo-4H-pyran-2-carboxamides as potential Src inhibitors. J Enzyme Inhib Med Chem. 2008, 23: 629-640. 10.1080/14756360802205299.

  8. Alfaro-Lopez J, Yuan W, Phan BC, Kamath J, Lou Q, Lam KS, Hruby VJ: Discovery of a novel series of potent and selective substrate-based inhibitors of p60c-src protein tyrosine kinase: conformational and topographical constraints in peptide design. J Med Chem. 1998, 41: 2252-2260. 10.1021/jm9707885.

  9. Chen P, Doweyko AM, Norris D, Gu HH, Spergel SH, Das J, Moquin RV, Lin J, Wityak J, Iwanowicz EJ, et al: Imidazoquinoxaline Src-family kinase p56Lck inhibitors: SAR, QSAR, and the discovery of (S)-N-(2-chloro-6-methylphenyl)-2-(3-methyl-1-piperazinyl)imidazo[1,5-a]pyrido[3,2-e]pyrazin-6-amine (BMS-279700) as a potent and orally active inhibitor with excellent in vivo antiinflammatory activity. J Med Chem. 2004, 47: 4517-4529. 10.1021/jm030217e.

  10. Shoichet BK: Virtual screening of chemical libraries. Nature. 2004, 432: 862-865. 10.1038/nature03197.

  11. Ghosh S, Nie A, An J, Huang Z: Structure-based virtual screening of chemical libraries for drug discovery. Curr Opin Chem Biol. 2006, 10: 194-202. 10.1016/j.cbpa.2006.04.002.

  12. Li H, Yap CW, Ung CY, Xue Y, Li ZR, Han LY, Lin HH, Chen YZ: Machine learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins. J Pharm Sci. 2007, 96: 2838-2860. 10.1002/jps.20985.

  13. Han LY, Ma XH, Lin HH, Jia J, Zhu F, Xue Y, Li ZR, Cao ZW, Ji ZL, Chen YZ: A support vector machines approach for virtual screening of active compounds of single and multiple mechanisms from large libraries at an improved hit-rate and enrichment factor. J Mol Graph Model. 2008, 26: 1276-1286. 10.1016/j.jmgm.2007.12.002.

  14. Jorissen RN, Gilson MK: Virtual screening of molecular databases using a support vector machine. J Chem Inf Model. 2005, 45: 549-561. 10.1021/ci049641u.

  15. Lepp Z, Kinoshita T, Chuman H: Screening for new antidepressant leads of multiple activities by support vector machines. J Chem Inf Model. 2006, 46: 158-167. 10.1021/ci050301y.

  16. Glick M, Jenkins JL, Nettles JH, Hitchings H, Davies JW: Enrichment of high-throughput screening data with increasing levels of noise using support vector machines, recursive partitioning, and laplacian-modified naive bayesian classifiers. J Chem Inf Model. 2006, 46: 193-200. 10.1021/ci050374h.

  17. Hert J, Willett P, Wilton DJ, Acklin P, Azzaoui K, Jacoby E, Schuffenhauer A: New methods for ligand-based virtual screening: use of data fusion and machine learning to enhance the effectiveness of similarity searching. J Chem Inf Model. 2006, 46: 462-470. 10.1021/ci050348j.

  18. Ma XH, Wang R, Yang SY, Li ZR, Xue Y, Wei YC, Low BC, Chen YZ: Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds. J Chem Inf Model. 2008, 48: 1227-1237. 10.1021/ci800022e.

  19. Meyer D, Leisch F, Hornik K: The support vector machine under test. Neurocomputing. 2003, 55: 169-186. 10.1016/S0925-2312(03)00431-4.

  20. Verdonk ML, Berdini V, Hartshorn MJ, Mooij WT, Murray CW, Taylor RD, Watson P: Virtual screening using protein-ligand docking: avoiding artificial enrichment. J Chem Inf Comput Sci. 2004, 44: 793-806. 10.1021/ci034289q.

  21. Huang N, Shoichet BK, Irwin JJ: Benchmarking sets for molecular docking. J Med Chem. 2006, 49: 6789-6801. 10.1021/jm0608356.

  22. Altmann E, Missbach M, Green J, Susa M, Wagenknecht HA, Widler L: 7-Pyrrolidinyl- and 7-piperidinyl-5-aryl-pyrrolo[2,3-d]pyrimidines–potent inhibitors of the tyrosine kinase c-Src. Bioorg Med Chem Lett. 2001, 11: 853-856. 10.1016/S0960-894X(01)00080-4.

  23. Widler L, Green J, Missbach M, Susa M, Altmann E: 7-Alkyl- and 7-cycloalkyl-5-aryl-pyrrolo[2,3-d]pyrimidines–potent inhibitors of the tyrosine kinase c-Src. Bioorg Med Chem Lett. 2001, 11: 849-852. 10.1016/S0960-894X(01)00079-8.

  24. Missbach M, Altmann E, Widler L, Susa M, Buchdunger E, Mett H, Meyer T, Green J: Substituted 5,7-diphenyl-pyrrolo[2,3d]pyrimidines: potent inhibitors of the tyrosine kinase c-Src. Bioorg Med Chem Lett. 2000, 10: 945-949. 10.1016/S0960-894X(00)00131-1.

  25. Klutchko SR, Hamby JM, Boschelli DH, Wu Z, Kraker AJ, Amar AM, Hartl BG, Shen C, Klohs WD, Steinkampf RW, et al: 2-Substituted aminopyrido[2,3-d]pyrimidin-7(8H)-ones. structure-activity relationships against selected tyrosine kinases and in vitro and in vivo anticancer activity. J Med Chem. 1998, 41: 3276-3292. 10.1021/jm9802259.

  26. Noronha G, Barrett K, Boccia A, Brodhag T, Cao J, Chow CP, Dneprovskaia E, Doukas J, Fine R, Gong X, et al: Discovery of [7-(2,6-dichlorophenyl)-5-methylbenzo [1,2,4]triazin-3-yl]-[4-(2-pyrrolidin-1-ylethoxy)phenyl]amine–a potent, orally active Src kinase inhibitor with anti-tumor activity in preclinical assays. Bioorg Med Chem Lett. 2007, 17: 602-608. 10.1016/j.bmcl.2006.11.006.

  27. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK: BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007, 35: D198-201. 10.1093/nar/gkl999.

  28. Keseru GM, Makara GM: The influence of lead discovery strategies on the properties of drug candidates. Nat Rev Drug Discov. 2009, 8: 203-212. 10.1038/nrd2796.

  29. Keseru GM, Makara GM: Hit discovery and hit-to-lead approaches. Drug Discov Today. 2006, 11: 741-748. 10.1016/j.drudis.2006.06.016.

  30. Bocker A, Schneider G, Teckentrup A: NIPALSTREE: a new hierarchical clustering approach for large compound libraries and its application to virtual screening. J Chem Inf Model. 2006, 46: 2220-2229. 10.1021/ci050541d.

  31. Oprea TI, Gottfries J: Chemography: the art of navigating in chemical space. J Comb Chem. 2001, 3: 157-166. 10.1021/cc0000388.

  32. Fink T, Reymond J-L: Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery. J Chem Inf Model. 2007, 47: 342-353. 10.1021/ci600423u.

  33. Koch MA, Schuffenhauer A, Scheck M, Wetzel S, Casaulta M, Odermatt A, Ertl P, Waldmann H: Charting biologically relevant chemical space: a structural classification of natural products (SCONP). Proc Natl Acad Sci USA. 2005, 102: 17272-17277. 10.1073/pnas.0503647102.

  34. Kinoshita K, Kobayashi T, Asoh K, Furuichi N, Ito T, Kawada H, Hara S, Ohwada J, Hattori K, Miyagi T, et al: 9-substituted 6,6-dimethyl-11-oxo-6,11-dihydro-5H-benzo[b]carbazoles as highly selective and potent anaplastic lymphoma kinase inhibitors. J Med Chem. 2011, 54: 6286-6294. 10.1021/jm200652u.

  35. Schmidt S, Preu L, Lemcke T, Totzke F, Schachtele C, Kubbutat MH, Kunick C: Dual IGF-1R/SRC inhibitors based on a N'-aroyl-2-(1H-indol-3-yl)-2-oxoacetohydrazide structure. Eur J Med Chem. 2011, 46: 2759-2769. 10.1016/j.ejmech.2011.03.065.

  36. Crew AP, Bhagwat SV, Dong H, Bittner MA, Chan A, Chen X, Coate H, Cooke A, Gokhale PC, Honda A, et al: Imidazo[1,5-a]pyrazines: orally efficacious inhibitors of mTORC1 and mTORC2. Bioorg Med Chem Lett. 2011, 21: 2092-2097. 10.1016/j.bmcl.2011.01.139.

  37. Pevet I, Brule C, Tizot A, Gohier A, Cruzalegui F, Boutin JA, Goldstein S: Synthesis and pharmacological evaluation of thieno[2,3-b]pyridine derivatives as novel c-Src inhibitors. Bioorg Med Chem. 2011, 19: 2517-2528. 10.1016/j.bmc.2011.03.021.

  38. Guagnano V, Furet P, Spanka C, Bordas V, Le Douget M, Stamm C, Brueggen J, Jensen MR, Schnell C, Schmid H, et al: Discovery of 3-(2,6-dichloro-3,5-dimethoxy-phenyl)-1-{6-[4-(4-ethyl-piperazin-1-yl)-phenylamino]-pyrimidin-4-yl}-1-methyl-urea (NVP-BGJ398), a potent and selective inhibitor of the fibroblast growth factor receptor family of receptor tyrosine kinase. J Med Chem. 2011, 54: 7066-7083. 10.1021/jm2006222.

  39. Kumar A, Ahmad I, Chhikara BS, Tiwari R, Mandal D, Parang K: Synthesis of 3-phenylpyrazolopyrimidine-1,2,3-triazole conjugates and evaluation of their Src kinase inhibitory and anticancer activities. Bioorg Med Chem Lett. 2011, 21: 1342-1346. 10.1016/j.bmcl.2011.01.047.

  40. Fang H, Tong W, Shi LM, Blair R, Perkins R, Branham W, Hass BS, Xie Q, Dial SL, Moland CL, Sheehan DM: Structure-activity relationships for a large diverse set of natural, synthetic, and environmental estrogens. Chem Res Toxicol. 2001, 14: 280-294. 10.1021/tx000208y.

  41. Tong W, Xie Q, Hong H, Shi L, Fang H, Perkins R: Assessment of prediction confidence and domain extrapolation of two structure-activity relationship models for predicting estrogen receptor binding activity. Environ Health Perspect. 2004, 112: 1249-1254. 10.1289/ehp.7125.

  42. Jacobs MN: In silico tools to aid risk assessment of endocrine disrupting chemicals. Toxicology. 2004, 205: 43-53. 10.1016/j.tox.2004.06.036.

  43. Hu JY, Aizawa T: Quantitative structure-activity relationships for estrogen receptor binding affinity of phenolic chemicals. Water Res. 2003, 37: 1213-1222. 10.1016/S0043-1354(02)00378-0.

  44. Byvatov E, Fechner U, Sadowski J, Schneider G: Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. J Chem Inf Comput Sci. 2003, 43: 1882-1889. 10.1021/ci0341161.

  45. Doniger S, Hofman T, Yeh J: Predicting CNS Permeability of Drug Molecules:Comparison of Neural Network and Support Vector Machine Algorithms. J Comput Biol. 2002, 9: 849-864. 10.1089/10665270260518317.

  46. He L, Jurs PC, Custer LL, Durham SK, Pearl GM: Predicting the Genotoxicity of Polycyclic Aromatic Compounds from Molecular Structure with Different Classifiers. Chem Res Toxicol. 2003, 16: 1567-1580. 10.1021/tx030032a.

  47. Snyder RD, Pearl GS, Mandakas G, Choy WN, Goodsaid F, Rosenblum IY: Assessment of the sensitivity of the computational programs DEREK, TOPKAT, and MCASE in the prediction of the genotoxicity of pharmaceutical molecules. Environ Mol Mutagen. 2004, 43: 143-158. 10.1002/em.20013.

  48. Xue Y, Li ZR, Yap CW, Sun LZ, Chen X, Chen YZ: Effect of Molecular Descriptor Feature Selection in Support Vector Machine Classification of Pharmacokinetic and Toxicological Properties of Chemical Agents. J Chem Inf Comput Sci. 2004, 44: 1630-1638. 10.1021/ci049869h.

  49. Yap CW, Cai CZ, Xue Y, Chen YZ: Prediction of torsade-causing potential of drugs by support vector machine approach. Toxicol Sci. 2004, 79: 170-177. 10.1093/toxsci/kfh082.

  50. Yap CW, Chen YZ: Quantitative Structure-Pharmacokinetic Relationships for drug distribution properties by using general regression neural network. J Pharm Sci. 2005, 94: 153-168. 10.1002/jps.20232.

  51. Zernov VV, Balakin KV, Ivaschenko AA, Savchuk NP, Pletnev IV: Drug discovery using support vector machines. The case studies of drug-likeness, agrochemical-likeness, and enzyme inhibition predictions. J Chem Inf Comput Sci. 2003, 43: 2048-2056. 10.1021/ci0340916.

  52. Xue Y, Yap CW, Sun LZ, Cao ZW, Wang JF, Chen YZ: Prediction of P-glycoprotein substrates by a support vector machine approach. J Chem Inf Comput Sci. 2004, 44: 1497-1505. 10.1021/ci049971e.

  53. Todeschini R, Consonni V: Handbook of Molecular Descriptors. 2000, Weinheim: Wiley-VCH

  54. Miller KJ: Additive methods in molecular polarizability. J Am Chem Soc. 1990, 112: 8533-8542. 10.1021/ja00179a044.

  55. Schultz HP: Topological organic chemistry. 1. graph theory and topological indices of alkanes. J Chem Inf Comput Sci. 1989, 29: 227-228. 10.1021/ci00063a012.

  56. Hall LH, Kier LB: Electrotopological state indices for atom types: a novel combination of electronic, topological and valence state information. J Chem Inf Comput Sci. 1995, 35: 1039-1045. 10.1021/ci00028a014.

  57. Vapnik VN: The nature of statistical learning theory. 1995, New York: Springer

  58. Burges CJC: A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc. 1998, 2: 127-167.

  59. Pochet N, De Smet F, Suykens JA, De Moor BL: Systematic benchmarking of microarray data classification: assessing the role of non-linearity and dimensionality reduction. Bioinformatics. 2004, 20: 3185-3195. 10.1093/bioinformatics/bth383.

  60. Li F, Yang Y: Analysis of recursive gene selection approaches from microarray data. Bioinformatics. 2005, 21: 3741-3747. 10.1093/bioinformatics/bti618.

  61. Cui J, Han LY, Lin HH, Zhang HL, Tang ZQ, Zheng CJ, Cao ZW, Chen YZ: Prediction of MHC-Binding Peptides of Flexible Lengths from Sequence-Derived Structural and Physicochemical Properties. Mol Immunol. 2007, 44: 866-877. 10.1016/j.molimm.2006.04.001.

  62. Yap CW, Chen YZ: Prediction of cytochrome P450 3A4, 2D6, and 2C9 inhibitors and substrates by using support vector machines. J Chem Inf Model. 2005, 45: 982-992. 10.1021/ci0500536.

  63. Grover II, Singh II, Bakshi II: Quantitative structure–property relationships in pharmaceutical research - Part 2. Pharm Sci Technol Today. 2000, 3: 50-57. 10.1016/S1461-5347(99)00215-1.

  64. Trotter MWB, Buxton BF, Holden SB: Support vector machines in combinatorial chemistry. Meas Control. 2001, 34: 235-239.

  65. Burbidge R, Trotter M, Buxton B, Holden S: Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput Chem. 2001, 26: 5-14. 10.1016/S0097-8485(01)00094-8.

  66. Czerminski R, Yasri A, Hartsough D: Use of support vector machine in pattern classification: Application to QSAR studies. Quant Struct-Act Rel. 2001, 20: 227-240. 10.1002/1521-3838(200110)20:3<227::AID-QSAR227>3.0.CO;2-Y.

  67. Matthews BW: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta. 1975, 405: 442-451. 10.1016/0005-2795(75)90109-9.

  68. Willett P: Chemical similarity searching. J Chem Inf Comput Sci. 1998, 38: 983-996. 10.1021/ci9800211.

  69. Bostrom J, Hogner A, Schmitt S: Do structurally similar ligands bind in a similar fashion?. J Med Chem. 2006, 49: 6716-6725. 10.1021/jm060167o.

  70. Johnson RA, Wichern DW: Applied multivariate statistical analysis. 1982, Englewood Cliffs, NJ: Prentice Hall

  71. Specht DF: Probabilistic neural networks. Neural Netw. 1990, 3: 109-118. 10.1016/0893-6080(90)90049-Q.

  72. Parzen E: On estimation of a probability density function and mode. Ann Math Stat. 1962, 33: 1065-1076. 10.1214/aoms/1177704472.

  73. Cacoullos T: Estimation of a multivariate density. Ann I Stat Math. 1966, 18: 179-189. 10.1007/BF02869528.

  74. Chen B, Harrison RF, Papadatos G, Willett P, Wood DJ, Lewell XQ, Greenidge P, Stiefl N: Evaluation of machine-learning methods for ligand-based virtual screening. J Comput Aided Mol Des. 2007, 21: 53-62. 10.1007/s10822-006-9096-5.

  75. Liew CY, Ma XH, Liu X, Yap CW: SVM Model for Virtual Screening of Lck Inhibitors. J Chem Inf Model. 2009, 49: 877-885.

  76. Briem H, Gunther J: Classifying "kinase inhibitor-likeness" by using machine-learning methods. Chembiochem. 2005, 6: 558-566. 10.1002/cbic.200400109.

  77. Ma XH, Jia J, Zhu F, Xue Y, Li ZR, Chen YZ: Comparative analysis of machine learning methods in ligand based virtual screening of large compound libraries. Comb Chem High Throughput Screen. 2009, 12: 344-357. 10.2174/138620709788167944.

  78. Yamane S, Ishida S, Hanamoto Y, Kumagai K, Masuda R, Tanaka K, Shiobara N, Yamane N, Mori T, Juji T, et al: Proinflammatory role of amphiregulin, an epidermal growth factor family member whose expression is augmented in rheumatoid arthritis patients. J Inflamm (Lond). 2008, 5: 5. 10.1186/1476-9255-5-5.

  79. Chiu YC, Fong YC, Lai CH, Hung CH, Hsu HC, Lee TS, Yang RS, Fu WM, Tang CH: Thrombin-induced IL-6 production in human synovial fibroblasts is mediated by PAR1, phospholipase C, protein kinase C alpha, c-Src, NF-kappa B and p300 pathway. Mol Immunol. 2008, 45: 1587-1599. 10.1016/j.molimm.2007.10.004.

  80. Paniagua RT, Sharpe O, Ho PP, Chan SM, Chang A, Higgins JP, Tomooka BH, Thomas FM, Song JJ, Goodman SB, et al: Selective tyrosine kinase inhibition by imatinib mesylate for the treatment of autoimmune arthritis. J Clin Invest. 2006, 116: 2633-2642.

  81. Carvalho JF, Blank M, Shoenfeld Y: Vascular endothelial growth factor (VEGF) in autoimmune diseases. J Clin Immunol. 2007, 27: 246-256. 10.1007/s10875-007-9083-1.

  82. Daouti S, Latario B, Nagulapalli S, Buxton F, Uziel-Fusi S, Chirn GW, Bodian D, Song C, Labow M, Lotz M, et al: Development of comprehensive functional genomic screens to identify novel mediators of osteoarthritis. Osteoarthritis Cartilage. 2005, 13: 508-518. 10.1016/j.joca.2005.02.003.

  83. Remmers EF, Sano H, Wilder RL: Platelet-derived growth factors and heparin-binding (fibroblast) growth factors in the synovial tissue pathology of rheumatoid arthritis. Semin Arthritis Rheum. 1991, 21: 191-199. 10.1016/0049-0172(91)90009-O.

  84. Meyn MA, Smithgall TE: Small molecule inhibitors of Lck: the search for specificity within a kinase family. Mini Rev Med Chem. 2008, 8: 628-637. 10.2174/138955708784534454.

  85. Rivera J, Olivera A: Src family kinases and lipid mediators in control of allergic inflammation. Immunol Rev. 2007, 217: 255-268. 10.1111/j.1600-065X.2007.00505.x.

  86. Lee JH, Kim JW, Ko NY, Mun SH, Kim do K, Kim JD, Won HS, Shin HS, Kim HS, Her E, et al: Mast cell-mediated allergic response is suppressed by Sophorae flos: inhibition of SRC-family kinase. Exp Biol Med (Maywood). 2008, 233: 1271-10.3181/0803-RM-89.

  87. Callera GE, Montezano AC, Yogi A, Tostes RC, He Y, Schiffrin EL, Touyz RM: c-Src-dependent nongenomic signaling responses to aldosterone are increased in vascular myocytes from spontaneously hypertensive rats. Hypertension. 2005, 46: 1032-1038. 10.1161/01.HYP.0000176588.51027.35.

  88. Metcalf CA, van Schravendijk MR, Dalgarno DC, Sawyer TK: Targeting protein kinases for bone disease: discovery and development of Src inhibitors. Curr Pharm Des. 2002, 8: 2049-2075. 10.2174/1381612023393323.

  89. Shakespeare WC, Wang Y, Bohacek R, Keenan T, Sundaramoorthi R, Metcalf C, Dilauro A, Roeloffzen S, Liu S, Saltmarsh J, et al: SAR of carbon-linked, 2-substituted purines: synthesis and characterization of AP23451 as a novel bone-targeted inhibitor of Src tyrosine kinase with in vivo anti-resorptive activity. Chem Biol Drug Des. 2008, 71: 97-105. 10.1111/j.1747-0285.2007.00615.x.

  90. Tsuruno S, Kawaguchi SY, Hirano T: Src-family protein tyrosine kinase negatively regulates cerebellar long-term depression. Neurosci Res. 2008, 61: 329-332. 10.1016/j.neures.2008.03.004.

  91. Vidal D, Thormann M, Pons M: A novel search engine for virtual screening of very large databases. J Chem Inf Model. 2006, 46: 836-843. 10.1021/ci050458q.

  92. Stiefl N, Zaliani A: A knowledge-based weighting approach to ligand-based virtual screening. J Chem Inf Model. 2006, 46: 587-596. 10.1021/ci050324c.

  93. Rella M, Rushworth CA, Guy JL, Turner AJ, Langer T, Jackson RM: Structure-based pharmacophore design and virtual screening for novel angiotensin converting enzyme 2 inhibitors. J Chem Inf Model. 2006, 46: 708-716. 10.1021/ci0503614.

Acknowledgements

This work was supported in part by grants from Singapore Academic Research Fund R-148-000-083-112, National Natural Science Foundation of China Grant 30772651, Ministry of Science and Technology, 863 Hi-Tech Program Grant 2006AA020400.

Author information

Corresponding authors

Correspondence to Yuyang Jiang or Yuzong Chen.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

BC Han conceived the study, implemented the methods and wrote the manuscript with assistance from XH Ma, RY Zhao, JX Zhang, XN Wei, XH Liu, X Liu and YZ Chen. The experiments were conducted by CL Zhang, CY Tan and YY Jiang. All co-authors participated in the study's design, coordination and manuscript drafting. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Han, B., Ma, X., Zhao, R. et al. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries. Chemistry Central Journal 6, 139 (2012). https://doi.org/10.1186/1752-153X-6-139

  • DOI: https://doi.org/10.1186/1752-153X-6-139

Keywords