TarDict: A RandomForestClassifier based software predicts drug-target interaction using SMILES

Peter T. Habib, Alsamman M. Alsamman, Sameh E. Hassanein, Aladdin Hamwieh


The future of therapeutics depends on understanding how the chemical structure of a drug interacts with the target protein that contributes to the etiology of a disease. Predicting the targets of investigational drugs from data on already characterized drugs is important not only for understanding the processes underlying drug and molecular interactions but also for developing new drugs. Using machine learning and published drug information, we designed TarDict, an easy-to-use tool that predicts biological target proteins for medical drugs. TarDict is based on the Simplified Molecular Input Line Entry System (SMILES), a line notation for chemical structures. It receives SMILES entries and returns a list of possibly similar drugs as well as possible drug targets. TarDict uses 20442 drug entries with well-known biological targets to construct a predictive computational model capable of predicting novel drug targets with an accuracy of 95%. The underlying machine learning approach recommends target proteins for approved drugs. We show that the proposed method is highly predictive on a testing dataset consisting of 4088 targets and 102 manually entered drugs. The proposed computational model is an efficient and cost-effective tool for drug target discovery and prioritization. Such a tool could be used to enhance drug design, predict potential targets, and identify combination therapy crossroads.
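To make the "SMILES in, similar drugs and candidate targets out" idea concrete, the following is a minimal sketch of the similarity-lookup step only. It is not the authors' implementation and does not reproduce the RandomForestClassifier model. It assumes character bigrams of the SMILES string as a crude stand-in for a molecular fingerprint and uses the Tanimoto coefficient to compare them. The reference table, the target labels `TARGET_A` and `TARGET_B`, and all function names are hypothetical and chosen for illustration.

```python
from collections import Counter

# Toy reference table (hypothetical): (drug name, SMILES, placeholder target label).
# The target labels are illustrative placeholders, not curated annotations.
REFERENCE = [
    ("aspirin", "CC(=O)OC1=CC=CC=C1C(=O)O", "TARGET_A"),
    ("salicylic acid", "OC(=O)C1=CC=CC=C1O", "TARGET_A"),
    ("caffeine", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "TARGET_B"),
    ("theophylline", "CN1C2=C(C(=O)N(C1=O)C)NC=N2", "TARGET_B"),
]


def smiles_ngrams(smiles, n=2):
    """Return the set of character n-grams of a SMILES string.

    This is a crude, assumption-laden substitute for a real
    molecular fingerprint; it ignores chemistry entirely.
    """
    if len(smiles) < n:
        return {smiles} if smiles else set()
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_similar(query_smiles, reference=REFERENCE, top_k=3):
    """Return the top_k reference drugs as (score, name, target), best first."""
    query = smiles_ngrams(query_smiles)
    scored = [
        (tanimoto(query, smiles_ngrams(smiles)), name, target)
        for name, smiles, target in reference
    ]
    scored.sort(key=lambda hit: hit[0], reverse=True)
    return scored[:top_k]


def vote_targets(hits):
    """Sum similarity scores per target label and return labels, best first."""
    votes = Counter()
    for score, _name, target in hits:
        votes[target] += score
    return votes.most_common()


if __name__ == "__main__":
    hits = rank_similar("CC(=O)OC1=CC=CC=C1C(=O)O")
    for score, name, target in hits:
        print(f"{name:15s} {score:.2f} {target}")
    print("candidate targets:", vote_targets(hits))
```

In a setup closer to the one the abstract describes, the bigram sets would presumably be replaced by proper chemical descriptors and the similarity vote by a trained classifier; the sketch only illustrates the shape of the input and output.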


RandomForestClassifier, SMILES, drug-target interaction, Python, pathway







Copyright (c) 2021 Habib et al.

This work is licensed under a Creative Commons Attribution 4.0 International License.


International Library of Science is a nonprofit publisher, innovator, and science-supporting knowledge organization.
Copyright 2018-2020 All copyrights are reserved by International Library of Science