An International Publisher for Academic and Scientific Journals
Scholars Academic Journal of Biosciences | Volume-7 | Issue-03
Prediction of SNP Pathogenic Site Based on Naïve Bayes Method
Xue Wu, Baoguang Tian
Published: March 14, 2019
DOI: 10.36347/sajb.2019.v07i03.001
Pages: 105-116
Abstract
SNP sites are an important form of basic variation data, characterized by large data volume and uniform distribution. They are widely used in complex disease research, and mining SNP pathogenic sites with machine learning methods has become a research focus in bioinformatics. In this paper, we present a new SNP pathogenic site prediction method based on naïve Bayes. First, we take 1000 samples covering all SNP sites (9445 sites) on a chromosome fragment; the bases (A, T, C, G) at each SNP site appear in three forms, which are converted into the numerical codes 0, 1, and 2. Second, 447 possible SNP pathogenic sites and one abnormal SNP site are selected by a chi-square test on the encoded information and the genetic disease status of the samples. Finally, a naïve Bayes model is built on the 1000 samples to predict SNP pathogenic sites. Five-fold cross-validation shows that our method achieves an accuracy (ACC) of 84.64% and a Matthews correlation coefficient (MCC) of 0.6937. Compared with other machine learning methods, the naïve Bayes model outperforms the K-nearest neighbor (KNN), AdaBoost, support vector machine (SVM), and random forest (RF) models.
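The pipeline described in the abstract (0/1/2 encoding, chi-square screening, naïve Bayes, five-fold cross-validation scored by ACC and MCC) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. All function names are hypothetical, the genotype-to-code mapping (sorted order of the genotypes seen at a site) is an assumption because the abstract does not state it, labels are assumed binary with 1 meaning diseased, and the naïve Bayes variant shown is a categorical one with Laplace smoothing.

```python
import math
from collections import Counter

def encode_site(genotypes):
    """Map the distinct genotypes at one site to integer codes 0, 1, 2.
    Assumption: codes follow sorted order; the paper's mapping is not given."""
    codes = {g: i for i, g in enumerate(sorted(set(genotypes)))}
    return [codes[g] for g in genotypes]

def chi_square(feature, labels):
    """Pearson chi-square statistic of the code-by-label contingency table."""
    n = len(labels)
    row, col = Counter(feature), Counter(labels)
    obs = Counter(zip(feature, labels))
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (obs[(r, c)] - expected) ** 2 / expected
    return stat

def fit_naive_bayes(X, y, n_codes=3, alpha=1.0):
    """Categorical naive Bayes; alpha is the Laplace smoothing constant."""
    classes = sorted(set(y))
    log_prior = {c: math.log(y.count(c) / len(y)) for c in classes}
    log_like = {}
    for c in classes:
        rows = [x for x, t in zip(X, y) if t == c]
        for j in range(len(X[0])):
            counts = Counter(r[j] for r in rows)
            for v in range(n_codes):
                p = (counts[v] + alpha) / (len(rows) + alpha * n_codes)
                log_like[(c, j, v)] = math.log(p)
    return classes, log_prior, log_like

def predict(model, x):
    """Return the class with the highest log posterior for one sample."""
    classes, log_prior, log_like = model
    return max(classes, key=lambda c: log_prior[c]
               + sum(log_like[(c, j, v)] for j, v in enumerate(x)))

def acc_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return (tp + tn) / len(pairs), mcc

def cross_validate(X, y, k=5):
    """k-fold cross-validation with interleaved folds; predictions are pooled."""
    n = len(y)
    y_pred = [None] * n
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test_idx]
        model = fit_naive_bayes([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            y_pred[i] = predict(model, X[i])
    return acc_mcc(y, y_pred)

if __name__ == "__main__":
    # Toy data only: one perfectly informative site, ten samples.
    X = [[0], [2]] * 5
    y = [0, 1] * 5
    print(encode_site(["AA", "AT", "TT", "AA"]))
    print(chi_square([row[0] for row in X], y))
    print(cross_validate(X, y, k=5))
```

In a screening step like the one the abstract describes, `chi_square` would be evaluated once per site and only the sites whose statistic exceeds a chosen significance threshold would be passed to the classifier. The threshold used by the authors is not stated in the abstract.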