SOMM4mC: a second-order Markov model for DNA N4-methylcytosine site prediction in six species
Publication in refereed journal

Other information
Abstract
Motivation: DNA N4-methylcytosine (4mC) modification is an important epigenetic modification in prokaryotic DNA due to its role in regulating DNA replication and protecting the host DNA against degradation. An efficient algorithm to identify 4mC sites is needed for downstream analyses.

Results: In this study, we propose a new prediction method named SOMM4mC, based on a second-order Markov model, which uses the transition probabilities between adjacent nucleotides to identify 4mC sites. The results show that both the first-order and second-order Markov models outperform the three existing algorithms in all six species (C. elegans, D. melanogaster, A. thaliana, E. coli, G. subterraneus, and G. pickeringii) for which benchmark datasets are available. Of the two, SOMM4mC classifies better than the first-order Markov model. In particular, for E. coli and C. elegans, the overall accuracies of SOMM4mC are 91.8% and 87.6%, which are 8.5% and 6.1% higher than those of the latest method, 4mcPred-SVM, respectively. This indicates that SOMM4mC captures more discriminative sequence information through the dependency between adjacent nucleotides.
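The idea of scoring a sequence by its second-order transition probabilities can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a position-independent model, add-one smoothing, uppercase sequences over A, C, G and T only, and a log-likelihood-ratio decision rule with a threshold of 0. The function names (`train_second_order`, `log_likelihood`, `predict_4mc`) and the toy sequences are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences contain only these four symbols


def train_second_order(seqs, pseudocount=1.0):
    """Estimate P(x_i | x_{i-2} x_{i-1}) from sequences with add-k smoothing."""
    counts = defaultdict(float)
    for s in seqs:
        for i in range(2, len(s)):
            counts[(s[i - 2:i], s[i])] += 1
    model = {}
    for a, b in product(ALPHABET, repeat=2):
        ctx = a + b
        total = sum(counts[(ctx, n)] for n in ALPHABET) + pseudocount * len(ALPHABET)
        for n in ALPHABET:
            model[(ctx, n)] = (counts[(ctx, n)] + pseudocount) / total
    return model


def log_likelihood(seq, model):
    """Sum of log transition probabilities; the first two symbols are not scored."""
    return sum(math.log(model[(seq[i - 2:i], seq[i])]) for i in range(2, len(seq)))


def predict_4mc(seq, pos_model, neg_model, threshold=0.0):
    """Call a site positive when the log-likelihood ratio exceeds the threshold."""
    score = log_likelihood(seq, pos_model) - log_likelihood(seq, neg_model)
    return score > threshold


# Toy sequences made up for illustration; they are NOT from the benchmark datasets.
toy_positive = ["ACGCGCGT", "TCGCGCGA"]
toy_negative = ["ATATATAT", "TATATATA"]
pos_model = train_second_order(toy_positive)
neg_model = train_second_order(toy_negative)
print(predict_4mc("ACGCGCGA", pos_model, neg_model))
```

A first-order variant would condition on a single preceding nucleotide rather than two. The longer context is what lets the second-order model capture more of the dependency between adjacent nucleotides.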

Availability: The web server of SOMM4mC is freely accessible at

Supplementary information: Supplementary data are available at Bioinformatics online.
Acceptance Date: 08/05/2020
All Author(s) List: Yang J, Lang K, Zhang G, Fan X, Chen Y, Pian C
Journal name: Bioinformatics
Volume Number: 36
Issue Number: 14
Pages: 4103 - 4105
Languages: English-United Kingdom

Last updated on 2021-20-10 at 23:53