BMC3C: binning metagenomic contigs using codon usage, sequence composition and read coverage
Publication in refereed journal

Other information
Abstract: Motivation: Metagenomics investigates DNA sequences recovered directly from environmental samples. It often starts with read assembly, which yields contigs rather than more complete genomes. Contig binning methods are therefore subsequently used to group contigs into genome bins. While some clustering-based binning methods have been developed, they generally suffer from problems related to stability and robustness.
Results: We introduce BMC3C, an ensemble clustering-based method, to accurately and robustly bin contigs by making use of DNA sequence Composition, Coverage across multiple samples and Codon usage. BMC3C begins by searching for an appropriate number of clusters and repeatedly applying k-means clustering with different initializations to cluster contigs. Next, a weighted graph in which each node represents a contig is derived from these clusters. If two contigs are frequently grouped into the same cluster, the weight between them is high; otherwise it is low. Finally, BMC3C employs a graph partitioning technique to partition the weighted graph into subgraphs, each corresponding to a genome bin. We conduct experiments on both simulated and real-world datasets to evaluate BMC3C and compare it with state-of-the-art binning tools. We show that BMC3C performs better than these tools. To our knowledge, this is the first time that codon usage features and ensemble clustering have been used in metagenomic contig binning.
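The ensemble step described in the abstract (repeated k-means runs, a co-association weighted graph, then a partition of that graph) can be illustrated with a minimal sketch. This is not the BMC3C implementation: the toy feature vectors, the fixed `k`, the function names, and the 0.5 threshold are all assumptions made for illustration. Because the abstract does not name the graph partitioning technique, connected components over high-weight edges are used here as a simple stand-in.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initialization from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(points, k, runs, seed=0):
    """Weight w[i][j] = fraction of k-means runs that put i and j together."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(points, k, rng)
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / runs for c in row] for row in counts]


def partition(weights, threshold=0.5):
    """Stand-in partitioning: connected components over edges > threshold."""
    n = len(weights)
    bins = [-1] * n
    current = 0
    for start in range(n):
        if bins[start] != -1:
            continue
        bins[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if bins[j] == -1 and weights[i][j] > threshold:
                    bins[j] = current
                    stack.append(j)
        current += 1
    return bins


# Hypothetical 2-D feature vectors for six contigs from two genomes.
# Real features would combine composition, coverage and codon usage.
contigs = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
           [5.0, 5.1], [5.1, 5.0], [4.95, 5.05]]
weights = co_association(contigs, k=2, runs=10)
bins = partition(weights)
print(bins)
```

Averaging over many initializations is what gives the ensemble its stability: a single unlucky k-means run changes each edge weight only by `1 / runs`, so it rarely changes the final bins.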
Date accepted by publisher: 26.06.2018
Authors: Yu G., Jiang Y., Wang J., Zhang H., Luo H.
Journal name: Bioinformatics
Year: 2018
Month: 12
Day: 15
Volume: 34
Issue: 24
Pages: 4172 - 4179
ISSN: 1367-4803
Language: British English

Last updated on 2020-16-09 at 02:20