Flexible K-mers with Variable-Length Indels for Identifying Binding Sequences of Protein Dimers
Publication in refereed journal
Officially accepted for publication

Other information
Abstract: Many DNA-binding proteins interact with partner proteins. Recently, based on the high-throughput consecutive affinity-purification systematic evolution of ligands by exponential enrichment (CAP-SELEX) method, many such protein pairs have been found to bind DNA with flexible spacing between their individual binding motifs. Most existing motif representations were not designed to capture such flexibly spaced regions. In order to computationally discover more co-binding events without prior knowledge about the identities of the co-binding proteins, a new representation is needed. We propose a new class of sequence patterns that flexibly model such variable regions and corresponding algorithms that identify co-bound sequences using these patterns. Based on both simulated and CAP-SELEX data, features derived from our sequence patterns lead to better classification performance than patterns that do not explicitly model the variable regions. We also show that even for standard ChIP-seq data, this new class of sequence patterns can help discover co-bound events in a subset of sequences in an unsupervised manner. The open-source software is available at https://github.com/kevingroup/glk-SVM.
Publisher acceptance date: 05.11.2019
Authors: Chenyang Hong, Kevin Y Yip
Journal name: Briefings in Bioinformatics
Publication year: 2019
ISSN: 1467-5463
eISSN: 1477-4054
Language: American English

Last updated: 2020-22-10 at 23:04