Predicting Approximate Protein-DNA Binding Cores Using Association Rule Mining
Refereed conference paper presented and published in conference proceedings



Other information
Abstract: Studies of protein-DNA binding between transcription factors (TFs) and transcription factor binding sites (TFBSs) are an important topic in bioinformatics. High-resolution (length < 10) TF-TFBS binding cores are discovered through expensive and time-consuming 3D structure experiments. Recent association rule mining approaches applied to low-resolution binding sequences (TF length > 490) have shown promise in identifying accurate binding cores without using any 3D structures. However, the existing association rule mining method for this problem handles exact sequences only, while the most recent ad hoc method for approximation does not establish a formal model and is limited by experimentally known patterns. As biological mutations are common, it is desirable to extend the exact model formally into an approximate one. In this paper, we formalize the problem of mining approximate protein-DNA association rules from sequence data and propose a novel, efficient algorithm to predict protein-DNA binding cores. Our two-phase algorithm first constructs two compact intermediate structures, the frequent sequence tree (FS-Tree) and the frequent sequence class tree (FSC-Tree). Approximate association rules are then generated efficiently from these structures, and bioinformatics concepts (position weight matrix and information content) are further employed to prune meaningless rules. Experimental results on real data demonstrate the performance and applicability of the proposed algorithm.
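The abstract names the position weight matrix (PWM) and information content (IC) as the criteria used to prune meaningless rules. The Python sketch below illustrates those two standard quantities on toy DNA sequences, combined with a simple Hamming-distance notion of an approximate occurrence. It is not the authors' FS-Tree/FSC-Tree algorithm: the helper names (`approximate_instances`, `passes_ic_filter`), the mismatch bound, the uniform background distribution and the mean-IC threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approximate_instances(pattern, sequences, max_mismatches):
    """Hypothetical helper: collect every length-k window within
    max_mismatches (Hamming distance) of pattern."""
    k = len(pattern)
    hits = []
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if hamming(window, pattern) <= max_mismatches:
                hits.append(window)
    return hits


def position_weight_matrix(sites, pseudocount=0.0):
    """Per-column base probabilities for equal-length aligned DNA sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for j in range(length):
        counts = Counter(s[j] for s in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def information_content(pwm):
    """Per-column IC in bits, assuming a uniform background: 2 - H(column)."""
    ic = []
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        ic.append(2.0 - entropy)
    return ic


def passes_ic_filter(sites, min_mean_ic):
    """Hypothetical pruning rule: keep a pattern only if the mean IC of
    its aligned instances reaches min_mean_ic (an arbitrary threshold)."""
    ic = information_content(position_weight_matrix(sites))
    return sum(ic) / len(ic) >= min_mean_ic


# Toy data, not taken from the paper.
sequences = ["AATGACTCAGG", "CTGAGTCAT", "GGTGACTAAC"]
sites = approximate_instances("TGACTCA", sequences, max_mismatches=1)
print(sites)  # ['TGACTCA', 'TGAGTCA', 'TGACTAA']
print([round(x, 3) for x in information_content(position_weight_matrix(sites))])
# [2.0, 2.0, 2.0, 1.082, 2.0, 1.082, 2.0]
print(passes_ic_filter(sites, min_mean_ic=1.5))  # True
```

Fully conserved columns reach the 2-bit maximum for DNA, while the two columns with one substitution score lower; a set of instances with no conserved positions would score 0 bits in every column and be rejected by any positive threshold.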
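Authors: Wong PY, Chan TM, Wong MH, Leung KS
Conference name: 28th IEEE International Conference on Data Engineering (ICDE)
Conference start date: 1 April 2012
Conference end date: 5 April 2012
Conference location: Washington
Conference country/region: United States
Detailed description: IEEE
Publication year: 2012
Month: 1
Day: 1
Publisher: IEEE
Pages: 965-976
eISBN: *****************
ISSN: 1084-4627
Language: British English
Web of Science subject categories: Computer Science; Computer Science, Theory & Methods; Engineering; Engineering, Electrical & Electronic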

Last updated: 17 October 2020, 00:57