rG4-seeker enables high confidence identification of novel rG4 motifs from rG4-seq experiment via platform-specific noise modeling
Refereed conference paper presented and published in conference proceedings



Abstract: The emergence of RNA-seq has revolutionized the study and understanding of the transcriptome [1,2] and has empowered many high-throughput platforms for mapping RNA structural and regulatory elements [3,4]. However, the subsequent bioinformatic searches used to retrieve elements of interest often pick up noise that affects overall research interpretation and outcomes. Such noise is usually regarded as a normal consequence of biological variance and is tolerated by conducting statistical tests on replicated experiments.

We recently developed RNA G-quadruplex sequencing (rG4-seq) for transcriptome-wide mapping of RNA G-quadruplexes (rG4s) [5] by exploiting their intrinsic reverse transcriptase-stalling (RTS) properties [6]. RNA G-quadruplex secondary structures are proposed to play significant regulatory roles in transcriptional, post-transcriptional and translational processes [7,8]. In this study, we investigated non-biological, platform-specific noise in rG4-seq and demonstrated how noise modeling can improve both the sensitivity and the specificity of rG4 detection in a replicate-independent manner.

An in-depth re-analysis of the HeLa rG4-seq datasets generated in our previous study [5] revealed that the RNA fragmentation process in rG4-seq chemistry is associated with a distinct distribution of background RTS signal, which was the most significant source of noise. By modeling, and thereby eliminating, the effect of this noise on RTS measurements, we formulated an improved rG4 detection pipeline called rG4-seeker. Whereas the original pipeline achieved a 12% FDR with a combined analysis of four replicates, the new implementation enables reliable single-replicate analysis at an FDR below 2% while recalling ~80% of the rG4 motifs identified previously. The rG4 motifs that were not recalled were found to map to transcript regions of significantly higher GC ratio, where RTS signals were likely compromised by sequencing bias [9,10], rendering rG4 detection outcomes inconclusive. Furthermore, with rG4-seeker we identified hundreds of novel rG4s whose nucleotide sequences do not match existing motif definitions, with candidates validated experimentally. These findings provided new insights into the nucleotide sequence rules governing rG4 formation.
Authors: Eugene Yui-Ching CHOW, Kaixin LYU, Chun Kit KWOK, Ting-Fung CHAN
Conference: The 7th International Meeting on Quadruplex Nucleic Acids

Last updated: 2020-15-05 at 12:07