rG4-seeker enables high confidence identification of novel rG4 motifs from rG4-seq experiment via platform-specific noise modeling
Refereed conference paper presented and published in conference proceedings

Other information
Abstract: The emergence of RNA-seq has revolutionized the study and understanding of the transcriptome [1,2] and has empowered many high-throughput platforms for mapping RNA structural and regulatory elements [3,4]. However, the subsequent bioinformatics searches that retrieve elements of interest often pick up noise that affects research interpretation and outcomes. Such noise is often regarded as a normal consequence of biological variance and is tolerated by conducting statistical tests on replicated experiments.

We recently developed RNA G-quadruplex sequencing (rG4-seq) for transcriptome-wide mapping of RNA G-quadruplexes (rG4s) [5] by exploiting their intrinsic reverse transcriptase-stalling (RTS) properties [6]. RNA G-quadruplex secondary structures are proposed to play significant regulatory roles in transcriptional, post-transcriptional and translational processes [7,8]. In this study, we investigated non-biological, platform-specific noise in rG4-seq and demonstrated how noise modeling can improve both the sensitivity and the specificity of rG4 detection in a replicate-independent manner.

Through in-depth re-analysis of the HeLa rG4-seq datasets generated in our previous study [5], we found that the RNA fragmentation process in the rG4-seq chemistry is associated with a distinct distribution of background RTS signal, which constituted the most significant source of noise. By modeling, and thus eliminating, the effect of this noise on RTS measurements, we formulated an improved rG4 detection pipeline called rG4-seeker. Whereas the original pipeline achieved 12% FDR with a 4-replicate-combined analysis, the new implementation enabled reliable single-replicate analysis at FDR <2% while recalling ~80% of the rG4 motifs identified previously. The unrecalled rG4 motifs were found to map to transcript regions of significantly higher GC ratio, where RTS signals were likely compromised by sequencing bias [9,10], rendering rG4 detection outcomes inconclusive. Furthermore, with rG4-seeker we identified hundreds of novel rG4s whose nucleotide sequences do not match existing motif definitions, and candidates among them were experimentally validated. These findings provide new insights into the nucleotide sequence rules governing rG4 formation.
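
To make the idea of testing RTS measurements against a background noise model concrete, the following is a minimal illustrative sketch and not the rG4-seeker implementation. It assumes, purely for illustration, that background stalling at a position can be treated as a binomial process with a single hypothetical rate (`background_rate`), and that false discoveries are controlled with the Benjamini-Hochberg procedure; the abstract does not specify either choice, and the read counts below are invented.

```python
from math import comb


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues, alpha=0.02):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank r with p_(r) <= alpha * r / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def call_rts_sites(sites, background_rate, alpha=0.02):
    """Flag positions whose stalling exceeds the assumed background.

    sites: list of (stalled_reads, coverage) tuples (hypothetical input format).
    background_rate: assumed per-read probability of background stalling.
    """
    pvals = [binom_sf(k, n, background_rate) for k, n in sites]
    return benjamini_hochberg(pvals, alpha)


# Invented example counts: two strongly stalled positions, two near background.
sites = [(30, 100), (6, 100), (45, 60), (2, 80)]
calls = call_rts_sites(sites, background_rate=0.05)
print(calls)
```

In this sketch a position is called only when its stalled-read count is improbable under the assumed background, which is how a noise model can support detection from a single replicate. A real analysis would estimate the background distribution from the data rather than fix one rate.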
All Author(s) List: Eugene Yui-Ching CHOW, Kaixin LYU, Chun Kit KWOK, Ting-Fung CHAN
Name of Conference: The 7th International Meeting on Quadruplex Nucleic Acids
Start Date of Conference: 06/09/2019
End Date of Conference: 09/09/2019
Place of Conference: Changchun
Country/Region of Conference: China
Languages: English-United States

Last updated on 2020-15-05 at 12:07