Speech retrieval with video parsing for television news programs
Refereed conference paper presented and published in conference proceedings


Abstract: We have been working on speech retrieval from Chinese (Cantonese) television news programs. The use of automatic speech recognition for audio indexing produces imperfect transcriptions, and recognition errors affect retrieval performance. A news story typically contains a brief report by the anchor person(s) in the studio, as well as news footage from the field. Investigation shows that our recognizer performs better when indexing audio from the studio, compared to that from the field. In order to automatically extract the "reliable" audio segments for speech retrieval, we attempt to detect studio-to-field transitions by means of video parsing. Our study is based on 146 news stories collected from local television Cantonese news programs. We formulated a known-item retrieval task and adopted the average inverse rank (AIR) as our evaluation metric. Retrieval is performed based on syllable bigram units, augmented with skipped syllable bigrams. Retrieval using the entire audio track of each news story gave AIR=0.759. With the incorporation of video parsing, we performed retrieval based only on the studio recordings, which produced AIR=0.768.
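As an illustration of the two technical ingredients named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that a "skipped" bigram pairs two syllables separated by exactly one intervening syllable, and that AIR is the mean of 1/rank of the known item over all queries; the record itself does not spell out either definition. The function names, the `max_skip` parameter, and the example syllables are hypothetical.

```python
from typing import List, Sequence, Tuple


def syllable_bigrams(syllables: Sequence[str], max_skip: int = 1) -> List[Tuple[str, str]]:
    """Return adjacent syllable bigrams followed by skipped bigrams.

    Assumption: a skipped bigram pairs syllables that have up to
    `max_skip` syllables between them (1 by default).
    """
    units: List[Tuple[str, str]] = []
    # gap == 1 gives ordinary bigrams; larger gaps give skipped bigrams.
    for gap in range(1, max_skip + 2):
        for i in range(len(syllables) - gap):
            units.append((syllables[i], syllables[i + gap]))
    return units


def average_inverse_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank over queries (assumed definition of AIR).

    Each entry is the 1-based rank at which the known item was
    retrieved for one query.
    """
    if not ranks:
        raise ValueError("at least one rank is required")
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical romanized syllable sequence, for illustration only.
    print(syllable_bigrams(["gam", "jat", "san", "man"]))
    # Hypothetical ranks for three queries.
    print(average_inverse_rank([1, 2, 4]))
```

Under this assumed definition, an AIR of 1.0 would mean the known item is ranked first for every query, so values closer to 1 indicate better retrieval.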
Authors: Meng HM, Tang X, Hui PY, Gao XB, Li YC
Conference name: IEEE International Conference on Acoustics, Speech, and Signal Processing
Conference start date: 07.05.2001
Conference end date: 11.05.2001
Conference location: Salt Lake City
Conference country/region: United States
Publication year: 2001
Month: 1
Day: 1
Publisher: IEEE
Pages: 1401 - 1404
ISBN: 0-7803-7041-4
ISSN: 1520-6149
Language: British English
Web of Science subject categories: Acoustics; Computer Science; Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Engineering; Engineering, Electrical & Electronic

Last updated: 2020-30-09 at 00:12