全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...

Inter-Sentence Segmentation of YouTube Subtitles Using Long-Short Term Memory (LSTM)

DOI: https://doi.org/10.3390/app9071504

Full-Text   Cite this paper   Add to My Lib

Abstract:

Recently, with the development of Speech to Text, which converts voice to text, and machine translation, technologies for simultaneously translating the captions of video into other languages have been developed. Using this, YouTube, a video-sharing site, provides captions in many languages. Currently, the automatic caption system extracts voice data when uploading a video and provides a subtitle file converted into text. This method creates subtitles suitable for the running time. However, when extracting subtitles from video using Speech to Text, it is impossible to accurately translate the sentence because all sentences are generated without periods. Since the generated subtitles are separated by time units rather than sentence units, and are translated, it is very difficult to understand the translation result as a whole. In this paper, we propose a method to divide text into sentences and generate period marks to improve the accuracy of automatic translation of English subtitles. For this study, we use the 27,826 sentence subtitles provided by Stanford University’s courses as data. Since this lecture video provides complete sentence caption data, it can be used as training data by transforming the subtitles into general YouTube-like caption data. We build a model with the training data using the LSTM-RNN (Long-Short Term Memory – Recurrent Neural Networks) and predict the position of the period mark, resulting in prediction accuracy of 70.84%. Our research will provide people with more accurate translations of subtitles. In addition, we expect that language barriers in online education will be more easily broken by achieving more accurate translations of numerous video lectures in English. View Full-Tex

Full-Text

comments powered by Disqus

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133

WeChat 1538708413