PARADOCS: A Language Independant Go-Between for Mating Parallel Documents
PARADOCS : l'entremetteur de documents parallèles indépendant de la langue

Keywords: Parallel corpora, Information Retrieval, Machine Translation

Abstract:

Parallel corpora are the bread and butter of a number of machine translation technologies, so considerable effort is regularly spent on acquiring new ones. This task often involves cumbersome manual inspection, and it is difficult to set up a single strategy that fits all needs. We therefore developed PARADOCS, a system that aims to do this automatically. Our solution exploits the numerical entities in documents in order to pair them. Its two main components are a classifier trained to recognize parallel text and an information retrieval engine that controls the search space of candidate pairs. We tested PARADOCS on a number of tasks involving numerous language pairs and report good results.
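To make the idea of pairing documents through their numerical entities concrete, here is a minimal sketch in Python. It is not the PARADOCS implementation: the abstract does not describe the system's features, classifier, or retrieval engine, so everything below is an assumption made for illustration. The function names (`numeric_profile`, `overlap`, `pair_documents`), the choice of multiset Jaccard similarity, and the greedy best-match strategy are all hypothetical stand-ins, and the sample sentences are invented.

```python
import re
from collections import Counter

# Assumption: a "numerical entity" is approximated as any run of digits.
NUMBER = re.compile(r"\d+")

def numeric_profile(text):
    """Return the multiset of digit sequences found in the text."""
    return Counter(NUMBER.findall(text))

def overlap(a, b):
    """Multiset Jaccard similarity between two numeric profiles (0.0 to 1.0)."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0

def pair_documents(sources, targets):
    """Map each source id to the target id whose numbers overlap most.

    Hypothetical stand-in for the classifier and retrieval engine
    mentioned in the abstract: it scores every candidate pair.
    """
    if not targets:
        return {}
    profiles = {tid: numeric_profile(text) for tid, text in targets.items()}
    pairs = {}
    for sid, text in sources.items():
        profile = numeric_profile(text)
        pairs[sid] = max(profiles, key=lambda tid: overlap(profile, profiles[tid]))
    return pairs

# Invented sample documents, used only to exercise the sketch.
sources = {
    "en1": "In 2009, 45 delegates met on 12 May.",
    "en2": "The budget rose by 3 percent to 780 million.",
}
targets = {
    "fr1": "Le budget a augmenté de 3 pour cent pour atteindre 780 millions.",
    "fr2": "En 2009, 45 délégués se sont réunis le 12 mai.",
}
pairs = pair_documents(sources, targets)
```

Numbers tend to survive translation unchanged, which is why they can serve as a language-independent signal. This sketch compares every source with every target, whereas the abstract says an information retrieval engine is used to limit the candidate pairs that are considered.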
