
LC-STAR English-German Bilingual Aligned Phrasal lexicon

Source: catalogue.elra.info. Updated 2008-04-22; indexed 2025-03-26.
Download link:
https://catalogue.elra.info/en-us/repository/browse/ELRA-S0248/
Resource description:
The LC-STAR English-German Bilingual Aligned Phrasal lexicon was created within the scope of the LC-STAR project (IST 2001-32216), which was sponsored by the European Commission. It was designed for speech-to-speech translation (SST). The lexicon comprises 10,733 phrases from the tourist domain. It is based on a list of short sentences obtained by translation from a US-English corpus of 10,518 phrases. The total number of unique separate words is 8,782. Each entry contains the US-English phrase (orthography) and its translation into German (orthography). For each token of the German phrase, it also gives the orthography of the word, its part of speech, its lemma, whether the phrase is idiomatic, and whether the word is a foreign word. In this lexicon, foreign words were tagged only if they were written with foreign orthography (e.g. English characters). The lexicon is provided in XML format. The database is stored on 1 CD.
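As a rough illustration of how the fields listed above might be read from an XML file, the following Python sketch parses a small made-up sample. The element and attribute names (`entry`, `source`, `target`, `token`, `pos`, `lemma`, `foreign`, `idiomatic`), the part-of-speech labels, and the sample phrase are all assumptions invented for this example; they are not taken from the lexicon or its actual DTD, which should be consulted before writing a real parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical sample: the tag names, attributes, POS labels, and the
# phrase itself are invented for illustration, not taken from the lexicon.
SAMPLE = """
<lexicon>
  <entry idiomatic="no">
    <source lang="en-US">Where is the station?</source>
    <target lang="de">Wo ist der Bahnhof?</target>
    <token pos="ADV" lemma="wo" foreign="no">Wo</token>
    <token pos="V" lemma="sein" foreign="no">ist</token>
    <token pos="DET" lemma="der" foreign="no">der</token>
    <token pos="N" lemma="Bahnhof" foreign="no">Bahnhof</token>
  </entry>
</lexicon>
"""


@dataclass
class Token:
    orthography: str  # surface form of the German word
    pos: str          # part of speech
    lemma: str        # base form
    foreign: bool     # True if tagged as a foreign word


@dataclass
class Entry:
    english: str      # US-English phrase
    german: str       # German translation
    idiomatic: bool   # True if the phrase is marked idiomatic
    tokens: List[Token]


def parse_entries(xml_text: str) -> List[Entry]:
    """Parse entries from XML text that follows the assumed layout above."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for e in root.iter("entry"):
        tokens = [
            Token(
                orthography=t.text or "",
                pos=t.get("pos", ""),
                lemma=t.get("lemma", ""),
                foreign=t.get("foreign") == "yes",
            )
            for t in e.iter("token")
        ]
        entries.append(
            Entry(
                english=e.findtext("source", default=""),
                german=e.findtext("target", default=""),
                idiomatic=e.get("idiomatic") == "yes",
                tokens=tokens,
            )
        )
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry.english, "->", entry.german)
        for tok in entry.tokens:
            print("  ", tok.orthography, tok.pos, tok.lemma, tok.foreign)
```

Keeping the phrase-level flag (`idiomatic`) on the entry and the word-level fields on each token mirrors the split the description makes between phrase and token information; a real loader would need to follow the layout the distributed files actually use.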

Provider:
ELRA Catalogue of Language Resources