agentlans/en-translations
Source: Hugging Face; updated 2024-12-14; indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/agentlans/en-translations
Description:
This dataset is a diverse collection of parallel sentences pairing English with a variety of other languages, drawn from multiple high-quality datasets. Each sentence pair carries a semantic similarity score computed with the LaBSE model, along with additional quality metrics. The dataset supports machine translation, cross-lingual semantic similarity, multilingual natural language understanding, and translation quality estimation. Each instance contains an English sentence, a non-English sentence, a semantic similarity score, a content quality score, a readability score, and a sentiment score. The data is split into training and validation sets, which make up 90% and 10% of the total, respectively. To build the dataset, sentences were downloaded from multiple high-quality datasets with attention to diverse language representation; the quality, readability, and sentiment scores were annotated with dedicated models.
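As a rough illustration of two steps mentioned in the description, the sketch below scores a sentence pair from its embedding vectors and makes a 90%/10% train/validation split. It rests on assumptions: the listing does not say how the LaBSE embeddings are compared (cosine similarity is a common choice and is used here), the toy vectors stand in for real LaBSE embeddings, and the names `cosine_similarity`, `split_train_validation`, `english`, `non_english`, `en_vec`, and `xx_vec` are hypothetical, not taken from the dataset.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def split_train_validation(records, validation_fraction=0.1, seed=0):
    """Shuffle a copy of `records` and return (train, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Toy sentence pairs; the vectors are placeholders (assumption), not LaBSE output.
pairs = [
    {"english": "Hello.", "non_english": "Bonjour.",
     "en_vec": [1.0, 0.0], "xx_vec": [1.0, 0.0]},
    {"english": "Good night.", "non_english": "Merci.",
     "en_vec": [1.0, 0.0], "xx_vec": [0.0, 1.0]},
]

# Attach a similarity score to each pair, mirroring the per-pair score field.
for pair in pairs:
    pair["similarity"] = cosine_similarity(pair["en_vec"], pair["xx_vec"])

train, validation = split_train_validation(pairs * 50)  # 100 toy records
```

With real data, the vectors would come from a LaBSE encoder rather than being written by hand, and a low similarity score could be used to filter out pairs that are unlikely to be translations of each other.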
Provider:
agentlans



