un_ga
ModelScope community · Updated 2025-10-09 · Indexed 2025-08-16
Download link:
https://modelscope.cn/datasets/Helsinki-NLP/un_ga
Resource summary:
Deprecated: Dataset "un_ga" is deprecated because its source data is no longer available. It has been superseded by the official United Nations Parallel Corpus, which should be used instead: un_pc
# Dataset Card for un_ga
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://opus.nlpl.eu/legacy/UN.php
- **Repository:** [More Information Needed]
- **Paper:** https://www.researchgate.net/publication/228579662_United_nations_general_assembly_resolutions_A_six-language_parallel_corpus
- **Leaderboard:** [More Information Needed]
- **Point of Contact:** [More Information Needed]
### Dataset Summary
This is a collection of translated documents from the United Nations, originally compiled into a translation memory by Alexandre Rafalovitch and Robert Dale (see http://uncorpora.org).
- Deprecated homepage URL: http://opus.nlpl.eu/UN.php
- Legacy homepage URL: https://opus.nlpl.eu/legacy/UN.php
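The summary describes the corpus as a translation memory, and the paper cited in this card states that it was released as a formatting-normalized TMX file with paragraphs aligned across languages. The Python sketch below shows one way to read such a file with the standard library's `xml.etree.ElementTree`. It assumes the standard TMX layout (`<tu>` translation units containing one `<tuv>` per language, each with an `xml:lang` attribute and a `<seg>` child). The function name `iter_tmx_units` and the two-language sample unit are hypothetical and made up for illustration; they are not taken from the original distribution.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_tmx_units(source) -> Iterator[Dict[str, str]]:
    """Yield one {language: segment_text} dict per <tu> translation unit.

    Hypothetical helper. `source` may be a file path or a binary file-like
    object, as accepted by ET.iterparse.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit: Dict[str, str] = {}
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions used a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text nested inside inline markup.
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        yield unit
        # Drop the unit's children once consumed to limit memory on large files.
        elem.clear()


# Made-up sample unit for illustration only (not from the corpus).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="paragraph"
          adminlang="en" creationtool="example" creationtoolversion="0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The General Assembly,</seg></tuv>
      <tuv xml:lang="es"><seg>La Asamblea General,</seg></tuv>
    </tu>
  </body>
</tmx>"""

for unit in iter_tmx_units(io.BytesIO(SAMPLE)):
    print(unit)  # {'en': 'The General Assembly,', 'es': 'La Asamblea General,'}
```

Streaming with `iterparse` rather than parsing the whole tree at once is a design choice for large translation memories; for small files `ET.parse` would work equally well.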
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
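The corpus covers the six official languages of the United Nations: Arabic (ar), Chinese (zh), English (en), French (fr), Russian (ru), and Spanish (es).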
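The paper cited in this card describes the corpus as six-way parallel. Many bilingual machine-translation pipelines consume language pairs, so each six-way aligned unit can be expanded into C(6, 2) = 15 unordered pairs. The sketch below illustrates that expansion. It assumes ISO 639-1 codes for the six official UN languages and a hypothetical `{"translation": {lang1: text1, lang2: text2}}` record shape; neither the codes nor the record layout are confirmed by this card, and the helper names are illustrative.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

# Six official UN languages; ISO 639-1 codes are an assumed convention here.
UN_LANGUAGES: List[str] = ["ar", "en", "es", "fr", "ru", "zh"]


def language_pairs(langs: List[str]) -> List[Tuple[str, str]]:
    """Return every unordered language pair, with codes sorted alphabetically."""
    return list(combinations(sorted(langs), 2))


def to_pair_records(unit: Dict[str, str]) -> Iterator[Dict[str, Dict[str, str]]]:
    """Split one multi-way aligned unit into bilingual records.

    The {"translation": {...}} record shape is hypothetical, chosen for
    illustration only.
    """
    for a, b in language_pairs(list(unit)):
        yield {"translation": {a: unit[a], b: unit[b]}}


pairs = language_pairs(UN_LANGUAGES)
print(len(pairs))  # 15, i.e. 6 choose 2
```

Sorting the codes before pairing keeps each pair in a single canonical order, so ("en", "fr") and ("fr", "en") are never emitted as separate pairs.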
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
[More Information Needed]
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
@inproceedings{rafalovitch2009united,
title = "United Nations General Assembly Resolutions: a six-language parallel corpus",
abstract = "In this paper we describe a six-ways parallel public-domain corpus consisting of 2100 United Nations General Assembly Resolutions with translations in the six official languages of the United Nations, with an average of around 3 million tokens per language. The corpus is available in a preprocessed, formatting-normalized TMX format with paragraphs aligned across multiple languages. We describe the background to the corpus and its content, the process of its construction, and some of its interesting properties.",
author = "Alexandre Rafalovitch and Robert Dale",
year = "2009",
language = "English",
booktitle = "MT Summit XII proceedings",
publisher = "International Association of Machine Translation",
}
### Contributions
Thanks to [@param087](https://github.com/param087) for adding this dataset.
Provided by:
maas
Created:
2025-08-16



