DiscoX
ModelScope community | Updated 2025-12-04 | Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/ByteDance-Seed/DiscoX
Description:
# DiscoX Translation Benchmark
DiscoX is a benchmark for the evaluation of LLMs on discourse- and expert-level translation tasks.
## Dataset At A Glance
- **Languages**: English ⇄ Chinese (100 English→Chinese tasks, 100 Chinese→English tasks)
- **Total samples**: 200 discourse- and expert-level translation items
- **Average passage length**: ~1.7k characters (min 0.73k, max 3.04k)
- **Meta fields**: primary & secondary domain labels, structured rubrics, prompt IDs, etc.
- **Reference Rubrics**: every task ships with multiple rubrics annotated by experts, capturing key points for evaluating translation quality
Primary domain coverage:
| Primary Domain | Samples | Share |
| --- | --- | --- |
| 学术论文 (Academic papers) | 121 | 60.5% |
| 非学术论文 (Non-Academic tasks) | 79 | 39.5% |
Secondary domains include Social Sciences (社会科学), Natural Sciences (自然科学), Humanities (人文科学), Applied Disciplines (应用学科), News & Information (新闻资讯), Domain-Specific Scenarios (垂类场景), and Literature & Arts (文学艺术).
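The shares in the table can be recomputed from the records themselves. The following is a minimal sketch, assuming each record is a dict whose `Primary_Domain` value is a plain string. The helper name `domain_shares` and the toy label strings are illustrative and are not part of the dataset.

```python
from collections import Counter


def domain_shares(records, field="Primary_Domain"):
    """Return {label: (count, percent)} for one label field.

    Assumes every record is a dict and record[field] is a string.
    """
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {
        label: (count, round(100 * count / total, 1))
        for label, count in counts.most_common()
    }


# Toy records standing in for the real file; the label strings are placeholders.
toy_records = (
    [{"Primary_Domain": "academic"}] * 121
    + [{"Primary_Domain": "non-academic"}] * 79
)
shares = domain_shares(toy_records)
```

Passing `field="Secondary_Domain"` would give the same breakdown for the secondary labels, under the same assumption about the value type.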
## File Structure
- `discox.json`: the core dataset. Each record contains:
  - `ori_text`: the source text to be translated
  - `prompt`: text adding translation instructions
  - `reference_list`: rubrics designed for evaluating translation results
  - `Primary_Domain`, `Secondary_Domain`: high-level topic labels
  - `prompt_id`, `__internal_uuid__`: identifiers for specific tasks
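As a rough illustration of working with these fields, the sketch below parses the file contents and reports records that lack any of the listed keys. It assumes `discox.json` is a single top-level JSON array of objects, which is not confirmed here; the function names and the sample values are hypothetical.

```python
import json

# Field names as listed in the file-structure section.
EXPECTED_FIELDS = {
    "ori_text", "prompt", "reference_list",
    "Primary_Domain", "Secondary_Domain",
    "prompt_id", "__internal_uuid__",
}


def parse_records(raw_json):
    """Parse file contents, assuming one top-level JSON array of objects."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array (an assumption)")
    return records


def missing_fields(record):
    """Return the expected field names absent from one record, sorted."""
    return sorted(EXPECTED_FIELDS - set(record))


# Stand-in for the contents of discox.json; every value below is invented.
complete = {name: "placeholder" for name in EXPECTED_FIELDS}
incomplete = {k: v for k, v in complete.items() if k != "reference_list"}
raw = json.dumps([complete, incomplete], ensure_ascii=False)

records = parse_records(raw)
problems = {}
for index, record in enumerate(records):
    missing = missing_fields(record)
    if missing:
        problems[index] = missing
```

With the real file, `raw` would come from reading `discox.json` with `encoding="utf-8"`, since the records contain Chinese text.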
## Notes & Recommendations
- The `reference_list` entries are designed to enable targeted verification of translation fidelity: converted into structured checks (e.g., terminology, tone, and named entities), they support fine-grained, pointwise assessment of key translation aspects.
- The translation instruction in `prompt` states the desired output language; the instruction itself is written in Chinese.
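One possible way to turn rubrics into pointwise checks is sketched below. It is not the benchmark's official scoring procedure, which is not described here. It assumes `reference_list` is a list of strings with one rubric per entry; the judge wording, the yes/no verdict scheme, and the `build_checks` and `pass_rate` helpers are all hypothetical.

```python
def build_checks(record, translation):
    """Pair each rubric with the candidate translation, one check per rubric.

    Assumes record["reference_list"] is a list of rubric strings.
    """
    return [
        {
            "prompt_id": record["prompt_id"],
            "rubric": rubric,
            # Text that a human or model judge would receive (hypothetical wording).
            "judge_input": (
                f"Source:\n{record['ori_text']}\n\n"
                f"Translation:\n{translation}\n\n"
                f"Rubric:\n{rubric}\n\n"
                "Does the translation satisfy the rubric? Answer yes or no."
            ),
        }
        for rubric in record["reference_list"]
    ]


def pass_rate(verdicts):
    """Fraction of rubrics judged satisfied; verdicts is a list of booleans."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    return sum(verdicts) / len(verdicts)


# Invented record for illustration only.
toy_record = {
    "prompt_id": "demo-1",
    "ori_text": "The mitochondrion is the powerhouse of the cell.",
    "reference_list": [
        "The term 'mitochondrion' is translated with the standard term.",
        "The register stays neutral and expository.",
    ],
}
checks = build_checks(toy_record, "线粒体是细胞的能量工厂。")
score = pass_rate([True, False])  # verdicts would come from the judge
```

Keeping one check per rubric means a failed rubric can be traced back to a specific point, rather than being folded into a single holistic score.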
## License
The data is released under the CC BY 4.0 license.
Provider:
maas
Created:
2025-11-18



