Arena-Write

Name: Arena-Write
Creator: maas
Published: 2025-11-26 20:48:13
License: No description provided

ModelScope Community. Updated 2025-11-26; indexed 2025-07-19.

Download link:

https://modelscope.cn/datasets/THU-KEG/Arena-Write

Description:

## 📚 Arena-Write Dataset

Arena-Write is a small-scale benchmark of **100 user writing tasks**, designed to evaluate long-form generation models in realistic scenarios. The tasks cover diverse formats such as social posts, essays, and reports, and many require outputs of more than 2,000 words.

Project page: https://huggingface.co/THU-KEG/

### 📄 Data Format

Each data sample is a JSON object with the following fields:

```json
{
  "idx": 1,
  "question": "Write a social media post about Lei Feng spirit, within 200 characters.",
  "type": "Community Forum",
  "length": 200,
  "baseline_response": ""
}
```

- `question`: A real-world user writing prompt
- `type`: Scenario tag (e.g., Community Forum, Essay)
- `length`: Expected output length
- `baseline_response`: Outputs from **six** strong base models (e.g., GPT-4o, DeepSeek-R1)

> Each task is answered by several base models to support pairwise comparison during evaluation.

### 🧪 Evaluation Protocol

- **Pairwise Comparison**: Model outputs are compared against baseline responses using LLM judges. Each pair is evaluated twice, with the order flipped, to reduce position bias.
- **Elo Scoring**: Results are aggregated into Elo scores to track model performance.

### 📖 Citation

If you use this dataset, please cite:

```bibtex
@misc{wu2025longwriterzeromasteringultralongtext,
  title={LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning},
  author={Yuhao Wu and Yushi Bai and Zhiqiang Hu and Roy Ka-Wei Lee and Juanzi Li},
  year={2025},
  eprint={2506.18841},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.18841},
}
```
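The evaluation protocol described in this card (pairwise comparison with flipped order, aggregated into Elo scores) could be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the K-factor of 32, the 400-point scale, the starting rating of 1000, the `"first"`/`"second"`/`"tie"` verdict format, and the `toy_judge` function (a length-based stand-in for a real LLM judge) are all assumptions that the card does not specify.

```python
# Assumed Elo constants; the dataset card does not state the values it uses.
K_FACTOR = 32.0
SCALE = 400.0
START_RATING = 1000.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / SCALE))


def update_elo(ratings: dict, model_a: str, model_b: str, score_a: float) -> None:
    """Apply one Elo update in place.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if B wins.
    """
    delta = K_FACTOR * (score_a - expected_score(ratings[model_a], ratings[model_b]))
    ratings[model_a] += delta
    ratings[model_b] -= delta


def compare_both_orders(judge, question: str, candidate: str, baseline: str) -> list:
    """Judge the pair twice with positions swapped to reduce position bias.

    `judge(question, first, second)` is a hypothetical callable returning
    "first", "second", or "tie". Returns the candidate's score for each run.
    """
    verdict_1 = judge(question, candidate, baseline)  # candidate shown first
    verdict_2 = judge(question, baseline, candidate)  # candidate shown second
    score_1 = {"first": 1.0, "second": 0.0, "tie": 0.5}[verdict_1]
    score_2 = {"first": 0.0, "second": 1.0, "tie": 0.5}[verdict_2]
    return [score_1, score_2]


def toy_judge(question: str, first: str, second: str) -> str:
    """Placeholder judge that prefers the longer response (illustration only)."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


def run_demo():
    """Score one candidate against one baseline on a single task."""
    ratings = {"candidate": START_RATING, "baseline": START_RATING}
    question = "Write a social media post about Lei Feng spirit, within 200 characters."
    scores = compare_both_orders(toy_judge, question, "a longer answer", "short")
    for score in scores:
        update_elo(ratings, "candidate", "baseline", score)
    return ratings, scores


if __name__ == "__main__":
    final_ratings, run_scores = run_demo()
    print(run_scores, final_ratings)
```

In a real run, `toy_judge` would be replaced by a call to an LLM judge, and the loop would cover all 100 tasks and every baseline response. Because each update adds to one rating exactly what it subtracts from the other, the sum of the two ratings stays constant.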

Provider:

maas

Created:

2025-07-15
