exlaw/tis-dpo-data
Source: Hugging Face | Updated: 2025-04-06 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/exlaw/tis-dpo-data
Description:
The TIS-DPO dataset repository contains the experimental datasets used in the ICLR 2025 paper to validate token-level importance sampling for direct preference optimization (DPO). It includes four datasets:
1. Anthropic HH: helpful and harmless AI responses, used for training and evaluating AI safety and alignment.
2. PKU-Safety: a safety-oriented dataset from Peking University, focused on safe and responsible AI behavior.
3. TL-DR: a text summarization dataset of long texts paired with their summaries.
4. Ultra-feedback: feedback on model outputs, used for preference learning and optimization.
Provided by:
exlaw
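
Download sketch:
One possible way to fetch the repository from the mirror in the download link is sketched below. It assumes the `huggingface_hub` package is installed and that the mirror accepts the standard Hub API through the `HF_ENDPOINT` environment variable. The helper names (`dataset_page_url`, `download`) and the local directory name are hypothetical, and nothing is assumed about the files inside the repository.

```python
import os

REPO_ID = "exlaw/tis-dpo-data"      # repository name from this listing
MIRROR = "https://hf-mirror.com"    # mirror host from the download link


def dataset_page_url(repo_id: str = REPO_ID, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a given Hub endpoint."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def download(repo_id: str = REPO_ID,
             local_dir: str = "tis-dpo-data",   # hypothetical target folder
             endpoint: str = MIRROR) -> str:
    """Download every file in the dataset repository and return the local path."""
    # huggingface_hub reads HF_ENDPOINT when it is first imported, so the
    # variable is set before the import. A value already set by the user wins.
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id,
                             repo_type="dataset",
                             local_dir=local_dir)


if __name__ == "__main__":
    # Only prints the page URL; call download() explicitly to fetch the files.
    print(dataset_page_url())
```

Calling `download()` needs network access and returns the folder holding the downloaded files. Passing `endpoint="https://huggingface.co"` would target the main Hub instead of the mirror.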



