stack-exchange-paired

Source: ModelScope community. Updated 2026-05-15; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/stack-exchange-paired
Description:
# StackExchange Paired

This is a processed version of the [`HuggingFaceH4/stack-exchange-preferences`](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences) dataset. The following steps were applied:

- Parse HTML to Markdown with `markdownify`
- Create pairs `(response_j, response_k)` where j was rated better than k
- Sample at most 10 pairs per question
- Shuffle the dataset globally

## Example code

```python
from modelscope import MsDataset
from modelscope.utils.constant import DownloadMode

# Load the train split; FORCE_REDOWNLOAD bypasses any locally cached copy.
ds = MsDataset.load(
    'AI-ModelScope/stack-exchange-paired',
    subset_name='default',
    split='train',
    download_mode=DownloadMode.FORCE_REDOWNLOAD,
)

# Print the first record.
print(next(iter(ds)))
```

This dataset is designed to be used for preference learning. The processing notebook is also available in [the repository](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main).
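
The pairing, sampling, and shuffling steps listed above can be illustrated with a minimal sketch. This is not the code from the processing notebook: the function names `build_pairs` and `build_dataset`, the `(text, score)` input layout, the `question` field name, and the choice to skip tied scores are all assumptions made for illustration. The HTML-to-Markdown step is omitted so that the example needs only the standard library.

```python
import itertools
import random


def build_pairs(answers, max_pairs=10, seed=0):
    """Return (response_j, response_k) tuples where j scored higher than k.

    `answers` is a list of (text, score) tuples for one question
    (a hypothetical layout, not the source dataset's schema).
    """
    rng = random.Random(seed)
    pairs = []
    for (text_a, score_a), (text_b, score_b) in itertools.combinations(answers, 2):
        if score_a == score_b:
            continue  # assumption: ties carry no preference signal, so skip them
        if score_a > score_b:
            pairs.append((text_a, text_b))
        else:
            pairs.append((text_b, text_a))
    # Keep at most `max_pairs` pairs per question, chosen at random.
    if len(pairs) > max_pairs:
        pairs = rng.sample(pairs, max_pairs)
    return pairs


def build_dataset(questions, max_pairs=10, seed=0):
    """Build paired rows for all questions, then shuffle them globally.

    `questions` is a list of (question_text, answers) tuples.
    """
    rows = []
    for question, answers in questions:
        for response_j, response_k in build_pairs(answers, max_pairs, seed):
            rows.append(
                {"question": question, "response_j": response_j, "response_k": response_k}
            )
    random.Random(seed).shuffle(rows)  # global shuffle across all questions
    return rows


if __name__ == "__main__":
    toy = [
        ("How do I reverse a list?", [("Use reversed().", 12), ("Loop backwards.", 3)]),
    ]
    for row in build_dataset(toy):
        print(row)
```

Each question with n distinctly scored answers yields n(n-1)/2 candidate pairs, so the per-question cap keeps heavily answered questions from dominating the dataset.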
Provider:
maas
Created:
2023-12-22