stack-exchange-paired
Source: ModelScope (魔搭社区) · Updated 2026-05-15 · Indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/AI-ModelScope/stack-exchange-paired
Resource description:
# StackExchange Paired
This is a processed version of the [`HuggingFaceH4/stack-exchange-preferences`](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences) dataset. The following steps were applied:
- Parse HTML to Markdown with `markdownify`
- Create pairs `(response_j, response_k)` where `response_j` was rated better than `response_k`
- Sample at most 10 pairs per question
- Shuffle the dataset globally
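
The pairing, sampling, and shuffling steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the processing notebook: the input layout (a `question` string plus a list of `answers`, each with hypothetical `text` and `score` keys), the function name `build_pairs`, and the decision to skip tied scores are all assumptions made for the example.

```python
import itertools
import random


def build_pairs(questions, max_pairs_per_question=10, seed=0):
    """Turn scored answers into (response_j, response_k) preference pairs.

    `questions` is a list of dicts shaped like
    {"question": str, "answers": [{"text": str, "score": number}, ...]}.
    This schema is hypothetical and only serves the illustration.
    """
    rng = random.Random(seed)
    rows = []
    for q in questions:
        pairs = []
        for a, b in itertools.combinations(q["answers"], 2):
            if a["score"] == b["score"]:
                continue  # assumption: a tie expresses no preference
            better, worse = (a, b) if a["score"] > b["score"] else (b, a)
            pairs.append({
                "question": q["question"],
                "response_j": better["text"],  # the better-rated answer
                "response_k": worse["text"],   # the worse-rated answer
            })
        # Keep at most `max_pairs_per_question` pairs for each question.
        if len(pairs) > max_pairs_per_question:
            pairs = rng.sample(pairs, max_pairs_per_question)
        rows.extend(pairs)
    rng.shuffle(rows)  # shuffle across all questions, not within each one
    return rows
```

Capping the pairs per question keeps questions with many answers from dominating the dataset, since the number of possible pairs grows quadratically with the number of answers.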
## Example code
```python
from modelscope import MsDataset
from modelscope.utils.constant import DownloadMode

# FORCE_REDOWNLOAD ignores any locally cached copy and downloads the data again.
ds = MsDataset.load('AI-ModelScope/stack-exchange-paired', subset_name='default',
                    split='train', download_mode=DownloadMode.FORCE_REDOWNLOAD)
print(next(iter(ds)))  # show the first record
```
This dataset is designed to be used for preference learning. The processing notebook is in [the repository](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main) as well.
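
To make "preference learning" concrete, here is a minimal sketch of one common objective for such pairs: a pairwise logistic (Bradley-Terry style) loss that is small when a model scores `response_j` above `response_k`. The dataset card does not prescribe a loss, so this choice, the function name `pairwise_preference_loss`, and the scalar reward inputs are assumptions made for illustration.

```python
import math


def pairwise_preference_loss(reward_j, reward_k):
    """Return -log(sigmoid(reward_j - reward_k)) in a numerically stable form.

    `reward_j` and `reward_k` are scalar scores that some model (assumed,
    not part of the dataset) assigns to the preferred and rejected responses.
    """
    margin = reward_j - reward_k
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss equals `log(2)` when both responses receive the same score, shrinks toward zero as the preferred response pulls ahead, and grows roughly linearly when the ranking is inverted.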
Provider:
maas
Created:
2023-12-22



