example-retrieval-reranking-dataset
Dataset Overview
Basic Information
- Dataset name: example-retrieval-reranking-dataset
- Source: generated with distilabel
- Tags: synthetic, distilabel, rlaif
- Size: fewer than 1K examples
Configurations
generate_reranking_pairs configuration
- Features:
  - filename: string
  - anchor: string
  - repo_name: string
  - positive: string
  - negative: string
  - distilabel_metadata: struct containing the raw input, the raw output, and generation statistics
  - model_name: string
- Splits:
  - train: 20 examples, 52,066 bytes
- Download size: 38,610 bytes
- Dataset size: 52,066 bytes
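The feature list above can be turned into a quick sanity check on individual rows. The sketch below is a hypothetical helper (`check_record` and `STRING_FIELDS` are illustrative names, not part of the dataset or of distilabel). It only verifies that the six string columns are strings and that `distilabel_metadata` arrives as a dict, which is how a struct column appears when a row is read as a Python dict. It does not validate the struct's inner fields, whose exact types the card does not spell out.

```python
# Hypothetical helper: checks a row against the feature list above.
# Field names come from the card; the check itself is only an illustration.
STRING_FIELDS = ("filename", "anchor", "repo_name", "positive", "negative", "model_name")


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in `record` (empty if it looks valid)."""
    problems = []
    for name in STRING_FIELDS:
        # Every plain column in this config is declared as a string.
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected str")
    # distilabel_metadata is a struct column; as a Python row it is a dict.
    if not isinstance(record.get("distilabel_metadata"), dict):
        problems.append("distilabel_metadata: expected dict")
    return problems
```

Applied to a row from the `train` split, an empty list means the row has the expected top-level shape.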
generate_retrieval_pairs configuration
- Features:
  - filename: string
  - anchor: string
  - repo_name: string
  - positive: string
  - negative: string
  - distilabel_metadata: struct containing the raw input, the raw output, and generation statistics
  - model_name: string
- Splits:
  - train: 20 examples, 48,838 bytes
- Download size: 31,375 bytes
- Dataset size: 48,838 bytes
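Because each configuration has only a `train` split, its dataset size equals that split's size. The short calculation below uses only the numbers stated above to derive the average size per example and the ratio of download size to dataset size. The helper name `summarize` is illustrative. The reading that the download is smaller because the hosted files are compressed is an assumption, not something the card states.

```python
# Example counts and sizes (in bytes) copied from the configuration lists above.
CONFIGS = {
    "generate_reranking_pairs": {"examples": 20, "dataset_size": 52_066, "download_size": 38_610},
    "generate_retrieval_pairs": {"examples": 20, "dataset_size": 48_838, "download_size": 31_375},
}


def summarize(cfg: dict) -> tuple[float, float]:
    """Return (average bytes per example, download size / dataset size)."""
    avg = cfg["dataset_size"] / cfg["examples"]
    ratio = cfg["download_size"] / cfg["dataset_size"]
    return avg, ratio


for name, cfg in CONFIGS.items():
    avg, ratio = summarize(cfg)
    print(f"{name}: {avg:.1f} bytes/example, download/dataset = {ratio:.2f}")
```

This works out to roughly 2.6 KB per example for the reranking config and 2.4 KB for the retrieval config.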
Example Record Structure
generate_reranking_pairs example
```json
{
  "anchor": "description: Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency...",
  "distilabel_metadata": {
    "raw_input_generate_reranking_pairs": [...],
    "raw_output_generate_reranking_pairs": "## Positive\nArgilla serves as a collaborative tool...\n\n## Negative\nThe pizza is a delicious dish...",
    "statistics_generate_reranking_pairs": {
      "input_tokens": 200,
      "output_tokens": 55
    }
  },
  "filename": "argilla-python/docs/index.md",
  "model_name": "gpt-4o-mini",
  "negative": "The pizza is a delicious dish that many people enjoy...",
  "positive": "Argilla serves as a collaborative tool designed for AI engineers...",
  "repo_name": "argilla-io/argilla-python"
}
```
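In the example above, the raw model output places the positive and negative texts under `## Positive` and `## Negative` headings, and the same texts appear in the `positive` and `negative` columns. The sketch below splits such a string back into its two parts. It assumes every raw output follows this heading layout, which is inferred from the example rather than documented, and `parse_raw_output` is a hypothetical helper name.

```python
import re

# Assumption: raw outputs use "## Positive" / "## Negative" markdown headings,
# as in the example record above. Each section runs until the next heading or
# the end of the string.
SECTION_RE = re.compile(
    r"##\s*(Positive|Negative)\s*(.*?)(?=##\s*(?:Positive|Negative)|\Z)",
    re.S,
)


def parse_raw_output(raw: str) -> dict:
    """Split a raw generation into {'positive': ..., 'negative': ...} (hypothetical helper)."""
    return {m.group(1).lower(): m.group(2).strip() for m in SECTION_RE.finditer(raw)}
```

Comparing the parsed values with the `positive` and `negative` columns is a cheap way to spot rows where the generation did not follow the expected format.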
generate_retrieval_pairs example
```json
{
  "anchor": "description: Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency...",
  "distilabel_metadata": {
    "raw_input_generate_retrieval_pairs": [...],
    "raw_output_generate_retrieval_pairs": "## Positive\nWhat makes Argilla a collaboration platform...\n\n## Negative\nIs Argilla a collaboration program for data scientists...",
    "statistics_generate_retrieval_pairs": {
      "input_tokens": 253,
      "output_tokens": 60
    }
  },
  "filename": "argilla-python/docs/index.md",
  "model_name": "gpt-4o-mini",
  "negative": "Is Argilla a collaboration program for data scientists and domain experts that seek low-quality inputs...",
  "positive": "What makes Argilla a collaboration platform for AI engineers and domain experts focused on high-quality outputs...",
  "repo_name": "argilla-io/argilla-python"
}
```
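Both examples show that the keys inside `distilabel_metadata` carry the configuration name as a suffix, for instance `statistics_generate_retrieval_pairs` here and `statistics_generate_reranking_pairs` in the reranking example. The sketch below reads the token counts through that naming pattern and sums them over several rows. `token_usage` and `total_usage` are hypothetical helpers, and the pattern is inferred from the two examples.

```python
def token_usage(record: dict, task: str) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for one record.

    Assumes the metadata key is "statistics_<task>", as in the examples above.
    """
    stats = record["distilabel_metadata"][f"statistics_{task}"]
    return stats["input_tokens"], stats["output_tokens"]


def total_usage(records, task: str) -> tuple[int, int]:
    """Sum input and output token counts over an iterable of records."""
    total_in = total_out = 0
    for rec in records:
        n_in, n_out = token_usage(rec, task)
        total_in += n_in
        total_out += n_out
    return total_in, total_out
```

For the record above, `token_usage(record, "generate_retrieval_pairs")` would give `(253, 60)`.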
Loading the Dataset
```python
from datasets import load_dataset

# Load the generate_reranking_pairs configuration
ds_reranking = load_dataset("Chandan683/example-retrieval-reranking-dataset", "generate_reranking_pairs")

# Load the generate_retrieval_pairs configuration
ds_retrieval = load_dataset("Chandan683/example-retrieval-reranking-dataset", "generate_retrieval_pairs")
```
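Each row carries an `anchor` together with a `positive` and a `negative` text, which maps naturally onto (anchor, positive, negative) triplets of the kind used by contrastive or triplet losses. That use is an assumption, since the card does not prescribe a training setup. The hypothetical helper below builds such triplets from any iterable of row dicts and skips rows where one of the three fields is empty. Iterating over a `datasets.Dataset` split yields rows as dicts, so it can be applied directly to `ds_reranking["train"]` or `ds_retrieval["train"]`.

```python
from typing import Iterable, Iterator


def to_triplets(rows: Iterable[dict]) -> Iterator[tuple[str, str, str]]:
    """Yield (anchor, positive, negative) tuples, skipping rows with empty fields."""
    for row in rows:
        anchor, pos, neg = row.get("anchor"), row.get("positive"), row.get("negative")
        # Drop incomplete rows rather than emitting partial triplets.
        if anchor and pos and neg:
            yield anchor, pos, neg
```

For example, `triplets = list(to_triplets(ds_reranking["train"]))` collects at most 20 triplets from the reranking configuration.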




