personalized_passkey_retrieval
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-02-15.
Download link:
https://modelscope.cn/datasets/intfloat/personalized_passkey_retrieval
Description:
### Dataset Summary
This dataset contains the data for the personalized passkey retrieval task in the paper [Improving Text Embeddings with Large Language Models](https://arxiv.org/pdf/2401.00368.pdf).
### Data Fields
- `query`: a `string` feature.
- `candidates`: a list of `string` features, with 100 candidates for each query.
- `label`: an `int32` feature, the index of the correct candidate in the `candidates` list; it is always 0.
- `context_length`: an `int32` feature, the approximate length of the candidate documents.
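As a quick sanity check of the fields described above, the following is a minimal sketch of a record validator. It assumes each record is a plain dict-like mapping with the four documented keys. The helper name `validate_record`, the constant `EXPECTED_CANDIDATES`, and the sample record are hypothetical and invented for illustration; they are not taken from the dataset or its generation script.

```python
from typing import Any, Mapping

EXPECTED_CANDIDATES = 100  # per the field description: 100 candidates per query


def validate_record(record: Mapping[str, Any]) -> None:
    """Raise ValueError if a record does not match the documented fields."""
    if not isinstance(record["query"], str):
        raise ValueError("query must be a string")
    candidates = record["candidates"]
    if len(candidates) != EXPECTED_CANDIDATES:
        raise ValueError(
            f"expected {EXPECTED_CANDIDATES} candidates, got {len(candidates)}"
        )
    if not all(isinstance(c, str) for c in candidates):
        raise ValueError("every candidate must be a string")
    if record["label"] != 0:
        raise ValueError("label is documented as always 0")
    if not isinstance(record["context_length"], int):
        raise ValueError("context_length must be an integer")


# Hypothetical record, invented for illustration only (not real dataset content).
example = {
    "query": "what is the passkey for Alice?",
    "candidates": [f"placeholder document {i}" for i in range(100)],
    "label": 0,
    "context_length": 256,
}
validate_record(example)  # passes silently when the record is well-formed
```

Because `label` is always 0, the correct document is always the first entry in `candidates`; any evaluation code that depends on candidate order should keep this in mind.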
### How to use this dataset
You can load the dataset in your Python code as follows:
```python
from datasets import load_dataset
dataset = load_dataset("intfloat/personalized_passkey_retrieval")
```
The data in this repo is generated by the script [generate_passkey_data.py](https://huggingface.co/datasets/intfloat/personalized_passkey_retrieval/blob/main/generate_passkey_data.py).
You can also tweak the script to generate your own data.
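Given the `query`, `candidates`, and `label` fields, one plausible way to use the data is to score every candidate against the query and check whether the correct one ranks first. The sketch below is an assumption about usage, not the evaluation protocol from the paper. The function names `top1_accuracy` and `token_overlap` are hypothetical, the token-overlap scorer is a toy stand-in for a real embedding model, and the records are invented for illustration.

```python
from typing import Any, Callable, Iterable, Mapping


def top1_accuracy(
    records: Iterable[Mapping[str, Any]],
    score_fn: Callable[[str, str], float],
) -> float:
    """Fraction of records whose labelled candidate strictly outscores all others."""
    hits = 0
    total = 0
    for record in records:
        label = record["label"]
        scores = [score_fn(record["query"], c) for c in record["candidates"]]
        # Require a strict win: since label is always 0, a first-index
        # argmax would silently count ties as correct.
        is_hit = all(scores[label] > s for i, s in enumerate(scores) if i != label)
        hits += int(is_hit)
        total += 1
    return hits / total if total else 0.0


def token_overlap(query: str, doc: str) -> float:
    """Toy scorer (assumption): number of shared lowercase whitespace tokens."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


# Hypothetical toy records with two candidates each (the real data has 100).
toy_records = [
    {"query": "alice passkey",
     "candidates": ["alice passkey is 123", "bob passkey is 456"], "label": 0},
    {"query": "carol passkey",
     "candidates": ["dave secret is 1", "carol passkey is 789"], "label": 0},
]
accuracy = top1_accuracy(toy_records, token_overlap)
print(accuracy)  # 0.5: the first record is a hit, the second is a miss
```

To evaluate a real model, replace `token_overlap` with a function that embeds both strings and returns their similarity, and pass the rows of the loaded dataset in place of `toy_records`.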
### Citation Information
If you use this dataset in your research, please cite this paper:
```
@inproceedings{Wang2023ImprovingTE,
title={Improving Text Embeddings with Large Language Models},
author={Liang Wang and Nan Yang and Xiaolong Huang and Linjun Yang and Rangan Majumder and Furu Wei},
year={2023},
}
```
Provider: maas
Created: 2025-02-12



