
jhu-clsp/news21-instructions-mteb

Source: Hugging Face | Updated: 2024-11-05 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/jhu-clsp/news21-instructions-mteb
Resource summary:
---
configs:
- config_name: corpus
  data_files:
  - path: corpus/corpus-*
    split: corpus
- config_name: queries
  data_files:
  - path: queries/queries-*
    split: queries
- config_name: instruction
  data_files:
  - path: instruction/instruction-*
    split: instruction
- config_name: default
  data_files:
  - path: data/default-*
    split: test
- config_name: qrel_diff
  data_files:
  - path: qrel_diff/qrel_diff-*
    split: qrel_diff
- config_name: top_ranked
  data_files:
  - path: top_ranked/top_ranked-*
    split: top_ranked
dataset_info:
- config_name: corpus
  features:
  - dtype: string
    name: _id
  - dtype: string
    name: title
  - dtype: string
    name: text
  splits:
  - name: corpus
    num_examples: 30921
- config_name: queries
  features:
  - dtype: string
    name: _id
  - dtype: string
    name: text
  splits:
  - name: queries
    num_examples: 64
- config_name: instruction
  features:
  - dtype: string
    name: query-id
  - dtype: string
    name: instruction
  splits:
  - name: instruction
    num_examples: 64
- config_name: default
  features:
  - dtype: string
    name: query-id
  - dtype: string
    name: corpus-id
  - dtype: float64
    name: score
  splits:
  - name: test
    num_examples: 8554
- config_name: qrel_diff
  features:
  - dtype: string
    name: query-id
  - list: string
    name: corpus-ids
  splits:
  - name: qrel_diff
    num_examples: 32
- config_name: top_ranked
  features:
  - dtype: string
    name: query-id
  - list: string
    name: corpus-ids
  splits:
  - name: top_ranked
    num_examples: 64
language:
- en
multilinguality:
- monolingual
tags:
- text-retrieval
- instruction-retrieval
task_categories:
- text-retrieval
task_ids:
- document-retrieval
---

# news21-instructions-mteb

This is a new version of the news21-instructions dataset modified to fit the new MTEB format.

1. Restructured queries to include both original and changed versions
2. Separated instructions into a dedicated configuration
3. Reorganized qrels into default (original) and qrel_diff configurations

## Dataset Structure

The dataset contains the following configurations:

- corpus: Original corpus documents
- queries: Queries with both original and changed versions
- instruction: Instructions for both original and changed queries
- default: Original relevance judgments
- qrel_diff: Changes in relevance judgments
- top_ranked: Top ranked documents for each query
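
The configurations listed above can be loaded one at a time with the Hugging Face `datasets` library. The sketch below is illustrative, not part of the dataset card: the helper names (`load_all_configs`, `build_qrels`, `attach_instructions`) and the toy rows are hypothetical, and it assumes that `_id` in `queries` matches `query-id` in `instruction`, which the card does not state explicitly. The config and split names are taken from the YAML header.

```python
from collections import defaultdict

DATASET_ID = "jhu-clsp/news21-instructions-mteb"

# Config name -> split name, as listed in the card's YAML header.
CONFIG_SPLITS = {
    "corpus": "corpus",
    "queries": "queries",
    "instruction": "instruction",
    "default": "test",
    "qrel_diff": "qrel_diff",
    "top_ranked": "top_ranked",
}


def load_all_configs():
    """Load every configuration from the Hub (needs `datasets` and network access)."""
    # Imported lazily so the pure-Python helpers below also work offline.
    from datasets import load_dataset

    return {
        name: load_dataset(DATASET_ID, name, split=split)
        for name, split in CONFIG_SPLITS.items()
    }


def build_qrels(rows):
    """Turn `default` rows into {query-id: {corpus-id: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query-id"]][row["corpus-id"]] = row["score"]
    return dict(qrels)


def attach_instructions(queries, instructions):
    """Pair each query with its instruction (assumes `_id` == `query-id`)."""
    by_query = {row["query-id"]: row["instruction"] for row in instructions}
    return {
        q["_id"]: {"text": q["text"], "instruction": by_query.get(q["_id"])}
        for q in queries
    }


# Toy rows (hypothetical, not taken from the dataset) following the documented schema.
toy_queries = [{"_id": "q1", "text": "example query"}]
toy_instructions = [{"query-id": "q1", "instruction": "example instruction"}]
toy_qrels = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1.0},
    {"query-id": "q1", "corpus-id": "d2", "score": 0.0},
]

paired = attach_instructions(toy_queries, toy_instructions)
qrels = build_qrels(toy_qrels)
```

With real data, `load_all_configs()` would return one `Dataset` per configuration, and the same helpers could be applied to `configs["queries"]`, `configs["instruction"]`, and `configs["default"]`.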

Provided by:
jhu-clsp