mayankkeshari/hotpotqa-sentence-retrieval-with-attn-scores-18-12-25
Source: Hugging Face. Updated 2025-12-18; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/mayankkeshari/hotpotqa-sentence-retrieval-with-attn-scores-18-12-25
Description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: query_id
    dtype: string
  - name: passage_id
    dtype: string
  - name: sentence_hash
    dtype: string
  - name: query
    dtype: string
  - name: answer
    dtype: string
  - name: query_type
    dtype: string
  - name: query_level
    dtype: string
  - name: passage
    dtype: string
  - name: passage_idx
    dtype: int32
  - name: title
    dtype: string
  - name: num_sentences_in_passage
    dtype: int32
  - name: sentence
    dtype: string
  - name: sentence_idx
    dtype: int32
  - name: sentence_char_start
    dtype: int32
  - name: sentence_char_end
    dtype: int32
  - name: relevant
    dtype:
      class_label:
        names:
          '0': not_relevant
          '1': relevant
  - name: source_split
    dtype: string
  - name: original_split
    dtype: string
  - name: passage_reranker_score
    dtype: float64
  - name: sentence_attn_score
    dtype: float64
  - name: sentence_attn_score_normalized
    dtype: float64
  splits:
  - name: train
    num_bytes: 3825021997
    num_examples: 3207791
  - name: validation
    num_bytes: 477494362
    num_examples: 401074
  - name: test
    num_bytes: 477002501
    num_examples: 400379
  download_size: 991727878
  dataset_size: 4779518860
---
# Dataset Card for "hotpotqa-sentence-retrieval-with-attn-scores-18-12-25"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
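
The front matter above declares a single `default` config with `train`, `validation`, and `test` splits. A minimal sketch of reading a few rows follows, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the ID shown in the download link. Streaming is used so that the download of just under 1 GB is not needed up front. The helpers `label_name` and `peek` are hypothetical conveniences, not part of the dataset.

```python
from itertools import islice

# Repository ID taken from the download link on this page.
REPO_ID = "mayankkeshari/hotpotqa-sentence-retrieval-with-attn-scores-18-12-25"

# Mapping declared under `class_label.names` in the card's front matter.
RELEVANT_NAMES = {0: "not_relevant", 1: "relevant"}


def label_name(value: int) -> str:
    """Translate the integer stored in `relevant` into its declared name."""
    return RELEVANT_NAMES[value]


def peek(split: str = "validation", n: int = 3) -> list:
    """Stream the first `n` rows of a split without a full download."""
    # Imported lazily so `label_name` works even without `datasets` installed.
    from datasets import load_dataset

    rows = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(rows, n))
```

Calling `peek("validation", 3)` should return three dictionaries keyed by the feature names listed in the front matter; `label_name(row["relevant"])` then turns the stored integer into `not_relevant` or `relevant`.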
Configuration:
- Config name (config_name): default
Data files (data_files):
- Split: train, path: data/train-*
- Split: validation, path: data/validation-*
- Split: test, path: data/test-*
Dataset info (dataset_info):
Features:
- query_id (string): query ID
- passage_id (string): passage ID
- sentence_hash (string): sentence hash
- query (string): query text
- answer (string): answer text
- query_type (string): query type
- query_level (string): query difficulty level
- passage (string): passage text
- passage_idx (int32): passage index
- title (string): title
- num_sentences_in_passage (int32): total number of sentences in the passage
- sentence (string): sentence text
- sentence_idx (int32): sentence index
- sentence_char_start (int32): character start position of the sentence
- sentence_char_end (int32): character end position of the sentence
- relevant (class_label): relevance label, '0' = not_relevant, '1' = relevant
- source_split (string): source data split
- original_split (string): original data split
- passage_reranker_score (float64): passage reranker score
- sentence_attn_score (float64): sentence attention score
- sentence_attn_score_normalized (float64): normalized sentence attention score
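
The feature list suggests one row per query, passage, and sentence, with `relevant` as the label and the attention columns as candidate ranking signals. The sketch below groups rows by `query_id`, ranks sentences by `sentence_attn_score_normalized`, and checks whether the top-ranked sentence is labeled relevant. The card does not document how the scores were produced or whether higher means more relevant, so the descending sort is an assumption, and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Invented rows using a subset of the card's field names; values are illustrative only.
ROWS = [
    {"query_id": "q1", "passage_id": "p1", "sentence_idx": 0,
     "sentence_attn_score_normalized": 0.10, "relevant": 0},
    {"query_id": "q1", "passage_id": "p1", "sentence_idx": 1,
     "sentence_attn_score_normalized": 0.70, "relevant": 1},
    {"query_id": "q1", "passage_id": "p2", "sentence_idx": 0,
     "sentence_attn_score_normalized": 0.20, "relevant": 0},
    {"query_id": "q2", "passage_id": "p3", "sentence_idx": 0,
     "sentence_attn_score_normalized": 0.55, "relevant": 0},
    {"query_id": "q2", "passage_id": "p3", "sentence_idx": 1,
     "sentence_attn_score_normalized": 0.45, "relevant": 1},
]


def rank_by_query(rows, score_key="sentence_attn_score_normalized"):
    """Group rows by query and sort each group by score, highest first (assumed)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["query_id"]].append(row)
    return {
        qid: sorted(group, key=lambda r: r[score_key], reverse=True)
        for qid, group in groups.items()
    }


def top1_accuracy(ranked):
    """Fraction of queries whose top-ranked sentence has relevant == 1."""
    hits = sum(1 for group in ranked.values() if group[0]["relevant"] == 1)
    return hits / len(ranked)


ranked = rank_by_query(ROWS)
print(top1_accuracy(ranked))
```

Passing `score_key="sentence_attn_score"` or `score_key="passage_reranker_score"` would rank by one of the other score columns instead, provided the rows carry that field.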
Split statistics (splits):
- train: 3825021997 bytes, 3207791 examples
- validation: 477494362 bytes, 401074 examples
- test: 477002501 bytes, 400379 examples
Download size (download_size): 991727878 bytes; dataset size (dataset_size): 4779518860 bytes
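
As a quick consistency check on the figures above, the following sketch sums the per-split counts and derives each split's share of the rows. All numbers are copied from the card; only the variable names are introduced here.

```python
# Figures copied from the `splits` section of the card.
SPLITS = {
    "train": {"num_bytes": 3825021997, "num_examples": 3207791},
    "validation": {"num_bytes": 477494362, "num_examples": 401074},
    "test": {"num_bytes": 477002501, "num_examples": 400379},
}
DATASET_SIZE = 4779518860
DOWNLOAD_SIZE = 991727878

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The per-split byte counts add up to the declared dataset_size.
assert total_bytes == DATASET_SIZE

for name, s in SPLITS.items():
    share = s["num_examples"] / total_examples
    print(f"{name:<10} {s['num_examples']:>9,} rows  {share:.1%}")

print(f"total rows: {total_examples:,}")
print(f"download/dataset size ratio: {DOWNLOAD_SIZE / DATASET_SIZE:.2f}")
```

The split works out to roughly 80/10/10 over 4,009,244 rows in total, and the download is about a fifth of the in-memory dataset size.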
Provider:
mayankkeshari



