AndreaUnibo/NQ_reshaped
Source: Hugging Face. Updated 2024-07-01. Indexed 2024-07-06.
Download link:
https://hf-mirror.com/datasets/AndreaUnibo/NQ_reshaped
Description:
This dataset is a reshaped and processed version of lighteval/natural_questions_clean that contains only short answers. Each document is split into chunks with a sentence splitter. ordered_index holds the chunk indexes sorted by cosine similarity to the embedded answer. n_chunks holds the top-k of those ordered indexes, where k is determined by the maximum number of tokens the model we used could process (around 4096). It's important to note that the last index in n_chunks may differ from the ordered indexes: I checked whether the answer was contained in the first n_chunks and, if it was not, selected the chunk containing the answer as the last index.
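The ranking and selection described above can be illustrated with a minimal sketch. This is not the author's code: the function names rank_chunks and select_chunks, the toy 2-dimensional vectors, and the whitespace token counter are all assumptions made for illustration, since the description does not name the embedding model or tokenizer that were actually used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_chunks(chunk_vecs, answer_vec):
    """Chunk indexes sorted by descending similarity to the answer embedding."""
    sims = [cosine(v, answer_vec) for v in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: sims[i], reverse=True)

def select_chunks(chunks, ordered_index, answer, max_tokens=4096, count_tokens=None):
    """Take top-ranked chunks within a token budget (hypothetical helper).

    If none of the selected chunks contains the answer string, the last
    selected index is replaced by the best-ranked chunk that does.
    """
    # Assumption: whitespace splitting stands in for the real tokenizer.
    count_tokens = count_tokens or (lambda text: len(text.split()))
    selected, used = [], 0
    for i in ordered_index:
        cost = count_tokens(chunks[i])
        if used + cost > max_tokens:
            break
        selected.append(i)
        used += cost
    if selected and not any(answer in chunks[i] for i in selected):
        for i in ordered_index:
            if answer in chunks[i]:
                selected[-1] = i  # answer chunk becomes the last index
                break
    return selected

# Toy data: hand-written vectors stand in for real embeddings.
chunks = ["France is in Europe.", "The capital is Paris.", "It has many rivers."]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
answer_vec = [1.0, 0.2]

ordered_index = rank_chunks(chunk_vecs, answer_vec)
n_chunks = select_chunks(chunks, ordered_index, "Paris", max_tokens=8)
print(ordered_index, n_chunks)
```

In this toy run the chunk containing the answer ranks last, so it does not fit in the budget and is swapped in as the final entry, which is why the last index of n_chunks can differ from the ordered indexes. The sketch does not re-check the token budget after the swap; whether the original processing did so is not stated.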
Provider:
AndreaUnibo



