
LitSearch-NLP-Class

ModelScope community. Updated 2025-11-12. Indexed 2025-04-26.
Download link:
https://modelscope.cn/datasets/yale-nlp/LitSearch-NLP-Class
Description:

# LitSearch: A Retrieval Benchmark for Scientific Literature Search

This dataset contains the query set and retrieval corpus for our paper **LitSearch: A Retrieval Benchmark for Scientific Literature Search**. We introduce LitSearch, a retrieval benchmark comprising 597 realistic literature search queries about recent ML and NLP papers. LitSearch is constructed using a combination of (1) questions generated by GPT-4 based on paragraphs containing inline citations from research papers and (2) questions about recently published papers, manually written by their authors. All LitSearch questions were manually examined or edited by experts to ensure high quality.

This dataset contains the following configurations:

1. `query` containing 597 queries accompanied by gold paper IDs, specificity and quality annotations, and metadata about the source of the query.
2. `corpus_new` containing 6809 documents. We provide the extracted titles, abstracts and outgoing citation paper IDs.

Each configuration has a single 'full' split.

## Usage

You can load the configurations as follows:

```python
from datasets import load_dataset

query_data = load_dataset("yale-nlp/LitSearch-NLP-Class", "query", split="full")
corpus_clean_data = load_dataset("yale-nlp/LitSearch-NLP-Class", "corpus_new", split="full")
```
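
Because each query comes with gold paper IDs, a retriever's ranked output can be scored against them. The following is a minimal sketch of recall@k in plain Python. The metric choice, the function name `recall_at_k`, and the toy IDs are illustrative assumptions; the card does not specify an evaluation protocol or the exact field names, so adapt the inputs to the actual columns of the `query` configuration.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked_ids: Sequence[int], gold_ids: Iterable[int], k: int) -> float:
    """Fraction of gold paper IDs that appear among the top-k retrieved IDs.

    `ranked_ids` is a hypothetical retriever output, best match first.
    """
    gold = set(gold_ids)
    if not gold:
        # No gold papers: define recall as 0.0 to avoid dividing by zero.
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(gold & top_k) / len(gold)


# Toy data (made up for illustration, not taken from the dataset).
ranked = [11, 42, 7, 99, 3]
gold = [42, 3]

print(recall_at_k(ranked, gold, k=2))  # 0.5
print(recall_at_k(ranked, gold, k=5))  # 1.0
```

Averaging this value over all queries gives a single benchmark score for a given k.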

Provider:
maas
Created:
2025-04-22