Bob, the ACL Anthology test collection
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://data.mendeley.com/datasets/9rrvd2myjy
Description:
This test collection, henceforth Bob, was created at the University of
Cambridge and is intended for information retrieval experiments with
scientific literature. Bob consists of:
* documents.xml - almost 10,000 research papers from the ACL
Anthology (the freely available digital archive of computational
linguistics publications), packaged as one large XML document with
tags to delimit individual papers. These documents were
processed individually using PTX, part of the Skimcast (TM) Semantic
System; please see README_PTX for details and a reference.
* queries - 82 research questions from authors of ACL Anthology
papers, in three files: queries.txt (a plaintext file
containing all 82 queries with their Anthology-based IDs and numeric
IDs), queries.lemur (a Lemur-style query file) and queries.indri (an
Indri-style query file).
* relevance judgements - judgements by the query authors as to the
relevance of other papers in the ACL Anthology with respect to their
queries, packaged together in the TREC-style qrels.txt (0==irrelevant,
!0==relevant).
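As an illustration of the relevance convention above (0 is irrelevant, any non-zero value is relevant), the following is a minimal Python sketch for reading qrels.txt. It assumes the file uses the usual four-column TREC qrels layout (query ID, iteration, document ID, judgement), which this description does not spell out. The function name parse_qrels and the IDs in the sample are hypothetical and are not taken from Bob.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each query ID to the set of document IDs judged relevant.

    Assumes whitespace-separated TREC-style lines:
        query_id  iteration  doc_id  judgement
    Any non-zero judgement counts as relevant.
    """
    relevant = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, judgement = fields
        if int(judgement) != 0:
            relevant[query_id].add(doc_id)
    return dict(relevant)


# Hypothetical sample lines; real IDs in qrels.txt will differ.
sample = [
    "Q1 0 DOC-A 2",
    "Q1 0 DOC-B 0",
    "Q2 0 DOC-C 1",
]
result = parse_qrels(sample)
```

To read the real file, pass an open file handle instead of the sample list, for example `with open("qrels.txt") as f: parse_qrels(f)`. Queries whose judged documents are all irrelevant do not appear in the returned dictionary.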
CONDITIONS OF USE: Bob may be used solely for non-commercial
purposes. When publishing work using Bob, please cite the PhD
thesis of Anna Ritchie. Below are BibTeX entries for the thesis and
further publications describing the creation of the test collection.
@phdthesis{anna_ritchie_thesis,
author = {Anna Ritchie},
title = {Citation Context Analysis for Information Retrieval},
year = {2008},
school = {University of Cambridge, UK},
}
See https://www.mendeley.com/profiles/anna-ritchie1/ for more related publications.
Created:
2017-01-12



