
Bob, the ACL Anthology test collection

Indexed from NIAID Data Ecosystem on 2026-03-10
Download link:
https://data.mendeley.com/datasets/9rrvd2myjy
Description:
This test collection, henceforth Bob, was created at the University of Cambridge and is intended for information retrieval experiments with scientific literature. Bob consists of:

* documents.xml - almost 10,000 research papers from the ACL Anthology (the freely available digital archive of computational linguistics publications), packaged as one large XML document with tags to delimit individual papers. These documents were processed individually using PTX, part of the Skimcast (TM) Semantic System; please see README_PTX for details and a reference.
* queries - 82 research questions from authors of ACL Anthology papers, in three files: queries.txt (a plaintext file containing all 82 queries with their Anthology-based IDs and numeric IDs), queries.lemur (a Lemur-style query file) and queries.indri (an Indri-style queries file).
* relevance judgements - judgements by the query authors as to the relevance of other papers in the ACL Anthology with respect to their queries, packaged together in the TREC-style qrels.txt (0==irrelevant, !0==relevant).

CONDITIONS OF USE: Bob may be used solely for non-commercial purposes. When publishing work using Bob, please cite the PhD thesis of Anna Ritchie. Below are BibTeX entries for the thesis and further publications describing the creation of the test collection.

@phdthesis{anna_ritchie_thesis,
  author = {Anna Ritchie},
  title = {Citation Context Analysis for Information Retrieval},
  year = {2008},
  school = {University of Cambridge, UK},
}

See https://www.mendeley.com/profiles/anna-ritchie1/ for more related publications.
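As an illustration of the relevance convention described above (0==irrelevant, !0==relevant), the following is a minimal sketch of reading TREC-style judgements in Python. It assumes the common four-column TREC qrels layout (query ID, iteration, document ID, judgement) separated by whitespace; the actual layout of Bob's qrels.txt is not stated here and should be checked against the file. The function name parse_qrels and the sample IDs are hypothetical.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each query ID to the set of document IDs judged relevant.

    Assumes four whitespace-separated columns per line:
    query ID, iteration, document ID, judgement (an assumption,
    not confirmed by the dataset description).
    """
    relevant = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, judgement = fields
        docs = relevant[query_id]  # register the query even if nothing is relevant
        if int(judgement) != 0:  # any non-zero judgement counts as relevant
            docs.add(doc_id)
    return dict(relevant)


# Hypothetical sample lines, not taken from the real qrels.txt.
sample = [
    "1 0 DOC-A 0",
    "1 0 DOC-B 2",
    "2 0 DOC-C 1",
]
print(parse_qrels(sample))
```

To read the real file, the same function can be given an open file object, for example `with open("qrels.txt") as f: judgements = parse_qrels(f)`, since it only iterates over lines.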
Created:
2017-01-12