pyterrier/scidocs.terrier
Source: Hugging Face. Updated 2024-10-08; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/pyterrier/scidocs.terrier
Resource description:
---
# pretty_name: "" # Example: "MS MARCO Terrier Index"
tags:
- pyterrier
- pyterrier-artifact
- pyterrier-artifact.sparse_index
- pyterrier-artifact.sparse_index.terrier
task_categories:
- text-retrieval
viewer: false
---
# scidocs.terrier
## Description
A Terrier sparse index of the SciDocs corpus, built from the BEIR version of the dataset (`beir/scidocs` in ir_datasets).
## Usage
```python
import pyterrier as pt

# Download the index artifact from the Hugging Face Hub and load it
index = pt.Artifact.from_hf('pyterrier/scidocs.terrier')

# Build a BM25 retriever over the loaded index
index.bm25()
```
## Benchmarks
| name | nDCG@10 | R@1000 |
|:-------|----------:|---------:|
| bm25 | 0.1582 | 0.5713 |
| dph | 0.1499 | 0.5649 |
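To make the two columns concrete, here is a minimal pure-Python sketch of nDCG@10 and R@1000 for a single query. It assumes linear gain and a log2 rank discount, which may not match the evaluation tool used to produce the table in every detail. The function names, document IDs, and relevance labels are hypothetical; the scores reported above are averages over all queries in the test set.

```python
import math

def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k for one query, assuming linear gain and log2 discount."""
    # Discounted gain of the top-k results actually returned
    dcg = sum(relevance.get(doc_id, 0) / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_doc_ids[:k]))
    # Discounted gain of the best possible top-k ordering
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranked_doc_ids, relevance, k=1000):
    """Fraction of the relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, gain in relevance.items() if gain > 0}
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant)
    return hits / len(relevant)

# Hypothetical judgments and ranking for a single query
toy_qrels = {"d1": 1, "d2": 1, "d3": 0}
toy_ranking = ["d1", "d4", "d2"]

toy_ndcg = ndcg_at_k(toy_ranking, toy_qrels, k=10)
toy_recall = recall_at_k(toy_ranking, toy_qrels, k=1000)
```

In this toy case both relevant documents are retrieved, so recall is 1.0, while nDCG falls below 1.0 because an unjudged document sits between them in the ranking.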
## Reproduction
```python
import pyterrier as pt
from tqdm import tqdm
import ir_datasets

# Load the BEIR version of SciDocs
dataset = ir_datasets.load('beir/scidocs')

# Size the docno metadata field to the longest document ID in the corpus
meta_docno_len = dataset.metadata()['docs']['fields']['doc_id']['max_len']
indexer = pt.IterDictIndexer("./scidocs.terrier", meta={'docno': meta_docno_len, 'text': 4096})

# Index each document as its title followed by its text
docs = ({'docno': d.doc_id, 'text': '{title}\n{text}'.format(**d._asdict())} for d in tqdm(dataset.docs))
indexer.index(docs)
```
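The generator expression in the reproduction script joins each document's title and body into one `text` field, so both are searchable through the same index. The sketch below isolates that step with no PyTerrier dependency. The `Doc` namedtuple is a hypothetical stand-in for an ir_datasets document record (the real record may carry more fields), and `to_index_record` is an illustrative helper name, not part of any library.

```python
from collections import namedtuple

# Hypothetical stand-in for an ir_datasets document record
Doc = namedtuple("Doc", ["doc_id", "text", "title"])

def to_index_record(d):
    """Map a document to the dict shape fed to the indexer above."""
    # str.format ignores unused keyword arguments, so passing every
    # field of the record is safe even though only two are referenced
    return {"docno": d.doc_id, "text": "{title}\n{text}".format(**d._asdict())}

example = Doc(doc_id="doc-0", text="Body of the abstract.", title="A Sample Title")
record = to_index_record(example)
```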
## Metadata
```
{
"type": "sparse_index",
"format": "terrier",
"package_hint": "python-terrier"
}
```
Provider:
pyterrier



