pyterrier/fiqa.pisa
Hugging Face · Updated 2024-10-08 · Indexed 2025-04-26
Download link:
https://hf-mirror.com/datasets/pyterrier/fiqa.pisa
Resource description:
---
# pretty_name: "" # Example: "MS MARCO Terrier Index"
tags:
- pyterrier
- pyterrier-artifact
- pyterrier-artifact.sparse_index
- pyterrier-artifact.sparse_index.pisa
task_categories:
- text-retrieval
viewer: false
---
# fiqa.pisa
## Description
A PISA index for the FIQA dataset (`beir/fiqa` in ir_datasets), built with the porter2 stemmer.
## Usage
```python
# Load the artifact
import pyterrier as pt
index = pt.Artifact.from_hf('pyterrier/fiqa.pisa')
index.bm25() # returns a BM25 retriever
```
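The retriever returned by `index.bm25()` is a regular PyTerrier transformer, so it can be queried directly. The following is a minimal sketch, assuming `pyterrier` and `pyterrier-pisa` are installed and the Hugging Face Hub is reachable. The function name `search_fiqa`, the sample query, and the `num_results` value are illustrative choices, not part of the artifact.

```python
def search_fiqa(query, num_results=10):
    """Return the top BM25 results for `query` as a pandas DataFrame."""
    # Imported lazily so this module loads even without PyTerrier installed.
    import pyterrier as pt

    # Fetch the index artifact from the Hugging Face Hub.
    index = pt.Artifact.from_hf('pyterrier/fiqa.pisa')
    # Limit the ranking depth (hypothetical value chosen for illustration).
    bm25 = index.bm25(num_results=num_results)
    # .search() runs a single query and returns one row per retrieved document.
    return bm25.search(query)


if __name__ == "__main__":
    print(search_fiqa("how do index funds work"))
```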
## Benchmarks
`fiqa/dev`
| name | nDCG@10 | R@1000 |
|:-------|----------:|---------:|
| bm25 | 0.263 | 0.7423 |
| dph | 0.2587 | 0.7497 |
`fiqa/test`
| name | nDCG@10 | R@1000 |
|:-------|----------:|---------:|
| bm25 | 0.2411 | 0.7504 |
| dph | 0.2401 | 0.7615 |
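For readers unfamiliar with the two measures in the tables, the sketch below computes them for a single query in plain Python. It assumes linear gain for nDCG (as in trec_eval-style implementations) and treats any judgment above zero as relevant for recall; the function names and the toy data are hypothetical. The reported figures would be averages of such per-query values over all queries in a split.

```python
import math


def ndcg_at_k(ranked_docnos, qrels, k=10):
    """nDCG@k for one query; `qrels` maps docno -> graded relevance."""
    # Discounted gain of the system ranking, cut off at rank k.
    dcg = sum(qrels.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_docnos[:k]))
    # Discounted gain of the ideal ranking (judged documents sorted by relevance).
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_docnos, qrels, k=1000):
    """Fraction of relevant documents found in the top k results."""
    relevant = {d for d, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_docnos[:k])) / len(relevant)


# Toy example: two relevant documents, retrieved at ranks 2 and 3.
toy_qrels = {"d1": 1, "d2": 1}
toy_ranking = ["d3", "d1", "d2"]
toy_ndcg = ndcg_at_k(toy_ranking, toy_qrels)
toy_recall = recall_at_k(toy_ranking, toy_qrels)
```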
## Reproduction
```python
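import pyterrier as pt
from tqdm import tqdm
import ir_datasets
from pyterrier_pisa import PisaIndex

# Create the on-disk PISA index, using 16 threads
index = PisaIndex("fiqa.pisa", threads=16)
# Load the BEIR version of FIQA through ir_datasets
dataset = ir_datasets.load('beir/fiqa')
# Stream documents lazily as dicts with 'docno' and 'text' fields
docs = ({'docno': d.doc_id, 'text': d.default_text()} for d in tqdm(dataset.docs))
index.index(docs)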
```
## Metadata
```
{
"type": "sparse_index",
"format": "pisa",
"package_hint": "pyterrier-pisa",
"stemmer": "porter2"
}
```
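The metadata can be inspected programmatically, for example to find out which package is needed before loading the artifact. The sketch below parses the JSON shown above with the standard library; the helper name `required_package` is hypothetical, and it assumes that `package_hint` holds an installable package name.

```python
import json

# The metadata block from this card, embedded as a string for illustration.
METADATA = """
{
  "type": "sparse_index",
  "format": "pisa",
  "package_hint": "pyterrier-pisa",
  "stemmer": "porter2"
}
"""


def required_package(meta):
    """Return the package hinted at by the metadata, or None if absent."""
    return meta.get("package_hint")


meta = json.loads(METADATA)
print(f"{meta['type']}/{meta['format']} index, needs: {required_package(meta)}")
```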
Provider:
pyterrier



