baselines-v2
Source: ModelScope community. Updated 2025-12-05; indexed 2025-06-14.
Download link:
https://modelscope.cn/datasets/vidore/baselines-v2
Description:
# Contextualized Embeddings Benchmark
This repository contains evaluation code for the "Contextualized Embeddings" project.
## Installation
```bash
pip install -e .
pip install git+https://github.com/jina-ai/late-chunking --no-deps # for late-chunking with jina
```
## Usage
Refer to `scripts/evaluation.py` for an example of how to use the code.
```python
from datasets import load_dataset
from cde_benchmark.embedders.sentence_transformer_embedder import SentenceTransformerEmbedder
from cde_benchmark.embedders.naive_contextual_embedder import NaiveContextualEmbedder
from cde_benchmark.formatters.data_formatter import DataFormatter
# Datasets should be correctly formatted
formatter = DataFormatter("illuin-cde/chunked-mldr", split="test")
# Non-nested example
embedder = SentenceTransformerEmbedder("intfloat/e5-base-v2")
metrics = embedder.compute_metrics_e2e(formatter)
print(metrics)
# Nested example (for contextualized embedding models)
embedder = NaiveContextualEmbedder("intfloat/e5-base-v2")
metrics = embedder.compute_metrics_e2e(formatter)
print(metrics)
```
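The "non-nested" versus "nested" distinction above can be illustrated with a small, library-independent sketch. It assumes that "nested" means chunks are grouped per source document, so that a contextual model can see sibling chunks together; this is an interpretation, not a description of how `cde_benchmark` formats data internally. The helper `group_chunks_by_document` and the `(doc_id, text)` pair layout are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def group_chunks_by_document(chunks):
    """Group a flat list of (doc_id, chunk_text) pairs into per-document lists.

    Hypothetical helper: not part of cde_benchmark.
    """
    grouped = defaultdict(list)
    for doc_id, text in chunks:
        grouped[doc_id].append(text)
    # dicts keep insertion order, so documents come out in first-seen order
    return list(grouped.values())


# Toy data standing in for a chunked retrieval corpus (assumed layout).
flat_chunks = [
    ("doc-a", "First chunk of A."),
    ("doc-a", "Second chunk of A."),
    ("doc-b", "Only chunk of B."),
]

# Non-nested: every chunk is an independent input.
flat_input = [text for _, text in flat_chunks]

# Nested: one inner list per document, keeping chunk order.
nested_input = group_chunks_by_document(flat_chunks)

print(flat_input)
print(nested_input)
```

Under this reading, a standard sentence embedder would consume `flat_input` one chunk at a time, while a contextualized embedder would receive each inner list of `nested_input` as a unit.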
Provider:
maas
Created:
2025-06-04



