allenai/wcep_dense_max

Name: allenai/wcep_dense_max
Creator: allenai
Published: 2022-11-18 20:00:07
License: no description available

Source: Hugging Face. Updated: 2022-11-18. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/allenai/wcep_dense_max

Description:

---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- en
license:
- other
multilinguality:
- monolingual
pretty_name: WCEP-10
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- summarization
task_ids:
- news-articles-summarization
paperswithcode_id: wcep
train-eval-index:
- config: default
  task: summarization
  task_id: summarization
  splits:
    train_split: train
    eval_split: test
  col_mapping:
    document: text
    summary: target
  metrics:
  - type: rouge
    name: Rouge
---

This is a copy of the [WCEP-10](https://huggingface.co/datasets/ccdv/WCEP-10) dataset, except that the input source documents of its `test` split have been replaced with documents retrieved by a __dense__ retriever. The retrieval pipeline used:

- __query__: The `summary` field of each example
- __corpus__: The union of all documents in the `train`, `validation` and `test` splits
- __retriever__: [`facebook/contriever-msmarco`](https://huggingface.co/facebook/contriever-msmarco) via [PyTerrier](https://pyterrier.readthedocs.io/en/latest/) with default settings
- __top-k strategy__: `"max"`, i.e. the number of documents retrieved, `k`, is set to the maximum number of documents seen across examples in this dataset, in this case `k==10`

Retrieval results on the `train` set:

| Recall@100 | Rprec | Precision@k | Recall@k |
| ----------- | ----------- | ----------- | ----------- |
| 0.8590 | 0.6490 | 0.5967 | 0.6631 |

Retrieval results on the `validation` set:

| Recall@100 | Rprec | Precision@k | Recall@k |
| ----------- | ----------- | ----------- | ----------- |
| 0.8578 | 0.6326 | 0.6040 | 0.6401 |

Retrieval results on the `test` set:

| Recall@100 | Rprec | Precision@k | Recall@k |
| ----------- | ----------- | ----------- | ----------- |
| 0.8678 | 0.6631 | 0.6301 | 0.6740 |
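As a rough illustration of the `"max"` top-k strategy and the metrics reported above, the following sketch computes `k` and the per-query Precision@k, Recall@k and R-precision on toy data. It assumes the standard information-retrieval definitions, which the card does not spell out. The helper names and document IDs are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence, Set


def max_k(examples: Sequence[Sequence[str]]) -> int:
    """'max' strategy: k is the largest number of source documents in any example."""
    return max(len(docs) for docs in examples)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def r_precision(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(retrieved, relevant, len(relevant))


# Toy data (hypothetical IDs): the original source documents of two examples.
examples = [["d1", "d2"], ["d3", "d4", "d5"]]
k = max_k(examples)

# Ranked retrieval output for the first example, best match first.
retrieved = ["d1", "d9", "d2", "d7"]
relevant = set(examples[0])

p_at_k = precision_at_k(retrieved, relevant, k)
r_at_k = recall_at_k(retrieved, relevant, k)
rprec = r_precision(retrieved, relevant)
```

The reported figures would then correspond to these per-query values averaged over all examples in a split, under the same assumed definitions.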

Provider:

allenai

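Summary of original information

Dataset overview

Basic information

Name: WCEP-10
Language: English
License: other
Multilinguality: monolingual
Size category: 1K<n<10K
Source datasets: original
Task category: summarization
Task ID: news-articles-summarization
PapersWithCode ID: wcep

Training and evaluation

Config: default
Task: summarization
Task ID: summarization
Splits:
- Train split: train
- Eval split: test
Column mapping:
- document: text
- summary: target
Metrics:
- Type: rouge
- Name: Rouge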
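
The column mapping above pairs each dataset column with the name a summarization evaluator expects. A minimal sketch of applying it to a single record follows. The record contents and the `apply_col_mapping` helper are hypothetical, and the sketch assumes the mapping reads as dataset column to evaluator column.

```python
from typing import Dict

# Column mapping from the card's train-eval-index section:
# dataset column name -> column name expected by the evaluator (assumed direction).
COL_MAPPING: Dict[str, str] = {"document": "text", "summary": "target"}


def apply_col_mapping(record: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename the keys of a record according to the mapping; unmapped keys are kept."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical record with the two fields named in the mapping.
example = {
    "document": "Placeholder text standing in for the source news articles.",
    "summary": "Placeholder summary.",
}
renamed = apply_col_mapping(example, COL_MAPPING)
```

After the rename, `renamed` carries the same values under the keys `text` and `target`, which is the shape a ROUGE-based summarization evaluation would consume under this assumption.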
