pinecone/core-2020-05-10-deduplication
Source: Hugging Face. Updated 2022-10-28; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pinecone/core-2020-05-10-deduplication
Resource description:
---
annotations_creators:
- unknown
language_creators:
- unknown
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- unknown
task_categories:
- other
task_ids:
- natural-language-inference
- semantic-similarity-scoring
- text-scoring
pretty_name: CORE Deduplication of Scholarly Documents
tags:
- deduplication
---
# Dataset Card for CORE Deduplication
## Dataset Description
- **Homepage:** [https://core.ac.uk/about/research-outputs](https://core.ac.uk/about/research-outputs)
- **Repository:** [https://core.ac.uk/datasets/core_2020-05-10_deduplication.zip](https://core.ac.uk/datasets/core_2020-05-10_deduplication.zip)
- **Paper:** [Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings](http://oro.open.ac.uk/id/eprint/70519)
- **Point of Contact:** [CORE Team](https://core.ac.uk/about#contact)
- **Size of downloaded dataset files:** 204 MB
### Dataset Summary
The CORE 2020 Deduplication dataset (https://core.ac.uk/documentation/dataset) contains 100K scholarly documents labeled as duplicates or non-duplicates.
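As a minimal sketch of how the dataset might be loaded and inspected, the snippet below uses the Hugging Face `datasets` library (assumed to be installed, for example via `pip install datasets`) with the repository ID from this page. The helper names `load_core_dedup` and `describe` are hypothetical and introduced only for illustration. The split and column names are not listed in this card, so the sketch reads them from the loaded object rather than assuming them.

```python
# Repository ID as given on this page.
DATASET_ID = "pinecone/core-2020-05-10-deduplication"


def load_core_dedup(split=None):
    """Load the dataset from the Hugging Face Hub (needs network access).

    Hypothetical helper: with split=None, `load_dataset` returns a
    DatasetDict keyed by whatever split names the repository defines.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def describe(dataset_dict):
    """Return {split_name: (num_rows, column_names)} for a DatasetDict-like mapping."""
    return {
        name: (split.num_rows, split.column_names)
        for name, split in dataset_dict.items()
    }


# Example usage (downloads roughly 204 MB on first call):
#   ds = load_core_dedup()
#   for name, (rows, cols) in describe(ds).items():
#       print(name, rows, cols)
```

Inspecting the columns first is a deliberate choice here: the card states only that documents are labeled as duplicates or non-duplicates, so the field that carries the label should be confirmed from the loaded data before building on it.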
### Languages
The dataset language is English (BCP-47 `en`).
### Citation Information
```
@inproceedings{dedup2020,
title={Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings},
author={Gyawali, Bikash and Anastasiou, Lucas and Knoth, Petr},
booktitle = {Proceedings of 12th Language Resources and Evaluation Conference},
month = may,
year = 2020,
address = {France},
publisher = {European Language Resources Association},
pages = {894-903}
}
```
Provider:
pinecone
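Summary of original information
Dataset overview
Basic information
- Name: CORE Deduplication of Scholarly Documents
- Language: English (`en`)
- License: MIT
- Multilinguality: monolingual
- Size: 100K<n<1M
- Task category: other
- Task IDs:
  - natural-language-inference
  - semantic-similarity-scoring
  - text-scoring
- Tag: deduplication
Dataset description
- Summary: The CORE 2020 Deduplication dataset contains 100K scholarly documents labeled as duplicates or non-duplicates.
- Download size: 204 MB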



