BoostCLIR

Name: BoostCLIR
Creator: OpenDataLab
Published: 2026-05-17 06:30:07
License: No description available

OpenDataLab, updated 2026-05-17, added 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/BoostCLIR


Resource summary:

BoostCLIR is a Japanese-English bilingual corpus of patent abstracts, extracted from the MAREC patent dataset and the NTCIR PatentMT workshop collection, with relevance judgments for patent prior art retrieval tasks. Important note: The English portion of the corpus includes both patent IDs and abstract texts. Due to copyright restrictions from NTCIR, the Japanese portion only contains patent IDs. Japanese patent abstracts can be extracted from full-text Japanese patent documents, which are available from the organizers of the NTCIR workshops.

Provider:

OpenDataLab

Created:

2022-05-23

Dataset introduction