IEPILE
Source: arXiv | Updated 2024-04-08 | Indexed 2024-06-21
Download link:
https://github.com/zjunlp/IEPile
Resource overview:
IEPILE is a comprehensive bilingual (Chinese-English) instruction corpus for information extraction, created by the Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph. It was built by integrating and cleaning 33 existing information extraction datasets, contains approximately 0.32B (about 320 million) tokens, and covers named entity recognition (NER), relation extraction (RE), and event extraction (EE). The corpus uses a schema-based instruction generation strategy to improve the performance of large language models (LLMs) on information extraction tasks, particularly their zero-shot generalization. Its source data span domains such as news, finance, and biomedicine, and it is intended to narrow the performance gap of LLMs on information extraction.
Provider:
Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
Created:
2024-02-23
Collected summary
Dataset introduction

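Background and challenges
Background overview
IEPILE is a large-scale, high-quality bilingual (Chinese and English) instruction dataset for information extraction, containing approximately 0.32B tokens. It integrates 26 English and 7 Chinese information extraction datasets covering general, medical, financial, and other domains. It is built with a schema-based batched instruction generation strategy, aims to improve model performance on zero-shot information extraction, and supports tasks such as named entity recognition, relation extraction, and event extraction.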
The content above was collected and summarized by 遇见数据集.
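
The background overview mentions a schema-based batched instruction generation strategy without giving details. The Python sketch below illustrates one plausible reading of that idea: the full label schema of a task is split into small batches, and one instruction is produced per batch for the same input text. The function name `build_batched_instructions`, the `split_num` parameter, the prompt wording, and the JSON field names (`instruction`, `schema`, `input`) are all assumptions made for illustration, not the format documented by the IEPILE authors; consult the repository linked above for the actual data format.

```python
import json


def build_batched_instructions(text, schema, split_num=4,
                               task_prompt="Extract entities of the listed types from the input."):
    """Split `schema` into batches of at most `split_num` labels and
    return one JSON-encoded instruction per batch.

    All field names and the prompt text are hypothetical placeholders.
    """
    if split_num <= 0:
        raise ValueError("split_num must be a positive integer")

    labels = list(schema)
    # Consecutive chunks of the schema; the last chunk may be shorter.
    batches = [labels[i:i + split_num] for i in range(0, len(labels), split_num)]

    instructions = []
    for batch in batches:
        record = {
            "instruction": task_prompt,  # hypothetical task description
            "schema": batch,             # only this batch of labels is queried
            "input": text,               # the same text is reused for every batch
        }
        # ensure_ascii=False keeps Chinese text readable in the output.
        instructions.append(json.dumps(record, ensure_ascii=False))
    return instructions


example_schema = ["person", "organization", "location", "date", "event"]
for line in build_batched_instructions("Alice joined Acme in Paris.", example_schema, split_num=2):
    print(line)
```

Querying a few labels at a time keeps each prompt short and exposes the model to varying schema subsets; whether IEPILE batches labels in exactly this way is not stated in this summary.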



