Synthetic Context Retrieval Training Dataset

Source: arXiv · Updated 2024-04-08 · Indexed 2024-08-06
Download link:
http://arxiv.org/abs/2310.10118v3
Description:
The Synthetic Context Retrieval Training Dataset was created by the Laboratoire Informatique d'Avignon (LIA), Avignon University, to address named entity recognition (NER) in long documents. It contains 2,716 samples generated with the Alpaca large language model (LLM) and is designed to supply document-level context that helps a model recognize entities. The samples were produced with dedicated prompt templates that yield both positive (relevant context) and negative (irrelevant context) examples, which are used to train a neural network model to retrieve relevant context. The dataset targets the analysis of literary works, with the goal of improving NER accuracy on long texts.
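To make the positive/negative construction and the downstream retrieval step concrete, here is a minimal sketch under stated assumptions. The actual prompt templates, sample format, and retrieval model are not given on this page, so `POSITIVE_TEMPLATE`, `NEGATIVE_TEMPLATE`, `Sample`, `build_samples`, `retrieve_context`, the `fake_llm` stub (standing in for Alpaca), and the word-overlap scorer (standing in for a trained neural model) are all hypothetical. This is not the dataset authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List

# HYPOTHETICAL templates: the real prompts used to build the dataset are not
# reproduced on this page; these only illustrate the positive/negative idea.
POSITIVE_TEMPLATE = (
    "Write a sentence that helps determine whether '{entity}' "
    "is a person, a location or an organization."
)
NEGATIVE_TEMPLATE = (
    "Write a sentence that mentions '{entity}' but gives no clue "
    "about what kind of entity it is."
)


@dataclass
class Sample:
    sentence: str  # input sentence containing the entity mention
    context: str   # candidate context sentence produced by the LLM
    label: int     # 1 = useful context (positive), 0 = not useful (negative)


def build_samples(sentence: str, entity: str,
                  generate: Callable[[str], str]) -> List[Sample]:
    """Create one positive and one negative sample for an entity mention.

    `generate` is a hypothetical stand-in for a call to an LLM such as Alpaca.
    """
    positive = generate(POSITIVE_TEMPLATE.format(entity=entity))
    negative = generate(NEGATIVE_TEMPLATE.format(entity=entity))
    return [Sample(sentence, positive, 1), Sample(sentence, negative, 0)]


def retrieve_context(sentence: str, document: List[str],
                     score: Callable[[str, str], float], k: int = 2) -> List[str]:
    """Return the k document sentences that `score` rates as most relevant."""
    ranked = sorted(document, key=lambda ctx: score(sentence, ctx), reverse=True)
    return ranked[:k]


def fake_llm(prompt: str) -> str:
    # Deterministic stub so the example runs offline (assumption, not Alpaca).
    return "Context for: " + prompt


def overlap(a: str, b: str) -> float:
    # Toy relevance scorer: number of shared lowercase tokens.
    # A context-retrieval model trained on the dataset would replace this.
    return float(len(set(a.lower().split()) & set(b.lower().split())))


samples = build_samples("Rodolphe rode to Yonville.", "Rodolphe", fake_llm)
print([s.label for s in samples])  # [1, 0]

doc = [
    "Rodolphe was a wealthy landowner.",
    "It rained all day.",
    "Emma waited for Rodolphe.",
]
print(retrieve_context("Rodolphe rode to Yonville.", doc, overlap, k=1))
# ['Rodolphe was a wealthy landowner.']
```

Keeping the scorer as a plain callable separates the two roles the description mentions: the synthetic positive/negative pairs are what a neural scorer would be trained on, and `retrieve_context` shows where such a scorer would then select document-level context for the NER model.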
Provider:
Laboratoire Informatique d'Avignon (LIA), Avignon University
Created:
2023-10-16