GLINER-multi-task-synthetic-data
Source: ModelScope community. Updated 2025-12-10. Indexed 2024-12-28.
Download link:
https://modelscope.cn/datasets/knowledgator/GLINER-multi-task-synthetic-data
Description:
This is the official synthetic dataset used to train the GLiNER multi-task model.
The dataset is a list of dictionaries, each containing a tokenized text with named entity recognition (NER) information. Each item consists of two main components:
1. 'tokenized_text': A list of individual words and punctuation marks from the original text, split into tokens.
2. 'ner': A list of lists containing named entity recognition information. Each inner list has three elements:
- Start index of the named entity in the tokenized text
- End index of the named entity in the tokenized text
- Label 'match' for the identified entity
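The record layout described above can be sketched in Python. The sample record, its labels, and the helper `extract_entities` are hypothetical and invented for illustration; they are not taken from the dataset. The description does not say whether the end index is inclusive, so the sketch assumes that it is and exposes that assumption as a parameter.

```python
from typing import List, Tuple

# Hypothetical record, invented for illustration; not taken from the dataset.
record = {
    "tokenized_text": ["Marie", "Curie", "was", "born", "in", "Warsaw", "."],
    # Each inner list is [start index, end index, label].
    "ner": [[0, 1, "person"], [5, 5, "city"]],
}


def extract_entities(item: dict, end_inclusive: bool = True) -> List[Tuple[str, str]]:
    """Return (entity text, label) pairs for one record.

    Assumption: the end index is inclusive. Pass end_inclusive=False
    if the data turns out to use exclusive end indices.
    """
    tokens = item["tokenized_text"]
    entities = []
    for start, end, label in item["ner"]:
        stop = end + 1 if end_inclusive else end
        # Joining tokens with spaces is a simplification; it does not
        # restore the original spacing around punctuation.
        entities.append((" ".join(tokens[start:stop]), label))
    return entities


print(extract_entities(record))
```

Checking a few real records against their source text is a quick way to confirm which end-index convention applies before relying on the extracted spans.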
The dataset was pre-annotated by feeding Wikipedia articles to Llama3-8B.
**Supported tasks:**
- Named Entity Recognition (NER): Identifies and categorizes entities such as names, organizations, dates, and other specific items in the text.
- Relation Extraction: Detects and classifies relationships between entities within the text.
- Summarization: Extracts the most important sentences from the input text, capturing its essential information.
- Sentiment Extraction: Identifies parts of the text that signal a positive, negative, or neutral sentiment.
- Key-Phrase Extraction: Identifies and extracts important phrases and keywords from the text.
- Question-answering: Finds an answer in the text given a question.
- Open Information Extraction: Extracts pieces of text given an open prompt from a user, for example, product description extraction.
Provider:
maas
Created:
2024-12-26



