nataliaElv/text-descriptives-metadata
Dataset Card for text-descriptives-metadata
Dataset Description
Dataset Summary
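This dataset contains:
- A dataset configuration file conforming to the Argilla dataset format, named `argilla.yaml`. It is used to configure the dataset when loading it with the `FeedbackDataset.from_huggingface` method.
- Dataset records in a format compatible with HuggingFace `datasets`. These records are loaded automatically when using `FeedbackDataset.from_huggingface`, and can also be loaded independently with the `datasets` library.
- The annotation guidelines used to build and curate the dataset, if they have been defined in Argilla.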
How to Load
Load with Argilla
Install Argilla:
```shell
pip install argilla --upgrade
```
Load the dataset:
```python
import argilla as rg

ds = rg.FeedbackDataset.from_huggingface("nataliaElv/text-descriptives-metadata")
```
Load with datasets
Install datasets:
```shell
pip install datasets --upgrade
```
Load the dataset:
```python
from datasets import load_dataset

ds = load_dataset("nataliaElv/text-descriptives-metadata")
```
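Supported Tasks and Leaderboards
This dataset can contain multiple fields, questions, and responses, so it can be used for different NLP tasks depending on the configuration. The dataset structure is described in the Dataset Structure section.
There are no leaderboards associated with this dataset.
Languages
[More Information Needed]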
Dataset Structure
Data in Argilla
The dataset in Argilla consists of the following: fields, questions, suggestions, metadata, and guidelines.
Fields
| Field Name | Title | Type | Required | Markdown |
|---|---|---|---|---|
| prompt | Prompt | FieldTypes.text | True | True |
| context | Context | FieldTypes.text | False | True |
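As a minimal sketch of what the Required column implies, the check below treats `prompt` as mandatory and `context` as optional. The names `FIELD_SPECS` and `missing_required_fields` are hypothetical helpers for illustration and are not part of Argilla.

```python
from typing import Any, Dict, List

# Field name -> required flag, copied from the Fields table above.
FIELD_SPECS: Dict[str, bool] = {"prompt": True, "context": False}


def missing_required_fields(fields: Dict[str, Any]) -> List[str]:
    """Return the required field names that are absent, None, or empty."""
    return [
        name
        for name, required in FIELD_SPECS.items()
        if required and not fields.get(name)
    ]


# A record with only a prompt is valid, because context is optional.
ok = missing_required_fields({"prompt": "Can brain cells move?", "context": None})
# A record without a prompt is reported as incomplete.
bad = missing_required_fields({"context": "Some background text."})
print(ok, bad)
```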
Questions
| Question Name | Title | Type | Required | Description | Values/Labels |
|---|---|---|---|---|---|
| response | Response | QuestionTypes.text | True | N/A | N/A |
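Suggestions
Suggestions are human- or machine-generated recommendations that assist annotators during the annotation process. A suggestion is always linked to an existing question, and its columns are named by appending "-suggestion" and "-suggestion-metadata" to the question name; they hold the suggested value and its metadata, respectively.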
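The naming convention can be sketched in plain Python. The helpers `suggestion_columns` and `flatten_suggestions` are hypothetical names introduced here, and the record is a shortened, made-up example; the sketch only assumes that each suggestion carries `question_name`, `value`, `type`, `score`, and `agent` keys, as in the data instances shown in this card.

```python
from typing import Any, Dict


def suggestion_columns(question_name: str) -> Dict[str, str]:
    """Return the two column names derived from a question name."""
    return {
        "value": f"{question_name}-suggestion",
        "metadata": f"{question_name}-suggestion-metadata",
    }


def flatten_suggestions(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map each suggestion of an Argilla-style record to its flat columns."""
    row: Dict[str, Any] = {}
    for suggestion in record.get("suggestions", []):
        cols = suggestion_columns(suggestion["question_name"])
        row[cols["value"]] = suggestion["value"]
        row[cols["metadata"]] = {
            key: suggestion.get(key) for key in ("type", "score", "agent")
        }
    return row


# Hypothetical, shortened record used for illustration only.
record = {
    "suggestions": [
        {
            "question_name": "response",
            "value": "Yes, brain cells migrate.",
            "type": None,
            "score": None,
            "agent": None,
        }
    ]
}
flat = flatten_suggestions(record)
print(sorted(flat))
```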
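Metadata
Metadata is a dictionary that provides additional information about a dataset record. It can give annotators extra context or describe the record itself. Metadata is always optional and may be linked to the `metadata_properties` defined in `argilla.yaml`.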
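One possible use of the metadata dictionary is filtering records before annotation. The sketch below assumes the metadata keys that appear in this card's example instance (`passed_quality_check` stored as the string "True", and `flesch_reading_ease` as a float). The function name, the threshold, and the second record are hypothetical.

```python
from typing import Any, Dict, List


def filter_by_metadata(
    records: List[Dict[str, Any]], min_reading_ease: float
) -> List[Dict[str, Any]]:
    """Keep records that passed the quality check and are readable enough."""
    kept = []
    for record in records:
        meta = record.get("metadata") or {}
        # The example instance stores this flag as the string "True".
        passed = str(meta.get("passed_quality_check")) == "True"
        ease = meta.get("flesch_reading_ease")
        if passed and ease is not None and ease >= min_reading_ease:
            kept.append(record)
    return kept


records = [
    # Metadata values taken from the example instance in this card.
    {"metadata": {"entropy": 0.4352176404374839,
                  "flesch_reading_ease": 82.39000000000001,
                  "n_characters": 85,
                  "passed_quality_check": "True"}},
    # Hypothetical record that fails the quality check.
    {"metadata": {"flesch_reading_ease": 30.0,
                  "passed_quality_check": "False"}},
]
kept = filter_by_metadata(records, min_reading_ease=60.0)
print(len(kept))
```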
Guidelines
Guidelines are optional; they are just a plain string that provides instructions to the annotators.
Data Instances
An example of a dataset instance in Argilla:
```json
{
  "external_id": null,
  "fields": {
    "context": null,
    "prompt": "Can brain cells move? By movement I mean long distance migration (preferably within the brain only)."
  },
  "metadata": {
    "entropy": 0.4352176404374839,
    "flesch_reading_ease": 82.39000000000001,
    "n_characters": 85,
    "passed_quality_check": "True"
  },
  "responses": [],
  "suggestions": [
    {
      "agent": null,
      "question_name": "response",
      "score": null,
      "type": null,
      "value": "The question is relatively broad and one should take into account that the brain not only consists of neurons, but also glial cells (supportive cells) and pre-mitotic neuronal stem cells. Furthermore, as critical fellow-scientists have indicated, developmental stage is very important, as the developing embryonic brain is very different from the adult brain. However, after sifting through various publications, the answer to the question is actually remarkably simple: Yes, brain cells migrate. In the adult brain glial cells migrate in the brain (Klämbt, 2009). Glial cells are involved in a myriad of functions, but a notable example of migrating glial cells are the oligodendrocytes that migrate relative long distances to find their target axons onto which they wrap themselves to form the insulating myelin sheath (Tsai and Miller, 2002). Neuronal stem cells migrate over long distances in response to injury (Imitola et al., 2004) and they migrate from specific stem-cell locations (e.g., hippocampus and subventricular zone) to other regions (Clarke, 2003). Post-mitotic, but non-differentiated neurons have been shown to migrate in the adult brain in fish (Scott et al., 2012), and in mammals and non-human primates as well (Sawada et al., 2011). Not surprisingly, glial cells, stem cells and neurons also migrate during embryonic development. Most notably, post-mitotic neurons destined to fulfill peripheral functions have to migrate over relatively long distances from the neural crest to their target locations (Neuroscience, 2nd ed, Neuronal Migration)."
    }
  ]
}
```
An example of a dataset instance in HuggingFace datasets:
```json
{
  "context": null,
  "external_id": null,
  "metadata": "{\"n_characters\": 85, \"passed_quality_check\": \"True\", \"flesch_reading_ease\": 82.39000000000001, \"entropy\": 0.4352176404374839}",
  "prompt": "Can brain cells move? By movement I mean long distance migration (preferably within the brain only).",
  "response": [],
  "response-suggestion": "The question is relatively broad and one should take into account that the brain not only consists of neurons, but also glial cells (supportive cells) and pre-mitotic neuronal stem cells. Furthermore, as critical fellow-scientists have indicated, developmental stage is very important, as the developing embryonic brain is very different from the adult brain. However, after sifting through various publications, the answer to the question is actually remarkably simple: Yes, brain cells migrate. In the adult brain glial cells migrate in the brain (Klämbt, 2009). Glial cells are involved in a myriad of functions, but a notable example of migrating glial cells are the oligodendrocytes that migrate relative long distances to find their target axons onto which they wrap themselves to form the insulating myelin sheath (Tsai and Miller, 2002). Neuronal stem cells migrate over long distances in response to injury (Imitola et al., 2004) and they migrate from specific stem-cell locations (e.g., hippocampus and subventricular zone) to other regions (Clarke, 2003). Post-mitotic, but non-differentiated neurons have been shown to migrate in the adult brain in fish (Scott et al., 2012), and in mammals and non-human primates as well (Sawada et al., 2011). Not surprisingly, glial cells, stem cells and neurons also migrate during embryonic development. Most notably, post-mitotic neurons destined to fulfill peripheral functions have to migrate over relatively long distances from the neural crest to their target locations (Neuroscience, 2nd ed, Neuronal Migration).",
  "response-suggestion-metadata": {
    "agent": null,
    "score": null,
    "type": null
  }
}
```
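In the HuggingFace `datasets` instance, `metadata` appears as a JSON-encoded string rather than a nested object, so it likely needs decoding before use. The following is a minimal sketch under that assumption; `decode_metadata` is a hypothetical helper, and the row reuses the values from the example instance.

```python
import json
from typing import Any, Dict

# Row shaped like the example instance; metadata is a JSON string.
row = {
    "prompt": "Can brain cells move? By movement I mean long distance "
              "migration (preferably within the brain only).",
    "metadata": '{"n_characters": 85, "passed_quality_check": "True", '
                '"flesch_reading_ease": 82.39000000000001, '
                '"entropy": 0.4352176404374839}',
}


def decode_metadata(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return the row's metadata as a dict, decoding it if it is a string."""
    raw = row.get("metadata")
    if isinstance(raw, str):
        return json.loads(raw)
    # Already a dict, or missing/None: fall back to an empty dict.
    return raw or {}


metadata = decode_metadata(row)
print(metadata["flesch_reading_ease"])
```

Handling both the string and dict cases keeps the helper usable whether the record comes from `datasets` or from Argilla.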
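Data Fields
The dataset fields fall into the following groups:
- Fields: These are the dataset records themselves; for the moment, only text fields are supported. They provide the content that annotators respond to.
  - prompt is of type `FieldTypes.text`.
  - (optional) context is of type `FieldTypes.text`.
- Questions: These are the questions asked to the annotators. They can be of different types, such as `RatingQuestion`, `TextQuestion`, `LabelQuestion`, `MultiLabelQuestion`, and `RankingQuestion`.
  - response is of type `QuestionTypes.text`.
- Suggestions: As of Argilla 1.13.0, suggestions are included to ease or assist the annotators' work during the annotation process. Suggestions are linked to existing questions, are always optional, and contain not only the suggestion itself but also its associated metadata, if applicable.
  - (optional) response-suggestion is of type `QuestionTypes.text`.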
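Additionally, there are two optional fields:
- metadata: An optional field that provides additional information about a dataset record. It can give annotators extra context or describe the record itself. Metadata is always optional and may be linked to the `metadata_properties` defined in `argilla.yaml`.
- external_id: An optional field that provides an external ID for a dataset record. It can be used to link the record to an external resource, such as a database or a file.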
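To illustrate how `external_id` could link records to an external resource, the sketch below builds a lookup table keyed by that ID and joins it with rows from a stand-in external table. The IDs, the `external_rows` table, and the function name are all hypothetical; note that the example instance in this card has `external_id` set to null, so such records are skipped.

```python
from typing import Any, Dict, List


def index_by_external_id(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build a lookup from external_id to record, skipping records without one."""
    index: Dict[str, Dict[str, Any]] = {}
    for record in records:
        ext_id = record.get("external_id")
        if ext_id is not None:
            index[ext_id] = record
    return index


# Hypothetical records and external table, for illustration only.
records = [
    {"external_id": "q-001", "prompt": "Can brain cells move?"},
    {"external_id": None, "prompt": "A record without an external ID."},
]
external_rows = {"q-001": {"source": "hypothetical-database-row"}}

index = index_by_external_id(records)
# Pair each external row with its matching record, when one exists.
joined = {
    ext_id: (index[ext_id]["prompt"], info["source"])
    for ext_id, info in external_rows.items()
    if ext_id in index
}
print(joined)
```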
Data Splits
The dataset contains a single split, which is train.



