allezallezallez/racisme

Name: allezallezallez/racisme
Creator: allezallezallez
Published: 2026-04-08 12:31:58
License: no description available

Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/allezallezallez/racisme


Resource description:

---
tags:
- rlfh
- argilla
- human-feedback
---

# Dataset Card for racisme

This dataset has been created with [Argilla](https://github.com/argilla-io/argilla). As shown in the sections below, it can be loaded into your Argilla server as explained in [Using this dataset with Argilla](#using-this-dataset-with-argilla), or used directly with the `datasets` library as explained in [Using this dataset with `datasets`](#using-this-dataset-with-datasets).

## Using this dataset with Argilla

To load with Argilla, install Argilla with `pip install argilla --upgrade` and then use the following code:

```python
import argilla as rg

# Pull both the Argilla settings and the records from the Hub repository.
ds = rg.Dataset.from_hub("allezallezallez/racisme", settings="auto")
```

This will load the settings and records from the dataset repository and push them to your Argilla server for exploration and annotation.

## Using this dataset with `datasets`

To load the records of this dataset with `datasets`, install `datasets` with `pip install datasets --upgrade` and then use the following code:

```python
from datasets import load_dataset

# Load the records only; the Argilla settings are not included.
ds = load_dataset("allezallezallez/racisme")
```

This will only load the records of the dataset, but not the Argilla settings.

## Dataset Structure

This dataset repo contains:

* Dataset records in a format compatible with HuggingFace `datasets`. These records will be loaded automatically when using `rg.Dataset.from_hub` and can be loaded independently using the `datasets` library via `load_dataset`.
* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
* A dataset configuration folder conforming to the Argilla dataset format in `.argilla`.

The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, **vectors**, and **guidelines**.

### Fields

The **fields** are the features or text of a dataset's records.
For example, the 'text' column of a text classification dataset or the 'prompt' column of an instruction-following dataset.

| Field Name | Title | Type | Required |
| ---------- | ----- | ---- | -------- |
| message | Message | text | True |
| desaccord | Désaccord inter-annotateurs | text | True |
| contexte | Contexte (locuteur) | text | False |

### Questions

The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.

| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| decision | Quelle annotation est correcte ? | label_selection | True | R1 / R2 = garder cette annotation, Aucun = rejeter le span, Autre = corriger manuellement | ['R1', 'R2', 'Aucun', 'Autre'] |
| correction_categories | Correction : catégories | multi_label_selection | False | Si 'Autre' : sélectionnez la ou les catégories correctes pour ce span. | ['Colère', 'Dégoût', 'Joie', 'Peur', 'Surprise', 'Tristesse', 'Admiration', 'Culpabilité', 'Embarras', 'Fierté', 'Jalousie', 'Autre'] |
| correction_mode | Correction : mode | label_selection | False | Si 'Autre' : sélectionnez le mode correct pour ce span. | ['Désignée', 'Comportementale', 'Suggérée', 'Montrée'] |
| notes | Notes | text | False | N/A | N/A |

### Metadata

The **metadata** is a dictionary that can be used to provide additional information about the dataset record.

| Metadata Name | Title | Type | Values | Visible for Annotators |
| ------------- | ----- | ---- | ------ | ---------------------- |
| idx | Index message | integer | - | True |
| n_disagreements | Nb désaccords (message) | integer | - | True |
| type_desaccord | Type de désaccord | terms | - | True |

### Data Splits

The dataset contains a single split, which is `train`.
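The `decision` question in the Questions table reads as a small adjudication rule: keep one of two competing annotations, reject the span, or supply a manual correction. The sketch below is one possible way to apply that rule to a single record in plain Python. The function name `resolve_span`, the `r1`/`r2` arguments, and the dictionary shape used for an annotation are hypothetical; the card does not specify how the competing annotations or the exported responses are stored. Only the label values are taken from the table.

```python
from typing import Optional, Sequence

# Label sets copied from the Questions table of the card.
CATEGORY_LABELS = {
    "Colère", "Dégoût", "Joie", "Peur", "Surprise", "Tristesse",
    "Admiration", "Culpabilité", "Embarras", "Fierté", "Jalousie", "Autre",
}
MODE_LABELS = {"Désignée", "Comportementale", "Suggérée", "Montrée"}


def resolve_span(
    decision: str,
    r1: Optional[dict] = None,
    r2: Optional[dict] = None,
    correction_categories: Sequence[str] = (),
    correction_mode: Optional[str] = None,
) -> Optional[dict]:
    """Return the annotation to keep for a span, or None if it is rejected.

    r1 and r2 are hypothetical dicts holding the two competing annotations;
    their structure is an assumption made for this sketch.
    """
    if decision == "R1":
        return r1
    if decision == "R2":
        return r2
    if decision == "Aucun":
        # 'Aucun' means the span is rejected outright.
        return None
    if decision == "Autre":
        # Both correction questions are optional in the card, so only
        # validate the values that were actually provided.
        categories = list(correction_categories)
        unknown = [c for c in categories if c not in CATEGORY_LABELS]
        if unknown:
            raise ValueError(f"unknown categories: {unknown}")
        if correction_mode is not None and correction_mode not in MODE_LABELS:
            raise ValueError(f"unknown mode: {correction_mode!r}")
        return {"categories": categories, "mode": correction_mode}
    raise ValueError(f"unexpected decision: {decision!r}")
```

A caller would map each annotated record onto these arguments, however the export names its columns, and drop the spans for which the function returns `None`.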
## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation guidelines

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]


Provider:

allezallezallez
