allezallezallez/racisme
Hugging Face | Updated 2026-04-08 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/allezallezallez/racisme
Resource description:
---
tags:
- rlfh
- argilla
- human-feedback
---
# Dataset Card for racisme
This dataset was created with [Argilla](https://github.com/argilla-io/argilla). It can be loaded into your Argilla server as explained in [Using this dataset with Argilla](#using-this-dataset-with-argilla), or used directly with the `datasets` library as explained in [Using this dataset with `datasets`](#using-this-dataset-with-datasets).
## Using this dataset with Argilla
To load the dataset with Argilla, install Argilla with `pip install argilla --upgrade` and then run the following code:
```python
import argilla as rg
ds = rg.Dataset.from_hub("allezallezallez/racisme", settings="auto")
```
This will load the settings and records from the dataset repository and push them to your Argilla server for exploration and annotation.
## Using this dataset with `datasets`
To load the records of this dataset with `datasets`, install the library with `pip install datasets --upgrade` and then run the following code:
```python
from datasets import load_dataset
ds = load_dataset("allezallezallez/racisme")
```
This will only load the records of the dataset, but not the Argilla settings.
## Dataset Structure
This dataset repo contains:
* Dataset records in a format compatible with Hugging Face `datasets`. These records are loaded automatically when using `rg.Dataset.from_hub` and can be loaded independently using the `datasets` library via `load_dataset`.
* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
* A dataset configuration folder conforming to the Argilla dataset format in `.argilla`.
The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, **vectors**, and **guidelines**.
### Fields
The **fields** are the features or text of a dataset's records. For example, the 'text' column of a text classification dataset or the 'prompt' column of an instruction-following dataset.
| Field Name | Title | Type | Required |
| ---------- | ----- | ---- | -------- |
| message | Message | text | True |
| desaccord | Désaccord inter-annotateurs | text | True |
| contexte | Contexte (locuteur) | text | False |
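The required/optional split in the table can be checked on a record before it is used. The following is a minimal sketch in plain Python, assuming records are represented as dictionaries keyed by field name. `FIELD_SPEC` and `missing_required_fields` are hypothetical helpers, not part of Argilla or `datasets`, and the sample record is invented for illustration.

```python
# Field names and "required" flags copied from the Fields table.
FIELD_SPEC = {
    "message": True,     # required
    "desaccord": True,   # required
    "contexte": False,   # optional
}


def missing_required_fields(record: dict) -> list:
    """Return the required field names that are absent or empty in `record`."""
    return [
        name
        for name, required in FIELD_SPEC.items()
        if required and not record.get(name)
    ]


# Invented example record (not taken from the dataset).
example = {"message": "Texte du message", "desaccord": "R1 vs R2"}
print(missing_required_fields(example))
```

A record without `contexte` passes the check, since that field is optional.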
### Questions
The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.
| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| decision | Quelle annotation est correcte ? | label_selection | True | R1 / R2 = garder cette annotation, Aucun = rejeter le span, Autre = corriger manuellement | ['R1', 'R2', 'Aucun', 'Autre'] |
| correction_categories | Correction : catégories | multi_label_selection | False | Si 'Autre' : sélectionnez la ou les catégories correctes pour ce span. | ['Colère', 'Dégoût', 'Joie', 'Peur', 'Surprise', 'Tristesse', 'Admiration', 'Culpabilité', 'Embarras', 'Fierté', 'Jalousie', 'Autre'] |
| correction_mode | Correction : mode | label_selection | False | Si 'Autre' : sélectionnez le mode correct pour ce span. | ['Désignée', 'Comportementale', 'Suggérée', 'Montrée'] |
| notes | Notes | text | False | N/A | N/A |
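The `decision` question describes a small adjudication rule: keep the R1 or R2 annotation, reject the span, or correct it manually. A minimal sketch of that rule in plain Python follows. The shape of an annotation (a dictionary with `categories` and `mode` keys) is an assumption inferred from the two correction questions, and `resolve_span` is a hypothetical helper, not an Argilla function.

```python
from typing import Optional

# Labels copied from the `decision` question.
DECISION_LABELS = {"R1", "R2", "Aucun", "Autre"}


def resolve_span(
    decision: str,
    r1: dict,
    r2: dict,
    correction_categories: Optional[list] = None,
    correction_mode: Optional[str] = None,
) -> Optional[dict]:
    """Apply an adjudication decision to one disputed span.

    Returns the annotation to keep, or None when the span is rejected.
    """
    if decision not in DECISION_LABELS:
        raise ValueError(f"unknown decision: {decision!r}")
    if decision == "R1":
        return r1
    if decision == "R2":
        return r2
    if decision == "Aucun":
        return None  # span rejected
    # decision == "Autre": build the annotation from the manual correction.
    # Both correction questions are optional, so either value may be missing.
    return {
        "categories": list(correction_categories or []),
        "mode": correction_mode,
    }


# Invented annotations for illustration (assumed shape).
r1 = {"categories": ["Colère"], "mode": "Désignée"}
r2 = {"categories": ["Dégoût"], "mode": "Suggérée"}
print(resolve_span("R2", r1, r2))
print(resolve_span("Autre", r1, r2, ["Peur"], "Montrée"))
```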
### Metadata
The **metadata** is a dictionary that can be used to provide additional information about the dataset record.
| Metadata Name | Title | Type | Values | Visible for Annotators |
| ------------- | ----- | ---- | ------ | ---------------------- |
| idx | Index message | integer | - | True |
| n_disagreements | Nb désaccords (message) | integer | - | True |
| type_desaccord | Type de désaccord | terms | - | True |
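The metadata properties can be used to select or summarize records, for example to focus on messages with several disagreements. The sketch below uses plain Python dictionaries. The records and the `type_desaccord` values (`type_a`, `type_b`) are invented placeholders, since the card does not list the actual terms, and `count_by_disagreement_type` is a hypothetical helper.

```python
from collections import Counter

# Invented records; only the metadata keys come from the Metadata table.
records = [
    {"idx": 0, "n_disagreements": 1, "type_desaccord": "type_a"},
    {"idx": 1, "n_disagreements": 3, "type_desaccord": "type_a"},
    {"idx": 2, "n_disagreements": 2, "type_desaccord": "type_b"},
]


def count_by_disagreement_type(records, min_disagreements=1):
    """Count records per `type_desaccord`, keeping only records with
    at least `min_disagreements` disagreements."""
    return Counter(
        r["type_desaccord"]
        for r in records
        if r["n_disagreements"] >= min_disagreements
    )


print(count_by_disagreement_type(records, min_disagreements=2))
```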
### Data Splits
The dataset contains a single split, which is `train`.
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation guidelines
[More Information Needed]
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
Provider: allezallezallez



