atoomic/emoticonnect-sample
Source: Hugging Face. Updated 2023-07-20. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/atoomic/emoticonnect-sample
Resource description:
---
license: artistic-2.0
task_categories:
- text-classification
language:
- fr
---
# Description
The data uses the `.jsonl` format: each line is a self-contained JSON object that can be parsed on its own.
Each row contains a text stored under the key `content` and a set of ratings split into groups:
* csp
* feeling
* gen
* persona
* sex
At this stage, only the `feeling` group is filled.
Note: for now, missing values in every vector are filled with `0`.
This could change over time to save space.
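Because each line parses on its own and missing values may stop being written as `0` in the future, a reader can fill absent keys itself. The following is a minimal sketch, not the dataset author's loader: `GROUP_KEYS`, `parse_row`, and `read_rows` are hypothetical names, and the key lists are an assumption inferred from the sample rows in this card. The card does not name a data file, so the sketch reads from an in-memory string instead.

```python
import io
import json

# Assumption: key names per group, inferred from the sample rows in this card.
GROUP_KEYS = {
    "csp": [f"c{i}" for i in range(1, 9)],
    "feeling": [f"f{i}" for i in range(1, 9)],
    "gen": [f"g{i}" for i in range(1, 5)],
    "persona": [f"p{i}" for i in range(1, 9)],
    "sex": [f"s{i}" for i in range(1, 3)],
}


def parse_row(line):
    """Parse one JSONL line and fill any absent rating with 0."""
    row = json.loads(line)
    ratings = row.get("ratings", {})
    row["ratings"] = {
        group: {key: ratings.get(group, {}).get(key, 0) for key in keys}
        for group, keys in GROUP_KEYS.items()
    }
    return row


def read_rows(stream):
    """Yield one parsed row per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield parse_row(line)


# Hypothetical sparse row: only one feeling value is present.
sample = io.StringIO(
    '{"content": "...some text...", "metadata": {"lng": "fr"}, '
    '"ratings": {"feeling": {"f2": 100}}}\n'
)
rows = list(read_rows(sample))
print(rows[0]["ratings"]["feeling"])
```

With a real file, `read_rows(open(path, encoding="utf-8"))` would work the same way, where `path` is whichever `.jsonl` file was downloaded.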
## Sample
Row example (pretty-printed):
```json
{
  "content": "...some text...",
  "metadata":
  {
    "lng": "fr"
  },
  "rating":
  {
  },
  "ratings":
  {
    "csp":
    {
      "c1": 0,
      "c2": 0,
      "c3": 0,
      "c4": 0,
      "c5": 0,
      "c6": 0,
      "c7": 0,
      "c8": 0
    },
    "feeling":
    {
      "f1": 0,
      "f2": 100,
      "f3": 0,
      "f4": 0,
      "f5": 0,
      "f6": 0,
      "f7": 0,
      "f8": 0
    },
    "gen":
    {
      "g1": 0,
      "g2": 0,
      "g3": 0,
      "g4": 0
    },
    "persona":
    {
      "p1": 0,
      "p2": 0,
      "p3": 0,
      "p4": 0,
      "p5": 0,
      "p6": 0,
      "p7": 0,
      "p8": 0
    },
    "sex":
    {
      "s1": 0,
      "s2": 0
    }
  }
}
```
Note: more than one field can be set within a group:
```json
{
  "content": "...some text...",
  "metadata":
  {
    "lng": "fr"
  },
  "rating":
  {
  },
  "ratings":
  {
    "csp":
    {
      "c1": 0,
      "c2": 0,
      "c3": 0,
      "c4": 0,
      "c5": 0,
      "c6": 0,
      "c7": 0,
      "c8": 0
    },
    "feeling":
    {
      "f1": 0,
      "f2": 0,
      "f3": 0,
      "f4": 0,
      "f5": 33.33,
      "f6": 66.67,
      "f7": 0,
      "f8": 0
    },
    "gen":
    {
      "g1": 0,
      "g2": 0,
      "g3": 0,
      "g4": 0
    },
    "persona":
    {
      "p1": 0,
      "p2": 0,
      "p3": 0,
      "p4": 0,
      "p5": 0,
      "p6": 0,
      "p7": 0,
      "p8": 0
    },
    "sex":
    {
      "s1": 0,
      "s2": 0
    }
  }
}
```
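Since a group can carry one or several non-zero values, a consumer usually needs to separate the ratings that are set from the zero padding. The sketch below is one possible way to do that, using the two `feeling` vectors shown in the samples; `active_ratings` and `dominant_keys` are hypothetical helper names, and treating the highest value as the "dominant" label is an assumption, not something the dataset card specifies.

```python
def active_ratings(group):
    """Return only the non-zero entries of one rating group."""
    return {key: value for key, value in group.items() if value != 0}


def dominant_keys(group):
    """Return the keys holding the highest non-zero value (ties kept), sorted."""
    active = active_ratings(group)
    if not active:
        return []  # group is unfilled (all zeros or empty)
    top = max(active.values())
    return sorted(key for key, value in active.items() if value == top)


# The two `feeling` vectors from the sample rows above.
single = {"f1": 0, "f2": 100, "f3": 0, "f4": 0, "f5": 0, "f6": 0, "f7": 0, "f8": 0}
mixed = {"f1": 0, "f2": 0, "f3": 0, "f4": 0, "f5": 33.33, "f6": 66.67, "f7": 0, "f8": 0}

print(active_ratings(single))  # {'f2': 100}
print(active_ratings(mixed))   # {'f5': 33.33, 'f6': 66.67}
print(dominant_keys(mixed))    # ['f6']
```

An empty result from `dominant_keys` also works as a check for groups that are not filled yet, such as `csp` or `gen` in the samples.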
Provider:
atoomic
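Summary of original information
Dataset overview
Basic dataset information
- License: artistic-2.0
- Task category: text classification
- Language: French (`fr`)
Data format and structure
- File format: `.jsonl`; each line is a standalone JSON object.
- Data content:
  - Each record contains a text stored under the key `content`.
  - Each record contains several rating groups; currently only the `feeling` group is filled.
  - The rating groups are `csp`, `feeling`, `gen`, `persona`, and `sex`.
Data examples
- Text content: the examples use placeholder text rather than actual content.
- Rating data:
  - `feeling` group: in one example `f2` is 100 and all other values are 0; in the other, `f5` is 33.33, `f6` is 66.67, and all other values are 0.
  - The other rating groups (`csp`, `gen`, `persona`, `sex`) are currently all 0.
Dataset characteristics
- Missing values: missing ratings are currently filled with 0; this may change in the future to save space.
- Multiple values per group: a rating group can hold more than one non-zero rating.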



