Thaweewat/oasst1_th
收藏Hugging Face2023-10-08 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/Thaweewat/oasst1_th
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
- config_name: default
features:
- name: message_id
dtype: string
- name: parent_id
dtype: string
- name: user_id
dtype: string
- name: created_date
dtype: string
- name: text
dtype: string
- name: text_th
dtype: string
- name: role
dtype: string
- name: lang
dtype: string
- name: review_count
dtype: int32
- name: review_result
dtype: bool
- name: deleted
dtype: bool
- name: rank
dtype: float64
- name: synthetic
dtype: bool
- name: model_name
dtype: 'null'
- name: detoxify
struct:
- name: identity_attack
dtype: float64
- name: insult
dtype: float64
- name: obscene
dtype: float64
- name: severe_toxicity
dtype: float64
- name: sexual_explicit
dtype: float64
- name: threat
dtype: float64
- name: toxicity
dtype: float64
- name: message_tree_id
dtype: string
- name: tree_state
dtype: string
- name: emojis
struct:
- name: count
sequence: int32
- name: name
sequence: string
- name: labels
struct:
- name: count
sequence: int32
- name: name
sequence: string
- name: value
sequence: float64
splits:
- name: train
num_bytes: 10381992
num_examples: 4401
download_size: 0
dataset_size: 10381992
- config_name: train
features:
- name: message_id
dtype: string
- name: parent_id
dtype: string
- name: user_id
dtype: string
- name: created_date
dtype: string
- name: text
dtype: string
- name: text_th
dtype: string
- name: role
dtype: string
- name: lang
dtype: string
- name: review_count
dtype: int32
- name: review_result
dtype: bool
- name: deleted
dtype: bool
- name: rank
dtype: float64
- name: synthetic
dtype: bool
- name: model_name
dtype: 'null'
- name: detoxify
struct:
- name: identity_attack
dtype: float64
- name: insult
dtype: float64
- name: obscene
dtype: float64
- name: severe_toxicity
dtype: float64
- name: sexual_explicit
dtype: float64
- name: threat
dtype: float64
- name: toxicity
dtype: float64
- name: message_tree_id
dtype: string
- name: tree_state
dtype: string
- name: emojis
struct:
- name: count
sequence: int32
- name: name
sequence: string
- name: labels
struct:
- name: count
sequence: int32
- name: name
sequence: string
- name: value
sequence: float64
- name: __index_level_0__
dtype: int64
splits:
- name: train
num_bytes: 200135278
num_examples: 84437
download_size: 75167235
dataset_size: 200135278
- config_name: val
features:
- name: message_id
dtype: string
- name: parent_id
dtype: string
- name: user_id
dtype: string
- name: created_date
dtype: string
- name: text
dtype: string
- name: text_th
dtype: string
- name: role
dtype: string
- name: lang
dtype: string
- name: review_count
dtype: int32
- name: review_result
dtype: bool
- name: deleted
dtype: bool
- name: rank
dtype: float64
- name: synthetic
dtype: bool
- name: model_name
dtype: 'null'
- name: detoxify
struct:
- name: identity_attack
dtype: float64
- name: insult
dtype: float64
- name: obscene
dtype: float64
- name: severe_toxicity
dtype: float64
- name: sexual_explicit
dtype: float64
- name: threat
dtype: float64
- name: toxicity
dtype: float64
- name: message_tree_id
dtype: string
- name: tree_state
dtype: string
- name: emojis
struct:
- name: count
sequence: int32
- name: name
sequence: string
- name: labels
struct:
- name: count
sequence: int32
- name: name
sequence: string
- name: value
sequence: float64
splits:
- name: train
num_bytes: 10381992
num_examples: 4401
download_size: 3907352
dataset_size: 10381992
configs:
- config_name: train
data_files:
- split: train
path: train/train-*
- config_name: val
data_files:
- split: train
path: val/train-*
language:
- th
---
# Dataset Card for "oasst1_th"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
提供机构:
Thaweewat
原始信息汇总
数据集概述
数据集配置
-
default
- 特征:
message_id: 字符串parent_id: 字符串user_id: 字符串created_date: 字符串text: 字符串text_th: 字符串role: 字符串lang: 字符串review_count: 32位整数review_result: 布尔值deleted: 布尔值rank: 64位浮点数synthetic: 布尔值model_name: nulldetoxify: 结构体identity_attack: 64位浮点数insult: 64位浮点数obscene: 64位浮点数severe_toxicity: 64位浮点数sexual_explicit: 64位浮点数threat: 64位浮点数toxicity: 64位浮点数
message_tree_id: 字符串tree_state: 字符串emojis: 结构体count: 序列化的32位整数name: 序列化的字符串
labels: 结构体count: 序列化的32位整数name: 序列化的字符串value: 序列化的64位浮点数
- 分割:
train- 字节数: 10381992
- 样本数: 4401
- 下载大小: 0
- 数据集大小: 10381992
- 特征:
-
train
- 特征:
message_id: 字符串parent_id: 字符串user_id: 字符串created_date: 字符串text: 字符串text_th: 字符串role: 字符串lang: 字符串review_count: 32位整数review_result: 布尔值deleted: 布尔值rank: 64位浮点数synthetic: 布尔值model_name: nulldetoxify: 结构体identity_attack: 64位浮点数insult: 64位浮点数obscene: 64位浮点数severe_toxicity: 64位浮点数sexual_explicit: 64位浮点数threat: 64位浮点数toxicity: 64位浮点数
message_tree_id: 字符串tree_state: 字符串emojis: 结构体count: 序列化的32位整数name: 序列化的字符串
labels: 结构体count: 序列化的32位整数name: 序列化的字符串value: 序列化的64位浮点数
__index_level_0__: 64位整数
- 分割:
train- 字节数: 200135278
- 样本数: 84437
- 下载大小: 75167235
- 数据集大小: 200135278
- 特征:
-
val
- 特征:
message_id: 字符串parent_id: 字符串user_id: 字符串created_date: 字符串text: 字符串text_th: 字符串role: 字符串lang: 字符串review_count: 32位整数review_result: 布尔值deleted: 布尔值rank: 64位浮点数synthetic: 布尔值model_name: nulldetoxify: 结构体identity_attack: 64位浮点数insult: 64位浮点数obscene: 64位浮点数severe_toxicity: 64位浮点数sexual_explicit: 64位浮点数threat: 64位浮点数toxicity: 64位浮点数
message_tree_id: 字符串tree_state: 字符串emojis: 结构体count: 序列化的32位整数name: 序列化的字符串
labels: 结构体count: 序列化的32位整数name: 序列化的字符串value: 序列化的64位浮点数
- 分割:
train- 字节数: 10381992
- 样本数: 4401
- 下载大小: 3907352
- 数据集大小: 10381992
- 特征:
数据文件配置
- train
- 数据文件:
split: trainpath: train/train-*
- 数据文件:
- val
- 数据文件:
split: trainpath: val/train-*
- 数据文件:
语言
- 泰语 (th)



