acmc/watermarked_c4_dataset
Source: Hugging Face | Updated: 2024-06-09 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/acmc/watermarked_c4_dataset
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: timestamp
    dtype: string
  - name: url
    dtype: string
  - name: generated
    dtype: bool
  - name: model
    dtype: string
  - name: results
    list:
    - name: confidence
      sequence: float64
    - name: generated
      dtype: int64
    - name: green_fraction
      sequence: float64
    - name: label
      dtype: int64
    - name: metadata
      struct:
      - name: func
        dtype: string
      - name: model
        dtype: string
      - name: params
        struct:
        - name: percentage
          dtype: float64
        - name: percentage_to_replace
          dtype: float64
    - name: num_green_tokens
      sequence: float64
    - name: num_tokens_scored
      sequence: float64
    - name: p_value
      sequence: float64
    - name: prediction
      sequence: bool
    - name: score
      sequence: float64
    - name: z_score
      sequence: float64
  splits:
  - name: train
    num_bytes: 7334816
    num_examples: 2000
  download_size: 1851699
  dataset_size: 7334816
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
Each row has five scalar fields (text, timestamp and url as strings, generated as a boolean, model as a string) and a results list. Every entry of results stores confidence, green_fraction, num_green_tokens, num_tokens_scored, p_value, score and z_score as sequences of float64, prediction as a sequence of booleans, generated and label as int64 values, and a metadata struct with func, model and params (percentage, percentage_to_replace). The dataset has a single train split of 2,000 examples; the download size is 1,851,699 bytes and the dataset size is 7,334,816 bytes.
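To make the nesting concrete, here is a hand-written Python mirror of the feature schema using `TypedDict`, plus a hypothetical helper `missing_keys` that reports which schema fields a row lacks. This is a sketch, not something generated by the `datasets` library. It assumes that `num_green_tokens` through `z_score` are fields of each `results` entry, while `metadata.params` holds only `percentage` and `percentage_to_replace`.

```python
from typing import List, TypedDict


# Hand-written mirror of the card's feature schema (not generated by any library).
# ASSUMPTION: num_green_tokens ... z_score sit directly on each `results` entry,
# while metadata.params holds only the two percentage fields.
class Params(TypedDict):
    percentage: float
    percentage_to_replace: float


class Metadata(TypedDict):
    func: str
    model: str
    params: Params


class Result(TypedDict):
    confidence: List[float]
    generated: int
    green_fraction: List[float]
    label: int
    metadata: Metadata
    num_green_tokens: List[float]
    num_tokens_scored: List[float]
    p_value: List[float]
    prediction: List[bool]
    score: List[float]
    z_score: List[float]


class Record(TypedDict):
    text: str
    timestamp: str
    url: str
    generated: bool
    model: str
    results: List[Result]


def missing_keys(record: dict) -> List[str]:
    """Hypothetical helper: dotted paths of schema keys absent from one row."""
    missing = [k for k in Record.__annotations__ if k not in record]
    for i, res in enumerate(record.get("results") or []):
        prefix = f"results[{i}]"
        missing += [f"{prefix}.{k}" for k in Result.__annotations__ if k not in res]
        meta = res.get("metadata") or {}  # a struct value may also be None
        missing += [f"{prefix}.metadata.{k}" for k in Metadata.__annotations__ if k not in meta]
        params = meta.get("params") or {}
        missing += [f"{prefix}.metadata.params.{k}" for k in Params.__annotations__ if k not in params]
    return missing
```

For example, `missing_keys({})` lists the six top-level fields, and a row whose `results` contains an empty dict reports every nested path under `results[0]`.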
Provided by:
acmc
Summary of Original Information
Dataset Overview
Dataset Features
- text: text content, string
- timestamp: timestamp, string
- url: URL, string
- generated: whether the text is generated, boolean
- model: model name, string
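Here is a minimal sketch of working with the top-level fields. It splits rows on the `generated` flag and counts generated rows per `model`. The helper names are hypothetical. The timestamp format is an assumption borrowed from the original C4 corpus (for example `2019-04-25T12:57:54Z`), because the card only declares `timestamp` as a string.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Mapping, Tuple

# ASSUMPTION: timestamps use the C4 style "2019-04-25T12:57:54Z";
# the card itself only says the field is a string.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(ts: str) -> datetime:
    """Parse a timestamp string into a timezone-aware UTC datetime."""
    return datetime.strptime(ts, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def split_by_generated(rows: Iterable[Mapping]) -> Tuple[List[Mapping], List[Mapping]]:
    """Separate rows into (human-written, generated) using the boolean `generated` flag."""
    human: List[Mapping] = []
    generated: List[Mapping] = []
    for row in rows:
        (generated if row["generated"] else human).append(row)
    return human, generated


def count_models(rows: Iterable[Mapping]) -> Counter:
    """Count generated rows per value of the `model` field."""
    return Counter(row["model"] for row in rows if row["generated"])
```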
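Result Features (fields of each entry in results)
- confidence: confidence, sequence of floats
- generated: generated flag, integer
- green_fraction: fraction of green tokens, sequence of floats
- label: label, integer
- metadata: metadata, struct containing:
  - func: function name, string
  - model: model name, string
  - params: parameters, struct containing:
    - percentage: percentage, float
    - percentage_to_replace: percentage to replace, float
- num_green_tokens: number of green tokens, sequence of floats
- num_tokens_scored: number of scored tokens, sequence of floats
- p_value: p-value, sequence of floats
- prediction: prediction, sequence of booleans
- score: score, sequence of floats
- z_score: z-score, sequence of floats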
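The result field names (num_tokens_scored, num_green_tokens, green_fraction, z_score, p_value, prediction) match the outputs of a green-list watermark detector of the kind described by Kirchenbauer et al. (2023), "A Watermark for Large Language Models". Assuming that is their origin, the detector runs a one-proportion z-test. With T scored tokens, g of them green, and a green-list fraction gamma, z = (g - gamma * T) / sqrt(T * gamma * (1 - gamma)), and the one-sided p-value is 1 - Phi(z).

The sketch below recomputes these quantities. `GAMMA` and `Z_THRESHOLD` are placeholder assumptions, since the card does not state either value. Treating the sequences as parallel (one element per scoring run) is also an assumption.

```python
import math
from statistics import NormalDist

# ASSUMPTION: the green-list fraction is not given on the card; 0.25 is a placeholder.
GAMMA = 0.25
# ASSUMPTION: hypothetical z-score threshold for calling a text watermarked.
Z_THRESHOLD = 4.0


def green_list_stats(num_green_tokens: float, num_tokens_scored: float,
                     gamma: float = GAMMA, z_threshold: float = Z_THRESHOLD) -> dict:
    """One-proportion z-test for a green-list watermark (Kirchenbauer et al. style)."""
    if num_tokens_scored <= 0:
        raise ValueError("num_tokens_scored must be positive")
    t = num_tokens_scored
    expected = gamma * t                      # green tokens expected without a watermark
    std = math.sqrt(t * gamma * (1 - gamma))  # binomial standard deviation
    z = (num_green_tokens - expected) / std
    p = 1 - NormalDist().cdf(z)               # one-sided p-value
    return {
        "green_fraction": num_green_tokens / t,
        "z_score": z,
        "p_value": p,
        "prediction": z > z_threshold,
    }


def recompute(result: dict, gamma: float = GAMMA) -> list:
    """Apply the test element-wise to the parallel sequences of one `results` entry."""
    return [
        green_list_stats(g, t, gamma)
        for g, t in zip(result["num_green_tokens"], result["num_tokens_scored"])
    ]


if __name__ == "__main__":
    # Hypothetical values, not taken from the dataset.
    example = {"num_green_tokens": [25.0, 60.0], "num_tokens_scored": [100.0, 100.0]}
    for stats in recompute(example):
        print(stats)
```

If the assumed gamma matches the one actually used, the recomputed green_fraction, z_score and p_value should agree with the stored fields. A mismatch would suggest a different gamma or a different scoring rule.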
Dataset Splits
- train: training set, 2,000 examples, 7,334,816 bytes in total
Dataset Size
- Download size: 1,851,699 bytes
- Dataset size: 7,334,816 bytes
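The size figures can be cross-checked with a little arithmetic. On Hugging Face cards, dataset_size is the sum of the splits' num_bytes. That holds here because train is the only split. Dividing by 2,000 examples gives an average of about 3,667 bytes per row, and the dataset size is about 3.96 times the download size. The constant and function names below are illustrative.

```python
# Figures copied from the card's dataset_info block.
SPLIT_NUM_BYTES = {"train": 7_334_816}
NUM_EXAMPLES = {"train": 2_000}
DOWNLOAD_SIZE = 1_851_699
DATASET_SIZE = 7_334_816


def size_summary() -> dict:
    """Derived size figures for the single train split."""
    total = sum(SPLIT_NUM_BYTES.values())
    return {
        "split_bytes_match_dataset_size": total == DATASET_SIZE,
        "avg_bytes_per_example": SPLIT_NUM_BYTES["train"] / NUM_EXAMPLES["train"],  # 3667.408
        "size_to_download_ratio": DATASET_SIZE / DOWNLOAD_SIZE,  # about 3.96
        "dataset_size_mib": DATASET_SIZE / 2**20,
        "download_size_mib": DOWNLOAD_SIZE / 2**20,
    }


if __name__ == "__main__":
    for key, value in size_summary().items():
        print(f"{key}: {value}")
```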
Configuration
- config_name: default (the default configuration)
- data_files:
  - split: train
  - path: data/train-*
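The data_files entry maps the train split to every repository file matching the glob data/train-*. The sketch below imitates that mapping with Python's `fnmatch` over a hypothetical file listing. `resolve_split_files` is an illustrative name. Because `fnmatch` lets `*` match `/` as well, this only approximates how the Hub resolves patterns.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Mirrors the card's configs block: split name -> glob pattern.
DATA_FILES: Dict[str, str] = {"train": "data/train-*"}


def resolve_split_files(repo_files: Iterable[str],
                        data_files: Dict[str, str] = DATA_FILES) -> Dict[str, List[str]]:
    """Map each split to the (sorted) repository paths its pattern matches."""
    files = list(repo_files)
    return {
        split: sorted(f for f in files if fnmatchcase(f, pattern))
        for split, pattern in data_files.items()
    }


if __name__ == "__main__":
    # Hypothetical file listing; the card does not show the repository contents.
    listing = ["README.md", "data/train-00000-of-00001.parquet", "data/test-00000-of-00001.parquet"]
    print(resolve_split_files(listing))
```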



