stanford-crfm/i2s-webpage-test
Source: Hugging Face | Updated: 2024-05-18 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/stanford-crfm/i2s-webpage-test
Resource description:
---
dataset_info:
- config_name: css
  features:
  - name: structure
    dtype: string
  - name: text
    dtype: string
  - name: image
    dtype: image
  - name: download_url
    dtype: string
  - name: instance_name
    dtype: string
  - name: date
    dtype: string
  - name: additional_info
    dtype: string
  - name: date_scrapped
    dtype: string
  - name: file_filters
    dtype: string
  - name: compilation_info
    dtype: string
  - name: rendering_filters
    dtype: string
  - name: assets
    sequence: string
  - name: category
    dtype: string
  - name: uuid
    dtype: string
  - name: length
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: validation
    num_bytes: 17275376.0
    num_examples: 10
  download_size: 17168881
  dataset_size: 17275376.0
- config_name: html
  features:
  - name: structure
    dtype: string
  - name: text
    dtype: string
  - name: image
    dtype: image
  - name: download_url
    dtype: string
  - name: instance_name
    dtype: string
  - name: date
    dtype: string
  - name: additional_info
    dtype: string
  - name: date_scrapped
    dtype: string
  - name: file_filters
    dtype: string
  - name: compilation_info
    dtype: string
  - name: rendering_filters
    dtype: string
  - name: assets
    sequence: string
  - name: category
    dtype: string
  - name: uuid
    dtype: string
  - name: length
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: validation
    num_bytes: 3714540.0
    num_examples: 10
  download_size: 3421629
  dataset_size: 3714540.0
- config_name: javascript
  features:
  - name: structure
    dtype: string
  - name: text
    dtype: string
  - name: image
    dtype: image
  - name: download_url
    dtype: string
  - name: instance_name
    dtype: string
  - name: date
    dtype: string
  - name: additional_info
    dtype: string
  - name: date_scrapped
    dtype: string
  - name: file_filters
    dtype: string
  - name: compilation_info
    dtype: string
  - name: rendering_filters
    dtype: string
  - name: assets
    sequence: string
  - name: category
    dtype: string
  - name: uuid
    dtype: string
  - name: length
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: validation
    num_bytes: 3408339.0
    num_examples: 10
  download_size: 3272228
  dataset_size: 3408339.0
- config_name: real
  features:
  - name: structure
    dtype: string
  - name: image
    dtype: image
  - name: url
    dtype: string
  - name: instance_name
    dtype: string
  - name: date_scrapped
    dtype: string
  - name: uuid
    dtype: string
  - name: category
    dtype: string
  - name: additional_info
    dtype: string
  - name: assets
    sequence: string
  - name: difficulty
    dtype: string
  splits:
  - name: validation
    num_bytes: 3700230.0
    num_examples: 10
  download_size: 3662408
  dataset_size: 3700230.0
configs:
- config_name: css
  data_files:
  - split: validation
    path: css/validation-*
- config_name: html
  data_files:
  - split: validation
    path: html/validation-*
- config_name: javascript
  data_files:
  - split: validation
    path: javascript/validation-*
- config_name: real
  data_files:
  - split: validation
    path: real/validation-*
---
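The dataset has four configurations: css, html, javascript, and real. The css, html, and javascript configurations share the same 16 features (structure, text, image, assets, difficulty, and others). The real configuration has 10 features: it has a url field instead of download_url and omits text, date, file_filters, compilation_info, rendering_filters, and length. Each configuration has a single validation split of 10 examples; its dataset size and download size are given in the metadata above.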
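A minimal sketch of loading one configuration, assuming the Hugging Face `datasets` library is installed and the Hub is reachable. `load_dataset(path, name, split=...)` is the library's standard entry point; the helper name `load_validation` and the `CONFIGS` tuple are illustrative and not part of the dataset itself.

```python
# Minimal sketch: load the validation split of one configuration of
# stanford-crfm/i2s-webpage-test. Assumes `datasets` is installed and the
# Hugging Face Hub is reachable; `load_validation` is a hypothetical helper.

REPO_ID = "stanford-crfm/i2s-webpage-test"
CONFIGS = ("css", "html", "javascript", "real")


def load_validation(config: str):
    """Return the validation split (the only split in the card) for one config."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the name check above works even without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split="validation")
```

For example, `load_validation("real")` should return the 10 examples of the real configuration, whose rows carry `url` rather than `download_url` and have no `text` column. Because the `image` feature has dtype `image`, `datasets` decodes that column to a PIL image when a row is accessed.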
Provided by:
stanford-crfm
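Original information summary
Dataset overview
Configuration: css
- Features:
  - structure (string)
  - text (string)
  - image (image)
  - download_url (string)
  - instance_name (string)
  - date (string)
  - additional_info (string)
  - date_scrapped (string)
  - file_filters (string)
  - compilation_info (string)
  - rendering_filters (string)
  - assets (sequence: string)
  - category (string)
  - uuid (string)
  - length (string)
  - difficulty (string)
- Splits:
  - validation:
    - Bytes: 17275376
    - Examples: 10
- Dataset size: 17275376 bytes
- Download size: 17168881 bytes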
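Configuration: html
- Features:
  - structure (string)
  - text (string)
  - image (image)
  - download_url (string)
  - instance_name (string)
  - date (string)
  - additional_info (string)
  - date_scrapped (string)
  - file_filters (string)
  - compilation_info (string)
  - rendering_filters (string)
  - assets (sequence: string)
  - category (string)
  - uuid (string)
  - length (string)
  - difficulty (string)
- Splits:
  - validation:
    - Bytes: 3714540
    - Examples: 10
- Dataset size: 3714540 bytes
- Download size: 3421629 bytes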
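Configuration: javascript
- Features:
  - structure (string)
  - text (string)
  - image (image)
  - download_url (string)
  - instance_name (string)
  - date (string)
  - additional_info (string)
  - date_scrapped (string)
  - file_filters (string)
  - compilation_info (string)
  - rendering_filters (string)
  - assets (sequence: string)
  - category (string)
  - uuid (string)
  - length (string)
  - difficulty (string)
- Splits:
  - validation:
    - Bytes: 3408339
    - Examples: 10
- Dataset size: 3408339 bytes
- Download size: 3272228 bytes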
Configuration: real
- Features:
  - structure (string)
  - image (image)
  - url (string)
  - instance_name (string)
  - date_scrapped (string)
  - uuid (string)
  - category (string)
  - additional_info (string)
  - assets (sequence: string)
  - difficulty (string)
- Splits:
  - validation:
    - Bytes: 3700230
    - Examples: 10
- Dataset size: 3700230 bytes
- Download size: 3662408 bytes
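The figures listed above can be cross-checked with a short stdlib-only script. It copies the sizes and feature names from the card by hand (nothing is downloaded), totals the sizes across the four configurations, and computes which fields differ between real and the other three. The variable names are illustrative.

```python
# Figures copied by hand from the dataset card; nothing is downloaded.
SIZES = {
    # config: (num_examples, dataset_size_bytes, download_size_bytes)
    "css": (10, 17_275_376, 17_168_881),
    "html": (10, 3_714_540, 3_421_629),
    "javascript": (10, 3_408_339, 3_272_228),
    "real": (10, 3_700_230, 3_662_408),
}

# Feature names shared by the css, html, and javascript configurations.
CSS_HTML_JS_FEATURES = {
    "structure", "text", "image", "download_url", "instance_name", "date",
    "additional_info", "date_scrapped", "file_filters", "compilation_info",
    "rendering_filters", "assets", "category", "uuid", "length", "difficulty",
}
# Feature names of the real configuration.
REAL_FEATURES = {
    "structure", "image", "url", "instance_name", "date_scrapped", "uuid",
    "category", "additional_info", "assets", "difficulty",
}

# Totals across all four configurations.
total_examples = sum(n for n, _, _ in SIZES.values())
total_dataset_bytes = sum(d for _, d, _ in SIZES.values())
total_download_bytes = sum(dl for _, _, dl in SIZES.values())

# Fields present only on one side of the schema split.
only_css_html_js = CSS_HTML_JS_FEATURES - REAL_FEATURES
only_real = REAL_FEATURES - CSS_HTML_JS_FEATURES

print(total_examples, total_dataset_bytes, total_download_bytes)
print(sorted(only_css_html_js))
print(sorted(only_real))
```

Run as-is, this reports 40 examples, 28098485 bytes of data, and 27525146 bytes to download. It also shows that real lacks seven fields (compilation_info, date, download_url, file_filters, length, rendering_filters, text) and adds only url.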



