
stanford-crfm/i2s-webpage-test

Source: Hugging Face. Updated 2024-05-18. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/stanford-crfm/i2s-webpage-test
Resource description:
---
dataset_info:
- config_name: css
  features:
  - name: structure
    dtype: string
  - name: text
    dtype: string
  - name: image
    dtype: image
  - name: download_url
    dtype: string
  - name: instance_name
    dtype: string
  - name: date
    dtype: string
  - name: additional_info
    dtype: string
  - name: date_scrapped
    dtype: string
  - name: file_filters
    dtype: string
  - name: compilation_info
    dtype: string
  - name: rendering_filters
    dtype: string
  - name: assets
    sequence: string
  - name: category
    dtype: string
  - name: uuid
    dtype: string
  - name: length
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: validation
    num_bytes: 17275376.0
    num_examples: 10
  download_size: 17168881
  dataset_size: 17275376.0
- config_name: html
  features:
  - name: structure
    dtype: string
  - name: text
    dtype: string
  - name: image
    dtype: image
  - name: download_url
    dtype: string
  - name: instance_name
    dtype: string
  - name: date
    dtype: string
  - name: additional_info
    dtype: string
  - name: date_scrapped
    dtype: string
  - name: file_filters
    dtype: string
  - name: compilation_info
    dtype: string
  - name: rendering_filters
    dtype: string
  - name: assets
    sequence: string
  - name: category
    dtype: string
  - name: uuid
    dtype: string
  - name: length
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: validation
    num_bytes: 3714540.0
    num_examples: 10
  download_size: 3421629
  dataset_size: 3714540.0
- config_name: javascript
  features:
  - name: structure
    dtype: string
  - name: text
    dtype: string
  - name: image
    dtype: image
  - name: download_url
    dtype: string
  - name: instance_name
    dtype: string
  - name: date
    dtype: string
  - name: additional_info
    dtype: string
  - name: date_scrapped
    dtype: string
  - name: file_filters
    dtype: string
  - name: compilation_info
    dtype: string
  - name: rendering_filters
    dtype: string
  - name: assets
    sequence: string
  - name: category
    dtype: string
  - name: uuid
    dtype: string
  - name: length
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: validation
    num_bytes: 3408339.0
    num_examples: 10
  download_size: 3272228
  dataset_size: 3408339.0
- config_name: real
  features:
  - name: structure
    dtype: string
  - name: image
    dtype: image
  - name: url
    dtype: string
  - name: instance_name
    dtype: string
  - name: date_scrapped
    dtype: string
  - name: uuid
    dtype: string
  - name: category
    dtype: string
  - name: additional_info
    dtype: string
  - name: assets
    sequence: string
  - name: difficulty
    dtype: string
  splits:
  - name: validation
    num_bytes: 3700230.0
    num_examples: 10
  download_size: 3662408
  dataset_size: 3700230.0
configs:
- config_name: css
  data_files:
  - split: validation
    path: css/validation-*
- config_name: html
  data_files:
  - split: validation
    path: html/validation-*
- config_name: javascript
  data_files:
  - split: validation
    path: javascript/validation-*
- config_name: real
  data_files:
  - split: validation
    path: real/validation-*
---

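The dataset has four configurations: css, html, javascript, and real. The css, html, and javascript configurations share the same 16 features; the real configuration has 10 features, replacing download_url with url and omitting text, date, file_filters, compilation_info, rendering_filters, and length. Every feature is a string except image (an image) and assets (a sequence of strings). Each configuration has a single validation split with 10 examples, and the metadata lists the byte size and download size of each configuration.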
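
As an illustration of how the four configurations might be accessed, here is a minimal sketch that uses the Hugging Face `datasets` library. It assumes the library is installed and that the dataset is still published under the ID shown in the listing; the helper name `load_config` and the constants are hypothetical, introduced only for this example.

```python
# Dataset ID and configuration names are taken from the listing;
# load_config is a hypothetical helper, not part of any library.
DATASET_ID = "stanford-crfm/i2s-webpage-test"
CONFIGS = ("css", "html", "javascript", "real")


def load_config(name):
    """Load the validation split of one configuration."""
    if name not in CONFIGS:
        raise ValueError(
            f"unknown configuration {name!r}; expected one of {CONFIGS}"
        )
    # Imported lazily so the name check works without the library installed.
    from datasets import load_dataset

    # The card declares a single split, "validation", for every configuration.
    return load_dataset(DATASET_ID, name, split="validation")
```

Calling `load_config("real")` should then return a split with 10 rows, according to the metadata. Because `image` is declared with the image dtype, the library would normally decode that field into a PIL image when a row is accessed.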
Provider:
stanford-crfm
Summary of original information

Dataset overview

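Configuration name: css

  • Features
    • structure (string)
    • text (string)
    • image (image)
    • download_url (string)
    • instance_name (string)
    • date (string)
    • additional_info (string)
    • date_scrapped (string)
    • file_filters (string)
    • compilation_info (string)
    • rendering_filters (string)
    • assets (sequence: string)
    • category (string)
    • uuid (string)
    • length (string)
    • difficulty (string)
  • Data splits
    • Validation:
      • Bytes: 17275376.0
      • Examples: 10
  • Dataset size: 17275376.0 bytes
  • Download size: 17168881 bytes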

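Configuration name: html

  • Features
    • structure (string)
    • text (string)
    • image (image)
    • download_url (string)
    • instance_name (string)
    • date (string)
    • additional_info (string)
    • date_scrapped (string)
    • file_filters (string)
    • compilation_info (string)
    • rendering_filters (string)
    • assets (sequence: string)
    • category (string)
    • uuid (string)
    • length (string)
    • difficulty (string)
  • Data splits
    • Validation:
      • Bytes: 3714540.0
      • Examples: 10
  • Dataset size: 3714540.0 bytes
  • Download size: 3421629 bytes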

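Configuration name: javascript

  • Features
    • structure (string)
    • text (string)
    • image (image)
    • download_url (string)
    • instance_name (string)
    • date (string)
    • additional_info (string)
    • date_scrapped (string)
    • file_filters (string)
    • compilation_info (string)
    • rendering_filters (string)
    • assets (sequence: string)
    • category (string)
    • uuid (string)
    • length (string)
    • difficulty (string)
  • Data splits
    • Validation:
      • Bytes: 3408339.0
      • Examples: 10
  • Dataset size: 3408339.0 bytes
  • Download size: 3272228 bytes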

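Configuration name: real

  • Features
    • structure (string)
    • image (image)
    • url (string)
    • instance_name (string)
    • date_scrapped (string)
    • uuid (string)
    • category (string)
    • additional_info (string)
    • assets (sequence: string)
    • difficulty (string)
  • Data splits
    • Validation:
      • Bytes: 3700230.0
      • Examples: 10
  • Dataset size: 3700230.0 bytes
  • Download size: 3662408 bytes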
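
The per-configuration lists can be compared programmatically. The sketch below copies the feature names and size figures from the metadata and computes which fields differ between the shared css/html/javascript schema and the real schema, along with the average size of one example. The names `SHARED_FEATURES`, `REAL_FEATURES`, `SIZES`, `schema_diff`, and `mean_bytes_per_example` are hypothetical and exist only for this illustration.

```python
# Feature names copied from the metadata; css, html, and javascript
# declare the same list, so it is stored once.
SHARED_FEATURES = [
    "structure", "text", "image", "download_url", "instance_name", "date",
    "additional_info", "date_scrapped", "file_filters", "compilation_info",
    "rendering_filters", "assets", "category", "uuid", "length", "difficulty",
]
REAL_FEATURES = [
    "structure", "image", "url", "instance_name", "date_scrapped", "uuid",
    "category", "additional_info", "assets", "difficulty",
]

# (num_bytes, download_size, num_examples) per configuration, from the metadata.
SIZES = {
    "css": (17275376, 17168881, 10),
    "html": (3714540, 3421629, 10),
    "javascript": (3408339, 3272228, 10),
    "real": (3700230, 3662408, 10),
}


def schema_diff(a, b):
    """Return (names only in a, names only in b), each sorted."""
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))


def mean_bytes_per_example(config):
    """Average uncompressed size of one example in the given configuration."""
    num_bytes, _download_size, num_examples = SIZES[config]
    return num_bytes / num_examples


only_shared, only_real = schema_diff(SHARED_FEATURES, REAL_FEATURES)
print("only in css/html/javascript:", only_shared)
print("only in real:", only_real)
for name in SIZES:
    print(name, round(mean_bytes_per_example(name)), "bytes per example")
```

With these figures, the css configuration averages roughly 1.7 MB per example, while the other three average roughly 0.34 to 0.37 MB.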