gbenson/webui-dom-snapshots

Name: gbenson/webui-dom-snapshots
Creator: gbenson
Published: 2024-06-09 07:36:33
License: No description provided (the dataset card below declares CC0 1.0)

Hugging Face · Updated 2024-06-09 · Indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/gbenson/webui-dom-snapshots
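The mirror URL above can also be used programmatically. This is a minimal sketch. It assumes the Hugging Face `datasets` library is installed and that the mirror is selected through the `HF_ENDPOINT` environment variable, which `huggingface_hub` reads when it is first imported. Streaming avoids fetching all ~2 GB of Parquet files up front. `load_train_stream` is a hypothetical helper name, not part of any library.

```python
import os

# Assumption: select the mirror via HF_ENDPOINT. It must be set before
# huggingface_hub/datasets is first imported, since the endpoint is read at import time.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

REPO_ID = "gbenson/webui-dom-snapshots"


def load_train_stream():
    """Hypothetical helper: iterate over the train split without downloading it all first."""
    from datasets import load_dataset  # imported lazily, after HF_ENDPOINT is set

    return load_dataset(REPO_ID, split="train", streaming=True)
```

Calling `next(iter(load_train_stream()))` should then yield a single row as a dict keyed by the feature names in the card metadata.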

Resource summary:

---
license: cc0-1.0
size_categories:
- 1K<n<10K
source_datasets:
- biglab/webui-7k
- original
multilinguality:
- multilingual
task_categories:
- image-feature-extraction
- reinforcement-learning
- text-classification
pretty_name: WebUI DOM snapshots
dataset_info:
  features:
  - name: image
    dtype: image
  - name: requested_url
    dtype: string
  - name: displayed_url
    dtype: string
  - name: num_frames
    dtype: int64
  - name: body_elements
    sequence: string
  - name: dom_snapshot
    struct:
    - name: documents
      list:
      - name: documentURL
        dtype: int64
      - name: title
        dtype: int64
      - name: baseURL
        dtype: int64
      - name: contentLanguage
        dtype: int64
      - name: encodingName
        dtype: int64
      - name: publicId
        dtype: int64
      - name: systemId
        dtype: int64
      - name: frameId
        dtype: int64
      - name: nodes
        struct:
        - name: parentIndex
          sequence: int64
        - name: nodeType
          sequence: int64
        - name: shadowRootType
          struct:
          - name: index
            sequence: int64
          - name: value
            sequence: int64
        - name: nodeName
          sequence: int64
        - name: nodeValue
          sequence: int64
        - name: backendNodeId
          sequence: int64
        - name: attributes
          sequence:
            sequence: int64
        - name: textValue
          struct:
          - name: index
            sequence: int64
          - name: value
            sequence: int64
        - name: inputValue
          struct:
          - name: index
            sequence: int64
          - name: value
            sequence: int64
        - name: inputChecked
          struct:
          - name: index
            sequence: int64
        - name: optionSelected
          struct:
          - name: index
            sequence: int64
        - name: contentDocumentIndex
          struct:
          - name: index
            sequence: int64
          - name: value
            sequence: int64
        - name: pseudoType
          struct:
          - name: index
            sequence: int64
          - name: value
            sequence: int64
        - name: pseudoIdentifier
          struct:
          - name: index
            sequence: 'null'
          - name: value
            sequence: 'null'
        - name: isClickable
          struct:
          - name: index
            sequence: int64
        - name: currentSourceURL
          struct:
          - name: index
            sequence: int64
          - name: value
            sequence: int64
        - name: originURL
          struct:
          - name: index
            sequence: 'null'
          - name: value
            sequence: 'null'
      - name: layout
        struct:
        - name: nodeIndex
          sequence: int64
        - name: styles
          sequence:
            sequence: int64
        - name: bounds
          sequence:
            sequence: float64
        - name: text
          sequence: int64
        - name: stackingContexts
          struct:
          - name: index
            sequence: int64
        - name: paintOrders
          sequence: int64
      - name: textBoxes
        struct:
        - name: layoutIndex
          sequence: int64
        - name: bounds
          sequence:
            sequence: float64
        - name: start
          sequence: int64
        - name: length
          sequence: int64
      - name: scrollOffsetX
        dtype: int64
      - name: scrollOffsetY
        dtype: int64
      - name: contentWidth
        dtype: int64
      - name: contentHeight
        dtype: int64
    - name: strings
      sequence: string
    - name: capture_options
      struct:
      - name: computedStyles
        sequence: string
      - name: includePaintOrder
        dtype: bool
  - name: source_index
    dtype: int64
  - name: source_key_name
    dtype: string
  - name: source_image_ssim
    dtype: float64
  - name: detected_language
    dtype: string
  splits:
  - name: train
    num_bytes: 2707342861
    num_examples: 4536
  download_size: 1972567064
  dataset_size: 2707342861
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
language:
- en
- nl
- fr
- zh
- ja
- de
- id
- cs
- ru
- pt
- fi
- sv
- 'no'
- pl
- da
- sl
- hu
- vi
- is
- ko
- th
- tr
- ar
- bg
- el
- uk
- es
- et
- gd
- ne
- sk
- af
- bn
- gl
- hi
- it
- lt
- lv
- ml
- sr
- to
---

# Dataset Card for WebUI DOM snapshots

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).
## Dataset Details

### Dataset Description

- **Curated by:** [Gary Benson](https://gbenson.net/)
- **Languages:** Mostly English (87%); Dutch, French, Chinese, Japanese (1-2% each); 30+ others (<1% each)
- **License:** [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/)

### Dataset Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

87% of the examples are English.

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]
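The card's language percentages can be sanity-checked against the `detected_language` column. This is a minimal sketch. It assumes that column holds one language code per example. The sample values below are invented for illustration and are not drawn from the dataset.

```python
from collections import Counter


def language_shares(codes):
    """Return {language_code: fraction_of_examples}, most common first."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.most_common()}


# Hypothetical detected_language values (made up, not real data).
sample = ["en"] * 87 + ["nl"] * 2 + ["fr"] * 2 + ["zh"] * 2 + ["ja"] * 1 + ["de"] * 6
shares = language_shares(sample)
print(f"en: {shares['en']:.0%}")
```

On a loaded (non-streaming) train split, called `ds` here purely for illustration, `language_shares(ds["detected_language"])` would give the real breakdown.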

Provider:

gbenson

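Summary of original information

Dataset overview

Dataset description

Dataset name: WebUI DOM snapshots
Dataset size: 1K<n<10K
Multilinguality: multilingual
Task categories:
- Image feature extraction
- Reinforcement learning
- Text classification
License: CC0 1.0 Universal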

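Dataset structure

Features

image: image data
requested_url: the URL that was requested
displayed_url: the URL that was displayed
num_frames: number of frames
body_elements: sequence of body elements
dom_snapshot: DOM snapshot struct (field names match the Chrome DevTools Protocol `DOMSnapshot.captureSnapshot` result; string-valued fields such as documentURL, title and nodeName are stored as int64 indices into `strings`)
- documents: list of documents
  - documentURL: document URL
  - title: title
  - baseURL: base URL
  - contentLanguage: content language
  - encodingName: encoding name
  - publicId: public ID
  - systemId: system ID
  - frameId: frame ID
  - nodes: node struct
    - parentIndex: parent node index sequence
    - nodeType: node type sequence
    - shadowRootType: shadow root type struct
      - index: index sequence
      - value: value sequence
    - nodeName: node name sequence
    - nodeValue: node value sequence
    - backendNodeId: backend node ID sequence
    - attributes: attribute sequence
    - textValue: text value struct
      - index: index sequence
      - value: value sequence
    - inputValue: input value struct
      - index: index sequence
      - value: value sequence
    - inputChecked: input-checked struct
      - index: index sequence
    - optionSelected: option-selected struct
      - index: index sequence
    - contentDocumentIndex: content document index struct
      - index: index sequence
      - value: value sequence
    - pseudoType: pseudo-element type struct
      - index: index sequence
      - value: value sequence
    - pseudoIdentifier: pseudo-element identifier struct
      - index: index sequence
      - value: value sequence
    - isClickable: clickable struct
      - index: index sequence
    - currentSourceURL: current source URL struct
      - index: index sequence
      - value: value sequence
    - originURL: origin URL struct
      - index: index sequence
      - value: value sequence
  - layout: layout struct
    - nodeIndex: node index sequence
    - styles: style sequence
    - bounds: bounds sequence
    - text: text sequence
    - stackingContexts: stacking context struct
      - index: index sequence
    - paintOrders: paint order sequence
  - textBoxes: text box struct
    - layoutIndex: layout index sequence
    - bounds: bounds sequence
    - start: start offset sequence
    - length: length sequence
  - scrollOffsetX: horizontal scroll offset
  - scrollOffsetY: vertical scroll offset
  - contentWidth: content width
  - contentHeight: content height
- strings: string sequence
- capture_options: capture options struct
  - computedStyles: computed style sequence
  - includePaintOrder: whether paint order is included
source_index: source index
source_key_name: source key name
source_image_ssim: similarity (SSIM) to the source image
detected_language: detected language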
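Because the field names above match the Chrome DevTools Protocol `DOMSnapshot.captureSnapshot` result, the following sketch assumes CDP's encoding conventions apply. Under that assumption, string-valued fields are int64 indices into `strings`, with -1 meaning no value. Each `index`/`value` struct lists only the nodes that have that property. `attributes` holds flattened name/value index pairs, and layout `bounds` are `[x, y, width, height]`. The helper names and the toy snapshot are hypothetical.

```python
# Assumes CDP DOMSnapshot encoding conventions; helper names are hypothetical.


def lookup(strings, idx):
    """Resolve a string-table index; negative or missing indices mean 'no value'."""
    return strings[idx] if idx is not None and idx >= 0 else None


def rare_strings(rare, strings):
    """Expand an {index, value} struct into {node_index: string}."""
    if not rare:
        return {}
    return {n: lookup(strings, v) for n, v in zip(rare["index"], rare["value"])}


def node_attributes(nodes, strings, i):
    """Decode node i's flattened [name, value, name, value, ...] index list."""
    flat = nodes["attributes"][i]
    return {lookup(strings, k): lookup(strings, v) for k, v in zip(flat[0::2], flat[1::2])}


def describe_document(snapshot, doc_index=0):
    """Yield (node_index, parent_index, node_name, bounds_or_None, clickable) per node."""
    strings = snapshot["strings"]
    doc = snapshot["documents"][doc_index]
    nodes, layout = doc["nodes"], doc["layout"]
    # Node index -> [x, y, width, height]; keeps the last box if a node has several.
    boxes = dict(zip(layout["nodeIndex"], layout["bounds"]))
    clickable = set((nodes.get("isClickable") or {}).get("index") or [])
    for i, (parent, name_idx) in enumerate(zip(nodes["parentIndex"], nodes["nodeName"])):
        yield i, parent, lookup(strings, name_idx), boxes.get(i), i in clickable


# Hypothetical toy snapshot in the assumed shape (not taken from the dataset).
toy = {
    "strings": ["https://example.com/", "Example", "#document", "HTML", "BODY", "A", "href", "/about"],
    "documents": [{
        "documentURL": 0,
        "title": 1,
        "nodes": {
            "parentIndex": [-1, 0, 1, 2],
            "nodeName": [2, 3, 4, 5],
            "attributes": [[], [], [], [6, 7]],
            "textValue": {"index": [], "value": []},
            "isClickable": {"index": [3]},
        },
        "layout": {
            "nodeIndex": [1, 2, 3],
            "bounds": [[0.0, 0.0, 1280.0, 720.0], [8.0, 8.0, 1264.0, 704.0], [8.0, 8.0, 42.0, 18.0]],
        },
    }],
}

for row in describe_document(toy):
    print(row)
```

For a real row, `describe_document(row["dom_snapshot"])` should work if `datasets` decodes the `documents` list as a list of dicts and each struct of sequences as a dict of lists, which is how those feature types are normally returned.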

Splits

train: training split
- num_bytes: 2707342861
- num_examples: 4536
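A quick back-of-the-envelope check on the split size uses `num_bytes` and `num_examples` above plus `download_size` from the card metadata. This is plain arithmetic and says nothing about how the bytes divide between images and snapshots.

```python
NUM_BYTES = 2_707_342_861      # train split, uncompressed (num_bytes)
NUM_EXAMPLES = 4_536           # train split (num_examples)
DOWNLOAD_SIZE = 1_972_567_064  # download_size from the card metadata

avg_bytes = NUM_BYTES / NUM_EXAMPLES
ratio = DOWNLOAD_SIZE / NUM_BYTES
print(f"~{avg_bytes / 2**20:.2f} MiB per example; download is {ratio:.0%} of dataset size")
```

This works out to roughly 0.57 MiB per example, with the download about 73% of the uncompressed dataset size.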

Configs

default: default configuration
- data_files:
  - split: train
  - path: data/train-*
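The `data/train-*` entry is a glob pattern that the Hub resolves against the repository's files. The sketch below approximates that matching with Python's standard `fnmatch`. The shard file names are hypothetical, since this page does not list the actual files. Note that `fnmatch`'s `*` also matches `/`, which a stricter glob implementation may not.

```python
from fnmatch import fnmatch

PATTERN = "data/train-*"  # data_files path for the train split (default config)

# Hypothetical repository paths; the real shard names and count are not given here.
candidates = [
    "data/train-00000-of-00006.parquet",
    "data/train-00005-of-00006.parquet",
    "README.md",
]
train_files = [path for path in candidates if fnmatch(path, PATTERN)]
print(train_files)
```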

Languages

en, nl, fr, zh, ja, de, id, cs, ru, pt, fi, sv, no, pl, da, sl, hu, vi, is, ko, th, tr, ar, bg, el, uk, es, et, gd, ne, sk, af, bn, gl, hi, it, lt, lv, ml, sr, to
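The entries above are ISO 639-1 two-letter codes. For example, `no` is Norwegian, `gd` is Scottish Gaelic and `to` is Tongan. In the YAML metadata, `'no'` is quoted because YAML 1.1 parsers read an unquoted `no` as a boolean. The following is a minimal filtering sketch. It assumes rows carry these codes in `detected_language`. The rows and the `select_languages` helper are hypothetical.

```python
def select_languages(rows, wanted):
    """Hypothetical helper: keep rows whose detected_language is in `wanted`."""
    wanted = set(wanted)
    return [row for row in rows if row["detected_language"] in wanted]


# Made-up rows in the assumed shape (not real dataset rows).
rows = [
    {"requested_url": "https://example.com/en", "detected_language": "en"},
    {"requested_url": "https://example.com/nl", "detected_language": "nl"},
    {"requested_url": "https://example.com/ja", "detected_language": "ja"},
]
cjk = select_languages(rows, {"zh", "ja", "ko"})
print([row["requested_url"] for row in cjk])
```

With `datasets`, a roughly equivalent call on a loaded split would be `ds.filter(lambda row: row["detected_language"] in wanted)`.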
