johnatanebonilla/yt_haber_parsed

Name: johnatanebonilla/yt_haber_parsed
Creator: johnatanebonilla
Published: 2024-06-28 02:56:21
License: not provided

Source: Hugging Face
Updated: 2024-06-28
Indexed: 2024-06-29

Download link:

https://hf-mirror.com/datasets/johnatanebonilla/yt_haber_parsed

Summary:

The dataset consists mainly of text fields. Before, Haber, After, and Match appear to hold a matched occurrence of haber (likely the Spanish verb) together with the text that precedes and follows it, while Exact_Timestamp and Video ID likely locate each match within a source video. The dataset also provides linguistic annotations for the haber token (part of speech, dependency relation, and morphology) and for its child and head nodes (text, lemma, part of speech, dependency relation, and morphology). It has a single train split of 215,512 examples totaling 99,233,414 bytes.
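
If Before and After are indeed the left and right context of each match, a row can be displayed as a keyword-in-context (KWIC) concordance line, and Video ID plus Exact_Timestamp can be turned into a link that opens the video at the match. The sketch below is a minimal illustration under two assumptions that the card does not confirm: that Before/After are plain context strings, and that Exact_Timestamp is measured in whole seconds. The HaberRow container, the helper names, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HaberRow:
    # Hypothetical container for one row; field names mirror the card's columns.
    before: str
    haber: str
    after: str
    exact_timestamp: int  # ASSUMPTION: seconds from the start of the video
    video_id: str


def concordance_line(row: HaberRow, width: int = 30) -> str:
    """Keyword-in-context (KWIC) line, assuming Before/After are the left and
    right context of the matched haber form."""
    left = row.before[-width:].rjust(width)   # keep the last `width` chars, right-aligned
    right = row.after[:width].ljust(width)    # keep the first `width` chars, left-aligned
    return f"{left} [{row.haber}] {right}"


def youtube_link(row: HaberRow) -> str:
    """YouTube watch URL starting at the match; only valid if Exact_Timestamp
    is in whole seconds (an assumption, not stated on the card)."""
    return f"https://www.youtube.com/watch?v={row.video_id}&t={row.exact_timestamp}s"


if __name__ == "__main__":
    # Invented example values, for illustration only.
    row = HaberRow("en mi pueblo", "había", "mucha gente", 125, "VIDEO_ID_00")
    print(concordance_line(row, width=12))
    print(youtube_link(row))
```

Fixed-width padding keeps the matched form in the same column across rows, which is the usual reason to render concordances this way.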

Provided by:

johnatanebonilla

Summary of Original Information

Dataset Overview

Features

Before: string
Haber: string
After: string
Match: string
Exact_Timestamp: int64 (64-bit integer)
Video ID: string
haber.pos: string
haber.dep: string
haber.morph: string
haber.child.text: string
haber.child.lemma: string
haber.child.pos: string
haber.child.dep: string
haber.child.morph: string
haber.head.text: string
haber.head.lemma: string
haber.head.pos: string
haber.head.dep: string
haber.head.morph: string
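
The morph columns are stored as single strings. The attribute names (pos, dep, morph, lemma, child, head) resemble spaCy token attributes, and spaCy's str(token.morph) yields Universal Dependencies FEATS strings such as "Number=Plur|Person=3". Assuming the morph columns use that "Key=Value|Key=Value" format (not confirmed by the card), they can be parsed into dictionaries for filtering, for example to select rows where haber is marked plural. The function names below are hypothetical helpers.

```python
def parse_morph(morph: str) -> dict[str, str]:
    """Parse a Universal Dependencies FEATS string into a dict.

    ASSUMPTION: the *.morph columns use the 'Key=Value|Key=Value' form that
    spaCy produces with str(token.morph); empty or '_' means no features.
    """
    if not morph or morph == "_":
        return {}
    feats: dict[str, str] = {}
    for pair in morph.split("|"):
        key, sep, value = pair.partition("=")
        if sep:  # ignore pieces without '=' rather than raising
            feats[key] = value
    return feats


def has_feature(morph: str, key: str, value: str) -> bool:
    """True if the parsed features contain key=value (e.g. Number=Plur)."""
    return parse_morph(morph).get(key) == value


if __name__ == "__main__":
    # Invented morph string of the kind spaCy assigns to a finite verb.
    example = "Mood=Ind|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin"
    print(parse_morph(example))
    print(has_feature(example, "Number", "Plur"))  # → True
```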

Splits

train: 215,512 examples, 99,233,414 bytes

Dataset Size

Download size: 34,353,843 bytes
Total dataset size: 99,233,414 bytes
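
Since train is the only split, its byte count equals the total dataset size, so the figures above support a quick back-of-the-envelope check: roughly 460 bytes per example, and a download about a third the size of the dataset. The smaller download size plausibly reflects compressed storage of the data files; that is an inference, not something this page states. The constant and function names are illustrative.

```python
# Figures copied from the card above.
TRAIN_EXAMPLES = 215_512
DATASET_BYTES = 99_233_414   # total dataset size
DOWNLOAD_BYTES = 34_353_843  # download size


def size_stats() -> tuple[float, float]:
    """Return (average bytes per train example, download size / dataset size)."""
    avg_bytes = DATASET_BYTES / TRAIN_EXAMPLES
    ratio = DOWNLOAD_BYTES / DATASET_BYTES
    return avg_bytes, ratio


if __name__ == "__main__":
    avg_bytes, ratio = size_stats()
    print(f"{avg_bytes:.1f} bytes per example")      # → 460.5 bytes per example
    print(f"download is {ratio:.1%} of total size")  # → download is 34.6% of total size
```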

Configuration

config_name: default
data_files:
  - split: train
    path: data/train-*
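
The default configuration maps the train split to the glob data/train-*. In practice the Hugging Face datasets library resolves this pattern itself when you call datasets.load_dataset("johnatanebonilla/yt_haber_parsed", split="train"). The stdlib-only sketch below just illustrates which repository paths such a pattern selects, assuming ordinary glob semantics where "*" does not cross a "/" (recursive "**" patterns are deliberately not handled). The file listing and helper names are hypothetical; this page does not list the actual shard names.

```python
from fnmatch import fnmatchcase


def matches_pattern(path: str, pattern: str) -> bool:
    """Glob-style match where '*' never crosses '/', compared segment by segment.
    Simplification: '**' (recursive) patterns are not supported."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


def files_for_split(repo_files: list[str], pattern: str = "data/train-*") -> list[str]:
    """Return the repository files that the split's data_files pattern selects."""
    return sorted(f for f in repo_files if matches_pattern(f, pattern))


if __name__ == "__main__":
    # Hypothetical repository listing; the real shard names are not given on this page.
    listing = [
        "README.md",
        "data/train-00000-of-00001.parquet",
        "data/test-00000-of-00001.parquet",
    ]
    print(files_for_split(listing))  # → ['data/train-00000-of-00001.parquet']
```

Matching segment by segment keeps a file such as other/data/train-1.parquet from being picked up by accident, which a plain fnmatch on the whole path would not guarantee.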
