johnatanebonilla/yt_haber_parsed
Hugging Face | Updated 2024-06-28 | Indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/johnatanebonilla/yt_haber_parsed
Description:
This dataset consists mainly of text fields. Judging by their names, Before, Haber, After, and Match hold a matched occurrence of Haber together with the text before and after it, although the source does not document their exact meaning. Exact_Timestamp and Video ID likely record a timestamp and the identifier of the source video. The remaining fields carry linguistic annotations for Haber (part of speech, dependency relation, morphology) and the same kind of detail for its child and head nodes. The dataset has a single train split with 215,512 examples and a total size of 99,233,414 bytes.
Provider:
johnatanebonilla
Summary of original information
Dataset overview
Features
- Before: string
- Haber: string
- After: string
- Match: string
- Exact_Timestamp: int64
- Video ID: string
- haber.pos: string
- haber.dep: string
- haber.morph: string
- haber.child.text: string
- haber.child.lemma: string
- haber.child.pos: string
- haber.child.dep: string
- haber.child.morph: string
- haber.head.text: string
- haber.head.lemma: string
- haber.head.pos: string
- haber.head.dep: string
- haber.head.morph: string
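The feature list can be written down as a small schema and used to sanity-check rows before further processing. The following is a minimal sketch, not part of the dataset itself: the names `SCHEMA`, `schema_errors`, and `placeholder` are hypothetical, the placeholder row contains invented values, and treating `None` as acceptable is an assumption, since the source does not say whether fields can be null.

```python
from typing import Any

# Column names and dtypes copied from the feature list above.
STRING_COLUMNS = [
    "Before", "Haber", "After", "Match", "Video ID",
    "haber.pos", "haber.dep", "haber.morph",
    "haber.child.text", "haber.child.lemma", "haber.child.pos",
    "haber.child.dep", "haber.child.morph",
    "haber.head.text", "haber.head.lemma", "haber.head.pos",
    "haber.head.dep", "haber.head.morph",
]
SCHEMA: dict[str, type] = {name: str for name in STRING_COLUMNS}
SCHEMA["Exact_Timestamp"] = int  # listed as int64 in the feature list


def schema_errors(row: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the row fits."""
    errors = []
    for name, expected in SCHEMA.items():
        if name not in row:
            errors.append(f"missing column: {name}")
        # Assumption: None (a null value) is tolerated for any column.
        elif row[name] is not None and not isinstance(row[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    for name in row:
        if name not in SCHEMA:
            errors.append(f"unexpected column: {name}")
    return errors


# Placeholder row with invented values, NOT taken from the dataset.
placeholder: dict[str, Any] = {name: "" for name in STRING_COLUMNS}
placeholder["Exact_Timestamp"] = 0
print(schema_errors(placeholder))
```

A check like this is mostly useful after exporting rows to another format (CSV, JSON), where the integer timestamp can silently become a string.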
Splits
- train: 215,512 examples, 99,233,414 bytes
Dataset size
- Download size: 34,353,843 bytes
- Total dataset size: 99,233,414 bytes
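The size figures above allow two quick derived numbers: the average size of one example and the ratio between the dataset size and the download size. The sketch below only does that arithmetic. The interpretation is an assumption: on Hugging Face the download size usually refers to the stored (typically compressed) files and the dataset size to the loaded data, but this page does not confirm that.

```python
# Figures copied from the split and size lists above.
NUM_EXAMPLES = 215_512
DATASET_BYTES = 99_233_414
DOWNLOAD_BYTES = 34_353_843

# Average size of a single example in the train split.
avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES

# How many times larger the dataset is than the download.
size_ratio = DATASET_BYTES / DOWNLOAD_BYTES

print(f"average bytes per example: {avg_bytes_per_example:.1f}")
print(f"dataset size / download size: {size_ratio:.2f}")
```

In round terms, each example takes about 460 bytes and the dataset is a little under three times the size of the download.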
Configuration
- config_name: default
  data_files:
  - split: train
    path: data/train-*
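The configuration maps the train split to every file matching `data/train-*`. The sketch below shows one way this could be used, under stated assumptions: `is_train_shard` and `load_train` are hypothetical helper names, the shard file name in the example is invented for illustration, and `fnmatch` only approximates how the Hub resolves the pattern. `load_train` relies on the third-party `datasets` package and network access, so it is defined but not called here.

```python
from fnmatch import fnmatch

REPO_ID = "johnatanebonilla/yt_haber_parsed"
TRAIN_PATTERN = "data/train-*"  # from the configuration above


def is_train_shard(path: str) -> bool:
    """Approximate check that a repository path belongs to the train split."""
    return fnmatch(path, TRAIN_PATTERN)


def load_train():
    """Load the train split of the default config from the Hugging Face Hub.

    Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily: third-party package

    return load_dataset(REPO_ID, name="default", split="train")


# Hypothetical shard names, used only to exercise the pattern.
print(is_train_shard("data/train-00000-of-00001.parquet"))
print(is_train_shard("data/test-00000-of-00001.parquet"))
```

Calling `load_train()` should return the 215,512 training examples as a single `Dataset` object; individual columns can then be read by the feature names listed earlier, for example `ds["haber.pos"]`.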



