sunhaozhepy/sst_roberta_keywords_embeddings
Source: Hugging Face. Updated 2024-01-23. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/sunhaozhepy/sst_roberta_keywords_embeddings
Resource description:
---
dataset_info:
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype: float32
  - name: tokens
    dtype: string
  - name: tree
    dtype: string
  - name: keywords
    dtype: string
  - name: keywords_embeddings
    sequence: float32
  splits:
  - name: train
    num_bytes: 29316285
    num_examples: 8544
  - name: validation
    num_bytes: 3780849
    num_examples: 1101
  - name: test
    num_bytes: 7584117
    num_examples: 2210
  download_size: 46990512
  dataset_size: 40681251
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
The dataset has six features: sentence (string), label (float32), tokens (string), tree (string), keywords (string), and keywords_embeddings (a sequence of float32). It is split into a training set (8544 examples), a validation set (1101 examples), and a test set (2210 examples). The download size is 46990512 bytes and the dataset size is 40681251 bytes.
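A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the identifier shown in the page title. The helper name `load_and_check` and the `EXPECTED_ROWS` table are illustrative and not part of the dataset card; the counts are copied from the metadata above.

```python
# Repository identifier taken from the page title.
REPO_ID = "sunhaozhepy/sst_roberta_keywords_embeddings"

# Example counts per split, copied from the dataset card metadata.
EXPECTED_ROWS = {"train": 8544, "validation": 1101, "test": 2210}


def load_and_check(repo_id: str = REPO_ID):
    """Download the dataset and compare split sizes with the card.

    Hypothetical helper for illustration; it needs network access.
    """
    # Imported lazily so the constants above stay usable without the library.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)  # a DatasetDict keyed by split name
    for split, expected in EXPECTED_ROWS.items():
        actual = dataset[split].num_rows
        if actual != expected:
            raise ValueError(f"{split}: expected {expected} rows, found {actual}")
    return dataset
```

After a successful call, indexing a split (for example `dataset["train"][0]`) should return a dictionary keyed by the feature names listed above. The string formats of `tokens`, `tree`, and `keywords` are not documented on this page, so inspect a record before parsing them.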
Provider:
sunhaozhepy
Summary of original information
Dataset overview
Features
- sentence: string.
- label: float32.
- tokens: string.
- tree: string.
- keywords: string.
- keywords_embeddings: sequence of float32.
Data splits
- train: 8544 examples, 29316285 bytes.
- validation: 1101 examples, 3780849 bytes.
- test: 2210 examples, 7584117 bytes.
Dataset size
- Download size: 46990512 bytes.
- Dataset size: 40681251 bytes.
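The split and size figures can be cross-checked with a little arithmetic. The sketch below uses only numbers copied from the card; the function name `split_shares` is illustrative.

```python
# Figures copied from the dataset card: (num_examples, num_bytes) per split.
SPLITS = {
    "train": (8544, 29316285),
    "validation": (1101, 3780849),
    "test": (2210, 7584117),
}
DATASET_SIZE = 40681251   # dataset_size from the card
DOWNLOAD_SIZE = 46990512  # download_size from the card

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# The per-split byte counts add up exactly to dataset_size.
assert total_bytes == DATASET_SIZE


def split_shares(splits):
    """Return each split's fraction of the total example count."""
    total = sum(n for n, _ in splits.values())
    return {name: n / total for name, (n, _) in splits.items()}


for name, share in split_shares(SPLITS).items():
    print(f"{name}: {share:.1%} of {total_examples} examples")
```

The three splits hold 11855 examples in total, divided roughly 72% / 9% / 19% between train, validation, and test.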
Configuration
- config_name: default
- Data file paths:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
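The data file paths are glob patterns, so each split can consist of one or more files sharing a prefix. As a rough illustration, the sketch below maps file names to splits with the standard library's `fnmatchcase`. This approximates the matching and is not necessarily how the `datasets` library resolves patterns; the shard names in the example are hypothetical.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the card's default config.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path):
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical shard names, for illustration only.
for candidate in (
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
):
    print(candidate, "->", split_for(candidate))
```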



