VanoInvestigations/BOE
Source: Hugging Face. Updated 2023-10-31; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/VanoInvestigations/BOE
Resource description:
---
dataset_info:
  features:
  - name: boe_date_publication
    dtype: string
  - name: boe_previous
    dtype: string
  - name: boe_id
    dtype: string
  - name: boe_title
    dtype: string
  - name: boe_soup_xml
    dtype: string
  - name: tweet_date
    dtype: string
  - name: boe_text_cleaned
    dtype: string
  - name: tweet_original
    dtype: string
  - name: boe_alert
    sequence: string
  - name: boe_category
    dtype: string
  - name: boe_departament
    dtype: string
  - name: tweet_text_cleaned
    dtype: string
  - name: boe_subsequent
    dtype: string
  - name: boe_materials
    sequence: string
  - name: id
    dtype: int64
  splits:
  - name: train
    num_bytes: 179564833
    num_examples: 2867
  - name: validation
    num_bytes: 19448449
    num_examples: 392
  - name: test
    num_bytes: 22514673
    num_examples: 389
  download_size: 84281867
  dataset_size: 221527955
---
# Dataset Card for "BOE"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by:
VanoInvestigations
Summary of original information
Dataset overview
Features
The dataset contains the following features:
- boe_date_publication: string
- boe_previous: string
- boe_id: string
- boe_title: string
- boe_soup_xml: string
- tweet_date: string
- boe_text_cleaned: string
- tweet_original: string
- boe_alert: sequence of strings
- boe_category: string
- boe_departament: string
- tweet_text_cleaned: string
- boe_subsequent: string
- boe_materials: sequence of strings
- id: int64
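The feature list above can be written down as a small schema check. This is a minimal sketch, not part of the dataset card: the names `BOE_SCHEMA`, `SEQUENCE_FIELDS`, and `schema_errors` are hypothetical, and it assumes that a string sequence arrives as a Python list of `str` and that no field is null.

```python
from typing import Any

# Field names and types copied from the dataset card's feature list.
# Assumption: "sequence: string" is represented as a Python list of str.
BOE_SCHEMA: dict[str, type] = {
    "boe_date_publication": str,
    "boe_previous": str,
    "boe_id": str,
    "boe_title": str,
    "boe_soup_xml": str,
    "tweet_date": str,
    "boe_text_cleaned": str,
    "tweet_original": str,
    "boe_alert": list,
    "boe_category": str,
    "boe_departament": str,  # spelling as in the dataset card
    "tweet_text_cleaned": str,
    "boe_subsequent": str,
    "boe_materials": list,
    "id": int,
}

# Fields declared as string sequences in the card.
SEQUENCE_FIELDS = {"boe_alert", "boe_materials"}


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable schema problems (empty if none)."""
    errors: list[str] = []
    for name, expected in BOE_SCHEMA.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        elif name in SEQUENCE_FIELDS and not all(isinstance(v, str) for v in value):
            errors.append(f"{name}: expected every element to be str")
    # Report fields that the card does not list.
    for name in sorted(record.keys() - BOE_SCHEMA.keys()):
        errors.append(f"unexpected field: {name}")
    return errors
```

Calling `schema_errors(row)` on each row after loading would flag rows whose shape differs from the card; if the real data contains nulls, the non-null assumption would need relaxing.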
Data splits
The dataset is divided into the following splits:
- train: 2867 examples, 179564833 bytes
- validation: 392 examples, 19448449 bytes
- test: 389 examples, 22514673 bytes
Dataset size
- Download size: 84281867 bytes
- Total dataset size: 221527955 bytes
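The split and size figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers from the card; the variable names are illustrative, and reading the ratio of download size to dataset size as a compression ratio is an assumption.

```python
# Split statistics copied from the dataset card.
SPLITS = {
    "train": {"num_examples": 2867, "num_bytes": 179564833},
    "validation": {"num_examples": 392, "num_bytes": 19448449},
    "test": {"num_examples": 389, "num_bytes": 22514673},
}
DOWNLOAD_SIZE = 84281867
DATASET_SIZE = 221527955

# Totals across all splits.
total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# Share of examples in each split.
split_fractions = {
    name: s["num_examples"] / total_examples for name, s in SPLITS.items()
}

# Assumption: the download is a compressed form of the full dataset,
# so this ratio approximates the compression achieved.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"total examples: {total_examples}")
print(f"total bytes:    {total_bytes} (card reports {DATASET_SIZE})")
for name, frac in split_fractions.items():
    print(f"{name:>10}: {frac:.1%}")
print(f"download / dataset size: {download_ratio:.1%}")
```

The per-split byte counts sum exactly to the reported dataset size, the 3648 examples divide roughly 79% / 11% / 11% across train, validation, and test, and the download is about 38% of the full dataset size.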



