cu-kairos/ontonotes_mentions_sample
Source: Hugging Face. Updated: 2024-05-30. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/cu-kairos/ontonotes_mentions_sample
Resource overview:
---
dataset_info:
  features:
  - name: mention_id
    dtype: string
  - name: split
    dtype: string
  - name: men_type
    dtype: string
  - name: doc_id
    dtype: string
  - name: sentence_id
    dtype: int64
  - name: sentence
    dtype: string
  - name: start_char
    dtype: int64
  - name: end_char
    dtype: int64
  - name: mention_text
    dtype: string
  - name: gold_cluster
    dtype: string
  - name: lemma
    dtype: string
  - name: sentence_tokens
    dtype: string
  - name: marked_sentence
    dtype: string
  - name: marked_doc
    dtype: string
  - name: srl_response
    dtype: string
  splits:
  - name: train
    num_bytes: 7677
    num_examples: 10
  - name: validation
    num_bytes: 4594
    num_examples: 5
  download_size: 26564
  dataset_size: 12271
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---
The dataset defines 15 features, such as mention_id, split, and men_type. Three of them (sentence_id, start_char, and end_char) are int64, and the remaining 12 are strings. It has a training split of 10 examples (7677 bytes) and a validation split of 5 examples (4594 bytes). The download size is 26564 bytes and the dataset size is 12271 bytes, which is the sum of the two splits. The single configuration, default, reads the training files from data/train-* and the validation files from data/validation-*.
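The splits described above could be loaded with the Hugging Face `datasets` library. The following is a minimal sketch, assuming that `datasets` is installed and that the repository is still reachable under the ID shown in the page title. The helper names `load_sample` and `counts_match` are illustrative and do not come from the dataset card.

```python
# Row counts stated in the dataset card.
EXPECTED_ROWS = {"train": 10, "validation": 5}


def load_sample(repo_id="cu-kairos/ontonotes_mentions_sample"):
    """Download the default configuration and return a DatasetDict.

    Needs network access and the `datasets` package. The import is kept
    local so the rest of this sketch runs without either.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def counts_match(row_counts, expected=EXPECTED_ROWS):
    """Return True if every split has exactly the row count the card states."""
    return dict(row_counts) == dict(expected)


# Usage (needs network access):
#   ds = load_sample()
#   counts_match({name: split.num_rows for name, split in ds.items()})
```

Comparing row counts after loading is a cheap way to notice if the hosted files have changed since this page was indexed.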
Provided by:
cu-kairos
Summary of original information
Dataset overview
Dataset features
- mention_id: string
- split: string
- men_type: string
- doc_id: string
- sentence_id: integer
- sentence: string
- start_char: integer
- end_char: integer
- mention_text: string
- gold_cluster: string
- lemma: string
- sentence_tokens: string
- marked_sentence: string
- marked_doc: string
- srl_response: string
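The feature list can be turned into a small schema check. The sketch below uses only the field names and dtypes from the card. The mapping of `string` to `str` and `int64` to `int`, the helper names, and the placeholder row (empty strings and zeros, not real data) are assumptions made for illustration.

```python
# Feature names and dtypes copied from the dataset card.
FEATURES = {
    "mention_id": "string",
    "split": "string",
    "men_type": "string",
    "doc_id": "string",
    "sentence_id": "int64",
    "sentence": "string",
    "start_char": "int64",
    "end_char": "int64",
    "mention_text": "string",
    "gold_cluster": "string",
    "lemma": "string",
    "sentence_tokens": "string",
    "marked_sentence": "string",
    "marked_doc": "string",
    "srl_response": "string",
}

# Assumed mapping from card dtypes to Python types.
PY_TYPES = {"string": str, "int64": int}


def placeholder_row():
    """Build a made-up row with the right shape: '' for strings, 0 for ints."""
    return {name: ("" if dtype == "string" else 0) for name, dtype in FEATURES.items()}


def schema_errors(row):
    """Return a list of problems found in `row`; an empty list means it conforms."""
    errors = []
    for name, dtype in FEATURES.items():
        if name not in row:
            errors.append(f"missing field: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PY_TYPES[dtype]):
            errors.append(f"wrong type for {name}: expected {dtype}")
    for name in row:
        if name not in FEATURES:
            errors.append(f"unexpected field: {name}")
    return errors
```

Note that sentence_tokens is declared as a string rather than a list, so any token structure it carries would need to be parsed separately. The card does not say how it is encoded.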
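Dataset splits
- Training set:
  - Size: 7677 bytes
  - Number of examples: 10
- Validation set:
  - Size: 4594 bytes
  - Number of examples: 5
Dataset size
- Download size: 26564 bytes
- Dataset size: 12271 bytes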
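The size figures can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers listed above and the variable names are illustrative. It confirms that the two split sizes add up to the dataset size and shows that the download is roughly twice as large. The card does not explain that difference. Per-file format overhead on such small files is one plausible cause, but that is a guess.

```python
# Byte counts copied from the dataset card.
SPLIT_BYTES = {"train": 7677, "validation": 4594}
DATASET_SIZE = 12271
DOWNLOAD_SIZE = 26564

# The dataset size should equal the sum of the split sizes.
total = sum(SPLIT_BYTES.values())
sizes_consistent = total == DATASET_SIZE

# Ratio of downloaded bytes to dataset bytes.
ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(total, sizes_consistent, round(ratio, 2))  # 12271 True 2.16
```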
Configuration
- Configuration name: default
- Data file paths:
  - Training set: data/train-*
  - Validation set: data/validation-*
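The data file paths are glob patterns, so each split is whatever set of files matches its pattern. The sketch below illustrates that idea with Python's standard `fnmatch` module as a stand-in. The matching rules the Hugging Face tooling actually applies may differ, and the file names in the usage comments are hypothetical, since the page does not list the real files.

```python
from fnmatch import fnmatch

# Split-to-pattern mapping copied from the default configuration.
DATA_FILES = {"train": "data/train-*", "validation": "data/validation-*"}


def split_for(path):
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical file names, for illustration only:
#   split_for("data/train-00000-of-00001.parquet")       -> "train"
#   split_for("data/validation-00000-of-00001.parquet")  -> "validation"
#   split_for("README.md")                               -> None
```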



