coref-data/superglue_wsc_indiscrim

Name: coref-data/superglue_wsc_indiscrim
Creator: coref-data
Published: 2024-01-27 04:13:50
License: not specified

Hugging Face. Updated 2024-01-27; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/coref-data/superglue_wsc_indiscrim

Resource description:

---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: id
    dtype: int64
  - name: sentences
    list:
    - name: end_char
      dtype: int64
    - name: id
      dtype: int64
    - name: speaker
      dtype: 'null'
    - name: start_char
      dtype: int64
    - name: text
      dtype: string
    - name: tokens
      list:
      - name: deprel
        dtype: string
      - name: end_char
        dtype: int64
      - name: feats
        dtype: string
      - name: head
        dtype: int64
      - name: id
        dtype: int64
      - name: lemma
        dtype: string
      - name: start_char
        dtype: int64
      - name: text
        dtype: string
      - name: upos
        dtype: string
      - name: xpos
        dtype: string
  - name: coref_chains
    sequence:
      sequence:
        sequence: int64
  - name: genre
    dtype: string
  - name: meta_data
    struct:
    - name: comment
      dtype: string
  splits:
  - name: train
    num_bytes: 1558989
    num_examples: 554
  - name: validation
    num_bytes: 384085
    num_examples: 104
  - name: test
    num_bytes: 545213
    num_examples: 146
  download_size: 332859
  dataset_size: 2488287
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---

This dataset was generated by reformatting [`coref-data/superglue_wsc_raw`](https://huggingface.co/datasets/coref-data/superglue_wsc_raw) into the indiscrim coreference format. See that repo for dataset details.

See [ianporada/coref-data](https://github.com/ianporada/coref-data) for additional conversion details and the conversion script.

Please create an issue in the repo above or in this dataset repo for any questions.

Provider:

coref-data

Summary of original information

Dataset overview

Dataset information

Features

text: string.
id: integer (int64).
sentences: list with the following sub-features:
- end_char: integer (int64).
- id: integer (int64).
- speaker: null.
- start_char: integer (int64).
- text: string.
- tokens: list with the following sub-features:
  - deprel: string.
  - end_char: integer (int64).
  - feats: string.
  - head: integer (int64).
  - id: integer (int64).
  - lemma: string.
  - start_char: integer (int64).
  - text: string.
  - upos: string.
  - xpos: string.
coref_chains: sequence of sequences of sequences of integers (int64).
genre: string.
meta_data: struct with the following sub-feature:
- comment: string.
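
To make the nested layout concrete, the sketch below builds one hand-written record with the fields listed above and resolves each coreference mention to its surface text. Everything in the record is invented for illustration, and two points are assumptions rather than facts from the dataset card: that each innermost `coref_chains` triple means `[sentence index, first token index, last token index]` with an inclusive end, and that `start_char`/`end_char` are offsets into the document-level `text` with an exclusive end. The conversion script in ianporada/coref-data is the authority on the actual encoding.

```python
def make_tokens(text, words):
    """Build minimal token dicts; deprel, feats, head, id, lemma, upos and
    xpos are omitted for brevity even though real records carry them."""
    tokens, cursor = [], 0
    for word in words:
        start = text.index(word, cursor)
        end = start + len(word)
        tokens.append({"text": word, "start_char": start, "end_char": end})
        cursor = end
    return tokens


TEXT = "Mark told Pete many lies about himself."
WORDS = ["Mark", "told", "Pete", "many", "lies", "about", "himself", "."]

# Hypothetical record; values are illustrative, not taken from the dataset.
record = {
    "text": TEXT,
    "id": 0,
    "sentences": [
        {
            "id": 0,
            "speaker": None,
            "start_char": 0,
            "end_char": len(TEXT),
            "text": TEXT,
            "tokens": make_tokens(TEXT, WORDS),
        }
    ],
    # Assumed encoding: [sentence index, first token, last token (inclusive)].
    "coref_chains": [[[0, 0, 0], [0, 6, 6]]],
    "genre": "example",
    "meta_data": {"comment": ""},
}


def mention_texts(rec):
    """Return, for each chain, the surface strings of its mentions."""
    chains = []
    for chain in rec["coref_chains"]:
        texts = []
        for sent_idx, first_tok, last_tok in chain:
            tokens = rec["sentences"][sent_idx]["tokens"]
            start = tokens[first_tok]["start_char"]
            end = tokens[last_tok]["end_char"]
            texts.append(rec["text"][start:end])
        chains.append(texts)
    return chains


print(mention_texts(record))  # [['Mark', 'himself']]
```

If the real data uses a different span convention (for example an exclusive end token), only the indexing inside `mention_texts` would need to change.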

Data splits

train: 554 examples, 1558989 bytes.
validation: 104 examples, 384085 bytes.
test: 146 examples, 545213 bytes.

Data size

Download size: 332859 bytes.
Dataset size: 2488287 bytes.
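
As a quick sanity check on the figures above, the following sketch sums the per-split numbers and compares them with the reported totals. The variable names are illustrative. The smaller download size most likely reflects compressed files on the Hub versus the size of the loaded data, but the card does not state this, so treat that reading as an assumption.

```python
# Figures copied from the split and size listings above.
splits = {
    "train": {"num_examples": 554, "num_bytes": 1558989},
    "validation": {"num_examples": 104, "num_bytes": 384085},
    "test": {"num_examples": 146, "num_bytes": 545213},
}
download_size = 332859
dataset_size = 2488287

total_examples = sum(s["num_examples"] for s in splits.values())
total_bytes = sum(s["num_bytes"] for s in splits.values())

# The per-split byte counts add up exactly to the reported dataset size.
assert total_bytes == dataset_size

print(f"total examples: {total_examples}")  # total examples: 804
for name, s in splits.items():
    share = s["num_examples"] / total_examples
    print(f"{name}: {share:.1%} of examples")
print(f"download / dataset size: {download_size / dataset_size:.3f}")
```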

Configuration

default: includes the following data files:
- train: path data/train-*.
- validation: path data/validation-*.
- test: path data/test-*.
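
A minimal sketch of how the default configuration might be loaded with the Hugging Face `datasets` library, assuming that library is installed and the Hub is reachable. The helper names `data_file_patterns` and `load_split` are hypothetical and exist only for this example; `load_dataset` is the library's standard entry point.

```python
REPO_ID = "coref-data/superglue_wsc_indiscrim"
SPLITS = ("train", "validation", "test")


def data_file_patterns():
    """Mirror the split-to-path mapping of the default configuration."""
    return {split: f"data/{split}-*" for split in SPLITS}


def load_split(split="train"):
    """Load one split from the Hub (needs network access and `datasets`)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the helpers above work without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


if __name__ == "__main__":
    validation = load_split("validation")
    print(validation)             # shows the features and the number of rows
    print(validation[0]["text"])  # document text of the first example
```

Because the repository declares a single `default` configuration, no configuration name needs to be passed; the glob patterns are resolved by the library from the dataset card.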
