liaad/Buscape

Name: liaad/Buscape
Creator: liaad
Published: 2024-06-21 14:23:55
License: MIT

Source: Hugging Face. Updated 2024-06-21; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/liaad/Buscape

Resource summary:

The Buscapé dataset is a Portuguese dataset for Semantic Role Labelling (SRL). It is based on the Buscapé corpus, which contains user reviews of products. The instances in the dataset are annotated on syntactic trees generated by the Palavras parser, using a double-blind annotation method. The dataset has been processed to exclude certain propositions that do not meet the standards, such as those with verb index annotation errors, no verb annotations, or multiple labels for a single word. The dataset is intended for evaluating semantic role classifiers.

Provider:

liaad

Summary of original information

Dataset overview

Basic information

Language: Portuguese (pt)
Task category: token classification
Dataset name: Buscapé
License: MIT

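Dataset configurations

Configuration name: default
- Features:
  - tokens: sequence of strings
  - srl_frames: list of records, each containing
    - frames: sequence of strings
    - verb: string
- Splits:
  - train: 709 examples, 215929 bytes
- Download size: 47346 bytes
- Dataset size: 215929 bytes
Configuration name: flatten
- Features:
  - tokens: sequence of strings
  - verb: string
  - frames: sequence of class labels
- Splits:
  - train: 709 examples, 236352 bytes
- Download size: 46914 bytes
- Dataset size: 236352 bytes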
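
The two configurations listed above can be loaded and related to each other roughly as follows. This is a minimal sketch, not the provider's code: it assumes the Hugging Face `datasets` library, and it assumes that `flatten` holds one record per (sentence, verb) pair from `default`, which the listing does not state. The helper names `load_buscape` and `flatten_record`, the sample sentence, and its role labels are hypothetical. The sketch keeps string labels, whereas `flatten` stores class labels.

```python
from typing import Any, Dict, List


def load_buscape(config: str = "default"):
    """Load one configuration of liaad/Buscape from the Hugging Face Hub.

    Requires the `datasets` package and network access, so it is wrapped
    in a function and nothing is downloaded on import.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    return load_dataset("liaad/Buscape", config, split="train")


def flatten_record(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one `default`-style record into `flatten`-style records.

    Assumption: every entry of `srl_frames` becomes its own record that
    repeats the sentence tokens.
    """
    return [
        {"tokens": record["tokens"], "verb": frame["verb"], "frames": frame["frames"]}
        for frame in record["srl_frames"]
    ]


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "tokens": ["O", "produto", "chegou", "rápido"],
    "srl_frames": [
        {"verb": "chegou", "frames": ["B-A1", "I-A1", "B-V", "B-AM-MNR"]},
    ],
}

flat = flatten_record(example)
```

Calling `load_buscape("flatten")` instead of the default would return the pre-flattened variant directly, so the helper is only useful for seeing how the two schemas line up.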

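Dataset usage

Purpose: evaluating semantic role classifiers
Source: extracted from the Buscapé corpus, which contains user reviews of products
Annotation: double-blind annotation on syntactic trees generated by the Palavras parser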
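
Since the dataset is meant for evaluating semantic role classifiers, a scoring routine along the following lines may be a useful starting point. This is a simplified sketch under stated assumptions: it scores individual token labels, whereas SRL evaluation is often done over whole argument spans, and the function name `per_label_scores` and the toy label sequences are hypothetical rather than part of the dataset or its tooling.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_label_scores(
    gold: List[List[str]], pred: List[List[str]]
) -> Dict[str, Tuple[float, float, float]]:
    """Return token-level (precision, recall, F1) for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_seq, pred_seq in zip(gold, pred):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("gold and predicted sequences differ in length")
        for g, p in zip(gold_seq, pred_seq):
            if g == p:
                tp[g] += 1
            else:
                fp[p] += 1  # predicted label p was wrong here
                fn[g] += 1  # gold label g was missed here

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy sequences, invented for illustration.
gold_labels = [["A0", "V", "A1", "O"]]
pred_labels = [["A0", "V", "O", "O"]]
scores = per_label_scores(gold_labels, pred_labels)
```

In this toy case the classifier misses the single `A1` token, so `A1` scores zero while `A0` is perfect.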

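Data processing

Exclusions: 131 propositions were excluded, including those with verb index annotation errors, no verb annotation, or multiple labels on a single word
Label removal: the "AM-MED" and "AM-PIN" labels were removed, as were propositions carrying the "WRONGSUBCORPUS", "LATER", or "REEXAMINE" flags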
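
The exclusion and label-removal rules above can be expressed as a small filter. This is an illustrative sketch only, not the provider's preprocessing script: the proposition layout (`tokens`, `verb_index`, per-token `labels` lists, `flags`) and the function names are hypothetical, and treating "label removal" as dropping the label from the token is an assumption.

```python
from typing import Any, Dict

REMOVED_LABELS = {"AM-MED", "AM-PIN"}
EXCLUDED_FLAGS = {"WRONGSUBCORPUS", "LATER", "REEXAMINE"}


def keep_proposition(prop: Dict[str, Any]) -> bool:
    """Return False for propositions matching any exclusion rule."""
    verb_index = prop.get("verb_index")
    # No verb annotation, or a verb index outside the sentence.
    if verb_index is None or not 0 <= verb_index < len(prop["tokens"]):
        return False
    # A single word carrying more than one label.
    if any(len(labels) > 1 for labels in prop["labels"]):
        return False
    # Propositions marked with one of the excluded flags.
    if EXCLUDED_FLAGS & set(prop.get("flags", ())):
        return False
    return True


def strip_removed_labels(prop: Dict[str, Any]) -> Dict[str, Any]:
    """Drop AM-MED and AM-PIN labels, leaving the tokens unlabelled."""
    cleaned = [
        [label for label in labels if label not in REMOVED_LABELS]
        for labels in prop["labels"]
    ]
    return {**prop, "labels": cleaned}


# Hypothetical propositions, invented for illustration.
tokens = ["a", "b", "c"]
props = [
    {"tokens": tokens, "verb_index": 1, "labels": [["A0"], ["V"], ["A1"]], "flags": []},
    {"tokens": tokens, "verb_index": 1, "labels": [["A0"], ["V"], []], "flags": ["LATER"]},
    {"tokens": tokens, "verb_index": 1, "labels": [["A0", "A1"], ["V"], []], "flags": []},
    {"tokens": tokens, "verb_index": None, "labels": [[], [], []], "flags": []},
]
kept = [p for p in props if keep_proposition(p)]
stripped = strip_removed_labels(
    {"tokens": tokens, "verb_index": 1, "labels": [["A0"], ["V"], ["AM-MED"]], "flags": []}
)
```

Only the first hypothetical proposition survives the filter; the others fail on a flag, a doubly labelled word, and a missing verb index respectively.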

Citation information

BibTeX: see the original source
APA: see the original source
