
grsilva/rebel_portuguese

Source: Hugging Face | Updated: 2023-10-23 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/grsilva/rebel_portuguese
Resource description:
License: MIT
Language: pt
Pretty name: rebel_pt

This dataset was created to re-train [REBEL](https://github.com/Babelscape/rebel) to work better for the Portuguese language. It was generated using [CROCODILE](https://github.com/Babelscape/crocodile), which was adapted to use a Portuguese-specific model (pt_core_news_sm) instead of its default multi-language model (xx_ent_wiki_sm).

The dataset comes with train, test, dev and train_dev splits. The train_dev split accounts for 80% of the dataset, with the remaining 20% being the test data. The train and dev splits were generated from the 80% train_dev data, which was further split 80/20, so train and dev together make up train_dev.

The resulting splits are:

* Train_dev -> 80% of the data
* Test -> 20% of the data
* Train -> 64% of the data
* Dev -> 16% of the data
Provider:
grsilva
Summary of original information

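Dataset overview

Dataset name

  • Name: rebel_pt

Language

  • Language: Portuguese

Dataset purpose

  • Purpose: re-training REBEL so that it works better for Portuguese.

Dataset generation tool

  • Generation tool: CROCODILE, using the Portuguese-specific model (pt_core_news_sm) in place of the default multi-language model (xx_ent_wiki_sm).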
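
The model swap described above can be illustrated with a minimal sketch. It is not CROCODILE's actual code: the helper names `choose_model` and `load_pipeline` are hypothetical, and the only facts taken from the dataset description are the two spaCy model names. `spacy.load` is the standard spaCy entry point for loading an installed pipeline by name, and it is imported lazily here so the selection logic can be read and run without spaCy installed.

```python
# Hypothetical sketch: choosing a spaCy pipeline per language.
# Only the two model names come from the dataset description.
DEFAULT_MODEL = "xx_ent_wiki_sm"      # multi-language default
PORTUGUESE_MODEL = "pt_core_news_sm"  # Portuguese-specific replacement


def choose_model(language: str) -> str:
    """Return the spaCy model name to use for the given language code."""
    return PORTUGUESE_MODEL if language == "pt" else DEFAULT_MODEL


def load_pipeline(language: str):
    """Load the chosen pipeline; requires spaCy and the model package."""
    import spacy  # imported lazily so choose_model works without spaCy

    return spacy.load(choose_model(language))
```

Calling `load_pipeline("pt")` assumes the model has already been installed, for example with `python -m spacy download pt_core_news_sm`.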

Dataset splits

  • Splits:
    • Train_dev: 80% of the data
    • Test: 20% of the data
    • Train: 64% of the data
    • Dev: 16% of the data
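
The percentages follow from applying an 80/20 split twice: 20% is held out as test, and the remaining 80% (train_dev) is split 80/20 again, giving 0.8 × 0.8 = 64% train and 0.8 × 0.2 = 16% dev. The sketch below reproduces that arithmetic on placeholder records. The function name `nested_split`, the seed, and the shuffling are assumptions made for illustration; the dataset description does not say how records were assigned to each split.

```python
import random


def nested_split(records, test_frac=0.2, dev_frac=0.2, seed=0):
    """Hold out a test set, then split the remainder into train and dev."""
    items = list(records)
    random.Random(seed).shuffle(items)  # assumed; the original method is unknown

    # First split: test vs. train_dev (20% / 80% of everything).
    n_test = round(len(items) * test_frac)
    test, train_dev = items[:n_test], items[n_test:]

    # Second split: dev vs. train (20% / 80% of train_dev).
    n_dev = round(len(train_dev) * dev_frac)
    dev, train = train_dev[:n_dev], train_dev[n_dev:]

    return {"train_dev": train_dev, "test": test, "train": train, "dev": dev}


splits = nested_split(range(1000))
sizes = {name: len(part) for name, part in splits.items()}
print(sizes)  # {'train_dev': 800, 'test': 200, 'train': 640, 'dev': 160}
```

Because train and dev are carved out of train_dev, the four sizes do not sum to 100%: train_dev and test together cover the whole dataset, as do train, dev and test.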