grsilva/rebel_portuguese
Source: Hugging Face. Updated 2023-10-23. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/grsilva/rebel_portuguese
Resource description:
---
license: mit
language:
- pt
pretty_name: rebel_pt
---
This dataset was created to re-train [REBEL](https://github.com/Babelscape/rebel) so that it works better for the Portuguese language.
It was generated using [CROCODILE](https://github.com/Babelscape/crocodile), which was adapted to use a Portuguese-specific spaCy model (pt_core_news_sm) instead of its default multi-language model (xx_ent_wiki_sm).
The dataset comes with train, test, dev and train_dev splits. The train_dev split accounts for 80% of the dataset, with the remaining 20% being the test data. The train and dev splits were generated from the train_dev data, which was further split 80/20.
The split for the dataset ends up being:
* Train_dev -> 80% of the data
* Test -> 20% of the data
* Train -> 64% of the data
* Dev -> 16% of the data
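
The percentages above follow from applying an 80/20 split twice: 0.8 × 0.8 = 64% for train and 0.8 × 0.2 = 16% for dev. The following is a minimal sketch of that arithmetic on a toy list of 1000 items. The function name `two_stage_split`, the random shuffle, and the seed are assumptions made for illustration; the dataset card does not say how the actual split was performed.

```python
import random


def two_stage_split(items, first_frac=0.8, second_frac=0.8, seed=0):
    """Split items into train_dev/test, then split train_dev into train/dev."""
    shuffled = list(items)
    # Assumption: a seeded random shuffle; the original procedure is not documented.
    random.Random(seed).shuffle(shuffled)

    # First stage: 80% train_dev, 20% test.
    cut = round(len(shuffled) * first_frac)
    train_dev, test = shuffled[:cut], shuffled[cut:]

    # Second stage: 80% of train_dev becomes train, the remaining 20% becomes dev.
    cut = round(len(train_dev) * second_frac)
    train, dev = train_dev[:cut], train_dev[cut:]

    return {"train_dev": train_dev, "test": test, "train": train, "dev": dev}


splits = two_stage_split(range(1000))
fractions = {name: len(part) / 1000 for name, part in splits.items()}
print(fractions)
# {'train_dev': 0.8, 'test': 0.2, 'train': 0.64, 'dev': 0.16}
```

Note that train and dev together make up train_dev, so the four percentages do not sum to 100%; only train, dev and test (64% + 16% + 20%) partition the full dataset.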
Provider:
grsilva



