
pt-br2libras-gloss

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/ryj88ckjww
Description:
This dataset is a UTF-8 encoded Comma-Separated Values (CSV) file containing a bilingual parallel corpus of 127,349 aligned sentence pairs in Brazilian Portuguese and LIBRAS gloss. The file includes four columns:

pt-br: Original sentences in Brazilian Portuguese.
libras-gloss: Corresponding translations in LIBRAS gloss notation, forming the primary aligned pair with the Brazilian Portuguese sentences.
is_government_source: A boolean field indicating whether the source sentence was extracted from an official Brazilian Federal Government website (True) or from a non-governmental source (False).
english_translation: An automatically generated English translation of the Brazilian Portuguese sentence. This field serves as supplementary metadata for general understanding and is not part of the core bilingual alignment.

A total of 55,047 sentence pairs in the dataset originate from government sources. The dataset is primarily intended to support research in bilingual corpora, machine translation, and sign language processing, specifically applications involving Brazilian Portuguese and Brazilian Sign Language (LIBRAS).
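As a rough illustration of how the four columns might be consumed, the sketch below parses CSV text with Python's standard `csv` module and selects the government-sourced pairs. The sample rows are invented placeholders, not real entries or valid LIBRAS glosses. The helper names `load_pairs` and `government_pairs` are hypothetical, and the code assumes the boolean column is stored as the text `True` or `False`.

```python
import csv
import io

# Invented sample rows that mimic the four documented columns.
# They are placeholders only; the glosses are not real LIBRAS annotations.
SAMPLE_CSV = """pt-br,libras-gloss,is_government_source,english_translation
"Bom dia.",BOM DIA,False,"Good morning."
"O ministério publicou o edital.",MINISTÉRIO EDITAL PUBLICAR,True,"The ministry published the notice."
"""


def load_pairs(stream):
    """Read every row and convert is_government_source from text to bool.

    Assumes the column holds the literal text "True" or "False".
    """
    rows = []
    for row in csv.DictReader(stream):
        flag = row["is_government_source"].strip().lower()
        row["is_government_source"] = flag == "true"
        rows.append(row)
    return rows


def government_pairs(rows):
    """Return (pt-br, libras-gloss) tuples for government-sourced rows only."""
    return [
        (row["pt-br"], row["libras-gloss"])
        for row in rows
        if row["is_government_source"]
    ]


rows = load_pairs(io.StringIO(SAMPLE_CSV))
gov = government_pairs(rows)
print(len(rows), len(gov))
```

To run this against the downloaded file, replace the in-memory stream with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the CSV was saved locally. If the description is accurate, the full file should yield 127,349 rows, of which 55,047 are government-sourced.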
Created:
2025-05-28