SamuelMauli/parity-juridico-dataset-v2
Source: Hugging Face · Updated: 2026-04-28 · Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/SamuelMauli/parity-juridico-dataset-v2
Description:
parity-juridico-dataset-v2 is a Brazilian legal retrieval dataset of Portuguese (PT-BR) legal texts, provided as triplets, pairs, and a raw corpus. It aggregates several public Hugging Face datasets, Parity's curated corpus, and optional data scraped from TCU/PNCP. It contains 2,366 triplets (anchor, positive, negative), 7,317 text pairs (texto_a, texto_b, label ∈ {0, 1}), and 26,079 raw legal texts. The dataset targets text retrieval, sentence similarity, and text classification in the Brazilian legal domain.
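The triplet and pair files describe the same kind of supervision in two shapes. As a minimal sketch, assuming that label = 1 marks a related (positive) pair and label = 0 an unrelated one (the card does not state this explicitly), labeled pairs can be regrouped into (anchor, positive, negative) triplets by combining every positive and negative partner that share the same texto_a. The function name pairs_to_triplets and the toy records below are hypothetical illustrations, not part of the dataset or any official tooling.

```python
from collections import defaultdict
from itertools import product
from typing import Iterable, TypedDict


class Pair(TypedDict):
    texto_a: str
    texto_b: str
    label: int  # assumption: 1 = related/relevant, 0 = unrelated


class Triplet(TypedDict):
    anchor: str
    positive: str
    negative: str


def pairs_to_triplets(pairs: Iterable[Pair]) -> list[Triplet]:
    """Hypothetical helper: build triplets from labeled pairs sharing texto_a."""
    positives: dict[str, list[str]] = defaultdict(list)
    negatives: dict[str, list[str]] = defaultdict(list)
    for p in pairs:
        if p["label"] not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {p['label']!r}")
        bucket = positives if p["label"] == 1 else negatives
        bucket[p["texto_a"]].append(p["texto_b"])

    triplets: list[Triplet] = []
    # Only anchors with at least one positive AND one negative yield triplets.
    for anchor in positives:
        for pos, neg in product(positives[anchor], negatives.get(anchor, [])):
            triplets.append({"anchor": anchor, "positive": pos, "negative": neg})
    return triplets


# Toy, made-up records (not taken from the dataset).
toy_pairs: list[Pair] = [
    {"texto_a": "consulta 1", "texto_b": "doc relevante", "label": 1},
    {"texto_a": "consulta 1", "texto_b": "doc irrelevante A", "label": 0},
    {"texto_a": "consulta 1", "texto_b": "doc irrelevante B", "label": 0},
    {"texto_a": "consulta 2", "texto_b": "doc sem negativo", "label": 1},
]
print(len(pairs_to_triplets(toy_pairs)))  # 2
```

The cross product keeps every positive/negative combination per anchor, which maximizes the number of triplets but can over-weight anchors with many partners; sampling one negative per positive is a common lighter alternative.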
Provided by:
SamuelMauli



