Brazilian Legal Proceedings (Conference)
Source: DataCite Commons. Updated: 2020-08-26. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/Brazilian_Legal_Proceedings_Conference_/11750184
Description:
Our data consists of two datasets: a dataset of 3*10^6 unlabelled motions (mov_treino.txt) and a dataset of legal proceedings (mov.txt), each containing a variable number of motions and labeled by law experts (tags.txt). The proceedings are labeled with one of three classes: arquivado (archived), ativo (active), or suspenso (suspended). Both datasets are random samples from the largest (São Paulo) and third-largest (Rio de Janeiro) State Courts. State Courts handle the widest variety of case types among the courts in Brazil and are responsible for 80% of all lawsuits. These datasets therefore represent a significant portion of the varied language and expressions used in court vocabulary.
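As a rough illustration of working with the three classes named in the description, the sketch below tallies labels and rejects anything outside arquivado, ativo, and suspenso. The listing does not document the layout of tags.txt, so the one-label-per-line format is an assumption, and the function name `count_labels` and the sample data are hypothetical.

```python
from collections import Counter

# The three classes named in the dataset description.
LABELS = ("arquivado", "ativo", "suspenso")


def count_labels(lines):
    """Tally labels, ASSUMING one label per line (layout not documented)."""
    counts = Counter({label: 0 for label in LABELS})
    for line_number, raw in enumerate(lines, start=1):
        label = raw.strip().lower()
        if not label:
            continue  # ignore blank lines
        if label not in LABELS:
            raise ValueError(f"line {line_number}: unexpected label {raw!r}")
        counts[label] += 1
    return counts


# Made-up sample standing in for the real file contents.
sample = ["ativo\n", "arquivado\n", "ativo\n", "\n", "suspenso\n"]
print(count_labels(sample))
```

If the real file does follow this layout, the same function could be given an open file handle, for example `open("tags.txt", encoding="utf-8")`; the encoding is likewise an assumption to verify against the downloaded files.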
Provider:
figshare
Created:
2020-01-28



