IDTheftCase-JudgmentCorpus: Indonesian Theft Case Judgment Corpus - Levels of Court

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/48x9xm7rkf
Description:
IDTheftCase-JudgmentCorpus: Indonesian Theft Case Judgment Corpus – Levels of Court is a dataset containing the full text of written judgments handed down by Indonesian courts in criminal theft cases at three levels: the court of first instance, the appellate court, and the cassation court. The dataset was created to support research and development in information extraction and Natural Language Processing (NLP), specifically the processing and understanding of legal texts and court documents. All annotated entity names have been standardized and translated into English, making the dataset more suitable for international NLP research and for developing multilingual or cross-lingual models.

Available Annotated Files:
• 1-first-instance.csv – Contains tokenized and BIO-tagged court decisions from the district courts (Pengadilan Negeri).
• 2-appellate.csv – Contains tokenized and BIO-tagged decisions from the appellate courts (Pengadilan Tinggi).
• 3-cassation.csv – Contains tokenized and BIO-tagged decisions from the cassation level (Mahkamah Agung).
• metadata.csv – Contains contextual and hierarchical information about the judgment documents, structured into the following columns:
  o decision_id
  o original_id
  o court_level
  o court_name
  o year
  o verdict_type
  o cross-referenced case identifiers (first_id, appellate_id, cassation_id)

Entity Annotations:
The dataset is annotated in the BIO tagging format and covers over 56 legal entity types that appear in court documents.
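A minimal Python sketch of how BIO tags can be decoded into entity spans follows. In the BIO scheme, B-<Type> marks the first token of an entity, I-<Type> marks each following token of the same entity, and O marks tokens outside any entity. The column names `token` and `tag`, the exact tag spelling (for example `B-Defendant`), the helper names, and the sample sentence are assumptions for illustration, not the documented layout of the CSV files; check the real headers before use. If a file stores several documents, decode each document separately so that spans do not run across document boundaries.

```python
import csv
from typing import Iterable, Iterator, List, Tuple


def read_token_tags(path: str, token_col: str = "token",
                    tag_col: str = "tag") -> Iterator[Tuple[str, str]]:
    """Yield (token, tag) pairs from a one-token-per-row CSV file.

    HYPOTHETICAL column names: replace token_col / tag_col with the
    actual headers of 1-first-instance.csv, 2-appellate.csv, etc.
    """
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield row[token_col], row[tag_col]


def bio_to_spans(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (token, BIO tag) pairs into (entity_type, text) spans.

    A span starts at B-<Type> and extends over following I-<Type> tags of
    the same type. An O tag, a new B- tag, or an I- tag of another type
    closes it. A stray I- tag with no open span of its type starts a new
    span (lenient decoding).
    """
    spans: List[Tuple[str, str]] = []
    cur_type, cur_tokens = None, []
    for token, tag in pairs:
        # Continue the open span only for I- tags of the same entity type.
        if tag.startswith("I-") and cur_type == tag[2:]:
            cur_tokens.append(token)
            continue
        # Any other tag closes the currently open span, if there is one.
        if cur_type is not None:
            spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
        # B- (or a stray I-) opens a new span; O opens nothing.
        if tag.startswith(("B-", "I-")):
            cur_type, cur_tokens = tag[2:], [token]
    if cur_type is not None:
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans


# Made-up example sentence for illustration; not taken from the dataset.
sample = [
    ("Terdakwa", "O"),
    ("Budi", "B-Defendant"),
    ("Santoso", "I-Defendant"),
    ("melanggar", "O"),
    ("Pasal", "B-ChargeArticles"),
    ("362", "I-ChargeArticles"),
    ("KUHP", "I-ChargeArticles"),
]
print(bio_to_spans(sample))
# → [('Defendant', 'Budi Santoso'), ('ChargeArticles', 'Pasal 362 KUHP')]
```

Treating an I- tag without a matching open span as the start of a new entity is a lenient design choice; a strict decoder could instead discard or flag such tags as annotation errors.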
All entity labels are expressed in English and cover information such as:
• Parties and roles: Defendant, Lawyer, Prosecutor, Witness, PresidingJudge
• Legal process: ProsecutionDate, DecisionDate, ArrestDate, CassationReason
• Legal references: ChargeArticles, ProsecutionArticles, CourtRuling, DecisionCosts
• Case identifiers and metadata: DecisionNumber, ChargeType, CaseLevel, IncidentLocation

All documents in this dataset were obtained from public records on the official website of the Supreme Court of the Republic of Indonesia (https://putusan3.mahkamahagung.go.id/). The dataset therefore consists of real-world cases and reflects the structure and wording of actual Indonesian court documents.

IDTheftCase-JudgmentCorpus is intended for research on named entity recognition and extraction, analysis of sentencing patterns, and automatic document classification in the Indonesian legal context. It is also useful for developers and researchers building machine learning models to extract, group, and analyze judgment documents across court levels.
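The cross-referenced identifiers in metadata.csv suggest that a decision can be linked to its counterparts at the other court levels. The Python sketch below assumes, without confirmation from the dataset description, that first_id, appellate_id, and cassation_id hold decision_id values (the `key` parameter switches to original_id if that turns out to be the convention) and that an empty value means the case did not reach that level. The function names and the in-memory sample rows, including their court_level and verdict_type values, are hypothetical placeholders.

```python
import csv
import io
from typing import Dict, List, TextIO

# Cross-reference columns named in the metadata.csv description.
LEVEL_COLS = ("first_id", "appellate_id", "cassation_id")


def read_metadata(f: TextIO) -> List[Dict[str, str]]:
    """Read all metadata rows from an open text file.

    For the real file: open("metadata.csv", newline="", encoding="utf-8")
    """
    return list(csv.DictReader(f))


def linked_decisions(rows: List[Dict[str, str]], decision_id: str,
                     key: str = "decision_id") -> Dict[str, Dict[str, str]]:
    """Return {level column: metadata row} for decisions linked to decision_id.

    ASSUMPTION: the LEVEL_COLS values refer to the `key` column.
    Empty or unmatched references are skipped.
    """
    by_key = {r[key]: r for r in rows}
    row = by_key[decision_id]
    chain: Dict[str, Dict[str, str]] = {}
    for col in LEVEL_COLS:
        ref = (row.get(col) or "").strip()
        if ref and ref in by_key:
            chain[col] = by_key[ref]
    return chain


# Made-up rows for illustration only; not taken from the dataset.
SAMPLE = """decision_id,original_id,court_level,court_name,year,verdict_type,first_id,appellate_id,cassation_id
D1,X-1,first_instance,PN Example,2020,guilty,D1,D2,D3
D2,X-2,appellate,PT Example,2021,upheld,D1,D2,D3
D3,X-3,cassation,MA Example,2022,rejected,D1,D2,D3
D4,X-4,first_instance,PN Example,2021,guilty,D4,,
"""

rows = read_metadata(io.StringIO(SAMPLE))
chain = linked_decisions(rows, "D3")
print({col: r["court_level"] for col, r in chain.items()})
# → {'first_id': 'first_instance', 'appellate_id': 'appellate', 'cassation_id': 'cassation'}
```

Looking decisions up through a dictionary keyed on the identifier keeps each lookup constant-time, which matters little for a single query but helps when linking every decision in the corpus.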
Created:
2025-04-09