
CAIL2018

ModelScope Community · Updated 2025-12-19 · Indexed 2025-09-13
Download link:
https://modelscope.cn/datasets/qazwsxplkj/CAIL2018
Resource overview:

# CAIL2018

This dataset originates from [CAIL2018](https://github.com/china-ai-law-challenge/CAIL2018), which is derived from public criminal case documents on [China Judgments Online](http://wenshu.court.gov.cn/). Each entry consists of the case description and statement of facts from a criminal judgment document, together with the applicable legal provisions, the charges on which the defendant was convicted, and the length of the sentence.

The dataset contains 2.91 million criminal cases, covering 202 charges and 183 legal provisions. Sentences span **0-25 years of imprisonment, life imprisonment, and the death penalty**.

## File Introduction

* `exercise_contest_data_train.json`: Training data for the practice contest
* `exercise_contest_data_test.json`: Test data for the practice contest
* `exercise_contest_data_valid.json`: Validation data for the practice contest
* `first_stage_test.json`: Test data for the first stage of the formal contest
* `first_stage_train.json`: Training data for the first stage of the formal contest
* `final_test.json`: Test data for the closed test stage
* `rest_data.json`: Data not used in the contest

## Data Entries

Each line in the data files corresponds to a single criminal case, represented as a JSON object. For example:

```json
{
  "fact": "公诉机关指控,2016年3月29日6时许,被告人严某某在其家中吸食毒品时被公安民警抓获,民警当场从其上衣口袋内搜缴甲基苯丙胺(冰毒)1包及甲基苯丙胺片剂(麻古)2包,共计12.44克。",
  "meta": {
    "relevant_articles": [248, 357],
    "accusation": ["非法持有毒品"],
    "punish_of_money": 2000,
    "criminals": ["严某某"],
    "term_of_imprisonment": {
      "death_penalty": false,
      "imprisonment": 6,
      "life_imprisonment": false
    }
  }
}
```

## Field Explanations

* `fact`: Factual description (treated as a short article)
* `meta`: Metadata/annotation information
    * `relevant_articles`: List of applicable legal provisions
    * `accusation`: List of convicted charges
    * `punish_of_money`: Fine amount (unit: yuan/RMB)
    * `criminals`: List of defendants
    * `term_of_imprisonment`:
        * `death_penalty`: Whether the death penalty is imposed
        * `imprisonment`: Length of imprisonment (unit: months)
        * `life_imprisonment`: Whether life imprisonment is imposed
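Since each line is a standalone JSON object, the files can be streamed one record at a time rather than loaded whole. The following is a minimal sketch of that idea, not an official loader: the helper names `iter_cases`, `sentence_label`, and `flatten_case` are hypothetical, and checking the death-penalty flag before the life-imprisonment flag is a choice made for this sketch, not something the dataset documentation specifies.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator


def iter_cases(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one case dict per non-empty line (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def sentence_label(term: Dict[str, Any]) -> str:
    """Collapse `term_of_imprisonment` into a single readable label."""
    # Assumption of this sketch: the boolean flags override the month count.
    if term["death_penalty"]:
        return "death penalty"
    if term["life_imprisonment"]:
        return "life imprisonment"
    return f"{term['imprisonment']} months"


def flatten_case(case: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the annotated fields out of `meta` into a flat dict."""
    meta = case["meta"]
    return {
        "fact": case["fact"],
        "articles": meta["relevant_articles"],
        "accusations": meta["accusation"],
        "fine_yuan": meta["punish_of_money"],
        "defendants": meta["criminals"],
        "sentence": sentence_label(meta["term_of_imprisonment"]),
    }


# The sample record from the dataset description, serialized as one line.
sample_line = json.dumps(
    {
        "fact": "公诉机关指控,2016年3月29日6时许,被告人严某某在其家中吸食毒品时被公安民警抓获,"
                "民警当场从其上衣口袋内搜缴甲基苯丙胺(冰毒)1包及甲基苯丙胺片剂(麻古)2包,共计12.44克。",
        "meta": {
            "relevant_articles": [248, 357],
            "accusation": ["非法持有毒品"],
            "punish_of_money": 2000,
            "criminals": ["严某某"],
            "term_of_imprisonment": {
                "death_penalty": False,
                "imprisonment": 6,
                "life_imprisonment": False,
            },
        },
    },
    ensure_ascii=False,
)

# io.StringIO stands in for an open data file here.
rows = [flatten_case(case) for case in iter_cases(io.StringIO(sample_line + "\n"))]
print(rows[0]["accusations"], rows[0]["sentence"])
```

To read a real file, an open handle such as `open("first_stage_train.json", encoding="utf-8")` could be passed to `iter_cases` in place of the `io.StringIO` object. The UTF-8 encoding is an assumption and should be checked against the downloaded files.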
Provider: maas
Created: 2025-09-09
Background overview
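CAIL2018 is a dataset of 2.91 million Chinese criminal case documents covering 202 charges and 183 legal provisions, with sentences ranging from 0-25 years of imprisonment to life imprisonment and the death penalty. The data is stored in JSON format and includes the case description, applicable legal provisions, charges, and sentence length for each case, making it suitable for natural language processing research in the legal domain.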
The content above was collected and summarized by 遇见数据集.