CAIL2018
Source: ModelScope community (魔搭社区). Updated: 2025-12-19. Indexed: 2025-09-13.
Download link:
https://modelscope.cn/datasets/qazwsxplkj/CAIL2018
Resource description:
# CAIL2018
This dataset originates from [CAIL2018](https://github.com/china-ai-law-challenge/CAIL2018), which is derived from criminal case documents published on [China Judgments Online](http://wenshu.court.gov.cn/). Each entry consists of the case description and statement of facts from a criminal judgment, together with the applicable legal provisions, the charges on which the defendant was convicted, and the length of the sentence.
The dataset contains a total of 2.91 million criminal case entries, involving 202 charges and 183 legal provisions. The sentence types cover 0-25 years of imprisonment, life imprisonment, and the death penalty.
## File Introduction
* `exercise_contest_data_train.json`: Training data for the practice contest
* `exercise_contest_data_test.json`: Test data for the practice contest
* `exercise_contest_data_valid.json`: Validation data for the practice contest
* `first_stage_test.json`: Test data for the first stage formal contest
* `first_stage_train.json`: Training data for the first stage formal contest
* `final_test.json`: Test data for the closed test stage
* `rest_data.json`: Data not utilized in the contest
## Data Entries
Each line in the data files corresponds to a single criminal case, represented as a JSON object. For example:
```json
{
  "fact": "公诉机关指控,2016年3月29日6时许,被告人严某某在其家中吸食毒品时被公安民警抓获,民警当场从其上衣口袋内搜缴甲基苯丙胺(冰毒)1包及甲基苯丙胺片剂(麻古)2包,共计12.44克。",
  "meta": {
    "relevant_articles": [
      248,
      357
    ],
    "accusation": [
      "非法持有毒品"
    ],
    "punish_of_money": 2000,
    "criminals": [
      "严某某"
    ],
    "term_of_imprisonment": {
      "death_penalty": false,
      "imprisonment": 6,
      "life_imprisonment": false
    }
  }
}
```
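Because each line of a data file is one JSON object, the files can be read as JSON Lines. The following is a minimal sketch of that reading pattern, not an official loader: the helper name `iter_cases` is hypothetical, and an in-memory stream stands in for a real file so the example runs without the dataset.

```python
import io
import json
from typing import Iterator, TextIO


def iter_cases(stream: TextIO) -> Iterator[dict]:
    """Yield one case dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# In-memory stand-in for a data file; the fact text is a shortened
# placeholder, and only part of the "meta" object is included.
sample = io.StringIO(
    '{"fact": "placeholder fact text", '
    '"meta": {"accusation": ["非法持有毒品"], "relevant_articles": [248, 357]}}\n'
    "\n"
)

cases = list(iter_cases(sample))
print(len(cases), cases[0]["meta"]["accusation"])
```

With a downloaded file, the same helper would be used as `with open("first_stage_train.json", encoding="utf-8") as f: ...`, assuming the file is UTF-8 encoded. Iterating lazily avoids holding the whole file in memory.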
## Field Explanations
* `fact`: Factual description (treated as a short article)
* `meta`: Metadata/annotation information
  * `relevant_articles`: List of applicable legal provisions
  * `accusation`: List of convicted charges
  * `punish_of_money`: Fine amount (unit: yuan/RMB)
  * `criminals`: List of defendants
  * `term_of_imprisonment`:
    * `death_penalty`: Whether the death penalty is imposed
    * `imprisonment`: Length of imprisonment (unit: months)
    * `life_imprisonment`: Whether life imprisonment is imposed
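The field layout above can be mirrored in a small typed structure. This is an illustrative sketch under stated assumptions: the names `Case`, `parse_case`, and `sentence_label` are hypothetical, the field types are inferred from the single sample record (other records may differ), and the label strings are arbitrary choices rather than part of the dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    fact: str
    relevant_articles: List[int]  # assumed integers, as in the sample record
    accusation: List[str]
    punish_of_money: int          # fine in yuan
    criminals: List[str]
    death_penalty: bool
    life_imprisonment: bool
    imprisonment_months: int      # `imprisonment` is given in months


def parse_case(record: dict) -> Case:
    """Flatten one raw record (fact + meta) into a Case."""
    meta = record["meta"]
    term = meta["term_of_imprisonment"]
    return Case(
        fact=record["fact"],
        relevant_articles=list(meta["relevant_articles"]),
        accusation=list(meta["accusation"]),
        punish_of_money=meta["punish_of_money"],
        criminals=list(meta["criminals"]),
        death_penalty=term["death_penalty"],
        life_imprisonment=term["life_imprisonment"],
        imprisonment_months=term["imprisonment"],
    )


def sentence_label(case: Case) -> str:
    """Collapse the three sentence fields into one label (hypothetical scheme)."""
    if case.death_penalty:
        return "death"
    if case.life_imprisonment:
        return "life"
    return f"{case.imprisonment_months} months"


# The sample record from the dataset card; the fact text is shortened here.
record = {
    "fact": "公诉机关指控,2016年3月29日6时许,被告人严某某在其家中吸食毒品时被公安民警抓获",
    "meta": {
        "relevant_articles": [248, 357],
        "accusation": ["非法持有毒品"],
        "punish_of_money": 2000,
        "criminals": ["严某某"],
        "term_of_imprisonment": {
            "death_penalty": False,
            "imprisonment": 6,
            "life_imprisonment": False,
        },
    },
}

case = parse_case(record)
print(sentence_label(case))
```

Checking the two boolean flags before the month count reflects the assumption that `imprisonment` is not meaningful when a death or life sentence is imposed.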
Provider:
maas
Created:
2025-09-09
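Collected summary
Background overview
CAIL2018 is a dataset of 2.91 million Chinese criminal case documents covering 202 charges and 183 legal provisions, with sentences ranging from 0-25 years of imprisonment to life imprisonment and the death penalty. The data is stored in JSON format and includes the case description, legal provisions, charges, and sentence length, making it suitable for natural language processing research in the legal domain.
The above content was collected and summarized by 遇见数据集.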



