
China Judgments Online Judicial Documents Dataset

National Dataset Management Service Platform (国家数据集管理服务平台), updated 2026-04-28, indexed 2026-04-29
Download link:
https://www.ndsms.cn/dataRetrieval/datasetDetail/?id=05807f9480b64395f13d14ef18acb499
Resource description:
This dataset is intended for teams developing legal large language models (LLMs), developers of smart-court systems, organizations building legal knowledge graphs, and legal technology companies. It addresses three problems with judicial corpora: judgment documents are scattered, their formats are inconsistent, and labels for case cause (案由) and trial procedure are missing. As a result, legal AI models lack high-quality training data for tasks such as document understanding, case cause classification, and sentencing prediction.

The dataset is drawn from judgment documents officially published on China Judgments Online (中国裁判文书网). It centers on criminal cases and covers frequently occurring case causes such as causing a traffic accident, bending the law for personal gain, drug trafficking, concealing or disguising criminal proceeds, and bigamy. Each document is packaged as structured JSON with metadata fields for title, case number, court, judgment date, case cause, parties, trial procedure, and legal basis, together with the full text of the judgment. The original document structure is preserved in full: the prosecution's allegations, the defense opinions, the facts found at trial, the court's reasoning, and the judgment result. Unlike original document pages downloaded one at a time, the dataset has been systematically organized, with unified formatting, standardized labels, and coverage of multiple case causes, so it can be imported into a database or used for model training directly.
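The field list above suggests what reading one record might look like. The following is a minimal sketch, not the dataset's documented schema: the page names the fields but not the JSON keys, so every key name here (`title`, `case_number`, `case_cause`, and so on) is an assumption, and the sample record is a synthetic placeholder rather than real data.

```python
import json
from collections import Counter
from dataclasses import dataclass

# Hypothetical key names: the dataset page lists the fields,
# but not the exact JSON keys used in the files.
REQUIRED_KEYS = (
    "title", "case_number", "court", "judgment_date", "case_cause",
    "parties", "trial_procedure", "legal_basis", "full_text",
)


@dataclass
class JudgmentRecord:
    title: str
    case_number: str
    court: str
    judgment_date: str
    case_cause: str
    parties: list
    trial_procedure: str
    legal_basis: list
    full_text: str


def parse_record(raw: str) -> JudgmentRecord:
    """Parse one JSON document and fail loudly if a field is absent."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return JudgmentRecord(**{key: data[key] for key in REQUIRED_KEYS})


def count_by_case_cause(records) -> Counter:
    """Tally records per case cause, e.g. to check label balance."""
    return Counter(record.case_cause for record in records)


# Synthetic placeholder record, not taken from the dataset.
SAMPLE = json.dumps({
    "title": "placeholder title",
    "case_number": "PLACEHOLDER-0001",
    "court": "placeholder court",
    "judgment_date": "2000-01-01",
    "case_cause": "交通肇事",
    "parties": ["placeholder defendant"],
    "trial_procedure": "placeholder procedure",
    "legal_basis": ["placeholder statute"],
    "full_text": "placeholder judgment text",
}, ensure_ascii=False)

record = parse_record(SAMPLE)
print(record.case_cause, count_by_case_cause([record]))
```

Validating required fields at load time is one way to catch records that deviate from the expected structure before they reach a database or a training pipeline. The key names should be adjusted once the actual files have been inspected.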
Provider:
上海库帕思科技有限公司
Created:
2026-04-27
Collected summary
Dataset introduction
Background and challenges
Background overview
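The dataset is drawn from China Judgments Online. It centers on criminal cases, covers multiple case causes such as causing a traffic accident and bending the law for personal gain, and provides judicial documents as structured JSON. It addresses the scattered corpora and inconsistent formats that hamper legal AI work, and can be used directly for model training and for developing smart-court systems.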
The summary above was collected and generated by 遇见数据集.