DISC-Law-SFT
收藏魔搭社区2026-05-21 更新2024-05-15 收录
下载链接:
https://modelscope.cn/datasets/AI-ModelScope/DISC-Law-SFT
下载链接
链接失效反馈官方服务:
资源简介:
本仓库拷贝自: https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT
可以通过modelscope SDK下载
```python
from modelscope import MsDataset
train_dataset = MsDataset.load('AI-ModelScope/DISC-Law-SFT',
subset_name='default', split='train').to_hf_dataset()
print(train_dataset)
"""Out[0]
Dataset({
features: ['Unnamed: 0', 'id', 'input', 'output'],
num_rows: 166758
})
"""
```
# DISC-Law-SFT Dataset
Legal Intelligent systems in Chinese require a combination of various abilities, including legal text understanding and generation. To achieve this, we have constructed a high-quality supervised fine-tuning dataset called DISC-Law-SFT, which covers different legal scenarios such as legal information extraction, legal judgment prediction, legal document summarization, and legal question answering. DISC-Law-SFT comprises two subsets, DISC-Law-SFT-Pair and DISC-Law-SFT-Triplet. The former aims to introduce legal reasoning abilities to the LLM, while the latter helps enhance the model's capability to utilize external legal knowledge. For more detailed information, please refer to our [technical report](https://arxiv.org/abs/2309.11325). The distribution of the dataset is:
<img src="" alt="" width=""/>
<table>
<tr>
<th>Dataset</th>
<th>Task/Source</th>
<th>Size</th>
<th>Scenario</th>
</tr>
<tr>
<td rowspan="10">DISC-Law-SFT-Pair</td>
<td>Legal information extraction</td>
<td>32K</td>
<td rowspan="7">Legal professional assistant</td>
</tr>
<tr>
<td>Legal event detection</td>
<td>27K</td>
</tr>
<tr>
<td>Legal case classification</td>
<td>20K</td>
</tr>
<tr>
<td>Legal judgement prediction</td>
<td>11K</td>
</tr>
<tr>
<td>Legal case matching</td>
<td>8K</td>
</tr>
<tr>
<td>Legal text summarization</td>
<td>9K</td>
</tr>
<tr>
<td>Judicial public opinion summarization</td>
<td>6K</td>
</tr>
<tr>
<td>Legal question answering</td>
<td>93K</td>
<td>Legal consultation services</td>
</tr>
<tr>
<td>Legal reading comprehension</td>
<td>38K</td>
<td rowspan="2">Judicial examination assistant</td>
</tr>
<tr>
<td>Judicial examination</td>
<td>12K</td>
</tr>
<tr>
<td rowspan="2">DISC-Law-SFT-Triple</td>
<td>Legal judgement prediction</td>
<td>16K</td>
<td>Legal professional assistant</td>
</tr>
<tr>
<td>Legal question answering</td>
<td>23K</td>
<td>Legal consultation services</td>
</tr>
<tr>
<td rowspan="2">General</td>
<td>Alpaca-GPT4</td>
<td>48K</td>
<td rowspan="2">General scenarios</td>
</tr>
<tr>
<td>Firefly</td>
<td>60K</td>
</tr>
<tr>
<td>Total</td>
<td colspan="3">403K</td>
</tr>
</table>
We currently open-source most of the DISC-Law-SFT Dataset.
More detail and news check our [homepage](https://github.com/FudanDISC/DISC-LawLLM) !
该数据集为DISC-Law-SFT数据集,其托管页面位于Hugging Face数据集平台(Hugging Face Hub),由用户ShengbinYue上传,具体访问链接为:https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT
提供机构:
maas
创建时间:
2024-02-18
搜集汇总
数据集介绍

背景与挑战
背景概述
DISC-Law-SFT是一个高质量的中文法律监督微调数据集,专为法律智能系统设计,覆盖法律信息提取、判决预测、文档摘要和问答等多种场景。数据集包含DISC-Law-SFT-Pair和DISC-Law-SFT-Triplet两个子集,分别用于增强模型的法律推理能力和外部知识利用能力,总数据量约403K条,实际开放166,758行数据,适用于法律专业助理、咨询服务和司法考试助手等应用。
以上内容由遇见数据集搜集并总结生成



