pkheria7/indian-legal-opposing-counsel-dataset
Source: Hugging Face. Updated 2026-04-26; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/pkheria7/indian-legal-opposing-counsel-dataset
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- question-answering
- text-generation
tags:
- legal
- indian-law
- indian-constitution
- opposing-counsel
- moot-court
- sft
size_categories:
- 10K<n<100K
source_datasets:
- viber1/indian-law-dataset
- nisaar/Lawyer_GPT_India
- RMani1/indian-legal-dataset-indian-law
---
# ⚖️ Indian Legal Opposing Counsel Dataset
A combined, preprocessed dataset of **26,326 examples** for training an AI model that acts as opposing counsel in Indian legal matters. The data is ready to use in ChatML format for supervised fine-tuning (SFT).
## 📊 Dataset Stats
| Split | Rows | Size |
|-------|------|------|
| Train | 25,009 | 65 MB |
| Test | 1,317 | 3.5 MB |
| **Total** | **26,326** | **69 MB** |
## 📦 Sources
| Source Dataset | Rows | Content |
|---------------|------|---------|
| [viber1/indian-law-dataset](https://huggingface.co/datasets/viber1/indian-law-dataset) | 24,607 | Writs, PIL, civil procedure, constitutional law, IPC |
| [nisaar/Lawyer_GPT_India](https://huggingface.co/datasets/nisaar/Lawyer_GPT_India) | 150 | Landmark cases, IPC, contract law, constitutional principles |
| [RMani1/indian-legal-dataset-indian-law](https://huggingface.co/datasets/RMani1/indian-legal-dataset-indian-law) | 1,569 | Indian statutes, acts, legal provisions |
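Each row records its origin in a `source` field (see the format example in this card), so the per-source counts above can be checked after download. The following is a minimal sketch, not the author's preprocessing code: `count_by_source` and `sample_rows` are hypothetical names, and the stand-in rows replace the real data so the snippet runs offline.

```python
from collections import Counter


def count_by_source(rows):
    """Count rows per value of the `source` field."""
    return Counter(row["source"] for row in rows)


# Stand-in rows (assumption); real rows also carry a `messages` list.
sample_rows = [
    {"source": "viber1/indian-law-dataset"},
    {"source": "viber1/indian-law-dataset"},
    {"source": "nisaar/Lawyer_GPT_India"},
]
counts = count_by_source(sample_rows)
print(counts.most_common())

# Row counts copied from the tables in this card.
documented = {
    "viber1/indian-law-dataset": 24_607,
    "nisaar/Lawyer_GPT_India": 150,
    "RMani1/indian-legal-dataset-indian-law": 1_569,
}
total = sum(documented.values())
print(total)  # 26326, which also equals 25,009 train + 1,317 test
```

With the real dataset, passing `ds["train"]` and `ds["test"]` rows to `count_by_source` and adding the two counters should reproduce the table, assuming the `source` values match the dataset IDs listed above.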
## 🗂️ Format
Each row has a `messages` column in **ChatML conversational format** (directly compatible with TRL SFTTrainer), plus a `source` column naming the dataset the row came from:
```json
{
  "messages": [
    {"role": "system", "content": "You are an experienced opposing counsel specializing in the Indian Constitution..."},
    {"role": "user", "content": "What is the difference between a petition and a plaint in Indian law?"},
    {"role": "assistant", "content": "A petition is a formal request submitted to a court..."}
  ],
  "source": "viber1/indian-law-dataset"
}
```
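The dataset stores structured messages rather than rendered text; the model's chat template turns them into a single string at training time. As a rough illustration of what that rendering looks like, here is a minimal sketch that assumes the standard ChatML markers `<|im_start|>` and `<|im_end|>`. The helper `to_chatml` and the placeholder contents are hypothetical, and in practice the tokenizer's own chat template should do this step.

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as ChatML text."""
    parts = []
    for msg in messages:
        # One ChatML turn: start marker, role, newline, content, end marker.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)


# Placeholder row with the same shape as the example above (contents are stand-ins).
row = {
    "messages": [
        {"role": "system", "content": "SYSTEM PROMPT"},
        {"role": "user", "content": "QUESTION"},
        {"role": "assistant", "content": "ANSWER"},
    ],
    "source": "viber1/indian-law-dataset",
}

text = to_chatml(row["messages"])
print(text)
```

Keeping the rows as message lists means the same data can be used with models whose templates differ from ChatML.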
## ⬇️ Download
### Option 1: Python (recommended)
```python
from datasets import load_dataset

# Downloads the dataset from the Hub (cached after the first call).
ds = load_dataset("pkheria7/indian-legal-opposing-counsel-dataset")
print(ds)
# Abridged output:
# DatasetDict({
#     train: Dataset(25009 rows),
#     test: Dataset(1317 rows)
# })
```
### Option 2: Direct JSONL downloads
- 📥 [train.jsonl](https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/train.jsonl) (65 MB — 25,009 rows)
- 📥 [eval.jsonl](https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/eval.jsonl) (3.5 MB — 1,317 rows)
- 📥 [all.jsonl](https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/all.jsonl) (69 MB — all 26,326 rows)
- 📥 [raw_qa_pairs.jsonl](https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/raw_qa_pairs.jsonl) (15 MB — just user/assistant, no system prompt)
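A downloaded `.jsonl` file holds one JSON object per line, so it can be read without the `datasets` library. The sketch below assumes that layout; `iter_jsonl` is a hypothetical helper, and a small temporary file stands in for the real download so the snippet runs offline.

```python
import json
import os
import tempfile


def iter_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


# Stand-in file (assumption); point `path` at a downloaded train.jsonl in real use.
sample = {
    "messages": [
        {"role": "user", "content": "Q"},
        {"role": "assistant", "content": "A"},
    ],
    "source": "viber1/indian-law-dataset",
}
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(json.dumps(sample) + "\n\n")
    path = tmp.name

rows = list(iter_jsonl(path))
os.remove(path)  # clean up the stand-in file
print(len(rows), rows[0]["source"])
```

Reading lazily with a generator keeps memory use flat, which may matter for the larger files listed above.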
### Option 3: wget / curl
```bash
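# Full dataset (all splits combined)
wget https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/all.jsonl
# Or just the training split
wget https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/train.jsonl
# curl equivalent (-L follows redirects, -O keeps the remote filename)
curl -L -O https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/train.jsonl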
```
### Option 4: Git clone
```bash
git lfs install
git clone https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset
```
## 🏋️ Training
Use with the model repo: [pkheria7/indian-legal-opposing-counsel](https://huggingface.co/pkheria7/indian-legal-opposing-counsel)
```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

ds = load_dataset("pkheria7/indian-legal-opposing-counsel-dataset")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    args=SFTConfig(
        max_length=2048,
        assistant_only_loss=True,
        push_to_hub=True,
        hub_model_id="your-username/your-model-name",
    ),
)
trainer.train()
```
## 📄 License
Apache 2.0
Provider:
pkheria7



