
pkheria7/indian-legal-opposing-counsel-dataset

Source: Hugging Face. Updated 2026-04-26; indexed 2026-05-03.
Download link: https://hf-mirror.com/datasets/pkheria7/indian-legal-opposing-counsel-dataset
Description:
---
language:
- en
license: apache-2.0
task_categories:
- question-answering
- text-generation
tags:
- legal
- indian-law
- indian-constitution
- opposing-counsel
- moot-court
- sft
size_categories:
- 10K<n<100K
source_datasets:
- viber1/indian-law-dataset
- nisaar/Lawyer_GPT_India
- RMani1/indian-legal-dataset-indian-law
---

# ⚖️ Indian Legal Opposing Counsel Dataset

A combined, preprocessed dataset of **26,326 examples** for training an Indian legal opposing counsel AI model. Ready-to-use in ChatML format for SFT training.

## 📊 Dataset Stats

| Split | Rows | Size |
|-------|------|------|
| Train | 25,009 | 65 MB |
| Test | 1,317 | 3.5 MB |
| **Total** | **26,326** | **69 MB** |

## 📦 Sources

| Source Dataset | Rows | Content |
|---------------|------|---------|
| [viber1/indian-law-dataset](https://huggingface.co/datasets/viber1/indian-law-dataset) | 24,607 | Writs, PIL, civil procedure, constitutional law, IPC |
| [nisaar/Lawyer_GPT_India](https://huggingface.co/datasets/nisaar/Lawyer_GPT_India) | 150 | Landmark cases, IPC, contract law, constitutional principles |
| [RMani1/indian-legal-dataset-indian-law](https://huggingface.co/datasets/RMani1/indian-legal-dataset-indian-law) | 1,569 | Indian statutes, acts, legal provisions |

## 🗂️ Format

Each row has a `messages` column in **ChatML conversational format** (directly compatible with TRL SFTTrainer):

```json
{
  "messages": [
    {"role": "system", "content": "You are an experienced opposing counsel specializing in the Indian Constitution..."},
    {"role": "user", "content": "What is the difference between a petition and a plaint in Indian law?"},
    {"role": "assistant", "content": "A petition is a formal request submitted to a court..."}
  ],
  "source": "viber1/indian-law-dataset"
}
```

## ⬇️ Download

### Option 1: Python (recommended)

```python
from datasets import load_dataset

ds = load_dataset("pkheria7/indian-legal-opposing-counsel-dataset")
print(ds)
# DatasetDict({
#     train: Dataset(25009 rows),
#     test: Dataset(1317 rows)
# })
```

### Option 2: Direct JSONL downloads

- 📥 [train.jsonl](https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/train.jsonl) (65 MB, 25,009 rows)
- 📥 [eval.jsonl](https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/eval.jsonl) (3.5 MB, 1,317 rows)
- 📥 [all.jsonl](https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/all.jsonl) (69 MB, all 26,326 rows)
- 📥 [raw_qa_pairs.jsonl](https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/raw_qa_pairs.jsonl) (15 MB, just user/assistant, no system prompt)

### Option 3: wget / curl

```bash
# Full dataset (all splits combined)
wget https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/all.jsonl

# Or just the training split
wget https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset/resolve/main/data/train.jsonl
```

### Option 4: Git clone

```bash
git lfs install
git clone https://huggingface.co/datasets/pkheria7/indian-legal-opposing-counsel-dataset
```

## 🏋️ Training

Use with the model repo: [pkheria7/indian-legal-opposing-counsel](https://huggingface.co/pkheria7/indian-legal-opposing-counsel)

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

ds = load_dataset("pkheria7/indian-legal-opposing-counsel-dataset")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    args=SFTConfig(
        max_length=2048,
        assistant_only_loss=True,
        push_to_hub=True,
        hub_model_id="your-username/your-model-name",
    ),
)
trainer.train()
```

## 📄 License

Apache 2.0
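
## 🔎 Example: reading a JSONL row

The row layout shown in the Format section can be read with the standard library alone once a JSONL file is downloaded. The following is a minimal sketch, not the author's preprocessing code: it assumes every row is single-turn (one `system`, one `user`, one `assistant` message), the helper names `iter_rows` and `to_qa_pair` are hypothetical, and the sample row is abbreviated from the Format example rather than copied from the real files. It illustrates one way a system-prompt-free pair could be derived; how `raw_qa_pairs.jsonl` was actually built is not stated here.

```python
import json
from io import StringIO

# One sample row in the shape shown in the Format section (contents shortened).
SAMPLE_JSONL = json.dumps({
    "messages": [
        {"role": "system", "content": "You are an experienced opposing counsel."},
        {"role": "user", "content": "What is the difference between a petition and a plaint in Indian law?"},
        {"role": "assistant", "content": "A petition is a formal request submitted to a court."},
    ],
    "source": "viber1/indian-law-dataset",
}) + "\n"


def iter_rows(fp):
    """Yield one parsed dict per non-empty JSONL line (hypothetical helper)."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_qa_pair(row):
    """Drop the system prompt and keep the user/assistant texts.

    Assumes a single-turn row; with several user turns only the last would survive.
    """
    by_role = {m["role"]: m["content"] for m in row["messages"]}
    return {
        "question": by_role["user"],
        "answer": by_role["assistant"],
        "source": row.get("source"),
    }


# Swap StringIO(SAMPLE_JSONL) for open("train.jsonl", encoding="utf-8") on a real download.
pairs = [to_qa_pair(row) for row in iter_rows(StringIO(SAMPLE_JSONL))]
print(pairs[0]["question"])
```

Keying messages by role keeps the sketch short, but it silently collapses multi-turn conversations, so a check on `len(row["messages"])` would be a sensible addition before using it on the full dataset.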
Provider: pkheria7