hugoramallo/legal-ai-act-spanish-sft-7k
收藏Hugging Face2026-04-10 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/hugoramallo/legal-ai-act-spanish-sft-7k
下载链接
链接失效反馈官方服务:
资源简介:
---
---
license: apache-2.0
language:
- es
task_categories:
- question-answering
- text-generation
tags:
- legal
- ai-act
- eu-ai-act
- spanish
- synthetic
- sft
- regulatory-ai
- llm-finetuning
---
⚠️ Legal and Liability Disclaimer
This dataset is provided for research and educational purposes only.
It does not constitute legal advice, nor does it represent an official or authoritative interpretation of Regulation (EU) 2024/1689 (EU AI Act).
The content is synthetically generated and may contain errors, omissions, or hallucinations.
Under no circumstances should this dataset be used as a basis for legal, compliance, or regulatory decision-making.
The authors disclaim any liability for damages arising from the use of this dataset.
This dataset is not intended for use in high-risk AI systems as defined under the EU AI Act.
No personal data is included in this dataset.
---
# ⚖️ Spanish Instruction Dataset based on the EU AI Act (7.3k)
This dataset contains **7,300 synthetic instruction-response pairs** based on **Regulation (EU) 2024/1689 (EU AI Act)** in Spanish.
## 📝 Dataset Description
The data was generated using large language models (Synthetic Data Generation) to transform the legal text into an instruction-following format suitable for **Supervised Fine-Tuning (SFT)**.
## 🔍 Data Generation Methodology
The dataset was generated using large language models prompted with structured instructions derived from publicly available legal texts.
## 📌 Key Features
- Format: JSONL with `conversations` format (`role`: user/assistant, `content`)
- Language: Spanish (ES)
- Size: 7,300 examples
- Topics:
- High-risk AI systems
- Prohibited practices
- Governance
- Transparency obligations
## ⚠️ Important Disclaimer
This is a **synthetically generated dataset** and has not been fully audited by legal experts.
- Some responses may contain inaccuracies or hallucinations
- Intended for research and experimental purposes only
- Recommended to combine with **RAG (Retrieval-Augmented Generation)** for factual use cases
## 🛠️ Training Reference
Used to fine-tune a **Gemma 4 E4B** model on an **NVIDIA RTX 5080**, achieving stable convergence (Final Train Loss: 1.08)
## 📜 License
Distributed under **Apache License 2.0**
提供机构:
hugoramallo



