hugoramallo/legal-ai-act-spanish-sft-7k

Name: hugoramallo/legal-ai-act-spanish-sft-7k
Creator: hugoramallo
Published: 2026-04-10 04:59:05
License: 暂无描述

Hugging Face2026-04-10 更新2026-04-12 收录

下载链接：

https://hf-mirror.com/datasets/hugoramallo/legal-ai-act-spanish-sft-7k

下载链接

链接失效反馈

官方服务：

资源简介：

--- --- license: apache-2.0 language: - es task_categories: - question-answering - text-generation tags: - legal - ai-act - eu-ai-act - spanish - synthetic - sft - regulatory-ai - llm-finetuning --- ⚠️ Legal and Liability Disclaimer This dataset is provided for research and educational purposes only. It does not constitute legal advice, nor does it represent an official or authoritative interpretation of Regulation (EU) 2024/1689 (EU AI Act). The content is synthetically generated and may contain errors, omissions, or hallucinations. Under no circumstances should this dataset be used as a basis for legal, compliance, or regulatory decision-making. The authors disclaim any liability for damages arising from the use of this dataset. This dataset is not intended for use in high-risk AI systems as defined under the EU AI Act. No personal data is included in this dataset. --- # ⚖️ Spanish Instruction Dataset based on the EU AI Act (7.3k) This dataset contains **7,300 synthetic instruction-response pairs** based on **Regulation (EU) 2024/1689 (EU AI Act)** in Spanish. ## 📝 Dataset Description The data was generated using large language models (Synthetic Data Generation) to transform the legal text into an instruction-following format suitable for **Supervised Fine-Tuning (SFT)**. ## 🔍 Data Generation Methodology The dataset was generated using large language models prompted with structured instructions derived from publicly available legal texts. ## 📌 Key Features - Format: JSONL with `conversations` format (`role`: user/assistant, `content`) - Language: Spanish (ES) - Size: 7,300 examples - Topics: - High-risk AI systems - Prohibited practices - Governance - Transparency obligations ## ⚠️ Important Disclaimer This is a **synthetically generated dataset** and has not been fully audited by legal experts. - Some responses may contain inaccuracies or hallucinations - Intended for research and experimental purposes only - Recommended to combine with **RAG (Retrieval-Augmented Generation)** for factual use cases ## 🛠️ Training Reference Used to fine-tune a **Gemma 4 E4B** model on an **NVIDIA RTX 5080**, achieving stable convergence (Final Train Loss: 1.08) ## 📜 License Distributed under **Apache License 2.0**

提供机构：

hugoramallo

5,000+

优质数据集

54 个

任务类型

进入经典数据集