code-energie
ModelScope community · Updated 2025-11-27 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/louisbrulenaudet/code-energie
Resource description:
# Code de l'énergie, non-instruct (2025-09-20)
The objective of this project is to provide researchers, professionals and law students with simplified, up-to-date access to all French legal texts, enriched with a wealth of data to facilitate their integration into Community and European projects.
The data is normally refreshed daily across all legal codes. The project aims to simplify the production of training sets and labeling pipelines for developing free, open-source language models based on open data accessible to all.
## Concurrent reading of the LegalKit
[<img src="https://raw.githubusercontent.com/louisbrulenaudet/ragoon/main/assets/badge.svg" alt="Built with RAGoon" width="200" height="32"/>](https://github.com/louisbrulenaudet/ragoon)
To use all the legal data published on LegalKit, you can use RAGoon:
```bash
pip3 install ragoon
```
Then, you can load multiple datasets using this code snippet:
```python
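# -*- coding: utf-8 -*-
import datasets
from ragoon import load_datasets

# Dataset repository identifiers to load
req = [
    "louisbrulenaudet/code-artisanat",
    "louisbrulenaudet/code-action-sociale-familles",
    # ...
]

# Load every requested dataset (non-streaming)
datasets_list = load_datasets(
    req=req,
    streaming=False
)

# Merge the loaded datasets into a single dataset
dataset = datasets.concatenate_datasets(
    datasets_list
)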
```
### Data Structure for Article Information
This section provides a detailed overview of the elements contained within the `item` dictionary. Each key represents a specific attribute of the legal article, with its associated value providing detailed information.
1. **Basic Information**
- `ref` (string): **Reference** - A reference to the article, combining the main title of the code (`title_main`) and the article number (e.g., "Code Général des Impôts, art. 123").
- `texte` (string): **Text Content** - The textual content of the article.
- `dateDebut` (string): **Start Date** - The date when the article came into effect.
- `dateFin` (string): **End Date** - The date when the article was terminated or superseded.
- `num` (string): **Article Number** - The number assigned to the article.
- `id` (string): **Article ID** - Unique identifier for the article.
- `cid` (string): **Chronical ID** - Chronical identifier for the article.
- `type` (string): **Type** - The type or classification of the document (e.g., "AUTONOME").
- `etat` (string): **Legal Status** - The current legal status of the article (e.g., "MODIFIE_MORT_NE").
2. **Content and Notes**
- `nota` (string): **Notes** - Additional notes or remarks associated with the article.
- `version_article` (string): **Article Version** - The version number of the article.
- `ordre` (integer): **Order Number** - A numerical value used to sort articles within their parent section.
3. **Additional Metadata**
- `conditionDiffere` (string): **Deferred Condition** - Specific conditions related to collective agreements.
- `infosComplementaires` (string): **Additional Information** - Extra information pertinent to the article.
- `surtitre` (string): **Subtitle** - A subtitle or additional title information related to collective agreements.
- `nature` (string): **Nature** - The nature or category of the document (e.g., "Article").
- `texteHtml` (string): **HTML Content** - The article's content in HTML format.
4. **Versioning and Extensions**
- `dateFinExtension` (string): **End Date of Extension** - The end date if the article has an extension.
- `versionPrecedente` (string): **Previous Version** - Identifier for the previous version of the article.
- `refInjection` (string): **Injection Reference** - Technical reference to identify the date of injection.
- `idTexte` (string): **Text ID** - Identifier for the legal text to which the article belongs.
- `idTechInjection` (string): **Technical Injection ID** - Technical identifier for the injected element.
5. **Origin and Relationships**
- `origine` (string): **Origin** - The origin of the document (e.g., "LEGI").
- `dateDebutExtension` (string): **Start Date of Extension** - The start date if the article has an extension.
- `idEliAlias` (string): **ELI Alias** - Alias for the European Legislation Identifier (ELI).
- `cidTexte` (string): **Text Chronical ID** - Chronical identifier of the text.
6. **Hierarchical Relationships**
- `sectionParentId` (string): **Parent Section ID** - Technical identifier of the parent section.
- `multipleVersions` (boolean): **Multiple Versions** - Indicates if the article has multiple versions.
- `comporteLiensSP` (boolean): **Contains Public Service Links** - Indicates if the article contains links to public services.
- `sectionParentTitre` (string): **Parent Section Title** - Title of the parent section (e.g., "I : Revenu imposable").
- `infosRestructurationBranche` (string): **Branch Restructuring Information** - Information about branch restructuring.
- `idEli` (string): **ELI ID** - European Legislation Identifier (ELI) for the article.
- `sectionParentCid` (string): **Parent Section Chronical ID** - Chronical identifier of the parent section.
7. **Additional Content and History**
- `numeroBo` (string): **Official Bulletin Number** - Number of the official bulletin where the article was published.
- `infosRestructurationBrancheHtml` (string): **Branch Restructuring Information (HTML)** - Branch restructuring information in HTML format.
- `historique` (string): **History** - Historical context or changes specific to collective agreements.
- `infosComplementairesHtml` (string): **Additional Information (HTML)** - Additional information in HTML format.
- `renvoi` (string): **Reference** - References to content within the article (e.g., "(1)").
- `fullSectionsTitre` (string): **Full Section Titles** - Concatenation of all titles in the parent chain.
- `notaHtml` (string): **Notes (HTML)** - Additional notes or remarks in HTML format.
- `inap` (string): **INAP** - A placeholder for INAP-specific information.
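
As an illustration of how these fields might be used together, the following minimal sketch groups article records by `sectionParentId` and sorts each group by `ordre`, following the field descriptions above. The function name `group_by_parent_section`, the sample records, and the handling of missing values are assumptions made for this example; they are not part of the dataset or of RAGoon.

```python
from collections import defaultdict
from typing import Any


def group_by_parent_section(
    items: list[dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Group articles by `sectionParentId` and sort each group by `ordre`."""
    sections: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        # Assumption: a missing or null parent section is bucketed under "".
        sections[item.get("sectionParentId") or ""].append(item)
    for articles in sections.values():
        # Assumption: a missing `ordre` is treated as 0 and sorts first.
        articles.sort(key=lambda article: article.get("ordre") or 0)
    return dict(sections)


# Hypothetical records: identifiers and values are made up for illustration.
sample_items = [
    {"num": "L100-2", "ordre": 2, "sectionParentId": "SECTION_A"},
    {"num": "L100-1", "ordre": 1, "sectionParentId": "SECTION_A"},
    {"num": "L111-1", "ordre": 1, "sectionParentId": "SECTION_B"},
]

grouped = group_by_parent_section(sample_items)
for section_id, articles in grouped.items():
    print(section_id, [article["num"] for article in articles])
```

With real data, each record would carry the full set of keys listed above; the sketch reads only `sectionParentId`, `ordre`, and `num`.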
## Feedback
If you have any feedback, please reach out at [louisbrulenaudet@icloud.com](mailto:louisbrulenaudet@icloud.com).
Provider:
maas
Created:
2025-10-13



