code-artisanat
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link: https://modelscope.cn/datasets/louisbrulenaudet/code-artisanat
Resource description:
# Code de l'artisanat, non-instruct (2025-09-20)
The objective of this project is to provide researchers, professionals and law students with simplified, up-to-date access to all French legal texts, enriched with a wealth of data to facilitate their integration into Community and European projects.
The data is normally refreshed daily across all legal codes. The project aims to simplify the production of training sets and labeling pipelines for developing free, open-source language models based on open data accessible to all.
## Concurrent reading of the LegalKit
[<img src="https://raw.githubusercontent.com/louisbrulenaudet/ragoon/main/assets/badge.svg" alt="Built with RAGoon" width="200" height="32"/>](https://github.com/louisbrulenaudet/ragoon)
To use all the legal data published on LegalKit, you can use RAGoon:
```bash
pip3 install ragoon
```
Then, you can load multiple datasets using this code snippet:
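```python
# -*- coding: utf-8 -*-
import datasets  # Hugging Face library; provides concatenate_datasets
from ragoon import load_datasets

# Repository IDs of the legal codes to load
req = [
    "louisbrulenaudet/code-artisanat",
    "louisbrulenaudet/code-action-sociale-familles",
    # ...
]

# Load the requested datasets (non-streaming mode)
datasets_list = load_datasets(
    req=req,
    streaming=False
)

# Merge the loaded datasets into a single one
dataset = datasets.concatenate_datasets(
    datasets_list
)
```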
### Data Structure for Article Information
This section provides a detailed overview of the elements contained within the `item` dictionary. Each key represents a specific attribute of the legal article, with its associated value providing detailed information.
1. **Basic Information**
- `ref` (string): **Reference** - A reference to the article, combining the `title_main` and the article `number` (e.g., "Code Général des Impôts, art. 123").
- `texte` (string): **Text Content** - The textual content of the article.
- `dateDebut` (string): **Start Date** - The date when the article came into effect.
- `dateFin` (string): **End Date** - The date when the article was terminated or superseded.
- `num` (string): **Article Number** - The number assigned to the article.
- `id` (string): **Article ID** - Unique identifier for the article.
- `cid` (string): **Chronical ID** - Chronical identifier for the article.
- `type` (string): **Type** - The type or classification of the document (e.g., "AUTONOME").
- `etat` (string): **Legal Status** - The current legal status of the article (e.g., "MODIFIE_MORT_NE").
2. **Content and Notes**
- `nota` (string): **Notes** - Additional notes or remarks associated with the article.
- `version_article` (string): **Article Version** - The version number of the article.
- `ordre` (integer): **Order Number** - A numerical value used to sort articles within their parent section.
3. **Additional Metadata**
- `conditionDiffere` (string): **Deferred Condition** - Specific conditions related to collective agreements.
- `infosComplementaires` (string): **Additional Information** - Extra information pertinent to the article.
- `surtitre` (string): **Subtitle** - A subtitle or additional title information related to collective agreements.
- `nature` (string): **Nature** - The nature or category of the document (e.g., "Article").
- `texteHtml` (string): **HTML Content** - The article's content in HTML format.
4. **Versioning and Extensions**
- `dateFinExtension` (string): **End Date of Extension** - The end date if the article has an extension.
- `versionPrecedente` (string): **Previous Version** - Identifier for the previous version of the article.
- `refInjection` (string): **Injection Reference** - Technical reference to identify the date of injection.
- `idTexte` (string): **Text ID** - Identifier for the legal text to which the article belongs.
- `idTechInjection` (string): **Technical Injection ID** - Technical identifier for the injected element.
5. **Origin and Relationships**
- `origine` (string): **Origin** - The origin of the document (e.g., "LEGI").
- `dateDebutExtension` (string): **Start Date of Extension** - The start date if the article has an extension.
- `idEliAlias` (string): **ELI Alias** - Alias for the European Legislation Identifier (ELI).
- `cidTexte` (string): **Text Chronical ID** - Chronical identifier of the text.
6. **Hierarchical Relationships**
- `sectionParentId` (string): **Parent Section ID** - Technical identifier of the parent section.
- `multipleVersions` (boolean): **Multiple Versions** - Indicates if the article has multiple versions.
- `comporteLiensSP` (boolean): **Contains Public Service Links** - Indicates if the article contains links to public services.
- `sectionParentTitre` (string): **Parent Section Title** - Title of the parent section (e.g., "I : Revenu imposable").
- `infosRestructurationBranche` (string): **Branch Restructuring Information** - Information about branch restructuring.
- `idEli` (string): **ELI ID** - European Legislation Identifier (ELI) for the article.
- `sectionParentCid` (string): **Parent Section Chronical ID** - Chronical identifier of the parent section.
7. **Additional Content and History**
- `numeroBo` (string): **Official Bulletin Number** - Number of the official bulletin where the article was published.
- `infosRestructurationBrancheHtml` (string): **Branch Restructuring Information (HTML)** - Branch restructuring information in HTML format.
- `historique` (string): **History** - Historical context or changes specific to collective agreements.
- `infosComplementairesHtml` (string): **Additional Information (HTML)** - Additional information in HTML format.
- `renvoi` (string): **Reference** - References to content within the article (e.g., "(1)").
- `fullSectionsTitre` (string): **Full Section Titles** - Concatenation of all titles in the parent chain.
- `notaHtml` (string): **Notes (HTML)** - Additional notes or remarks in HTML format.
- `inap` (string): **INAP** - A placeholder for INAP-specific information.
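As a rough illustration of how a few of these fields might be used together, the sketch below selects articles by `etat` and by a reference date. The sample records, the `VIGUEUR` and `ABROGE` status values, the ISO `YYYY-MM-DD` date format, the far-future `dateFin` convention, and the helper names `in_force_on` and `select` are all assumptions made for illustration; check them against actual rows before relying on them.

```python
from datetime import date

# Hypothetical sample records using a few of the fields described above.
# Values are invented for illustration and do not come from the dataset.
sample_items = [
    {"ref": "Code de l'artisanat, art. 1", "num": "1", "etat": "VIGUEUR",
     "dateDebut": "2023-07-01", "dateFin": "2999-01-01", "texte": "..."},
    {"ref": "Code de l'artisanat, art. 2", "num": "2", "etat": "ABROGE",
     "dateDebut": "2005-01-01", "dateFin": "2023-07-01", "texte": "..."},
    {"ref": "Code de l'artisanat, art. 3", "num": "3", "etat": "VIGUEUR",
     "dateDebut": "2024-01-01", "dateFin": "", "texte": "..."},
]


def in_force_on(item: dict, day: date) -> bool:
    """Return True if `day` falls within [dateDebut, dateFin) of the item.

    Assumes ISO-formatted date strings; an empty `dateFin` is treated as
    open-ended.
    """
    start = date.fromisoformat(item["dateDebut"])
    end = date.fromisoformat(item["dateFin"]) if item["dateFin"] else date.max
    return start <= day < end


def select(items: list[dict], day: date, etat: str = "VIGUEUR") -> list[str]:
    """Return the `ref` of every item with the given status in force on `day`."""
    return [it["ref"] for it in items
            if it["etat"] == etat and in_force_on(it, day)]


refs = select(sample_items, date(2024, 6, 1))
print(refs)
```

The same predicate could be applied to rows of the loaded dataset, provided the date strings there follow the assumed format.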
## Feedback
If you have any feedback, please reach out at [louisbrulenaudet@icloud.com](mailto:louisbrulenaudet@icloud.com).
Provider: maas
Created: 2025-10-13



