Annotated Dataset for Named Entity Recognition and Relation Extraction in French Building Technical Specifications (BTS)
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://zenodo.org/record/13996908
Description:
This dataset contains 233 raw requirements extracted from French Building Technical Specifications (BTS), referred to as "Cahier des Clauses Techniques Particulières (CCTP)", specifically focused on carpentry ("lot menuiserie") in public French construction projects. The requirements have been collected from 72 CCTP documents, resulting in a total of 19,725 sentences and 651,948 words.
The dataset was annotated with Doccano for Named Entity Recognition (NER) and Relation Extraction (RE). The annotations identify entities and the relationships between them within the domain of building requirements. The dataset is intended for research on Natural Language Processing (NLP) models for requirements engineering in the Architecture, Engineering, and Construction (AEC) sector. Potential applications include requirements extraction, compliance analysis, and knowledge management in construction.
The dataset includes the following components:
CCTP Documents: The original CCTP files from which the raw requirements were extracted.
Annotated Dataset: A JSONLines file containing the annotated dataset, including labels for Named Entity Recognition (NER) and Relation Extraction (RE).
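As a rough illustration of how the JSONLines file might be read, the sketch below parses one JSON object per line and pulls out entity spans and relation triples. The field names (`text`, `entities`, `relations`, `start_offset`, `end_offset`, `from_id`, `to_id`, `type`) are an assumption based on the layout Doccano commonly uses when exporting relation-labeling projects; the description above does not confirm the schema, so check them against the actual file. The sample record, its labels (`ELEMENT`, `MATERIAL`, `has_material`), and the file name `annotations.jsonl` are invented for illustration and are not taken from the dataset.

```python
import json

# Invented sample record in an ASSUMED Doccano-style layout (not from the dataset).
SAMPLE = {
    "text": "La porte doit être en chêne.",
    "entities": [
        {"id": 1, "label": "ELEMENT", "start_offset": 3, "end_offset": 8},
        {"id": 2, "label": "MATERIAL", "start_offset": 22, "end_offset": 27},
    ],
    "relations": [{"id": 1, "from_id": 1, "to_id": 2, "type": "has_material"}],
}


def parse_jsonl(lines):
    """Parse an iterable of strings (e.g. an open file), one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def span_text(text, entity):
    """Return the surface string covered by an entity's character offsets."""
    return text[entity["start_offset"]:entity["end_offset"]]


def entity_spans(record):
    """Return (label, surface text) pairs for every entity in a record."""
    return [(e["label"], span_text(record["text"], e)) for e in record.get("entities", [])]


def relation_triples(record):
    """Return (head text, relation type, tail text) triples, skipping dangling ids."""
    by_id = {e["id"]: e for e in record.get("entities", [])}
    triples = []
    for rel in record.get("relations", []):
        head, tail = by_id.get(rel["from_id"]), by_id.get(rel["to_id"])
        if head is None or tail is None:
            continue  # relation points at an entity id that is not in this record
        triples.append((span_text(record["text"], head), rel["type"], span_text(record["text"], tail)))
    return triples
```

To apply this to a downloaded file, open it with `encoding="utf-8"` (the text is French and contains accented characters) and pass the file object to `parse_jsonl`, for example `with open("annotations.jsonl", encoding="utf-8") as f: records = parse_jsonl(f)`.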
Key features of the dataset:
Language: French
Number of requirements: 233
Number of sentences: 19,725
Number of words: 651,948
Annotation tasks: Named Entity Recognition (NER) and Relation Extraction (RE)
This dataset is relevant for NLP research focused on structured information extraction from domain-specific texts in the construction industry.
Created:
2024-11-28



