Engineered dataset for Cross-Project Requirement Traceability in Natural Language Artefacts
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/rmdxf6g7pg
Description:
This dataset was compiled for cross-project requirement traceability using contrastive learning on natural language artefacts. It contains 15,872 requirements across 37 projects and 7,624 validated cross-project links, organised into multiple Excel sheets that provide different views of the data. The data were replicated from the following sources: (1) open-source repositories (25 projects); (2) an industrial dataset of 12 proprietary projects from 3 industry partners, with 20-35 requirements per project; and (3) benchmark datasets: (a) PURE, 79 smaller research projects with 5-15 requirements each, and (b) PROMISE NFR, 15 projects focused on non-functional requirements (40 requirements each), covering the categories performance, security, usability, reliability, scalability, and maintainability.
The dataset supports the evaluation of traceability links across different projects. It contributes to both software engineering and natural language processing by enabling a more robust approach to cross-project traceability, one that can support knowledge transfer and reuse across software projects.
Features of the dataset:
Multiple data sheets in the Excel file for detailed analysis of requirements
Cross-project relationships with confidence scoring and validation status
Temporal data with creation dates and project timelines
Multi-dimensional classification (functional/non-functional, priority, complexity)
Stakeholder attribution and tagging system
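As a rough illustration of how the confidence scores and validation status listed above might be used, the following minimal sketch filters link records down to validated, high-confidence, cross-project links. The field names (`source_project`, `confidence`, `validated`, and so on), the 0-1 confidence scale, the boolean validation flag, and the sample rows are all assumptions made for illustration; they are not taken from the dataset's actual sheets or column headers.

```python
from dataclasses import dataclass


# Hypothetical record layout: these field names are assumptions,
# not the dataset's actual column headers.
@dataclass(frozen=True)
class TraceLink:
    source_project: str
    source_req_id: str
    target_project: str
    target_req_id: str
    confidence: float  # assumed to lie in [0, 1]
    validated: bool    # assumed encoding of "validation status"


def select_links(links, min_confidence=0.8):
    """Keep validated links that cross project boundaries and meet the threshold."""
    return [
        link for link in links
        if link.validated
        and link.source_project != link.target_project
        and link.confidence >= min_confidence
    ]


# Made-up illustrative rows, not taken from the dataset.
sample = [
    TraceLink("ProjA", "R1", "ProjB", "R7", 0.91, True),
    TraceLink("ProjA", "R2", "ProjA", "R3", 0.95, True),   # same project
    TraceLink("ProjB", "R4", "ProjC", "R9", 0.60, True),   # low confidence
    TraceLink("ProjC", "R5", "ProjA", "R1", 0.88, False),  # not validated
]

selected = select_links(sample)
print([(link.source_req_id, link.target_req_id) for link in selected])
```

In practice the records would be read from the Excel file rather than written inline, and the threshold of 0.8 is an arbitrary choice for this sketch.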
Created:
2025-10-02



