Requirements data sets (user stories)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/records/13880060
Description:
A collection of 22 data sets, each containing 50+ requirements expressed as user stories.
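The text above says only that the requirements are "expressed as user stories". As a minimal sketch, the example below assumes that each g*.txt file holds one story per line and that the stories follow the common "As a <role>, I want <goal>, so that <benefit>" template. Under those assumptions, each story can be split into role, means, and ends in Python. Neither assumption is stated by the curator, and the helpers `parse_story` and `load_stories` are hypothetical illustrations, not part of the dataset or of REVV-Light.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: stories follow the "As a <role>, I want <means>[, so that <ends>]"
# template and each file holds one story per line; neither is stated by the curator.
STORY_RE = re.compile(
    r"^\s*As an? (?P<role>[^,]+?),\s*I want (?P<means>.+?)"
    r"(?:,?\s*so that (?P<ends>.+?))?\.?\s*$",
    re.IGNORECASE,
)


@dataclass
class UserStory:
    role: str             # who wants the feature
    means: str            # what they want
    ends: Optional[str]   # why they want it (the optional "so that" clause)


def parse_story(line: str) -> Optional[UserStory]:
    """Split one line into role/means/ends; return None if it does not fit the template."""
    m = STORY_RE.match(line)
    if m is None:
        return None
    ends = m["ends"].strip() if m["ends"] else None
    return UserStory(m["role"].strip(), m["means"].strip(), ends)


def load_stories(path: str) -> list[UserStory]:
    """Hypothetical loader for a file such as g04-recycling.txt; skips non-matching lines."""
    stories = []
    # ASSUMPTION: UTF-8 text; undecodable bytes are replaced rather than raising.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            story = parse_story(line)
            if story is not None:
                stories.append(story)
    return stories
```

Lines that do not match the assumed template are silently skipped. Comparing `len(load_stories("g04-recycling.txt"))` with the 50+ stories per file mentioned above gives a rough check of how many stories fall outside the template.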
The dataset was created by gathering data from web sources, and we are not aware of license agreements or intellectual property rights on the requirements / user stories. The curator exercised the utmost diligence to minimize the risk of copyright infringement by using non-recent data that is less likely to be critical, by sampling a subset of the original requirements collection, and by qualitatively analyzing the requirements. In case of copyright infringement, please contact the dataset curator (Fabiano Dalpiaz, f.dalpiaz@uu.nl) to discuss removing the affected dataset [see Zenodo's policies].
The data sets were originally used to conduct experiments on ambiguity detection with the REVV-Light tool: https://github.com/RELabUU/revv-light
This collection was originally published on Mendeley Data: https://data.mendeley.com/datasets/7zbk8zsd8y/1
Overview of the datasets [data and links added in December 2024]
The following text provides a description of the datasets, including links to the systems and websites, when available. The datasets are organized by macro-category and then by identifier.
Public administration and transparency
g02-federalspending.txt (2018) originates from early data in the Federal Spending Transparency project, which pertains to the website used to publicly share the spending data of the U.S. government. The website was created as a result of the Digital Accountability and Transparency Act of 2014 (DATA Act). The specific dataset pertains to a system called DAIMS (DATA Act Information Model Schema), also referred to as the Data Broker. The sample that was gathered refers to a sub-project related to allowing the government to act as a data broker, thereby providing data to third parties. The data for the Data Broker project is currently not available online, although the backend seems to be hosted on GitHub under a CC0 1.0 Universal license. Current and recent snapshots of federal spending websites, covering many more projects than the one described in the shared collection, can be found here.
g03-loudoun.txt (2018) is a set of requirements extracted from a document by Loudoun County, Virginia, that describes the to-be user stories and use cases of a system for land management readiness assessment called Loudoun County LandMARC. The source document can be found here; it is part of the Electronic Land Management System and EPlan Review Project - RFP RFQ issued in March 2018. More information about the overall LandMARC system and services can be found here.
g04-recycling.txt (2017) concerns a web application where recycling and waste disposal facilities can be searched and located. The application works through a map that the user can interact with. The dataset was obtained from a GitHub page and forms the basis of a student project on website design; the code is available (no license).
g05-openspending.txt (2018) is about the OpenSpending project (www), a project of the Open Knowledge Foundation that aims to make transparent how local governments spend money. At the time of collection, the data was retrieved from a Trello board that is now unavailable. The sample focuses on publishing, importing, and editing datasets, and on how the data should be presented. OpenSpending is currently managed via a GitHub repository that contains multiple sub-projects with unknown licenses.
g11-nsf.txt (2018) is a collection of user stories for the NSF Site Redesign & Content Discovery project, taken from a publicly accessible GitHub repository (GPL 2.0 license). In particular, the user stories refer to an early version of the NSF's website and can be found as closed issues in that repository.
(Research) data and meta-data management
g08-frictionless.txt (2016) regards the Frictionless Data project, which offers open source tools for building data infrastructures, to be used by researchers, data scientists, and data engineers. The many projects within Frictionless Data are hosted on GitHub (under a mix of the Unlicense and MIT license) and on the web. This specific set of user stories was collected in 2016 by GitHub user @danfowler and is stored in a Trello board.
g14-datahub.txt (2013) concerns the open source project DataHub, which is currently developed via a GitHub repository (the code has Apache License 2.0). DataHub is a data discovery platform which has been developed over multiple years. The specific data set is an initial set of user stories, which we can date back to 2013 thanks to a comment therein.
g16-mis.txt (2015) is a collection of user stories that pertains to a repository for researchers and archivists. The source of the dataset is a public Trello board. Although the user stories do not explicitly link to any project, it can be inferred that they originate from a project related to the Duke University library.
g17-cask.txt (2016) refers to the Cask Data Application Platform (CDAP). CDAP is an open source application platform (GitHub, under Apache License 2.0) that can be used to develop applications within the Apache Hadoop ecosystem, an open-source framework for the distributed processing of large datasets. The user stories are extracted from a document on dataset management requirements for Cask 4.0, which contains the scenarios, the user stories, and a design for implementing them. The raw data is available in the following environment.
g18-neurohub.txt (2012) is concerned with the NeuroHub platform, a neuroscience data management, analysis, and collaboration platform that lets researchers in neuroscience collect, store, and share data with colleagues or with the research community. The user stories were collected at a time when NeuroHub was still a research project sponsored by the UK Joint Information Systems Committee (JISC). For information about the research project from which the requirements were collected, see the following record.
g22-rdadmp.txt (2018) is a collection of user stories from the Research Data Alliance's working group on DMP Common Standards. Their GitHub repository contains user stories that were created by asking the community to suggest functionality that should be part of a website that manages data management plans. Each user story is stored as an issue in the GitHub repository.
g23-archivesspace.txt (2012-2013) refers to ArchivesSpace: an open source web application for managing archives information. The application is designed to support core functions in archives administration such as accessioning; description and arrangement of processed materials including analog, hybrid, and born-digital content; management of authorities and rights; and reference service. The application supports collection management through collection management records, tracking of events, and a growing number of administrative reports. ArchivesSpace is open source and its development is hosted on GitHub (Educational Community License, Version 2.0), with existing issues in a board, but the dataset only includes older user stories (no longer available in the board) from August 28, 2012 (the starting date of the community) to February 28, 2013.
g24-unibath.txt (2013) concerns the development of an institutional data repository for the University of Bath. This need was driven by changes in funder and publisher policy, as well as by responses to the then-recent Research360 data management survey sent to all University of Bath researchers. The repository was meant to provide a long-term archive of the university's research data, with the following benefits: ensure long-term availability of data to researchers; fulfil funder and publisher requirements; enable and track increased impact of research through data re-use and citation by the wider community; encourage new collaborations and deepen existing relationships with industry; enable new types of research, both within the university and across the wider sector. The requirements were identified in 2013, and the original document can be found online.
g25-duraspace.txt (2012) is a collection that originates from the development of the Data Dictionary Supplement component of the Data asset management system (DAMS) by DuraSpace (original document). DSpace, the repository platform associated with DuraSpace, is an active open source project (BSD 3.0) that can be found online. DSpace stores, preserves, and disseminates digital cultural heritage content by supporting: ingestion of digital objects and their metadata; management and curation of digital objects; easy access to the digital objects, both by listing and by searching; and long-term preservation of the digital objects.
g26-racdam.txt (2015) refers to a collection of requirements that were found in an online Trello board for a so-called RAC DAM system. Very limited information exists about the system, but the stories are organized into multiple epics: rights management, asset management, use, discovery, user management, reporting, curation, description, and preservation. It is possible to infer that the collection refers to an archiving system to be used by archivists and researchers. The online identity of the issue contributors makes us hypothesize this has to do with the Rockefeller Archive Center in New York City.
g27-culrepo.txt (2014-2015) is an extract of a document written by RepoExec, a group that manages multiple repositories and the overall repository policy of the Cornell University Library (CUL). The document was publicly accessible as a PDF attachment on their Confluence website. The user stories in the document focus on a subset of CUL's institutional repository (IR) systems.
g28-zooniverse.txt (2014) originates from MICO (Media In Context), an EU-funded research project to develop an integrated platform for cross-media analysis, metadata publishing, querying, and recommendation. The set of user stories (US) describes two main showcases within the MICO project. The first is Zooniverse, a citizen science platform (US27-US59). The second is InsideOut10, a start-up and consulting firm from Italy with extensive experience in media delivery platforms and web publishing (US1-US26). The Zooniverse projects rely on volunteers who have access to the platform and contribute by classifying data such as images, audio, and video, performing recognition tasks that cannot easily be performed by a computer. The data was extracted from a MICO project deliverable that is currently available online.
Information systems for specific domains
g12-camperplus.txt (2017) concerns a software application called Camper+ (GitHub, MIT License), which aims to support camp administrators, camp counselors, and parents in the context of camps organized for children. The collection of user stories is retrieved from the project's wiki, which contains the user stories and organizes the roles into personas.
g19-alfred.txt (2015) describes a set of requirements from a European project called ALFRED (Deliverable 2.3), which is about a system that provides support for older people; a "Personal Interactive Assistant for Independent Living and Active Ageing". One of the main system objectives is to support older people in participating actively in society and acting independently. The outputs of this project led to a GitHub repository (unspecified license).
g21-badcamp.txt (~2017) is a set of user stories that originates from a GitHub repository (GPL 2.0 and unspecified licenses) of the website for BADCamp, an annual conference that celebrates Drupal open source websites. The website gives general information about the event and also serves as a supporting tool for all attendees; many features have been added over time to support them. While the original user stories are not available, the website's current backlog is accessible.
Examples: first-version websites
g10-scrumalliance.txt (2004) is a collection taken from a product backlog example that is currently published on the Mountain Goat Software website. As stated on that website, these stories were written to describe the functionality of an early version of the Scrum Alliance website.
g13-planningpoker.txt (2010, estimated via the Internet Archive Wayback Machine) is an example -- like g10-scrumalliance.txt -- of a product backlog that is available on the Mountain Goat Software website. This refers to the first version of the Planning Poker website, which allows estimators in different locations to estimate collaboratively.
Created: 2025-01-13