The FAIR Assessment Conundrum: Reflections on Tools and Metrics - Data Set
收藏NIAID Data Ecosystem2026-05-01 收录
下载链接:
https://zenodo.org/record/10082195
下载链接
链接失效反馈官方服务:
资源简介:
Data sets accompanying the paper "The FAIR Assessment Conundrum: Reflections on Tools and Metrics", an analysis of a comprehensive set of FAIR assessment tools and the metrics used by these tools for the assessment.
The data set "metrics.csv" consists of the metrics collected from several sources linked to the analysed FAIR assessments tools. It is structured into 11 columns: (i) tool_id, (ii) tool_name, (iii) metric_discarded, (iv) metric_fairness_scope_declared, (v) metric_fairness_scope_observed, (vi) metric_id, (vii) metric_text, (viii) metric_technology, (ix) metric_approach, (x) last_accessed_date, and (xi) provenance.
The columns tool_id and tool_name are used for the identifier we assigned to each tool analysed and the full name of the tool respectively.
The metric_discarded column refers to the selection we operated on the collected metrics, since we excluded the metrics created for testing purposes or written in a language different from English. The possible values are boolean. We assigned TRUE if the metric was discarded.
The columns metric_fairness_scope_declared and metric_fairness_scope_observed are used for indicating the declared intent of the metrics, with respect to the FAIR principle assessed, and the one we observed respectively. Possible values are: (a) a letter of the FAIR acronym (for the metrics without a link declared to a specific FAIR principle), (b) one or more identifiers of the FAIR principles (F1, F2…), (c) n/a, if no FAIR references were declared, or (d) none, if no FAIR references were observed.
The metric_id and metric_text columns are used for the identifiers of the metrics and the textual and human-oriented content of the metrics respectively.
The column metric_technology is used for enumerating the technologies (a term used in its widest acceptation) mentioned or used by the metrics for the specific assessment purpose. Such technologies include very diverse typologies ranging from (meta)data formats to standards, semantic technologies, protocols, and services. For tools implementing automated assessments, the technologies listed take into consideration also the available code and documentation, not just the metric text.
The column metric_approach is used for identifying the type of implementation observed in the assessments. The identification of the implementation types followed a bottom-to-top approach applied to the metrics organised by the metric_fairness_scope_declared values. Consequently, while the labels used for creating the implementation type strings are the same, their combination and specialisation varies based on the characteristics of the actual set of metrics analysed. The main labels used are: (a) 3rd party service-based, (b) documentation-centred, (c) format-centred, (d) generic, (e) identifier-centred, (f) policy-centred, (g) protocol-centred, (h) metadata element-centred, (i) metadata schema-centred, (j) metadata value-centred, (k) service-centred, and (l) na.
The columns provenance and last_accessed_date are used for the main source of information about each metric (at least with regard to the text) and the date we last accessed it respectively.
The data set "classified_technologies.csv" consists of the technologies mentioned or used by the metrics for the specific assessment purpose. It is structured into 3 columns: (i) technology, (ii) class, and (iii) discarded.
The column technology is used for the names of the different technologies mentioned or used by the metrics.
The column class is used for specifying the type of technology used. Possible values are: (a) application programming interface, (b) format, (c) identifier, (d) library, (e) licence, (f) protocol, (g) query language, (h) registry, (i) repository, (j) search engine, (k) semantic artefact, and (l) service.
The discarded column refers to the exclusion of the value 'linked data' from the accepted technologies since it is too generic. The possible values are boolean. We assigned TRUE if the technology was discarded.
创建时间:
2024-04-17



