Sharing transdisciplinary semantics in computational science: an overview on Semantic Array Programming
DataCite Commons · Updated 2021-09-28 · Indexed 2024-07-25
Download link:
https://figshare.com/articles/dataset/Sharing_transdisciplinary_semantics_in_computational_science_an_overview_on_Semantic_Array_Programming/3472661/2
Resource description:
From: http://mfkp.org/INRMM/article/13769492

Please cite as part of: de Rigo, D., 2015. <b>Study of a collaborative repository of semantic metadata and models for regional environmental datasets' multivariate transformations</b>. Ph.D. thesis, Politecnico di Milano, Milano, Italy. This is a simplified summary of some core concepts discussed in that work; please refer to it for details.

<b>Logics vs. implementation of computational science algorithms: the case of transdisciplinary research. </b>Computational scientists commonly find that even short algorithms, when no out-of-the-box solution is available, require remarkably longer implementations. Computational science algorithms often deal with large amounts of data with a precise (although sometimes nontrivial) semantic structure. In that case, data may be organised in multiple groups with homogeneous semantics. Examples are matrices, time series, tuples, graphs or more generic multi-dimensional arrays. Geospatial problems often associate geographic information with particular arrays, e.g. regular spatial grids of data represented as georeferenced matrices. Domain-specific frameworks may offer a convenient option for standard problems within a given sectoral domain, while object-oriented approaches may easily allow structured information to be transferred more broadly along with default behaviours and assumptions. However, this communication is more difficult to achieve for non-monolithic models that use several programming languages and tools, with multiple teams involved and possibly no single expert able to cope with the overall integration complexity. The extent of this challenge might easily grow more than linearly with the transdisciplinary knowledge accumulated by experienced and diverse research teams.
This reality is sometimes underestimated even from a research-management perspective, with potentially deplorable consequences where a fragile integration modelling strategy proves unable to scale and fulfil the accelerating needs of a healthy transdisciplinary research endeavour. Ultimately, failing to realise in time the importance of scalable integration in transdisciplinary research, and its impact on the design of specific sub-models, might prevent a slowly and laboriously built research endeavour from flourishing to its full (and <i>demanding</i>) potential.

<b>Exploiting data-transformation models. </b>Formulating models as data-transformation models (D-TM) eases the integration of the various conceptual modelling units, even when they are implemented in different programming languages. This is straightforward to achieve if the D-TM units exclusively exchange data (extended to include parameters) in broadly supported formats. Data can also be exchanged asynchronously between D-TMs that physically run in different computational facilities. Therefore, D-TM modelling architectures ease the interaction among multiple research teams. If intermediate data transformations are preserved, research remains easier to reproduce even years later. Digital preservation of intermediate and final data layers might be strengthened by, e.g., publishing them as open data in established open repositories.
However, transdisciplinary research may often run into a different family of semantic issues.

<b>The evaporation of domain-specific common sense at the interface with other disciplines. </b>Within a particular discipline, research team, or specialised modelling approach, a significant part of the overall semantics of data and data-transformation models may be taken for granted. This means that a core base of knowledge might safely remain unexpressed. It is the “obvious” local context.
Unfortunately, this is no longer the case whenever that particular domain of knowledge has to interact with other domains, perhaps quite far from it. Namely, when a set of practices and knowledge shared by a certain research community has to be relativised, from being the universal set of the research activities to becoming a simple specialised module within a transdisciplinary context, the local-context common sense evaporates. As a consequence, it has to be communicated between local contexts, different areas of expertise and disciplines in a simple way, but also a compact and unambiguous one.

<b>Being understood by busy others: free scientific software, documentation and verbosity</b>. Publishing D-TMs as well-documented free software might be the first logical step for the "actual scholarship" (full software environment, code and data, following Jon Claerbout, J.B. Buckheit and D.L. Donoho) to be exposed as <b>open science</b>. Alas, public availability of the D-TM source code alone, without high-quality documentation <i>within</i> and outside the code, may not be enough for the logics defining the D-TM to be cogently communicated among different disciplines. Source code may often be extremely verbose compared with its logics. Verbosity might be a cognitive noise in itself, reducing the chances for otherwise promising code to be fully understood by anyone other than its authors…

<b>Terse implementation of models: the potential of array programming.</b> Array Programming (AP) might help to address part of this challenge. AP originated to reduce the gap between mathematical formulation and code implementation by introducing very concise operators and coding patterns to deal with variables potentially composed of billions of elements and considered as atomic (with correspondingly terse manipulation).
AP data structures can offer a support that is <b>1) ▹</b> already widespread (given the extensive use of AP languages), and <b>2) ▹</b> less arbitrary/restrictive than a particular choice (within a virtually infinite set of possibilities) of objects to share among multiple and highly heterogeneous modules.
However, this support is still poorly exploited. The AP data structures are very general: multi-dimensional arrays where the value of some elements may be infinite or not-a-number (IEEE 754 standard) or even complex-valued.

<b>Adding semantics and modularisation to array programming</b>. The basic idea of Semantic Array Programming (SemAP) follows from this potentially overwhelming generality: limiting it with array-based semantic constraints. The second key idea of SemAP is to encourage modularisation of data-transformations so as to easily propagate the semantic support to lower-level sub-D-TMs, which might also help to better explore software uncertainty. Being based on the mathematics of arrays, the semantics of the SemAP constraints is inherently portable. A rich set of semantic constraints is natively implemented in GNU Octave/MATLAB by the Mastrave modelling library and is easily accessed from other programming languages via multi-language array programming bridges (e.g. in Python and GNU R). Therefore, the SemAP <i>set of array-based semantic constraints is language neutral</i>. The sole assumption regarding the data types to check is that an array programming language should offer, as basic array types, at least one of the following very general categories:
‣ multi-dimensional arrays of numbers (booleans, integers, reals or complex-valued) or characters;
‣ sets of uneven arrays, indexed either with positive integers (e.g. cell-arrays in GNU Octave/MATLAB) or with strings (e.g. structures in GNU Octave/MATLAB).
‣ Functional programming types are supported by extending basic data to include anonymous functions and function handles.
‣ Numbers are extended with infinity and with the concept of "not a number", denoting undefined (e.g. 0/0) or missing data. This follows the IEEE 754 standard for floating-point arithmetic.

Semantics may clearly be characterised along multiple dimensions. The aforementioned set of array-based semantic constraints constitutes a neutral, portable dimension of semantics grounded in the mathematics of arrays. Independently of the actual implementation of algorithms, these semantic constraints are quite compact and in most cases easy to understand even without accessing their formal definition. Among the many other dimensions of semantics, geospatial semantics has also been closely integrated with SemAP, as illustrated by the application of SemAP to geospatial problems (Geospatial Semantic Array Programming, GeoSemAP).
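
As an illustration of the array-based semantic constraints described above, the following is a minimal sketch in plain Python of a data-transformation module that declares the semantics of its input before transforming it. It is not the Mastrave API: the constraint names ("matrix", "finite", "nonnegative"), the helper `require` and the module `normalise_rows` are all hypothetical names chosen for this example, and nested lists stand in for a proper array type.

```python
import math

# Hypothetical, illustrative constraint set: these names are assumptions made
# for this sketch and are NOT taken from the Mastrave library.

def _elements(a):
    """Flatten a list of rows into a single list of elements."""
    return [x for row in a for x in row]

def _is_number(x):
    """True for ints and floats (booleans are deliberately excluded here)."""
    return isinstance(x, (int, float)) and not isinstance(x, bool)

CONSTRAINTS = {
    # A non-empty rectangular list of rows containing only numbers.
    "matrix": lambda a: (
        isinstance(a, list) and len(a) > 0
        and all(isinstance(r, list) and len(r) > 0 for r in a)
        and all(len(r) == len(a[0]) for r in a)
        and all(_is_number(x) for x in _elements(a))
    ),
    # No IEEE 754 special values (infinity or not-a-number).
    "finite": lambda a: all(math.isfinite(x) for x in _elements(a)),
    # Every element is greater than or equal to zero.
    "nonnegative": lambda a: all(x >= 0 for x in _elements(a)),
}

def require(a, names, what):
    """Check the named constraints in order; fail at the first one violated."""
    for name in names:
        if not CONSTRAINTS[name](a):
            raise ValueError(f"{what} must satisfy the constraint '{name}'")

def normalise_rows(counts):
    """Toy data-transformation module: rescale each row so that it sums to 1."""
    # The semantics expected of the input is stated compactly, up front.
    require(counts, ["matrix", "finite", "nonnegative"], "counts")
    out = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("each row of counts must have a positive sum")
        out.append([x / total for x in row])
    return out
```

Under these assumptions, `normalise_rows([[1, 3], [2, 2]])` returns the row-normalised matrix, whereas an input containing a not-a-number element or rows of uneven length is rejected with an error naming the violated constraint. The constraints are read before, and independently of, the body of the transformation, which is the property the text attributes to SemAP constraints.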
Provider:
figshare
Created:
2016-07-08



