Understanding the Software Needs of High End Computer Users with XALT

Source: DataCite Commons | Updated: 2023-01-10 | Indexed: 2024-07-13
Download link:
http://hdl.handle.net/2152/30535
Description:
The dataset is produced by the XALT software installed on Stampede, a High Performance Computing (HPC) resource at the Texas Advanced Computing Center (TACC). XALT tracks and collects job-level information about software libraries and executables on open-science HPC systems, also known as supercomputers. Open-science HPC resources are shared via powerful networks by researchers across the country and are maintained by a handful of supercomputer centers. To use the computational resources, researchers submit jobs, which consist of computational workflows designed to conduct analysis and calculations. The XALT data is used to determine which software libraries are used most often on a given system, a fundamental administrative function for shared HPC resources. Because nodes and memory are finite resources, software libraries must be selected for continued use and maintenance to ensure optimal performance for users. In addition to running on Stampede, XALT has been tested or installed at the National Institute for Computational Sciences, Oak Ridge Leadership Computing Facility, the National Center for Supercomputing Applications, Baden-Württemberg, the National Energy Research Scientific Computing Center, the Swiss National Supercomputing Centre, the National Oceanic and Atmospheric Administration, and KAUST Supercomputing Centre. Other current uses of the XALT data include debugging software libraries, indirect measurement of performance, and cost analysis based on the time and number of nodes in use. Sociologists, digital anthropologists, and scientific software producers have identified possible additional uses for this data, such as inferring collaborations, types of relationships, and practices of domain scientists working on computational projects. XALT may also be used to gather provenance metadata during computational jobs.
Provenance information for the XALT dataset comprises the software, associated libraries, and usage metrics that show the initial stage of computational analysis for scientific work. The XALT dataset, in JSON format, contains information on the number of nodes and the libraries and executables used by each user running a given computational job on Stampede. It also includes the science domain that users identify with for their projects. Personally identifiable information is sanitized before publication, but all jobs can still be related to a particular user through an anonymous user ID. This dataset will continue to grow past the initial date of publication. TACC started releasing data in September 2015. The daily data collections are released quarterly, in October, January, April, and July, as sets of three files, one for each month of the previous quarter. Additional documentation is available to contextualize and understand the dataset. Documents include: the data dictionary describing each data element, a copy of the CC-BY license for the dataset, metadata in DataCite XML format, and a listing of software libraries identified from the data. The data may be downloaded as quarterly zipped files with a metadata file from the following URL:
Provider:
Texas Advanced Computing Center
Created:
2015-09-04
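
Example (illustrative only): the description says the data is in JSON and records, for each job, the number of nodes, the libraries and executables used, and an anonymous user ID. The sketch below shows one way such records might be tallied to find the most frequently used libraries. The field names (`user_id`, `num_nodes`, `executable`, `libraries`) and the sample values are assumptions made for this example; the actual element names are defined in the dataset's data dictionary and may differ.

```python
import json
from collections import Counter

# Hypothetical job records. Field names and values are invented for
# illustration; consult the dataset's data dictionary for the real schema.
SAMPLE_JOBS = json.loads("""
[
  {"user_id": "u001", "num_nodes": 16, "executable": "app_a",
   "libraries": ["libfftw3", "libmkl"]},
  {"user_id": "u002", "num_nodes": 4, "executable": "app_b",
   "libraries": ["libmkl"]},
  {"user_id": "u001", "num_nodes": 32, "executable": "app_a",
   "libraries": ["libfftw3", "libhdf5"]}
]
""")


def library_usage(jobs):
    """Count how many jobs used each library (at most one count per job)."""
    counts = Counter()
    for job in jobs:
        # set() ensures a library listed twice in one job is counted once.
        counts.update(set(job.get("libraries", [])))
    return counts


def nodes_per_user(jobs):
    """Sum the node counts of all jobs for each anonymous user ID."""
    totals = Counter()
    for job in jobs:
        totals[job["user_id"]] += job.get("num_nodes", 0)
    return totals


if __name__ == "__main__":
    print(library_usage(SAMPLE_JOBS).most_common())
    print(nodes_per_user(SAMPLE_JOBS))
```

With the real files, `SAMPLE_JOBS` would be replaced by records loaded from a monthly JSON file, after adjusting the keys to match the data dictionary.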