
Benchmark datasets for seriation and patch seriation code

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://data.mendeley.com/datasets/b96s5bcfc2
Description:
These are benchmark datasets for testing seriation methods. We used them to test diagonal and patch seriation. The C code used is also included.

SIM dataset: The dataset exemplifies a data structure in which a different set of variables is responsible for each cluster, while the remaining variables of a cluster are random. Seriating this type of data appears to be hard for most methods. SIM is a semi-randomly simulated dataset (created by Gergely Tóth). It contains 50 objects and 20 variables; the objects form 4 clusters plus a group of non-clustered (random) objects. Members of a cluster have similar values for some selected variables, but their other values are random. Some of the selected variables are shared with other clusters. First, we generated random numbers in [0,1) for all data; then, for each group, a given random number, perturbed by white noise, was added to each selected variable of the group.
No. of rows: 50 (A, B, C, D = clusters, R = non-clustered elements)
No. of columns: 20 (the letters A-D indicate the involvement of a variable in a given cluster)

RETSIM dataset: RETSIM is a simulated dataset (created by Gergely Tóth). We defined three functional groups and created 4 compounds as random linear combinations of the three groups. We defined 6 mixtures of the 4 compounds and 6 chromatographic columns, each column with differently randomized partial retention times for the functional groups. The retention time of each compound was calculated as a linear combination of the contributions of the functional groups it contains. Finally, we added uniform broadening to each compound peak, with integrals related to the concentrations. This gave 36 chromatograms: the 6 mixtures on each of the 6 columns.
No. of rows: 36 = 6*6 (A-F denote the chromatographic columns, 1-6 the mixtures)
No. of columns: 100
The dataset can be used in two dimensions (36*100) or in three dimensions (6*6*100).
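The two views of RETSIM mentioned above (36*100 and 6*6*100) can be sketched in a few lines of Python. This is only an illustration: the row order (A1, A2, ..., A6, B1, ..., F6), the helper names `row_labels`, `to_3d` and `to_2d`, and the random placeholder values are assumptions, not part of the published files; check the actual row labels before relying on this layout.

```python
import random

N_COLUMNS = 6    # chromatographic columns, labelled A-F in the description
N_MIXTURES = 6   # mixtures, labelled 1-6
N_POINTS = 100   # values per chromatogram


def row_labels():
    """Row labels 'A1' ... 'F6'. The ORDER is an assumption of this sketch."""
    return [f"{col}{mix}" for col in "ABCDEF" for mix in range(1, N_MIXTURES + 1)]


def to_3d(flat_rows):
    """Regroup 36 rows of 100 values as [column][mixture][point]."""
    if len(flat_rows) != N_COLUMNS * N_MIXTURES:
        raise ValueError("expected 36 rows")
    return [
        flat_rows[c * N_MIXTURES:(c + 1) * N_MIXTURES]
        for c in range(N_COLUMNS)
    ]


def to_2d(cube):
    """Flatten [column][mixture][point] back to 36 rows."""
    return [row for block in cube for row in block]


# Placeholder values standing in for the real chromatograms.
rng = random.Random(0)
flat = [[rng.random() for _ in range(N_POINTS)] for _ in range(36)]
cube = to_3d(flat)
```

In the three-dimensional view, `cube[1][1]` is the chromatogram of mixture 2 on column B under the assumed ordering, i.e. the same row as `flat[7]`.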
REAC dataset: 95 reactions of gasoline combustion used in the thesis work of G. Juhász [1]. The table was created by Gergely Tóth.
No. of rows: 95 (reactions)
No. of columns: 32 (reactants or products)
Each 0/1 entry indicates whether a compound takes part in the reaction (irrespective of the stoichiometry and of its role as reactant or product).

[1] Juhász G. Reduction of a biodiesel combustion reaction mechanism. BSc thesis. Budapest: Eötvös Loránd University, Institute of Chemistry, Department of Physical Chemistry; 2015.

Seriation code in C: see details in the header of the code.
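To illustrate the 0/1 encoding used by REAC, the following Python sketch builds a reaction-by-species participation table from a few toy reactions. The reactions, the dictionary representation with signed stoichiometric coefficients, and the function name `incidence_matrix` are hypothetical and are not taken from the dataset; the sketch only shows how stoichiometry and reactant/product role are discarded.

```python
# Toy reactions (hypothetical, NOT taken from the REAC dataset).
# Each reaction maps species -> stoichiometric coefficient,
# negative for reactants and positive for products.
reactions = [
    {"H": -1, "O2": -1, "OH": 1, "O": 1},
    {"OH": -2, "H2O": 1, "O": 1},
    {"H2": -1, "O": -1, "OH": 1, "H": 1},
]


def incidence_matrix(reactions):
    """Return (species, matrix) where matrix[i][j] is 1 if species j
    takes part in reaction i, regardless of coefficient or role."""
    species = sorted({name for reaction in reactions for name in reaction})
    matrix = [
        [1 if reaction.get(name, 0) != 0 else 0 for name in species]
        for reaction in reactions
    ]
    return species, matrix


species, matrix = incidence_matrix(reactions)
for row in matrix:
    print(row)
```

A table of this shape (rows = reactions, columns = species) is the kind of binary matrix that a seriation method would then reorder.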
Created: 2023-04-13