
Space of Optimal Solutions of the Correlation Clustering Problem for Complete Signed Graphs

Indexed from the NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/6816102
Description. This is the data used in the experiments of the following paper:

N. Arınık, R. Figueiredo, and V. Labatut, “Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs,” Journal of Complex Networks 8(6):cnaa025, 2020. DOI: 10.1093/comnet/cnaa025 ⟨hal-02994011⟩

This dataset contains:
* Plot files used in the article;
* Input signed networks;
* All optimal solutions (i.e. the optimal solution space) of the corresponding networks;
* Evaluation files.

Source code. The source code is accessible on GitHub: https://github.com/CompNet/Sosocc

Citation. If you use the data or source code, please cite the above article.

@Article{Arinik2020,
  author    = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent},
  title     = {Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs},
  journal   = {Journal of Complex Networks},
  year      = {2020},
  volume    = {8},
  number    = {6},
  pages     = {cnaa025},
  doi       = {10.1093/comnet/cnaa025},
}

--------------------------------------------

Details.

# PLOT FILES
* `Figure1.zip`: Figures showing that a small network may have many distinct optimal solutions.
* `Figure2.zip`: Figures showing that distinct optimal solutions of a given network may be very similar or very different as partitions.
* `Figure4: All Results.zip`: For space reasons, Figure 4 in the article contains only a few of the result plots. This zip file contains all plots, organized by the value of `l0`. In each `l0` folder, the results are shown from three perspectives:
--- Detected Imbalance Percentage vs Graph Order (i.e. number of vertices)
--- Prop mispl vs Graph order
--- Graph order vs Prop mispl
* `workflow.pdf`: The workflow of the methodology used in the article.
* `Syrian network With All Solutions.pdf`: Syrian network (on top), with node colors indicating the core parts, and its optimal solutions (on bottom), with node colors indicating the partition.

# NETWORKS
All networks are in `Input Signed Networks.tar.gz`.
Networks are generated through a simple random model (available at https://github.com/CompNet/SignedBenchmark) designed to produce complete (or incomplete) unweighted networks with built-in modular structure.
There are 3 parameters used for the generation:
* number of nodes (`n`)
* initial number of modules (`l0`)
* proportion of misplaced links, i.e. proportion of frustrated links (`qm`)

Inside `Input Signed Networks.tar.gz`:
NETWORKS
|__n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
....|__propMispl=PROP_MISPL
........|__propNeg=PROP_NEG
............|__network=NETWORK_NO

- The first hierarchy => the folders are named as follows: n=NB-NODE_l0=INIT-NB-MODULE_dens=1.0000
The number of nodes, the initial number of modules and the network density are given. The network density is always 1, since we treat only complete signed networks.
- The second hierarchy => the folders are named as follows: propMispl=PROP_MISPL
The proportion of misplaced links is given.
- The third hierarchy => the folders are named as follows: propNeg=PROP_NEG
The proportion of negative links (`qn`) is specified. `qn` changes depending on `n` and `l0`. Since only complete signed networks are studied, this parameter is automatically computed from the other input parameters.
- The fourth hierarchy => the folders are named as follows: network=NETWORK_NO
The network number is given.

In the end, there are three file formats describing the same network content: GraphML (.graphml), Pajek NET (.net) and .G format.

# PARTITIONS
All partition results are in `Partition Results.tar.gz`.
Note that all optimal partitions of a signed network are obtained through an exact partitioning method. The source code is accessible here: https://github.com/arinik9/ExCC

Inside `Partition Results.tar.gz`:
PARTITIONS
|__n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
....|__propMispl=PROP_MISPL
........|__propNeg=PROP_NEG
............|__network=NETWORK_NO
................|__"ExCC-all"
....................|__"signed-unweighted"

- The first hierarchy => the folders are named as follows: n=NB-NODE_l0=INIT-NB-MODULE_dens=1.0000
- The second hierarchy => the folders are named as follows: propMispl=PROP_MISPL
- The third hierarchy => the folders are named as follows: propNeg=PROP_NEG
- The fourth hierarchy => the folders are named as follows: network=NETWORK_NO
- The fifth hierarchy => the folders are named as follows: "ExCC-all"
The name of the partitioning method is given. Since an exact partitioning method is used to obtain all distinct optimal solutions, it is named "ExCC-all".
- The sixth hierarchy => the folders are named as follows: "signed-unweighted"
The type of the signed networks is given: signed and unweighted.

The partition results are located at the bottom of this hierarchy, and the files are named as follows: membership.txt. Note that the partition result numbering starts from zero.

# EVALUATIONS
Evaluation results related to our plots are in `Evaluation Results.tar.gz`. Note that the hierarchy of this folder is the same as that of 'Partitions'.

Inside `Evaluation Results.tar.gz`:
- `Best-k-for-kmedoids.csv`: It contains three columns: 1) the number of solution classes found via k-medoids, 2) the best Silhouette score, 3) the best clustering in terms of Silhouette score, which represents the solution classes.
- `class-core-part-size-tresh=1.00.csv`: It indicates the proportion of core part size for each solution class.
- `exec-time.csv`: It indicates the execution time in seconds.
- `imbalance.csv`: It contains the imbalance as 1) a count and 2) a percentage.
- `nb-solution.csv`: It indicates the total number of solutions.

--------------------------------------------

Funding: This research benefited from the support of the Agorantic FR 3621, as well as the FMJH Program PGMO and from the support to this program from EDF-THALES-ORANGE-CRITEO.
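As an illustration of the folder naming scheme and of the imbalance measure reported in `imbalance.csv`, here is a minimal Python sketch. It is not taken from the Sosocc or ExCC code. The function names `parse_instance_path` and `imbalance` are hypothetical, as are the numeric values in the example path. The sketch also assumes that a partition is given as one cluster label per vertex and that the imbalance percentage is the number of frustrated links divided by the total number of links; check both against the actual files before relying on them.

```python
import re

# Pattern built from the folder hierarchy described above
# (n=..._l0=..._dens=... / propMispl=... / propNeg=... / network=...).
_PATH_RE = re.compile(
    r"n=(?P<n>\d+)_l0=(?P<l0>\d+)_dens=(?P<dens>[\d.]+)"
    r"/propMispl=(?P<prop_mispl>[\d.]+)"
    r"/propNeg=(?P<prop_neg>[\d.]+)"
    r"/network=(?P<network>\d+)"
)


def parse_instance_path(path):
    """Extract the generation parameters encoded in an instance folder path."""
    match = _PATH_RE.search(path.replace("\\", "/"))
    if match is None:
        raise ValueError(f"unrecognised instance path: {path!r}")
    g = match.groupdict()
    return {
        "n": int(g["n"]),
        "l0": int(g["l0"]),
        "dens": float(g["dens"]),
        "prop_mispl": float(g["prop_mispl"]),
        "prop_neg": float(g["prop_neg"]),
        "network": int(g["network"]),
    }


def imbalance(signs, membership):
    """Count the frustrated links of a partition on an unweighted signed graph.

    signs: dict mapping a vertex pair (u, v) to +1 or -1.
    membership: sequence where membership[v] is the cluster label of vertex v
                (assumed representation, not confirmed by the dataset).
    Returns (count, percentage of all links).
    """
    count = 0
    for (u, v), sign in signs.items():
        same_cluster = membership[u] == membership[v]
        # Frustrated: negative link inside a cluster, or positive link between clusters.
        if (sign < 0 and same_cluster) or (sign > 0 and not same_cluster):
            count += 1
    return count, 100.0 * count / len(signs)


# Hypothetical example values, chosen only to match the naming scheme.
example_path = "NETWORKS/n=20_l0=3_dens=1.0000/propMispl=0.1000/propNeg=0.7000/network=1"
print(parse_instance_path(example_path))

# Toy complete signed graph on 4 vertices with one misplaced positive link (1, 2).
toy_signs = {(0, 1): 1, (2, 3): 1, (0, 2): -1, (0, 3): -1, (1, 2): 1, (1, 3): -1}
print(imbalance(toy_signs, [0, 0, 1, 1]))
```

The same parser applies to the `PARTITIONS` and evaluation folders, since they reuse the first four levels of the hierarchy.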
Created: 2024-10-05