Space of Optimal Solutions of the Correlation Clustering Problem for Complete Signed Graphs
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/6816102
Resource description:
Description. This is the data used in the experiments of the following paper:
N. Arınık, R. Figueiredo, and V. Labatut, “Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs,” Journal of Complex Networks 8(6):cnaa025, 2020. DOI: 10.1093/comnet/cnaa025 ⟨hal-02994011⟩
This dataset contains:
Plot files used in the article;
Input signed networks;
All optimal solutions (i.e. optimal solution space) of the corresponding networks;
Evaluation files.
Source code. The source code is available on GitHub: https://github.com/CompNet/Sosocc
Citation. If you use the data or source code, please cite the above article.
@Article{Arinik2020,
  author = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent},
  title = {Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs},
  journal = {Journal of Complex Networks},
  year = {2020},
  volume = {8},
  number = {6},
  pages = {cnaa025},
  doi = {10.1093/comnet/cnaa025},
}
--------------------------------------------
Details.
# PLOT FILES
* `Figure1.zip`: Figures showing that a small network may have many distinct optimal solutions.
* `Figure2.zip`: Figures showing that distinct optimal solutions of a given network may be very similar or very different as partitions.
* `Figure4: All Results.zip`: For space reasons, Figure 4 in the article shows only a few of the result plots. This zip file contains all plots, organized by the value of `l0`. In each `l0` folder, the results are shown from three perspectives:
--- Detected Imbalance Percentage vs Graph Order (i.e. number of vertices)
--- Prop mispl vs Graph order
--- Graph order vs Prop mispl
* `workflow.pdf`: The workflow of the methodology used in the article.
* `Syrian network With All Solutions.pdf`: Syrian network (top) with core part information shown through node colors, and its optimal solutions (bottom), in which node colors represent partition information.
# NETWORKS
All networks are in `Input Signed Networks.tar.gz`. Networks are generated with a simple random model (available at https://github.com/CompNet/SignedBenchmark) designed to produce complete (or incomplete) unweighted networks with built-in modular structure. Three parameters are used for the generation:
* number of nodes (`n`)
* initial number of modules (`l0`)
* proportion of misplaced links, i.e. proportion of frustrated links (`qm`)
Inside `Input Signed Networks.tar.gz`:
NETWORKS
|__n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
....|__propMispl=PROP_MISPL
........|__propNeg=PROP_NEG
............|__network=NETWORK_NO
- First level => folders are named as follows: n=NB-NODE_l0=INIT-NB-MODULE_dens=1.0000
The number of nodes, the initial number of modules and the network density are given. The network density is always 1, since we treat only complete signed networks.
- Second level => folders are named as follows: propMispl=PROP_MISPL
The proportion of misplaced links is given.
- Third level => folders are named as follows: propNeg=PROP_NEG
The proportion of negative links (`qn`) is given. `qn` changes depending on `n` and `l0`. Since only complete signed networks are studied, this parameter is automatically computed from the other input parameters.
- Fourth level => folders are named as follows: network=NETWORK_NO
The network number is given.
At the deepest level, the same network content is provided in three file formats: GraphML (.graphml), Pajek NET (.net) and the .G format.
# PARTITIONS
All partition results are in `Partition Results.tar.gz`. Note that all optimal partitions of a signed network are obtained through an exact partitioning method.
The source code is available here: https://github.com/arinik9/ExCC
Inside `Partition Results.tar.gz`:
PARTITIONS
|__n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
....|__propMispl=PROP_MISPL
........|__propNeg=PROP_NEG
............|__network=NETWORK_NO
................|__"ExCC-all"
....................|__"signed-unweighted"
- First level => folders are named as follows: n=NB-NODE_l0=INIT-NB-MODULE_dens=1.0000
- Second level => folders are named as follows: propMispl=PROP_MISPL
- Third level => folders are named as follows: propNeg=PROP_NEG
- Fourth level => folders are named as follows: network=NETWORK_NO
- Fifth level => folders are named as follows: "ExCC-all"
The name of the partitioning method is given. Since an exact partitioning method is used to obtain all distinct optimal solutions, it is named "ExCC-all".
- Sixth level => folders are named as follows: "signed-unweighted"
The type of signed network is given: signed and unweighted.
At the deepest level are the partition result files, which are named as follows: membership.txt. Note that partition result numbering starts from zero.
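The folder naming scheme described above encodes the generation parameters directly in the path, so they can be recovered without opening any file. The following is a minimal sketch in Python, not part of the dataset or its source code. The function name `parse_result_path` and the concrete values in `example` are hypothetical, and the sketch assumes that `PROP_MISPL` and `PROP_NEG` are decimal numbers and that `NETWORK_NO` is an integer.

```python
import re
from pathlib import PurePosixPath

# Patterns follow the folder names listed above. The numeric formats
# (integers for n, l0 and network; decimals elsewhere) are assumptions.
_TOP = re.compile(r"^n=(?P<n>\d+)_l0=(?P<l0>\d+)_dens=(?P<dens>[\d.]+)$")
_KEYED = re.compile(r"^(?P<key>propMispl|propNeg|network)=(?P<value>.+)$")


def parse_result_path(path):
    """Extract generation parameters from a NETWORKS/ or PARTITIONS/ path."""
    params = {}
    for part in PurePosixPath(path).parts:
        top = _TOP.match(part)
        if top:
            params["n"] = int(top["n"])
            params["l0"] = int(top["l0"])
            params["dens"] = float(top["dens"])
            continue
        keyed = _KEYED.match(part)
        if keyed:
            key, value = keyed["key"], keyed["value"]
            # Assumed: network numbers are integers, proportions are floats.
            params[key] = int(value) if key == "network" else float(value)
    return params


# Hypothetical path; the values are made up for illustration.
example = (
    "PARTITIONS/n=20_l0=3_dens=1.0000/propMispl=0.1000/"
    "propNeg=0.7000/network=1/ExCC-all/signed-unweighted"
)
print(parse_result_path(example))
```

Path components that match neither pattern, such as "ExCC-all" and "signed-unweighted", are simply skipped, so the same helper can be applied to paths from both archives.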
# EVALUATIONS
Evaluation results related to our plots are in `Evaluation Results.tar.gz`. Note that the hierarchy of this archive is the same as that of 'Partitions'. Inside `Evaluation Results.tar.gz`:
- `Best-k-for-kmedoids.csv`: Contains three columns: 1) the number of solution classes found via k-medoids, 2) the best Silhouette score, 3) the best clustering in terms of Silhouette score, which represents the solution classes.
- `class-core-part-size-tresh=1.00.csv`: Indicates the proportion of the core part size for each solution class.
- `exec-time.csv`: Indicates the execution time in seconds.
- `imbalance.csv`: Contains the imbalance as 1) a count and 2) a percentage.
- `nb-solution.csv`: Indicates the total number of solutions.
--------------------------------------------
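Because the evaluation files listed above repeat under every network folder, one way to work with them is to gather all copies of a given file from the extracted archive. The sketch below is illustrative only: the function name `collect_evaluation_rows` is hypothetical, the archive is assumed to be already extracted to a local folder, and the files are assumed to be comma-delimited (pass another `delimiter` if they are not). It makes no assumption about column names or header rows.

```python
import csv
from pathlib import Path


def collect_evaluation_rows(root, filename, delimiter=","):
    """Yield (relative_folder, row) for every `filename` found under `root`."""
    root = Path(root)
    # rglob walks the whole hierarchy; sorting makes the order reproducible.
    for csv_path in sorted(root.rglob(filename)):
        folder = csv_path.parent.relative_to(root).as_posix()
        with csv_path.open(newline="") as handle:
            for row in csv.reader(handle, delimiter=delimiter):
                # Rows are returned as raw lists of strings, header included.
                yield folder, row
```

For example, `collect_evaluation_rows(extracted_dir, "nb-solution.csv")`, where `extracted_dir` is wherever the archive was unpacked, yields one entry per row together with the folder that identifies the network.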
Funding: This research benefited from the support of Agorantic FR 3621, as well as the FMJH Program PGMO, and from the support given to this program by EDF-THALES-ORANGE-CRITEO.
Created:
2024-10-05



