Space of optimal solutions of the Correlation Clustering problem on Complete Signed Graphs
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Space_of_optimal_solutions_of_the_Correlation_Clustering_problem/8233340
Description:
This is the data used in the experiments of our paper:
N. Arinik, R. Figueiredo, V. Labatut (2020), Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs, Journal of Complex Networks, DOI: 10.1093/comnet/cnaa025. The source code is available here: https://github.com/CompNet/Sosocc
This dataset contains:
* Plot files used in the article
* Input signed networks
* All optimal solutions (i.e. optimal solution space) of the corresponding networks
* Evaluation files
# PLOT FILES
* `Figure1.zip`: Figures showing that even a small network may have many distinct optimal solutions.
* `Figure2.zip`: Figures showing that distinct optimal solutions of a given network may be very similar or very different in terms of their partitions.
* `Figure4: All Results.zip`: For reasons of space, Figure 4 in the article shows only a few of the result plots. This zip file contains all of them, organized by the value of `l0`. In each `l0` folder, the results are shown from three different perspectives:
--- Detected Imbalance Percentage vs Graph Order (i.e. number of vertices)
--- Prop mispl vs Graph order
--- Graph order vs Prop mispl
* `workflow.pdf`: The workflow of the methodology used in the article.
* `Syrian network With All Solutions.pdf`: The Syrian network (top), with node colors indicating core part information, and its optimal solutions (bottom), with node colors indicating partition membership.
# NETWORKS
All networks are in `Input Signed Networks.tar.gz`.
Networks are generated with a simple random model (available at https://github.com/CompNet/SignedBenchmark) designed to produce complete (or incomplete) unweighted networks with built-in modular structure.
There are 3 parameters used for the generation:
- number of nodes (`n`)
- initial number of modules (`l0`)
- proportion of misplaced links, i.e. proportion of frustrated links, (`qm`)
Inside `Input Signed Networks.tar.gz`:
NETWORKS
|__n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
....|__propMispl=PROP_MISPL
........|__propNeg=PROP_NEG
............|__network=NETWORK_NO
- First level: folders are named n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
The name gives the number of nodes, the initial number of modules and the network density. The density is always 1, since only complete signed networks are considered.
- Second level: folders are named propMispl=PROP_MISPL
The name gives the proportion of misplaced links.
- Third level: folders are named propNeg=PROP_NEG
The name gives the proportion of negative links (`qn`). `qn` depends on `n` and `l0`. Since only complete signed networks are studied, this parameter is computed automatically from the other input parameters.
- Fourth level: folders are named network=NETWORK_NO
The name gives the network number.
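The folder naming scheme above lends itself to automated parsing. The following is a minimal sketch, not part of the dataset's own tooling: the function name `parse_network_path` and the parameter values in the example path are hypothetical, and the network number is assumed to be an integer.

```python
import re
from pathlib import PurePosixPath

# Pattern for the first-level folder, e.g. "n=20_l0=3_dens=1.0000".
_TOP = re.compile(r"n=(\d+)_l0=(\d+)_dens=([0-9.]+)")


def parse_network_path(path):
    """Extract the generation parameters encoded in a folder path."""
    params = {}
    for part in PurePosixPath(path).parts:
        top = _TOP.fullmatch(part)
        if top:
            params["n"] = int(top.group(1))
            params["l0"] = int(top.group(2))
            params["dens"] = float(top.group(3))
        elif part.startswith("propMispl="):
            params["propMispl"] = float(part.split("=", 1)[1])
        elif part.startswith("propNeg="):
            params["propNeg"] = float(part.split("=", 1)[1])
        elif part.startswith("network="):
            # Assumption: network numbers are plain integers.
            params["network"] = int(part.split("=", 1)[1])
    return params


# Hypothetical path; the actual values in the archive may differ.
example = "NETWORKS/n=20_l0=3_dens=1.0000/propMispl=0.2000/propNeg=0.7000/network=1"
print(parse_network_path(example))
```

Path components that match none of the four patterns (such as the top-level folder) are ignored, so the same function can be applied to paths in the partition and evaluation archives.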
At the lowest level, three files describe the same network content in different formats: GraphML (.graphml), Pajek NET (.net) and .G.
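As an illustration of how the GraphML variant might be read with the Python standard library only, here is a hedged sketch. It assumes that link signs are stored in an edge attribute named `weight` with values such as 1 and -1; this attribute name is an assumption and should be checked against the actual files. The function name `read_signed_edges` and the inline sample graph are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
NS = {"g": GRAPHML_NS}


def read_signed_edges(graphml_text, attr_name="weight"):
    """Return (source, target, sign) triples from a GraphML string."""
    root = ET.fromstring(graphml_text)
    # Locate the <key> element declaring the edge attribute.
    key_id = None
    for key in root.findall("g:key", NS):
        if key.get("for") == "edge" and key.get("attr.name") == attr_name:
            key_id = key.get("id")
    if key_id is None:
        raise ValueError(f"no edge attribute named {attr_name!r}")
    edges = []
    for edge in root.iter(f"{{{GRAPHML_NS}}}edge"):
        for data in edge.findall("g:data", NS):
            if data.get("key") == key_id:
                # Positive value -> +1, anything else -> -1.
                sign = 1 if float(data.text) > 0 else -1
                edges.append((edge.get("source"), edge.get("target"), sign))
    return edges


# Hypothetical three-node signed graph, not taken from the dataset.
sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="e_w" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0"/><node id="n1"/><node id="n2"/>
    <edge source="n0" target="n1"><data key="e_w">1</data></edge>
    <edge source="n0" target="n2"><data key="e_w">-1</data></edge>
    <edge source="n1" target="n2"><data key="e_w">-1</data></edge>
  </graph>
</graphml>"""

print(read_signed_edges(sample))
```

To read a file from the archive, pass its contents to the function, for example via `open(path, encoding="utf-8").read()`.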
# PARTITIONS
All partition results are in `Partition Results.tar.gz`. Note that all optimal partitions of a signed network are obtained with an exact partitioning method. The source code is available here: https://github.com/arinik9/ExCC
Inside `Partition Results.tar.gz`:
PARTITIONS
|__n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
....|__propMispl=PROP_MISPL
........|__propNeg=PROP_NEG
............|__network=NETWORK_NO
................|__"ExCC-all"
....................|__"signed-unweighted"
- First level: folders are named n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
- Second level: folders are named propMispl=PROP_MISPL
- Third level: folders are named propNeg=PROP_NEG
- Fourth level: folders are named network=NETWORK_NO
- Fifth level: folders are named "ExCC-all"
The name gives the partitioning method. Since an exact partitioning method is used to obtain all distinct optimal solutions, it is named "ExCC-all".
- Sixth level: folders are named "signed-unweighted"
The name gives the type of the networks: signed and unweighted.
At the lowest level are the partition results, in files named as follows: membership.txt. Note that the numbering of partition results starts from zero.
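A partition file can be turned into explicit clusters along the following lines. This sketch assumes that a membership file holds one integer cluster label per line, with line i giving the label of node i; the file layout is an assumption, not something stated in this description, and the helper names and sample content are hypothetical.

```python
from collections import defaultdict


def parse_membership(text):
    """Parse membership text into a list of cluster labels, one per node."""
    # Assumption: one integer label per non-empty line.
    return [int(line) for line in text.splitlines() if line.strip()]


def group_by_cluster(labels):
    """Map each cluster label to the list of node indices it contains."""
    clusters = defaultdict(list)
    for node, label in enumerate(labels):
        clusters[label].append(node)
    return dict(clusters)


# Hypothetical content of a membership file for a five-node network.
sample = "1\n1\n2\n2\n1\n"
labels = parse_membership(sample)
print(group_by_cluster(labels))
```

Node indices here start from zero and follow line order; if the network files use different node identifiers, a mapping step is needed.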
# EVALUATIONS
Evaluation results related to our plots are in `Evaluation Results.tar.gz`. Note that the hierarchy of this folder is the same as that of 'Partitions'. Inside `Evaluation Results.tar.gz`:
- `Best-k-for-kmedoids.csv`: It contains three columns: 1) the number of solution classes found via k-medoids, 2) the best Silhouette score, 3) the best clustering in terms of Silhouette score, which represents the solution classes.
- `class-core-part-size-tresh=1.00.csv`: It indicates the proportion of core part size for each solution class.
- `exec-time.csv`: It indicates the execution time in seconds.
- `imbalance.csv`: It contains the imbalance as 1) a count and 2) a percentage.
- `nb-solution.csv`: It indicates the total number of solutions.
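For reference, the imbalance of a partition in an unweighted signed network counts the links that disagree with the partition: positive links between clusters and negative links inside clusters. The sketch below illustrates this computation. Expressing the percentage relative to the total number of links is an assumption about how `imbalance.csv` was produced, and the function name and sample data are hypothetical.

```python
def imbalance(edges, membership):
    """Return (count, percentage) of links that disagree with a partition.

    edges: iterable of (u, v, sign) with sign > 0 or sign < 0.
    membership: sequence giving the cluster label of each node index.
    """
    edges = list(edges)
    misplaced = 0
    for u, v, sign in edges:
        same_cluster = membership[u] == membership[v]
        # A positive link should be inside a cluster, a negative one between clusters.
        if (sign > 0 and not same_cluster) or (sign < 0 and same_cluster):
            misplaced += 1
    # Assumption: percentage is taken over all links in the network.
    percentage = 100.0 * misplaced / len(edges) if edges else 0.0
    return misplaced, percentage


# Hypothetical complete signed graph on four nodes.
edges = [(0, 1, 1), (0, 2, -1), (1, 2, 1), (0, 3, -1), (1, 3, -1), (2, 3, 1)]
membership = [1, 1, 1, 2]
count, pct = imbalance(edges, membership)
print(count, round(pct, 2))
```

In this example the negative link (0, 2) lies inside a cluster and the positive link (2, 3) crosses clusters, so two of the six links are misplaced.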
Created: 2019-10-08



