
Space of optimal solutions of the Correlation Clustering problem

DataCite Commons, updated 2020-08-27, indexed 2024-07-27
Download link:
https://figshare.com/articles/Space_of_optimal_solutions_of_the_Correlation_Clustering_problem/8233340/1
Description:
This is the data used in the experiments of the paper submitted to the following conference:<br><i>N. Arinik, R. Figueiredo, V. Labatut, Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem, in: SIAM International Conference on Data Mining, 2020.</i><br>The source code is available here: <i>https://github.com/CompNet/Sosocc</i><br><br>This dataset contains:<br>* Plot files used in the article<br>* Input signed networks<br>* All optimal solutions (i.e. the optimal solution space) of the corresponding networks<br><br><br><br><b># PLOT FILES</b><br>* `<i>Figure1.zip</i>`: Figures showing that even a small network might have many optimal solutions.<br>* `<i>Figure2.zip</i>`: Figures showing that the optimal solutions of a given network might be very similar or very different in terms of their partitions.<br>* `<i>Figure4: All Results.zip</i>`: For space reasons, Figure 4 in the article contains only a few of the result plots. This zip file contains all plots, organized by the value of `<i>l<sub>0</sub></i>`. In each `<i>l<sub>0</sub></i>` folder, the results are shown from three different perspectives:<br> --- Detected Imbalance Percentage vs Graph Size<br> --- Prop mispl vs Graph size<br> --- Graph size vs Prop mispl<br>* `<i>workflow.pdf</i>`: The workflow of the methodology used in the article.<br>* `<i>Syrian network With All Solutions.pdf</i>`: The Syrian network (top), with node colors indicating the core part, and its optimal solutions (bottom), in which node colors indicate the partition.<br><br><br><br><b># NETWORKS</b><br>All networks are in `<i>Input Signed Networks.tar.gz</i>`.<br>The networks are generated with a simple random model (available at <i>https://github.com/CompNet/SignedBenchmark</i>) designed to produce complete unweighted networks with built-in modular structure. 
<br>Three parameters are used for generation:<br>- number of nodes (`<i>n</i>`)<br>- initial number of clusters (`<i>l<sub>0</sub></i>`)<br>- proportion of misplaced links (`<i>q<sub>m</sub></i>`)<br><br>Inside `<i>Input Signed Networks.tar.gz</i>`:<br><br>NETWORKS<br>|__n=NB-NODE_l0=INIT_NB_CLUSTER_dens=1.0000<br>....|__propMispl=PROP_MISPL<br> ........|__propNeg=PROP_NEG<br> ............|__network=NETWORK_NO<br><br>- First level =&gt; folders are named as follows: n=NB-NODE_l0=INIT_NB_CLUSTER_dens=1.0000<br> The name gives the number of nodes, the initial number of clusters and the network density. The network density is always 1, since only complete signed networks are considered.<br>- Second level =&gt; folders are named as follows: propMispl=PROP_MISPL<br> The name gives the proportion of misplaced links.<br>- Third level =&gt; folders are named as follows: propNeg=PROP_NEG<br> The name gives the proportion of negative links (`<i>q<sub>n</sub></i>`). `<i>q<sub>n</sub></i>` changes depending on `<i>n</i>` and `<i>l<sub>0</sub></i>`. Since only complete signed networks are studied, this parameter is automatically computed from the other input parameters.<br>- Fourth level =&gt; folders are named as follows: network=NETWORK_NO<br> The name gives the network number.<br>At the last level, the same network is provided in three file formats: GraphML (.graphml), Pajek NET (.net) and .G.<br><br><br><br><b># CORRESPONDING PARTITIONS</b><br>All partition results are in `<i>Partition Results.tar.gz</i>`. 
Note that all optimal partitions of a signed network are obtained through an exact partitioning method.<br>Inside `<i>Partition Results.tar.gz</i>`:<br><br>PARTITIONS<br>|__n=NB-NODE_l0=INIT_NB_CLUSTER_dens=1.0000<br> ....|__propMispl=PROP_MISPL<br> ........|__propNeg=PROP_NEG<br> ............|__network=NETWORK_NO<br> ................|__"<i>ExCC-all</i>"<br> ....................|__"<i>signed-unweighted</i>"<br><br>- First level =&gt; folders are named as follows: n=NB-NODE_l0=INIT_NB_CLUSTER_dens=1.0000<br>- Second level =&gt; folders are named as follows: propMispl=PROP_MISPL<br>- Third level =&gt; folders are named as follows: propNeg=PROP_NEG<br>- Fourth level =&gt; folders are named as follows: network=NETWORK_NO<br>- Fifth level =&gt; folders are named as follows: "<i>ExCC-all</i>"<br> The name gives the partitioning method. Since an exact partitioning method is used to obtain all optimal solutions, it is named "<i>ExCC-all</i>".<br>- Sixth level =&gt; folders are named as follows: "<i>signed-unweighted</i>"<br> The name gives the type of signed network: signed and unweighted.<br>At the last level are the partition results, in files named as follows: <i>membership.txt</i>. Note that partition result numbering starts from zero.<br><br><br>
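As an illustration of how the folder layout described above could be consumed, here is a minimal Python sketch. It is not part of the dataset or of the authors' code. It rests on several assumptions that the description does not confirm: that the placeholders (NB-NODE, PROP_MISPL, and so on) are filled with plain decimal numbers, that partition files match the glob `membership*.txt`, and that each file holds one integer cluster label per line, in node order. The function names (`parse_params`, `read_membership`, `canonical`, `load_solutions`) are hypothetical.

```python
import re
from pathlib import Path

# Folder levels as described above. The numeric formats of the values
# are an ASSUMPTION; only "dens=1.0000" is spelled out in the description.
LAYOUT = re.compile(
    r"n=(?P<n>\d+)_l0=(?P<l0>\d+)_dens=(?P<dens>[\d.]+)/"
    r"propMispl=(?P<prop_mispl>[\d.]+)/"
    r"propNeg=(?P<prop_neg>[\d.]+)/"
    r"network=(?P<network>\d+)"
)


def parse_params(path):
    """Recover the generation parameters encoded in a folder path."""
    match = LAYOUT.search(str(path).replace("\\", "/"))
    if match is None:
        raise ValueError(f"path does not follow the described layout: {path}")
    fields = match.groupdict()
    return {
        "n": int(fields["n"]),
        "l0": int(fields["l0"]),
        "dens": float(fields["dens"]),
        "prop_mispl": float(fields["prop_mispl"]),
        "prop_neg": float(fields["prop_neg"]),
        "network": int(fields["network"]),
    }


def read_membership(lines):
    """Parse a membership file, ASSUMING one integer cluster label per line."""
    return [int(line) for line in lines if line.strip()]


def canonical(labels):
    """Relabel clusters by order of first appearance, so that two
    partitions differing only in label names compare equal."""
    mapping = {}
    return [mapping.setdefault(label, len(mapping)) for label in labels]


def load_solutions(network_dir):
    """Collect the distinct partitions stored under one network folder."""
    folder = Path(network_dir) / "ExCC-all" / "signed-unweighted"
    solutions = set()
    for file in sorted(folder.glob("membership*.txt")):  # assumed file naming
        with open(file) as handle:
            solutions.add(tuple(canonical(read_membership(handle))))
    return solutions
```

With these helpers, `len(load_solutions(network_dir))` would give the number of distinct optimal partitions stored for one network, and `parse_params(network_dir)` the parameters that network was generated with. If the actual file format differs from the assumptions above, only `read_membership` and the glob pattern should need to change.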

Provider:
figshare
Created:
2019-10-08