
SDCOR Synthetic Datasets

Source: Mendeley Data; indexed 2026-04-18
Download link:
https://data.mendeley.com/datasets/p4tx2k852r
Resource description:
SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets

Link to arXiv e-print: https://arxiv.org/pdf/2006.07616.pdf
Link to ResearchGate e-print: https://www.researchgate.net/publication/342197681_SDCOR_Scalable_Density-based_Clustering_for_Local_Outlier_Detection_in_Massive-Scale_Datasets

This paper presents a method for local outlier detection in massive-scale datasets, based on a batch-wise density-based clustering approach. SDCOR consists of three major phases: 1) Sampling; 2) Scalable Clustering; and 3) Scoring. In the Sampling phase, a preliminary random sample is drawn to obtain an abstraction of the entire data, called the temporary clustering model, and to estimate the parameters required by the clustering procedures. In the Scalable Clustering phase, the input data is processed in chunks; as successive chunks are processed, the temporary clustering model is updated gradually until, after the last chunk, it becomes the final clustering model. In the Scoring phase, using the final clustering model obtained through the batch-wise clustering and the Mahalanobis distance criterion, each object is assigned an outlier score, called SDCOR, which is equal to its local Mahalanobis distance.

Each synthetic dataset in this repository consists of several Gaussian clusters with arbitrary mean vectors that are far enough apart to prevent overlap among the multidimensional clusters. In each of these artificial datasets, a specified number of outliers is added around every cluster, and the ground-truth outlier labels are provided along with the data.
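The Scoring phase and the synthetic data layout described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation and it skips the Sampling and Scalable Clustering phases entirely: the cluster means and covariances are simply estimated from the known inliers as a stand-in for the final clustering model. It also assumes that "local" Mahalanobis distance means the distance to the nearest cluster. The function name, cluster centers, sample sizes, and outlier radii are all hypothetical choices made for illustration.

```python
import numpy as np


def nearest_cluster_mahalanobis(X, means, covs):
    """For each row of X, return the smallest Mahalanobis distance to any cluster.

    Assumption: this nearest-cluster rule is one reading of "local
    Mahalanobis distance"; the paper's exact definition may differ.
    """
    dists = np.empty((X.shape[0], len(means)))
    for k, (mu, cov) in enumerate(zip(means, covs)):
        diff = X - mu
        inv_cov = np.linalg.inv(cov)
        # Quadratic form diff_i^T @ inv_cov @ diff_i for every row i.
        dists[:, k] = np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))
    return dists.min(axis=1)


# Toy data: two well-separated Gaussian clusters with outliers placed around each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [20.0, 20.0]])  # hypothetical centers
inliers, outliers = [], []
for c in centers:
    inliers.append(rng.normal(loc=c, scale=1.0, size=(500, 2)))
    # Outliers: random directions at 6 to 8 standard deviations from the center.
    direction = rng.normal(size=(10, 2))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    radius = rng.uniform(6.0, 8.0, size=(10, 1))
    outliers.append(c + radius * direction)

X = np.vstack(inliers + outliers)                              # n-by-p data matrix
y = np.r_[np.zeros(1000, dtype=int), np.ones(20, dtype=int)]   # 1 marks an outlier

# Stand-in for the final clustering model: per-cluster mean and covariance.
means = [block.mean(axis=0) for block in inliers]
covs = [np.cov(block, rowvar=False) for block in inliers]

scores = nearest_cluster_mahalanobis(X, means, covs)
print("mean inlier score: ", scores[y == 0].mean())
print("mean outlier score:", scores[y == 1].mean())
```

Because the outliers sit several standard deviations from their cluster centers, their scores should be noticeably larger than those of the inliers, which is the property an outlier ranking based on this score relies on.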
For every artificial dataset, there is an n-by-p data matrix X (where n and p denote the cardinality and dimensionality of the input data, respectively), along with an n-by-1 vector y of outlier labels, stored together in a single binary MAT-file. We implemented our code in MATLAB 9; for reproducibility, it is available on our GitHub page (https://github.com/sana33/SDCOR).

Finally, if you are interested in the idea or are using this data in your research, please cite our paper as:

@article{naghavi2021sdcor,
  title={SDCOR: Scalable density-based clustering for local outlier detection in massive-scale datasets},
  author={Naghavi Nozad, Sayyed Ahmad and Amir Haeri, Maryam and Folino, Gianluigi},
  journal={Knowledge-Based Systems},
  pages={107256},
  year={2021},
  publisher={Elsevier}
}

Thanks a lot ...
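For readers working outside MATLAB, the MAT-file layout described above (an n-by-p matrix X and an n-by-1 label vector y) could be read in Python roughly as follows. This is a sketch under assumptions: the variable names inside the files are taken to be exactly "X" and "y", `load_dataset` is a hypothetical helper, and the demo file is generated on the fly rather than downloaded from the repository. Note that `scipy.io.loadmat` does not read MAT-files saved in the v7.3 (HDF5-based) format; those would need an HDF5 reader instead.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_dataset(path):
    """Load a data matrix X and a label vector y from a MAT-file.

    Assumption: the file stores its variables under the names "X" and "y".
    """
    mat = loadmat(path)
    X = np.asarray(mat["X"], dtype=float)
    # MATLAB stores y as an n-by-1 column; flatten it to shape (n,).
    y = np.asarray(mat["y"]).ravel().astype(int)
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    return X, y


# Demo on a small stand-in file with the same layout (not a real SDCOR dataset).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mat")
savemat(demo_path, {
    "X": np.random.default_rng(0).normal(size=(5, 3)),
    "y": np.array([[0], [0], [1], [0], [1]]),
})

X, y = load_dataset(demo_path)
print(X.shape, y.shape, int(y.sum()), "outliers")
```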
Created:
2021-08-23