Data for: Local fuzzy geographically weighted clustering: A new method for geodemographic segmentation.
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://data.mendeley.com/datasets/kd5xprhv65
Description:
This dataset (a compressed RAR file) includes the Matlab code for the "Local Fuzzy Geographically Weighted Clustering" (LFGWC) algorithm and a shapefile containing socio-demographic and cancer incidence data for 973 block groups in Manhattan, New York.
The files are:
1. LFGWC.m = The Matlab code of LFGWC (Local Fuzzy Geographically Weighted Clustering)
2. LFGWC_Call.m = The file to run the above code
3. validity.m = The Matlab code for validating the clustering output
4. licence.txt = The file describing the license terms
5. Demo = Dataset folder
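The description does not say which indices validity.m computes. As a hedged illustration of what validating a fuzzy clustering output can involve, the Python sketch below implements two standard measures, Bezdek's partition coefficient and partition entropy. The function names and the list-of-rows membership layout are assumptions made for this example, not part of the dataset.

```python
import math


def partition_coefficient(memberships):
    """Bezdek's partition coefficient: mean of the squared memberships.

    `memberships` holds one row per object; each row lists that object's
    membership in every cluster and is assumed to sum to 1.
    The value ranges from 1/c (maximally fuzzy) to 1 (crisp partition).
    """
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


def partition_entropy(memberships):
    """Bezdek's partition entropy: 0 for a crisp partition, at most log(c)."""
    n = len(memberships)
    # Zero memberships contribute nothing (limit of u*log(u) as u -> 0).
    total = sum(u * math.log(u) for row in memberships for u in row if u > 0)
    return -total / n
```

A higher partition coefficient and a lower partition entropy both indicate a less fuzzy partition, so the two are usually read together when comparing runs with different numbers of clusters.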
Demo folder contains the following:
1. Data.txt = The non-normalized dataset
2. Population.txt = Population for each polygon
3. Distance.txt = Distance among all objects
4. Centroid.txt = Initial cluster centres
5. Shapefile: Manhattan_Data.shp
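Population.txt and Distance.txt are the kind of inputs that geographically weighted clustering needs in addition to the attribute data. As a rough sketch of why, the Python code below applies the membership adjustment commonly attributed to the classic (global) FGWC of Mason and Jacobson, in which each area's memberships are blended with a population- and distance-weighted average of the other areas' memberships. This is not taken from LFGWC.m and is not the local variant described in the paper. The function name, the parameter defaults (alpha, beta, a, b), and the row-wise normalisation are assumptions made for illustration.

```python
def geographic_adjustment(memberships, population, distance,
                          alpha=0.5, beta=0.5, a=1.0, b=1.0):
    """Blend each area's memberships with those of the other areas.

    memberships: n rows of c cluster memberships (rows sum to 1)
    population:  n population counts
    distance:    n x n matrix of pairwise distances
    alpha, beta: weights of the original and the neighbourhood term
                 (assumed to sum to 1)
    a, b:        exponents on distance and on the population product
    """
    n = len(memberships)
    c = len(memberships[0])
    adjusted = []
    for i in range(n):
        # Interaction weight w_ij = (m_i * m_j)^b / d_ij^a; an area does
        # not interact with itself, and zero distances are skipped.
        weights = [
            (population[i] * population[j]) ** b / distance[i][j] ** a
            if j != i and distance[i][j] > 0 else 0.0
            for j in range(n)
        ]
        total = sum(weights)
        row = []
        for k in range(c):
            if total > 0:
                # Weighted mean membership of the other areas in cluster k.
                neighbour = sum(
                    w * memberships[j][k] for j, w in enumerate(weights)
                ) / total
            else:
                # Isolated area: fall back to its own membership.
                neighbour = memberships[i][k]
            row.append(alpha * memberships[i][k] + beta * neighbour)
        adjusted.append(row)
    return adjusted
```

Because the weights are normalised per area and alpha + beta = 1, every adjusted row still sums to 1, so the result remains a valid fuzzy partition.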
The shapefile was originally downloaded from a benchmark dataset of small-area cancer incidence (Boscoe et al. 2016). The benchmark dataset includes 524,503 tumors diagnosed between 2005 and 2009 across 13,823 block groups covering all of New York State (download link: https://www.satscan.org/datasets/nyscancer/index.html).
Manhattan_Data.shp covers only the county of Manhattan, not the whole state. The data have undergone slight modifications, which are explained in detail in the paper.
Attributes of Manhattan_Data.shp:
DOHREGION = Geographic identifier
CODE = Unique ID code for joining data
POPULATION = Total population (2010 Census)
White_Pop = % white alone population (2010 Census)
Black_Pop = % black alone population (2010 Census)
Asian_Pop = % Asian alone population (2010 Census)
Other_Pop = % other race population (2010 Census)
Hispanic = % Hispanic population (2010 Census)
HH_Size = Persons per household (2010 Census)
LT_HS = % of population aged 25 and over with less than a high school education
Under_Pov = % of population under the poverty level (2006-2010 ACS data)
BC_Rate = Breast cancer cases per 1,000 people
PC_Rate = Prostate cancer cases per 1,000 people
TC_Rate = Total cancer cases per 1,000 people
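The three rate attributes are expressed per 1,000 people. A minimal sketch of that conversion in Python follows. The function name is hypothetical, and the description does not state whether the published rates cover the whole 2005-2009 period or are annualised, or which population is used as the denominator, so this only shows the arithmetic.

```python
def rate_per_thousand(cases, population):
    """Return cases per 1,000 people, or None when the population is zero."""
    if population <= 0:
        # A block group with no residents has no meaningful rate.
        return None
    return 1000.0 * cases / population
```

Under these assumptions, a block group with 12 cases and 1,500 residents has a rate of 8 per 1,000.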
For more information on the original benchmark dataset visit:
https://www.satscan.org/datasets/nyscancer/index.html
Created:
2020-07-29



