Communities and Crime prediction dataset, covering analysis of the per capita number of police officers and their assignment
Payititi (帕依提提), indexed 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-25999.html
Resource description:
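## Dataset Information
Many variables are included so that algorithms that select or learn weights for attributes can be tested. Clearly unrelated attributes were left out; an attribute was picked if it had any plausible connection to crime (N=122), plus the attribute to be predicted, per capita violent crimes.
The variables describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units.
The per capita violent crimes variable was calculated from population and the sum of the crime variables considered violent crimes in the United States: murder, rape, robbery, and assault. There was apparently some controversy in some states over the counting of rapes. This produced missing values for rape, which in turn produced incorrect values for per capita violent crime, so these cities are not included in the dataset. Many of the omitted communities are in the midwestern USA.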
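The calculation described above can be sketched as follows. This is a minimal illustration, not the author's code: the function name, the dictionary layout, and the two toy communities are hypothetical, and the sketch assumes a plain "crimes divided by population" ratio because the description does not state any further scaling.

```python
# Hypothetical sketch: per capita violent crime from raw counts.
# The four categories are the ones named in the dataset description.
VIOLENT_CRIMES = ("murder", "rape", "robbery", "assault")


def violent_crimes_per_capita(counts, population):
    """Return violent crimes per resident, or None if any count is missing."""
    if population <= 0:
        return None
    values = [counts.get(name) for name in VIOLENT_CRIMES]
    if any(v is None for v in values):
        # A missing rape count would understate the total, so the community
        # is skipped, mirroring the exclusion described in the text.
        return None
    return sum(values) / population


# Made-up example rows, for illustration only.
communities = [
    {"name": "Community A", "population": 10000,
     "counts": {"murder": 1, "rape": 4, "robbery": 20, "assault": 25}},
    {"name": "Community B", "population": 5000,
     "counts": {"murder": 0, "rape": None, "robbery": 5, "assault": 10}},
]

rates = {}
for c in communities:
    rate = violent_crimes_per_capita(c["counts"], c["population"])
    if rate is not None:
        rates[c["name"]] = rate
```

Returning `None` instead of a partial sum keeps an undercounted community from looking safer than it is, which is the problem the description attributes to the missing rape values.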
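The data is described below in terms of the original values. All numeric data was normalized into the decimal range 0.00-1.00 using an unsupervised, equal-interval binning method. Attributes retain their distribution and skew (for example, the population attribute has a mean value of 0.06 because most communities are small). An attribute described as 'mean people per household', for instance, is actually the normalized (0-1) version of that value.
The normalization preserves rough ratios of values within an attribute (for example, double the value for double the population, within the available precision), except for extreme values: all values more than 3 SD above the mean are normalized to 1.00, and all values more than 3 SD below the mean are normalized to 0.00. It does not preserve relationships between values of different attributes; for example, comparing a community's value for whitePerCap with its value for blackPerCap would not be meaningful.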
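The exact binning procedure is not spelled out in the description, so the following is only one plausible reading of it: clip each attribute at 3 SD from its mean, rescale linearly to the 0-1 range, and round to two decimals. The function name and the toy population figures are hypothetical.

```python
from statistics import mean, pstdev


def normalize_attribute(values):
    """Clip at mean +/- 3 SD, rescale to [0, 1], and round to two decimals."""
    mu = mean(values)
    sd = pstdev(values)
    # Bounds are the observed range, tightened to mean +/- 3 SD where
    # values fall outside it (an assumption, not the documented method).
    lower = max(min(values), mu - 3 * sd)
    upper = min(max(values), mu + 3 * sd)
    if upper == lower:
        # A constant attribute carries no information; map it to 0.00.
        return [0.0 for _ in values]
    result = []
    for v in values:
        clipped = min(max(v, lower), upper)
        result.append(round((clipped - lower) / (upper - lower), 2))
    return result


# Made-up populations: many small communities and one very large one.
populations = [10_000] * 19 + [1_000_000]
normalized = normalize_attribute(populations)
```

Because each attribute is rescaled against its own bounds, a value of 0.50 in one column and 0.50 in another say nothing about their relative size, which is why the description warns against comparing values across attributes.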
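One limitation is that the Law Enforcement Management and Administrative Statistics (LEMAS) survey covered police departments with at least 100 officers, plus a random sample of smaller departments. For the purposes of this dataset, communities not found in both the census and crime datasets were omitted. Many communities are missing LEMAS data.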
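Since many communities lack LEMAS values, it can help to measure how sparse each attribute is before modelling. The sketch below assumes missing values appear as `?`, which is the ARFF convention; the helper name, the 0.5 threshold, and the toy rows are hypothetical and not taken from the dataset.

```python
MISSING = "?"  # ARFF marks a missing value with a question mark


def missing_fraction(rows, columns):
    """Return the share of rows in which each column is missing."""
    total = len(rows)
    return {
        col: sum(1 for row in rows if row[col] == MISSING) / total
        for col in columns
    }


# Made-up rows using attribute names from the header; values are illustrative.
rows = [
    {"population": "0.19", "PolicPerPop": "0.20", "LemasSwornFT": "0.03"},
    {"population": "0.00", "PolicPerPop": "?", "LemasSwornFT": "?"},
    {"population": "0.04", "PolicPerPop": "?", "LemasSwornFT": "?"},
    {"population": "0.01", "PolicPerPop": "?", "LemasSwornFT": "0.01"},
]

fractions = missing_fraction(rows, ["population", "PolicPerPop", "LemasSwornFT"])
# Keep attributes that are missing in at most half of the rows.
usable = [col for col, frac in fractions.items() if frac <= 0.5]
```

Whether to drop sparse attributes or impute them is a modelling decision; the threshold here is only a placeholder.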
## Weka .arff Header
@relation crimepredict
@attribute state numeric
@attribute county numeric
@attribute community numeric
@attribute communityname string
@attribute fold numeric
@attribute population numeric
@attribute householdsize numeric
@attribute racepctblack numeric
@attribute racePctWhite numeric
@attribute racePctAsian numeric
@attribute racePctHisp numeric
@attribute agePct12t21 numeric
@attribute agePct12t29 numeric
@attribute agePct16t24 numeric
@attribute agePct65up numeric
@attribute numbUrban numeric
@attribute pctUrban numeric
@attribute medIncome numeric
@attribute pctWWage numeric
@attribute pctWFarmSelf numeric
@attribute pctWInvInc numeric
@attribute pctWSocSec numeric
@attribute pctWPubAsst numeric
@attribute pctWRetire numeric
@attribute medFamInc numeric
@attribute perCapInc numeric
@attribute whitePerCap numeric
@attribute blackPerCap numeric
@attribute indianPerCap numeric
@attribute AsianPerCap numeric
@attribute OtherPerCap numeric
@attribute HispPerCap numeric
@attribute NumUnderPov numeric
@attribute PctPopUnderPov numeric
@attribute PctLess9thGrade numeric
@attribute PctNotHSGrad numeric
@attribute PctBSorMore numeric
@attribute PctUnemployed numeric
@attribute PctEmploy numeric
@attribute PctEmplManu numeric
@attribute PctEmplProfServ numeric
@attribute PctOccupManu numeric
@attribute PctOccupMgmtProf numeric
@attribute MalePctDivorce numeric
@attribute MalePctNevMarr numeric
@attribute FemalePctDiv numeric
@attribute TotalPctDiv numeric
@attribute PersPerFam numeric
@attribute PctFam2Par numeric
@attribute PctKids2Par numeric
@attribute PctYoungKids2Par numeric
@attribute PctTeen2Par numeric
@attribute PctWorkMomYoungKids numeric
@attribute PctWorkMom numeric
@attribute NumIlleg numeric
@attribute PctIlleg numeric
@attribute NumImmig numeric
@attribute PctImmigRecent numeric
@attribute PctImmigRec5 numeric
@attribute PctImmigRec8 numeric
@attribute PctImmigRec10 numeric
@attribute PctRecentImmig numeric
@attribute PctRecImmig5 numeric
@attribute PctRecImmig8 numeric
@attribute PctRecImmig10 numeric
@attribute PctSpeakEnglonly numeric
@attribute PctNotSpeakEnglWell numeric
@attribute PctLargHouseFam numeric
@attribute PctLargHouseOccup numeric
@attribute PersPerOccupHous numeric
@attribute PersPerOwnOccHous numeric
@attribute PersPerRentOccHous numeric
@attribute PctPersOwnOccup numeric
@attribute PctPersDenseHous numeric
@attribute PctHousLess3BR numeric
@attribute MedNumBR numeric
@attribute HousVacant numeric
@attribute PctHousOccup numeric
@attribute PctHousOwnOcc numeric
@attribute PctVacantBoarded numeric
@attribute PctVacMore6Mos numeric
@attribute MedYrHousBuilt numeric
@attribute PctHousNoPhone numeric
@attribute PctWOFullPlumb numeric
@attribute OwnOccLowQuart numeric
@attribute OwnOccMedVal numeric
@attribute OwnOccHiQuart numeric
@attribute RentLowQ numeric
@attribute RentMedian numeric
@attribute RentHighQ numeric
@attribute MedRent numeric
@attribute MedRentPctHousInc numeric
@attribute MedOwnCostPctInc numeric
@attribute MedOwnCostPctIncNoMtg numeric
@attribute NumInShelters numeric
@attribute NumStreet numeric
@attribute PctForeignBorn numeric
@attribute PctBornSameState numeric
@attribute PctSameHouse85 numeric
@attribute PctSameCity85 numeric
@attribute PctSameState85 numeric
@attribute LemasSwornFT numeric
@attribute LemasSwFTPerPop numeric
@attribute LemasSwFTFieldOps numeric
@attribute LemasSwFTFieldPerPop numeric
@attribute LemasTotalReq numeric
@attribute LemasTotReqPerPop numeric
@attribute PolicReqPerOffic numeric
@attribute PolicPerPop numeric
@attribute RacialMatchCommPol numeric
@attribute PctPolicWhite numeric
@attribute PctPolicBlack numeric
@attribute PctPolicHisp numeric
@attribute PctPolicAsian numeric
@attribute PctPolicMinor numeric
@attribute OfficAssgnDrugUnits numeric
@attribute NumKindsDrugsSeiz numeric
@attribute PolicAveOTWorked numeric
@attribute LandArea numeric
@attribute PopDens numeric
@attribute PctUsePubTrans numeric
@attribute PolicCars numeric
@attribute PolicOperBudg numeric
@attribute LemasPctPoliconPatr numeric
@attribute LemasGangUnitDeploy numeric
@attribute LemasPctOfficDrugUn numeric
@attribute PolicBudgPerPop numeric
@attribute ViolentCrimesPerPop numeric
@data
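A header like the one above can be read with a few lines of code. The following is a minimal sketch with a hypothetical function name; it handles only the simple `@attribute name type` form used here, not quoted names or nominal value lists, and the embedded header is a shortened excerpt.

```python
def parse_arff_header(text):
    """Return (relation, [(attribute_name, attribute_type), ...])."""
    relation = None
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        keyword = line.lower()
        if keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif keyword.startswith("@data"):
            break  # the header ends where the data section begins
    return relation, attributes


# Shortened excerpt of the header, for illustration.
HEADER = """@relation crimepredict
@attribute state numeric
@attribute communityname string
@attribute ViolentCrimesPerPop numeric
@data
"""

relation, attributes = parse_arff_header(HEADER)
```

Applied to the full header, the attribute list should contain 128 entries, matching the 122 predictive, 5 non-predictive, and 1 goal attributes.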
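## Attribute Information
(122 predictive, 5 non-predictive, 1 goal)
- state: US state (by number); not counted as predictive above, but if considered, it should be treated as nominal (nominal)
- county: numeric code for county; not predictive, and many missing values (numeric)
- community: numeric code for community; not predictive, and many missing values (numeric)
- communityname: community name; not predictive, for information only (string)
- fold: fold number of the non-random split (numeric)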
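The roles listed above can be separated before training. This is a minimal sketch with a hypothetical helper; the column list is a shortened excerpt of the header, and the goal attribute name is the last one in the .arff header.

```python
# The five non-predictive attributes named in the list above.
NON_PREDICTIVE = ["state", "county", "community", "communityname", "fold"]
GOAL = "ViolentCrimesPerPop"


def split_roles(columns):
    """Group column names into non-predictive, predictive, and goal."""
    return {
        "non_predictive": [c for c in columns if c in NON_PREDICTIVE],
        "predictive": [
            c for c in columns if c not in NON_PREDICTIVE and c != GOAL
        ],
        "goal": GOAL if GOAL in columns else None,
    }


# Shortened excerpt of the column list, for illustration.
columns = [
    "state", "county", "community", "communityname", "fold",
    "population", "householdsize", "PolicPerPop", "ViolentCrimesPerPop",
]
roles = split_roles(columns)
```

Keeping identifiers such as `communityname` out of the feature set avoids a model memorizing individual communities instead of learning from their characteristics.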
### Data Sources and Contributors
Creator: Michael Redmond (redmond '@' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA.
Culled from the 1990 US Census, the 1995 US FBI Uniform Crime Report, and the 1990 US Law Enforcement Management and Administrative Statistics (LEMAS) Survey, available from ICPSR at the University of Michigan.
Donor: Michael Redmond (redmond '@' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA.
Date: July 2009
Provider: Payititi (帕依提提)
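Background overview
This is a community crime prediction dataset with 122 predictive attributes and a per capita violent crime target variable. The data was normalized while retaining the original distributions. It combines census, crime report, and law enforcement survey data, and suits research on the relationship between community characteristics and crime rates, as well as on the allocation of police resources.
The summary above was collected and generated by 遇见数据集.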



