Communities and Crime | Crime Analysis Dataset | Community Research Dataset
IBA Project: Crime Analysis & Prediction
Dataset
- Filename: communitiesNcrime-USA.csv
- Source: UCI Machine Learning Repository
- Description: The dataset includes socio-economic, law enforcement, and demographic data for communities in the United States, alongside crime rates.
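A minimal loading sketch, assuming the CSV follows the original UCI convention of marking missing values with `?`. The column names and the in-memory sample are illustrative stand-ins, not taken from the project file; with the real data, pass the path `communitiesNcrime-USA.csv` instead.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read the CSV, treating '?' as missing (assumed marker, as in the UCI original)."""
    return pd.read_csv(source, na_values=["?"])


# Tiny in-memory stand-in so the sketch runs without the real file.
# Column names are assumptions borrowed from the UCI attribute list.
sample_csv = io.StringIO(
    "population,medIncome,PolicPerPop,ViolentCrimesPerPop\n"
    "0.19,0.37,?,0.20\n"
    "0.00,0.31,0.12,0.67\n"
    "0.04,0.58,?,0.03\n"
)

df = load_dataset(sample_csv)
summary = df.describe()              # per-column summary statistics
missing_per_column = df.isna().sum() # how many values are missing in each column

print(summary)
print(missing_per_column)
```

Counting missing values per column at load time makes it easier to decide later which features have too many gaps to keep.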
Analysis Steps
- Load the Dataset:
  - Imported from the CSV file.
  - Summary statistics reviewed.
- Data Cleaning:
  - Handled missing values.
  - Removed features with excessive missing data.
  - Standardized data for regression.
- Exploratory Data Analysis (EDA):
  - Visualized relationships between features and crime rates.
  - Identified patterns and key features.
- Regression Modeling:
  - Linear Regression:
    - Modeled continuous crime rates based on features.
  - Logistic Regression:
    - Converted the crime rate into a binary label (e.g., high/low) for classification.
- Evaluation:
  - Analyzed model performance metrics (e.g., accuracy, R-squared, confusion matrix).
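The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's actual code: the target column name `ViolentCrimesPerPop`, the 50% missing-data cutoff, median imputation, the median high/low threshold, and the synthetic data are all hypothetical choices made so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

TARGET = "ViolentCrimesPerPop"  # assumed target column name


def clean(df, max_missing=0.5):
    """Drop sparse columns, then median-impute the rest (both are assumed choices)."""
    numeric = df.select_dtypes(include="number")
    keep = numeric.columns[numeric.isna().mean() <= max_missing]
    kept = numeric[keep]
    return kept.fillna(kept.median())


def rank_by_correlation(df, target=TARGET):
    """EDA helper: features ordered by absolute Pearson correlation with the target."""
    corr = df.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


def fit_and_score(df, target=TARGET, seed=0):
    """Fit both models on a train split and score them on the held-out split."""
    X, y = df.drop(columns=[target]), df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)

    # Standardization is fitted on the training split only, inside the pipeline.
    linear = make_pipeline(StandardScaler(), LinearRegression()).fit(X_tr, y_tr)

    cutoff = y_tr.median()  # assumed threshold separating "high" from "low"
    logistic = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    logistic.fit(X_tr, (y_tr > cutoff).astype(int))

    return {
        "r2": linear.score(X_te, y_te),
        "accuracy": logistic.score(X_te, (y_te > cutoff).astype(int)),
    }


# Synthetic stand-in data so the sketch runs without the real CSV.
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "pctPoverty": rng.normal(size=n),
    "medIncome": rng.normal(size=n),
    "mostlyMissing": np.where(rng.random(n) < 0.8, np.nan, 1.0),
})
toy[TARGET] = 0.6 * toy["pctPoverty"] - 0.3 * toy["medIncome"] + 0.2 * rng.normal(size=n)

cleaned = clean(toy)
ranking = rank_by_correlation(cleaned)
scores = fit_and_score(cleaned)
print(ranking)
print(scores)
```

Putting the scaler inside each pipeline keeps test-set statistics out of the fitted model. The scores printed here describe the synthetic data only and say nothing about the real dataset.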
Results
- Key Insights from EDA:
  - Socio-economic factors are significantly associated with crime rates.
- Model Performance:
  - Linear regression achieved an R-squared of 47%.
  - Logistic regression accuracy: 80% on the test set.
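For readers less familiar with the metrics reported above, here is a small worked sketch of their standard definitions. The numbers are made up for illustration and are not the project's predictions; the helper names are hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R-squared = 1 - SS_res / SS_tot: share of target variance the model explains."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def confusion_counts(y_true, y_pred):
    """2x2 matrix: rows are actual classes, columns are predicted (0 = low, 1 = high)."""
    matrix = np.zeros((2, 2), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual, predicted] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual class."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Made-up continuous crime rates and predictions.
true_rates = [0.10, 0.40, 0.35, 0.80]
pred_rates = [0.15, 0.35, 0.40, 0.70]
r2 = r_squared(true_rates, pred_rates)

# Made-up high/low labels and predictions.
true_labels = [0, 1, 1, 0, 1]
pred_labels = [0, 1, 0, 0, 1]
cm = confusion_counts(true_labels, pred_labels)
acc = accuracy(true_labels, pred_labels)

print(r2, acc)
print(cm)
```

An R-squared of 47% therefore means the linear model explains a little under half of the variance in the crime rate. Accuracy is best read alongside the confusion matrix, since it does not show which class the errors fall in.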
