MikeGreen2710/location_with_extra_feature_outlier
Source: Hugging Face. Updated 2024-05-22; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/MikeGreen2710/location_with_extra_feature_outlier
Description:
---
dataset_info:
  features:
  - name: house_front_std
    dtype: float64
  - name: road_wide_std
    dtype: float64
  - name: car_area_std
    dtype: float64
  - name: price_std
    dtype: float64
  - name: number_of_floors_std
    dtype: float64
  - name: street
    dtype: string
  - name: city
    dtype: string
  - name: district
    dtype: string
  - name: ward
    dtype: string
  - name: id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: LAN
    sequence: string
  - name: overlapped
    dtype: float64
  - name: label
    dtype: int64
  - name: ngo
    dtype: bool
  - name: house_location_2
    dtype: string
  - name: address
    dtype: string
  - name: duong
    dtype: bool
  - name: house_front_std_is_filled
    dtype: int64
  - name: house_front_std_filled
    dtype: float64
  - name: house_front_std_normed
    dtype: float64
  - name: road_wide_std_is_filled
    dtype: int64
  - name: road_wide_std_filled
    dtype: float64
  - name: road_wide_std_normed
    dtype: float64
  - name: car_area_std_is_filled
    dtype: int64
  - name: car_area_std_filled
    dtype: float64
  - name: car_area_std_normed
    dtype: float64
  - name: price_std_is_filled
    dtype: int64
  - name: price_std_filled
    dtype: float64
  - name: price_std_normed
    dtype: float64
  - name: number_of_floors_std_is_filled
    dtype: int64
  - name: number_of_floors_std_filled
    dtype: float64
  - name: number_of_floors_std_normed
    dtype: float64
  - name: street_filled
    dtype: string
  - name: city_filled
    dtype: string
  - name: district_filled
    dtype: string
  - name: ward_filled
    dtype: string
  - name: price_median_by_location
    dtype: float64
  - name: price_median_by_location_normed
    dtype: float64
  - name: street_encoded
    dtype: float64
  - name: city_encoded
    dtype: float64
  - name: district_encoded
    dtype: float64
  - name: ward_encoded
    dtype: float64
  - name: street_encoded_normed
    dtype: float64
  - name: city_encoded_normed
    dtype: float64
  - name: district_encoded_normed
    dtype: float64
  - name: ward_encoded_normed
    dtype: float64
  - name: extra_data
    sequence: float64
  - name: final_z_score
    dtype: float64
  - name: outlier
    dtype: float64
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 19288832
    num_examples: 11999
  download_size: 8562751
  dataset_size: 19288832
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
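A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and the Hub is reachable. It follows the `configs` entry above (a single `default` config with one `train` split); the helper name `load_train_split` is our own and is not part of the dataset.

```python
# Repository ID as it appears in the download link above.
REPO_ID = "MikeGreen2710/location_with_extra_feature_outlier"


def load_train_split():
    """Return the `train` split of the default config.

    Requires the `datasets` package and network access to the Hub.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # The card defines only the `default` config, whose single `train` split
    # is backed by the files matching data/train-*.
    return load_dataset(REPO_ID, split="train")
```

Calling `load_train_split()` should return a `Dataset` whose `num_rows` matches the 11999 examples the card reports. Readers who reach the Hub through hf-mirror.com commonly set the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before importing `datasets`; that is a `huggingface_hub` client setting, not something this card documents.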
Dataset information:
Features:
- House frontage (house_front_std): 64-bit float (float64)
- Road width (road_wide_std): 64-bit float (float64)
- Car area (car_area_std): 64-bit float (float64)
- Price (price_std): 64-bit float (float64)
- Number of floors (number_of_floors_std): 64-bit float (float64)
- Street (street): string
- City (city): string
- District (district): string
- Ward (ward): string
- Record ID (id): string
- Title (title): string
- Text (text): string
- LAN (LAN): sequence of strings
- Degree of overlap (overlapped): 64-bit float (float64)
- Label (label): 64-bit integer (int64)
- ngo flag (ngo): boolean (bool)
- Secondary house location (house_location_2): string
- Address (address): string
- Street-frontage flag (duong): boolean (bool)
- Whether house_front_std was filled (house_front_std_is_filled): 64-bit integer (int64)
- house_front_std after filling (house_front_std_filled): 64-bit float (float64)
- house_front_std after normalization (house_front_std_normed): 64-bit float (float64)
- Whether road_wide_std was filled (road_wide_std_is_filled): 64-bit integer (int64)
- road_wide_std after filling (road_wide_std_filled): 64-bit float (float64)
- road_wide_std after normalization (road_wide_std_normed): 64-bit float (float64)
- Whether car_area_std was filled (car_area_std_is_filled): 64-bit integer (int64)
- car_area_std after filling (car_area_std_filled): 64-bit float (float64)
- car_area_std after normalization (car_area_std_normed): 64-bit float (float64)
- Whether price_std was filled (price_std_is_filled): 64-bit integer (int64)
- price_std after filling (price_std_filled): 64-bit float (float64)
- price_std after normalization (price_std_normed): 64-bit float (float64)
- Whether number_of_floors_std was filled (number_of_floors_std_is_filled): 64-bit integer (int64)
- number_of_floors_std after filling (number_of_floors_std_filled): 64-bit float (float64)
- number_of_floors_std after normalization (number_of_floors_std_normed): 64-bit float (float64)
- Street after filling (street_filled): string
- City after filling (city_filled): string
- District after filling (district_filled): string
- Ward after filling (ward_filled): string
- Median price by location (price_median_by_location): 64-bit float (float64)
- Normalized median price by location (price_median_by_location_normed): 64-bit float (float64)
- Street encoding (street_encoded): 64-bit float (float64)
- City encoding (city_encoded): 64-bit float (float64)
- District encoding (district_encoded): 64-bit float (float64)
- Ward encoding (ward_encoded): 64-bit float (float64)
- Normalized street encoding (street_encoded_normed): 64-bit float (float64)
- Normalized city encoding (city_encoded_normed): 64-bit float (float64)
- Normalized district encoding (district_encoded_normed): 64-bit float (float64)
- Normalized ward encoding (ward_encoded_normed): 64-bit float (float64)
- Extra data (extra_data): sequence of 64-bit floats (sequence of float64)
- Final z-score (final_z_score): 64-bit float (float64)
- Outlier flag (outlier): 64-bit float (float64)
- Index column (__index_level_0__): 64-bit integer (int64)
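The `*_is_filled`, `*_filled` and `*_normed` columns suggest a fill-then-normalize step for each numeric feature. The card does not describe that step, so the following is a hypothetical sketch: median imputation and min-max scaling are assumptions chosen for illustration, and `fill_and_normalize` is an invented helper name.

```python
from statistics import median


def fill_and_normalize(values):
    """Return (is_filled, filled, normed) lists for one numeric column.

    Hypothetical reconstruction: missing entries (None) are replaced by the
    median of the observed entries, then the filled column is min-max scaled
    to [0, 1]. Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None]
    fill_value = median(observed)

    # 1 where the original value was missing and had to be imputed, else 0.
    is_filled = [1 if v is None else 0 for v in values]
    filled = [fill_value if v is None else float(v) for v in values]

    lo, hi = min(filled), max(filled)
    span = hi - lo
    # Guard against a constant column, which would otherwise divide by zero.
    normed = [0.0 if span == 0 else (v - lo) / span for v in filled]
    return is_filled, filled, normed


# Example: a toy house_front_std-like column with one missing value.
is_filled, filled, normed = fill_and_normalize([4.0, None, 6.0, 10.0])
```

Here the missing entry is filled with the median 6.0 and flagged with 1. If the real pipeline standardized with z-scores instead of min-max scaling, only the final step would change.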
Dataset splits:
- Training set (train): 19288832 bytes, 11999 examples
Download size: 8562751 bytes
Total dataset size: 19288832 bytes
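As a quick consistency check, the average stored size per example and the ratio of download size to dataset size follow directly from the figures above:

```python
# Figures copied from the split and size entries of this card.
NUM_BYTES = 19_288_832      # train split size (equal to dataset_size)
NUM_EXAMPLES = 11_999       # rows in the train split
DOWNLOAD_SIZE = 8_562_751   # bytes to download

bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"{bytes_per_example:.1f} bytes per example")       # prints "1607.5 bytes per example"
print(f"download is {download_ratio:.1%} of dataset size")  # prints "download is 44.4% of dataset size"
```

The download is well under half the reported dataset size; the card does not say which file format or compression the data files use.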
Configuration:
- Config name: default
  Data files:
  - Split: train
    Path: data/train-*
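The `data/train-*` path is a glob pattern. As a minimal sketch with hypothetical shard names (the card does not list the actual files), Python's `fnmatch` shows which names such a pattern selects. Note that `fnmatch`'s `*` also matches `/`, so it only approximates path-aware globbing.

```python
from fnmatch import fnmatch

# Pattern taken from the card's default config.
TRAIN_PATTERN = "data/train-*"

# Hypothetical repository file names, for illustration only.
candidates = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the files that the train split's pattern would select.
train_files = [name for name in candidates if fnmatch(name, TRAIN_PATTERN)]
```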
Provided by:
MikeGreen2710
Summary of the original information
Dataset feature overview
Main features and their data types
- house_front_std: float (float64)
- road_wide_std: float (float64)
- car_area_std: float (float64)
- price_std: float (float64)
- number_of_floors_std: float (float64)
- street: string
- city: string
- district: string
- ward: string
- id: string
- title: string
- text: string
- LAN: sequence of strings (sequence: string)
- overlapped: float (float64)
- label: integer (int64)
- ngo: boolean (bool)
- house_location_2: string
- address: string
- duong: boolean (bool)
- house_front_std_is_filled: integer (int64)
- house_front_std_filled: float (float64)
- house_front_std_normed: float (float64)
- road_wide_std_is_filled: integer (int64)
- road_wide_std_filled: float (float64)
- road_wide_std_normed: float (float64)
- car_area_std_is_filled: integer (int64)
- car_area_std_filled: float (float64)
- car_area_std_normed: float (float64)
- price_std_is_filled: integer (int64)
- price_std_filled: float (float64)
- price_std_normed: float (float64)
- number_of_floors_std_is_filled: integer (int64)
- number_of_floors_std_filled: float (float64)
- number_of_floors_std_normed: float (float64)
- street_filled: string
- city_filled: string
- district_filled: string
- ward_filled: string
- price_median_by_location: float (float64)
- price_median_by_location_normed: float (float64)
- street_encoded: float (float64)
- city_encoded: float (float64)
- district_encoded: float (float64)
- ward_encoded: float (float64)
- street_encoded_normed: float (float64)
- city_encoded_normed: float (float64)
- district_encoded_normed: float (float64)
- ward_encoded_normed: float (float64)
- extra_data: sequence of floats (sequence: float64)
- final_z_score: float (float64)
- outlier: float (float64)
- __index_level_0__: integer (int64)
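The `final_z_score` and `outlier` columns point to a z-score based outlier rule, but the card documents neither the input column nor the cut-off. The sketch below is therefore hypothetical: it standardizes one numeric column with the population standard deviation and flags rows with |z| > 3, a common convention assumed here for illustration; `flag_outliers` is an invented name.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=3.0):
    """Return (z_scores, outlier_flags) for a list of numbers.

    Hypothetical: which column feeds final_z_score and which threshold
    defines an outlier are assumptions, not documented by the card.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column has no spread, so nothing is flagged.
        return [0.0] * len(values), [0.0] * len(values)
    z_scores = [(v - mu) / sigma for v in values]
    # Stored as 0.0 / 1.0 floats to mirror the float64 `outlier` column.
    flags = [1.0 if abs(z) > threshold else 0.0 for z in z_scores]
    return z_scores, flags


# Example: eleven similar prices and one extreme listing.
prices = [2.0] * 11 + [50.0]
z, flags = flag_outliers(prices)
```

With eleven prices of 2.0 and one of 50.0, the mean is 6.0, the extreme value gets z = sqrt(11) ≈ 3.32, and it is the only row flagged.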
Dataset splits
- train: 11999 examples, 19288832 bytes of data.
Dataset size
- Download size: 8562751 bytes
- Dataset size: 19288832 bytes



