MikeGreen2710/location_with_extra_feature
Source: Hugging Face. Updated 2024-05-22; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/MikeGreen2710/location_with_extra_feature
Resource overview:
---
dataset_info:
  features:
  - name: house_front_std
    dtype: float64
  - name: road_wide_std
    dtype: float64
  - name: car_area_std
    dtype: float64
  - name: price_std
    dtype: float64
  - name: number_of_floors_std
    dtype: float64
  - name: street
    dtype: string
  - name: city
    dtype: string
  - name: district
    dtype: string
  - name: ward
    dtype: string
  - name: id
    dtype: string
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: LAN
    sequence: string
  - name: overlapped
    dtype: float64
  - name: house_location
    dtype: float64
  - name: ngo
    dtype: bool
  - name: house_location_2
    dtype: string
  - name: address
    dtype: string
  - name: duong
    dtype: bool
  - name: house_front_std_is_filled
    dtype: int64
  - name: house_front_std_filled
    dtype: float64
  - name: house_front_std_normed
    dtype: float64
  - name: road_wide_std_is_filled
    dtype: int64
  - name: road_wide_std_filled
    dtype: float64
  - name: road_wide_std_normed
    dtype: float64
  - name: car_area_std_is_filled
    dtype: int64
  - name: car_area_std_filled
    dtype: float64
  - name: car_area_std_normed
    dtype: float64
  - name: price_std_is_filled
    dtype: int64
  - name: price_std_filled
    dtype: float64
  - name: price_std_normed
    dtype: float64
  - name: number_of_floors_std_is_filled
    dtype: int64
  - name: number_of_floors_std_filled
    dtype: float64
  - name: number_of_floors_std_normed
    dtype: float64
  - name: street_filled
    dtype: string
  - name: city_filled
    dtype: string
  - name: district_filled
    dtype: string
  - name: ward_filled
    dtype: string
  - name: price_median_by_location
    dtype: float64
  - name: price_median_by_location_normed
    dtype: float64
  - name: street_encoded
    dtype: float64
  - name: city_encoded
    dtype: float64
  - name: district_encoded
    dtype: float64
  - name: ward_encoded
    dtype: float64
  - name: street_encoded_normed
    dtype: float64
  - name: city_encoded_normed
    dtype: float64
  - name: district_encoded_normed
    dtype: float64
  - name: ward_encoded_normed
    dtype: float64
  - name: extra_data
    sequence: float64
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 1601745164
    num_examples: 2718388
  download_size: 463492243
  dataset_size: 1601745164
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
The dataset contains real-estate listing features: numeric attributes (house_front_std, road_wide_std, car_area_std, price_std, number_of_floors_std), location fields (street, city, district, ward), and free-text fields (title, description, address). It also includes filled and normalized variants of the numeric attributes (the *_is_filled, *_filled, and *_normed columns) and numerically encoded versions of the location fields. The dataset has a single train split with 2,718,388 examples and a total size of 1,601,745,164 bytes.
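As a minimal sketch of how the train split might be inspected, the snippet below streams a few rows with the Hugging Face `datasets` library instead of downloading everything. The repository ID comes from this page; the helper names `missing_columns` and `first_rows` and the chosen subset of columns are illustrative assumptions, not part of the dataset.

```python
REPO_ID = "MikeGreen2710/location_with_extra_feature"

# A handful of columns taken from the feature list in the card (not exhaustive).
EXPECTED_COLUMNS = ("price_std", "street", "city", "district", "ward", "extra_data")


def missing_columns(row, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from a row dict."""
    return [name for name in expected if name not in row]


def first_rows(n=3):
    """Stream the first n rows of the train split without a full download."""
    # Imported lazily so missing_columns() works without the library installed.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split="train", streaming=True)
    return list(stream.take(n))
```

Calling `first_rows()` needs network access and the `datasets` package; `missing_columns(row)` can then be used to check that a streamed row carries the columns a downstream script relies on.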
Provider:
MikeGreen2710
Summary of original information
Dataset feature overview
Basic features
- house_front_std: float64
- road_wide_std: float64
- car_area_std: float64
- price_std: float64
- number_of_floors_std: float64
- street: string
- city: string
- district: string
- ward: string
- id: string
- title: string
- description: string
- LAN: sequence of string
- overlapped: float64
- house_location: float64
- ngo: bool
- house_location_2: string
- address: string
- duong: bool
Normalized and filled features
- house_front_std_is_filled: int64
- house_front_std_filled: float64
- house_front_std_normed: float64
- road_wide_std_is_filled: int64
- road_wide_std_filled: float64
- road_wide_std_normed: float64
- car_area_std_is_filled: int64
- car_area_std_filled: float64
- car_area_std_normed: float64
- price_std_is_filled: int64
- price_std_filled: float64
- price_std_normed: float64
- number_of_floors_std_is_filled: int64
- number_of_floors_std_filled: float64
- number_of_floors_std_normed: float64
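The card does not say how the `*_is_filled`, `*_filled`, and `*_normed` columns were produced. The sketch below shows one plausible reading of such a triple, under two explicit assumptions: gaps are filled with the median of the observed values, and normalization is a z-score. The function name `fill_and_norm` is hypothetical, and the actual preprocessing may differ.

```python
from statistics import mean, median, pstdev


def fill_and_norm(values):
    """Return (is_filled, filled, normed) lists for a numeric column with None gaps."""
    observed = [v for v in values if v is not None]
    fill_value = median(observed)  # ASSUMPTION: median imputation

    # 1 where the original value was missing, 0 otherwise (int64-style flag).
    is_filled = [1 if v is None else 0 for v in values]
    filled = [fill_value if v is None else v for v in values]

    # ASSUMPTION: z-score scaling; a constant column maps to all zeros.
    mu, sigma = mean(filled), pstdev(filled)
    normed = [(v - mu) / sigma if sigma else 0.0 for v in filled]
    return is_filled, filled, normed
```

Keeping the flag next to the filled value lets a model distinguish observed values from imputed ones, which is presumably why the dataset stores all three columns per attribute.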
Geographic location encoding features
- street_filled: string
- city_filled: string
- district_filled: string
- ward_filled: string
- price_median_by_location: float64
- price_median_by_location_normed: float64
- street_encoded: float64
- city_encoded: float64
- district_encoded: float64
- ward_encoded: float64
- street_encoded_normed: float64
- city_encoded_normed: float64
- district_encoded_normed: float64
- ward_encoded_normed: float64
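The name `price_median_by_location` suggests a per-location median price, but the card does not state which columns define a location or which price column is aggregated. The following sketch assumes grouping on the four `*_filled` location columns and taking the median of `price_std_filled`; both choices, and the function name, are illustrative guesses.

```python
from collections import defaultdict
from statistics import median

# ASSUMPTION: a "location" is the combination of these four columns.
LOCATION_KEYS = ("city_filled", "district_filled", "ward_filled", "street_filled")


def price_median_by_location(rows, keys=LOCATION_KEYS, price_key="price_std_filled"):
    """Return, for each row, the median price of all rows sharing its location."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[price_key])

    medians = {location: median(prices) for location, prices in groups.items()}
    return [medians[tuple(row[k] for k in keys)] for row in rows]
```

A feature of this kind gives each listing a numeric summary of its neighborhood's price level, which is one common way to turn high-cardinality location strings into float columns.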
Other features
- extra_data: sequence of float64
- __index_level_0__: int64
Dataset splits
- train:
  - Number of records: 2718388
  - Size: 1601745164 bytes
Dataset size
- Download size: 463492243 bytes
- Dataset size: 1601745164 bytes
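The size figures above allow a quick back-of-the-envelope check. The snippet below derives the average size per record and the ratio of download size to dataset size; it assumes the usual Hugging Face convention that the download size refers to the stored files and the dataset size to the loaded tables.

```python
NUM_EXAMPLES = 2_718_388
DATASET_SIZE_BYTES = 1_601_745_164
DOWNLOAD_SIZE_BYTES = 463_492_243

# Average footprint of one record once loaded.
bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the loaded size that has to be transferred.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"{bytes_per_example:.1f} bytes per example")
print(f"download is {download_ratio:.1%} of the dataset size")
```

This works out to roughly 589 bytes per record, with the download being about 29% of the loaded size.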
Configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*
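The `data/train-*` entry is a glob pattern that selects the files belonging to the train split. The sketch below approximates that matching with Python's `fnmatch`; the file names are hypothetical examples, and the Hub's own pattern resolution may differ in detail.

```python
from fnmatch import fnmatch

PATTERN = "data/train-*"

# HYPOTHETICAL file names, for illustration only.
candidates = [
    "data/train-00000-of-00004.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the files that the train split's pattern would select.
train_files = [name for name in candidates if fnmatch(name, PATTERN)]
print(train_files)
```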



