MikeGreen2710/location_with_extra_feature

Source: Hugging Face. Updated: 2024-05-22. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/MikeGreen2710/location_with_extra_feature
Resource description:
---
dataset_info:
  features:
  - name: house_front_std
    dtype: float64
  - name: road_wide_std
    dtype: float64
  - name: car_area_std
    dtype: float64
  - name: price_std
    dtype: float64
  - name: number_of_floors_std
    dtype: float64
  - name: street
    dtype: string
  - name: city
    dtype: string
  - name: district
    dtype: string
  - name: ward
    dtype: string
  - name: id
    dtype: string
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: LAN
    sequence: string
  - name: overlapped
    dtype: float64
  - name: house_location
    dtype: float64
  - name: ngo
    dtype: bool
  - name: house_location_2
    dtype: string
  - name: address
    dtype: string
  - name: duong
    dtype: bool
  - name: house_front_std_is_filled
    dtype: int64
  - name: house_front_std_filled
    dtype: float64
  - name: house_front_std_normed
    dtype: float64
  - name: road_wide_std_is_filled
    dtype: int64
  - name: road_wide_std_filled
    dtype: float64
  - name: road_wide_std_normed
    dtype: float64
  - name: car_area_std_is_filled
    dtype: int64
  - name: car_area_std_filled
    dtype: float64
  - name: car_area_std_normed
    dtype: float64
  - name: price_std_is_filled
    dtype: int64
  - name: price_std_filled
    dtype: float64
  - name: price_std_normed
    dtype: float64
  - name: number_of_floors_std_is_filled
    dtype: int64
  - name: number_of_floors_std_filled
    dtype: float64
  - name: number_of_floors_std_normed
    dtype: float64
  - name: street_filled
    dtype: string
  - name: city_filled
    dtype: string
  - name: district_filled
    dtype: string
  - name: ward_filled
    dtype: string
  - name: price_median_by_location
    dtype: float64
  - name: price_median_by_location_normed
    dtype: float64
  - name: street_encoded
    dtype: float64
  - name: city_encoded
    dtype: float64
  - name: district_encoded
    dtype: float64
  - name: ward_encoded
    dtype: float64
  - name: street_encoded_normed
    dtype: float64
  - name: city_encoded_normed
    dtype: float64
  - name: district_encoded_normed
    dtype: float64
  - name: ward_encoded_normed
    dtype: float64
  - name: extra_data
    sequence: float64
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 1601745164
    num_examples: 2718388
  download_size: 463492243
  dataset_size: 1601745164
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

The dataset contains real-estate listing features such as house_front_std, road_wide_std, car_area_std, price_std and number_of_floors_std, together with location fields (street, city, district, ward, address). It also includes filled and normalized versions of the numeric features and encoded versions of the location fields. There is a single train split with 2,718,388 examples totaling 1,601,745,164 bytes.
Provider:
MikeGreen2710
Summary of original information

Dataset feature overview

Basic features

  • house_front_std: data type float64
  • road_wide_std: data type float64
  • car_area_std: data type float64
  • price_std: data type float64
  • number_of_floors_std: data type float64
  • street: data type string
  • city: data type string
  • district: data type string
  • ward: data type string
  • id: data type string
  • title: data type string
  • description: data type string
  • LAN: data type sequence of string
  • overlapped: data type float64
  • house_location: data type float64
  • ngo: data type bool
  • house_location_2: data type string
  • address: data type string
  • duong: data type bool

Normalized and filled features

  • house_front_std_is_filled: data type int64
  • house_front_std_filled: data type float64
  • house_front_std_normed: data type float64
  • road_wide_std_is_filled: data type int64
  • road_wide_std_filled: data type float64
  • road_wide_std_normed: data type float64
  • car_area_std_is_filled: data type int64
  • car_area_std_filled: data type float64
  • car_area_std_normed: data type float64
  • price_std_is_filled: data type int64
  • price_std_filled: data type float64
  • price_std_normed: data type float64
  • number_of_floors_std_is_filled: data type int64
  • number_of_floors_std_filled: data type float64
  • number_of_floors_std_normed: data type float64
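
The three suffixes above suggest a common pattern: an _is_filled flag marking imputed values, a _filled column holding the value after imputation, and a _normed column holding a rescaled version. The dataset card does not say how the values were filled or normalized, so the following is only a minimal sketch under loud assumptions: gaps are filled with the median of the observed values and the normalization is a z-score. The function name fill_and_norm and the sample values are hypothetical.

```python
import math
import statistics


def fill_and_norm(values):
    """Return (is_filled, filled, normed) lists for one numeric column.

    Assumptions (NOT stated by the dataset card): missing values are None or
    NaN, gaps are filled with the median of the observed values, and the
    normalization is a z-score computed over the filled column.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    median = statistics.median(observed)

    # 1 where the value was imputed, 0 where it was observed (int64-style flag).
    is_filled = [1 if is_missing(v) else 0 for v in values]
    filled = [median if is_missing(v) else float(v) for v in values]

    mean = statistics.fmean(filled)
    std = statistics.pstdev(filled)
    # Guard against a constant column, where the z-score is undefined.
    normed = [0.0 if std == 0 else (v - mean) / std for v in filled]
    return is_filled, filled, normed


# Hypothetical raw column with two missing entries.
house_front_std = [4.0, None, 5.0, 6.0, float("nan")]
flags, filled, normed = fill_and_norm(house_front_std)
```

Keeping the flag alongside the filled value lets a downstream model tell an observed value from an imputed one, which may be why the dataset ships both columns.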

Geographic location encoding features

  • street_filled: data type string
  • city_filled: data type string
  • district_filled: data type string
  • ward_filled: data type string
  • price_median_by_location: data type float64
  • price_median_by_location_normed: data type float64
  • street_encoded: data type float64
  • city_encoded: data type float64
  • district_encoded: data type float64
  • ward_encoded: data type float64
  • street_encoded_normed: data type float64
  • city_encoded_normed: data type float64
  • district_encoded_normed: data type float64
  • ward_encoded_normed: data type float64
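
The column names above hint at a group-wise statistic (price_median_by_location) and at float-valued encodings of the categorical location fields. The dataset card does not document how either was computed. As an illustration only, the sketch below assumes that price_median_by_location is the median of price_std_filled within a (city, district) group and that district_encoded is a target-style encoding using the median price per district. The helper median_by_key and the sample rows are hypothetical.

```python
from collections import defaultdict
import statistics


def median_by_key(rows, key_fields, target="price_std_filled"):
    """Map each combination of key_fields to the median of target."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in key_fields)].append(row[target])
    return {key: statistics.median(vals) for key, vals in groups.items()}


# Hypothetical rows; the location labels are placeholders, not real values.
rows = [
    {"city_filled": "A", "district_filled": "d1", "price_std_filled": 10.0},
    {"city_filled": "A", "district_filled": "d1", "price_std_filled": 14.0},
    {"city_filled": "A", "district_filled": "d2", "price_std_filled": 30.0},
    {"city_filled": "B", "district_filled": "d3", "price_std_filled": 8.0},
]

# Assumed definition: median price within each (city, district) group.
location_median = median_by_key(rows, ["city_filled", "district_filled"])
# Assumed definition: encode a district by its median price.
district_code = median_by_key(rows, ["district_filled"])

for row in rows:
    loc_key = (row["city_filled"], row["district_filled"])
    row["price_median_by_location"] = location_median[loc_key]
    row["district_encoded"] = district_code[(row["district_filled"],)]
```

An encoding of this kind turns a high-cardinality string field into a single float, which would be consistent with the float64 type of the _encoded columns, but other schemes (for example frequency encoding) would fit the schema equally well.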

Other features

  • extra_data: data type sequence of float64
  • __index_level_0__: data type int64

Dataset splits

  • train:
    • Number of examples: 2718388 records
    • Data size: 1601745164 bytes

Dataset size

  • Download size: 463492243 bytes
  • Dataset size: 1601745164 bytes
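
The figures above can be turned into more readable quantities with a little arithmetic. The sketch below derives the average size per example and the ratio of download size to dataset size; the constant names are chosen here for illustration. The result is roughly 589 bytes per example, with a download of about 29% of the dataset size, presumably because the hosted files are compressed.

```python
# Figures taken from the dataset metadata above.
NUM_EXAMPLES = 2_718_388
DATASET_SIZE_BYTES = 1_601_745_164
DOWNLOAD_SIZE_BYTES = 463_492_243

# Average footprint of one record in the prepared dataset.
bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

# Binary units for convenience.
dataset_gib = DATASET_SIZE_BYTES / 2**30
download_mib = DOWNLOAD_SIZE_BYTES / 2**20

print(f"{bytes_per_example:.1f} bytes per example")
print(f"download is {download_ratio:.1%} of the dataset size")
print(f"{dataset_gib:.2f} GiB prepared, {download_mib:.0f} MiB to download")
```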

Configuration

  • config_name: default
  • data_files:
    • split: train
    • path: data/train-*
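
Given the default config and the single train split listed above, one way to read the data is through the Hugging Face datasets library. This is a sketch, not a tested recipe for this particular repository: it assumes the datasets package is installed, that the repository is reachable on the Hub, and the helper name stream_train_rows is hypothetical. Streaming is used so that a few rows can be inspected without fetching the full download.

```python
REPO_ID = "MikeGreen2710/location_with_extra_feature"


def stream_train_rows(limit=3):
    """Yield the first `limit` rows of the train split without a full download."""
    # Imported lazily so this module can be imported without the library.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, name="default", split="train", streaming=True)
    for row in stream.take(limit):
        yield row
```

Calling it, for example with for row in stream_train_rows(2): print(row["address"], row["price_std"]), requires network access; each row should be a dict keyed by the feature names listed earlier.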