
mariakmurphy55/titanicdata

Source: Hugging Face. Updated 2023-11-11. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/mariakmurphy55/titanicdata
Resource description:
---
language:
- en
pretty_name: titanic data
size_categories:
- 1K<n<10K
---

# Dataset Card for Titanic Data

Training and testing data for Titanic passengers' survival.

## Dataset Details

### Dataset Description

Train:
- Dimensions --> 891x12
- Column names --> "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", and "Embarked"

Test:
- Dimensions --> 418x11
- Column names --> "PassengerId", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", and "Embarked"

### Dataset Sources

Kaggle Titanic dataset: https://www.kaggle.com/competitions/titanic

## Uses

Raw datasets used in an introduction to DVC and Amazon S3 buckets.

## Dataset Structure

### Column definitions

- "PassengerId" --> key for each passenger (int64)
- "Survived" --> binary variable indicating survival (int64)
- "Pclass" --> first, second, or third class (int64)
- "Name" --> passenger name; maiden name in parentheses for married women (object)
- "Sex" --> male or female (object)
- "Age" --> passenger age (float64)
- "SibSp" --> number of siblings or spouses aboard, per the Kaggle data dictionary (int64)
- "Parch" --> number of parents or children aboard, per the Kaggle data dictionary (int64)
- "Ticket" --> ticket identifier (object)
- "Fare" --> passenger fare (float64)
- "Cabin" --> cabin identifier (object)
- "Embarked" --> port of embarkation: C, Q, or S (object)

Categorical columns: "Name", "Sex", "Ticket", "Cabin", "Embarked"

Continuous columns: "PassengerId", "Pclass", "SibSp", "Parch", "Age", "Fare"

### Quick Facts

Train:
- PassengerId, Survived, Pclass, Name, Sex, SibSp, Parch, Ticket, and Fare have no NA values
- Age not documented for 177 passengers (19.8653% NA)
- Cabin not documented for 687 passengers (77.1044% NA)
- Embarked not documented for 2 passengers (0.2245% NA)

Test:
- PassengerId, Pclass, Name, Sex, SibSp, Parch, Ticket, and Embarked have no NA values
- Age not documented for 86 passengers (20.5742% NA)
- Fare not documented for 1 passenger (0.2392% NA)
- Cabin not documented for 327 passengers (78.2297% NA)

### Summary Statistics

Train:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65119c3f02dbe541c92539d4/AJLNDr1mDXEiTLn_JAH0h.png)

Test:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65119c3f02dbe541c92539d4/PEnS25wxm6ymjgsI3QKtv.png)

## Dataset Card Author

Maria Murphy
Provider:
mariakmurphy55
Summary of original information

Dataset card: Titanic Data

Dataset details

Dataset description

Train:

  • Dimensions: 891x12
  • Column names: "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"

Test:

  • Dimensions: 418x11
  • Column names: "PassengerId", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"
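The shapes and column lists above can be checked against a downloaded file before any modelling. The following is a minimal sketch using only the Python standard library; the helper name `check_header` and the inline one-row sample are illustrative assumptions, not part of the dataset, and in practice the text would come from the actual train or test CSV.

```python
import csv
import io

# Column lists as stated in the dataset card.
TRAIN_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]
# The test split has the same columns minus the "Survived" label.
TEST_COLUMNS = [c for c in TRAIN_COLUMNS if c != "Survived"]


def check_header(csv_text, expected):
    """Return True if the first CSV row matches the expected column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return header == expected


# Hypothetical one-row sample, used only to exercise the helper.
sample_test_csv = (
    "PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,3,"Doe, Mr. John",male,30,0,0,A/5 0000,7.25,,S\n'
)

print(len(TRAIN_COLUMNS), len(TEST_COLUMNS))
print(check_header(sample_test_csv, TEST_COLUMNS))
print(check_header(sample_test_csv, TRAIN_COLUMNS))
```

The 12 versus 11 column counts match the second dimension of the 891x12 and 418x11 shapes; the only difference between the splits is the `Survived` label.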

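Dataset sources

Kaggle Titanic dataset (Kaggle Titanic competition)

Dataset structure

Column definitions:

  • "PassengerId": unique key for each passenger (int64)
  • "Survived": binary variable indicating survival (int64)
  • "Pclass": passenger class: first, second, or third (int64)
  • "Name": passenger name; maiden name in parentheses for married women (object)
  • "Sex": male or female (object)
  • "Age": passenger age (float64)
  • "SibSp": number of siblings or spouses aboard, per the Kaggle data dictionary (int64)
  • "Parch": number of parents or children aboard, per the Kaggle data dictionary (int64)
  • "Ticket": ticket identifier (object)
  • "Fare": passenger fare (float64)
  • "Cabin": cabin identifier (object)
  • "Embarked": port of embarkation: C, Q, or S (object)

Categorical columns: "Name", "Sex", "Ticket", "Cabin", "Embarked"

Continuous columns: "PassengerId", "Pclass", "SibSp", "Parch", "Age", "Fare"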
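The two column groups appear to follow from the listed dtypes: `object` columns are treated as categorical, and the numeric columns other than the `Survived` label as continuous. That rule is an assumption inferred from the lists, not something the card states. A minimal sketch of the split, with `DTYPES` and `TARGET` as illustrative names:

```python
# dtypes as listed in the column definitions (pandas dtype names).
DTYPES = {
    "PassengerId": "int64", "Survived": "int64", "Pclass": "int64",
    "Name": "object", "Sex": "object", "Age": "float64",
    "SibSp": "int64", "Parch": "int64", "Ticket": "object",
    "Fare": "float64", "Cabin": "object", "Embarked": "object",
}
TARGET = "Survived"  # label column, left out of both feature lists

# Assumed rule: object dtype -> categorical, any numeric dtype -> continuous.
categorical = [c for c, t in DTYPES.items() if t == "object"]
continuous = [c for c, t in DTYPES.items() if t != "object" and c != TARGET]

print(categorical)
print(continuous)
```

Because the grouping is by storage type, `PassengerId` (an identifier) and `Pclass` (an ordinal code) land in the continuous list even though neither is a measured quantity, so a model may want to handle them separately.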

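Quick facts

Train:

  • PassengerId, Survived, Pclass, Name, Sex, SibSp, Parch, Ticket, and Fare have no missing values
  • Age is missing for 177 records (19.8653% NA)
  • Cabin is missing for 687 records (77.1044% NA)
  • Embarked is missing for 2 records (0.2245% NA)

Test:

  • PassengerId, Pclass, Name, Sex, SibSp, Parch, Ticket, and Embarked have no missing values
  • Age is missing for 86 records (20.5742% NA)
  • Fare is missing for 1 record (0.2392% NA)
  • Cabin is missing for 327 records (78.2297% NA)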
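Each NA percentage is the missing count divided by the number of rows in that split (891 for train, 418 for test). The sketch below recomputes the reported figures and shows one way to count empty cells with the standard library; `na_percent`, `count_missing`, and the three-row sample are illustrative assumptions. Note that the 78.2297% reported for the test-set Cabin column corresponds to 327 of 418 rows.

```python
import csv
import io


def na_percent(missing, total):
    """Share of missing values as a percentage, rounded to 4 decimals."""
    return round(100 * missing / total, 4)


def count_missing(csv_text):
    """Count empty cells per column in CSV text (empty string = missing)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value == "":
                counts[name] += 1
    return counts


# Counts and row totals taken from the card.
print(na_percent(177, 891))  # train Age
print(na_percent(687, 891))  # train Cabin
print(na_percent(2, 891))    # train Embarked
print(na_percent(86, 418))   # test Age
print(na_percent(1, 418))    # test Fare
print(na_percent(327, 418))  # test Cabin

# Hypothetical three-row sample, used only to exercise count_missing.
sample = "Age,Cabin\n22,\n,C85\n35,\n"
print(count_missing(sample))
```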

Summary statistics

The summary-statistics charts for the train and test sets are provided as images in the dataset card.
