mariakmurphy55/titanicdata
Source: Hugging Face. Updated 2023-11-11. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/mariakmurphy55/titanicdata
Resource description:
---
language:
- en
pretty_name: titanic data
size_categories:
- 1K<n<10K
---
# Dataset Card for Titanic Data
Training and test data for predicting Titanic passenger survival.
## Dataset Details
### Dataset Description
Train:
- Dimensions --> 891x12
- Column names --> "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", and "Embarked"
Test:
- Dimensions --> 418x11
- Column names --> "PassengerId", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", and "Embarked"
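As a quick sanity check on the dimensions and column names listed above, a minimal pandas sketch follows. The helper name `check_split` and the two-row in-memory sample (with made-up passengers) are illustrative assumptions, not part of the dataset; in practice the frame would come from the train or test CSV.

```python
import io

import pandas as pd

# Column names as listed in this card.
TRAIN_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]
# The test split has the same columns minus the "Survived" label.
TEST_COLUMNS = [c for c in TRAIN_COLUMNS if c != "Survived"]


def check_split(df, expected_columns, expected_rows=None):
    """Return a list of human-readable problems; an empty list means a match."""
    problems = []
    if list(df.columns) != list(expected_columns):
        problems.append(f"columns differ: {list(df.columns)}")
    if expected_rows is not None and len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    return problems


# Hypothetical two-row sample standing in for the real train CSV.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Doe, Mr. John",male,22,1,0,T-1,7.25,,S\n'
    '2,1,1,"Doe, Mrs. Jane",female,38,1,0,T-2,71.28,C85,C\n'
)
sample = pd.read_csv(sample_csv)
print(check_split(sample, TRAIN_COLUMNS, expected_rows=2))
```

For the real files, the same helper could be called as `check_split(pd.read_csv("train.csv"), TRAIN_COLUMNS, 891)`; the file name `train.csv` is an assumption about how the data is stored locally.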
### Dataset Sources
Kaggle Titanic dataset
https://www.kaggle.com/competitions/titanic
## Uses
Raw datasets used in an introduction to DVC and Amazon S3 buckets.
## Dataset Structure
### Column definitions
- "PassengerId" --> unique key for each passenger (int64)
- "Survived" --> binary survival indicator: 0 = did not survive, 1 = survived; train only (int64)
- "Pclass" --> ticket class: 1 = first, 2 = second, 3 = third (int64)
- "Name" --> passenger name; maiden name in parentheses for married women (object)
- "Sex" --> male or female (object)
- "Age" --> passenger age in years (float64)
- "SibSp" --> number of siblings or spouses aboard (int64)
- "Parch" --> number of parents or children aboard (int64)
- "Ticket" --> ticket identifier (object)
- "Fare" --> passenger fare (float64)
- "Cabin" --> cabin identifier (object)
- "Embarked" --> port of embarkation: C = Cherbourg, Q = Queenstown, S = Southampton (object)
Categorical columns: "Name", "Sex", "Ticket", "Cabin", "Embarked"
Numeric columns: "PassengerId", "Survived" (train only), "Pclass", "SibSp", "Parch", "Age", "Fare"; of these, only "Age" and "Fare" are continuous
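The split between the text-valued (object) columns and the int64/float64 columns can be reproduced from the dtypes alone. The sketch below uses a hypothetical three-row frame with a subset of the card's columns; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with a few of the card's columns and matching dtypes.
df = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Pclass": [3, 1, 2],
        "Sex": ["male", "female", "female"],
        "Age": [22.0, None, 30.0],  # missing value forces float64
        "Fare": [7.25, 71.28, 13.0],
        "Embarked": ["S", "C", None],
    }
)

# Numeric dtypes (int64, float64) versus everything else (text columns).
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

A dtype-based split is only a starting point: a column such as "Pclass" is stored as an integer but encodes an ordered category, so it may be worth treating it as categorical in a model.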
### Quick Facts
Train:
- PassengerId, Survived, Pclass, Name, Sex, SibSp, Parch, Ticket, and Fare have no NA values
- Age not documented for 177 passengers (19.8653% NA)
- Cabin not documented for 687 passengers (77.1044% NA)
- Embarked not documented for 2 passengers (0.2245% NA)
Test:
- PassengerId, Pclass, Name, Sex, SibSp, Parch, Ticket, and Embarked have no NA values
- Age not documented for 86 passengers (20.5742% NA)
- Fare not documented for 1 passenger (0.2392% NA)
- Cabin not documented for 327 passengers (78.2297% NA)
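The NA percentages above are the missing count divided by the number of rows in the split, times 100, rounded to four decimal places. A small sketch of that arithmetic, with `na_percent` as a hypothetical helper name:

```python
def na_percent(missing, total, digits=4):
    """Percentage of missing values, rounded to `digits` decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * missing / total, digits)


TRAIN_ROWS = 891  # rows in the train split, per this card
TEST_ROWS = 418   # rows in the test split, per this card

print(na_percent(177, TRAIN_ROWS))  # train Age      -> 19.8653
print(na_percent(687, TRAIN_ROWS))  # train Cabin    -> 77.1044
print(na_percent(2, TRAIN_ROWS))    # train Embarked -> 0.2245
print(na_percent(86, TEST_ROWS))    # test Age       -> 20.5742
print(na_percent(1, TEST_ROWS))     # test Fare      -> 0.2392
```

With a loaded pandas frame `df`, the equivalent per-column figures would come from `df.isna().sum()` and `df.isna().mean() * 100`.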
### Summary Statistics
Train: (summary statistics figure not included)
Test: (summary statistics figure not included)
## Dataset Card Author
Maria Murphy
Provider:
mariakmurphy55