five

titanic

Alibaba Cloud Tianchi | Updated 2026-05-15 | Indexed 2024-12-07
Download link:
https://tianchi.aliyun.com/dataset/192460
Resource description:
Titanic dataset. The data has been split into two groups:

- training set (train.csv)
- test set (test.csv)

The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the "ground truth") for each passenger. Your model will be based on "features" like passengers' gender and class. You can also use feature engineering to create new features.

The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.

### Data Dictionary

| Variable | Definition | Key |
| ---- | ---- | ---- |
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

### Variable Notes

1. pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower.
2. age: Age is fractional if less than 1. If the age is estimated, it is in the form xx.5.
3. sibsp: The dataset defines family relations in this way:
   - Sibling = brother, sister, stepbrother, stepsister
   - Spouse = husband, wife (mistresses and fiancés were ignored)
4. parch: The dataset defines family relations in this way:
   - Parent = mother, father
   - Child = daughter, son, stepdaughter, stepson

   Some children travelled only with a nanny, therefore parch=0 for them.
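The gender-based baseline described above (predict survival for all and only female passengers) can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce gender_submission.csv: the inline sample rows are made up, and the column names `PassengerId`, `Sex`, and `Survived` are assumptions that should be checked against the headers of the downloaded files.

```python
import csv
import io

# Tiny made-up sample standing in for test.csv. The column names
# "PassengerId" and "Sex" are assumptions, not taken from the listing.
SAMPLE_TEST_CSV = """PassengerId,Sex
892,male
893,female
894,male
"""


def gender_baseline(rows, id_col="PassengerId", sex_col="Sex"):
    """Predict 1 (survived) for female passengers and 0 for everyone else."""
    return [
        {
            id_col: row[id_col],
            "Survived": 1 if row[sex_col].strip().lower() == "female" else 0,
        }
        for row in rows
    ]


def to_csv(predictions, id_col="PassengerId"):
    """Serialize predictions as CSV text: an id column plus a Survived column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[id_col, "Survived"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


rows = list(csv.DictReader(io.StringIO(SAMPLE_TEST_CSV)))
predictions = gender_baseline(rows)
print(to_csv(predictions))
```

To run this on the real data, replace the in-memory sample with `open("test.csv", newline="")` and write the result of `to_csv` to a file. A rule like this is a useful floor: a trained model should at least match it.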
Provider:
Alibaba Cloud Tianchi
Created:
2024-12-05
Collected summary
Dataset introduction
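Background and challenges
Background overview
This is the classic machine learning dataset for predicting the survival of Titanic passengers. It contains a training set and a test set for building and evaluating classification models. The dataset provides features such as each passenger's sex, age, and ticket class, along with ground-truth survival labels, and the goal is to predict whether a passenger survived the sinking.
The above content was collected and summarized by 遇见数据集.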
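As an illustration of the feature engineering mentioned in the dataset description, the sketch below derives a few extra features from the documented variables. The derived names `family_size`, `is_alone`, and `age_estimated` are hypothetical, the lowercase keys `sibsp`, `parch`, and `age` follow the data dictionary rather than the actual file headers, and the handling of a missing age is a defensive assumption.

```python
def add_family_features(passenger, sibsp_key="sibsp", parch_key="parch", age_key="age"):
    """Return a copy of a passenger record with a few derived features.

    All derived feature names are illustrative, not part of the dataset.
    """
    enriched = dict(passenger)

    # sibsp counts siblings/spouses aboard; parch counts parents/children aboard.
    relatives = int(passenger[sibsp_key]) + int(passenger[parch_key])
    enriched["family_size"] = relatives + 1  # relatives plus the passenger
    enriched["is_alone"] = relatives == 0

    age = passenger.get(age_key)
    if age in (None, ""):
        # Assumption: treat an absent age as unknown rather than guessing.
        enriched["age_estimated"] = None
    else:
        age = float(age)
        # Per the variable notes, estimated ages have the form xx.5,
        # while ages below 1 are fractional for a different reason.
        enriched["age_estimated"] = age >= 1 and age % 1 == 0.5
    return enriched


adult = add_family_features({"sibsp": "1", "parch": "2", "age": "28.5"})
infant = add_family_features({"sibsp": "0", "parch": "0", "age": "0.42"})
print(adult)
print(infant)
```

Note that `is_alone` inherits the caveat from the variable notes: a child travelling only with a nanny has parch=0, so such a passenger can be flagged as alone even though an accompanying adult was aboard.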