jab13/mlb-statcast-dataset
Source: Hugging Face | Updated: 2025-11-29 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/jab13/mlb-statcast-dataset
Resource description:
---
dataset_info:
  features:
  - name: pitch_type
    dtype: string
  - name: game_date
    dtype: timestamp[ms]
  - name: release_speed
    dtype: float64
  - name: release_pos_x
    dtype: float64
  - name: release_pos_z
    dtype: float64
  - name: player_name
    dtype: string
  - name: batter
    dtype: int64
  - name: pitcher
    dtype: int64
  - name: events
    dtype: string
  - name: description
    dtype: string
  - name: spin_dir
    dtype: float64
  - name: spin_rate_deprecated
    dtype: float64
  - name: break_angle_deprecated
    dtype: float64
  - name: break_length_deprecated
    dtype: float64
  - name: zone
    dtype: int64
  - name: des
    dtype: string
  - name: game_type
    dtype: string
  - name: stand
    dtype: string
  - name: p_throws
    dtype: string
  - name: home_team
    dtype: string
  - name: away_team
    dtype: string
  - name: type
    dtype: string
  - name: hit_location
    dtype: float64
  - name: bb_type
    dtype: string
  - name: balls
    dtype: int64
  - name: strikes
    dtype: int64
  - name: game_year
    dtype: int64
  - name: pfx_x
    dtype: float64
  - name: pfx_z
    dtype: float64
  - name: plate_x
    dtype: float64
  - name: plate_z
    dtype: float64
  - name: on_3b
    dtype: float64
  - name: on_2b
    dtype: float64
  - name: on_1b
    dtype: float64
  - name: outs_when_up
    dtype: int64
  - name: inning
    dtype: int64
  - name: inning_topbot
    dtype: string
  - name: hc_x
    dtype: float64
  - name: hc_y
    dtype: float64
  - name: tfs_deprecated
    dtype: float64
  - name: tfs_zulu_deprecated
    dtype: float64
  - name: umpire
    dtype: float64
  - name: sv_id
    dtype: float64
  - name: vx0
    dtype: float64
  - name: vy0
    dtype: float64
  - name: vz0
    dtype: float64
  - name: ax
    dtype: float64
  - name: ay
    dtype: float64
  - name: az
    dtype: float64
  - name: sz_top
    dtype: float64
  - name: sz_bot
    dtype: float64
  - name: hit_distance_sc
    dtype: float64
  - name: launch_speed
    dtype: float64
  - name: launch_angle
    dtype: float64
  - name: effective_speed
    dtype: float64
  - name: release_spin_rate
    dtype: float64
  - name: release_extension
    dtype: float64
  - name: game_pk
    dtype: int64
  - name: fielder_2
    dtype: int64
  - name: fielder_3
    dtype: int64
  - name: fielder_4
    dtype: int64
  - name: fielder_5
    dtype: int64
  - name: fielder_6
    dtype: int64
  - name: fielder_7
    dtype: int64
  - name: fielder_8
    dtype: int64
  - name: fielder_9
    dtype: int64
  - name: release_pos_y
    dtype: float64
  - name: estimated_ba_using_speedangle
    dtype: float64
  - name: estimated_woba_using_speedangle
    dtype: float64
  - name: woba_value
    dtype: float64
  - name: woba_denom
    dtype: float64
  - name: babip_value
    dtype: float64
  - name: iso_value
    dtype: float64
  - name: launch_speed_angle
    dtype: float64
  - name: at_bat_number
    dtype: int64
  - name: pitch_number
    dtype: int64
  - name: pitch_name
    dtype: string
  - name: home_score
    dtype: int64
  - name: away_score
    dtype: int64
  - name: bat_score
    dtype: int64
  - name: fld_score
    dtype: int64
  - name: post_away_score
    dtype: int64
  - name: post_home_score
    dtype: int64
  - name: post_bat_score
    dtype: int64
  - name: post_fld_score
    dtype: int64
  - name: if_fielding_alignment
    dtype: string
  - name: of_fielding_alignment
    dtype: string
  - name: spin_axis
    dtype: float64
  - name: delta_home_win_exp
    dtype: float64
  - name: delta_run_exp
    dtype: float64
  - name: bat_speed
    dtype: float64
  - name: swing_length
    dtype: float64
  - name: estimated_slg_using_speedangle
    dtype: float64
  - name: delta_pitcher_run_exp
    dtype: float64
  - name: hyper_speed
    dtype: float64
  - name: home_score_diff
    dtype: int64
  - name: bat_score_diff
    dtype: int64
  - name: home_win_exp
    dtype: float64
  - name: bat_win_exp
    dtype: float64
  - name: age_pit_legacy
    dtype: int64
  - name: age_bat_legacy
    dtype: int64
  - name: age_pit
    dtype: int64
  - name: age_bat
    dtype: int64
  - name: n_thruorder_pitcher
    dtype: int64
  - name: n_priorpa_thisgame_player_at_bat
    dtype: int64
  - name: pitcher_days_since_prev_game
    dtype: float64
  - name: batter_days_since_prev_game
    dtype: float64
  - name: pitcher_days_until_next_game
    dtype: float64
  - name: batter_days_until_next_game
    dtype: float64
  - name: api_break_z_with_gravity
    dtype: float64
  - name: api_break_x_arm
    dtype: float64
  - name: api_break_x_batter_in
    dtype: float64
  - name: arm_angle
    dtype: float64
  - name: attack_angle
    dtype: float64
  - name: attack_direction
    dtype: float64
  - name: swing_path_tilt
    dtype: float64
  - name: intercept_ball_minus_batter_pos_x_inches
    dtype: float64
  - name: intercept_ball_minus_batter_pos_y_inches
    dtype: float64
  - name: VRA
    dtype: float64
  - name: HRA
    dtype: float64
  - name: PitchesThrown
    dtype: int32
  - name: IsStrike
    dtype: int32
  - name: IsGB
    dtype: float64
  - name: BatterName
    dtype: string
  splits:
  - name: train
    num_bytes: 3043941146
    num_examples: 2966881
  download_size: 441054634
  dataset_size: 3043941146
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
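The metadata above describes a single `train` split of roughly three million rows, so reading it lazily is often more practical than downloading everything first. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the name shown in this listing. The column subset `PITCH_COLUMNS` and the helper names `project` and `stream_rows` are hypothetical choices for illustration, not part of the dataset.

```python
from itertools import islice

REPO_ID = "jab13/mlb-statcast-dataset"  # dataset name as given in this listing

# Hypothetical column subset picked for illustration; every name is in the schema.
PITCH_COLUMNS = [
    "game_date",
    "pitch_type",
    "release_speed",
    "release_spin_rate",
    "description",
]


def project(row, columns=PITCH_COLUMNS):
    """Keep only the requested columns; keys absent from the row become None."""
    return {name: row.get(name) for name in columns}


def stream_rows(limit=5, columns=PITCH_COLUMNS):
    """Yield up to `limit` projected rows without downloading the full split."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split="train", streaming=True)
    for row in islice(stream, limit):
        yield project(row, columns)
```

Calling `list(stream_rows(3))` would fetch three rows over the network; `project` can be tried offline on any dictionary.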
Dataset information:
Features:
- Field: pitch type (pitch_type); data type: string
- Field: game date (game_date); data type: timestamp[ms] (millisecond precision)
- Field: release speed (release_speed); data type: float64
- Field: release position, X coordinate (release_pos_x); data type: float64
- Field: release position, Z coordinate (release_pos_z); data type: float64
- Field: player name (player_name); data type: string
- Field: batter ID (batter); data type: int64
- Field: pitcher ID (pitcher); data type: int64
- Field: plate appearance event (events); data type: string
- Field: pitch outcome description (description); data type: string
- Field: spin direction (spin_dir); data type: float64
- Field: spin rate, deprecated (spin_rate_deprecated); data type: float64
- Field: break angle, deprecated (break_angle_deprecated); data type: float64
- Field: break length, deprecated (break_length_deprecated); data type: float64
- Field: strike zone location (zone); data type: int64
- Field: play description (des); data type: string
- Field: game type (game_type); data type: string
- Field: batting side (stand); data type: string
- Field: pitcher throwing hand (p_throws); data type: string
- Field: home team (home_team); data type: string
- Field: away team (away_team); data type: string
- Field: pitch result code (type); data type: string
- Field: hit location (hit_location); data type: float64
- Field: batted ball type (bb_type); data type: string
- Field: balls in the count (balls); data type: int64
- Field: strikes in the count (strikes); data type: int64
- Field: game year (game_year); data type: int64
- Field: horizontal movement (pfx_x); data type: float64
- Field: vertical movement (pfx_z); data type: float64
- Field: horizontal position at the plate (plate_x); data type: float64
- Field: vertical position at the plate (plate_z); data type: float64
- Field: runner on third base ID (on_3b); data type: float64
- Field: runner on second base ID (on_2b); data type: float64
- Field: runner on first base ID (on_1b); data type: float64
- Field: outs when the batter comes up (outs_when_up); data type: int64
- Field: inning (inning); data type: int64
- Field: half-inning indicator (inning_topbot); data type: string
- Field: hit coordinate X (hc_x); data type: float64
- Field: hit coordinate Y (hc_y); data type: float64
- Field: timestamp, deprecated (tfs_deprecated); data type: float64
- Field: Zulu timestamp, deprecated (tfs_zulu_deprecated); data type: float64
- Field: umpire ID (umpire); data type: float64
- Field: play event ID (sv_id); data type: float64
- Field: initial velocity, X direction (vx0); data type: float64
- Field: initial velocity, Y direction (vy0); data type: float64
- Field: initial velocity, Z direction (vz0); data type: float64
- Field: acceleration, X direction (ax); data type: float64
- Field: acceleration, Y direction (ay); data type: float64
- Field: acceleration, Z direction (az); data type: float64
- Field: top of the strike zone (sz_top); data type: float64
- Field: bottom of the strike zone (sz_bot); data type: float64
- Field: batted ball distance (hit_distance_sc); data type: float64
- Field: exit velocity (launch_speed); data type: float64
- Field: launch angle (launch_angle); data type: float64
- Field: effective speed (effective_speed); data type: float64
- Field: spin rate at release (release_spin_rate); data type: float64
- Field: release extension (release_extension); data type: float64
- Field: unique game ID (game_pk); data type: int64
- Field: catcher ID (fielder_2); data type: int64
- Field: first baseman ID (fielder_3); data type: int64
- Field: second baseman ID (fielder_4); data type: int64
- Field: third baseman ID (fielder_5); data type: int64
- Field: shortstop ID (fielder_6); data type: int64
- Field: left fielder ID (fielder_7); data type: int64
- Field: center fielder ID (fielder_8); data type: int64
- Field: right fielder ID (fielder_9); data type: int64
- Field: release position, Y coordinate (release_pos_y); data type: float64
- Field: estimated batting average from exit velocity and launch angle (estimated_ba_using_speedangle); data type: float64
- Field: estimated wOBA from exit velocity and launch angle (estimated_woba_using_speedangle); data type: float64
- Field: wOBA value (woba_value); data type: float64
- Field: wOBA denominator (woba_denom); data type: float64
- Field: BABIP value (babip_value); data type: float64
- Field: ISO value (iso_value); data type: float64
- Field: launch speed and angle zone (launch_speed_angle); data type: float64
- Field: plate appearance number (at_bat_number); data type: int64
- Field: pitch number within the plate appearance (pitch_number); data type: int64
- Field: pitch name (pitch_name); data type: string
- Field: home score (home_score); data type: int64
- Field: away score (away_score); data type: int64
- Field: batting team score (bat_score); data type: int64
- Field: fielding team score (fld_score); data type: int64
- Field: away score after the pitch (post_away_score); data type: int64
- Field: home score after the pitch (post_home_score); data type: int64
- Field: batting team score after the pitch (post_bat_score); data type: int64
- Field: fielding team score after the pitch (post_fld_score); data type: int64
- Field: infield fielding alignment (if_fielding_alignment); data type: string
- Field: outfield fielding alignment (of_fielding_alignment); data type: string
- Field: spin axis (spin_axis); data type: float64
- Field: change in home win expectancy (delta_home_win_exp); data type: float64
- Field: change in run expectancy (delta_run_exp); data type: float64
- Field: bat speed (bat_speed); data type: float64
- Field: swing length (swing_length); data type: float64
- Field: estimated slugging percentage from exit velocity and launch angle (estimated_slg_using_speedangle); data type: float64
- Field: pitcher's change in run expectancy (delta_pitcher_run_exp); data type: float64
- Field: hyper speed (hyper_speed); data type: float64
- Field: home score differential (home_score_diff); data type: int64
- Field: batting team score differential (bat_score_diff); data type: int64
- Field: home win expectancy (home_win_exp); data type: float64
- Field: batting team win expectancy (bat_win_exp); data type: float64
- Field: pitcher age, legacy definition (age_pit_legacy); data type: int64
- Field: batter age, legacy definition (age_bat_legacy); data type: int64
- Field: pitcher age (age_pit); data type: int64
- Field: batter age (age_bat); data type: int64
- Field: pitcher's times through the order (n_thruorder_pitcher); data type: int64
- Field: batter's prior plate appearances in this game (n_priorpa_thisgame_player_at_bat); data type: int64
- Field: pitcher's days since previous game (pitcher_days_since_prev_game); data type: float64
- Field: batter's days since previous game (batter_days_since_prev_game); data type: float64
- Field: pitcher's days until next game (pitcher_days_until_next_game); data type: float64
- Field: batter's days until next game (batter_days_until_next_game); data type: float64
- Field: vertical break including gravity (api_break_z_with_gravity); data type: float64
- Field: horizontal break toward the pitcher's arm side (api_break_x_arm); data type: float64
- Field: horizontal break toward the batter (api_break_x_batter_in); data type: float64
- Field: pitcher arm angle (arm_angle); data type: float64
- Field: swing attack angle (attack_angle); data type: float64
- Field: swing attack direction (attack_direction); data type: float64
- Field: swing path tilt (swing_path_tilt); data type: float64
- Field: ball position minus batter position at intercept, X direction, in inches (intercept_ball_minus_batter_pos_x_inches); data type: float64
- Field: ball position minus batter position at intercept, Y direction, in inches (intercept_ball_minus_batter_pos_y_inches); data type: float64
- Field: vertical rotation angle, as glossed by this listing (VRA); data type: float64
- Field: horizontal rotation angle, as glossed by this listing (HRA); data type: float64
- Field: pitches thrown (PitchesThrown); data type: int32
- Field: whether the pitch is a strike (IsStrike); data type: int32
- Field: whether the batted ball is a ground ball (IsGB); data type: float64
- Field: batter name (BatterName); data type: string
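The schema types `on_1b`, `on_2b`, and `on_3b` as float64 rather than int64, which in exported tabular data often signals that the columns contain missing values. The sketch below assumes an empty base is encoded as a null (`None` or NaN) and an occupied base as any other value; the card does not state this, so check a few rows before relying on it. The helper names `is_missing` and `base_state` are hypothetical.

```python
import math


def is_missing(value):
    """True for None or a float NaN, two common encodings of a null float64."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def base_state(row):
    """Return a 3-character string such as '1-3' marking the occupied bases."""
    marks = []
    for base, column in (("1", "on_1b"), ("2", "on_2b"), ("3", "on_3b")):
        # Assumption: a missing value means nobody is on that base.
        marks.append("-" if is_missing(row.get(column)) else base)
    return "".join(marks)
```

For example, a row with a value only in `on_1b` maps to `"1--"`, and a row with all three columns missing maps to `"---"`.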
Splits:
- Split: train; size in bytes: 3043941146; number of examples: 2966881
Download size: 441054634 bytes; total dataset size: 3043941146 bytes
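The size figures above can be turned into more readable quantities with a little arithmetic. This sketch uses only the three numbers stated in the card; the variable names are illustrative. It works out to roughly 1 KB per example and a ratio of about 6.9 between the loaded size and the download size.

```python
NUM_BYTES = 3_043_941_146    # dataset_size (same as num_bytes of the train split)
NUM_EXAMPLES = 2_966_881     # num_examples of the train split
DOWNLOAD_SIZE = 441_054_634  # download_size


def to_gib(n_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


bytes_per_example = NUM_BYTES / NUM_EXAMPLES
size_ratio = NUM_BYTES / DOWNLOAD_SIZE

print(f"average row size: {bytes_per_example:.0f} bytes")
print(f"dataset size:     {to_gib(NUM_BYTES):.2f} GiB")
print(f"download size:    {to_gib(DOWNLOAD_SIZE):.2f} GiB")
print(f"size ratio:       {size_ratio:.1f}x")
```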
Configs:
- Config name: default; data files:
  - Split: train; path: data/train-*
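The `data/train-*` path is a wildcard pattern that selects the files belonging to the train split. The sketch below illustrates the matching with Python's `fnmatchcase` as a stand-in; the actual resolution inside the Hugging Face tooling may differ in details, and the file names in `candidates` are invented for illustration, not taken from the repository.

```python
from fnmatch import fnmatchcase

PATTERN = "data/train-*"  # path pattern from the default config

# Hypothetical file names, made up for this example only.
candidates = [
    "data/train-00000-of-00007.parquet",
    "data/train-00001-of-00007.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the names that match the train pattern.
train_files = [name for name in candidates if fnmatchcase(name, PATTERN)]
print(train_files)
```

Note that in `fnmatchcase` the `*` wildcard also matches `/`, which is looser than a shell glob.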
Provider:
jab13



