Download link:

https://modelscope.cn/datasets/nv-community/PhysicalAI-Autonomous-Vehicles

Resource description:

# PhysicalAI Autonomous Vehicles

## Dataset Description

The PhysicalAI-Autonomous-Vehicles dataset provides one of the largest, most geographically diverse collections of multi-sensor data empowering AV researchers to build the next generation of Physical AI based end-to-end driving systems.

![mosaic_4x4](https://cdn-uploads.huggingface.co/production/uploads/667f467e563b0640e37fca79/t_yqQTwuTRFiiK8--mzm3.gif)

This dataset has a total of 1727 hours of driving recorded from planned data-collection drives in 25 countries and 2500+ cities. The data captures diverse traffic, weather conditions, obstacles, and pedestrians in the environment. It consists of 310,895 clips that are each 20 seconds long. The sensor data includes multi-camera and LiDAR coverage for all clips, and radar coverage for 163,850 clips.

### Geographic Coverage

Approximately 50% of the data comes from across the US and the remaining 50% comes from 24 EU countries.

![USA](https://cdn-uploads.huggingface.co/production/uploads/667f467e563b0640e37fca79/7QZ4mgPg4hCb3JgGG4mTN.png)
![EU](https://cdn-uploads.huggingface.co/production/uploads/667f467e563b0640e37fca79/OW-l98gxSFGyxdrPZxpm9.png)

| country       |  count |
|:--------------|-------:|
| United States | 155360 |
| Germany       |  45673 |
| France        |  10911 |
| Italy         |   9082 |
| Sweden        |   7451 |
| Spain         |   6815 |
| Portugal      |   6220 |
| Greece        |   6123 |
| Austria       |   5586 |
| Finland       |   5237 |
| Netherlands   |   5048 |
| Croatia       |   4990 |
| Denmark       |   4688 |
| Slovenia      |   4341 |
| Estonia       |   4186 |
| Slovakia      |   4165 |
| Belgium       |   3951 |
| Czechia       |   3706 |
| Lithuania     |   3426 |
| Poland        |   3340 |
| Romania       |   2807 |
| Luxembourg    |   2652 |
| Latvia        |   2187 |
| Hungary       |   1996 |
| Bulgaria      |    954 |

### Environmental and Traffic Diversity

- Traffic density patterns: no traffic, light traffic, medium traffic, and heavy traffic
- Road types: highways, urban, residential, and rural roads
- Weather: clear, rain, snow, fog
- Surface conditions: dry, wet, snow/ice
- Time-of-day: daytime, nighttime
- Infrastructure elements such as tunnels, bridges, roundabouts, railway crossings, toll booths, inclines, and more

## Dataset Owner(s)

NVIDIA Corporation

## Dataset Creation Date

10/28/2025

## License/Terms of Use

[NVIDIA Autonomous Vehicle Dataset License Agreement](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles/blob/main/LICENSE.pdf)

## Intended Usage

This dataset can be used for **autonomous vehicle related use cases only**, which can be either **commercial or non-commercial** as long as the mentioned license terms are abided by. The size and diversity of this multi-sensor dataset make it well suited to research on end-to-end driving, neural reconstruction, synthetic data generation, scenario mining, and many other autonomous vehicle applications.

## Dataset Characterization

- Data Collection Method
  - Automatic/Sensor
- Labeling Method
  - Automatic/Sensor

## Data Format

We store the data separately for each sensor (camera, LiDAR and radar). Besides these sensors we also provide ego motion, calibration data, autogenerated (non-GT) machine labels, and other metadata.

Because of the significant size of this dataset, we provide all features (sensor data and autolabels) in chunks of up to 100 clips each. The exception to this chunking is clip-level metadata, which we intend for researchers to use to identify which subset of chunks they are interested in downloading according to their target application. Significant storage space and bandwidth savings may be achieved by downloading only chunks corresponding to a subset of sensors, country of collection, dataset split, etc. A Python developer kit to support such workflows and additional data format documentation will be made available at https://github.com/NVlabs/physical_ai_av (**COMING SOON**).

### Camera Data

This sensor captures visual RGB data (i.e., videos) from multiple viewpoints around the vehicle.
In our dataset the following seven cameras are included:

- Cross left 120 fov
- Cross right 120 fov
- Front wide 120 fov
- Front tele 30 fov
- Rear left 70 fov
- Rear right 70 fov
- Rear tele 30 fov

Directory structure

```
camera/
├─ camera_front_wide_120fov/
│  ├─ camera_front_wide_120fov.chunk_0000.zip
│  └─ ...
└─ camera_cross_left_120fov/
   └─ ...
```

Each `chunk_xxxx.zip` contains approximately 100 1080p mp4 files recorded at 30fps. Each mp4 will be named `<clip_uuid>.camera_<field_of_view>.mp4`. Users can use this [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) to map across different corresponding views and sensors (provided there is coverage) under the designated sensor directories. The chunks also contain frame-timestamp parquet files corresponding to the camera mp4 files, with the UUID in the name.

### LiDAR Data

This directory contains 3D point cloud data recorded using a top 360 degree rotating LiDAR.

```
lidar/
└─ lidar_top_360fov/
   ├─ lidar_top_360fov_clip_0000.zip
   ├─ ...
   └─ lidar_top_360fov_clip_XXXX.zip
```

Inside `lidar_top_360fov_clip_0000.zip`, there are approximately 100 lidar parquet files. Each parquet will be named `<clip_uuid>.lidar_top360_fov.parquet` and contains approximately 200 lidar spins (i.e. 10Hz capture rate for a 20sec clip).

**Parquet Schema**

```
{
    'spin_index': int64,                 # Spin number (0, 1, 2, ...199)
    'reference_timestamp': int64,        # Reference timestamp (microseconds)
    'draco_encoded_pointcloud': binary,  # Draco-encoded point cloud
}
```

The point cloud can be decoded, e.g., by using the [DracoPy](https://pypi.org/project/DracoPy/) library.

### Radar Data

This folder contains 3D radar point cloud data recorded using (up to) 10 radars located in the front bumper center, front left corner, front right corner, left side, right side, rear left corner, rear right corner, rear left, and rear right.

```
radar/
├─ radar_corner_front_left_srr_0/
│  ├─ radar_corner_front_left_srr_0.chunk_0000.zip
│  ├─ ...
│  └─ radar_corner_front_left_srr_0.chunk_xxxx.zip
├─ radar_corner_front_right_srr_0/
└─ ...
```

Inside `chunk_XXXX.zip`, there are approximately 100 radar parquet files. Each parquet will be named `<clip_uuid>.radar_<field_of_view>_<configuration>.parquet`. The letters `srr` stand for short range radar, `mrr` for medium range radar, and `lrr` for long range radar. Unlike other sensors, for a clip with radar data coverage, the radar sensors for each field of view can have varying model types, depending on the clip. The zip file names therefore end in a numerical suffix, as in `srr_0` or `srr_3`, that denotes the radar model.

**Parquet Schema**

```
{
    # Index
    'scan_index': int64,            # Sequential scan number

    # Timestamps
    'timestamp': int64,             # System timestamp in microseconds
    'sensor_timestamp': int64,      # Sensor timestamp in microseconds

    # Scan Information
    'num_returns': int64,           # Number of detections in scan
    'doppler_ambiguity': float32,   # Doppler ambiguity value
    'max_returns': float64,         # Maximum # of returns (NaN if inapplicable)
    'detection_index': int64,       # Detection index within scan
    'radar_model': uint8,           # Radar model identifier

    # Detection Spatial Data
    'azimuth': float32,             # Horizontal angle in radians
    'elevation': float32,           # Vertical angle in radians
    'distance': float32,            # Distance to target in meters

    # Detection Kinematics
    'radial_velocity': float32,     # Radial velocity in m/s

    # Detection Quality
    'rcs': float32,                 # Radar cross-section in dBsm
    'snr': float32,                 # Signal-to-noise ratio in dB
    'exist_probb': uint8,           # Existence probability
}
```

### Calibration Data

**Camera intrinsics:** Parquet files which contain parameters including [f-theta camera model](https://cdck-file-uploads-global.s3.dualstack.us-west-2.amazonaws.com/nvidia/original/3X/5/0/5043fdcfd10bd984224ac6b4d0d9b6563c685f01.pdf) polynomial coefficients.

```
{
    # Index (multi-level)
    'clip_id': str,         # Unique clip identifier UUID
    'camera_name': str,     # Camera sensor name (7 types)

    # Image Dimensions
    'width': int64,         # Image width in pixels
    'height': int64,        # Image height in pixels

    # Principal Point (optical center)
    'cx': float64,          # Principal point X coordinate (pixels)
    'cy': float64,          # Principal point Y coordinate (pixels)

    # Backward (Undistortion) f-theta Polynomial Coefficients
    'bw_poly_0': float64,   # Distortion polynomial coefficient 0
    'bw_poly_1': float64,   # Distortion polynomial coefficient 1
    'bw_poly_2': float64,   # Distortion polynomial coefficient 2
    'bw_poly_3': float64,   # Distortion polynomial coefficient 3
    'bw_poly_4': float64,   # Distortion polynomial coefficient 4

    # Forward (Distortion) f-theta Polynomial Coefficients
    'fw_poly_0': float64,   # Distortion polynomial coefficient 0
    'fw_poly_1': float64,   # Distortion polynomial coefficient 1 (focal length)
    'fw_poly_2': float64,   # Distortion polynomial coefficient 2
    'fw_poly_3': float64,   # Distortion polynomial coefficient 3
    'fw_poly_4': float64,   # Distortion polynomial coefficient 4
}
```

**Sensor extrinsics:** sensor pose, i.e., quaternion rotation and x,y,z position, for 7 cameras, 1 LiDAR, and (up to) 10 radars.

```
{
    'qx': float64   # Quaternion components
    'qy': float64
    'qz': float64
    'qw': float64
    'x' : float64   # x,y,z positions for rig coordinate frame
    'y' : float64
    'z' : float64
}
# Rig coordinate origin: Center of the rear axle, projected onto the ground plane.
# X-axis: Points forward
# Y-axis: Points left (when looking forward)
# Z-axis: Points up
```

**Vehicle dimensions** for respective clips in each chunk.

```
{
    # Index
    'clip_id': str,                        # Unique clip identifier UUID

    # Vehicle Dimensions (all in meters)
    'length': float64,                     # Vehicle length (front to back)
    'width': float64,                      # Vehicle width (left to right)
    'height': float64,                     # Vehicle height (bottom to top)
    'rear_axle_to_bbox_center': float64,   # Distance from rear axle to geometric center
    'wheelbase': float64,                  # Distance between front/rear axles
    'track_width': float64,                # Wheel track width (left to right)
}
```

### Labels

**Ego Motion:** in a local coordinate frame consistent across all timestamps with the origin located at the ego vehicle's position at timestamp 0, oriented such that there is 0 yaw at timestamp 0 but otherwise attitude (pitch and roll) are estimated with respect to gravity.

```
{
    # Timing
    'timestamp': int64,     # Absolute timestamp in microseconds

    # Pose - Orientation (Quaternion)
    'qx': float64,          # Quaternion X component for orientation
    'qy': float64,          # Quaternion Y component for orientation
    'qz': float64,          # Quaternion Z component for orientation
    'qw': float64,          # Quaternion W (scalar) for orientation

    # Pose - Position in World Frame (meters)
    'x': float64,           # X position
    'y': float64,           # Y position
    'z': float64,           # Z position

    # Velocity in World Frame (m/s)
    'vx': float64,          # X velocity
    'vy': float64,          # Y velocity
    'vz': float64,          # Z velocity

    # Acceleration in World Frame (m/s²)
    'ax': float64,          # X acceleration
    'ay': float64,          # Y acceleration
    'az': float64,          # Z acceleration

    # Vehicle Rotation
    'curvature': float64,   # Path curvature (1/meters, inverse radius)
}
```

**Objects and Road Elements:** (COMING SOON)

### Metadata

**Sensor presence parquet:** captures the sensor availability per clip

```
{
    # Index
    'clip_id': str,                           # Unique clip identifier UUID

    # Camera Sensors (all bool - True = present, False = absent)
    'camera_cross_left_120fov': bool,         # Left cross-traffic camera (120° FOV)
    'camera_cross_right_120fov': bool,        # Right cross-traffic camera (120° FOV)
    'camera_front_tele_30fov': bool,          # Front telephoto camera (30° FOV)
    'camera_front_wide_120fov': bool,         # Front wide camera (120° FOV)
    'camera_rear_left_70fov': bool,           # Rear left camera (70° FOV)
    'camera_rear_right_70fov': bool,          # Rear right camera (70° FOV)
    'camera_rear_tele_30fov': bool,           # Rear telephoto camera (30° FOV)

    # LiDAR Sensor (bool)
    'lidar_top_360fov': bool,                 # Top-mounted 360° LiDAR

    # Radar Sensors - Corner (SRR = Short Range Radar)
    'radar_corner_front_left_srr_0': bool,    # Front left corner radar (model type 0)
    'radar_corner_front_left_srr_3': bool,    # Front left corner radar (model type 3)
    'radar_corner_front_right_srr_0': bool,   # Front right corner radar (model type 0)
    'radar_corner_front_right_srr_3': bool,   # Front right corner radar (model type 3)
    'radar_corner_rear_left_srr_0': bool,     # Rear left corner radar (model type 0)
    'radar_corner_rear_left_srr_3': bool,     # Rear left corner radar (model type 3)
    'radar_corner_rear_right_srr_0': bool,    # Rear right corner radar (model type 0)
    'radar_corner_rear_right_srr_3': bool,    # Rear right corner radar (model type 3)

    # Radar Sensors - Front Center (LRR = Long Range Radar, MRR = Medium Range Radar)
    'radar_front_center_imaging_lrr_1': bool, # Front imaging LRR (model type 1)
    'radar_front_center_mrr_2': bool,         # Front MRR (model type 2)
    'radar_front_center_srr_0': bool,         # Front center SRR (model type 0)

    # Radar Sensors - Rear
    'radar_rear_left_mrr_2': bool,            # Rear left medium range (model type 2)
    'radar_rear_left_srr_0': bool,            # Rear left short range (model type 0)
    'radar_rear_right_mrr_2': bool,           # Rear right medium range (model type 2)
    'radar_rear_right_srr_0': bool,           # Rear right short range (model type 0)

    # Radar Sensors - Side
    'radar_side_left_srr_0': bool,            # Left side short range (model type 0)
    'radar_side_left_srr_3': bool,            # Left side short range (model type 3)
    'radar_side_right_srr_0': bool,           # Right side short range (model type 0)
    'radar_side_right_srr_3': bool,           # Right side short range (model type 3)

    # Radar Configuration Level
    'radar_config': str,                      # Radar config ('NA', 'low', 'med', 'high')
}
```

The final `"radar_config"` column summarizes the fact that there are 4 possible instantiations of row values for the radar sensors collectively, specifically

- NA (no radars present)
- low (all `srr_0` radars)
- med (all `srr_3` radars except for side radars, all `mrr_2` and `lrr_1` radars)
- high (all `srr_3` radars, all `mrr_2` and `lrr_1` radars)

**Data collection parquet:** contains fields to filter clips by, e.g. country where clip was recorded, the month of the year and time of day.

```
{
    # Index
    'clip_id': str,          # Unique clip identifier UUID

    # Geographic Information
    'country': str,          # Country where data was collected

    # Temporal Information
    'month': int64,          # Month of collection (1-12)
    'hour_of_day': int64,    # Hour when clip recorded (0-23)

    # Vehicle Platform
    'platform_class': str,   # Vehicle platform type (hyperion_8/8.1)
}
```

## Dataset Quantification

- Record Count: 1727 hours / 310,895 clips of driving data organized into 20s long clips
- Feature Count: 7 cameras, 1 lidar, (up to) 10 radar, ego motion, calibration, machine labels
- Measurement of Total Data Storage: ~100TB

## References

- [Developer kit and additional documentation on GitHub](https://github.com/NVlabs/physical_ai_av)
- For data mining and curation, NVIDIA also provides tools like [Cosmos Dataset Search (CDS)](https://github.com/NVIDIA-Omniverse-blueprints/cosmos-dataset-search) for multimodal semantic search with text and video queries. A subset of this dataset will be explorable through a CDS Preview Experience (coming soon).

## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
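
Returning to the file naming described under Camera Data and Radar Data: sensor files are named `<clip_uuid>.<sensor_name>.<extension>`, and the shared UUID is what links views and sensors of the same clip. The following is a minimal sketch of that matching using only the Python standard library. The helper names `split_clip_filename` and `group_by_clip` and the example file names are hypothetical, and the sketch assumes that neither the UUID nor the sensor name contains a dot. The naming of the frame-timestamp parquet files is not spelled out above, so they are not handled here.

```python
import uuid
from collections import defaultdict


def split_clip_filename(filename):
    """Split '<clip_uuid>.<sensor_name>.<ext>' into its three parts.

    Raises ValueError if the name does not have exactly three
    dot-separated parts or if the first part is not a valid UUID.
    """
    parts = filename.split(".")
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {filename!r}")
    clip_uuid, sensor_name, extension = parts
    uuid.UUID(clip_uuid)  # raises ValueError for a malformed UUID
    return clip_uuid, sensor_name, extension


def group_by_clip(filenames):
    """Map each clip UUID to the sorted list of sensor names found for it."""
    sensors_by_clip = defaultdict(set)
    for name in filenames:
        clip_uuid, sensor_name, _ = split_clip_filename(name)
        sensors_by_clip[clip_uuid].add(sensor_name)
    return {clip: sorted(sensors) for clip, sensors in sensors_by_clip.items()}


# Made-up file names for illustration only (the UUID is not a real clip).
example_files = [
    "123e4567-e89b-12d3-a456-426614174000.camera_front_wide_120fov.mp4",
    "123e4567-e89b-12d3-a456-426614174000.camera_rear_tele_30fov.mp4",
    "123e4567-e89b-12d3-a456-426614174000.radar_front_center_srr_0.parquet",
]
clips = group_by_clip(example_files)
```

Here `clips` maps the single example UUID to the three sensor names, which is the lookup needed to decide which files to extract from each sensor's chunk.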

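The radar parquet schema above stores each detection as `azimuth`, `elevation` (both in radians), and `distance` (in meters). A minimal sketch of turning one detection into Cartesian coordinates in the sensor's own frame follows. The angle convention is an assumption, since the description does not state it: azimuth is taken in the horizontal plane from the sensor's forward axis, positive toward the left, and elevation is taken upward from that plane, mirroring the rig axes (X forward, Y left, Z up). The function name `radar_detection_to_xyz` is hypothetical. Moving the point into the rig frame would additionally require the sensor extrinsics, which this sketch does not apply.

```python
import math


def radar_detection_to_xyz(azimuth, elevation, distance):
    """Convert a spherical radar detection to sensor-frame (x, y, z) in meters.

    ASSUMED convention (not stated in the dataset description):
      azimuth   - angle in the horizontal plane from the forward axis,
                  positive toward the left
      elevation - angle above the horizontal plane
    """
    horizontal = distance * math.cos(elevation)  # range projected onto the horizontal plane
    x = horizontal * math.cos(azimuth)           # forward
    y = horizontal * math.sin(azimuth)           # left
    z = distance * math.sin(elevation)           # up
    return x, y, z


# A detection 10 m straight ahead and one 10 m directly to the left.
ahead = radar_detection_to_xyz(0.0, 0.0, 10.0)
left = radar_detection_to_xyz(math.pi / 2, 0.0, 10.0)
```

Whatever the true convention turns out to be, the Euclidean norm of the result should equal `distance`, which makes a quick sanity check when validating against real data.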
Application scenarios: