RailEnV-PASMVS: a dataset for multi-view stereopsis training and reconstruction applications

Indexed by NIAID Data Ecosystem, 2026-05-02

Download link:

https://zenodo.org/record/5202742

Resource description:

A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS) is presented, consisting of 40 scenes and 79,800 renderings together with ground truth depth maps, extrinsic and intrinsic camera parameters and binary segmentation masks of all the track components and surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100-meter length of tangent (straight) track to yield a total of 1,995 camera views per scene. Photorealistic lighting of each of the 40 scenes is achieved using high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced in the form of camera focal lengths, random noise for the camera location and rotation parameters and shader modifications of the rail profile. Representative track geometry data is used to generate random and unique vertical alignment data for the rail profile for every scene. This primary, synthetic dataset is augmented by a smaller image collection consisting of 320 manually annotated photographs for improved segmentation performance. The specular rail profile represents the most challenging component for MVS reconstruction algorithms, pipelines and neural network architectures, increasing the ambiguity and complexity of the data distribution. RailEnV-PASMVS represents an application-specific dataset for railway engineering, against the backdrop of existing datasets available in the field of computer vision, providing the precision required for novel research applications in the field of transportation engineering.

File descriptions

RailEnV-PASMVS.blend (227 MB) - Blender file (developed using Blender version 2.8.1) used to generate the dataset.
The Blender file packs only one of the HDR environmental textures as an example, along with all the other asset textures.

RailEnV-PASMVS_sample.png (28 MB) - A visual collage of 30 scenes, illustrating the variability introduced by using different models, illumination, material properties and camera focal lengths.

geometry.zip (2 MB) - Geometry CSV files used for scenes 01 to 20. The Bezier curve defines the geometry of the rail profile (10 mm intervals).

PhysicalDataset.7z (2.0 GB) - A smaller, secondary dataset of 320 manually annotated photographs of railway environments; only the railway profiles are annotated.

01.7z-20.7z (2.0 GB each) - Archive of each scene (01 through 20).

all_list.txt, training_list.txt, validation_list.txt - Text files containing all the scene names, together with those used for validation (validation_list.txt) and training (training_list.txt), as used by MVSNet.

index.csv - CSV file providing a convenient reference for all the sample files, linking each file to its relative data path.

NOTE: Only 20 of the original 40 scenes are made available owing to size limitations of the data repository. This is still adequate for the purposes of training MVS neural networks. The Blender file is made available specifically so that the scenes can be rendered out for different applications, or the camera framework adapted altogether. Please refer to the corresponding manuscript for additional details.

Steps to reproduce

The open source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using the exposed Python API. The camera trajectory is kept fixed for all 40 scenes, except for small perturbations introduced in the form of random noise to increase the camera variation.
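The counts quoted in the description are mutually consistent: 1,995 views from a 3-camera rig implies 665 rig stations along the track, and 40 scenes of 1,995 views gives 79,800 renderings. The following is a minimal sketch of a fixed trajectory with small per-scene perturbations. The rig offsets, the axis assignment, the even station spacing, the Gaussian noise model and its standard deviation are all assumptions made for illustration; none of them are taken from the Blender file.

```python
import numpy as np

N_SCENES = 40            # scenes in the full dataset (20 are published)
N_CAMERAS = 3            # cameras in the rig
VIEWS_PER_SCENE = 1995   # camera views per scene
TRACK_LENGTH_M = 100.0   # length of tangent track

# 1,995 views from 3 cameras implies 665 rig stations along the track.
n_stations = VIEWS_PER_SCENE // N_CAMERAS

# HYPOTHETICAL lateral/vertical offsets (metres) of the three cameras
# relative to the rail; the real values live in the Blender file.
rig_offsets = np.array([[-0.5, 0.4], [0.0, 0.6], [0.5, 0.4]])


def camera_positions(seed, noise_std_m=0.005):
    """Nominal trajectory plus per-scene noise (Gaussian noise is an assumption)."""
    rng = np.random.default_rng(seed)
    along_track = np.linspace(0.0, TRACK_LENGTH_M, n_stations)
    nominal = np.empty((n_stations, N_CAMERAS, 3))
    nominal[..., 0] = along_track[:, None]      # assumed: x runs along the track
    nominal[..., 1:] = rig_offsets[None, :, :]  # same rig layout at every station
    return nominal + rng.normal(0.0, noise_std_m, size=nominal.shape)


positions = camera_positions(seed=1)
total_renderings = N_SCENES * positions.shape[0] * positions.shape[1]
print(n_stations, positions.shape, total_renderings)
```

Using one seed per scene keeps the nominal trajectory identical across scenes while still giving each scene its own small perturbations.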
The camera intrinsic and extrinsic information was initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated; this includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position coordinate vector (vectC) and rotation vector (vectR). The STL model files, as provided in this data repository, were exported directly from Blender, such that the geometry/scenes can be reproduced. The data processing below is written for a Python implementation, transforming the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices.

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    # focalLengthmm, sensorWidthmm, pixelDimensionX, pixelDimensionY,
    # vectR and vectC are the per-camera values described above.

    # The intrinsic matrix K is constructed using the following formulation:
    focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm
    K = np.array([[focalLengthPixel, 0, pixelDimensionX / 2],
                  [0, focalLengthPixel, pixelDimensionY / 2],
                  [0, 0, 1]])

    # The rotation vector as provided by Blender is first transformed to a rotation matrix:
    r = R.from_euler('xyz', vectR, degrees=True)
    matR = r.as_matrix()

    # Transpose the rotation matrix to find the matrix from the WORLD to BLENDER coordinate system:
    R_world2bcam = np.transpose(matR)

    # The matrix describing the transformation from BLENDER to CV/STANDARD coordinates is:
    R_bcam2cv = np.array([[1, 0, 0],
                          [0, -1, 0],
                          [0, 0, -1]])

    # Thus the representation from WORLD to CV/STANDARD coordinates is:
    R_world2cv = R_bcam2cv.dot(R_world2bcam)

    # The camera coordinate vector requires a similar transformation, giving the
    # WORLD to BLENDER translation and then the WORLD to CV/STANDARD translation:
    T_world2bcam = -1 * R_world2bcam.dot(vectC)
    T_world2cv = R_bcam2cv.dot(T_world2bcam)

The resulting R_world2cv and T_world2cv matrices are written to the camera information file using exactly the same format as that of BlendedMVS developed by Dr. Yao. The original rotation and translation information can be found by following the process in reverse.
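The reverse process mentioned above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names blender_to_cv and cv_to_blender and the sample values are hypothetical, and the Euler angles are assumed to stay away from the gimbal-lock case (a middle angle of ±90 degrees), where the recovered angles are not unique.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Fixed axis flip between Blender's camera frame and the CV/STANDARD frame.
# The matrix is its own inverse, which keeps the reverse direction simple.
R_BCAM2CV = np.array([[1, 0, 0], [0, -1, 0], [0, 0, -1]], dtype=float)


def blender_to_cv(vectR, vectC):
    """Forward direction, following the steps above (angles in degrees)."""
    matR = R.from_euler('xyz', vectR, degrees=True).as_matrix()
    R_world2bcam = matR.T
    T_world2bcam = -R_world2bcam @ np.asarray(vectC, dtype=float)
    return R_BCAM2CV @ R_world2bcam, R_BCAM2CV @ T_world2bcam


def cv_to_blender(R_world2cv, T_world2cv):
    """Reverse direction: recover Blender's Euler angles and camera position."""
    R_world2bcam = R_BCAM2CV @ R_world2cv  # undo the axis flip
    T_world2bcam = R_BCAM2CV @ T_world2cv
    matR = R_world2bcam.T                  # camera-to-world rotation
    vectR = R.from_matrix(matR).as_euler('xyz', degrees=True)
    vectC = -matR @ T_world2bcam           # camera centre in world coordinates
    return vectR, vectC


# Round trip on made-up values (not taken from the dataset).
vectR_in = np.array([63.5, 1.2, -41.0])
vectC_in = np.array([1.5, -2.0, 3.0])
R_cv, T_cv = blender_to_cv(vectR_in, vectC_in)
vectR_out, vectC_out = cv_to_blender(R_cv, T_cv)
print(np.allclose(vectR_out, vectR_in), np.allclose(vectC_out, vectC_in))
```

A useful sanity check on any camera file is that the camera centre maps to the origin of the camera frame, that is, R_world2cv applied to vectC plus T_world2cv should be approximately zero.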
Note that additional steps were required to convert from Blender's coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided. Equivalent GPS information is also provided (gps.csv): the local coordinate frame is transformed into GPS coordinates centered around the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.
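The description does not state how the local frame was mapped to GPS coordinates. One common approach for a site only 100 m long is a flat-earth (equirectangular) approximation around a reference point, sketched below. The origin coordinates are placeholders, not the dataset's actual reference point, and the function name local_to_gps and the assignment of local axes to east and north are assumptions.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 equatorial radius

# PLACEHOLDER origin, NOT the dataset's actual reference point.
ORIGIN_LAT_DEG = -25.75
ORIGIN_LON_DEG = 28.25


def local_to_gps(east_m, north_m, lat0=ORIGIN_LAT_DEG, lon0=ORIGIN_LON_DEG):
    """Convert small local offsets (metres) to latitude/longitude in degrees.

    Flat-earth approximation: adequate over hundreds of metres,
    not suitable for long distances.
    """
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
    return lat0 + dlat, lon0 + dlon


# A camera 100 m north and 2 m east of the placeholder origin.
lat, lon = local_to_gps(east_m=2.0, north_m=100.0)
print(f"{lat:.7f}, {lon:.7f}")
```

For the values actually used in the dataset, gps.csv and the EXIF tags in the JPG files remain the authoritative source.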

Created:

2024-07-18
