End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting

Name: End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting
Creator: figshare
Published: 2025-11-09 00:10:09
License: Not specified

DataCite Commons: updated 2025-11-09, indexed 2026-04-25

Download link:

https://figshare.com/articles/dataset/End-to-end_example-based_sim-to-real_RL_policy_transfer_based_on_neural_stylisation_with_application_to_robotic_cutting/28983659


Description:

This repository contains the source datasets used to generate the variational autoencoder model and datasets for style transfer for the paper "End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting" submitted to Nature Scientific Reports.

Summary of contents of this repository:

vae.zip

This archive contains CSV files split into training (train) and validation (val) datasets used to train the variational autoencoder and its conditional variant, as well as CycleGAN generator and discriminator networks. The train and val folders are each split into simulated (sim) and real datasets. The contents are as follows:

train:
    sim: 680 items
    real: 118 items
val:
    sim: 20 items
    real: 30 items

Each item contains episodic trajectories from contact cutting experiments using a rotary slitting saw tool with different process parameter selection strategies. In simulation, these were taken with pitch angle 0.126 rad, radius 25 mm, cutter width 0.0005 m, 50 cutting elements (flutes) and spindle speed 1000 rpm, with variable radial depth of cut and feed rates, material mechanistic constants and geometries. In the real world, the same conditions were used, with 100 cutting elements and spindle speed 500 rpm.
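As a quick sanity check after extracting vae.zip, the item counts listed above can be compared against what is on disk. The following is a minimal sketch, not part of the published dataset: it assumes the archive extracts to a folder named `vae` (a hypothetical path) laid out as `<split>/<domain>/`, and that each item is a single file with a `.csv` extension. The helper names `count_items` and `find_mismatches` are illustrative only.

```python
from pathlib import Path

# Item counts as stated in the dataset description.
EXPECTED_COUNTS = {
    ("train", "sim"): 680,
    ("train", "real"): 118,
    ("val", "sim"): 20,
    ("val", "real"): 30,
}


def count_items(root):
    """Count CSV files in each <split>/<domain> folder under root."""
    root = Path(root)
    return {
        (split, domain): len(list((root / split / domain).glob("*.csv")))
        for split, domain in EXPECTED_COUNTS
    }


def find_mismatches(root):
    """Return {(split, domain): (found, expected)} where the counts differ."""
    counts = count_items(root)
    return {
        key: (counts[key], expected)
        for key, expected in EXPECTED_COUNTS.items()
        if counts[key] != expected
    }


if __name__ == "__main__":
    extracted = Path("vae")  # hypothetical extraction directory
    if extracted.is_dir():
        # An empty dict means every folder holds the stated number of items.
        print(find_mismatches(extracted))
```

If the archive uses a different top-level folder name or file extension, only the path and the glob pattern need to change.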
The contents of each item are as follows:

ee_pos_X - XYZ end-effector position (located at cutter tip)
ee_quat_X - WXYZ (scalar-first) quaternion representing end-effector orientation
time_reward - reward for time elapsed prior to task completion
dev_reward - reward for deviation from reference path
productivity_reward - reward heuristic for material removal volume
force_reward - reward for tool load, measured by force-torque sensor
obs_0 - component of velocity parallel to reference path
obs_1-3 - error relative to reference path
obs_4-6 - end-effector velocity
obs_7-9 - measured force at force-torque sensor
obs_10-12 - measured torque at force-torque sensor
obs_13 - time offset to nominal path (represented as time-parameterised B-spline)
obs_14 - depth of cut offset to nominal path
obs_15-17 - operational space control stiffness
action_0 - feed rate adjustment relative to the nominal feed rate (0.1x to 2x, normalised to the range [0, 1]; nominal feed rate is 1.5 m/min for sim, 0.75 m/min for real)
action_1 - time derivative of depth of cut offset
action_2-4 - operational space control stiffness (normalised to the range [0, 1])
terminal_observation_X - terminal observation for obs_X at end of episode

Item suffixes denote the strategy used to collect the trajectory.

Real item suffixes:
_origpolicy / _policy - taken with expert policy trained in simulation environment
_identif_xxx - taken with fixed process parameters, variable feed rate
_bc / _dagger - taken with expert policy adapted with GP force model, trained with BC / DAgger

Sim item suffixes:
rand_baseline - taken with baseline (fixed 1 mm depth of cut, 1.5 m/min feed rate)
rand_dummy - taken with random process parameters, fixed throughout trial
rand_policy - taken with expert policy
rand_randX - taken with random actions every timestep

N.B. Columns obs_0 through obs_12 were used for VAE training.

NOTE: The reward columns in the real folders do not contain meaningful data!

policy/

This folder contains pickled trajectories, in the form of a Python list. The list's elements are TrajWithRew dataclass objects from the Imitation Python library (https://imitation.readthedocs.io/en/latest/).

TrajWithRew contains the following main fields:
obs - the (unnormalised) observations, in the form of a [WINDOW_LENGTH x NUM_CHANNELS] array
acts - the actions, in the form of a [(WINDOW_LENGTH - 1) x NUM_ACTS] array
infos - the info values at each timestep, as a [WINDOW_LENGTH - 1] array of dicts
terminals - boolean indicating if that trajectory segment is a terminal segment
rews - the rewards, as a [WINDOW_LENGTH - 1] array

Each TrajWithRew does not represent a full episodic trajectory, as is usually the case with Imitation. Instead, it represents a segment of a full episodic trajectory, of length WINDOW_LENGTH. The observations are of length WINDOW_LENGTH; the per-timestep fields (acts, infos, rews) are of length WINDOW_LENGTH - 1. This allows a next observation (s') to be given for every transition: each trajectory can be further decomposed into an array of transitions, which are simply the Markov Decision Process (s, a, r, s') tuples.

The filename prefix contains the name of the model used to perform style transfer:
st - neural style transfer
cvae - conditional variational autoencoder
gan - CycleGAN

Style transfer is only carried out on the first 13 observations (obs_0 through obs_12). The last 5 observations (13:18) are action observations and are left unmodified. These are zeroed during policy re-training to avoid over-fitting. While the observations are "style transferred", the actions are those from the original policy, as rolled out in the simulation environment in which it was trained.

rollout_trajectories_x50_denormed contains the raw episodic trajectories from 50 simulation rollouts, containing unnormalised observations, prior to windowing and style transfer.
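The relationship between a windowed segment and its (s, a, r, s') transitions, together with the zeroing of the 13:18 action observations, can be sketched as follows. This is an illustrative sketch only and is not the authors' code: it uses plain nested lists in place of the arrays stored in the pickled objects, WINDOW_LENGTH is set to a hypothetical value because the description does not state it, and the function names `zero_action_observations` and `to_transitions` are invented for this example.

```python
NUM_CHANNELS = 18            # obs_0 .. obs_17, per the column list
NUM_ACTS = 5                 # action_0 .. action_4
ACTION_OBS = slice(13, 18)   # the "13:18" action observations
WINDOW_LENGTH = 4            # hypothetical; the real value is not stated


def zero_action_observations(obs):
    """Return a copy of obs with channels 13:18 set to zero in every row."""
    zeroed = []
    for row in obs:
        if len(row) != NUM_CHANNELS:
            raise ValueError(f"expected {NUM_CHANNELS} channels, got {len(row)}")
        new_row = [float(value) for value in row]
        new_row[ACTION_OBS] = [0.0] * (ACTION_OBS.stop - ACTION_OBS.start)
        zeroed.append(new_row)
    return zeroed


def to_transitions(obs, acts, rews):
    """Split one segment into a list of (s, a, r, s') tuples.

    obs has one more entry than acts and rews, so every action has both
    a current observation and a next observation.
    """
    if len(obs) != len(acts) + 1 or len(rews) != len(acts):
        raise ValueError("expected len(obs) == len(acts) + 1 == len(rews) + 1")
    return [(obs[t], acts[t], rews[t], obs[t + 1]) for t in range(len(acts))]


if __name__ == "__main__":
    # Synthetic segment standing in for one pickled trajectory window.
    obs = [
        [float(t * NUM_CHANNELS + c) for c in range(NUM_CHANNELS)]
        for t in range(WINDOW_LENGTH)
    ]
    acts = [[0.5] * NUM_ACTS for _ in range(WINDOW_LENGTH - 1)]
    rews = [0.0] * (WINDOW_LENGTH - 1)

    transitions = to_transitions(zero_action_observations(obs), acts, rews)
    print(len(transitions))  # one transition per action in the window
```

Because consecutive transitions share an observation (the s' of one is the s of the next), storing WINDOW_LENGTH observations alongside WINDOW_LENGTH - 1 actions avoids duplicating data within a segment.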

Provider:

figshare

Created:

2025-05-09
