
Vision Language Actions using latent representation as action tokens

Source: DataCite Commons. Updated: 2026-02-04. Indexed: 2026-05-04.
Download link:
https://orkg.org/comparison/R1582192
Description:
The Vision-Language-Action (VLA) models covered here use latent representations as action tokens: actions are not hard-coded or discretized beforehand, but learned as compact latent vectors. The model generates these latents (often via an autoencoder or a diffusion model) and treats them like tokens in a sequence. This allows unified reasoning over perception, language, and control, and is intended to improve generalization and scalability across tasks and embodiments.
Provider:
Open Research Knowledge Graph
Created:
2026-02-04
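
As a rough illustration of the idea in the description, the sketch below encodes a continuous action chunk into a compact latent vector and places it in one sequence with vision and language tokens. It is a minimal sketch under stated assumptions, not the method of any model in the comparison: the names `ENCODER`, `DECODER`, `Token`, and `build_sequence` are hypothetical, the weights are hand-set placeholders standing in for a trained autoencoder, and the 4-D action and 2-D latent sizes are arbitrary.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]

# Hypothetical hand-set weights standing in for a TRAINED autoencoder.
# The encoder maps a 4-D action chunk to a 2-D latent; the decoder maps back.
ENCODER = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
DECODER = [[1.0, 0.0],
           [1.0, 0.0],
           [0.0, 1.0],
           [0.0, 1.0]]


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def encode_action(action: Vector) -> Vector:
    """Compress a continuous action chunk into a compact latent vector."""
    return matvec(ENCODER, action)


def decode_action(latent: Vector) -> Vector:
    """Map a latent action token back to a continuous action chunk."""
    return matvec(DECODER, latent)


@dataclass
class Token:
    modality: str  # "vision", "language", or "action"
    vector: Vector


def build_sequence(vision: List[Vector],
                   language: List[Vector],
                   actions: List[Vector]) -> List[Token]:
    """Place vision, language, and latent action tokens in one sequence."""
    seq = [Token("vision", v) for v in vision]
    seq += [Token("language", v) for v in language]
    # Actions enter the sequence as latents, not as raw motor commands.
    seq += [Token("action", encode_action(a)) for a in actions]
    return seq


# Placeholder 2-D embeddings for one image patch and one word.
sequence = build_sequence(
    vision=[[0.1, 0.9]],
    language=[[0.7, 0.3]],
    actions=[[0.2, 0.2, -0.4, -0.4]],
)
```

In this sketch every token has the same width regardless of modality, which is what would let a single sequence model attend over all three. In a real system the encoder and decoder would be learned from data rather than fixed by hand.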