Vision Language Actions using latent representation as action tokens
Source: DataCite Commons. Updated: 2026-02-04. Indexed: 2026-05-04.
Download link:
https://orkg.org/comparison/R1582192
Description:
The Vision-Language-Action (VLA) models in this comparison use latent representations as action tokens: actions are not hard-coded or discretized beforehand, but learned as compact latent vectors. The model generates these latents (often via an autoencoder or a diffusion model) and treats them like tokens in a sequence. This enables unified reasoning over perception, language, and control, and improves generalization and scalability across tasks and embodiments.
Provider:
Open Research Knowledge Graph
Created:
2026-02-04
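
Illustrative sketch (not part of the original record): the description says actions are compressed into latent vectors and placed in the same sequence as perception and language tokens. The snippet below shows only that data flow, under loud assumptions: the matrices are random and untrained stand-ins for a learned autoencoder, and every name and dimension (HORIZON, DOF, LATENT_DIM, MODEL_DIM, encode_action_chunk, decode_latent, build_sequence) is hypothetical and not taken from any model in the comparison.

```python
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Random weights standing in for learned parameters (assumption)."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Hypothetical sizes: 4 timesteps of a 7-DoF action, 8-d latent, 16-d tokens.
HORIZON, DOF = 4, 7
LATENT_DIM, MODEL_DIM = 8, 16

encoder = rand_matrix(LATENT_DIM, HORIZON * DOF)   # action chunk -> latent
decoder = rand_matrix(HORIZON * DOF, LATENT_DIM)   # latent -> action chunk
to_token = rand_matrix(MODEL_DIM, LATENT_DIM)      # latent -> token embedding

def encode_action_chunk(chunk):
    """Flatten a HORIZON x DOF chunk and compress it to one latent vector."""
    flat = [x for step in chunk for x in step]
    return matvec(encoder, flat)

def decode_latent(latent):
    """Map a latent vector back to a HORIZON x DOF action chunk."""
    flat = matvec(decoder, latent)
    return [flat[i * DOF:(i + 1) * DOF] for i in range(HORIZON)]

def build_sequence(vision_tokens, language_tokens, action_latents):
    """Project latents to token width and append them to the other tokens."""
    action_tokens = [matvec(to_token, z) for z in action_latents]
    return vision_tokens + language_tokens + action_tokens

# Placeholder vision and language embeddings (zeros, for shape only).
vision = [[0.0] * MODEL_DIM for _ in range(3)]
language = [[0.0] * MODEL_DIM for _ in range(5)]
chunk = [[0.1 * t] * DOF for t in range(HORIZON)]

z = encode_action_chunk(chunk)
sequence = build_sequence(vision, language, [z])
recovered = decode_latent(z)
```

Because the weights are untrained, `recovered` will not match `chunk`. In a real system the encoder and decoder would be trained (for example with a reconstruction loss) and a sequence model would predict the latent action tokens from the vision and language tokens.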



