
Fuzzy Completion and Multimodal Reasoning for 3D Human Pose Estimation via Large Language Models

Source: IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/fuzzy-completion-and-multimodal-reasoning-3d-human-pose-estimation-large-language-models
Description:
3D human pose estimation methods primarily rely on regression-based modeling or graph-based representations. However, they still face challenges such as keypoints missing due to occlusion, pose ambiguity in 2D-to-3D mapping, and the difficulty of effectively fusing visual and linguistic modalities, which limit generalization and reduce accuracy in complex scenarios. To address these issues, this paper presents FC-PoseLM, a novel framework that integrates a neuro-fuzzy structure with a multimodal large language model (LLM). Specifically, FC-Net is designed as a fuzzy-enhanced neural network with a parallel architecture of multi-scale receptive fields, which enables the model to capture local-to-global spatial dependencies of the human body. The module reconstructs incomplete 2D skeletons and generates keypoint heatmaps to enhance visual representations. In addition, fuzzy experts are proposed to improve the representation of fuzzy spatial mappings through hierarchical structural modeling. The structure-aware visual features, together with natural language descriptions, are jointly embedded into the SMPL parameter space through a multimodal LLM, enabling direct 3D human pose inference via joint semantic-structural understanding. Experimental results show that the proposed FC-PoseLM outperforms existing multimodal LLMs on mainstream 3D pose datasets such as Human3.6M and 3DPW, demonstrating its ability to understand and generate 3D human poses through complex inference mechanisms and offering new directions for human pose analysis.
Provided by:
han wen