Fuzzy Completion and Multimodal Reasoning for 3D Human Pose Estimation via Large Language Models
Source: IEEE · Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/documents/fuzzy-completion-and-multimodal-reasoning-3d-human-pose-estimation-large-language-models
Resource summary:
3D human pose estimation methods primarily rely on regression-based modeling or graph-based representations. However, they still face challenges such as keypoints missing due to occlusion, pose ambiguity in the 2D-to-3D mapping, and ineffective fusion of visual and linguistic modalities, which limit generalization and reduce accuracy in complex scenarios. To address these issues, this paper presents FC-PoseLM, a novel framework that integrates a neuro-fuzzy structure with a multimodal large language model (LLM). Specifically, FC-Net is designed as a fuzzy-enhanced neural network with a parallel multi-scale receptive-field architecture, which enables the model to capture local-to-global spatial dependencies of the human body. This module reconstructs incomplete 2D skeletons and generates keypoint heatmaps to enhance visual representations. In addition, fuzzy experts are proposed to improve the representation of fuzzy spatial mappings through hierarchical structural modeling. The structure-aware visual features, along with natural language descriptions, are jointly embedded into the SMPL parameter space through a multimodal LLM, enabling direct 3D human pose inference via joint semantic-structural understanding. Experimental results show that FC-PoseLM outperforms existing multimodal LLMs on mainstream 3D pose datasets such as Human3.6M and 3DPW, demonstrating its ability to understand and generate 3D human poses through complex inference mechanisms and offering new insights for human pose analysis.
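The summary states that FC-Net generates keypoint heatmaps from reconstructed 2D skeletons but does not say how they are rendered. The following is a minimal sketch assuming the common convention of one 2D Gaussian per joint. The function names `gaussian_heatmap` and `render_heatmaps`, the default `sigma`, and the choice to emit an all-zero map for a joint that remains missing are hypothetical illustrations, not details taken from FC-PoseLM.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical representation: (x, y) pixel coordinates, or None if the joint is occluded/missing.
Keypoint = Optional[Tuple[float, float]]


def gaussian_heatmap(center: Tuple[float, float], width: int, height: int,
                     sigma: float = 2.0) -> List[List[float]]:
    """Render one joint as an unnormalized 2D Gaussian that peaks at 1.0 on the joint."""
    cx, cy = center
    two_sigma_sq = 2.0 * sigma * sigma
    # Rows index y, columns index x.
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / two_sigma_sq) for x in range(width)]
        for y in range(height)
    ]


def render_heatmaps(keypoints: List[Keypoint], width: int, height: int,
                    sigma: float = 2.0) -> List[List[List[float]]]:
    """Return one heatmap per joint; a missing joint (None) yields an all-zero map (assumption)."""
    maps = []
    for kp in keypoints:
        if kp is None:
            maps.append([[0.0] * width for _ in range(height)])
        else:
            maps.append(gaussian_heatmap(kp, width, height, sigma))
    return maps
```

For example, `render_heatmaps([(3.0, 4.0), None], width=8, height=8)` returns two 8×8 maps: the first reaches 1.0 at row 4, column 3, and the second, for the missing joint, is all zeros. In a pipeline like the one summarized above, the completion step would ideally replace such `None` entries with estimated coordinates before rendering.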
Provided by:
han wen



