Statistical inference for dynamical, interacting multi-object systems with emphasis on human small group interactions
Mendeley Data, updated 2024-01-31, indexed 2024-06-28
Download link:
https://digitallibrary.usc.edu/asset-management/2A3BF1MZ8OLM
Resource description:
In this dissertation we propose contributions that address problems in behavioral signal processing for small-group interactions from three perspectives. First, we propose algorithmic contributions to general statistical inference methods for interacting dynamical systems, in particular multi-object tracking problems. Second, we propose multi-modal, multi-channel signal processing methods that address particular aspects of small-group interaction, with emphasis on speaker segmentation and speaker/participant tracking. Finally, we present a recording environment and a collected dyadic interaction database, and we propose methods for estimating approach-avoidance behavior labels from non-verbal interaction cues.

In the first part of this dissertation we present a class of sequential block sampling algorithms for tracking an unknown and variable number of objects. The proposed algorithms apply to multi-object tracking scenarios in which the only available observations are detector outputs. They also apply to scenarios in which detector outputs are available together with more complex observations that enter data-association-free likelihood models. The algorithms provide a way to construct block proposal distributions from detection-based observations, and their key components are methods for sampling from these block proposal distributions. We propose two novel methods for this purpose: one is based on a variational approximation scheme, and the other is an adaptive MCMC sampling scheme. Samples from the block proposal distributions are then used within the sequential MCMC (or SMC) framework. We tested the proposed schemes on two synthetic datasets. The results demonstrate the benefit of processing longer observation sequences in multi-object tracking problems, which the proposed schemes do more efficiently than classical sequential sampling schemes.

In the second part, we present a multi-target tracking algorithm for tracking multiple speakers with a microphone array.
Because microphone array observations do not lend themselves easily to the design of speaker location detectors, we propose a mixture particle filter for tracking multiple acoustic sources in a track-before-detect (TbD) framework. This method belongs to the same class of sequential signal processing algorithms (SMC or MCMC) as the block sampler proposed in the first part. The major difference is that the block sampler belongs to the detect-before-track class of algorithms. The sound source trajectories reconstructed by the mixture particle filter do not necessarily correspond to speech only. Therefore, we apply an adapted optimal change-point algorithm to segment the obtained sound source trajectories into speech and non-speech segments. The algorithm was tested on a multi-participant meeting database, both as a separate module and as part of a multi-modal system for automatic meeting monitoring. In both cases it provided significant improvements on the speaker detection and segmentation tasks.

In the third part, we present a modality fusion algorithm that exploits the complementary properties of video tracking, microphone array localization and speaker identification. This algorithm solves the problem of speaker segmentation in the presence of overlapped speech. Here we address improvements to our multimodal system for tracking meeting participants and segmenting speakers, with a focus on the microphone array modality. We propose an algorithm that uses the Directions of Arrival (DOAs) estimated for each microphone pair as observations. The algorithm tracks an unknown number of acoustically active meeting participants and then performs speaker segmentation. The proposed algorithm is novel in several respects.
First, we propose a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel likelihood model for the microphone array observations that handles overlapped speech. We propose a modification of the Steered Power Response Generalized Cross-Correlation Phase Transform (SPR-GCC-PHAT) function that takes possible microphone occlusions into account. We employ the multi-object detect-before-track approach and use the local maxima of the modified SPR-GCC-PHAT functions as sound source detectors. Joint probabilistic data association (JPDA) fuses the multiple detection locations into a joint likelihood. This recasts the original speaker segmentation problem within the multi-object tracking framework, where it is solved using Bayesian filtering/smoothing methods.

This concludes the exposition of the core signal processing algorithms closely related to multi-object tracking. The last part of the dissertation is dedicated to the analysis and automation of human behavior coding in small-group interactions.

We present a new multi-modal database for the analysis of participant behaviors in dyadic interactions. The database contains multiple channels with close- and far-field audio, a high-definition camera array and motion capture data. The motion capture data allow precise analysis of low-level body language descriptors and their comparison with similar descriptors derived from video data. Multiple human annotators labeled the data manually using psychology-informed guides. We analyzed the relation between approach-avoidance (A-A) behavior and various non-verbal body language and acoustic features. We also analyzed the influence of the audio and video channels on the experts' labeling decisions. Finally, we analyzed how the statistical interaction descriptors and A-A labels depend on the participants' roles.
Finally, we propose an ordinal regression (OR) algorithm, together with an extension applicable to time series, for estimating approach-avoidance (A-A) behavior quantifiers (labels) in human dyadic interactions. The proposed algorithm proceeds in three steps. It transforms the ordinal regression problem into multiple binary classification problems and solves them with independent score-outputting classifiers. It then fits a cumulative logit logistic regression model with proportional odds (CLLRMP) to the vectors of classifier scores. The time series extension treats the labels as states of a hidden Markov model whose likelihood is based on the probabilistic CLLRMP output. We compare three methods: the proposed algorithm using weighted binary SVMs in the second step (SVM-OLR), its time series extension (HMM-SVM-OLR), and a baseline multi-class SVM. HMM-SVM-OLR achieves the highest estimation accuracy.
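The ordinal regression pipeline described above can be illustrated with a short sketch. It is a minimal sketch under stated assumptions, not the dissertation's implementation. The binary classifier scores are taken as given, so the weighted SVM step is not shown. The CLLRMP parameters `THRESHOLDS` and `BETA` are hypothetical values that we assume were already fitted, for example by maximum likelihood. The model is the standard proportional-odds form P(y <= k | s) = sigmoid(theta_k - beta . s), with one coefficient vector shared across all thresholds. For the HMM extension, we assume the CLLRMP class probabilities serve directly as per-frame emission scores and that labels are decoded with Viterbi. The dissertation may form its likelihood differently. A hybrid-HMM convention would divide by the class priors, which leaves the decoded path unchanged when the priors are uniform.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cumulative_logit_probs(scores: Sequence[float],
                           thresholds: Sequence[float],
                           beta: Sequence[float]) -> List[float]:
    """Ordinal class probabilities under a proportional-odds cumulative logit model.

    scores[j] is the output of the j-th binary classifier (e.g. "label > j").
    P(y <= k | s) = sigmoid(thresholds[k] - beta . s) for k = 0..K-2, with one
    coefficient vector beta shared by every threshold (proportional odds).
    thresholds must be increasing so that every class probability is >= 0.
    """
    eta = sum(b * s for b, s in zip(beta, scores))
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(y = k) = P(y <= k) - P(y <= k - 1)
        previous = c
    return probs


def _log(p: float) -> float:
    """Logarithm that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(emissions: List[List[float]],
            trans: List[List[float]],
            prior: List[float]) -> List[int]:
    """Most likely label sequence, using emissions[t][k] as the frame-t score of label k."""
    n_states = len(prior)
    delta = [_log(prior[k]) + _log(emissions[0][k]) for k in range(n_states)]
    backpointers: List[List[int]] = []
    for frame in emissions[1:]:
        step_delta, step_ptr = [], []
        for k in range(n_states):
            # Best predecessor label j for landing in label k at this frame.
            best_j = max(range(n_states), key=lambda j: delta[j] + _log(trans[j][k]))
            step_delta.append(delta[best_j] + _log(trans[best_j][k]) + _log(frame[k]))
            step_ptr.append(best_j)
        delta = step_delta
        backpointers.append(step_ptr)
    # Backtrack from the best final label.
    path = [max(range(n_states), key=lambda k: delta[k])]
    for step_ptr in reversed(backpointers):
        path.append(step_ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical fitted parameters for K = 3 ordinal levels (2 binary classifiers).
    THRESHOLDS, BETA = [-1.0, 1.0], [1.0, 1.0]
    frame_scores = [(2.0, 2.0), (2.0, 1.0), (0.0, 0.0), (2.0, 2.0)]
    emissions = [cumulative_logit_probs(s, THRESHOLDS, BETA) for s in frame_scores]
    # Hypothetical "sticky" transitions: labels tend to persist between frames.
    trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    prior = [1 / 3] * 3
    print([p.index(max(p)) for p in emissions])  # [2, 2, 1, 2]
    print(viterbi(emissions, trans, prior))      # [2, 2, 2, 2]
```

In this toy run, the per-frame decisions give [2, 2, 1, 2]. The sticky transition matrix smooths the ambiguous third frame, so Viterbi decoding returns [2, 2, 2, 2]. This illustrates how treating labels as HMM states can exploit temporal context. The numbers are invented for illustration and are not results from the database.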
Created:
2024-01-31



