Spatiotemporal processing of real faces is supported by dissociable visual-sensing-modulated neural circuitry
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.rv15dv4j4
Description:
Real faces elicit unique patterns of both visual sensing behavior and neural activity. To investigate the relationship between these phenomena, twenty participants underwent simultaneous acquisition of functional near-infrared spectroscopy (fNIRS), electroencephalography (EEG), and eye-tracking while they viewed a real human face or a control robot face. We hypothesized that neural processing of real faces is influenced by patterns of visual acquisition. Regression analyses of fNIRS and eye-tracking revealed real-face-specific modulation of the right lateral (peak-t=3.68, p=0.001) and dorsal (peak-t=3.85, p<0.001) visual streams by fixation duration and dwell time, respectively. Standardized low-resolution brain electromagnetic tomography (sLORETA) identified significant alpha (8-13 Hz) oscillatory activity in the lateral and dorsal parietal clusters during human face viewing, suggesting a role for temporal binding in processing faces. These findings are consistent with our hypotheses and point to dissociable roles for the lateral and dorsal visual streams in live face processing.
Methods
1.1. Face viewing paradigm
Participants completed two runs of a face viewing paradigm with two partners (Figure 1). The paradigm is similar to methods described previously35–37,63–65. The face viewing paradigm alternated periods of face viewing and no face viewing (Figure 1A). Participants were seated 140 centimeters (cm) in front of and facing a partner with a “Smart Glass” divider (Smart Glass Country, Vancouver, Canada) at the midpoint between them. The divider can be made transparent or opaque under computer control, with a switching latency of ~10ms. Participants faced straight ahead, keeping a consistent head posture while the opacity of the smart glass was toggled between transparent and opaque, allowing or preventing them from seeing their partner. When they could see their partner, participants were instructed to look at the face, allowing their eyes to move as felt comfortable and natural, and when they could not see it, they were instructed to keep their eyes open and focused on a dot centered on the divider.
Runs (Figure 1B) consisted of five 16-second (s) task blocks alternating with 15-s rest blocks. Task blocks were divided into three 3.4-s face-viewing events (orange lines) which alternated with 3-s no-face-viewing periods (short blue lines). Task blocks were subdivided because extended face viewing can cause social discomfort that confounds results. Events were further subdivided into epochs, described in section 1.2.
1.2. Face viewing epochs and “flicker” disruption to face view
The previously developed face viewing paradigm35–37 was modified by segmenting the face viewing events into two 1600-millisecond (ms) face viewing epochs separated by a 200-ms (opaque divider) disruption to face view (Figure 1, grey inset). This leverages the high temporal resolution of EEG, enabling comparison of the average initial and subsequent epochs. The comparison serves two exploratory purposes. First, it allows built-in replication of results if the same pattern emerges across both epochs. Second, it allows assessment of repetition suppression. While the second epoch on its own is not interpretable due to the inability to dissociate the neural impact of face viewing, disruption, and autocorrelation, a decrease in activity during the second epoch would suggest sensitivity to face viewing continuity or repetition suppression.
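The event segmentation described above can be checked with a short sketch. The function name and the tuple layout are illustrative assumptions; only the 1600-ms and 200-ms durations come from the text.

```python
EPOCH_MS = 1600       # duration of each face-viewing epoch (from the text)
DISRUPTION_MS = 200   # opaque-divider disruption between the two epochs (from the text)


def event_segments(event_onset_ms):
    """Return (label, start_ms, end_ms) for the three segments of one event.

    Hypothetical helper: the labels and return format are not from the paper.
    """
    first_end = event_onset_ms + EPOCH_MS
    second_start = first_end + DISRUPTION_MS
    return [
        ("initial_epoch", event_onset_ms, first_end),
        ("disruption", first_end, second_start),
        ("subsequent_epoch", second_start, second_start + EPOCH_MS),
    ]


segments = event_segments(0)
# Total event length: two epochs plus the disruption.
total_ms = segments[-1][2] - segments[0][1]
```

With an onset of 0 ms the segments span 0-1600, 1600-1800, and 1800-3400 ms, so the total matches the 3.4-s face-viewing event.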
1.3. Human and robot partners
Participants performed the task with two partners: a real human and a dynamic robot called Maki36,66,67. Partner order was counterbalanced, and conditions with partners were completed on different days. The human partner allowed their eyes to move over the participant’s face as felt comfortable and natural, while maintaining a neutral expression; a steady breathing and blinking pattern; and a still posture. The clothing, hairstyle, and make-up of the human partner were allowed to vary. In all cases, the human partner was a young adult female.
The robot partner Maki (Figure 1, black inset) was a 3D-printed bust designed by HelloRobo (Atlanta, Georgia) which simulates human head and eye movements but otherwise lacks human features, movements, or reactive capabilities. Maki was chosen due to the design emphasis on eye movements, as well as its similarity to the overall size and organization of the human face, controlling for the appearance, immediacy, and movements of a human face36,68. Maki’s eyes have comparable components to human eyes: whites surrounding a colored iris with a black “pupil.” An Arduino IDE controlling six servo motors was used to drive Maki’s movements, creating six degrees of freedom: head turns left-right and tilts up-down; left eye moves left-right and up-down; right eye moves left-right and up-down; and eyelids open-close. During runs, Maki engaged in a pseudorandom pattern of naturalistic blinks69 and saccade- and fixation-like eye movements. “Fixations” occurred at one of nine points of a three-by-three grid, with “saccades” from one point to another determined pseudorandomly. Movements were driven by custom scripts written using MATLAB 2019a (MathWorks, Massachusetts, USA). Before performing the task, participants were introduced to Maki through a practice run, familiarizing them with its movements. This was to minimize neural effects related to novelty or surprise. Maki’s head angle was also manually adjusted until the participant reported that the robot was looking at them.
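A minimal sketch of a pseudorandom fixation sequence on a three-by-three grid is shown here in Python. The original scripts were written in MATLAB and are not reproduced. The grid coordinates, the seed, and the rule that each "saccade" lands on a different point are assumptions made for illustration.

```python
import random

# Nine fixation targets on a three-by-three grid (row, column).
# The coordinate scheme is hypothetical; the paper gives no target positions.
GRID_POINTS = [(row, col) for row in range(3) for col in range(3)]


def fixation_sequence(n_fixations, seed=0):
    """Return a reproducible pseudorandom list of grid points."""
    rng = random.Random(seed)  # seeded generator makes the sequence repeatable
    current = rng.choice(GRID_POINTS)
    sequence = [current]
    while len(sequence) < n_fixations:
        # Assumption: a saccade always moves to a point other than the current one.
        current = rng.choice([p for p in GRID_POINTS if p != current])
        sequence.append(current)
    return sequence


sequence = fixation_sequence(10, seed=42)
```

Seeding the generator keeps the sequence identical across runs, which is one way to present the same robot behavior to every participant.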
1.4. Participants
Twenty-one participants (11 female, mean age = 35.6 years) were recruited using publicly posted flyers, internet postings, and word of mouth. They were enrolled in the order that they expressed interest and passed a finger thumb tapping screening procedure to establish fNIRS signal validity70. Nineteen subjects completed the task with both human and robot partner, while two completed it with a single partner—one with human and one with robot, resulting in a total of n=20 per partner. The subject pool was further limited due to the following: one subject was excluded from all regression analyses as the eye tracker could not detect their eyes; two subjects were excluded from human regression analyses due to, in one case, insufficient fNIRS optode-scalp contact during human runs and, in the other, technical error in eye tracking data collection during human runs.
1.5. Environment
During the experiment, the overhead lights of the room were extinguished. An experimenter was present out of the view of the participant during each run. Two directed lights were used to fully illuminate the partner’s face and eliminate shadows. Two diffuse bar-lights were used to softly illuminate the opaque SmartGlass during no-face-viewing periods in order to suppress the participant’s reflection. These lights were on throughout each run. Participants reported being able to see only the rough outline of their head in the SmartGlass, with no internal features visible. Partners had comparable luminance (Human partner: 42.4 lumens; Robot partner: 42.6 lumens), with slightly lower luminance than the opaque smart glass (44.0 lumens). Luminance was measured with a luxmeter positioned at a typical eye height, placed 70 cm from and facing the smart glass divider and 140 cm from the partner.
1.6. Multimodal data collection
fNIRS, EEG, and eye-tracking data were collected simultaneously in order to assess cortical hemodynamics, neural oscillations, and eye behavior during the face viewing task.
1.6.1. Functional Near Infrared Spectroscopy (fNIRS)
fNIRS signal acquisition, optode localization, and signal processing were similar to methods described previously35–37,63,64. Data were collected via a multichannel continuous-wave system (LABNIRS, Shimadzu Corporation, Kyoto, Japan) consisting of forty emitter-detector optode pairs. During the task, optodes were connected to a cap placed on the participant’s head based on size to fit comfortably. For consistency in cortical coverage, the middle anterior optode was placed 1 cm above the nasion; the middle posterior optode was placed in line with the inion; and the Cz optode was aligned with the anatomical Cz. After cap placement, hair was cleared from optode holders using a lighted fiber-optic probe (Daiso, Hiroshima, Japan) prior to optode placement. Optodes were arranged in a matrix, contacting the scalp, enabling acquisition of 128 channels. After optode placement and prior to beginning the experiment, signal-to-noise ratio was assessed by measuring attenuation of light for each channel, with adjustments made as needed71,72.
1.6.2. Eye-tracking
Eye movements were recorded using a desk-mounted Tobii Pro (Stockholm, Sweden) X3-120 eye-tracking system placed 70 cm in front of and slightly below the participant’s face. Eye behavior was recorded at 120 Hz. The eye tracker was calibrated for each participant using a transparent plane with three dots placed around the face of the partner. Participants were instructed to look at each dot in turn, and each gaze angle was recorded. Calibration was confirmed by having participants look at each eye and the nose of the partner and confirming alignment. Synchronized scene video capturing the participant’s view of the partner was recorded at 30 Hz with a resolution of 1280x720 pixels using a Logitech c920 camera (Lausanne, Switzerland) positioned directly behind and above the participant’s head. This enabled tagging of participant looking behavior within a manually placed “face box” (Section 1.7.2).
1.6.3. Electroencephalography (EEG)
EEG data were acquired via a 256-Hz, 32-electrode dual-bioamplifier g.USB Amp system (G. Tec Medical Engineering, Austria). The electrode layout was adapted from the 10-10 system to accommodate optode placement on the fNIRS cap. Saline conducting gel was manually placed for each electrode after optode placement to ensure scalp contact. Scalp contact was manually reviewed per electrode using an oscilloscope, and adjustments were made as needed.
1.6.4. Recording of individual optode and electrode placement
After completion of the tasks, locations of optodes and electrodes were recorded for each participant using the Structure Sensor scanner (XRPro LLC, Saratov, RU) which creates a 3D model of the participant’s head and cap73,74. Locations of the standard anatomical landmarks nasion, inion, Cz, T3, and T4 as well as optode locations were manually placed on the 3D model. Electrode locations were then determined by calculating the midpoint between surrounding optodes. Locations were then corrected for cap drift using custom MATLAB scripts which rotated optode and electrode locations around the Montreal Neurological Institute (MNI) X-axis from left ear towards the midline. This was done to bring the Cz optode in line with the anatomical Cz according to original placement in order to account for stereotyped tilting of the cap towards the left ear that could occur during optode removal. Participant scans were normalized to MNI coordinates75 using NIRS-SPM76.
1.7. Preprocessing and analyses
A flow chart of preprocessing and analysis steps is shown in supplemental Figure S1.
1.7.1. fNIRS Preprocessing and main effect analysis
Analyses were conducted using NIRS-SPM76 and custom scripts in MATLAB 2019a. Raw fNIRS optical density data were converted to changes in relative chromophore concentrations using a Beer-Lambert equation77,78. Baseline drift was removed using NIRS-SPM wavelet detrending79. Global components attributable to blood pressure and other systemic effects80 were removed using a principal component analysis spatial filter81,82. Main effect general linear models (GLM)83 of face viewing > no-face viewing were constructed by convolving a boxcar model of events and rests (Figure 1B) with the canonical hemodynamic response function provided in SPM8 84. Face viewing events consisted of both the initial and subsequent face viewing epochs separated by the 200-ms disruption (Figure 1, grey inset). GLMs were subsequently fit to preprocessed data, providing beta values for each channel per participant and partner. Individual channel beta values were projected into normalized voxel space, and voxels were group averaged then rendered on the MNI brain. Analyses were performed on the combined OxyHb-deOxyHb signal85,86, which is calculated by adding the absolute values of concentration change of OxyHb and deOxyHb. This combined signal reflects through a single value the expected and well-established task-related anticorrelated increase in OxyHb and concurrent decrease in deOxyHb87. For each partner, task-related activity was determined by contrasting voxel-wise activity during task blocks (Figure 1B, orange lines) with that during rest blocks (blue lines), which identifies cortical regions that show more activity during partner face viewing than no face viewing. The main effect GLM was used to assess replication of prior findings of increased right supramarginal gyrus activity occurring during real human face viewing as compared to robot face viewing36.
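Two steps in this pipeline, the combined OxyHb-deOxyHb signal and the boxcar regressor convolved with a hemodynamic response function, can be sketched in standard-library Python. This is not the NIRS-SPM or SPM8 implementation. The double-gamma function below is a common approximation with assumed parameters (response peak near 5 s, undershoot scaled by 1/6), and the event onsets, 0.1-s time step, and 30-s run length are hypothetical.

```python
import math


def combined_signal(oxy, deoxy):
    """Sum of absolute OxyHb and deOxyHb concentration changes, per sample."""
    return [abs(o) + abs(d) for o, d in zip(oxy, deoxy)]


def double_gamma_hrf(t):
    """Approximate canonical HRF at time t (s); parameters are assumptions."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)          # gamma density, shape 6
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)  # gamma density, shape 16
    return peak - undershoot / 6.0


def boxcar(events, n_samples, dt):
    """1.0 while any (onset_s, duration_s) event is on, otherwise 0.0."""
    signal = [0.0] * n_samples
    for onset, duration in events:
        start = int(round(onset / dt))
        stop = min(n_samples, int(round((onset + duration) / dt)))
        for i in range(start, stop):
            signal[i] = 1.0
    return signal


def convolve_with_hrf(signal, dt, hrf_length_s=32.0):
    """Discrete convolution of the signal with the HRF, truncated to the input length."""
    kernel = [double_gamma_hrf(k * dt) for k in range(int(round(hrf_length_s / dt)))]
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, len(kernel))):
            acc += signal[n - k] * kernel[k]
        out.append(acc * dt)
    return out


# Hypothetical task block: three 3.4-s events separated by 3-s gaps.
DT = 0.1
events = [(0.0, 3.4), (6.4, 3.4), (12.8, 3.4)]
regressor = convolve_with_hrf(boxcar(events, 300, DT), DT)
```

The resulting regressor would form one column of a GLM design matrix; fitting it to each channel's combined signal yields the per-channel beta values described above.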
1.7.2. Eye tracking preprocessing and behavioral analyses
Eye tracking data were processed using the Tobii Velocity-Threshold Identification Gaze Filter with default parameters88. Eye behavior was calculated from the average of both eyes. Noise reduction was completed using a moving median filter over three samples. Velocity was calculated using a window length of 20 ms. Gaps in the data were not interpolated. Looking behavior was determined using a “face-box” manually drawn around each partner’s face. The face box was an oval that encompassed eyes, nose, mouth, forehead, cheeks, and chin, but excluded hair and ears as much as possible. When the eyes of the participant fell within the bounds of the face-box of their partner, it was considered a “face hit.” Data points in which neither eye was detected due to technical issues or eye-blinks were considered invalid and excluded from subsequent calculations. Face-viewing events with more than one-third of data points being invalid were excluded from analyses.
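The face-hit test and the event exclusion rule can be sketched as follows. Modeling the oval face box as an axis-aligned ellipse in scene-video pixel coordinates is an assumption, as are the function names and the use of `None` to mark samples where neither eye was detected.

```python
def in_face_box(x, y, center_x, center_y, semi_axis_x, semi_axis_y):
    """True if gaze point (x, y) lies inside an axis-aligned elliptical face box."""
    dx = (x - center_x) / semi_axis_x
    dy = (y - center_y) / semi_axis_y
    return dx * dx + dy * dy <= 1.0


def event_is_usable(samples):
    """samples: list of (x, y) gaze points, or None where neither eye was detected.

    An event is excluded when more than one-third of its samples are invalid.
    """
    invalid = sum(1 for s in samples if s is None)
    return invalid <= len(samples) / 3


def face_hits(samples, box):
    """One bool per valid sample; invalid samples are dropped, as in the text."""
    return [in_face_box(s[0], s[1], *box) for s in samples if s is not None]


# Hypothetical face box: center (640, 360), semi-axes 100 x 140 pixels.
box = (640, 360, 100, 140)
samples = [(640, 360), (650, 400), None, (900, 360)]
hits = face_hits(samples, box)
```

Here one of four samples is invalid, so the event is kept, and two of the three valid samples count as face hits.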
1.7.3. Fixation duration and dwell time calculation
Fixations were identified using the Tobii Pro Lab (version 1.171, Stockholm, Sweden) feature detection algorithm with the default settings. Fixation threshold was 30°/s and physically adjacent datapoints were treated as a single fixation if they occurred within 75 ms and 0.5° of visual angle. Fixations which began during a face viewing event were identified and their durations averaged per event. Dwell time (that is, cumulative face viewing time) was calculated from the amount of time per face viewing event that the eyes were within the partner’s face box, whether in fixation or not. To compare dwell time behavior, total time that the eyes of the participant were within the face box of the partner was calculated as a proportion of total time the face was visible per run. For the linear regression, event dwell time was calculated as the proportion of time spent in the face box per event.
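The two per-event metrics can be illustrated with a short sketch. It assumes fixations arrive as (onset_ms, duration_ms) pairs and face hits as one boolean per valid gaze sample. Taking the dwell proportion over valid samples only is an assumption, since the text does not state the denominator explicitly.

```python
def mean_fixation_duration(fixations, event_start_ms, event_end_ms):
    """Average duration of fixations that BEGAN during the event, or None if none did."""
    durations = [
        duration
        for onset, duration in fixations
        if event_start_ms <= onset < event_end_ms
    ]
    if not durations:
        return None
    return sum(durations) / len(durations)


def dwell_proportion(face_hit_flags):
    """Share of valid samples inside the face box, whether in fixation or not."""
    if not face_hit_flags:
        return None
    return sum(1 for hit in face_hit_flags if hit) / len(face_hit_flags)


# Hypothetical data for one 3.4-s event starting at 0 ms.
fixations = [(100, 200), (500, 400), (3500, 300)]  # the last begins after the event
mean_duration = mean_fixation_duration(fixations, 0, 3400)
dwell = dwell_proportion([True, True, False, True])
```

In this example the mean fixation duration is 300 ms (the third fixation is ignored) and the dwell proportion is 0.75.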
1.7.4. Linear regression analyses combining fNIRS and eye tracking
Two regression analyses were conducted: one with mean fixation duration and one with dwell time. Each was added as a regressor to the GLM. Regressors were a boxcar model for each eye metric with the height of each event boxcar reflecting either the dwell time (ms) or the average fixation duration (ms) for that event. Values were then demeaned per run and events with insufficient data (>1/3 of data points invalid) were set to zero. The resulting model was then convolved with the canonical hemodynamic response function. The regressions were fit to each run to calculate model fit. Group averages were calculated per partner to identify regions of brain activity which were best explained by eye behavior during face viewing events. Resulting positive clusters can be interpreted as increasing in a manner that correlates with the eye behavior when viewing a face but not during rests. Regions which show a distinct relationship to behavioral metrics based on partner are interpreted as being involved in stimulus-specific visual sensing.
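The construction of the eye-behavior regressor heights can be sketched as below. Computing the run mean over usable events only, before zeroing the excluded ones, is an assumption. The text specifies only that values were demeaned per run and that events with insufficient data were set to zero.

```python
def demeaned_heights(values, usable):
    """Per-event boxcar heights for one run.

    values: eye metric per event (dwell time or mean fixation duration)
    usable: one bool per event, False when more than 1/3 of its samples were invalid
    """
    valid = [v for v, ok in zip(values, usable) if ok]
    mean = sum(valid) / len(valid) if valid else 0.0
    # Excluded events get height zero, i.e. they sit at the run mean
    # and contribute nothing to the parametric regressor.
    return [v - mean if ok else 0.0 for v, ok in zip(values, usable)]


# Hypothetical mean fixation durations (ms) for four events; the last is unusable.
heights = demeaned_heights([300, 500, 400, 999], [True, True, True, False])
```

Each height would then scale the boxcar for its event before the model is convolved with the hemodynamic response function.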
1.7.5. EEG preprocessing
Preprocessing was conducted using MATLAB, EEGLAB 2021 (Swartz Center for Computational Neuroscience, California, USA)89 and Brainstorm 3 90. Data were band-pass filtered to 1-100 Hz. 60-Hz line noise was removed using the CleanLine EEGLAB extension. Large, non-blink artifacts and bad channels were manually identified and removed. Data were re-referenced to the average channel (Miyakoshi, n.d.). Independent component analysis was conducted using the MATLAB runica function and ICLabel91 was used to identify component sources. Components with <5% chance of having a neural source, as well as any which were manually identified as blink or eye movement components, were removed. Data were filtered to the alpha frequency band (8-13 Hz) for the analyses of interest: alpha frequency changes have been related to and co-localized with changes in blood oxygen signals and thus could feasibly drive differences in hemodynamics92–94. Data were then epoched from -1.5 to 5 s relative to the onset of the face-viewing events and were averaged per person. This epoch length was chosen to enable temporal localization of alpha fluctuations and encompasses both the initial and subsequent face viewing epochs described in section 1.2. Similar treatment was performed on data filtered to the theta frequency band (4-8 Hz) in order to assess whether significant differences were specific to the alpha frequency or tied to more general processing found in multiple frequency bands. The theta band was chosen for comparison because it has also been tied to face processing95–97 and to spatial binding50. We reasoned that if differences are simply the result of real faces being more engaging (as opposed to being due to unique temporal binding demands), then we would see real-face-specific engagement of general binding processes and thus would see significant differences between human and robot in both alpha and theta frequency bands.
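The epoching and per-person averaging step can be sketched in plain Python. It assumes the data remain at the 256-Hz acquisition rate (no resampling is mentioned) and that event onsets are given as sample indices. The function names are illustrative and are not EEGLAB or Brainstorm calls.

```python
SAMPLING_RATE_HZ = 256                 # acquisition rate from the text
EPOCH_START_S, EPOCH_END_S = -1.5, 5.0  # epoch window relative to event onset


def epoch_bounds(onset_sample):
    """Start (inclusive) and stop (exclusive) sample indices for one epoch."""
    start = onset_sample + int(round(EPOCH_START_S * SAMPLING_RATE_HZ))
    stop = onset_sample + int(round(EPOCH_END_S * SAMPLING_RATE_HZ))
    return start, stop


def average_epochs(signal, onset_samples):
    """Point-by-point mean across all epochs that fit inside the recording."""
    epochs = []
    for onset in onset_samples:
        start, stop = epoch_bounds(onset)
        if start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    if not epochs:
        return []
    return [sum(column) / len(epochs) for column in zip(*epochs)]


# Hypothetical single-channel recording and two event onsets.
signal = [float(i) for i in range(5000)]
averaged = average_epochs(signal, [1000, 3000])
```

Each epoch spans 1664 samples (384 before onset and 1280 after), which covers the pre-stimulus baseline and both face viewing epochs.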
1.7.6. Source estimation analysis
Source localization was conducted using Brainstorm’s standardized low-resolution brain electromagnetic tomography (sLORETA)98 function. sLORETA was chosen as it is highly accurate and stable98,99. The default Montreal Neurological Institute (MNI) anatomy with boundary element method (BEM) modelling was used for all subjects, as anatomical MRI scans were not obtained. The MNI head model was linearly warped to each subject’s anatomical landmarks: T3, T4, Cz, inion, and nasion. Noise covariance was calculated using a pre-stimulus resting-state baseline (-1000 to 0 ms)100. The data covariance matrix was calculated using 0 to 1600 ms. Default regularization parameters were used: 0.1 for noise and 3 for signal-to-noise ratio101. Minimum norm imaging sLORETA, constrained normal to the cortex, was used to create a whole-cortex current flow model per subject. Single-subject models were then projected to the un-warped MNI brain template, smoothed with a 3-mm kernel, and z-score normalized to each subject’s baseline of -1000 to 0 ms. Group averages were then calculated per partner.
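The baseline z-score normalization can be illustrated for a single source time course. This is a sketch rather than Brainstorm's implementation. Using the sample standard deviation and treating the baseline as the half-open interval from -1000 ms up to, but not including, 0 ms are both assumptions.

```python
import statistics


def zscore_to_baseline(values, times_ms, baseline=(-1000, 0)):
    """Z-score a time course against the mean and SD of its baseline window."""
    base = [v for v, t in zip(values, times_ms) if baseline[0] <= t < baseline[1]]
    mean = statistics.mean(base)
    sd = statistics.stdev(base)  # sample SD; needs at least two baseline points
    return [(v - mean) / sd for v in values]


# Hypothetical time course sampled every 500 ms.
times_ms = [-1000, -500, 0, 500]
z = zscore_to_baseline([1.0, 3.0, 2.0, 8.0], times_ms)
```

After normalization each value expresses how many baseline standard deviations the source activity lies from the baseline mean, which makes subjects comparable before group averaging.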
1.7.7. Regions of interest
The largest continuous clusters from each of the linear regression analyses (section 1.7.4) were used to define anatomical regions of interest (ROIs) for assessing oscillatory activity. MNI coordinates for the centroid of each anatomical subregion for the two clusters, one corresponding to the lateral cortex and one to the dorsal parietal cortex, are as follows. The right dorsal parietal cortex cluster was made up of the superior parietal lobule [30, -51, 69], inferior parietal lobule [59, -50, 47], and dorsal postcentral gyrus [26, -33, 75]. The right lateral cortex cluster was made up of the supramarginal gyrus [66, -28, 32], ventral postcentral gyrus [68, -12, 24], and ventral precentral gyrus [64, 9, 20].
1.7.8. Significance determination and comparison of partners and epochs
Source estimated current flow was extracted from anatomical ROIs, absolute valued, and summed with the other cluster regions to form the dorsal and lateral EEG traces. Analyses were conducted on the absolute values to ensure that phases, which are arbitrarily allocated in sLORETA, did not cancel out and thus artificially suppress results. The alpha current flow envelope was calculated using the MATLAB function envelope with the type “peak” and a window of 100ms. It was then converted to log significance values reflecting change from baseline as described below, and these values were then used to determine local maxima that were significantly different between human and robot partner as well as initial and subsequent epoch.
A local maximum was defined as significantly greater than the comparable time point in the comparator if it met the following criteria: 1) the maximum was significantly greater than its own baseline (z>2.33; p<.01); and 2) the p-value of the maximum was more than an order of magnitude smaller than that of the comparable time point and all time points within 100 ms in the comparator. For example, if a maximum occurred at 100 ms with a p-value=0.002, then in order to be significantly different from the comparator according to the above metrics, all time points from 0-200 ms within the comparator would have to have a p-value>0.02.
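The two-part criterion can be expressed as a short check that follows the worked example (a maximum at p=0.002 requires comparator p-values above 0.02 within 100 ms). The function names, the (time_ms, p_value) data layout, and the one-tailed conversion from z to p are assumptions made for illustration.

```python
import math


def one_tailed_p(z):
    """Upper-tail p-value of a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def maximum_is_distinct(peak_time_ms, peak_p, comparator,
                        window_ms=100, alpha=0.01, factor=10.0):
    """Apply both criteria to one local maximum.

    comparator: list of (time_ms, p_value) pairs from the comparison trace.
    """
    # Criterion 1: the maximum must differ from its own baseline (p < .01).
    if peak_p >= alpha:
        return False
    # Criterion 2: every comparator point within the window must have a
    # p-value more than an order of magnitude above that of the maximum.
    nearby = [p for t, p in comparator if abs(t - peak_time_ms) <= window_ms]
    return all(p > factor * peak_p for p in nearby)


# Worked example from the text: maximum at 100 ms with p = 0.002.
comparator_clear = [(t, 0.03) for t in range(0, 201, 50)]
comparator_close = [(0, 0.03), (100, 0.015), (200, 0.03)]
distinct = maximum_is_distinct(100, 0.002, comparator_clear)
not_distinct = maximum_is_distinct(100, 0.002, comparator_close)
```

The first comparator passes because all of its points from 0 to 200 ms exceed 0.02, while the second fails because one point (p=0.015) falls below that threshold.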
Differences were calculated between the initial epoch (Figure 1, grey inset) of human and robot face viewing in order to test the hypothesis that alpha activity was significantly greater during human than robot face viewing. Comparisons were also conducted between the initial and subsequent epoch of human face viewing to assess exploratory questions about repetition suppression. If a local maximum is significantly greater during the initial than at the comparable timepoint in the subsequent epoch, then we can conclude that the signal in that region may be sensitive to repetition. Local maxima that are significantly greater than baseline and comparable across both epochs would suggest the region is not susceptible to repetition. Such a result would also give greater evidence of real-face specificity as it is a built-in replication of results.
Created:
2025-07-09



