
Dual-Modality Dataset of English-Chinese Simultaneous Interpreting at the 80th UN Anniversary High-Level Plenary: Speech Features and Text Quality

Source: DataCite Commons. Updated: 2026-02-12. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=0ed11cd14c7342a1885fc5773e55f5f1
Description:
This dataset is derived from the authentic English-Chinese simultaneous interpreting (SI) corpus of the High-level Plenary Meeting Commemorating the 80th Anniversary of the United Nations. It aims to address the unimodal bias in existing diplomatic SI assessment by providing integrated dual-modality data that pairs speech features with text quality. The dataset includes 46 minutes and 30 seconds of valid audio (44.1 kHz, 16-bit, .wav format) with aligned text, produced by three professional UN interpreters and segmented into 90 analysis fragments. Speech data were preprocessed in Audacity (SNR ≥ 40 dB) and annotated in Praat, covering 6 core speech features (e.g., disfluency rate, intonation alignment, breath control) with inter-annotator consistency ≥ 95%. Text data were transcribed (Chinese accuracy ≥ 95%) and evaluated by expert raters across 5 quality dimensions (information integrity, linguistic accuracy, cultural-affective equivalence, pragmatic appropriateness, textual fluency), with inter-annotator reliability of Cohen's κ = 0.89. Supplementary files include source-language feature annotations (terminology density, syntactic complexity), interpreter strategy classifications (efficiency-, accuracy-, and stability-oriented), and statistical analysis protocols. The dataset supports research on speech-text interaction mechanisms, diplomatic SI quality assessment, and interpreter training, with high ecological and construct validity (KMO = 0.823).
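For readers unfamiliar with the reliability statistic quoted above, the following is a minimal sketch of how Cohen's κ is computed for two raters who label the same items. It is not the dataset authors' analysis script: the function name `cohens_kappa` and the toy scores are hypothetical, and the sketch assumes exactly two raters and unweighted (categorical) agreement, neither of which the description confirms.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy scores for illustration only; NOT taken from the dataset.
toy_a = [3, 4, 4, 5, 2, 3, 4, 5]
toy_b = [3, 4, 4, 5, 2, 4, 4, 5]
kappa = cohens_kappa(toy_a, toy_b)
```

κ corrects raw percentage agreement for the agreement expected by chance, which is why it is reported alongside (rather than instead of) the ≥ 95% consistency figures. For ordinal rating scales, a weighted variant may be more appropriate; the description does not say which was used.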
Provider:
Science Data Bank
Created:
2026-02-12