Dual-Modality Dataset of English-Chinese Simultaneous Interpreting at the 80th UN Anniversary High-Level Plenary: Speech Features and Text Quality

Name: Dual-Modality Dataset of English-Chinese Simultaneous Interpreting at the 80th UN Anniversary High-Level Plenary: Speech Features and Text Quality
Creator: Science Data Bank
Published: 2026-02-12 04:42:31
License: No description available

Source: DataCite Commons; updated 2026-02-12; indexed 2026-05-05

Download link:

https://www.scidb.cn/detail?dataSetId=0ed11cd14c7342a1885fc5773e55f5f1

Description:

This dataset is derived from an authentic English-Chinese simultaneous interpreting (SI) corpus of the High-Level Plenary Meeting Commemorating the 80th Anniversary of the United Nations. It aims to address the unimodal bias in existing assessments of diplomatic SI by providing integrated dual-modality data that combine speech features with text quality.

The dataset contains 46 minutes 30 seconds of valid audio (44.1 kHz, 16-bit, .wav format) with aligned text, segmented into 90 analysis fragments produced by three professional UN interpreters. The speech data were preprocessed in Audacity (SNR ≥ 40 dB) and annotated in Praat for six core speech features (e.g., disfluency rate, intonation alignment, breath control), with inter-annotator agreement ≥ 95%. The text data were transcribed (Chinese transcription accuracy ≥ 95%) and scored by expert raters on five quality dimensions (information integrity, linguistic accuracy, cultural-affective equivalence, pragmatic appropriateness, textual fluency), with inter-rater reliability of Cohen's κ = 0.89.

Supplementary files include source-language feature annotations (terminology density, syntactic complexity), classifications of interpreter strategies (efficiency-, accuracy-, and stability-oriented), and statistical analysis protocols. The dataset supports research on speech-text interaction mechanisms, diplomatic SI quality assessment, and interpreter training, with high ecological and construct validity (Kaiser-Meyer-Olkin measure of sampling adequacy, KMO = 0.823).
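For readers who want to reproduce an inter-rater figure like the reported Cohen's κ on their own ratings, the following is a minimal sketch of plain (unweighted) Cohen's κ for two raters: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement from each rater's label frequencies. The description does not say which κ variant was used; ordinal quality scores are often assessed with a weighted κ instead, so treat this as an illustration only. The function name `cohens_kappa` and the example ratings are hypothetical and not taken from the dataset.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1.0:
        # Degenerate case: both raters used one identical label everywhere;
        # kappa is undefined, and 1.0 is returned here by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 1-4 scores from two raters on eight fragments (not dataset values).
scores_a = [1, 2, 3, 3, 2, 1, 4, 4]
scores_b = [1, 2, 3, 2, 2, 1, 4, 3]
print(round(cohens_kappa(scores_a, scores_b), 4))  # 0.6667
```

Here the raters agree on 6 of 8 items (p_o = 0.75) against a chance level of p_e = 0.25, giving κ = 0.5 / 0.75 ≈ 0.667.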
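Before analysis, it can be useful to confirm that downloaded audio files match the stated 44.1 kHz/16-bit format. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM .wav headers; it assumes the files are plain PCM (not compressed), which the description implies but does not state. The helper names `check_wav_format` and `make_silent_wav` are hypothetical, and the in-memory silent file stands in for a real dataset file.

```python
import io
import wave

EXPECTED_RATE = 44_100   # 44.1 kHz, as stated in the dataset description
EXPECTED_SAMPWIDTH = 2   # 16-bit PCM = 2 bytes per sample


def check_wav_format(source) -> dict:
    """Read a PCM .wav header (path or file object) and compare it with 44.1 kHz/16-bit."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": rate,
        "bit_depth": width * 8,
        "channels": channels,
        "duration_s": frames / rate,
        "matches_spec": rate == EXPECTED_RATE and width == EXPECTED_SAMPWIDTH,
    }


def make_silent_wav(seconds: float, rate: int = EXPECTED_RATE) -> io.BytesIO:
    """Build a mono 16-bit silent .wav in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(EXPECTED_SAMPWIDTH)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back from the start
    return buf


print(check_wav_format(make_silent_wav(1.0)))
```

Summing `duration_s` over all files is one way to cross-check the stated total of 46 minutes 30 seconds (2,790 s), assuming the distributed files contain only the valid audio.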

Provider:

Science Data Bank

Created:

2026-02-12