Dataset for "Capturing Formality in Speech Across Domains and Languages"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13298509
Description:
We share the data used in our paper "Capturing Formality in Speech Across Domains and Languages", previously hosted on Google Drive. Corpora available in this dataset release include:
All-India Radio (Hindi) [all_india_radio-20240812T152206Z-001.zip]
Bangor Miami (Spanish-English) [bangor_miami_clean-20240812T152204Z-001.zip]
CallHome (English; Spanish) [callhome-20240812T152201Z-001.zip; callhome-20240812T152201Z-002.zip]
CALLFriend (Hindi) [cf_hindi-20240812T152159Z-001.zip]
HUB4-SE (Spanish) [hub4_se-20240812T152156Z-001.zip]
HUB5 (Mandarin) [hub5_transcript-20240812T152154Z-001.zip]
Multilingual TEDx (English; Spanish) [mtedx_es-en-20240812T152152Z-001.zip through mtedx_es-en-20240812T152152Z-019.zip; 19 parts]
Multitarget TED (English; Mandarin) [multitarget-ted-20240812T152149Z-001.zip]
IIT-B (Hindi) [parallel-n-20240812T042826Z-001.zip]
TDT4 (English; Mandarin) [tdt4_multilingual_news-20240812T152131Z-001.zip]
TED Talks India (Hindi) [ted_talks_hindi-20240812T042346Z-001.zip]
UN (Mandarin) [UNv1.0.en-zh-002.en.zip; UN-20240812T040956Z-002.zip; UN-20240812T040956Z-003.zip]
YouTube (English; Spanish; Hindi; Mandarin) [youtube-20240812T040730Z-001.zip; youtube-20240812T040730Z-002.zip; youtube-20240812T040730Z-003.zip]
All-CS (Hindi-English) [All-CS.json]
Europarl v7 (Spanish) [europarl-v7.es-en.es]
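Most archives above appear to follow Google Drive's folder-export naming, <corpus>-<timestamp>-<NNN>.zip. We assume, without confirmation from this release, that each numbered part is a self-contained zip and not one volume of a multi-volume archive. Under that assumption, the minimal Python sketch below groups the downloaded files by corpus prefix and extracts every part of a corpus into a single folder. The helper names `group_parts` and `extract_groups` are hypothetical. Files that do not match the pattern, such as UNv1.0.en-zh-002.en.zip, form their own group. Files that are not zips, such as All-CS.json and europarl-v7.es-en.es, are skipped during extraction.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern (Google Drive export style): <corpus>-<YYYYMMDDTHHMMSSZ>-<NNN>.zip
PART_RE = re.compile(r"^(?P<corpus>.+)-\d{8}T\d{6}Z-(?P<part>\d{3})\.zip$")


def group_parts(filenames):
    """Group file names by corpus prefix; names that don't match the pattern stand alone."""
    groups = defaultdict(list)
    for name in filenames:
        m = PART_RE.match(name)
        key = m.group("corpus") if m else name
        groups[key].append(name)
    # Sort so parts are listed in numeric order (001, 002, ...)
    return {k: sorted(v) for k, v in groups.items()}


def extract_groups(groups, src_dir, dest_dir):
    """Extract all zip parts of each corpus into dest_dir/<corpus>/.

    Assumes every part is a standalone .zip (not a multi-volume split archive).
    Non-zip files (e.g. All-CS.json) are skipped.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    for corpus, parts in groups.items():
        zips = [p for p in parts if p.endswith(".zip")]
        if not zips:
            continue
        out = dest / corpus
        out.mkdir(parents=True, exist_ok=True)
        for part in zips:
            with zipfile.ZipFile(src / part) as zf:
                zf.extractall(out)


if __name__ == "__main__":
    names = [p.name for p in Path(".").iterdir() if p.is_file()]
    for corpus, parts in group_parts(names).items():
        print(corpus, len(parts))
```

Grouping on the prefix before the timestamp means corpora with hyphens in their names, such as mtedx_es-en or parallel-n, stay intact. The regex only splits off the final timestamp and part number.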
If using our YouTube and/or TED Talks India corpora, please cite our paper:
Bhattacharya, D., Chi, J., Hirschberg, J., Bell, P. (2023) Capturing Formality in Speech Across Domains and Languages. Proc. INTERSPEECH 2023, 1030-1034, doi: 10.21437/Interspeech.2023-1852
Created: 2024-08-12