
Dataset for "Capturing Formality in Speech Across Domains and Languages"

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/13298509
Description:
We share the data used in our paper "Capturing Formality in Speech Across Domains and Languages", previously hosted on Google Drive.

Corpora available in this dataset release include:

All-India Radio (Hindi) [all_india_radio-20240812T152206Z-001.zip]
Bangor Miami (Spanish-English) [bangor_miami_clean-20240812T152204Z-001.zip]
CallHome (English; Spanish) [callhome-20240812T152201Z-001.zip; callhome-20240812T152201Z-002.zip]
CALLFriend (Hindi) [cf_hindi-20240812T152159Z-001.zip]
HUB4-SE (Spanish) [hub4_se-20240812T152156Z-001.zip]
HUB5 (Mandarin) [hub5_transcript-20240812T152154Z-001.zip]
Multilingual TEDx (English; Spanish) [mtedx_es-en-20240812T152152Z-001.zip, mtedx_es-en-20240812T152152Z-002.zip, mtedx_es-en-20240812T152152Z-003.zip, mtedx_es-en-20240812T152152Z-004.zip, mtedx_es-en-20240812T152152Z-005.zip, mtedx_es-en-20240812T152152Z-006.zip, mtedx_es-en-20240812T152152Z-007.zip, mtedx_es-en-20240812T152152Z-008.zip, mtedx_es-en-20240812T152152Z-009.zip, mtedx_es-en-20240812T152152Z-010.zip, mtedx_es-en-20240812T152152Z-011.zip, mtedx_es-en-20240812T152152Z-012.zip, mtedx_es-en-20240812T152152Z-013.zip, mtedx_es-en-20240812T152152Z-014.zip, mtedx_es-en-20240812T152152Z-015.zip, mtedx_es-en-20240812T152152Z-016.zip, mtedx_es-en-20240812T152152Z-017.zip, mtedx_es-en-20240812T152152Z-018.zip, mtedx_es-en-20240812T152152Z-019.zip]
Multitarget TED (English; Mandarin) [multitarget-ted-20240812T152149Z-001.zip]
IIT-B (Hindi) [parallel-n-20240812T042826Z-001.zip]
TDT4 (English; Mandarin) [tdt4_multilingual_news-20240812T152131Z-001.zip]
TED Talks India (Hindi) [ted_talks_hindi-20240812T042346Z-001.zip]
UN (Mandarin) [UNv1.0.en-zh-002.en.zip; UN-20240812T040956Z-002.zip; UN-20240812T040956Z-003.zip]
YouTube (English; Spanish; Hindi; Mandarin) [youtube-20240812T040730Z-001.zip; youtube-20240812T040730Z-002.zip; youtube-20240812T040730Z-003.zip]
All-CS (Hindi-English) [All-CS.json]
Europarl v7 (Spanish) [europarl-v7.es-en.es]

If using our YouTube and/or TED Talks India corpora, please cite our paper:

Bhattacharya, D., Chi, J., Hirschberg, J., Bell, P. (2023) Capturing Formality in Speech Across Domains and Languages. Proc. INTERSPEECH 2023, 1030-1034, doi: 10.21437/Interspeech.2023-1852
Created:
2024-08-12