
Fhrozen/CABankSakura

Hugging Face, updated 2022-12-03, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Fhrozen/CABankSakura
Resource description:
---
annotations_creators:
- expert-generated
language_creators:
- crowdsourced
- expert-generated
language:
- ja
license:
- cc
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- found
task_categories:
- audio-classification
- automatic-speech-recognition
task_ids:
- speaker-identification
pretty_name: banksakura
tags:
- speech-recognition
---

# CABank Japanese Sakura Corpus

- Susanne Miyata
- Department of Medical Sciences
- Aichi Shukutoku University
- smiyata@asu.aasa.ac.jp
- website: https://ca.talkbank.org/access/Sakura.html

## Important

This data set is a copy of the original one located at https://ca.talkbank.org/access/Sakura.html.

## Details

- Participants: 31
- Type of Study: xxx
- Location: Japan
- Media type: audio
- DOI: doi:10.21415/T5M90R

## Citation information

Some citation here.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

## Project Description

This corpus of 18 conversations is the product of six graduation theses on gender differences in students' group talk. Each conversation lasted between 12 and 35 minutes (avg. 25 minutes), resulting in an overall time of 7 hours and 30 minutes. 31 students (19 female, 12 male) participated in the study (Table 1).

The participants gathered in groups of 4 students, either same-sex or mixed-sex (6 conversations with a group of 4 female students, 6 with 4 male students, and 6 conversations with 2 male and 2 female students), according to age (first and third year students) and affiliation (two academic departments). In addition, the participants of each conversation came from the same small-sized class and were well acquainted.

When recruited, the participants were informed that their conversations might be transcribed and video-recorded for possible publication. Additionally, permission was asked once more after the transcription in cases where either private information had been disclosed, or a misunderstanding concerning the nature and degree of the publication of the conversations became apparent during the conversation.

The recordings took place in a small conference room at the university between or after lectures. The participants were given a card with a conversation topic to start with, but were free to deviate from it:

- topic 1: "What do you expect from an opposite sex friend?" [isee ni motomeru koto]
- topic 2: "Are you a dog lover or a cat lover?" [inuha ka nekoha ka]
- topic 3: "About part-time work" [arubaito ni tsuite]

The investigator was not present during the recording. The combination of participants, the topic, and the duration of the 18 conversations are given in Table 2.

The participants produced 15,449 utterances overall (female: 8,027 utterances, male: 7,422 utterances). All utterances were linked to video and transcribed in regular Japanese orthography and Latin script (Wakachi2002), and provided with morphological tags (JMOR04.1). Proper names were replaced by pseudonyms.

## Acknowledgements

Additional contributors: Banno, Kyoko; Konishi, Saya; Matsui, Ayumi; Matsumoto, Shiori; Oogi, Rie; Takahashi, Akane; Muraki, Kyoko.
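The headline figures in the project description can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers quoted in the text; the variable and function names are illustrative and are not part of the corpus or of any TalkBank tooling. Because the 25-minute average is a rounded value, the duration check only confirms that the stated total agrees with that rounded average.

```python
# Figures quoted in the project description (names are illustrative).
CONVERSATIONS = 18
AVG_MINUTES = 25                      # stated (rounded) average per conversation
STATED_TOTAL_MINUTES = 7 * 60 + 30    # "7 hours and 30 minutes"
UTTERANCES = {"female": 8027, "male": 7422}
STATED_TOTAL_UTTERANCES = 15449
PARTICIPANTS = {"female": 19, "male": 12}
STATED_PARTICIPANTS = 31


def summarize():
    """Compare stated totals with totals derived from their parts."""
    total_utterances = sum(UTTERANCES.values())
    return {
        # Do the per-gender utterance counts add up to the stated total?
        "utterances_match": total_utterances == STATED_TOTAL_UTTERANCES,
        # Do the per-gender participant counts add up to 31?
        "participants_match": sum(PARTICIPANTS.values()) == STATED_PARTICIPANTS,
        # Is 18 x 25 minutes equal to the stated overall duration?
        "duration_match": CONVERSATIONS * AVG_MINUTES == STATED_TOTAL_MINUTES,
        # Derived ratios, not stated in the source.
        "female_share": UTTERANCES["female"] / total_utterances,
        "utterances_per_minute": total_utterances / STATED_TOTAL_MINUTES,
    }


summary = summarize()
for name, value in summary.items():
    print(f"{name}: {value}")
```

The two ratios at the end are derived values rather than figures from the corpus documentation, so they should be read as rough orientation only.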
Provider:
Fhrozen
Summary of original information

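CABank Japanese Sakura Corpus overview

Basic dataset information

  • Language: Japanese (ja)
  • License: Creative Commons (cc)
  • Multilinguality: monolingual
  • Size: 100K<n<1M
  • Source datasets: found
  • Task categories:
    • Audio classification
    • Automatic speech recognition
  • Task ID: speaker identification
  • Pretty name: banksakura
  • Tags: speech recognition

Details

  • Number of participants: 31
  • Type of study: xxx
  • Location: Japan
  • Media type: audio
  • DOI: doi:10.21415/T5M90R

Dataset composition

  • Number of conversations: 18
  • Conversation duration: 7 hours 30 minutes in total
  • Participant distribution:
    • 19 female, 12 male
    • Grouping:
      • 6 all-female groups
      • 6 all-male groups
      • 6 mixed-gender groups
  • Conversation topics:
    • Expectations of an opposite-sex friend
    • Dog lover or cat lover
    • Part-time work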
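
The grouping above implies more conversation seats than there are participants. The following sketch works through that arithmetic, assuming the group size of four students (and the 2 male plus 2 female split for mixed groups) given in the project description. The names are illustrative, and the seats-per-person figures are derived averages, not values reported by the corpus authors.

```python
GROUP_SIZE = 4  # each conversation had four students, per the project description

# Group types with the number of conversations and the per-group gender split.
GROUPS = {
    "all_female": {"count": 6, "female": 4, "male": 0},
    "all_male": {"count": 6, "female": 0, "male": 4},
    "mixed": {"count": 6, "female": 2, "male": 2},
}
PARTICIPANTS = {"female": 19, "male": 12}


def seat_counts(groups):
    """Total number of conversation seats filled by each gender."""
    seats = {"female": 0, "male": 0}
    for group in groups.values():
        # Every group type must describe exactly GROUP_SIZE students.
        assert group["female"] + group["male"] == GROUP_SIZE
        seats["female"] += group["count"] * group["female"]
        seats["male"] += group["count"] * group["male"]
    return seats


seats = seat_counts(GROUPS)
total_conversations = sum(group["count"] for group in GROUPS.values())
# Average number of conversations each student of a given gender sat in.
seats_per_person = {sex: seats[sex] / PARTICIPANTS[sex] for sex in seats}

print(total_conversations, seats, seats_per_person)
```

With 72 seats shared among 31 students, at least some students must have taken part in more than one conversation, which matters for tasks such as speaker identification where the same voice recurs across recordings.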

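Data processing

  • Recording environment: a small conference room at the university
  • Recording time: between or after lectures
  • Transcription: regular Japanese orthography and Latin script, with morphological tags
  • Privacy protection: proper names replaced by pseudonyms

Citation information

  • In accordance with TalkBank rules, any use of this dataset must cite at least one of the references above.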
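Collected summary
Dataset introduction
Background and challenges
Background overview
CABankSakura is a Japanese speech dataset tagged for automatic speech recognition, audio classification, and speaker identification. It contains 18 recorded conversations totaling 7.5 hours, with 31 Japanese university students as participants, and was collected to study gender differences in group talk. Its metadata lists a size category of 100K<n<1M, and it is suited to speech-processing research.
The above content was collected and summarized by 遇见数据集.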