JEIDA/JCSD-Channel 0 City Names

Name: JEIDA/JCSD-Channel 0 City Names
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:37:08
License: No description available

DataCite Commons: updated 2021-07-01, indexed 2024-07-13

Download link:

https://catalog.ldc.upenn.edu/LDC96S64-1


Resource description:

<h3>Introduction</h3>
The Japanese Electronic Industry Development Association's (JEIDA) Common Speech Data Corpus (JCSD) was prepared by Jonathan Hamaker, Richard J. Duncan and Joe Picone of the Institute for Signal and Information Processing at Mississippi State University.

<h3>Data</h3>
This collection consists of high-fidelity recordings of 150 native speakers of Japanese. Each speaker produces four repetitions of 323 short prompts: city names, control words, monosyllabic words, isolated digits and strings of four digits. Each reading session was recorded with two microphones, yielding two channels that differ in audio quality for each utterance. Channel 0 (<a href="http://catalog.ldc.upenn.edu/LDC96S64" rel="nofollow">LDC96S64</a>) contains data recorded with a standard dynamic microphone, a Sanken MU-2C. Channel 1 (<a href="http://catalog.ldc.upenn.edu/LDC96S65" rel="nofollow">LDC96S65</a>), which is available separately, contains data recorded simultaneously with a condenser microphone that presumably varied from site to site.

A summary of the size and content of the corpus is given below:

<table>
<tr><td>number of speakers</td><td>150 speakers</td></tr>
<tr><td>males</td><td>75</td></tr>
<tr><td>females</td><td>75</td></tr>
<tr><td>range of speaker age</td><td>10 yrs. to 70 yrs.</td></tr>
<tr><td>number of items per speaker</td><td>323 items</td></tr>
<tr><td>isolated digits</td><td>15</td></tr>
<tr><td>four digit sequences</td><td>35</td></tr>
<tr><td>city names</td><td>100</td></tr>
<tr><td>monosyllables</td><td>110</td></tr>
<tr><td>control words (set A)</td><td>13</td></tr>
<tr><td>control words (set B)</td><td>24</td></tr>
<tr><td>control words (set C)</td><td>26</td></tr>
<tr><td>number of repetitions per item</td><td>4 repetitions</td></tr>
<tr><td>total number of utterances</td><td>193,763 utterances (per channel)</td></tr>
<tr><td>sample frequency</td><td>16 kHz</td></tr>
<tr><td>sample type</td><td>16-bit linear</td></tr>
<tr><td>number of microphones</td><td>2 (dynamic and condenser)</td></tr>
</table>

For purposes of publication by the LDC, the corpus was originally organized onto 40 CD-ROMs. The data files were partitioned primarily by channel (20 CD-ROMs each for channel 0 and channel 1) and secondarily by category of prompts. These prompts include:

<table>
<tr><th>Description</th><th>Number of items</th></tr>
<tr><td>Control Words: Banking Services</td><td>13</td></tr>
<tr><td>Control Words: Word Processors</td><td>24</td></tr>
<tr><td>Control Words: Home Electronic Equipment</td><td>26</td></tr>
<tr><td>Digits: Isolated Digits</td><td>15</td></tr>
<tr><td>Digits: Four Digit Sequences</td><td>35</td></tr>
<tr><td>City Names (a phonetically-rich subset of common Japanese city names)</td><td>100</td></tr>
<tr><td>Monosyllables (all Japanese monosyllables plus several used to pronounce foreign words)</td><td>110</td></tr>
</table>

JEIDA/JCSD-Channel 0 and JEIDA/JCSD-Channel 1 can each be ordered as complete sets. Components of the corpus can also be purchased as outlined below:

<table>
<tr><th>Price</th><th>Set-of</th><th>Description</th><th>Catalog ID</th></tr>
<tr><td>2000</td><td>5</td><td>JEIDA/JCSD-Channel 0 (Complete)</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S64" rel="nofollow">LDC96S64</a></td></tr>
<tr><td>600</td><td>1</td><td>JEIDA/JCSD-Channel 0 City Names</td><td>LDC96S64-1</td></tr>
<tr><td>400</td><td>1</td><td>JEIDA/JCSD-Channel 0 Control Words</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S64-2" rel="nofollow">LDC96S64-2</a></td></tr>
<tr><td>100</td><td>1</td><td>JEIDA/JCSD-Channel 0 Isolated Digits</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S64-3" rel="nofollow">LDC96S64-3</a></td></tr>
<tr><td>300</td><td>1</td><td>JEIDA/JCSD-Channel 0 Four Digit Seq.</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S64-4" rel="nofollow">LDC96S64-4</a></td></tr>
<tr><td>600</td><td>1</td><td>JEIDA/JCSD-Channel 0 Monosyllables</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S64-5" rel="nofollow">LDC96S64-5</a></td></tr>
<tr><td>2000</td><td>20</td><td>JEIDA/JCSD-Channel 1 (Complete)</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S65" rel="nofollow">LDC96S65</a></td></tr>
<tr><td>600</td><td>6</td><td>JEIDA/JCSD-Channel 1 City Names</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S65-1" rel="nofollow">LDC96S65-1</a></td></tr>
<tr><td>500</td><td>4</td><td>JEIDA/JCSD-Channel 1 Control Words</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S65-2" rel="nofollow">LDC96S65-2</a></td></tr>
<tr><td>100</td><td>1</td><td>JEIDA/JCSD-Channel 1 Isolated Digits</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S65-3" rel="nofollow">LDC96S65-3</a></td></tr>
<tr><td>300</td><td>3</td><td>JEIDA/JCSD-Channel 1 Four Digit Seq.</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S65-4" rel="nofollow">LDC96S65-4</a></td></tr>
<tr><td>600</td><td>6</td><td>JEIDA/JCSD-Channel 1 Monosyllables</td><td><a href="http://catalog.ldc.upenn.edu/LDC96S65-5" rel="nofollow">LDC96S65-5</a></td></tr>
</table>

<h3>Updates</h3>
There are no updates at this time.
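
The figures in the corpus summary above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the LDC documentation: the function and constant names are illustrative, and the only inputs are numbers quoted in the description. It shows that the per-category item counts add up to the stated 323 items per speaker, and that the nominal utterance count (speakers × items × repetitions) is slightly higher than the reported per-channel total. The description does not explain that difference, so the sketch only reports it.

```python
# Figures copied from the corpus summary in the description above.
SPEAKERS = 150
REPETITIONS = 4
REPORTED_UTTERANCES = 193_763  # per channel, as stated in the summary

ITEMS_PER_CATEGORY = {
    "isolated digits": 15,
    "four digit sequences": 35,
    "city names": 100,
    "monosyllables": 110,
    "control words (set A)": 13,
    "control words (set B)": 24,
    "control words (set C)": 26,
}

SAMPLE_RATE_HZ = 16_000   # "sample frequency 16 kHz"
BYTES_PER_SAMPLE = 2      # "16-bit linear"


def items_per_speaker() -> int:
    """Sum of the per-category prompt counts."""
    return sum(ITEMS_PER_CATEGORY.values())


def nominal_utterances() -> int:
    """Utterances per channel if every speaker read every item four times."""
    return SPEAKERS * items_per_speaker() * REPETITIONS


def nominal_city_name_utterances() -> int:
    """Same nominal count restricted to the city-name prompts."""
    return SPEAKERS * ITEMS_PER_CATEGORY["city names"] * REPETITIONS


def bytes_per_second() -> int:
    """Raw sample data rate for one channel, ignoring any file headers."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


if __name__ == "__main__":
    print("items per speaker:", items_per_speaker())
    print("nominal utterances per channel:", nominal_utterances())
    print("reported utterances per channel:", REPORTED_UTTERANCES)
    print("difference:", nominal_utterances() - REPORTED_UTTERANCES)
    print("nominal city-name utterances:", nominal_city_name_utterances())
    print("raw bytes per second of audio:", bytes_per_second())
```

The city-name figure is only a nominal upper bound for this subset (LDC96S64-1); the description gives a reported total for the whole channel, not per category.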


Provider:

Linguistic Data Consortium

Created:

2020-11-30
