marinone94/nst_sv
Source: Hugging Face. Updated: 2022-05-22. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/marinone94/nst_sv
Resource description:
"This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Swedish. In this updated version, the organization of the data have been altered to improve the usefulness of the database.
In the original version of the material, the files were organized in a specific folder structure where the folder names were meaningful. However, the file names were not meaningful, and there were also cases of files with identical names in different folders. This proved to be impractical, since users had to keep the original folder structure in order to use the data. The files have been renamed, such that the file names are unique and meaningful regardless of the folder structure. The original metadata files were in spl format. These have been converted to JSON format. The converted metadata files are also anonymized and the text encoding has been converted from ANSI to UTF-8.
See the documentation file for a full description of the data and the changes made to the database." - dataset originally available at https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-56/
In 🤗 datasets, this dataset will have a structure similar to common_voice. TO BE UPDATED.
Provider:
marinone94
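Summary of original information
Dataset overview
Dataset source
- Creator: Nordic Language Technology
- Purpose: development of automatic speech recognition and dictation systems for Swedish
Dataset updates
- Reorganization: the organization of the data has been changed to make the database more useful.
- Improved file naming: in the original dataset, file names were not meaningful, and files with identical names existed in different folders. The updated dataset renames the files so that each name is unique and meaningful, independent of the original folder structure.
- Metadata format conversion: the original metadata files in spl format have been converted to JSON and anonymized.
- Text encoding change: the text encoding has been converted from ANSI to UTF-8.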
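The encoding change in the last item can be illustrated with a minimal sketch. It assumes that "ANSI" refers to the Windows-1252 code page (`cp1252`), which is a common reading for Western European text but is not confirmed by the dataset description. The function name `ansi_to_utf8` and the sample sentence are hypothetical and are not part of the dataset's tooling.

```python
def ansi_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from a legacy single-byte code page and re-encode as UTF-8.

    ASSUMPTION: "ANSI" is taken to mean Windows-1252; pass another
    source_encoding if the original files used a different code page.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Hypothetical Swedish sample containing å, ä and ö, stored the way a
# legacy single-byte file would store it (one byte per character).
legacy_bytes = "Språkbanken är öppen".encode("cp1252")

# After conversion, each of å, ä and ö occupies two bytes instead of one.
utf8_bytes = ansi_to_utf8(legacy_bytes)
```

Decoding `utf8_bytes` as UTF-8 gives back the original sentence, while decoding the legacy bytes as UTF-8 would fail, which is why consumers of the updated database no longer need to know the original code page.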
Documentation
- Details: see the documentation file for a full description of the data and the changes made to the database.



