Nexdata/200475_Sentences_Chinese_Text_Normalization_Data

Name: Nexdata/200475_Sentences_Chinese_Text_Normalization_Data
Creator: Nexdata
Published: 2024-04-16 03:15:29
License: No description available

Hugging Face · Updated 2024-04-16 · Indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/Nexdata/200475_Sentences_Chinese_Text_Normalization_Data

Resource summary:

---
license: cc-by-nc-nd-4.0
---

## Description

200,475 Sentences - Chinese Text Normalization Data. The special symbols and Arabic numerals in each sentence are annotated as Chinese characters. For more details, see: https://www.nexdata.ai/dataset/1102?source=Huggingface

# Specifications

## Data content

200,475 sentences of text, transcribed in Chinese characters

## Data scale

200,475 original texts with 457,832 annotations

## Content source

Sentences extracted from various types of news, articles, novels, etc.

## Language

Chinese

## Annotation

The special symbols and Arabic numerals in each sentence are written out as Chinese characters

## Applications

TTS (text-to-speech), text normalization

# Licensing Information

Commercial License
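The card does not publish its annotation schema or file format. The sketch below is therefore only an illustration of the kind of rewrite the Annotation field describes, where Arabic numerals and symbols are replaced by Chinese characters. It is not Nexdata's guideline. The function names (`cardinal`, `read_digits`, `normalize`) and the three rules are hypothetical choices made for this example. The rules cover percentages, four-digit years, and plain integers below 10^8.

```python
import re

# Chinese numerals for the digits 0-9.
DIGITS = "零一二三四五六七八九"
# Place units inside one four-digit group: ones, tens, hundreds, thousands.
UNITS = ["", "十", "百", "千"]


def _group(n: int) -> str:
    """Read 1 <= n <= 9999 as a cardinal; a run of inner zeros becomes one 零."""
    out = []
    zero_pending = False
    for pos in range(3, -1, -1):
        d = (n // 10**pos) % 10
        if d == 0:
            # A zero only matters once a non-zero digit has been emitted.
            if out:
                zero_pending = True
        else:
            if zero_pending:
                out.append("零")
                zero_pending = False
            out.append(DIGITS[d] + UNITS[pos])
    return "".join(out)


def cardinal(n: int) -> str:
    """Read an integer 0 <= n < 10**8 as a Mandarin cardinal number."""
    if not 0 <= n < 10**8:
        raise ValueError("this sketch only handles 0 <= n < 10**8")
    if n == 0:
        return "零"
    high, low = divmod(n, 10**4)
    parts = []
    if high:
        parts.append(_group(high) + "万")
        # e.g. 10005 -> 一万零五: a gap before the low group needs a 零.
        if 0 < low < 1000:
            parts.append("零")
    if low:
        parts.append(_group(low))
    text = "".join(parts)
    # Numbers such as 15 or 120000 drop the leading 一: 十五, 十二万.
    if text.startswith("一十"):
        text = text[1:]
    return text


def read_digits(s: str, zero: str = "〇") -> str:
    """Read a digit string one digit at a time, as is common for years."""
    return "".join(zero if ch == "0" else DIGITS[int(ch)] for ch in s)


def normalize(text: str) -> str:
    """Apply a few hypothetical rewrite rules, most specific rule first."""
    # Percentages: 15% -> 百分之十五 (ASCII or full-width percent sign).
    text = re.sub(r"([0-9]+)[%％]",
                  lambda m: "百分之" + cardinal(int(m.group(1))), text)
    # Four-digit years not preceded by another digit: 2024年 -> 二〇二四年.
    text = re.sub(r"(?<![0-9])([0-9]{4})年",
                  lambda m: read_digits(m.group(1)) + "年", text)
    # Any remaining integer is read as a cardinal number.
    text = re.sub(r"[0-9]+", lambda m: cardinal(int(m.group(0))), text)
    return text


print(normalize("2024年有15%的人收入增长了120000元"))
# 二〇二四年有百分之十五的人收入增长了十二万元
```

These rules are deliberately naive. They do not handle decimals, dates, phone numbers, or units. They always use 二 rather than the colloquial 两. They also cannot tell whether a string like "2024" is a year or a quantity without context. Resolving such context-dependent cases is presumably where annotated sentences like these are useful for TTS front ends.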

Provider:

Nexdata

Original information summary