LocalDoc/AzTC
Source: Hugging Face | Updated: 2025-08-14 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/LocalDoc/AzTC
Description:
The AzTC (Azerbaijan Text Corpus) is the first version of the largest text corpus in the Azerbaijani language, containing approximately 51 million unique sentences (about 1 billion tokens). The sentences were collected from a variety of sources, including websites, news, books, Wikipedia, legislation, and scientific articles. The corpus is suited to natural language processing tasks such as text generation and text-to-text generation.
Provided by:
LocalDoc



