LocalDoc/azerbaijani_words_frequency
Source: Hugging Face. Updated: 2024-11-30. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/LocalDoc/azerbaijani_words_frequency
Description:
This dataset contains a list of Azerbaijani words and their frequencies, computed from AzTC (Azerbaijan Text Corpus), available at https://huggingface.co/datasets/LocalDoc/AzTC. AzTC includes 51 million unique sentences (approximately 1 billion tokens), collected from websites, news articles, books, Wikipedia articles, legislative documents, scientific articles, and other resources. The list records how often each Azerbaijani word occurs across the entire AzTC corpus.
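As a rough illustration of what a word-frequency list over a sentence corpus looks like, here is a minimal Python sketch. The tokenizer (runs of Unicode letters), the lowercasing step, and the helper names `az_lower` and `word_frequencies` are assumptions made for this example; the description above does not say how the dataset was actually tokenized or normalized.

```python
import re
from collections import Counter

# Assumed tokenizer: a word is a run of Unicode letters (no digits, no underscore).
# The dataset's real tokenization rules are not stated in its description.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def az_lower(text):
    """Lowercase with Azerbaijani casing: I -> ı and İ -> i (assumed normalization)."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def word_frequencies(sentences):
    """Count how often each normalized word occurs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(TOKEN_RE.findall(az_lower(sentence)))
    return counts


# Two toy sentences standing in for the corpus (not taken from AzTC).
sample = [
    "Bakı Azərbaycanın paytaxtıdır.",
    "Bakı böyük şəhərdir.",
]
freqs = word_frequencies(sample)
for word, count in freqs.most_common(3):
    print(word, count)
```

The explicit `I`/`İ` replacement is there because Python's default `str.lower()` maps `I` to `i`, which is wrong for Azerbaijani, where the lowercase of `I` is the dotless `ı`.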
Provider:
LocalDoc



