nyuuzyou/engime
Source: Hugging Face. Updated: 2024-11-06. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/nyuuzyou/engime
Description:
This dataset contains 2,677,221 pages of educational content extracted from the engime.org website, primarily in Kazakh (kk) with some Russian (ru). The content covers academic and educational materials, with a focus on technical and scientific topics. Each record has three fields: the URL of the webpage, the title of the page or article, and the main content text extracted from the page. All examples are in a single split. The dataset is released under the CC0 license, so it can be used for any purpose, including commercial projects, and can be modified and distributed freely without permission or attribution.
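As a rough illustration of the record structure described above, the following sketch validates a record and guesses its language. The field names `url`, `title`, and `text` are assumptions (the description names the fields only informally), the `Page` class and both helper functions are hypothetical, and the language check is a crude heuristic based on letters that exist in the Kazakh Cyrillic alphabet but not in the Russian one.

```python
from dataclasses import dataclass

# Letters used in Kazakh Cyrillic that are absent from the Russian alphabet.
KAZAKH_ONLY_LETTERS = set("\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456")

# Assumed field names; the dataset description only says URL, title, and text.
REQUIRED_FIELDS = ("url", "title", "text")


@dataclass
class Page:
    """One scraped page (hypothetical container, not part of the dataset)."""
    url: str
    title: str
    text: str


def parse_record(record: dict) -> Page:
    """Build a Page from a raw record, rejecting missing or non-string fields."""
    for name in REQUIRED_FIELDS:
        if not isinstance(record.get(name), str):
            raise ValueError(f"field {name!r} is missing or not a string")
    return Page(record["url"], record["title"], record["text"])


def guess_language(page: Page) -> str:
    """Return 'kk' if any Kazakh-specific letter appears, else 'ru'.

    This is a heuristic only: a short Kazakh text may happen to contain
    none of these letters and would then be labelled 'ru'.
    """
    letters = set((page.title + " " + page.text).lower())
    return "kk" if letters & KAZAKH_ONLY_LETTERS else "ru"


if __name__ == "__main__":
    # Hypothetical sample record; the URL is illustrative, not a real page.
    sample = {
        "url": "https://engime.org/example.html",
        "title": "Қазақ тілі",
        "text": "Бұл мысал мәтін.",
    }
    page = parse_record(sample)
    print(page.url, guess_language(page))
```

The same functions could be applied to records streamed with the Hugging Face `datasets` library, for example via `load_dataset("nyuuzyou/engime", split="train", streaming=True)`, though the split name `train` is an assumption here since the description states only that there is a single split.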
Provided by:
nyuuzyou



