nyuuzyou/emirsaba
Hugging Face · Updated 2024-11-05 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/nyuuzyou/emirsaba
Description:
This dataset contains 2,189,980 pages of educational content extracted from the emirsaba.org website, written primarily in Kazakh (kk) with some Russian (ru). The content consists of academic and educational materials, with a focus on technical and scientific topics. Each record has three fields: `url` (the URL of the webpage), `title` (the title of the page or article), and `text` (the main content text extracted from the page). All examples are in a single split. The dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license: it can be used for any purpose, including commercial projects, and modified and distributed without permission, with no attribution required.
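As an illustration of the record layout described above, here is a minimal Python sketch that keeps only records carrying non-empty `url`, `title`, and `text` fields. The helper names `iter_valid_records` and `load_emirsaba` are hypothetical, and the sample records are made up purely to show the schema. The loader assumes the Hugging Face `datasets` package is installed and that the single split is exposed as `train`, which the description does not confirm.

```python
from typing import Iterable, Iterator

# Field names taken from the dataset description.
EXPECTED_FIELDS = ("url", "title", "text")


def iter_valid_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only records whose three fields are non-empty strings."""
    for record in records:
        if all(
            isinstance(record.get(field), str) and record[field].strip()
            for field in EXPECTED_FIELDS
        ):
            yield record


def load_emirsaba(streaming: bool = True):
    """Load the dataset from the Hugging Face Hub (needs network access).

    Assumptions: the `datasets` package is installed and the single split
    is named "train"; check the dataset card if this fails.
    """
    from datasets import load_dataset  # imported lazily, optional dependency

    return load_dataset("nyuuzyou/emirsaba", split="train", streaming=streaming)


# Hypothetical records, invented here only to illustrate the schema.
sample = [
    {"url": "https://emirsaba.org/example.html", "title": "Example", "text": "Body text"},
    {"url": "https://emirsaba.org/empty.html", "title": "No body", "text": ""},
]
valid = list(iter_valid_records(sample))
print(len(valid))
```

With roughly 2.19 million pages, streaming is likely the more practical default, since it avoids downloading the full dataset before iterating; `iter_valid_records(load_emirsaba())` would then filter records lazily.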
Provided by:
nyuuzyou



