nyuuzyou/muhaz
Source: Hugging Face. Updated: 2024-11-06. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/nyuuzyou/muhaz
Resource description:
This dataset contains 501,323 pages of educational content extracted from the muhaz.org website, primarily in Turkish (tr) and Azerbaijani (az), with some Russian (ru). The content consists of academic and educational materials, with a focus on technical and scientific topics. Each example has three fields: `url` (the URL of the webpage), `title` (the title of the page or article), and `text` (the main content text extracted from the page). All examples are in a single split. The dataset is released under the Creative Commons Zero (CC0) license, so it can be used for any purpose, including commercial projects, and can be modified and distributed freely without permission or attribution.
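The three-field record layout described above can be sketched in Python as follows. This is a minimal illustration, not an official loader: `MuhazRecord`, `iter_nonempty`, and `load_muhaz` are hypothetical names introduced here, the split name `"train"` is an assumption (the description only says there is a single split), and the loader assumes the Hugging Face `datasets` package is installed and the Hub is reachable.

```python
from typing import Iterable, Iterator, TypedDict


class MuhazRecord(TypedDict):
    """One example, using the three field names given in the description."""
    url: str
    title: str
    text: str


def iter_nonempty(records: Iterable[dict]) -> Iterator[MuhazRecord]:
    """Yield records that carry all three string fields and a non-blank text."""
    for rec in records:
        has_fields = all(isinstance(rec.get(k), str) for k in ("url", "title", "text"))
        if has_fields and rec["text"].strip():
            yield MuhazRecord(url=rec["url"], title=rec["title"], text=rec["text"])


def load_muhaz(split: str = "train"):
    """Stream the dataset from the Hub instead of downloading it in full.

    Assumptions: the `datasets` package is installed, and the single split
    is named "train" (not confirmed by the description).
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset("nyuuzyou/muhaz", split=split, streaming=True)


if __name__ == "__main__":
    # Print the URL, title, and text length of the first three usable pages.
    for i, rec in enumerate(iter_nonempty(load_muhaz())):
        print(rec["url"], "|", rec["title"], "|", len(rec["text"]))
        if i == 2:
            break
```

Streaming is used here because the dataset holds roughly half a million pages; dropping `streaming=True` would download everything before iteration starts.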
Provider:
nyuuzyou



