NeuML/wikipedia-20260401

Hugging Face. Updated 2026-04-20. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/NeuML/wikipedia-20260401
Resource description:
---
annotations_creators:
- no-annotation
language:
- en
language_creators:
- found
license:
- cc-by-sa-3.0
- gfdl
multilinguality:
- monolingual
pretty_name: Wikipedia English April 2026
size_categories:
- 1M<n<10M
source_datasets: []
tags:
- pretraining
- language modelling
- wikipedia
- web
task_categories: []
task_ids: []
---

# Dataset Card for Wikipedia English April 2026

Dataset created using this [repo](https://huggingface.co/datasets/NeuML/wikipedia) with an [April 2026 Wikipedia snapshot](https://dumps.wikimedia.org/enwiki/20260401/).

This repo also has precomputed domain labels and a pageviews database. The pageviews database has the aggregated number of views for each page in Wikipedia. This file is built using the Wikipedia [Pageview complete dumps](https://dumps.wikimedia.org/other/pageview_complete/readme.html).
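
To illustrate what "aggregated number of views for each page" could mean in practice, here is a minimal sketch that sums per-page view counts from dump-style rows and stores them in SQLite. It is not the script used to build this dataset. The six-column row layout (wiki code, page title, page id, device type, daily total, hourly counts) is an assumption about the dump format, and the sample rows, page ids, function names, and the `pages` table schema are all hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical sample rows. Assumed column layout (not confirmed by the card):
# wiki_code page_title page_id device daily_total hourly_counts
SAMPLE_LINES = [
    "en.wikipedia Main_Page 15580374 desktop 120 A60B60",
    "en.wikipedia Main_Page 15580374 mobile-web 80 A40B40",
    "en.wikipedia Python_(programming_language) 23862 desktop 50 C50",
    "de.wikipedia Hauptseite 5248757 desktop 30 A30",
]


def aggregate_pageviews(lines, wiki="en.wikipedia"):
    """Sum the daily totals per page title for a single wiki."""
    totals = Counter()
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 6 or parts[0] != wiki:
            continue  # skip other wikis and malformed rows
        title, daily_total = parts[1], parts[4]
        if daily_total.isdigit():
            totals[title] += int(daily_total)
    return totals


def build_database(totals, path=":memory:"):
    """Store the aggregated counts in a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE pages (title TEXT PRIMARY KEY, views INTEGER)")
    conn.executemany("INSERT INTO pages VALUES (?, ?)", totals.items())
    conn.commit()
    return conn


totals = aggregate_pageviews(SAMPLE_LINES)
conn = build_database(totals)
```

With a table like this, looking up a page's popularity is a single query, for example `SELECT views FROM pages WHERE title = 'Main_Page'`. Summing across device types is one possible design choice; a real pipeline might keep them separate or aggregate over many daily files.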
Provider:
NeuML