NeuML/wikipedia-20260401
Source: Hugging Face | Updated: 2026-04-20 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/NeuML/wikipedia-20260401
Resource description:
---
annotations_creators:
- no-annotation
language:
- en
language_creators:
- found
license:
- cc-by-sa-3.0
- gfdl
multilinguality:
- monolingual
pretty_name: Wikipedia English April 2026
size_categories:
- 1M<n<10M
source_datasets: []
tags:
- pretraining
- language modelling
- wikipedia
- web
task_categories: []
task_ids: []
---
# Dataset Card for Wikipedia English April 2026
Dataset created using this [repo](https://huggingface.co/datasets/NeuML/wikipedia) with an [April 2026 Wikipedia snapshot](https://dumps.wikimedia.org/enwiki/20260401/).
This repo also has precomputed domain labels and a pageviews database. The pageviews database stores the aggregated number of views for each page in Wikipedia and is built from the Wikipedia [Pageview complete dumps](https://dumps.wikimedia.org/other/pageview_complete/readme.html).
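To illustrate what "aggregated number of views for each page" could mean in practice, here is a minimal sketch, not the repo's actual build script. It assumes each dump line is space-separated as wiki code, page title, page id, access type, daily total, hourly counts (check this against the Pageview complete readme). The function names, the `pages(title, pageviews)` schema, and the sample rows are all hypothetical and made up for illustration.

```python
import sqlite3
from collections import Counter


def aggregate_pageviews(lines, wiki="en.wikipedia"):
    """Sum the daily totals per page title for a single wiki.

    Assumed field layout (verify against the dump readme):
    wiki_code page_title page_id access_type daily_total hourly_counts
    """
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Skip malformed rows and rows that belong to other wikis
        if len(fields) < 5 or fields[0] != wiki:
            continue
        title, daily_total = fields[1], fields[4]
        if daily_total.isdigit():
            totals[title] += int(daily_total)
    return totals


def to_database(totals):
    """Store the totals in an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT PRIMARY KEY, pageviews INTEGER)")
    conn.executemany("INSERT INTO pages VALUES (?, ?)", totals.items())
    conn.commit()
    return conn


# Illustrative rows only; these are not real pageview figures.
sample = [
    "en.wikipedia Alan_Turing 1208 desktop 120 A60B60",
    "en.wikipedia Alan_Turing 1208 mobile-web 80 A80",
    "en.wikipedia Ada_Lovelace 974 desktop 50 C50",
    "de.wikipedia Alan_Turing 1234 desktop 999 A999",
]

totals = aggregate_pageviews(sample)
conn = to_database(totals)
rows = conn.execute(
    "SELECT title, pageviews FROM pages ORDER BY pageviews DESC"
).fetchall()
print(rows)  # [('Alan_Turing', 200), ('Ada_Lovelace', 50)]
```

Summing across access types (desktop, mobile web) gives one count per title, which is the shape a per-page lookup table needs. A real build would stream many dump files rather than hold a list in memory.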
Provider:
NeuML



