OzLabs/hebrew-wiki-articles
Hugging Face · updated 2026-03-14 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/OzLabs/hebrew-wiki-articles
Resource description:
---
language:
- he
license: cc-by-sa-3.0
task_categories:
- text-generation
- fill-mask
tags:
- wikipedia
- hebrew
- text
- dump
size_categories:
- 100K<n<1M
pretty_name: Hebrew Wikipedia Articles
dataset_info:
  features:
  - name: id
    dtype: int64
    description: MediaWiki page ID
  - name: title
    dtype: string
    description: Article title
  - name: text
    dtype: string
    description: Cleaned article text (templates removed, [[x|y]]→y, main ns, no redirects)
  splits:
  - name: train
    num_bytes: null
    num_examples: null
---
# Hebrew Wikipedia Articles
Hebrew Wikipedia article dump (main namespace only, redirects excluded), exported 2024-09-01.
## Data
- **Source**: [hewiki-20240901-pages-articles-multistream](https://dumps.wikimedia.org/hewiki/)
- **Schema**: `id` (int64), `title` (string), `text` (string, cleaned: templates removed, links simplified)
- **License**: CC BY-SA 3.0 (Wikipedia)
## Usage
```python
from datasets import load_dataset
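# After uploading to the Hub, replace ORG/REPO with your repo id.
# Option 1: load the Parquet file directly by URL.
ds = load_dataset("parquet", data_files="https://huggingface.co/datasets/ORG/REPO/resolve/main/data/train.parquet", split="train")
# Option 2: load by repo id; split="train" returns a Dataset rather than a DatasetDict.
ds = load_dataset("ORG/REPO", split="train")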
```
## Notes
- Only main-namespace (ns=0) articles; redirects excluded. Text is cleaned: `{{...}}` removed, `[[link|display]]`→display.
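The filtering and cleaning rules above can be illustrated with a minimal sketch. This is not the code used to build the dataset, which this card does not show; the helper names (`keep_page`, `strip_templates`, `simplify_links`, `clean_wikitext`) are hypothetical, and the handling of nested templates and of unpiped `[[link]]` links is an assumption.

```python
import re

# Hypothetical helpers: the dataset's actual cleaning code is not shown in this card.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")                   # an innermost {{...}}
PIPED_LINK_RE = re.compile(r"\[\[[^\[\]|]*\|([^\[\]]*)\]\]")  # [[link|display]]
PLAIN_LINK_RE = re.compile(r"\[\[([^\[\]|]*)\]\]")            # [[link]] (assumed rule)


def keep_page(ns, is_redirect):
    """Keep only main-namespace (ns=0) pages that are not redirects."""
    return ns == 0 and not is_redirect


def strip_templates(text):
    """Remove {{...}} templates, repeating so nested templates vanish too."""
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE_RE.sub("", text)
    return text


def simplify_links(text):
    """Replace [[link|display]] with display and [[link]] with link."""
    text = PIPED_LINK_RE.sub(r"\1", text)
    return PLAIN_LINK_RE.sub(r"\1", text)


def clean_wikitext(text):
    """Apply template removal, then link simplification."""
    return simplify_links(strip_templates(text)).strip()


sample = "{{infobox|a={{b}}}}[[Tel Aviv|the city]] is in [[Israel]]."
print(clean_wikitext(sample))
```

Real wikitext also contains tables, references, and file links that these regular expressions do not handle, so a dedicated wikitext parser is likely a better fit for production use.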
Provider:
OzLabs



