enzoescipy/wikipedia-longest-stride-chunked-500
Source: Hugging Face. Updated: 2026-03-15. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/enzoescipy/wikipedia-longest-stride-chunked-500
Resource description:
---
license:
- cc-by-sa-3.0
- gfdl
---
# Wikipedia-Longest-Stride-Chunked-500
This is a chunked version of the [Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset hosted on Hugging Face. Each record has the following structure:
```json
{
  "article_hash": "... hashes ...",
  "language": "en",
  "chunks": ["chunks1", "chunks2", "chunks3"],
  "num_chunks": 3
}
```
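
As a rough illustration of how a record with this structure might be checked, here is a minimal sketch using only the Python standard library. The helper `validate_record` and the table `EXPECTED_FIELDS` are hypothetical names introduced for this example, and the rule that `num_chunks` equals the length of `chunks` is an assumption inferred from the sample record, not something the dataset card states.

```python
import json

# Sample record mirroring the structure shown above; the values are placeholders.
SAMPLE = """
{
  "article_hash": "... hashes ...",
  "language": "en",
  "chunks": ["chunks1", "chunks2", "chunks3"],
  "num_chunks": 3
}
"""

# Field names come from the sample record; the types are inferred from it.
EXPECTED_FIELDS = {
    "article_hash": str,
    "language": str,
    "chunks": list,
    "num_chunks": int,
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    # Assumption: num_chunks is the length of the chunks list.
    if not problems and record["num_chunks"] != len(record["chunks"]):
        problems.append("num_chunks does not match len(chunks)")
    return problems


record = json.loads(SAMPLE)
print(validate_record(record))
```

Records loaded through the Hugging Face `datasets` library could be passed through the same function. The configuration and split names are not documented here, so the loading step is left out.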
The chunking strategy is implemented in `scripts/chunking.py`.
A detailed description will be provided later. Please stay tuned!
All rights are reserved to the original author.
Provided by:
enzoescipy



