joshuaDami/hsk-dataset
Source: Hugging Face. Updated: 2026-03-25. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/joshuaDami/hsk-dataset
Resource description:
---
pretty_name: HSK Dataset (CSV)
tags:
- hsk
- chinese
- csv
license: cc-by-4.0
language:
- zh
dataset_info:
  features:
  - name: level
    dtype: int32
  - name: hanzi
    dtype: string
  - name: pinyin
    dtype: string
  - name: pinyin_tone
    dtype: string
  - name: pinyin_num
    dtype: string
  - name: english
    dtype: string
  - name: pos
    dtype: string
  - name: tts_url
    dtype: string
---
# HSK Dataset (CSV)
A curated CSV export of HSK vocabulary for selected levels, produced with
the **Chinese2PDF** tooling.
- GitHub repository: [https://github.com/willfliaw/Chinese2PDF](https://github.com/willfliaw/Chinese2PDF)
- Builder script: [https://github.com/willfliaw/Chinese2PDF/blob/main/scripts/build_hsk_dataset.py](https://github.com/willfliaw/Chinese2PDF/blob/main/scripts/build_hsk_dataset.py)
This dataset is generated by running the builder (example):
```bash
python ./scripts/build_hsk_dataset.py --out ./scripts/hsk_words.csv
```
## Files
- `data/hsk_words.csv`
## Schema
| column | type | description |
|---------------|---------|----------------------------------------------|
| level | int | HSK level |
| hanzi | string | Chinese word (Hanzi) |
| pinyin | string | Pinyin (without tone marks) |
| pinyin_tone | string | Pinyin with diacritics (if available) |
| pinyin_num | string | Numeric-tone pinyin (if available) |
| english | string | English translation |
| pos | string | Part of speech (if available) |
| tts_url | string | Audio URL (if available) |
> Note: Some optional columns may be empty for certain rows depending on the source.
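
As a minimal sketch of working with this schema, the snippet below parses CSV text with the columns listed above, casts `level` to an integer, and maps empty optional fields to `None`. The sample rows, the `OPTIONAL_COLUMNS` tuple, and the `read_hsk_rows` helper are illustrative assumptions, not part of the dataset or the Chinese2PDF tooling, and the sample values are not taken from the real file.

```python
import csv
import io

# Hypothetical sample rows that follow the documented column order.
SAMPLE_CSV = """level,hanzi,pinyin,pinyin_tone,pinyin_num,english,pos,tts_url
1,你好,ni hao,nǐ hǎo,ni3 hao3,hello,,
2,因为,yin wei,yīn wèi,yin1 wei4,because,conj,
"""

# Columns the schema marks as "if available" (assumed to be empty strings when missing).
OPTIONAL_COLUMNS = ("pinyin_tone", "pinyin_num", "pos", "tts_url")


def read_hsk_rows(text):
    """Parse CSV text into dicts, casting level to int and empty optional fields to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["level"] = int(row["level"])
        for col in OPTIONAL_COLUMNS:
            if not row[col]:
                row[col] = None
        rows.append(row)
    return rows


rows = read_hsk_rows(SAMPLE_CSV)
level_one = [r["hanzi"] for r in rows if r["level"] == 1]
print(level_one)
```

To apply the same helper to a local copy of `data/hsk_words.csv`, read the file with `encoding="utf-8"` and pass its text to `read_hsk_rows`.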
## Quickstart (🤗 Datasets)
```python
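from datasets import load_dataset

# Load the CSV straight from its URL; a single CSV file becomes the "train" split.
ds = load_dataset(
    "csv",
    data_files="https://huggingface.co/datasets/willfliaw/hsk-dataset/resolve/main/data/hsk_words.csv",
)

print(ds)              # split names, column names, and row counts
print(ds["train"][0])  # first row as a dict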
```
## License
This dataset is published under **cc-by-4.0**. Verify original source terms if you plan to redistribute or use commercially.
---
*Generated with ❤️ by Chinese2PDF helpers.*
Provider:
joshuaDami



