
joshuaDami/hsk-dataset

Source: Hugging Face. Updated 2026-03-25. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/joshuaDami/hsk-dataset
Resource description:
---
pretty_name: HSK Dataset (CSV)
tags:
- hsk
- chinese
- csv
license: cc-by-4.0
language:
- zh
dataset_info:
  features:
  - name: level
    dtype: int32
  - name: hanzi
    dtype: string
  - name: pinyin
    dtype: string
  - name: pinyin_tone
    dtype: string
  - name: pinyin_num
    dtype: string
  - name: english
    dtype: string
  - name: pos
    dtype: string
  - name: tts_url
    dtype: string
---

# HSK Dataset (CSV)

A curated CSV export of HSK vocabulary (here: selected levels) produced with the **Chinese2PDF** tooling.

- GitHub repository: [https://github.com/willfliaw/Chinese2PDF](https://github.com/willfliaw/Chinese2PDF)
- Builder script: [https://github.com/willfliaw/Chinese2PDF/blob/main/scripts/build_hsk_dataset.py](https://github.com/willfliaw/Chinese2PDF/blob/main/scripts/build_hsk_dataset.py)

This dataset is generated by running the builder (example):

```bash
python ./scripts/build_hsk_dataset.py --out ./scripts/hsk_words.csv
```

## Files

- `data/hsk_words.csv`

## Schema

| column      | type   | description                           |
|-------------|--------|---------------------------------------|
| level       | int    | HSK level                             |
| hanzi       | string | Chinese word (Hanzi)                  |
| pinyin      | string | Pinyin (without tone marks)           |
| pinyin_tone | string | Pinyin with diacritics (if available) |
| pinyin_num  | string | Numeric-tone pinyin (if available)    |
| english     | string | English translation                   |
| pos         | string | Part of speech (if available)         |
| tts_url     | string | Audio URL (if available)              |

> Note: Some optional columns may be empty for certain rows depending on the source.

## Quickstart (🤗 Datasets)

```python
from datasets import load_dataset

ds = load_dataset(
    "csv",
    data_files="https://huggingface.co/datasets/willfliaw/hsk-dataset/resolve/main/data/hsk_words.csv",
)
print(ds)
print(ds["train"][0])
```

## License

This dataset is published under **cc-by-4.0**. Verify original source terms if you plan to redistribute or use commercially.

---

*Generated with ❤️ by Chinese2PDF helpers.*
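## Example: reading rows offline (sketch)

Since the schema marks several columns as optional, a reader may want to normalize empty cells before use. The following is a minimal sketch using only the Python standard library. The sample rows, the `OPTIONAL_COLUMNS` tuple, and the helpers `load_rows` and `group_by_level` are illustrative assumptions, not part of the dataset or the Chinese2PDF tooling; the real `data/hsk_words.csv` may differ in content and formatting.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows that follow the documented column order.
# They are NOT copied from the real hsk_words.csv.
SAMPLE_CSV = """level,hanzi,pinyin,pinyin_tone,pinyin_num,english,pos,tts_url
1,你好,ni hao,nǐ hǎo,ni3 hao3,hello,,
1,谢谢,xie xie,xiè xie,xie4 xie5,thank you,verb,
2,但是,dan shi,dàn shì,dan4 shi4,but,conjunction,
"""

# Columns the schema describes as "(if available)".
OPTIONAL_COLUMNS = ("pinyin_tone", "pinyin_num", "pos", "tts_url")


def load_rows(text):
    """Parse CSV text into dicts; cast level to int, map empty optional cells to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["level"] = int(row["level"])
        for col in OPTIONAL_COLUMNS:
            if not row[col]:
                row[col] = None
        rows.append(row)
    return rows


def group_by_level(rows):
    """Return a dict mapping each HSK level to the list of hanzi at that level."""
    by_level = defaultdict(list)
    for row in rows:
        by_level[row["level"]].append(row["hanzi"])
    return dict(by_level)


rows = load_rows(SAMPLE_CSV)
words = group_by_level(rows)
print(words)
```

To apply the same helpers to a downloaded copy of the file, read it with `open(path, encoding="utf-8", newline="")` and pass the contents to `load_rows`.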
Provider:
joshuaDami