infinite-dataset-hub/Hsk2Corpus
Source: Hugging Face. Updated: 2024-08-27. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/infinite-dataset-hub/Hsk2Corpus
Resource description:
---
license: mit
tags:
- infinite-dataset-hub
- synthetic
---
# Hsk2Corpus
tags: Chinese Language Modeling, Machine Translation, Text Classification
_Note: This is an AI-generated dataset, so its content may be inaccurate or false._
**Dataset Description:** The 'Hsk2Corpus' dataset is curated for researchers and practitioners in the field of Chinese language processing, specifically focusing on High-Frequency Chinese Vocabulary (Hsk2) for advanced learners and professional use. It comprises texts that have been collected and annotated to aid in machine learning tasks such as Chinese Language Modeling, Machine Translation, and Text Classification. The dataset aims to support the development of AI systems that can handle Hsk2 vocabulary efficiently, facilitating more accurate and natural language processing in Chinese.
**CSV Content Preview:**
```csv
TextID,Text,Label
1,"我很高兴能为您提供这些援引。","Review"
2,"遇到难题时,始终保持积极态度。","Educational"
3,"高考即将到来,请确保你已经准备好。","Preparation"
4,"我尊敬的张三在这里,感谢你的建议。","Polite Request"
5,"如果你感到困惑,我可以帮助。","Offer of Assistance"
```
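As a rough illustration of how the preview above could be consumed for the text classification use case, the following sketch parses the five preview rows with Python's standard `csv` module and tallies the labels. The rows are copied from the preview. The names `PREVIEW`, `load_rows`, and `label_counts` are hypothetical helpers introduced here, not part of the dataset or its tooling, and the full dataset may contain rows and labels not shown in the preview.

```python
import csv
import io
from collections import Counter

# The five preview rows from the dataset card, copied verbatim.
# Assumption: the full CSV uses the same three-column header.
PREVIEW = '''TextID,Text,Label
1,"我很高兴能为您提供这些援引。","Review"
2,"遇到难题时,始终保持积极态度。","Educational"
3,"高考即将到来,请确保你已经准备好。","Preparation"
4,"我尊敬的张三在这里,感谢你的建议。","Polite Request"
5,"如果你感到困惑,我可以帮助。","Offer of Assistance"
'''


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(PREVIEW)

# Count how often each label occurs; in the preview every label is unique.
label_counts = Counter(row["Label"] for row in rows)

for row in rows:
    print(row["TextID"], row["Label"], row["Text"])
```

Because the `Text` fields are quoted, commas inside a sentence do not split the field, so `csv.DictReader` is a safer choice here than splitting each line on `,`.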
**Source of the data:**
The dataset was generated using the [Infinite Dataset Hub](https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub) and microsoft/Phi-3-mini-4k-instruct using the query 'Hsk2 chinese ':
- **Dataset Generation Page**: https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub?q=Hsk2+chinese+&dataset=Hsk2Corpus&tags=Chinese+Language+Modeling,+Machine+Translation,+Text+Classification
- **Model**: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
- **More Datasets**: https://huggingface.co/datasets?other=infinite-dataset-hub
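The generation page URL above encodes the query, dataset name, and tags as query-string parameters. A minimal sketch of recovering them with Python's standard `urllib.parse` follows. The helper name `generation_params` is hypothetical, and the assumption that the Space interprets these parameters exactly this way is not confirmed by the card.

```python
from urllib.parse import parse_qs, urlsplit

# Generation page URL, copied from the dataset card.
GENERATION_PAGE = (
    "https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub"
    "?q=Hsk2+chinese+&dataset=Hsk2Corpus"
    "&tags=Chinese+Language+Modeling,+Machine+Translation,+Text+Classification"
)


def generation_params(url):
    """Return the first value of each query-string parameter in the URL."""
    query = urlsplit(url).query
    # parse_qs decodes '+' as a space and returns a list per key.
    parsed = parse_qs(query, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


params = generation_params(GENERATION_PAGE)

# The tags parameter is a single comma-separated string; split it into a list.
tags = [tag.strip() for tag in params["tags"].split(",")]

print(repr(params["q"]))
print(params["dataset"])
print(tags)
```

The decoded `q` value keeps its trailing space, which matches the query `'Hsk2 chinese '` quoted in the card.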
Provider:
infinite-dataset-hub



