
CMLI-NLP/MiTC

Source: Hugging Face. Updated: 2024-11-08. Indexed: 2025-04-19.
Download link:
https://hf-mirror.com/datasets/CMLI-NLP/MiTC
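
Because the link above points at a Hugging Face dataset repository served through the hf-mirror.com mirror, one plausible way to fetch the data is the `datasets` library. The following is a minimal sketch, not a documented loading recipe for MiTC. It assumes that `datasets` is installed, that the repository files are in a layout `load_dataset` can resolve on its own, and that the mirror honours the `HF_ENDPOINT` environment variable. The helper names `dataset_url` and `load_mitc` are hypothetical and introduced only for illustration. The page does not state split names, column names, or configurations, so none are assumed here.

```python
import os

REPO_ID = "CMLI-NLP/MiTC"                     # repository id shown on this page
OFFICIAL_ENDPOINT = "https://huggingface.co"  # official Hub host
MIRROR_ENDPOINT = "https://hf-mirror.com"     # mirror host from the download link


def dataset_url(repo_id: str, endpoint: str = OFFICIAL_ENDPOINT) -> str:
    """Build the dataset page URL for a given Hub endpoint."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_mitc(use_mirror: bool = False):
    """Hypothetical helper: download MiTC with the `datasets` library.

    Assumes `pip install datasets`, network access, and a repository layout
    that `load_dataset` can resolve automatically (not confirmed by the page).
    """
    if use_mirror:
        # HF_ENDPOINT is read when huggingface_hub is first imported,
        # so it has to be set before the import below.
        os.environ["HF_ENDPOINT"] = MIRROR_ENDPOINT
    from datasets import load_dataset  # imported lazily on purpose
    return load_dataset(REPO_ID)


if __name__ == "__main__":
    # Only builds URLs; no network access happens here.
    print(dataset_url(REPO_ID))
    print(dataset_url(REPO_ID, MIRROR_ENDPOINT))
```

Calling `load_mitc(use_mirror=True)` and printing the returned object is one way to discover the actual splits and columns. If the repository defines several configurations, `load_dataset` will ask for a configuration name, which would then have to be passed as a second argument.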
Resource description:
# MiTC

## Introduction

The [MiLMo](https://github.com/CMLI-NLP/MiLMo) project constructs a minority multilingual text classification dataset named MiTC, which covers five languages: Mongolian, Tibetan, Uyghur, Kazakh and Korean. We also use [MiLMo](https://github.com/CMLI-NLP/MiLMo) for the downstream text classification experiment on MiTC.

## Hugging Face

https://huggingface.co/datasets/CMLI-NLP/MiTC

## Citation

Plain Text:

J. Deng, H. Shi, X. Yu, W. Bao, Y. Sun and X. Zhao, "MiLMo:Minority Multilingual Pre-Trained Language Model," 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, Oahu, HI, USA, 2023, pp. 329-334, doi: 10.1109/SMC53992.2023.10393961.

BibTeX:

```
@INPROCEEDINGS{10393961,
  author={Deng, Junjie and Shi, Hanru and Yu, Xinhe and Bao, Wugedele and Sun, Yuan and Zhao, Xiaobing},
  booktitle={2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC)},
  title={MiLMo:Minority Multilingual Pre-Trained Language Model},
  year={2023},
  volume={},
  number={},
  pages={329-334},
  keywords={Soft sensors;Text categorization;Social sciences;Government;Data acquisition;Morphology;Data models;Multilingual;Pre-trained language model;Datasets;Word2vec},
  doi={10.1109/SMC53992.2023.10393961}}
```

## Disclaimer

This dataset/model is for academic research purposes only. Use for any commercial or unethical purpose is prohibited.

Provider: CMLI-NLP