CMLI-NLP/MiTC

Name: CMLI-NLP/MiTC
Creator: CMLI-NLP
Published: 2024-11-08 07:55:09
License: not specified

Source: Hugging Face. Updated: 2024-11-08. Indexed: 2025-04-19.

Download link:

https://hf-mirror.com/datasets/CMLI-NLP/MiTC

Description:

# MiTC

## Introduction

[MiLMo](https://github.com/CMLI-NLP/MiLMo) constructs a minority multilingual text classification dataset named MiTC, which covers five languages: Mongolian, Tibetan, Uyghur, Kazakh and Korean. We also use [MiLMo](https://github.com/CMLI-NLP/MiLMo) for the downstream text classification experiments on MiTC.

## Hugging Face

https://huggingface.co/datasets/CMLI-NLP/MiTC

## Citation

Plain Text:

J. Deng, H. Shi, X. Yu, W. Bao, Y. Sun and X. Zhao, "MiLMo: Minority Multilingual Pre-Trained Language Model," 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, Oahu, HI, USA, 2023, pp. 329-334, doi: 10.1109/SMC53992.2023.10393961.

BibTeX:

```
@INPROCEEDINGS{10393961,
  author={Deng, Junjie and Shi, Hanru and Yu, Xinhe and Bao, Wugedele and Sun, Yuan and Zhao, Xiaobing},
  booktitle={2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC)},
  title={MiLMo:Minority Multilingual Pre-Trained Language Model},
  year={2023},
  volume={},
  number={},
  pages={329-334},
  keywords={Soft sensors;Text categorization;Social sciences;Government;Data acquisition;Morphology;Data models;Multilingual;Pre-trained language model;Datasets;Word2vec},
  doi={10.1109/SMC53992.2023.10393961}}
```

## Disclaimer

This dataset/model is for academic research purposes only. Use for any commercial or unethical purpose is prohibited.
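The card gives the repository location but not how to fetch it. As a minimal sketch, and not something the authors document, the snippet below builds the two page URLs listed on this page and downloads the raw repository files with `snapshot_download` from the `huggingface_hub` package. The helper names `dataset_page_url` and `download_mitc` are hypothetical, and the file layout inside the repository is not described in the card, so inspect the downloaded folder before parsing anything.

```python
from typing import Optional

# Repository id as given on the card.
MITC_REPO_ID = "CMLI-NLP/MiTC"


def dataset_page_url(repo_id: str = MITC_REPO_ID, use_mirror: bool = False) -> str:
    """Return the dataset page URL on huggingface.co or on the hf-mirror.com mirror."""
    host = "https://hf-mirror.com" if use_mirror else "https://huggingface.co"
    return f"{host}/datasets/{repo_id}"


def download_mitc(local_dir: Optional[str] = None) -> str:
    """Download the raw repository files and return the local folder path.

    Requires network access and `pip install huggingface_hub`.
    The layout of the downloaded files is an unknown here (assumption:
    the repository is public and needs no access token).
    """
    # Imported lazily so the URL helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=MITC_REPO_ID,
        repo_type="dataset",  # the repo is a dataset, not a model
        local_dir=local_dir,  # None keeps files in the default cache
    )


if __name__ == "__main__":
    print(dataset_page_url())
    print(dataset_page_url(use_mirror=True))
```

Calling `download_mitc()` returns the path of the local snapshot. Downloading the raw files avoids assuming that the repository loads directly through a higher-level loader.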


Provider:

CMLI-NLP
