CMLI-NLP/MiTC
Source: Hugging Face. Updated 2024-11-08; indexed 2025-04-19.
Download link:
https://hf-mirror.com/datasets/CMLI-NLP/MiTC
# MiTC
## Introduction
The [MiLMo](https://github.com/CMLI-NLP/MiLMo) project constructs MiTC, a multilingual text classification dataset for minority languages. It covers five languages: Mongolian, Tibetan, Uyghur, Kazakh, and Korean.
We also use [MiLMo](https://github.com/CMLI-NLP/MiLMo) in the downstream text classification experiments on MiTC.
## Hugging Face
https://huggingface.co/datasets/CMLI-NLP/MiTC
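
As a minimal sketch of how the dataset might be fetched, the snippet below builds the dataset page URL for either host mentioned in this card and wraps the `load_dataset` function from the third-party `datasets` library. The helper names `dataset_url` and `load_mitc` are hypothetical, and it is an assumption that the repository files are laid out so that `load_dataset` can read them without extra arguments; if they are not, a configuration name or `data_files` argument may be required.

```python
REPO_ID = "CMLI-NLP/MiTC"  # repository id taken from the URLs in this card


def dataset_url(repo_id: str, endpoint: str = "https://huggingface.co") -> str:
    """Build the dataset page URL for a given Hub endpoint or mirror."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_mitc(**kwargs):
    """Hypothetical convenience wrapper around datasets.load_dataset.

    Assumption: the repository can be loaded directly by its id. Any keyword
    arguments (for example split="train") are passed through unchanged.
    """
    # Imported lazily so the URL helper works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, **kwargs)


# Example call (requires network access and `pip install datasets`):
# mitc = load_mitc()
# print(mitc)
```

Printing the returned object is a quick way to check which splits and columns are actually present before writing any training code.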
## Citation
Plain Text:
J. Deng, H. Shi, X. Yu, W. Bao, Y. Sun and X. Zhao, "MiLMo:Minority Multilingual Pre-Trained Language Model," 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, Oahu, HI, USA, 2023, pp. 329-334, doi: 10.1109/SMC53992.2023.10393961.
BibTeX:
```
@INPROCEEDINGS{10393961,
author={Deng, Junjie and Shi, Hanru and Yu, Xinhe and Bao, Wugedele and Sun, Yuan and Zhao, Xiaobing},
booktitle={2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC)},
title={MiLMo:Minority Multilingual Pre-Trained Language Model},
year={2023},
volume={},
number={},
pages={329-334},
keywords={Soft sensors;Text categorization;Social sciences;Government;Data acquisition;Morphology;Data models;Multilingual;Pre-trained language model;Datasets;Word2vec},
doi={10.1109/SMC53992.2023.10393961}}
```
## Disclaimer
This dataset/model is for academic research purposes only. Any commercial or unethical use is prohibited.
Provider: CMLI-NLP



