IndicDialogue Dataset
Source: Mendeley Data. Updated: 2024-06-19. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/wcb4bxbyxx
Description:
The IndicDialogue dataset contains raw subtitle SRT files and the dialogues extracted from them. The subtitles are in 10 Indic languages: Hindi, Bengali, Marathi, Telugu, Tamil, Urdu, Odia, Sindhi, Nepali, and Assamese. The dataset provides a corpus for NLP tasks in low-resource languages using small language models (SLMs) and large language models (LLMs).
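As an illustration of how dialogue text can be pulled out of a raw SRT file, here is a minimal Python sketch. It assumes the standard SubRip layout (a numeric cue index, a timestamp line, then one or more text lines, with cues separated by blank lines). The function name `extract_dialogues` is hypothetical, and this is not necessarily how the dataset authors produced their extracted dialogues.

```python
import re

# Timestamp line of a SubRip cue, e.g. "00:00:01,000 --> 00:00:03,000".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
# Simple inline markup such as <i>...</i> or <font ...>.
TAG = re.compile(r"<[^>]+>")


def extract_dialogues(srt_text):
    """Return the spoken text of each cue in an SRT string (hypothetical helper)."""
    # Strip a leading byte-order mark and normalize line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    dialogues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Drop the numeric cue index if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Skip anything that does not start with a timestamp line.
        if not lines or not TIMESTAMP.match(lines[0]):
            continue
        # Join multi-line cue text and remove inline markup.
        spoken = " ".join(TAG.sub("", ln) for ln in lines[1:]).strip()
        if spoken:
            dialogues.append(spoken)
    return dialogues


sample = (
    "1\n00:00:01,000 --> 00:00:03,000\nनमस्ते\n\n"
    "2\n00:00:04,000 --> 00:00:06,500\n<i>आप कैसे हैं?</i>\nमैं ठीक हूँ\n"
)
print(extract_dialogues(sample))
```

When reading real files, open them with an explicit encoding (for example `open(path, encoding="utf-8")`), since subtitle files in Indic scripts are not guaranteed to share one encoding.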
Created:
2024-04-10



