
Bilingual question pair

DataCite Commons | Updated 2022-11-13 | Indexed 2025-04-16
Download link:
https://ieee-dataport.org/documents/bilingual-question-pair
Description:
Although asking and answering questions on social media platforms in mixed language is very common these days, there is a lack of precise corpora for analyzing such code-mixed language. Datasets released by various community question answering (CQA) sites are monolingual, i.e., in English only. To perform our task, we needed an annotated bilingual dataset that includes question pairs in code-mixed language. In view of this scarcity, we created a dataset by scraping pairs of questions from distinct social media networks, for example Yahoo! Answers, Quora, and TripAdvisor. The collected dataset thus consists of questions from diverse fields such as education, entertainment, health, philosophy, and sports. Each pair contains one English question and one Hinglish question. The second question may or may not be equivalent to the first. A label, “Is_Duplicate”, indicates whether the two questions in a pair are semantic duplicates of each other.
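
The pair structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the description names the “Is_Duplicate” label but not the file format or the other columns, so the CSV layout, the column names "English" and "Hinglish", the 0/1 encoding of the label, and the sample rows are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class QuestionPair:
    english: str        # the English question
    hinglish: str       # the Hinglish (code-mixed Hindi-English) question
    is_duplicate: bool  # True when the two questions are semantic duplicates


def load_pairs(handle):
    """Read question pairs from a CSV file-like object.

    Assumed columns: "English", "Hinglish", "Is_Duplicate" (0 or 1).
    Only "Is_Duplicate" is named in the description; the rest is hypothetical.
    """
    reader = csv.DictReader(handle)
    return [
        QuestionPair(
            english=row["English"].strip(),
            hinglish=row["Hinglish"].strip(),
            is_duplicate=row["Is_Duplicate"].strip() == "1",
        )
        for row in reader
    ]


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """English,Hinglish,Is_Duplicate
How can I learn to cook?,Main khana banana kaise seekh sakta hoon?,1
What is the capital of India?,Cricket ke rules kya hain?,0
"""

pairs = load_pairs(io.StringIO(SAMPLE))
duplicates = sum(p.is_duplicate for p in pairs)
print(f"{len(pairs)} pairs, {duplicates} labelled duplicate")
```

With the real files, the column names and label encoding should be checked against the download before reusing this loader.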
Provider:
IEEE DataPort
Created:
2022-11-13