
Russian dataset for the reply recovery

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/xm86yszck2
Description:
This dataset is built from several Telegram chats to train a model that predicts whether one message can be a reply to another.

**Note:** messages that are actual replies are labeled **zero**.

Positive examples were acquired from Telegram's native `reply_to` markup. Negative examples were acquired by random sampling, which, surprisingly, yields a notable number of pairs that could plausibly be replies, so the negative examples are noisy.

The dataset covers several chats:

* balichat_woman - chat with women from Bali
* borussia_chat - football chat
* chat_suicidnikov - chat dedicated to the suicide game "Siniy kit"
* cotedazurchat - chat of immigrants in France
* easypeasycodechat - chat of programmers
* openwrt_ru - chat dedicated to OpenWrt
* orange_sosedi - chat of neighbors
* sling38 - chat of young moms
* terrariaphone - chat of Terraria gamers

The `test_data` was validated by crowdsourcing with Toloka.ai. The final validation was done by the authors, so it is considered a gold test set.
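The pair construction described above can be illustrated with a minimal sketch. It is not the authors' code: the message schema (dicts with `id`, `text`, `reply_to`), the function name `build_pairs`, and the label value `1` for sampled non-replies are all assumptions made for illustration. Only the label `0` for actual replies comes from the description.

```python
import random

REPLY_LABEL = 0      # from the description: actual replies are labeled zero
NON_REPLY_LABEL = 1  # assumption: the description does not name the other label


def build_pairs(messages, seed=0):
    """Build (message, candidate_reply, label) triples from one chat.

    `messages` is a hypothetical list of dicts with keys `id`, `text`,
    and `reply_to` (the id of the parent message, or None).
    """
    rng = random.Random(seed)
    by_id = {m["id"]: m for m in messages}
    pairs = []
    for msg in messages:
        parent = by_id.get(msg["reply_to"])
        if parent is None:
            continue  # not a reply, or the parent is missing from the export
        # Positive example: taken directly from the reply_to link.
        pairs.append((parent["text"], msg["text"], REPLY_LABEL))
        # Negative example: a randomly sampled other message. Nothing checks
        # whether it could plausibly answer the parent, hence the label noise.
        candidates = [
            m for m in messages if m["id"] not in (msg["id"], parent["id"])
        ]
        if candidates:
            other = rng.choice(candidates)
            pairs.append((parent["text"], other["text"], NON_REPLY_LABEL))
    return pairs
```

Because the negative candidate is drawn uniformly from the chat, it can be another genuine reply to the same parent or simply a message that fits the context, which is the source of noise the description mentions.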
Created:
2023-06-12