Russian dataset for the reply recovery
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/xm86yszck2
Description:
This dataset was constructed from several Telegram chats to train a model that predicts whether one message can be a reply to another.
**Note:** messages that are actual replies are labeled with **zero**.
The positive examples were acquired from Telegram's native `reply_to` markup. The negative examples were acquired by random sampling, which, surprisingly, yields a notable number of plausible `reply_to` combinations, making the negative examples noisy.
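The pair construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the message keys `id`, `text`, and `reply_to`, the function `build_pairs`, and the use of `1` as the non-reply label are all assumptions made for the example. Only the zero label for true replies comes from the note above.

```python
import random

REPLY_LABEL = 0      # from the dataset note: actual replies are labeled zero
NON_REPLY_LABEL = 1  # assumption: the other class is labeled one


def build_pairs(messages, seed=0):
    """Build (first_message, second_message, label) triples for one chat.

    `messages` is a list of dicts with hypothetical keys "id", "text"
    and "reply_to" (None when the message is not a reply).
    """
    rng = random.Random(seed)
    by_id = {m["id"]: m for m in messages}
    pairs = []
    for msg in messages:
        parent = by_id.get(msg["reply_to"])
        if parent is None:
            continue  # not a reply, or the parent is missing from the dump
        # Positive example: taken from the reply_to markup.
        pairs.append((parent["text"], msg["text"], REPLY_LABEL))
        # Negative example: a randomly sampled message from the same chat,
        # excluding the reply itself and its true parent.
        candidates = [
            m for m in messages if m["id"] not in (msg["id"], parent["id"])
        ]
        if candidates:
            other = rng.choice(candidates)
            pairs.append((other["text"], msg["text"], NON_REPLY_LABEL))
    return pairs
```

Nothing in this sampling step checks whether the randomly chosen message could plausibly have been replied to, which is how noisy negatives of the kind mentioned above can arise.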
There are several chats:
* balichat_woman - chat with women from Bali
* borussia_chat - football chat
* chat_suicidnikov - chat dedicated to the suicide game "Siniy kit"
* cotedazurchat - chat of immigrants in France
* easypeasycodechat - chat of programmers
* openwrt_ru - chat dedicated to OpenWrt
* orange_sosedi - chat of neighbors
* sling38 - chat of young moms
* terrariaphone - chat of Terraria gamers
The `test_data` was validated through crowdsourcing with Toloka.ai. The final validation was done by the authors, so it is considered a gold test set.
Created:
2023-06-12



