SynDy
Source: arXiv. Updated 2024-05-17; indexed 2024-08-06.
Download link:
http://arxiv.org/abs/2405.10700v1
Resource description:
SynDy is a framework for synthetic dynamic dataset generation, co-developed by the University of Connecticut and Meedan. It uses Large Language Models (LLMs) to train localized, specialized language models for addressing misinformation. The framework combines LLMs with social media queries to automatically generate distantly supervised, topic-focused datasets with fine-grained synthetic labels for misinformation mitigation tasks such as claim matching, topic clustering, and claim relationship classification. SynDy aims to reduce the cost of human-annotated data and make human-led fact-checking more efficient. It has been integrated into Meedan's chatbot hotline, which serves more than 50 organizations and 230,000 users and automatically distributes human-written fact-checks through messaging apps such as WhatsApp. SynDy will also be integrated into the Co·Insights toolkit, which helps resource-constrained organizations launch hotlines for their communities, extending its use in misinformation detection and prevention.
Provider:
University of Connecticut
Created:
2024-05-17



