Sprinklr/Articles_Denoising
Source: Hugging Face | Updated: 2025-04-25 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/Sprinklr/Articles_Denoising
Description:
The Articles Denoising dataset is designed to test the ability to refine (denoise) business articles. It consists of knowledge-base entries presented in multiple formats (clean and noisy), and the task is to identify pairs of articles that contain similar information and pairs that contain contradictory information. The dataset was generated synthetically with Gemini-2.0-flash, following a planned procedure for selecting important brand entities. It contains 1,917 articles and includes two ground-truth data frames, test_similarity and test_contradictory, which list pairs of similar and contradictory articles, respectively.
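Because the ground truth is given as lists of article pairs, a natural way to score a system is pair-level precision, recall and F1. The sketch below is illustrative only: it assumes each row of test_similarity or test_contradictory can be reduced to two article identifiers (the real column names are not stated here, so the IDs used are hypothetical) and that pair order does not matter. It is not an evaluation protocol published by the dataset authors.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: a pair is two article IDs as strings.
Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[FrozenSet[str]]:
    """Treat (a, b) and (b, a) as the same unordered pair; drop self-pairs."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def pair_scores(predicted: Iterable[Pair], ground_truth: Iterable[Pair]) -> dict:
    """Precision, recall and F1 of predicted pairs against one ground-truth list."""
    pred = normalize(predicted)
    gold = normalize(ground_truth)
    tp = len(pred & gold)  # pairs found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical IDs standing in for rows of test_similarity.
    gold = [("a1", "a2"), ("a3", "a4")]
    pred = [("a2", "a1"), ("a5", "a6")]
    print(pair_scores(pred, gold))
```

With the toy data above, the reversed pair ("a2", "a1") still matches ("a1", "a2"), so precision, recall and F1 are all 0.5. The same function can be run once against each ground-truth frame to score similarity and contradiction detection separately.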
Provided by:
Sprinklr



