FrancophonIA/OG2021

Source: Hugging Face | Updated 2025-03-30 | Added 2024-12-14
Download link:
https://hf-mirror.com/datasets/FrancophonIA/OG2021
Description:
The OG2021 corpus contains multilingual news articles reporting on events during the 2021 Tokyo Olympics. The dataset was created to evaluate news clustering algorithms. The articles were initially acquired via the EventRegistry service, clustered using an online news clustering algorithm, and finally inspected and annotated manually by a single evaluator, who used translation services to understand the content of the articles. The corpus consists of a single file, og2021.csv, which contains 10,940 news articles grouped into 1,350 clusters. Each article has the following attributes: id, title, lang, source, published_at, URL, and cluster_id. The dataset is also published with a body attribute, but under a more restrictive licence.
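Because the corpus is intended for evaluating clustering, here is a minimal Python sketch of how one might load og2021.csv and score a predicted clustering against the annotated cluster_id labels. It makes several assumptions. The file is taken to be comma-delimited, with a header row that uses the attribute names listed above, so the exact spelling (for example URL vs url) should be checked against the real file. The helpers load_articles, group_by_cluster and pairwise_f1 are hypothetical names. Pair-counting F1 is only one common clustering metric and not necessarily the one used by the dataset's creators. The inline sample rows are invented for illustration and are not taken from the corpus.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def load_articles(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse OG2021-style CSV rows into dicts keyed by header name.

    Assumption: comma-delimited file whose header uses the attribute names
    from the dataset description (id, title, lang, source, published_at,
    URL, cluster_id). Check the real og2021.csv header before relying on it.
    """
    return list(csv.DictReader(lines))


def group_by_cluster(articles: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each cluster_id to the list of article ids it contains."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for article in articles:
        clusters[article["cluster_id"]].append(article["id"])
    return dict(clusters)


def _pairs(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def pairwise_f1(gold: Dict[str, str], pred: Dict[str, str]) -> float:
    """Pair-counting F1 of a predicted clustering against gold labels.

    gold and pred map article id -> cluster label. Only ids present in both
    are scored. A pair of articles is a true positive when both clusterings
    place them in the same cluster. Returns 0.0 when there are no
    true-positive pairs (including the degenerate all-singleton case).
    """
    ids = [i for i in gold if i in pred]
    gold_sizes = Counter(gold[i] for i in ids)
    pred_sizes = Counter(pred[i] for i in ids)
    joint_sizes = Counter((gold[i], pred[i]) for i in ids)

    true_pos = sum(_pairs(n) for n in joint_sizes.values())
    gold_pairs = sum(_pairs(n) for n in gold_sizes.values())
    pred_pairs = sum(_pairs(n) for n in pred_sizes.values())
    if true_pos == 0:
        return 0.0
    precision = true_pos / pred_pairs
    recall = true_pos / gold_pairs
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample rows in the assumed og2021.csv layout (not real data).
SAMPLE = """id,title,lang,source,published_at,URL,cluster_id
1,A,eng,s1,2021-07-24,http://a.example,10
2,B,deu,s2,2021-07-24,http://b.example,10
3,C,fra,s3,2021-07-25,http://c.example,11
4,D,eng,s4,2021-07-25,http://d.example,11
"""

articles = load_articles(io.StringIO(SAMPLE))
gold = {a["id"]: a["cluster_id"] for a in articles}
# A made-up prediction that wrongly merges article 3 into the first cluster.
pred = {"1": "x", "2": "x", "3": "x", "4": "y"}
print(group_by_cluster(articles))  # {'10': ['1', '2'], '11': ['3', '4']}
print(round(pairwise_f1(gold, pred), 3))  # 0.4
```

To run it on the actual file, something like `with open("og2021.csv", newline="", encoding="utf-8") as f: articles = load_articles(f)` should work, assuming UTF-8 encoding. Pair counting is used here because it needs only cluster sizes and a contingency count, so it scales to the full 10,940 articles without materialising every pair.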
Provided by:
FrancophonIA