TextBenDS
Source: arXiv | Updated: 2021-08-12 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2108.05689v1
Description:
TextBenDS is a generic textual data benchmark for distributed systems, created by a research team at the University Politehnica of Bucharest. The dataset contains 2.5 million tweets. The benchmark aims to improve how traditional information systems process text by computing term weights dynamically, which avoids repeatedly recomputing the same weights. The data covers several dimensions, such as gender, location, and time, which are used to extract general linguistic and social-context features. TextBenDS supports the analysis of structured, semi-structured, and unstructured text data and applies to a range of text analysis and retrieval tasks, such as keyword extraction and document ranking.
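To make "computing term weights dynamically" concrete, here is a minimal sketch that computes TF-IDF weights at query time over only the tweets matching a metadata filter (for example gender or location) and returns the top-k keywords. TF-IDF as the weighting scheme, the `Tweet` record and its field names, whitespace tokenization, and summing per-tweet scores are all illustrative assumptions, not the benchmark's actual schema or scoring formulas.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record layout; the field names are illustrative,
    # not the dataset's real schema.
    text: str
    gender: str
    location: str
    date: str  # ISO date string, e.g. "2021-08-12"


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption for the sketch).
    return text.lower().split()


def top_k_keywords(tweets, k, **filters):
    """Compute TF-IDF weights on the fly for the subset of tweets whose
    attributes match `filters`, then return the k highest-scoring terms."""
    subset = [t for t in tweets
              if all(getattr(t, field) == value for field, value in filters.items())]
    n = len(subset)
    if n == 0:
        return []

    # Term counts per tweet, and document frequency within the subset only.
    docs = [Counter(tokenize(t.text)) for t in subset]
    df = Counter()
    for d in docs:
        df.update(d.keys())

    # Sum normalized TF * IDF for each term across the subset's tweets.
    scores = Counter()
    for d in docs:
        length = sum(d.values())
        for term, count in d.items():
            scores[term] += (count / length) * math.log(n / df[term])

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    sample = [
        Tweet("cheap flights to paris", "F", "FR", "2021-01-01"),
        Tweet("paris weather today", "F", "FR", "2021-01-02"),
        Tweet("football match tonight", "M", "UK", "2021-01-01"),
    ]
    print(top_k_keywords(sample, 2, gender="F"))
```

Because the weights are computed per query instead of being precomputed and stored, filtering on a different dimension (say, a location or a date) needs no separate index; the trade-off is a full pass over the matching subset on every query.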
Provided by:
University Politehnica of Bucharest
Created:
2021-08-12



