
Automatic classification of journalistic documents on the Internet

Source: DataCite Commons. Updated 2021-03-23; indexed 2024-07-25.
Download link:
https://scielo.figshare.com/articles/dataset/Automatic_classification_of_journalistic_documents_on_the_Internet1/5720005
Description:
Abstract: Online journalism is growing every day. Many news agencies, newspapers, and magazines now publish digitally on the global network. Documents published online are available to users, who rely on search engines to find them. To return documents that are relevant to a search, those documents must be indexed and classified. Because of the vast number of documents published online every day, much research has been carried out on ways to facilitate automatic document classification. The objective of the present study is to describe an experimental approach to the automatic classification of journalistic documents published on the Internet, using the Vector Space Model for document representation. The model was tested on a real journalism database, using algorithms that have been widely reported in the literature. This article also describes the metrics used to assess the performance of these algorithms and their required configurations. The results obtained show the efficiency of the method used and justify further research into ways to facilitate the automatic classification of documents.
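The abstract names the Vector Space Model but does not state the term weighting or the classifiers used. As a rough illustration only, the sketch below assumes TF-IDF weighting, cosine similarity, and a 1-nearest-neighbour rule; the function names and the four-document toy corpus are hypothetical and are not taken from the study or its database.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(text, idf):
    """Map a document to a sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, train_docs, train_labels, idf):
    """Return the label of the most similar training document (1-nearest-neighbour)."""
    query = vectorize(text, idf)
    scores = [cosine(query, vectorize(doc, idf)) for doc in train_docs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "team wins the football match",
    "striker scores a late goal",
    "parliament passes the budget bill",
    "minister announces election date",
]
train_labels = ["sports", "sports", "politics", "politics"]

idf = build_idf(train_docs)
predicted = classify("the goal decided the match", train_docs, train_labels, idf)
print(predicted)
```

Each document becomes a point in term space, so classification reduces to comparing directions of vectors; a real experiment would replace the toy tokenizer and nearest-neighbour rule with whichever preprocessing and algorithms the study actually configured.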

Provider:
SciELO journals
Created:
2017-12-20