
Article Data

Source: Figshare. Updated: 2024-03-19. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Article_Data/25436047
Description:
The package covers 14 categories and contains class-typical dictionaries, class-typical feature vectors, the DF values of those feature vectors, and word lists sorted by weight in descending order.

The accompanying paper proposes an Average Term Frequency-Document Frequency-Wavelet Analysis (ATF-DF-WA) algorithm for text classification. The algorithm exploits the fact that big-data text samples come with pre-existing category labels: it applies wavelet analysis to these samples and then classifies by computing waveform similarity. It has two stages. The first is the typical feature extraction stage. Before any classification takes place, the ATF-DF text category feature extraction algorithm, which draws on statistical principles of big data, extracts features from multi-category, large-scale text data and uses the existing category labels to derive class-specific feature vectors. These vectors are transformed into waveforms, and wavelet analysis is applied to obtain the class-typical feature layer waveform. The second is the text classification stage. The feature vector of the sample to be classified is derived from each class's typical feature vector, and wavelet analysis (WA) is applied to extract the sample's feature layer waveform. The sample is then classified by measuring the similarity between this waveform and each class's typical feature layer waveform.

According to the authors, the ATF-DF-WA algorithm benefits from both the statistical advantages of big data and the use of wavelet analysis tools. The reported experiments indicate that, compared with conventional text classification algorithms, ATF-DF-WA precisely extracts class-representative feature vectors from diverse text types. The authors attribute this precision to wavelet analysis and waveform similarity computation, which substantially improve the accuracy, recall, and F1 score of text classification.
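
The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the description gives no formulas, so the weighting (average term frequency multiplied by the fraction of documents containing the term), the Haar wavelet, the cosine similarity measure, and the function names `atf_df_weights`, `haar_approx`, `cosine`, and `classify` are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def atf_df_weights(docs):
    """Stage 1 (assumed form): weight each term of ONE class by its average
    term frequency times the fraction of the class's documents containing it."""
    n = len(docs)
    tf_sum, df = Counter(), Counter()
    for tokens in docs:
        total = len(tokens) or 1
        for term, count in Counter(tokens).items():
            tf_sum[term] += count / total  # per-document relative frequency
            df[term] += 1                  # document frequency
    return {t: (tf_sum[t] / n) * (df[t] / n) for t in tf_sum}


def haar_approx(signal, levels=1):
    """Haar approximation coefficients (assumed wavelet): repeatedly replace
    neighbouring pairs by their sum divided by sqrt(2)."""
    out = list(signal)
    for _ in range(levels):
        if len(out) < 2:
            break
        if len(out) % 2:
            out.append(out[-1])  # pad odd lengths by repeating the last value
        out = [(out[i] + out[i + 1]) / math.sqrt(2) for i in range(0, len(out), 2)]
    return out


def cosine(a, b):
    """Cosine similarity, used here as an assumed waveform similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, class_weights, levels=1):
    """Stage 2: project the sample onto each class's term list (sorted by
    weight, descending), smooth both waveforms, and pick the most similar class."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    best_label, best_score = None, -1.0
    for label, weights in class_weights.items():
        terms = sorted(weights, key=weights.get, reverse=True)
        class_wave = haar_approx([weights[t] for t in terms], levels)
        sample_wave = haar_approx([counts[t] / total for t in terms], levels)
        score = cosine(class_wave, sample_wave)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy labelled corpus (made-up tokens, not taken from the dataset).
sports = [["ball", "goal", "team"], ["ball", "team", "win"]]
tech = [["code", "bug", "chip"], ["code", "chip", "data"]]
weights = {"sports": atf_df_weights(sports), "tech": atf_df_weights(tech)}
label, score = classify(["team", "ball", "ball"], weights)
```

In this sketch the sample's waveform is built over each class's own term ordering, which mirrors the statement that the sample's feature vector is derived from the class's typical feature vector; a real implementation would likely use more decomposition levels and the paper's actual similarity measure.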
Created:
2024-03-19