Clustering Tasks and Decision Trees with Elegiac Poets

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12682693
Description:
The dataset contains files generated during a Natural Language Processing (NLP) and automatic text analysis task. Attached is a Jupyter notebook with the complete code, along with several Excel files (.xlsx) containing organized information. Additionally, there are three folders that include files generated during the Silhouette calculation, K-means clustering, and feature extraction using decision trees.

The three folders are:
1. Silhouette Calculation: Contains PNG images of Silhouette plots for various analysis configurations.
2. K-means Clustering: Contains pickle (.pkl) files with features and labels for each combination of excluded author, n-gram type, n-gram range, and matrix type.
3. Feature Extraction: Contains CSV files with lists of documents by cluster and the most important features along with information gain and information gain ratio metrics.

Other file formats included in the dataset are:
- CSV files containing Silhouette scores, optimal clustering results, cluster assignments, and optimal cluster assignments.
- PNG images of scatter plots colored by author and by cluster.
- Pickle files containing the top features extracted during the analysis.
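
The description mentions information gain and information gain ratio as the metrics reported for the most important features. As a rough illustration of what those two quantities measure, here is a minimal sketch using only the Python standard library. It is not the code from the dataset's notebook: the function names, the binary "n-gram present" encoding, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_values, labels):
    """Drop in label entropy after grouping documents by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy(feature_values)
    if split_info == 0.0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


# Hypothetical toy data: six documents assigned to two clusters, and two
# made-up binary features (1 = the n-gram occurs in the document).
clusters = [0, 0, 0, 1, 1, 1]
has_ngram_a = [1, 1, 1, 0, 0, 0]  # separates the clusters perfectly
has_ngram_b = [1, 0, 1, 0, 1, 0]  # only weakly related to the clusters

ig_a = information_gain(has_ngram_a, clusters)
ig_b = information_gain(has_ngram_b, clusters)
gr_a = gain_ratio(has_ngram_a, clusters)
gr_b = gain_ratio(has_ngram_b, clusters)
```

In this toy case the first feature gets an information gain of 1 bit (the full entropy of the cluster labels), while the second gets a value close to zero. The ratio variant divides by the entropy of the feature's own value distribution, which penalizes features that split the documents into many small groups.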
Created:
2024-10-12