lcalvobartolome/proxann_data
Source: Hugging Face | Updated: 2025-10-30 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/lcalvobartolome/proxann_data
Description:
The PROXANN dataset provides the corpora used to train and evaluate topic models in the PROXANN research. It contains two subsets, Bills and Wiki, each split into a training set and a test set. The training sets include contextual embeddings, while the test sets contain only metadata. The dataset also provides the 15k-token vocabularies used during preprocessing and model training.
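To illustrate how a fixed vocabulary such as the 15k-token one is typically applied in topic-model preprocessing, here is a minimal sketch that maps documents to bag-of-words counts and drops out-of-vocabulary tokens. It is not taken from the PROXANN code: the vocabulary file format (one term per line), the whitespace tokenization, and the helper names `load_vocab` and `to_bow` are all hypothetical assumptions.

```python
from collections import Counter


def load_vocab(lines):
    """Map each term to a column index.

    Hypothetical format: one term per line. The actual vocabulary
    files in the dataset may be stored differently.
    """
    vocab = {}
    for line in lines:
        term = line.strip()
        # Skip blank lines and keep only the first occurrence of a term.
        if term and term not in vocab:
            vocab[term] = len(vocab)
    return vocab


def to_bow(text, vocab):
    """Return a sparse bag-of-words {column index: count}.

    Tokens not in the vocabulary are dropped. Lowercasing plus
    whitespace splitting is an assumption; the real preprocessing
    pipeline may tokenize differently.
    """
    counts = Counter(tok for tok in text.lower().split() if tok in vocab)
    return {vocab[tok]: n for tok, n in counts.items()}


if __name__ == "__main__":
    # Toy stand-in for the 15k-term vocabulary file.
    toy_vocab = load_vocab(["bill", "tax", "health", "senate"])
    print(to_bow("The Senate bill raises the tax on health tax credits", toy_vocab))
```

Restricting every document to the same fixed vocabulary keeps the document-term matrix the same width for the training and test splits, which is why a shared vocabulary is normally distributed alongside the corpora.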
Provider: lcalvobartolome



