Data related to the research results presented in the article: Conceptual Framework for Clustering, Labeling, and Evaluating Scientific Articles with Embedding Models and Bibliometric Analysis

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/v63vrhgwxy
Resource description:
Dataset description for the article "Conceptual Framework for Clustering, Labeling, and Evaluating Scientific Articles with Embedding Models and Bibliometric Analysis"

The dataset underpins the experiments described in the article, which proposes a framework for clustering, labeling, and evaluating scientific publications using both statistical and bibliometric indicators. It consists of three CSV files, each corresponding to a different collection of scholarly publications (small, medium, and large). All publications were retrieved from the Scopus database based on specific search queries.

CSV files (3 in total). Each CSV file contains:
- Ranked lists of scholarly articles retrieved from Scopus, supplemented with computed vectors.
- Embedding vectors representing article content (title + abstract).
- Distance metrics (e.g., cosine distance) to enable quick similarity comparisons.

Together, these CSV files mirror the experimental setup discussed in the article, allowing reproduction of the clustering and labeling processes as well as the subsequent evaluations via bibliometric and statistical approaches.

Data structure:
- Standard metadata columns (from the Scopus export): typical bibliographic information such as Title, Authors, Year, DOI, Source title, Abstract, Keywords, Affiliations, Document Type, Cited by, and others.
- Computed columns (for experimentation):
  - combined_embeddings or article_embedding: a list of numeric values representing the semantic embedding vector generated from the concatenated title and abstract of the publication.
  - distance_cosine: the cosine distance between the publication's embedding and a reference embedding (e.g., one based on a user query). Values range from 0 to 1, where lower values indicate higher semantic similarity.

Purpose and use:
These data support the evaluation of embedding-based clustering, labeling, and bibliometric methods for automating systematic literature reviews. They serve as reproducible material for the experiments described in the paper.
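The following is a minimal Python sketch of how the computed columns might be used. It parses the embedding column, recomputes a cosine distance to a reference embedding, and ranks articles from most to least similar. It assumes that each embedding cell holds a comma-separated, JSON-style list such as "[0.12, -0.03, ...]" and that the reference embedding was produced by the same model. The helpers parse_embedding, cosine_distance, and rank_rows are hypothetical names for illustration and are not part of the dataset or the article. Here cosine distance is taken as 1 minus cosine similarity. In general this lies in [0, 2], and it stays within [0, 1] only when the similarity is non-negative.

```python
import csv
import io
import json
import math
from typing import Iterable

# Depending on the file, the embedding column is named "combined_embeddings"
# or "article_embedding". Assumption: each cell holds a JSON-style list of numbers.
EMBEDDING_COLUMNS = ("combined_embeddings", "article_embedding")


def parse_embedding(row: dict) -> list:
    """Return the vector from whichever embedding column the row has (hypothetical helper)."""
    for name in EMBEDDING_COLUMNS:
        cell = row.get(name)
        if cell:
            return [float(x) for x in json.loads(cell)]
    raise KeyError("row has no embedding column")


def cosine_distance(a: list, b: list) -> float:
    """Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def rank_rows(rows: Iterable[dict], query_embedding: list) -> list:
    """Score each article against a reference embedding and sort ascending,
    so the most semantically similar articles come first (hypothetical helper)."""
    scored = [
        (row["Title"], cosine_distance(parse_embedding(row), query_embedding))
        for row in rows
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy example with 2-dimensional vectors (real embeddings are much longer).
sample = (
    "Title,Year,combined_embeddings\n"
    '"Paper A",2021,"[1.0, 0.0]"\n'
    '"Paper B",2022,"[0.0, 1.0]"\n'
    '"Paper C",2023,"[0.7, 0.7]"\n'
)
for title, dist in rank_rows(csv.DictReader(io.StringIO(sample)), [1.0, 0.0]):
    print(f"{title}: {dist:.3f}")
# prints "Paper A: 0.000", then "Paper C: 0.293", then "Paper B: 1.000"
```

To apply this to one of the dataset files, you could open the file with open(path, newline="", encoding="utf-8-sig") and pass csv.DictReader(f) to rank_rows. The path and encoding are assumptions. If the stored vectors use a different text format (for example, space-separated numbers without commas), parse_embedding would need to be adapted.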
Created: 2025-06-05