A Curated Dataset of Research Abstracts on AI and Large Language Models in Information Management and Librarianship (2020-2024)
Source: NIAID Data Ecosystem. Indexed: 2026-05-10
Download link:
https://data.mendeley.com/datasets/hs9hy54hzv
Resource description:
This dataset contains 303 curated abstracts of research articles focusing on the application of Artificial Intelligence (AI) and Large Language Models (LLMs) in information management, library science, and related services. The data was systematically collected from the Scopus database for publications between 2020 and 2024.
The initial search returned 498 documents, of which 307 records remained after the abstract data was exported. A preprocessing and cleaning pipeline then produced the final set of 303 abstracts, prepared for natural language processing (NLP) and semantic analysis tasks.
This collection is ideal for researchers working in:
Text Mining and Semantic Clustering
Topic Modeling and Research Trend Analysis
Natural Language Processing (NLP) Applications
Bibliometric and Scientometric Studies
AI in Libraries and Information Services
Keywords: Artificial Intelligence; Large Language Models; Natural Language Processing; Information Management; Librarianship; Research Abstracts; Text Dataset; Scopus.
2. For the Cluster Assignments File
File Name: abstracts_AI_LLM_librarianship_cluster_assignments.csv
Title:
Semantic Cluster Assignments for a Corpus of AI/LLM in Librarianship Research Abstracts
Description:
This file provides the semantic cluster assignments for the corresponding dataset "A Curated Dataset of Research Abstracts on AI and Large Language Models in Information Management and Librarianship (2020-2024)". The clusters were generated by applying K-means clustering to average GloVe word embeddings of the article abstracts.
The analysis identified 7 distinct, semantically coherent research themes, which are detailed in the associated research article. This data is provided to facilitate reproducibility and further analysis, and to serve as a ground truth for comparative studies in semantic clustering and topic discovery.
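The clustering approach described above (averaging word embeddings per abstract, then running K-means) can be illustrated with a minimal sketch. This is not the authors' pipeline: the two-dimensional vectors in `toy_vectors` are hypothetical stand-ins for real GloVe embeddings, the tokenization is a plain whitespace split, and the K-means loop is a basic standard-library version of Lloyd's algorithm. A real analysis would more likely load pretrained GloVe vectors and use a library implementation such as scikit-learn's `KMeans`.

```python
import random


def average_embedding(text, vectors, dim):
    """Mean of the word vectors for tokens found in `vectors`; zeros if none match."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return [0.0] * dim
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Basic Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy vectors (NOT real GloVe values) and made-up abstract snippets.
toy_vectors = {
    "patient": [1.0, 0.0], "clinical": [0.9, 0.1], "health": [0.95, 0.05],
    "search": [0.0, 1.0], "retrieval": [0.1, 0.9], "query": [0.05, 0.95],
}
toy_abstracts = [
    "clinical health records for patient care",
    "patient health information",
    "query expansion for search and retrieval",
    "retrieval of documents by search query",
]

doc_vectors = [average_embedding(a, toy_vectors, dim=2) for a in toy_abstracts]
labels = kmeans(doc_vectors, k=2)
print(labels)
```

With the 303 abstracts in this dataset, the same steps would be run with k=7 to match the number of clusters reported here. The integer assigned to each cluster depends on initialisation, so labels from a fresh run need not line up with the 0-6 numbering in the file.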
The file includes the document identifier and its corresponding cluster label (0-6). The dominant themes for each cluster are:
Cluster 0: Healthcare Information Systems
Cluster 1: Information Retrieval Systems
Cluster 2: Research Data Management (RDM)
Cluster 3: Digital Library Adoption
Cluster 4: Knowledge Management
Cluster 5: Reference Services
Cluster 6: Scientific Publishing
Keywords: Semantic Clustering; K-means; Cluster Labels; Research Themes; Topic Discovery; GloVe Embeddings; Text Mining.
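As a usage sketch, the cluster labels (0-6) can be mapped to the theme names listed above and tallied per theme. The column names `document_id` and `cluster` and the sample rows below are assumptions made for illustration; the actual header of abstracts_AI_LLM_librarianship_cluster_assignments.csv is not stated in this description and should be checked before use.

```python
import csv
import io
from collections import Counter

# Theme names as listed in the file description.
CLUSTER_THEMES = {
    0: "Healthcare Information Systems",
    1: "Information Retrieval Systems",
    2: "Research Data Management (RDM)",
    3: "Digital Library Adoption",
    4: "Knowledge Management",
    5: "Reference Services",
    6: "Scientific Publishing",
}


def theme_counts(csv_file, label_column="cluster"):
    """Count documents per theme; `label_column` is an assumed column name."""
    reader = csv.DictReader(csv_file)
    return Counter(CLUSTER_THEMES[int(row[label_column])] for row in reader)


# Made-up sample rows standing in for the real file (hypothetical header and IDs).
sample = io.StringIO("document_id,cluster\nD001,0\nD002,5\nD003,5\n")
counts = theme_counts(sample)
for theme, n in counts.most_common():
    print(f"{theme}: {n}")
```

To run this against the downloaded file, replace `sample` with a handle from `open(path, newline="", encoding="utf-8")` and adjust `label_column` to the real header.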
Created:
2025-10-01



