MultiDomainKeywords Dataset
收藏NIAID Data Ecosystem2026-05-01 收录
下载链接:
https://zenodo.org/record/10624131
下载链接
链接失效反馈官方服务:
资源简介:
Introduction
The MultiDomainKeywords dataset is a meticulously curated collection created through the collaboration of Generative AI and human oversight. Utilizing advanced AI to generate initial data followed by manual refinement ensures high relevance and quality of the extracted keywords. This dataset spans various domains such as technology, health, finance, education, environment, travel, fashion, food, sports, and arts. It is designed for Natural Language Processing (NLP) tasks and is ideal for researchers and practitioners working on keyword extraction, topic modeling, and semantic analysis.
Dataset Description
This dataset contains 500 entries, each with a unique serial number, title, abstract, and a set of extracted keywords that summarize the main topics of the entry. The dataset spans multiple domains, offering a diverse range of contexts and themes for comprehensive keyword extraction analysis.
Columns
serial number: A unique identifier for each entry.
title: The title of the entry, summarizing the subject.
abstract: A concise summary providing more detail on the entry's subject.
keywords: The extracted keywords representing the main topics of the title and abstract.
Contribution
Contributions to the MultiDomainKeywords dataset are welcome! If you have suggestions for improvements, additional data, or corrections, please open an issue or a pull request.
License
This dataset is available under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which allows others to share, adapt, and build upon this work, commercially or non-commercially, as long as they credit the dataset's creators.
Citation
If you use the MultiDomainKeywords dataset in your research or application, please cite it using the following format: Aradhana Saxena. (2024). MultiDomainKeywords Dataset. Retrieved from https://github.com/aradhana298/Multidomain_keyword
Contact
If you have any questions or feedback regarding the MultiDomainKeywords dataset, please contact at aradhana298@gmail.com or at aradhana@rjit.ac.in
How to Use
To use this dataset, simply download the CSV file and load it into your preferred data analysis tool, such as Python's pandas library, R, or even Excel. Here is a Python snippet to get you started:
import pandas as pd
df = pd.read_csv('path_to_your_file/multi_domain_keyword_extraction_dataset.csv')
print(df.head())
创建时间:
2024-02-06



