five

MultiDomainKeywords Dataset

收藏
NIAID Data Ecosystem2026-05-01 收录
下载链接:
https://zenodo.org/record/10624131
下载链接
链接失效反馈
官方服务:
资源简介:
Introduction The MultiDomainKeywords dataset is a meticulously curated collection created through the collaboration of Generative AI and human oversight. Utilizing advanced AI to generate initial data followed by manual refinement ensures high relevance and quality of the extracted keywords. This dataset spans various domains such as technology, health, finance, education, environment, travel, fashion, food, sports, and arts. It is designed for Natural Language Processing (NLP) tasks and is ideal for researchers and practitioners working on keyword extraction, topic modeling, and semantic analysis. Dataset Description This dataset contains 500 entries, each with a unique serial number, title, abstract, and a set of extracted keywords that summarize the main topics of the entry. The dataset spans multiple domains, offering a diverse range of contexts and themes for comprehensive keyword extraction analysis. Columns serial number: A unique identifier for each entry. title: The title of the entry, summarizing the subject. abstract: A concise summary providing more detail on the entry's subject. keywords: The extracted keywords representing the main topics of the title and abstract. Contribution Contributions to the MultiDomainKeywords dataset are welcome! If you have suggestions for improvements, additional data, or corrections, please open an issue or a pull request. License This dataset is available under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which allows others to share, adapt, and build upon this work, commercially or non-commercially, as long as they credit the dataset's creators. Citation If you use the MultiDomainKeywords dataset in your research or application, please cite it using the following format: Aradhana Saxena. (2024). MultiDomainKeywords Dataset. Retrieved from https://github.com/aradhana298/Multidomain_keyword Contact If you have any questions or feedback regarding the MultiDomainKeywords dataset, please contact at aradhana298@gmail.com or at aradhana@rjit.ac.in How to Use To use this dataset, simply download the CSV file and load it into your preferred data analysis tool, such as Python's pandas library, R, or even Excel. Here is a Python snippet to get you started: import pandas as pd df = pd.read_csv('path_to_your_file/multi_domain_keyword_extraction_dataset.csv') print(df.head())
创建时间:
2024-02-06
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作