
Indexed NLP Article Metadata Dataset

Source: DataONE. Updated: 2023-10-20. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:f72735dba51e808019cdc495db44b6258bf5cb58d161aaf7f22fe1a1aa3126f7
Description:
This dataset consists of a curated collection of published, indexed articles (N=75527) related to Natural Language Processing (NLP), collected from Web of Science, along with a classification of each article into one of five categories according to the NLP approach it uses.

Category 0 (Rule-Based): A model based on rules or symbolic analysis is used.
Category 1 (Statistical Methods): An approach based on statistical methods is used. This includes BoW, n-grams, and TF-IDF, along with other machine learning techniques such as SVMs, logistic regression, LDA, and others. Shallow neural network models such as word2vec also belong in this category.
Category 2 (Deep Learning): Approaches that use deep learning and deep neural network architectures such as RNNs, CNNs, and LSTMs are included in this category.
Category 3 (Transformer Models): The proposed approach uses transformer-based models such as BERT, GPT, T5, and others.
Category 4: The abstract does not mention a particular model or technique. Papers analyzing frameworks, surveys, papers centered on the computer vision component of NLP, and dataset proposals, among others, fall into this category.

Note that the classification may be imprecise, is not strictly defined, and should be used only as a starting point.

Fields: 'Authors', 'Article Title', 'Volume', 'Issue', 'Special Issue', 'Start Page', 'End Page', 'DOI', 'Book DOI', 'Publication Date', 'Times Cited', 'ISSN', 'eISSN', 'Author Full Names', 'Book Author Full Names', 'Language', 'Author Keywords', 'Keywords', 'Funding Orgs', 'Funding Text', 'Cited References', 'DOI Link', 'Number of Pages', 'Categories', 'Research Areas', 'bert_preds', 'setfit_preds', 'knn_preds', 'abstract_hash'.

The dataset is provided in different formats. To address potential copyright, licensing, and data privacy concerns, we have replaced the original abstracts with SHA-256 hashes, which are cryptographic representations of the abstracts' content.
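Because the abstracts are stored only as SHA-256 hashes, a user who obtains an abstract from the original publication could in principle check it against the 'abstract_hash' field. The sketch below shows one way this might look. The description does not say how the text was encoded or normalized before hashing, so the UTF-8 encoding, the hexadecimal digest format, and the helper names `abstract_sha256` and `matches_hash` are all assumptions made for illustration.

```python
import hashlib


def abstract_sha256(abstract: str) -> str:
    """Return the hex SHA-256 digest of an abstract.

    Assumption: the text is hashed as UTF-8 bytes with no extra
    normalization. The dataset description does not confirm this.
    """
    return hashlib.sha256(abstract.encode("utf-8")).hexdigest()


def matches_hash(abstract: str, abstract_hash: str) -> bool:
    """Check a candidate abstract against a stored 'abstract_hash' value.

    Assumption: the stored value is a hexadecimal digest string.
    """
    return abstract_sha256(abstract) == abstract_hash.strip().lower()
```

If a known abstract does not match, the cause may be a difference in whitespace, encoding, or casing rather than a different text, since any change to the input produces a completely different digest.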
Please note that the copyright and licensing status of the original articles may vary, and users should respect any applicable terms and restrictions associated with the source publications.
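The category scheme and the three prediction fields ('bert_preds', 'setfit_preds', 'knn_preds') described above could be combined as in the following sketch. It assumes that each prediction field holds an integer code from 0 to 4, which the description does not state explicitly. The majority vote is only an illustrative way to reconcile the three fields, not the dataset authors' method, and `CATEGORY_NAMES`, `majority_label`, and `describe` are hypothetical names.

```python
from collections import Counter
from typing import Optional

# Category codes as given in the dataset description.
# Category 4 has no official short name; the label here is a paraphrase.
CATEGORY_NAMES = {
    0: "Rule-Based",
    1: "Statistical Methods",
    2: "Deep Learning",
    3: "Transformer Models",
    4: "No particular model or technique mentioned",
}

# Prediction fields listed in the dataset description.
PRED_COLUMNS = ("bert_preds", "setfit_preds", "knn_preds")


def majority_label(row: dict) -> Optional[int]:
    """Return the code at least two of the three predictions agree on.

    Returns None when all three predictions differ.
    Assumption: each prediction field holds an integer code 0-4.
    """
    votes = [int(row[col]) for col in PRED_COLUMNS]
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else None


def describe(row: dict) -> str:
    """Map a row to a readable category name, or flag disagreement."""
    label = majority_label(row)
    if label is None:
        return "No agreement"
    return CATEGORY_NAMES.get(label, "Unknown code")
```

Rows where the three fields disagree are a natural place to start manual review, given that the classification is described as imprecise.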
Created: 2023-12-16