This project aims to identify the optimal window size for classifying important citations in research papers. By determining the best window size, we can enhance the accuracy and efficiency of citation importance detection, which plays a crucial role in understanding the significance of references in academic writing.
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14553960
Resource description:
Purpose: This project focuses on optimizing the window size parameter for semantic classification models to identify important in-text citations within research papers. The goal is to enhance the accuracy and efficiency of citation importance detection, which aids in understanding the contextual and intent-based significance of citations in academic writing.
Datasets Used:
SciCite: A benchmark dataset for citation intent classification.
ACL-ARC: A dataset commonly used for citation context analysis.
Methodology:
Multiple machine learning and deep learning models were implemented, including:
Machine Learning Models: SVM, Naive Bayes, Decision Tree.
Deep Learning Models: CNN, LSTM, GRU.
The models were evaluated at different window sizes to determine the one that yields optimal performance.
Metrics such as accuracy, precision, and recall were used to assess the models.
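The window-size evaluation described above can be illustrated with a minimal sketch. It is not the project's code: it assumes the window is measured in whitespace-separated tokens taken symmetrically on each side of the citation, and that the citation position is marked by a hypothetical `[CIT]` placeholder. The record does not state the unit or the marker, and the function names are illustrative only.

```python
from typing import List, Sequence, Tuple

# Hypothetical placeholder marking where the citation sits in the sentence.
CITE = "[CIT]"


def context_window(tokens: Sequence[str], cite_index: int, window: int) -> List[str]:
    """Return the citation token plus up to `window` tokens on each side.

    Assumption: a symmetric, token-based window. The slice is clipped at the
    sentence boundaries, so short sentences yield fewer tokens.
    """
    start = max(0, cite_index - window)
    end = min(len(tokens), cite_index + window + 1)
    return list(tokens[start:end])


def accuracy_precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    sentence = "Our parser builds directly on the method of [CIT] and extends it".split()
    idx = sentence.index(CITE)
    # Sweep a few candidate window sizes and show the context each one keeps.
    for w in (1, 3, 10):
        print(w, context_window(sentence, idx, w))
```

In a full experiment, each window size would produce its own set of context strings, a model (for example an SVM or an LSTM) would be trained on them, and the three metrics would be compared across window sizes.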
Results:
The experiments revealed that a window size of 10 produced the most accurate and reliable results across various models.
This finding highlights the importance of parameter tuning in citation classification tasks.
Conclusion: This project demonstrates the critical role of selecting an optimal window size in improving the performance of semantic classification models. The results provide valuable insights for future research in citation analysis and related fields.
Significance: Accurately classifying important citations can:
Help in evaluating the influence of research work.
Improve tools for literature reviews and scientific discovery.
Aid automated systems in summarizing or prioritizing references in academic writing.
Repository Structure: The project repository contains the following:
data/: Preprocessed datasets from SciCite and ACL-ARC.
models/: Implemented machine learning and deep learning models.
results/: Performance metrics and visualizations for different window sizes.
scripts/: Code for training and evaluating the models.
Created:
2024-12-25



