
Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6957841
Resource description:
Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.
Data_Cleaning.ipynb – The Jupyter Notebook with Python code for the analysis and cleaning of the original dataset.
ger_train.csv – The German training set as a CSV file.
ger_validation.csv – The German validation set as a CSV file.
en_test.csv – The English test set as a CSV file.
en_train.csv – The English training set as a CSV file.
en_validation.csv – The English validation set as a CSV file.
splitting.py – The Python code for splitting a dataset into train, test and validation sets.
DataSetTrans_de.csv – The final German dataset as a CSV file.
DataSetTrans_en.csv – The final English dataset as a CSV file.
translation.py – The Python code for translating the cleaned dataset.
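The record does not show the contents of splitting.py, so the following is only a minimal sketch of how a train/validation/test split of this kind might look. The function name `split_rows`, the 80/10/10 ratios, and the fixed seed are all assumptions made for illustration and are not taken from the dataset's own code.

```python
import random


def split_rows(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows and cut them into train, validation, and test lists.

    The fractions and the seed are illustrative assumptions, not values
    taken from the dataset's splitting.py.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")

    # Copy first so the caller's sequence is not reordered in place.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after train and val
    return train, val, test


if __name__ == "__main__":
    # Stand-in records; in practice these could be rows read with csv.DictReader.
    records = [{"id": i} for i in range(100)]
    train, val, test = split_rows(records)
    print(len(train), len(val), len(test))
```

Fixing the seed makes the split reproducible, which matters when the German and English versions of the same documents should end up in matching partitions.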
Created:
2022-08-08