Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft
收藏NIAID Data Ecosystem2026-03-13 收录
下载链接:
https://zenodo.org/record/6957841
下载链接
链接失效反馈官方服务:
资源简介:
Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.
Data_Cleaning.ipynb – The Jupyter Notebook with python code for the analysis and cleaning of the original dataset.
ger_train.csv – The German training set as CSV file.
ger_validation.csv – The German validation set as CSV file.
en_test.csv – The English test set as CSV file.
en_train.csv – The English training set as CSV file.
en_validation.csv – The English validation set as CSV file.
splitting.py – The python code for splitting a dataset into train, test and validation set.
DataSetTrans_de.csv – The final German dataset as a CSV file.
DataSetTrans_en.csv – The final English dataset as a CSV file.
translation.py – The python code for translating the cleaned dataset.
创建时间:
2022-08-08



