Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft

Indexed by NIAID Data Ecosystem on 2026-03-13

Download link:

https://zenodo.org/record/6957841

Resource description:

Cleaned_Dataset.csv – The combined CSV files of all documents scraped from DABI, e-LiS, o-bib and Springer.
Data_Cleaning.ipynb – The Jupyter Notebook with the Python code for analysing and cleaning the original dataset.
ger_train.csv – The German training set as a CSV file.
ger_validation.csv – The German validation set as a CSV file.
en_test.csv – The English test set as a CSV file.
en_train.csv – The English training set as a CSV file.
en_validation.csv – The English validation set as a CSV file.
splitting.py – The Python code for splitting a dataset into training, test and validation sets.
DataSetTrans_de.csv – The final German dataset as a CSV file.
DataSetTrans_en.csv – The final English dataset as a CSV file.
translation.py – The Python code for translating the cleaned dataset.
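
The contents of splitting.py are not reproduced in this record, so the following is only a minimal sketch of how a train/validation/test split over one of the CSV files could look. The function names (read_rows, write_rows, split_rows), the 80/10/10 proportions and the fixed seed are assumptions made for illustration and are not taken from the dataset's own code.

```python
import csv
import random


def read_rows(path):
    """Load a CSV file into a list of dicts, one per row (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_rows(path, rows, fieldnames):
    """Write a list of dict rows back to a CSV file (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def split_rows(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows reproducibly and cut them into train, validation and test lists.

    The fractions and the seed are assumed values; the remainder after
    the train and validation cuts becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Under these assumptions, a call such as split_rows(read_rows("DataSetTrans_en.csv")) would return three lists that could then be written out with write_rows. A fixed seed keeps the split reproducible across runs. A stratified split by class label may be preferable for classification data, but this sketch does not attempt one.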

Created:

2022-08-08