boun-tabilab/Thesis-Abstract-Classification-11K

Source: Hugging Face | Updated: 2025-12-15 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/boun-tabilab/Thesis-Abstract-Classification-11K
Description:
The Thesis-Abstract-Classification-11K dataset is obtained by processing a subset of the Turkish Academic Theses dataset. The original dataset is large, and each example lists several values in its subject field, representing the fields of the thesis. To construct a single-label (multi-class) classification problem of reasonable size, the following steps were carried out: for each example, only the first value of the subject field was kept as the main field of the thesis and used as the label; labels with fewer than 60 examples were dropped, leaving 187 unique labels; and 60 examples were randomly selected for each remaining label, giving a dataset of 11,220 examples (187 × 60). The dataset contains two fields: text (the thesis abstract) and label (the field of the thesis). The split methodology, which is based on the original data splits, is documented in detail, including how different split scenarios are handled.
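The label-selection and sampling steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the input record keys `abstract` and `subjects`, the function name `build_single_label_dataset`, and the fixed random seed are all hypothetical, since the description gives neither the original field names nor the seed used.

```python
import random
from collections import defaultdict


def build_single_label_dataset(records, per_label=60, seed=0):
    """Hypothetical sketch of the described preprocessing.

    records: iterable of dicts with ASSUMED keys "abstract" (str) and
    "subjects" (list of str, in the order given by the source data).
    Returns a list of {"text": ..., "label": ...} dicts.
    """
    by_label = defaultdict(list)
    for rec in records:
        subjects = rec.get("subjects") or []
        if not subjects:
            continue  # no subject value available to use as a label
        # Step 1: keep only the first subject value as the label.
        by_label[subjects[0]].append(rec["abstract"])

    rng = random.Random(seed)  # assumed seed; the original one is not stated
    dataset = []
    for label in sorted(by_label):  # sorted so a given seed is reproducible
        texts = by_label[label]
        # Step 2: drop labels with fewer than `per_label` examples.
        if len(texts) < per_label:
            continue
        # Step 3: randomly sample exactly `per_label` examples for the label.
        for text in rng.sample(texts, per_label):
            dataset.append({"text": text, "label": label})
    return dataset
```

With 187 labels surviving the 60-example threshold, the output of such a procedure has 187 × 60 = 11,220 rows, matching the stated dataset size. Iterating over the labels in sorted order is a choice of this sketch so that a fixed seed always yields the same sample.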
Provided by:
boun-tabilab