List of academic disciplines in Chinese and English for data transformation

Source: NIAID Data Ecosystem · Indexed 2026-05-02
Download link:
https://zenodo.org/record/14845566
Description:
Overview: This dataset provides a bilingual list of academic disciplines, extracted from various sources in Chinese and translated into English. It includes a broad range of terms beyond strictly defined academic disciplines, encompassing subfields, interdisciplinary categories, and related academic domains. The dataset is designed to be reusable for various applications, such as term matching across corpora and extracting relevant terms from new datasets.

Structure: The dataset consists of multiple columns, each representing different levels of classification for academic disciplines. The primary columns include:
- Discipline (Chinese Simplified & Traditional): The name of the discipline in Chinese.
- Discipline (English): The corresponding English translation of the discipline.
- Discipline Level 2 (Chinese & English): A more specific categorization within a broader academic category.
- Discipline Level 1 (Chinese & English): A higher-level classification grouping multiple related disciplines.
- Discipline Level 0 (Chinese & English): The broadest classification, representing major academic fields.
- Level1_code: A numerical or coded identifier for Level 1 disciplines, which may be useful for structured data processing.

Purpose & Applications:
- Term Matching: The dataset can be used to match extracted terms from other corpora, ensuring consistency across multilingual sources.
- Hierarchical Classification: The multi-level structure allows users to analyze disciplines at different granularities.
- Corpus Analysis & Text Mining: The dataset facilitates term extraction and standardization in computational text analysis projects.
- Cross-Linguistic Comparisons: Researchers can use the bilingual nature of the dataset to study the relationship between Chinese and English academic terminologies.

Potential Use Cases:
- Automated classification of academic articles based on discipline.
- Developing bilingual glossaries for research institutions.
- Improving machine learning models for academic domain recognition.

Data Quality Considerations:
- Some terms may appear at multiple levels, reflecting differences in classification across sources.
- The dataset structure should be checked for delimiter consistency before processing in automated systems.
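
Example (illustrative): The delimiter check and the term-matching use described above could be sketched roughly as follows in Python. This is a minimal sketch, not the dataset authors' tooling. The column names `discipline_zh` and `discipline_en`, the comma delimiter, the sample rows, and the `X01`/`X02` codes are all assumptions made up for illustration; the real file's header and delimiter should be inspected first.

```python
import csv
import io
from collections import Counter


def field_count_profile(text, delimiter=","):
    """Count how many non-empty rows have each number of fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return Counter(len(row) for row in reader if row)


def is_delimiter_consistent(text, delimiter=","):
    """True if every row splits into the same number (>1) of fields."""
    profile = field_count_profile(text, delimiter)
    return len(profile) == 1 and next(iter(profile)) > 1


def build_term_index(text, zh_col, en_col, delimiter=","):
    """Map each Chinese term to its English translation.

    If a term appears more than once (the description notes that terms
    may recur at several levels), the first occurrence is kept.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    index = {}
    for row in reader:
        zh = (row.get(zh_col) or "").strip()
        en = (row.get(en_col) or "").strip()
        if zh and en:
            index.setdefault(zh, en)
    return index


# Hypothetical sample rows; header names and codes are invented.
SAMPLE = (
    "discipline_zh,discipline_en,level1_code\n"
    "物理学,Physics,X01\n"
    "历史学,History,X02\n"
)

if __name__ == "__main__":
    if is_delimiter_consistent(SAMPLE):
        index = build_term_index(SAMPLE, "discipline_zh", "discipline_en")
        print(index.get("物理学"))
    else:
        print("Inconsistent field counts:", field_count_profile(SAMPLE))
```

Running the consistency check before building the index means a wrong delimiter or a ragged row is reported up front rather than silently producing a misaligned lookup table.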
Created:
2025-02-10