TMIP_MD: Tibetan Medical Classics Entity Term Dataset

Name: TMIP_MD: Tibetan Medical Classics Entity Term Dataset
Creator: Science Data Bank
Published: 2026-01-27 02:37:20
License: Not specified

Source: DataCite Commons (updated 2026-01-27, indexed 2026-05-05)

Download link:

https://www.scidb.cn/detail?dataSetId=4653cc3c8bac4985be70ffbc9bcd8c13

Description:

We propose a Hybrid Adaptive Word Segmentation (HAWS) method, based on a previously constructed Tibetan medicine entity dictionary, to extract entity terms from ancient Tibetan medical texts and to build and release a Tibetan medicine entity dataset (TMIP_ETD). First, the 100 ancient books (333 volumes) included in the “Collection of Classical Tibetan Medicine Literature” were digitized, yielding a text corpus of 13.8 million syllables. The HAWS method was then used for tokenization, and candidate entities with a frequency of at least 12 were extracted. After manual verification, this produced a dataset of 34,601 entity terms from the Tibetan medicine classics. The dataset provides a data foundation for knowledge mining and for the digital preservation and transmission of the ancient Tibetan medical classics.
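The description does not spell out how HAWS works, so the following is only a rough sketch of the general pipeline shape (dictionary-guided segmentation followed by a frequency cutoff), not the authors' method. It substitutes plain forward maximum matching over syllables for HAWS, and it only counts terms already present in the dictionary. All function names, the dictionary format (tuples of syllables), and the clause handling are assumptions made for illustration; only the threshold of 12 comes from the description.

```python
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan syllable delimiter
SHAD = "\u0f0d"   # Tibetan clause/sentence delimiter


def split_clauses(text):
    """Split text into clauses on the shad, and each clause into syllables."""
    clauses = []
    for clause in text.split(SHAD):
        syllables = [s.strip() for s in clause.split(TSHEG) if s.strip()]
        if syllables:
            clauses.append(syllables)
    return clauses


def forward_max_match(syllables, dictionary, max_len):
    """Greedy longest-match segmentation (a stand-in for HAWS, not HAWS itself).

    `dictionary` is a set of syllable tuples. Syllables that start no
    dictionary entry are emitted as single-syllable tokens.
    """
    tokens = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if n == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += n
                break
    return tokens


def candidate_entities(text, dictionary, min_freq=12):
    """Count dictionary terms found by segmentation and keep the frequent ones."""
    max_len = max((len(entry) for entry in dictionary), default=1)
    counts = Counter()
    for syllables in split_clauses(text):
        for token in forward_max_match(syllables, dictionary, max_len):
            if token in dictionary:
                counts[token] += 1
    return {TSHEG.join(t): c for t, c in counts.items() if c >= min_freq}
```

In this sketch, terms that pass the cutoff would still need the manual verification step mentioned in the description; the code only produces the candidate list.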

Provider:

Science Data Bank

Created:

2026-01-27
