TMIP_MD: Tibetan Medical Classics Entity Term Dataset
Source: DataCite Commons. Updated: 2026-01-27. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=4653cc3c8bac4985be70ffbc9bcd8c13
Description:
We propose a Hybrid Adaptive Word Segmentation (HAWS) method, based on a previously constructed Tibetan medicine entity dictionary, to extract entity terms from ancient Tibetan medical texts and to build and release a Tibetan medicine entity dataset (TMIP_ETD). First, the 100 ancient books (333 volumes) included in the “Collection of Classical Tibetan Medicine Literature” were digitized, yielding a text corpus of 13.8 million syllables. The HAWS method was then used for tokenization, and candidate entities occurring at least 12 times were extracted. After manual verification, this produced a dataset of 34,601 entity terms from the Tibetan medical classics. The dataset provides a data foundation for knowledge mining and for the digital preservation and transmission of the ancient Tibetan medical classics.
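The frequency-filtering step described above can be illustrated with a minimal sketch. It is not the HAWS method itself, whose details are not given here; it only assumes that tokenization has already produced a flat list of candidate terms, and it keeps those that occur at least 12 times. The function name `filter_candidates` and the placeholder terms are hypothetical.

```python
from collections import Counter

# Threshold taken from the description: candidates must occur at least 12 times.
MIN_FREQ = 12


def filter_candidates(tokens, min_freq=MIN_FREQ):
    """Count each token and keep those seen at least `min_freq` times.

    `tokens` is assumed to be the output of some upstream segmenter;
    this sketch does not perform segmentation.
    """
    counts = Counter(tokens)
    return {tok: n for tok, n in counts.items() if n >= min_freq}


# Toy input with placeholder terms (hypothetical, not taken from the dataset).
tokens = ["term_a"] * 12 + ["term_b"] * 11 + ["term_c"] * 30
candidates = filter_candidates(tokens)
```

In the workflow described, the surviving candidates would then go to manual verification rather than being accepted automatically.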
Provider:
Science Data Bank
Created:
2026-01-27



