TMIP_MD: Tibetan Medical Classics Entity Term Dataset

Source: DataCite Commons. Updated: 2026-01-27. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=4653cc3c8bac4985be70ffbc9bcd8c13
Description:
We propose a Hybrid Adaptive Word Segmentation (HAWS) method, based on a previously constructed Tibetan Medicine entity dictionary, to extract entity terms from ancient Tibetan medical texts and to build and release a Tibetan Medicine entity dataset (TMIP_ETD). First, the 100 ancient books (333 volumes) included in the “Collection of Classical Tibetan Medicine Literature” were digitized, yielding a text dataset of 13.8 million syllables. Then the HAWS method was used for tokenization, and potential entities with a frequency of at least 12 were extracted. After manual verification, a dataset of entity terms from the Tibetan medical classics containing 34,601 entries was obtained. This dataset provides a data foundation for knowledge mining and for the digital preservation and transmission of the ancient Tibetan medical classics.
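The frequency-threshold step in the description can be illustrated with a minimal sketch. It assumes tokenization has already been done by some segmenter; it does not reproduce HAWS, whose details are not given here. The function name `frequent_candidates` and the placeholder tokens are hypothetical; only the threshold of 12 comes from the description.

```python
from collections import Counter
from typing import Dict, Iterable

MIN_FREQ = 12  # threshold stated in the dataset description


def frequent_candidates(tokens: Iterable[str], min_freq: int = MIN_FREQ) -> Dict[str, int]:
    """Return each token that occurs at least `min_freq` times, with its count.

    Hypothetical helper: the input is assumed to be the output of a word
    segmenter (HAWS itself is not implemented here).
    """
    counts = Counter(tokens)
    return {tok: n for tok, n in counts.items() if n >= min_freq}


# Toy input with placeholder tokens, not real dataset entries.
tokens = ["term_a"] * 12 + ["term_b"] * 11 + ["term_c"] * 30
candidates = frequent_candidates(tokens)
```

In this toy input, `term_b` falls one occurrence short of the threshold and is dropped. In the described pipeline, the surviving candidates would then go through manual verification.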
Provider:
Science Data Bank
Created:
2026-01-27