timmers/hpo-cultural-pt-curated
Source: Hugging Face. Updated 2026-04-24; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/timmers/hpo-cultural-pt-curated
Description:
A curated dataset of Brazilian Portuguese colloquial synonyms for Human Phenotype Ontology (HPO) phenotypes, made openly available for the first time. The dataset includes two files: cultural_pairs.json holds 705 pairs, each mapping a colloquial PT anchor to the canonical English HPO term, with register, region, and hpo_id fields; hard_negatives.json holds 72 triplets (anchor, positive, negative) for disambiguation. Examples include água na cabeça → Hydrocephalus (HP:0000238) and esparro (BR-NE) → Seizure (HP:0001250). The intended use is training PT-BR encoders for HPO.
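The pair structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the JSON key names (`anchor`, `positive`, `hpo_id`, `register`, `region`) are guessed from the field names in the description and may not match the actual file, the `register` value is a placeholder, and the helpers `load_pairs`, `is_valid_pair`, and `to_training_pairs` are hypothetical names, not part of the dataset. The two sample records reuse the examples quoted in the description.

```python
import json
import re

# HPO identifiers have the form "HP:" followed by seven digits.
HPO_ID_RE = re.compile(r"^HP:\d{7}$")

# Two records built from the examples in the description.
# Key names and the "register" value are assumptions; the region of the
# first example is not stated in the description, so it is left as None.
SAMPLE_PAIRS = [
    {"anchor": "água na cabeça", "positive": "Hydrocephalus",
     "hpo_id": "HP:0000238", "register": "colloquial", "region": None},
    {"anchor": "esparro", "positive": "Seizure",
     "hpo_id": "HP:0001250", "register": "colloquial", "region": "BR-NE"},
]


def load_pairs(path):
    """Read a JSON file assumed to contain a list of pair records."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def is_valid_pair(record):
    """Check that a record has a non-empty anchor, a positive, and a well-formed HPO id."""
    return (
        bool(record.get("anchor"))
        and bool(record.get("positive"))
        and bool(HPO_ID_RE.match(record.get("hpo_id", "")))
    )


def to_training_pairs(records):
    """Turn valid records into (colloquial PT anchor, canonical EN term) tuples."""
    return [(r["anchor"], r["positive"]) for r in records if is_valid_pair(r)]


if __name__ == "__main__":
    for anchor, positive in to_training_pairs(SAMPLE_PAIRS):
        print(f"{anchor} -> {positive}")
```

To use the real file, one would call `load_pairs("cultural_pairs.json")` after downloading it and adjust the key names to whatever the file actually uses. The resulting tuples are the kind of (anchor, positive) input that contrastive encoder training typically consumes.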
Provider:
timmers



