
ThaiLLM/med-facts

Source: Hugging Face | Updated: 2025-07-23 | Indexed: 2026-05-10
Download link:
https://hf-mirror.com/datasets/ThaiLLM/med-facts
Description:
---
license: mit
dataset_info:
  features:
  - name: fact_id
    dtype: string
  - name: text
    dtype: string
  - name: validation
    struct:
    - name: grounded
      dtype: bool
    - name: subfacts
      list:
      - name: supporting_lines
        list: string
      - name: text
        dtype: string
  - name: source_id
    dtype: string
  splits:
  - name: train
    num_bytes: 108880814
    num_examples: 83237
  download_size: 45112187
  dataset_size: 108880814
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# ThaiLLM Dataset: Medical Facts

This dataset contains facts extracted from [medical articles scraped online](https://huggingface.co/datasets/ThaiLLM/med-articles). The facts were extracted using `o4-mini` and then validated using `o4-mini` with a different prompt. We also provide [another dataset that assesses the validity of our fact extraction pipeline](https://huggingface.co/datasets/ThaiLLM/med-fact-verification).

## Fact Extraction Process

Given a scraped article (see the [source articles dataset](https://huggingface.co/datasets/ThaiLLM/med-articles)), we extract facts using the following procedure:

1. Prompt `o4-mini` with the article to extract 4-5 facts from it.
2. Given the facts extracted in step 1 and the source article, prompt `o4-mini` again with a different prompt and remove every fact that the LLM flags as not grounded in the article. The goal is to remove any fact that is hallucinated or otherwise unsupported by the source article. We measured the reliability of `o4-mini`'s verification against human judgment; see [this dataset](https://huggingface.co/datasets/ThaiLLM/med-fact-verification).

## License

This dataset is provided under the MIT License.

## Acknowledgement

We sincerely appreciate the generous support from the Ministry of Digital Economy and Society, whose funding made this project possible. We are also grateful for the invaluable collaboration with VISTEC and the Big Data Institute (BDI), which was crucial in bringing this project to fruition.
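
## Example: Working with the Record Schema

The following is a minimal sketch, not part of the original card, of how records shaped like the declared features (`fact_id`, `text`, `validation`, `source_id`) might be filtered and grouped. It assumes that `source_id` is a top-level field and that `validation` holds a `grounded` flag plus a list of `subfacts`, each with `supporting_lines` and `text`. The helper names and the sample rows are hypothetical placeholders, and the card does not state whether the published split still contains rows with `grounded` set to false.

```python
from typing import Any, Iterable

Record = dict[str, Any]


def grounded_only(records: Iterable[Record]) -> list[Record]:
    """Keep records whose validation.grounded flag is exactly True."""
    return [
        r for r in records
        if (r.get("validation") or {}).get("grounded") is True
    ]


def supporting_lines(record: Record) -> list[str]:
    """Collect supporting_lines from every subfact of one record, in order."""
    validation = record.get("validation") or {}
    lines: list[str] = []
    for sub in validation.get("subfacts") or []:
        lines.extend(sub.get("supporting_lines") or [])
    return lines


def facts_by_source(records: Iterable[Record]) -> dict[str, list[str]]:
    """Group fact texts by the source_id of the article they came from."""
    grouped: dict[str, list[str]] = {}
    for r in records:
        grouped.setdefault(r["source_id"], []).append(r["text"])
    return grouped


# Hypothetical placeholder rows that mimic the assumed schema (not real data).
SAMPLE: list[Record] = [
    {
        "fact_id": "f1",
        "text": "placeholder fact A",
        "source_id": "a1",
        "validation": {
            "grounded": True,
            "subfacts": [
                {"text": "subfact A1", "supporting_lines": ["line 3", "line 4"]},
                {"text": "subfact A2", "supporting_lines": ["line 9"]},
            ],
        },
    },
    {
        "fact_id": "f2",
        "text": "placeholder fact B",
        "source_id": "a1",
        "validation": {"grounded": False, "subfacts": []},
    },
    {
        "fact_id": "f3",
        "text": "placeholder fact C",
        "source_id": "a2",
        "validation": {
            "grounded": True,
            "subfacts": [{"text": "subfact C1", "supporting_lines": ["line 1"]}],
        },
    },
]

kept = grounded_only(SAMPLE)
by_source = facts_by_source(kept)
```

With the `datasets` library installed, rows obtained from `load_dataset("ThaiLLM/med-facts", split="train")` are dictionaries and could be passed to the same helpers, provided the nesting matches the assumptions above.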
Provider:
ThaiLLM