
Kornimate/medical-research-clean

Source: Hugging Face. Updated 2025-11-27. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Kornimate/medical-research-clean
Description:
---
license: mit
dataset_info:
  features:
  - name: nct_id
    dtype: string
  - name: brief_title_clean
    dtype: string
  - name: brief_summary_clean
    dtype: string
  - name: detailed_description_clean
    dtype: string
  - name: eligibility_criteria_clean
    dtype: string
  - name: keywords_clean
    dtype: string
  - name: mesh_terms_clean
    dtype: string
  - name: condition_browse_module_clean
    dtype: string
  - name: intervention_browse_module_clean
    dtype: string
  - name: conditions
    list: string
  - name: interventions
    dtype: 'null'
  - name: combined_text
    dtype: string
  - name: text_len
    dtype: int64
  splits:
  - name: train
    num_bytes: 3499875492
    num_examples: 479038
  download_size: 1707293695
  dataset_size: 3499875492
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
language:
- en
---

This dataset is a modified version of the `louisbrulenaudet/clinical-trials` dataset.

The modified dataset contains the following features:

- (#1) `nct_id` is the unique ID of each study
- (#2) `brief_title_clean` is the *cleaned* version of the original `brief_title` feature, in plain text format
- (#3) `brief_summary_clean` is the *cleaned* version of the original `brief_summary` feature, in plain text format
- (#4) `detailed_description_clean` is the *cleaned* version of the original `detailed_description` feature, in plain text format
- (#5) `eligibility_criteria_clean` is the *cleaned* version of the original `eligibility_criteria` feature, in plain text format
- (#6) `keywords_clean` is the *normalized* version of the original `keywords` feature, in plain text format
- (#7) `mesh_terms_clean` is the *cleaned* version of the original `mesh_terms` feature, in plain text format
- (#8) `condition_browse_module_clean` is the *cleaned* version of the original `condition_browse_module` feature, in plain text format
- (#9) `intervention_browse_module_clean` is the *cleaned* version of the original `intervention_browse_module` feature, in plain text format
- (#10) `conditions` is the *cleaned* version of the original `conditions` feature, in plain text format
- (#11) `interventions` is the *cleaned* version of the original `interventions` feature, in plain text format
- (#12) `combined_text` is the concatenation of **#1 - #8**, with stopwords removed and the text lemmatized
- (#13) `text_len` is the text length of **#12**

The term *cleaned* means the following transformations:

- if the feature was originally plain text: HTML tags removed, text casefolded, trailing whitespace removed
- if the feature was originally structured text: filtered for the specific keys `["meshes","browseLeaves","browseBranches","ancestors","conditions","interventions"]`, filtered for only existing and valid terms, text content lowercased and stripped/trimmed, then rejoined with a space delimiter

The term *normalized* means that the content was split at whitespace, casefolded, trimmed/stripped, and rejoined with a space delimiter.
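
The *cleaned* and *normalized* transformations described above can be sketched in a few lines of Python. This is an illustrative approximation, not the code used to build the dataset. The function names, the regex-based tag stripping, the reading of "existing and valid terms" as non-empty strings, and the `term` field name on structured items are all assumptions made for this example.

```python
import re

# Assumption: a simple regex is enough to drop HTML tags for illustration.
_TAG_RE = re.compile(r"<[^>]+>")

# Keys named in the dataset card for structured features.
STRUCTURED_KEYS = (
    "meshes",
    "browseLeaves",
    "browseBranches",
    "ancestors",
    "conditions",
    "interventions",
)


def clean_plain_text(text: str) -> str:
    """Remove HTML tags, casefold, and strip trailing whitespace."""
    without_tags = _TAG_RE.sub("", text)
    return without_tags.casefold().rstrip()


def normalize_keywords(text: str) -> str:
    """Split at whitespace, casefold and strip each token, rejoin with spaces."""
    return " ".join(token.casefold().strip() for token in text.split())


def clean_structured(record: dict) -> str:
    """Keep terms under the listed keys, lowercase and trim them, join with spaces."""
    terms = []
    for key in STRUCTURED_KEYS:
        for item in record.get(key) or []:
            # Hypothetical layout: an item is a plain string or a dict
            # carrying its text under a "term" field.
            term = item.get("term") if isinstance(item, dict) else item
            # Assumed meaning of "existing and valid": a non-empty string.
            if isinstance(term, str) and term.strip():
                terms.append(term.lower().strip())
    return " ".join(terms)
```

For example, `clean_plain_text("<p>Phase II <b>Trial</b></p>  ")` yields `"phase ii trial"`, and `normalize_keywords("  Diabetes   Mellitus\tTYPE 2 ")` yields `"diabetes mellitus type 2"`.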
Provider:
Kornimate