
Nason/clinicaltrials-database

Hugging Face · Updated 2026-03-17 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Nason/clinicaltrials-database
Resource description:
---
license: cc0-1.0
task_categories:
- tabular-classification
- tabular-regression
tags:
- clinical-trials
- clinicaltrials-gov
- aact
- duckdb
- medical
- health
- government-data
pretty_name: ClinicalTrials.gov Database
size_categories:
- 10M<n<100M
---

# ClinicalTrials.gov Database (AACT)

A clean, queryable DuckDB database built from the [AACT flat files](https://aact.ctti-clinicaltrials.org/), the most comprehensive structured export of ClinicalTrials.gov data.

**56,397,018 rows** across **48 tables** covering every registered clinical trial.

Built with [clinicaltrials-database](https://github.com/ian-nason/clinicaltrials-database).

## Quick Start

### DuckDB CLI

```sql
INSTALL httpfs;
LOAD httpfs;
ATTACH 'https://huggingface.co/datasets/Nason/clinicaltrials-database/resolve/main/clinicaltrials.duckdb' AS ct (READ_ONLY);

-- Trials by overall status
SELECT overall_status, COUNT(*) AS n
FROM ct.studies
GROUP BY overall_status
ORDER BY n DESC;
```

### Python

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")
con.sql("""
    ATTACH 'https://huggingface.co/datasets/Nason/clinicaltrials-database/resolve/main/clinicaltrials.duckdb'
    AS ct (READ_ONLY)
""")
con.sql("SELECT * FROM ct.studies LIMIT 5").show()
```

DuckDB uses HTTP range requests, so only the pages needed for your query are downloaded.

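
The Quick Start steps can be wrapped in two small helpers, for example to count trials by phase. This is a minimal sketch, not part of the project: `connect_remote` and `trials_by_phase` are hypothetical names, and the `phase` column is assumed from the AACT schema (the README only says that `studies` records a phase). The query helper needs nothing more than a connection object with an `execute()` method.

```python
# Sketch only: the `phase` column name is an assumption taken from the AACT schema.
DB_URL = (
    "https://huggingface.co/datasets/Nason/clinicaltrials-database"
    "/resolve/main/clinicaltrials.duckdb"
)


def connect_remote(url=DB_URL):
    """Open an in-memory DuckDB session with the remote file attached as `ct`."""
    import duckdb  # imported lazily so trials_by_phase() can be used without it

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # Read-only attach over HTTPS, as in the Quick Start above.
    con.execute(f"ATTACH '{url}' AS ct (READ_ONLY)")
    return con


def trials_by_phase(con):
    """Return (phase, count) rows, largest group first."""
    sql = """
        SELECT phase, COUNT(*) AS n
        FROM ct.studies
        GROUP BY phase
        ORDER BY n DESC, phase
    """
    return con.execute(sql).fetchall()


# Usage (needs network access and the duckdb package):
#   con = connect_remote()
#   for phase, n in trials_by_phase(con):
#       print(phase, n)
```

Keeping the attach step separate from the query means the same connection can be reused for several queries, so the remote file is attached only once per session.
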
## Tables

| Table | Description | Rows |
|-------|-------------|------|
| `reported_events` | Summary of reported adverse events | 11,446,277 |
| `outcome_measurements` | Summary data for outcome measures by study group | 4,777,352 |
| `browse_conditions` | NLM-generated MeSH terms for study conditions | 4,194,046 |
| `design_outcomes` | Planned outcome measures and observations | 3,566,612 |
| `facilities` | Facility names, addresses, and recruiting status | 3,410,973 |
| `baseline_measurements` | Demographic and baseline measures by group | 2,808,533 |
| `browse_interventions` | NLM-generated MeSH terms for study interventions | 2,480,003 |
| `result_groups` | Aggregate list of group titles/descriptions for results reporting | 2,120,694 |
| `outcome_counts` | Sample sizes for each outcome by study group | 1,537,817 |
| `keywords` | Investigator-provided keywords describing the study | 1,523,516 |
| `design_group_interventions` | Cross-reference mapping design groups to interventions | 1,283,125 |
| `study_references` | Citations to publications related to the study | 1,078,399 |
| `design_groups` | Protocol-specified groups or cohorts assigned to interventions | 1,056,031 |
| `conditions` | Disease or condition names studied in each trial | 1,025,650 |
| `interventions` | Interventions or exposures: drugs, devices, procedures, vaccines | 973,952 |
| `sponsors` | Sponsor and collaborator names and types | 920,618 |
| `milestones` | Participant progress through each stage of the study | 855,865 |
| `countries` | Countries where the study has sites | 783,722 |
| `id_information` | Identifiers other than NCT ID (org study IDs, secondary IDs) | 748,312 |
| `outcomes` | Outcome measure descriptions and time frames | 638,016 |
| `outcome_analysis_groups` | Groups involved in each outcome analysis | 609,509 |
| `drop_withdrawals` | Summary of participant withdrawals: counts and reasons | 577,923 |
| `reported_event_totals` | Totals of reported adverse events by category | 577,536 |
| `calculated_values` | AACT-computed fields: months to report results, facilities count, etc. | 576,029 |
| `studies` | Core study record: title, status, phase, type, dates, enrollment, and regulatory info | 576,029 |
| `brief_summaries` | Brief text summary of the study protocol | 575,064 |
| `detailed_descriptions` | Detailed text description of the study protocol | 575,064 |
| `eligibilities` | Participant eligibility criteria (inclusion/exclusion, age, sex) | 575,064 |
| `designs` | Study design details: allocation, masking, assignment, purpose | 571,286 |
| `responsible_parties` | Parties responsible for submitting study information | 557,553 |
| `overall_officials` | People responsible for overall scientific leadership | 519,601 |
| `search_term_results` | | 494,618 |
| `intervention_other_names` | Synonymous names for interventions | 470,456 |
| `facility_contacts` | Contact information at each study facility | 409,886 |
| `outcome_analyses` | Statistical analyses performed on outcomes | 315,026 |
| `baseline_counts` | Sample sizes at baseline for each study group | 230,199 |
| `facility_investigators` | Investigator names at each study facility | 223,509 |
| `central_contacts` | Primary and backup contacts for enrollment questions | 217,960 |
| `ipd_information_types` | Individual participant data sharing information types | 87,849 |
| `provided_documents` | Protocol, SAP, and informed consent form documents | 78,018 |
| `result_contacts` | Points of contact for scientific information about results | 76,963 |
| `result_agreements` | Agreements between sponsor and principal investigators about results | 76,963 |
| `participant_flows` | Recruitment and pre-assignment details | 76,963 |
| `links` | Web links relevant to the study | 74,921 |
| `pending_results` | Events related to submission of results for QC review | 32,304 |
| `documents` | Full study protocol and statistical analysis plan | 10,692 |
| `retractions` | Retraction notices for study results or publications | 334 |
| `search_terms` | | 186 |

## Data Source

[AACT (Aggregate Analysis of ClinicalTrials.gov)](https://aact.ctti-clinicaltrials.org/), maintained by the Clinical Trials Transformation Initiative (CTTI). AACT is updated daily from ClinicalTrials.gov. This is public domain U.S. government data.

## License

Database build code: MIT. Underlying data: public domain (U.S. government work).

## GitHub

Full source code, build instructions, and data dictionary: [github.com/ian-nason/clinicaltrials-database](https://github.com/ian-nason/clinicaltrials-database)
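
Because the tables listed above describe the same trials, they can be combined in a single query. The sketch below joins `studies` to `conditions` to list the most common conditions among trials with a given status. It rests on assumptions not stated in this README: that both tables share an `nct_id` key and that `conditions` has a `name` column (as in the AACT schema), and `top_conditions` is a hypothetical helper name. Status values are compared case-insensitively because their exact spelling in this build is not documented here.

```python
def top_conditions(con, status="RECRUITING", limit=10):
    """Return (condition name, trial count) rows for trials with the given status.

    `con` is a connection with the database attached as `ct`.
    Assumed columns: studies.nct_id, studies.overall_status,
    conditions.nct_id, conditions.name.
    """
    sql = """
        SELECT c.name, COUNT(DISTINCT s.nct_id) AS n_trials
        FROM ct.studies AS s
        JOIN ct.conditions AS c ON c.nct_id = s.nct_id
        WHERE upper(s.overall_status) = upper(?)
        GROUP BY c.name
        ORDER BY n_trials DESC, c.name
        LIMIT ?
    """
    # Bound parameters keep the status and limit values out of the SQL text.
    return con.execute(sql, [status, int(limit)]).fetchall()


# Usage (needs network access and the duckdb package):
#   import duckdb
#   con = duckdb.connect()
#   con.execute("INSTALL httpfs")
#   con.execute("LOAD httpfs")
#   con.execute(
#       "ATTACH 'https://huggingface.co/datasets/Nason/clinicaltrials-database"
#       "/resolve/main/clinicaltrials.duckdb' AS ct (READ_ONLY)"
#   )
#   for name, n_trials in top_conditions(con, limit=5):
#       print(name, n_trials)
```

`COUNT(DISTINCT s.nct_id)` counts each trial once per condition even if a condition name were repeated for the same trial.
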
Provided by:
Nason