claritystorm/fda-faers-drug-adverse-events
Source: Hugging Face. Updated 2026-03-31; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/claritystorm/fda-faers-drug-adverse-events
---
license: other
license_name: public-domain
task_categories:
- tabular-classification
- tabular-regression
tags:
- healthcare
- pharmacovigilance
- drug-safety
- fda
- adverse-events
- united-states
pretty_name: FDA FAERS Drug Adverse Events 2023
size_categories:
- 1M<n<10M
---
# FDA FAERS Drug Adverse Events 2023
FDA Adverse Event Reporting System (FAERS) data for 2023 — cleaned, deduplicated,
and structured for pharmacovigilance and drug safety research.
1.5M+ adverse event reports across 7 relational tables, with normalised drug names
and MedDRA preferred reaction terms.
**This repository contains a 1,000-row sample of the Demographics table (Public Domain).**
Full dataset (all 7 tables in CSV + Parquet) available at
[claritystorm.com/datasets/fda-faers](https://claritystorm.com/datasets/fda-faers).
## Quick Start
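```python
import pandas as pd
from datasets import load_dataset

# Option 1: load the 1,000-row sample from the Hugging Face Hub.
ds = load_dataset("claritystorm/fda-faers-drug-adverse-events")
# Option 2: read the sample CSV with pandas. This assumes sample_1000.csv
# has already been downloaded into the working directory.
# Identifiers are read as strings to match the schema below.
df = pd.read_csv("sample_1000.csv", dtype={"primaryid": str, "caseid": str})

print(df["sex_label"].value_counts())
print(df["age_years"].describe())
```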
## Schema (DEMO table, selected fields)
| Field | Type | Description |
|-------|------|-------------|
| primaryid | string | Unique report identifier |
| caseid | string | Case ID (groups versions of same case) |
| caseversion | int | Case version (deduped: latest version kept) |
| fda_dt | string | FDA receipt date (YYYY-MM-DD) |
| rept_dt | string | Report date (YYYY-MM-DD) |
| age_years | float | Patient age in years (normalised) |
| sex_label | string | Male / Female / Unknown |
| reporter_type | string | Physician / Consumer / Pharmacist / etc. |
| report_type | string | Expedited / Periodic / Direct / Voluntary |
| wt_kg | float | Patient weight in kg (normalised) |
| _quarter | string | Source quarter (e.g. 2023Q1) |
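The `caseversion` rule in the schema (latest version kept per case) can be illustrated with a minimal pandas sketch. The rows below are hypothetical and only shaped like the DEMO schema; this is one plausible way to reproduce the rule, not necessarily the procedure used to build the dataset.

```python
import io
import pandas as pd

# Hypothetical rows shaped like the DEMO schema (not real FAERS data).
raw_csv = io.StringIO(
    "primaryid,caseid,caseversion,fda_dt,age_years\n"
    "1000011,100001,1,2023-01-10,54.0\n"
    "1000012,100001,2,2023-02-03,54.0\n"
    "2000011,200001,1,2023-03-15,\n"
)

# Read identifiers as strings, as the schema lists them, and parse the date column.
demo = pd.read_csv(
    raw_csv,
    dtype={"primaryid": str, "caseid": str},
    parse_dates=["fda_dt"],
)

# Keep only the row with the highest caseversion for each caseid.
latest = (
    demo.sort_values(["caseid", "caseversion"])
    .drop_duplicates(subset="caseid", keep="last")
    .reset_index(drop=True)
)

print(latest[["primaryid", "caseid", "caseversion"]])
```

Sorting before `drop_duplicates(keep="last")` makes the retained row depend on `caseversion` rather than on file order.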
## Tables in Full Dataset
- **demo** — 1.5M+ rows: one row per deduplicated adverse event report
- **drug** — 7.4M+ rows: drugs involved in each report
- **reac** — 5.8M+ rows: MedDRA preferred reaction terms per report
- **outc** — 1.2M+ rows: serious outcome codes per report (Death, Hospitalization, etc.)
- **ther** — 2.6M+ rows: drug therapy start/end dates per report
- **indi** — 4.5M+ rows: drug indication (reason for use) per report
- **rpsr** — 52K+ rows: report source per report
All tables join on `primaryid`.
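As a rough sketch of such a join, the example below merges tiny hypothetical stand-ins for the `demo`, `reac`, and `outc` tables. Only `primaryid` is confirmed by this card; the column names `pt` and `outc_cod` and the code `HO` follow raw FAERS naming and are assumptions about the processed files.

```python
import pandas as pd

# Hypothetical miniature tables. Only primaryid is confirmed by the card;
# "pt" and "outc_cod" are assumed column names borrowed from raw FAERS files.
demo = pd.DataFrame({
    "primaryid": ["1000012", "2000011", "3000011"],
    "sex_label": ["Female", "Male", "Unknown"],
})
reac = pd.DataFrame({
    "primaryid": ["1000012", "1000012", "2000011"],
    "pt": ["Nausea", "Headache", "Rash"],
})
outc = pd.DataFrame({
    "primaryid": ["2000011"],
    "outc_cod": ["HO"],  # assumed code for hospitalization
})

# One report can list many reactions, so this is a one-to-many join.
# validate raises an error if primaryid is not unique on the demo side.
demo_reac = demo.merge(reac, on="primaryid", how="left", validate="one_to_many")

# Flag reports that have at least one row in the outcome table.
demo_reac["has_serious_outcome"] = demo_reac["primaryid"].isin(outc["primaryid"])

print(demo_reac)
```

A left join keeps reports with no reaction rows (their `pt` is missing), which matters when counting reports rather than reactions.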
## Source
US Food and Drug Administration (FDA), Adverse Event Reporting System (FAERS).
FDA FAERS data is a US federal government work in the **public domain** (17 U.S.C. 105).
Processed by [ClarityStorm Data](https://claritystorm.com).
Provider: claritystorm



