Hospitalized Acute Pancreatitis 24‑Hour Admission Laboratory and Outcome Dataset (2019–2023, Dongyang, China)
Source: DataCite Commons | Updated: 2025-12-05 | Indexed: 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=7a58cb68d4164bfe93167efdda4b3800
Description:
This dataset contains de-identified clinical and laboratory data from adult patients hospitalized with acute pancreatitis at the Department of Gastroenterology, Affiliated Dongyang Hospital, Wenzhou Medical University (Dongyang, Zhejiang Province, China). The data were collected retrospectively from the hospital information system for all consecutive admissions between January 1, 2019 and December 31, 2023, and were used to develop and evaluate a 24‑hour four‑dimensional composite index for early risk stratification. The temporal coverage of the dataset is 5 years (2019–2023). All patients were treated in a single tertiary hospital in Dongyang, China; to protect patient privacy, no finer spatial resolution (such as ward-level location or geographic coordinates) is provided.

The core data file is a table with 438 patient-level records (rows) and approximately 40–60 variables (columns), depending on the final version. Each row corresponds to one hospitalization episode fulfilling the diagnostic criteria for acute pancreatitis in adults (age ≥ 18 years). Columns include: (1) basic demographic variables (e.g., age in years, sex as male/female); (2) lifestyle and comorbidity variables (e.g., smoking and alcohol status as yes/no, history of hypertension and diabetes as yes/no); (3) routine laboratory measurements taken within 24 hours of admission, including inflammatory markers [procalcitonin (PCT, ng/mL), C-reactive protein (CRP, mg/L), white blood cell count (WBC, ×10^9/L)], coagulation indices [D-dimer, μg/mL fibrinogen-equivalent units (FEU)], oxygenation and perfusion parameters [arterial lactate, mmol/L; partial pressure of arterial oxygen PaO₂, mmHg; partial pressure of arterial carbon dioxide PaCO₂, mmHg; fraction of inspired oxygen FiO₂, unitless fraction], and electrolyte–metabolic indices [serum calcium, mmol/L; serum magnesium, mmol/L; serum phosphate, mmol/L; hemoglobin, g/L].
(4) Derived variables and scores, including three conventional severity scores [Modified Marshall score, SOFA score, SIRS criteria count] and the four-dimensional composite indices [Inflammation, Hypoxia/Low perfusion (HLO), Electrolyte–Metabolic Disturbance Index (EDI), and log‑transformed standardized D-dimer (log1p_Ddimer_z)]. (5) Outcome variables, including a composite adverse outcome indicator (yes/no) capturing organ failure (defined by the Modified Marshall scoring system), severe acute pancreatitis according to the Revised Atlanta Classification, ICU admission, infectious complications, need for interventional procedures (e.g., percutaneous drainage, surgery), venous thromboembolism events (deep vein thrombosis, pulmonary embolism), and in-hospital death, as well as length of hospital stay in days.

Data processing followed a prespecified pipeline designed to support reproducible prognostic modeling. First, raw data were extracted from the electronic hospital information system and de-identified (removal of names, IDs, exact dates, bed numbers, and any other direct identifiers). Second, data cleaning included range checks, unit harmonization (e.g., ensuring all D-dimer values are in μg/mL FEU and all electrolytes in mmol/L), and logical consistency checks between variables (e.g., outcome dates versus admission dates). Third, deficit variables for electrolytes were constructed by subtracting the observed value from the lower limit of normal (LLN) and flooring the result at zero: ΔCa = max(0, LLN_Ca − Ca) with LLN_Ca = 2.02 mmol/L; ΔMg = max(0, LLN_Mg − Mg) with LLN_Mg = 0.67 mmol/L; ΔP = max(0, LLN_P − P) with LLN_P = 0.96 mmol/L. Fourth, to reduce the influence of extreme values and accommodate zero measurements, log(1 + x) transformations were applied to PCT, CRP, lactate, and D-dimer. All continuous variables used in the composite indices were then standardized as z‑scores computed from the study sample.
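The deficit, log(1 + x), and z‑score steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function names and example values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the description does not state which denominator was used. Only the LLN thresholds and the formulas come from the description.

```python
import math
import statistics

# Lower limits of normal (mmol/L), as stated in the dataset description.
LLN = {"Ca": 2.02, "Mg": 0.67, "P": 0.96}


def deficit(value, lln):
    """Deficit below the lower limit of normal: max(0, LLN - value)."""
    return max(0.0, lln - value)


def log1p_values(values):
    """Apply log(1 + x), which is defined at x = 0 and compresses large values."""
    return [math.log1p(v) for v in values]


def zscores(values):
    """Standardize values against their own mean and standard deviation.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical serum calcium values (mmol/L), for illustration only.
calcium = [2.30, 1.90, 2.02, 1.70]
delta_ca = [deficit(v, LLN["Ca"]) for v in calcium]

# Hypothetical D-dimer values (ug/mL FEU), for illustration only.
d_dimer = [0.0, 0.5, 2.0, 12.0]
log1p_ddimer_z = zscores(log1p_values(d_dimer))
```

Values at or above the LLN yield a deficit of exactly zero, so the deficit variables only carry information for patients below the threshold.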
The four-dimensional composites were constructed as follows: Inflammation = log(1+PCT)_z + log(1+CRP)_z + log(1+WBC)_z; Coagulation = log1p_Ddimer_z; HLO = −(Hemoglobin)_z + (log1p_Lactate)_z + (PaCO₂)_z − (PaO₂/FiO₂)_z; EDI = ΔCa_z + ΔMg_z + ΔP_z. These derived variables are provided in the dataset alongside the raw or minimally processed measurements so that users can either reuse the existing derivations or reconstruct them independently.

Missing data were present for some laboratory parameters and clinical variables due to the retrospective nature of data collection and differences in ordering patterns by clinicians. In the original analysis presented in the accompanying manuscript, k‑nearest neighbors (KNN) imputation was applied to continuous variables and mode imputation to categorical variables for model fitting, while descriptive statistics were reported on the available raw values. In this shared dataset, we provide both the pre‑imputation raw variables and, where appropriate, the imputed and transformed variables used for modeling. Users should consult the accompanying README file and data dictionary for detailed information on variable-level missingness patterns, coding schemes (including −99 or NA as missing indicators where applicable), and recommendations for handling missing data according to their own research questions.

As this dataset is based on real-world clinical measurements, it inevitably contains measurement error and variability arising from laboratory assays, arterial blood gas sampling, and routine clinical practice (e.g., timing of blood draws relative to admission, oxygen therapy titration affecting PaO₂ and FiO₂). No additional experimental calibration beyond standard hospital quality control was performed. These sources of error are discussed in the associated research article as potential contributors to statistical noise; they should be taken into account when interpreting fine-grained parameter estimates.
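For users who want to reconstruct the four composites independently, the definitions given earlier in this description can be sketched as follows. This is a hedged illustration rather than the authors' code: the column keys (`pct`, `crp`, `delta_ca`, and so on), the example values, and the sample standard deviation used for z‑scores are all assumptions, and the sketch expects complete (non-missing) inputs. It follows the composite formulas as written, including the log(1 + x) transform on WBC.

```python
import math
import statistics


def z(values):
    """z-score against the sample (assumption: n - 1 denominator)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def log1p_z(values):
    """log(1 + x) transform followed by z-scoring."""
    return z([math.log1p(v) for v in values])


def composites(cols):
    """Build the four composites from equal-length, complete columns.

    `cols` maps hypothetical column keys to lists of floats.
    """
    pf_ratio = [po2 / fio2 for po2, fio2 in zip(cols["pao2"], cols["fio2"])]
    pct_z, crp_z, wbc_z = (log1p_z(cols[k]) for k in ("pct", "crp", "wbc"))
    hb_z, lac_z = z(cols["hemoglobin"]), log1p_z(cols["lactate"])
    paco2_z, pf_z = z(cols["paco2"]), z(pf_ratio)
    dca_z, dmg_z, dp_z = (z(cols[k]) for k in ("delta_ca", "delta_mg", "delta_p"))
    n = len(pf_ratio)
    return {
        "Inflammation": [pct_z[i] + crp_z[i] + wbc_z[i] for i in range(n)],
        "Coagulation": log1p_z(cols["d_dimer"]),
        "HLO": [-hb_z[i] + lac_z[i] + paco2_z[i] - pf_z[i] for i in range(n)],
        "EDI": [dca_z[i] + dmg_z[i] + dp_z[i] for i in range(n)],
    }


# Four hypothetical patients, ordered from mildest to most severe.
example = {
    "pct": [0.1, 0.5, 2.0, 10.0],
    "crp": [10.0, 50.0, 120.0, 250.0],
    "wbc": [7.0, 11.0, 14.0, 20.0],
    "d_dimer": [0.3, 1.0, 2.5, 8.0],
    "hemoglobin": [150.0, 135.0, 120.0, 95.0],
    "lactate": [1.0, 1.5, 2.5, 4.5],
    "paco2": [38.0, 40.0, 44.0, 50.0],
    "pao2": [95.0, 85.0, 70.0, 60.0],
    "fio2": [0.21, 0.21, 0.30, 0.40],
    "delta_ca": [0.0, 0.05, 0.12, 0.32],
    "delta_mg": [0.0, 0.0, 0.05, 0.10],
    "delta_p": [0.0, 0.10, 0.20, 0.40],
}
result = composites(example)
```

In this sketch, higher values of every composite indicate a more abnormal profile: hemoglobin and the PaO₂/FiO₂ ratio enter HLO with a negative sign because lower values are worse. A column with zero variance would make the z‑score undefined, so real data would need a guard for that case.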
Internal consistency checks (e.g., physiologic plausibility ranges, cross-checks between related variables) were nevertheless performed to minimize gross data entry errors.

The dataset is provided primarily in widely used tabular formats (e.g., CSV and/or Excel .xlsx files) that can be opened with common software such as Microsoft Excel, LibreOffice Calc, and statistical packages (R, Python, Stata, SPSS). The main file AP_24h_composite_raw.csv (or .xlsx) contains the raw and basic derived variables for 438 patients. Optional supplementary files include: (1) AP_24h_composite_imputed.csv, which contains the preprocessed and imputed dataset used for model fitting; (2) variable_dictionary.pdf or .xlsx, which describes each variable name, definition, unit, type (continuous/categorical), allowed range, and missing-value coding; and (3) analysis_code.R, which provides example R code for data preprocessing and replication of key models. No specialized or proprietary software is required beyond standard data analysis tools.
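As a rough sketch of how the raw CSV might be read while honoring the missing-value codes mentioned above (−99 or NA), the following uses only the Python standard library. The column names and the inline sample rows are hypothetical stand-ins; the actual names and coding per variable should be taken from the data dictionary. Treating empty cells as missing is also an assumption.

```python
import csv
import io

# Text codes treated as missing; the empty string is an assumption.
MISSING_TEXT = {"", "NA"}
# Numeric missing code mentioned in the description.
MISSING_NUMERIC = -99.0


def parse_value(text):
    """Return None for a missing code, a float for numbers, else the raw text."""
    text = (text or "").strip()
    if text in MISSING_TEXT:
        return None
    try:
        number = float(text)
    except ValueError:
        return text  # categorical value such as "male" or "yes"
    return None if number == MISSING_NUMERIC else number


def read_table(handle):
    """Read a CSV file object into a list of dicts with parsed values."""
    return [
        {name: parse_value(value) for name, value in row.items()}
        for row in csv.DictReader(handle)
    ]


def missing_fraction(rows, column):
    """Share of rows in which `column` is missing."""
    return sum(row[column] is None for row in rows) / len(rows)


# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "age,sex,pct,lactate\n"
    "54,male,0.8,1.9\n"
    "67,female,-99,NA\n"
    "41,male,,2.4\n"
)
rows = read_table(sample)
pct_missing = missing_fraction(rows, "pct")
```

To apply the same functions to the distributed file, one could pass an opened file instead of the sample, for example `open("AP_24h_composite_raw.csv", newline="", encoding="utf-8")`; the encoding here is a guess and may need adjusting.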
Provider:
Science Data Bank
Created:
2025-12-05



