The Lancet Archive 1823-1930
Source: Snowflake | Updated: 2026-04-24 | Indexed: 2026-04-25
Download link:
https://app.snowflake.com/marketplace/listing/GZSXZGPW469H
Resource overview:
Complete archive of The Lancet, one of the world's oldest and most prestigious medical journals, founded in 1823 by Thomas Wakley. **851,796 rows** of clean, structured text run from the first volume through 1930.
**What this data does for your model:**
- *Without this data:* the model invents plausible-sounding 19th-century surgical techniques. *With this data:* it retrieves actual Lancet lectures on nasal polyp removal, hare-lip repair, and mercury treatments.
- *Without this data:* the model confabulates historical hospital case studies. *With this data:* it cites real case reports from St. Thomas's, Middlesex, and Guy's hospitals.
- *Without this data:* the model anachronistically substitutes modern drug names for 19th-century pharmaceuticals. *With this data:* it learns authentic Victorian-era pharmacology (calomel, opium, quinine, antimony).
- *Without this data:* the model misses the historical context of medical reform. *With this data:* it understands The Lancet's founding mission to expose medical corruption and demand accountability.
**What's inside:**
- Foundational surgical lectures and clinical teaching
- Original 19th-century medical research
- Hospital case reports (St. Thomas's, Middlesex, Guy's)
- Medical reform and political commentary
- Early chemistry and pharmacology
**Perfect for:**
- LLM fine-tuning on 19th-century medical text
- Clinical NLP and surgical terminology extraction
- History of medicine and digital humanities
- Medical education and curriculum development
- **RAG applications:** ground LLM responses in primary-source medical text from 1823–1930, for clinical decision support, medical history research, and similar retrieval-augmented generation systems.
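As a rough illustration of the RAG use case, the sketch below ranks records by a simple TF-IDF score over the TEXT column and packs the best matches into a prompt that cites each passage by ISSUE and TITLE. It assumes records are available as Python dicts keyed by the column names given in this listing; the function names are hypothetical, and the lexical scoring is a deliberately simple stand-in for the BM25 or embedding search a production RAG system would more likely use.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Toy lexical retriever (assumption): stands in for BM25 or vector search.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens; digits and punctuation are discarded."""
    return TOKEN_RE.findall(text.lower())


def build_index(records: List[Dict[str, str]]) -> Tuple[List[Counter], Dict[str, float]]:
    """Return per-record term counts over TEXT and smoothed IDF weights."""
    counts = [Counter(tokenize(r["TEXT"])) for r in records]
    n = len(records)
    df = Counter(term for c in counts for term in c)  # number of records containing each term
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return counts, idf


def retrieve(query: str, counts: List[Counter], idf: Dict[str, float], k: int = 3) -> List[int]:
    """Indices of the top-k records by summed TF-IDF of query terms (zero scores dropped)."""
    q_terms = set(tokenize(query))
    scored = []
    for i, c in enumerate(counts):
        score = sum(c[t] * idf.get(t, 0.0) for t in q_terms)
        if score > 0:
            scored.append((score, i))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best first; ties broken by record order
    return [i for _, i in scored[:k]]


def grounded_prompt(question: str, records: List[Dict[str, str]], hits: List[int]) -> str:
    """Build an LLM prompt whose context cites each retrieved passage by ISSUE and TITLE."""
    context = "\n".join(
        f"[{records[i]['ISSUE']}] {records[i]['TITLE']}: {records[i]['TEXT']}" for i in hits
    )
    return (
        "Answer using only the sources below, citing them by issue.\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Swapping in BM25 or a vector index would change only `build_index` and `retrieve`; the bracketed ISSUE citations in `grounded_prompt` are what let a generated answer point back to a specific source item.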
**Format:** Snowflake-native JSONL with columns: ISSUE, TITLE, AUTHOR, TYPE, TEXT. Fully cleaned, bias-audited, and ready for AI training.
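If the archive is exported as JSONL (one JSON object per line), a minimal streaming loader might look like the following. It assumes each object uses exactly the five column names above as keys, that ISSUE may be stored as either a number or a string, and that AUTHOR may be null; none of this is documented in the listing, and `parse_jsonl` and `LancetRecord` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Column names from the listing; their exact spelling as JSON keys is an assumption.
FIELDS = ("ISSUE", "TITLE", "AUTHOR", "TYPE", "TEXT")


@dataclass
class LancetRecord:
    issue: str
    title: str
    author: Optional[str]  # assumed nullable in case an item has no credited author
    doc_type: str
    text: str


def parse_jsonl(lines: Iterable[str]) -> Iterator[LancetRecord]:
    """Yield one LancetRecord per non-blank line; raise ValueError on a missing field."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        obj = json.loads(line)
        missing = [f for f in FIELDS if f not in obj]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        yield LancetRecord(
            issue=str(obj["ISSUE"]),  # normalize in case ISSUE is stored as a number
            title=obj["TITLE"],
            author=obj["AUTHOR"],
            doc_type=obj["TYPE"],
            text=obj["TEXT"],
        )
```

Because `parse_jsonl` is a generator, passing it an open file handle (for example `open(path, encoding="utf-8")`) streams all 851,796 rows without loading the whole file into memory.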
*From the first issue in 1823 through 1930, the journal that revolutionized medical publishing, now ready for AI.*
Provider:
Devin Media Corp.
Created:
2026-04-24
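Original listing summary
Dataset overview: The Lancet Archive 1823-1930
Basic information
- Dataset name: The Lancet Archive 1823-1930
- Provider: Devin Media Corp.
- Size: 851,796 rows (records)
- Coverage: the complete archive of The Lancet from its founding in 1823 through 1930, as clean, structured text
- Format: Snowflake-native JSONL with fields ISSUE, TITLE, AUTHOR, TYPE, TEXT
- Data quality: professionally cleaned and bias-audited; ready for AI training
Contents
- Foundational surgical lectures and clinical teaching
- Original 19th-century medical research
- Hospital case reports (St. Thomas's, Middlesex, and Guy's hospitals)
- Medical reform and political commentary
- Early chemistry and pharmacology
Use cases
- Large language model (LLM) fine-tuning: domain adaptation on 19th-century medical text
- Clinical natural language processing (NLP) and surgical terminology extraction
- History of medicine research and digital humanities
- Medical education and curriculum development
- Retrieval-augmented generation (RAG) systems: grounded in primary medical text from 1823–1930, for clinical decision support, medical history research, and educational tools
Business needs
- Machine learning: train, fine-tune, and deploy models on about 851,000 rows of curated text
- Real-world data (RWD): use historical case records for research and analysis
- Life sciences commercialization: support medical research that draws on historical literature
- Retrieval-augmented generation (RAG): build systems that retrieve and cite original medical text
Update frequency
- Annual
Legal terms
- Standard terms
Delivery method
- Secure sharing
Category tags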
- AI & ML
- Life Sciences Commercialization
- Machine Learning
- Real World Data (RWD)
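Contact
- Sales and support email: hello@devinmediacorp.com
About the provider
Devin Media Corp. specializes in high-quality historical data for AI training. Its datasets, which span medicine, finance, fashion, law, culture, and other fields, undergo professional OCR, cleaning, provenance tracking, and bias auditing, and are delivered as JSONL through a secure API.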



