
evekhm/cms_iom_3000

Hugging Face | Updated 2024-05-15 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/evekhm/cms_iom_3000
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- expert-generated
language:
- en
license:
- apache-2.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- https://www.cms.gov/medicare-coverage-database
- >-
  https://www.cms.gov/medicare/regulations-guidance/manuals/internet-only-manuals-ioms
pretty_name: CMS Coverage Documents Dataset
language_bcp47:
- en-US
tags:
- CMS
- Medicare
- NCD
- LCD
---

## Dataset Summary

This dataset contains CMS information with local and national coverage document data sets (LCD & NCD), as well as Coverage Articles and [Internet-Only Manuals (IOMs)](https://www.cms.gov/medicare/regulations-guidance/manuals/internet-only-manuals-ioms).

A list of current LCDs, NCDs and Articles is obtained from the [Medicare Coverage Database](https://www.cms.gov/medicare-coverage-database/downloads/downloads.aspx).

The data itself was obtained by scraping the URLs and extracting data from the PDF files listed in [current articles](https://downloads.cms.gov/medicare-coverage-database/downloads/exports/current_article.zip) and [current lcds](https://downloads.cms.gov/medicare-coverage-database/downloads/exports/current_lcd.zip).

## CMS Dataset with Medicare Regulations Guidance Manuals

Data is chunked using the `text-embedding-3-large` model tokenizer with chunk size=3000 and overlap=300, applying `langchain.text_splitter.RecursiveCharacterTextSplitter`.

## Licensing Information

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

## Author

@evekhm Eva Khmelinskaya
Provider:
evekhm
Summary of original information

Dataset overview

Basic information

  • Name: CMS Coverage Documents Dataset
  • Language: English (en-US)
  • License: Apache-2.0
  • Multilinguality: monolingual
  • Size: 10K<n<100K

Data sources

  • Source datasets:
    • Medicare Coverage Database (https://www.cms.gov/medicare-coverage-database)
    • Internet-Only Manuals (IOMs) (https://www.cms.gov/medicare/regulations-guidance/manuals/internet-only-manuals-ioms)

Data content

  • Contains: local and national coverage document data sets (LCD & NCD), Coverage Articles, and Internet-Only Manuals (IOMs)
  • Acquisition: by scraping URLs and extracting data from PDF files
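The dataset card does not document how the Medicare Coverage Database export archives are laid out, so a reasonable first step before any extraction is to inspect one. The following is a minimal sketch, not the author's pipeline: `list_archive_members` is a hypothetical helper that takes the raw bytes of an already downloaded zip file (for example `current_article.zip`) and reports what it contains.

```python
import io
import zipfile
from typing import List, Tuple


def list_archive_members(archive_bytes: bytes) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) pairs, sorted by name.

    Hypothetical helper: it assumes nothing about the archive layout
    and only reads the zip's table of contents.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return sorted((info.filename, info.file_size) for info in zf.infolist())


if __name__ == "__main__":
    # Build a tiny in-memory archive so the example runs without network access.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("example.csv", "id,url\n")
    for name, size in list_archive_members(buf.getvalue()):
        print(name, size)
```

Working on bytes rather than a file path keeps the helper independent of how the archive was fetched.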

Data processing

  • Chunking: uses the text-embedding-3-large model tokenizer with chunk size 3000 and overlap 300, applying langchain.text_splitter.RecursiveCharacterTextSplitter
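To illustrate what a chunk size of 3000 with an overlap of 300 means, here is a simplified sketch that uses only the standard library. It is not the author's code and it is not `RecursiveCharacterTextSplitter`: it assumes sizes are counted in tokens, uses whitespace splitting as a stand-in for the `text-embedding-3-large` tokenizer, and slides a fixed window instead of splitting recursively on separators. The names `chunk_tokens` and `chunk_text` are hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 3000, overlap: int = 300) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size tokens.

    Each window after the first starts with the last `overlap` tokens
    of the previous window.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
        start += step
    return chunks


def chunk_text(text: str, chunk_size: int = 3000, overlap: int = 300) -> List[str]:
    """Whitespace-token stand-in for a real tokenizer (an assumption)."""
    return [" ".join(c) for c in chunk_tokens(text.split(), chunk_size, overlap)]


if __name__ == "__main__":
    tokens = [str(i) for i in range(7000)]
    print([len(c) for c in chunk_tokens(tokens)])
```

With the defaults the window advances 2700 tokens at a time, so a 7000-token input yields three chunks of 3000, 3000 and 1600 tokens, and consecutive chunks share 300 tokens.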

Tags

  • CMS
  • Medicare
  • NCD
  • LCD