evekhm/cms_iom_3000
Source: Hugging Face. Updated 2024-05-15. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/evekhm/cms_iom_3000
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- expert-generated
language:
- en
license:
- apache-2.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- https://www.cms.gov/medicare-coverage-database
- >-
https://www.cms.gov/medicare/regulations-guidance/manuals/internet-only-manuals-ioms
pretty_name: CMS Coverage Documents Dataset
language_bcp47:
- en-US
tags:
- CMS
- Medicare
- NCD
- LCD
---
## Dataset Summary
This dataset contains CMS local and national coverage documents (LCDs and NCDs),
as well as coverage Articles and [Internet-Only Manuals (IOMs)](https://www.cms.gov/medicare/regulations-guidance/manuals/internet-only-manuals-ioms).
The list of current LCDs, NCDs, and Articles is obtained from the [Medicare Coverage Database](https://www.cms.gov/medicare-coverage-database/downloads/downloads.aspx).
The data itself was obtained by scraping the URLs and extracting data from the PDF files listed in [current articles](https://downloads.cms.gov/medicare-coverage-database/downloads/exports/current_article.zip) and [current lcds](https://downloads.cms.gov/medicare-coverage-database/downloads/exports/current_lcd.zip).
## CMS Dataset with Medicare Regulations Guidance Manuals
Data is chunked with `langchain.text_splitter.RecursiveCharacterTextSplitter`, using the tokenizer of the `text-embedding-3-large` model, with chunk size=3000 and overlap=300.
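To make the chunk size and overlap settings concrete, here is a minimal sketch of overlapping windows over a token sequence. It is not the pipeline used to build this dataset: `RecursiveCharacterTextSplitter` also splits recursively on separators, and the real token counts come from the model tokenizer. The function name `chunk_tokens` is hypothetical, and whitespace-separated words stand in for tokens purely for illustration.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 3000, overlap: int = 300
) -> List[List[str]]:
    """Split a token list into windows of at most `chunk_size` tokens.

    Consecutive windows share `overlap` tokens. This is a plain sliding
    window, an assumption made for illustration, not the recursive
    separator-based strategy of the LangChain splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Whitespace "tokens" as a stand-in for the model tokenizer (assumption).
    words = ("coverage " * 7000).split()
    chunks = chunk_tokens(words)
    print(len(chunks), [len(c) for c in chunks])
```

With these settings each window advances by 2700 tokens, so the last 300 tokens of one chunk reappear at the start of the next. That repetition is what lets a passage that straddles a chunk boundary still appear whole in at least one chunk.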
## Licensing Information
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
## Author
@evekhm
Eva Khmelinskaya
Provider:
evekhm
Summary of original information
Dataset overview
Basic information
- Name: CMS Coverage Documents Dataset
- Language: English (en-US)
- License: Apache-2.0
- Multilinguality: monolingual
- Size: 10K<n<100K
Data sources
- Source datasets:
  - Medicare Coverage Database (https://www.cms.gov/medicare-coverage-database)
  - Internet-Only Manuals (IOMs) (https://www.cms.gov/medicare/regulations-guidance/manuals/internet-only-manuals-ioms)
Data content
- Contains: local and national coverage documents (LCD & NCD), coverage Articles, and Internet-Only Manuals (IOMs)
- Collection method: scraping the URLs and extracting data from PDF files
Data processing
- Chunking: `text-embedding-3-large` model tokenizer, chunk size 3000, overlap 300, applying `langchain.text_splitter.RecursiveCharacterTextSplitter`
Tags
- CMS
- Medicare
- NCD
- LCD



