
Pile-USPTO

ModelScope community. Updated 2024-08-30; indexed 2024-08-31.
Download link:
https://modelscope.cn/datasets/OmniData/Pile-USPTO
Resource overview:
displayName: Pile-USPTO
license:
- MIT
taskTypes:
- Natural Language Generation
- Language Modelling
mediaTypes:
- Text
labelTypes:
- English Corpus
tags: []
publisher:
- EleutherAI
publishDate: '2023-07-18'
publishUrl: https://pile.eleuther.ai/
paperUrl: ''

---

# Data Introduction

## Overview

The Pile-USPTO dataset is part of The Pile project, a corpus for training language models. It is drawn from the background text of patents held by the USPTO (United States Patent and Trademark Office).

The dataset contains patent background text from the USPTO, including descriptions of patent applications, technical background, and related literature. These texts cover patent information from many technical fields and industries.

Pile-USPTO aims to give researchers and developers a rich source of patent text for developing and training applications in natural language processing, information retrieval, knowledge graph construction, and similar areas.

## Data Content

### Data Description

The Pile-USPTO dataset contains 18.9 GB of data.

### Data Example

```
{
  "id": "208944418",
  "source_id": "",
  "doc_id": "84712049",
  "data_type": "text",
  "data_source": "pile",
  "data_url": "enwiki-c4-pile-ccnews",
  "content": "This invention relates to disposable garments and, more particularly, to disposable hooded capes, ponchos and shirts which can be readily dispensed from a roll or a box, e.g., by pulling one unit away from another along pre-scored separating lines.\nSuch disposable and dispensable garments are well known in the art, being made from thin webs of paper or plastic materials. Such known garments include aprons, bibs, neck towels, barber sheets, cap-like head coverings, and even rain garments with arms and legs. These garments are generally formed in a single layer of material, sometimes folded over on itself, or in a two-layer shell formed either from a flattened tube of material or from two layers sealed along opposite outside edges. Scored or perforated lines are rendable to separate portions of the material to form openings and tie straps. \nIn the case of the rain garment referred to above, a two-layered shell of material is sealed along both sides of a plurality of scored lines, permitting the shell to separate along the scored lines to form the desired arms and legs of the garment.\n",
  "remark": {
    "pile_set_name": "USPTO Backgrounds"
  },
  "sub_path": "uspto-backgrounds/train"
}
```

## Citation

```
@misc{conghui2022opendatalab,
  title={OpenDataLab: Empowering General Artificial Intelligence with Open Datasets},
  author={Conghui He, Wei Li, Zhenjiang Jin, Bin Wang, Chao Xu, Dahua Lin},
  journal={https://opendatalab.com/},
  year={2022}
}
```

## Download dataset

:modelscope-code[]{type="git"}
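Once the files are downloaded, records shaped like the data example can be read with a few lines of Python. The sketch below is illustrative only: it assumes each record is stored as one JSON object per line (JSON Lines), which this page does not state, and the helper name `iter_backgrounds` and the abbreviated sample record are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_backgrounds(
    lines: Iterable[str], subset: str = "USPTO Backgrounds"
) -> Iterator[dict]:
    """Yield parsed records whose remark.pile_set_name equals `subset`.

    Assumption: every non-blank line holds one complete JSON object.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("remark", {}).get("pile_set_name") == subset:
            yield record


# Abbreviated stand-in for one line of a downloaded file; the field names
# follow the data example, the content text is shortened for illustration.
sample_line = json.dumps({
    "id": "208944418",
    "doc_id": "84712049",
    "data_type": "text",
    "content": "This invention relates to disposable garments ...\n",
    "remark": {"pile_set_name": "USPTO Backgrounds"},
    "sub_path": "uspto-backgrounds/train",
})

records = list(iter_backgrounds([sample_line, ""]))
for record in records:
    # Report the record id and a rough whitespace-delimited word count
    print(record["id"], len(record["content"].split()), "words")
```

To read a real file, pass an open file handle instead of the list, for example `iter_backgrounds(open(path, encoding="utf-8"))`, where `path` points at a downloaded file. Because the generator consumes one line at a time, the full 18.9 GB never needs to be held in memory.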

Provider:
maas
Created:
2024-07-14