KaiserML/Techie_Filtered_Meta
Source: Hugging Face. Updated 2023-11-18; indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/KaiserML/Techie_Filtered_Meta
Resource description:
---
dataset_info:
  features:
  - name: lens_id
    dtype: string
  - name: title
    dtype: string
  - name: publication_type
    dtype: string
  - name: year_published
    dtype: float32
  - name: date_published
    dtype: string
  - name: date_published_parts
    sequence: int64
  - name: created
    dtype: string
  - name: external_ids
    list:
    - name: type
      dtype: string
    - name: value
      dtype: string
  - name: open_access
    struct:
    - name: colour
      dtype: string
    - name: license
      dtype: string
  - name: authors
    list:
    - name: affiliations
      list:
      - name: country_code
        dtype: string
      - name: grid_id
        dtype: string
      - name: ids
        list:
        - name: type
          dtype: string
        - name: value
          dtype: string
      - name: name
        dtype: string
      - name: name_original
        dtype: string
    - name: collective_name
      dtype: string
    - name: first_name
      dtype: string
    - name: ids
      list:
      - name: type
        dtype: string
      - name: value
        dtype: string
    - name: initials
      dtype: string
    - name: last_name
      dtype: string
  - name: source
    struct:
    - name: asjc_codes
      sequence: string
    - name: asjc_subjects
      sequence: string
    - name: country
      dtype: string
    - name: issn
      list:
      - name: type
        dtype: string
      - name: value
        dtype: string
    - name: publisher
      dtype: string
    - name: title
      dtype: string
    - name: type
      dtype: string
  - name: fields_of_study
    sequence: string
  - name: languages
    sequence: string
  - name: start_page
    dtype: string
  - name: end_page
    dtype: string
  - name: author_count
    dtype: float64
  - name: is_open_access
    dtype: bool
  - name: source_urls
    list:
    - name: type
      dtype: string
    - name: url
      dtype: string
  - name: abstract
    dtype: string
  - name: references
    list:
    - name: lens_id
      dtype: string
  - name: references_count
    dtype: float64
  - name: scholarly_citations_count
    dtype: float64
  - name: scholarly_citations
    sequence: string
  - name: patent_citations
    list:
    - name: lens_id
      dtype: string
  - name: patent_citations_count
    dtype: float64
  - name: issue
    dtype: string
  - name: publication_supplementary_type
    sequence: string
  - name: volume
    dtype: string
  - name: conference
    struct:
    - name: instance
      dtype: string
    - name: location
      dtype: string
    - name: name
      dtype: string
  - name: mesh_terms
    list:
    - name: mesh_heading
      dtype: string
    - name: mesh_id
      dtype: string
    - name: qualifier_id
      dtype: string
    - name: qualifier_name
      dtype: string
  - name: chemicals
    list:
    - name: mesh_id
      dtype: string
    - name: registry_number
      dtype: string
    - name: substance_name
      dtype: string
  - name: keywords
    sequence: string
  - name: funding
    list:
    - name: country
      dtype: string
    - name: funding_id
      dtype: string
    - name: org
      dtype: string
  - name: clinical_trials
    list:
    - name: id
      dtype: string
    - name: registry
      dtype: string
  - name: pdf_urls
    sequence: string
  - name: domain
    sequence: string
  splits:
  - name: train
    num_bytes: 726171677.8835785
    num_examples: 423860
  download_size: 619298847
  dataset_size: 726171677.8835785
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
Provider:
KaiserML
Summary of original information
Dataset features
- lens_id: string
- title: string
- publication_type: string
- year_published: float
- date_published: string
- date_published_parts: sequence of integers
- created: string
- external_ids: list of:
  - type: string
  - value: string
- open_access: struct containing:
  - colour: string
  - license: string
- authors: list of:
  - affiliations: list of:
    - country_code: string
    - grid_id: string
    - ids: list of:
      - type: string
      - value: string
    - name: string
    - name_original: string
  - collective_name: string
  - first_name: string
  - ids: list of:
    - type: string
    - value: string
  - initials: string
  - last_name: string
- source: struct containing:
  - asjc_codes: sequence of strings
  - asjc_subjects: sequence of strings
  - country: string
  - issn: list of:
    - type: string
    - value: string
  - publisher: string
  - title: string
  - type: string
- fields_of_study: sequence of strings
- languages: sequence of strings
- start_page: string
- end_page: string
- author_count: float
- is_open_access: boolean
- source_urls: list of:
  - type: string
  - url: string
- abstract: string
- references: list of:
  - lens_id: string
- references_count: float
- scholarly_citations_count: float
- scholarly_citations: sequence of strings
- patent_citations: list of:
  - lens_id: string
- patent_citations_count: float
- issue: string
- publication_supplementary_type: sequence of strings
- volume: string
- conference: struct containing:
  - instance: string
  - location: string
  - name: string
- mesh_terms: list of:
  - mesh_heading: string
  - mesh_id: string
  - qualifier_id: string
  - qualifier_name: string
- chemicals: list of:
  - mesh_id: string
  - registry_number: string
  - substance_name: string
- keywords: sequence of strings
- funding: list of:
  - country: string
  - funding_id: string
  - org: string
- clinical_trials: list of:
  - id: string
  - registry: string
- pdf_urls: sequence of strings
- domain: sequence of strings
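Because several features are nested lists of structs, a few small accessors can make individual records easier to work with. The following is a minimal sketch, assuming a record arrives as a plain Python dict shaped like the feature list above. The helper names, the sample record, and the `"doi"` type label are hypothetical illustrations, not values taken from the dataset. The float-to-int helper assumes that the float-typed count and year columns may hold missing values, which the listing itself does not confirm.

```python
import math
from typing import Any, Dict, List, Optional


def first_external_id(record: Dict[str, Any], id_type: str) -> Optional[str]:
    """Return the first external_ids value whose type equals id_type, else None."""
    for entry in record.get("external_ids") or []:
        if entry.get("type") == id_type:
            return entry.get("value")
    return None


def author_names(record: Dict[str, Any]) -> List[str]:
    """Build display names, preferring collective_name when it is set."""
    names = []
    for author in record.get("authors") or []:
        if author.get("collective_name"):
            names.append(author["collective_name"])
            continue
        parts = [author.get("first_name"), author.get("last_name")]
        full = " ".join(p for p in parts if p)
        if full:
            names.append(full)
    return names


def as_int(value: Optional[float]) -> Optional[int]:
    """Convert a float-typed field (e.g. year_published) to int; None/NaN -> None."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return int(value)


# Hypothetical record for illustration only; all values are made up.
sample = {
    "lens_id": "000-000-000-000-000",
    "year_published": 2021.0,
    "external_ids": [{"type": "doi", "value": "10.0000/example"}],
    "authors": [
        {"first_name": "Ada", "last_name": "Example", "collective_name": None},
        {"first_name": None, "last_name": None, "collective_name": "Example Consortium"},
    ],
}

print(first_external_id(sample, "doi"))
print(author_names(sample))
print(as_int(sample["year_published"]))
```

Using `or []` guards against fields that are present but null, which seems likely for optional nested lists such as `authors` or `external_ids`.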
Dataset splits
- train: 423860 examples, 726171677.8835785 bytes
Dataset size
- Download size: 619298847 bytes
- Dataset size: 726171677.8835785 bytes
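The raw byte counts are easier to reason about once converted. The sketch below uses only the three figures listed above to derive sizes in MiB, the average size per example, and the ratio of download size to dataset size. The variable names are illustrative.

```python
# Figures copied from the split and size listing above.
NUM_EXAMPLES = 423860
DATASET_BYTES = 726171677.8835785
DOWNLOAD_BYTES = 619298847

MIB = 1024 ** 2  # bytes per mebibyte

dataset_mib = DATASET_BYTES / MIB
download_mib = DOWNLOAD_BYTES / MIB
bytes_per_example = DATASET_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"dataset size:      {dataset_mib:.1f} MiB")
print(f"download size:     {download_mib:.1f} MiB")
print(f"bytes per example: {bytes_per_example:.1f}")
print(f"download/dataset:  {download_ratio:.3f}")
```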
Configuration
- default: contains the training data, at path
data/train-*
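With a single `default` configuration and one `train` split, loading the data could look like the sketch below. It assumes the Hugging Face `datasets` package is installed and that the repository is reachable over the network. The function name `load_train_split` is a hypothetical wrapper, not part of any library. Streaming is chosen here so that the full download is not required just to inspect a few records.

```python
REPO_ID = "KaiserML/Techie_Filtered_Meta"  # repository name from the listing
SPLIT = "train"                            # the only split listed


def load_train_split(streaming: bool = True):
    """Load the train split; needs the `datasets` package and network access."""
    # Imported lazily so that merely defining this helper has no dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=SPLIT, streaming=streaming)
```

Calling `next(iter(load_train_split()))` would then be expected to yield one record as a dict keyed by the feature names listed earlier.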



