machelreid/m2d2

Source: Hugging Face. Updated: 2022-10-25. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/machelreid/m2d2
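Resource summary:
M2D2 is a large-scale, multi-domain language modeling dataset covering culture, the arts, health, history, mathematics, natural sciences, philosophy, religion, social sciences, technology, and other domains. It is intended to provide rich text data for multi-domain language modeling tasks.
Provided by:
machelreid
Summary of original information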

Dataset overview

Dataset name

  • M2D2: A Massively Multi-domain Language Modeling Dataset

Dataset source

Loading the dataset

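import datasets

# Load one domain configuration; "cs.CL" can be replaced with another domain.
dataset = datasets.load_dataset("machelreid/m2d2", "cs.CL")

# Print the text of the first training example.
print(dataset["train"][0]["text"])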
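
To work with several domains at once, the single-configuration call can be wrapped in a small helper. The following is a minimal sketch rather than part of the dataset card: `load_domains` and its `loader` parameter are hypothetical names introduced here, and the sketch assumes every name passed in is a valid configuration for `datasets.load_dataset`, as `"cs.CL"` is in the loading example.

```python
from typing import Callable, Dict, Iterable, Optional

REPO_ID = "machelreid/m2d2"  # repository ID taken from the loading example


def load_domains(
    names: Iterable[str], loader: Optional[Callable] = None
) -> Dict[str, object]:
    """Return one loaded dataset object per domain name (hypothetical helper)."""
    if loader is None:
        # Imported lazily so the helper can also be tried with a stand-in loader.
        import datasets

        loader = datasets.load_dataset
    # Each domain is requested as a separate configuration of the same repository.
    return {name: loader(REPO_ID, name) for name in names}
```

Calling `load_domains(["cs.CL"])` would download that configuration on first use. Passing a different `loader` makes it possible to exercise the helper without network access.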

Dataset domains

  • Culture_and_the_arts
    • Culture_and_the_arts__Culture_and_Humanities
    • Culture_and_the_arts__Games_and_Toys
    • Culture_and_the_arts__Mass_media
    • Culture_and_the_arts__Performing_arts
    • Culture_and_the_arts__Sports_and_Recreation
    • Culture_and_the_arts__The_arts_and_Entertainment
    • Culture_and_the_arts__Visual_arts
  • General_referece
    • General_referece__Further_research_tools_and_topics
    • General_referece__Reference_works
  • Health_and_fitness
    • Health_and_fitness__Exercise
    • Health_and_fitness__Health_science
    • Health_and_fitness__Human_medicine
    • Health_and_fitness__Nutrition
    • Health_and_fitness__Public_health
    • Health_and_fitness__Self_care
  • History_and_events
    • History_and_events__By_continent
    • History_and_events__By_period
    • History_and_events__By_region
  • Human_activites
    • Human_activites__Human_activities
    • Human_activites__Impact_of_human_activity
  • Mathematics_and_logic
    • Mathematics_and_logic__Fields_of_mathematics
    • Mathematics_and_logic__Logic
    • Mathematics_and_logic__Mathematics
  • Natural_and_physical_sciences
    • Natural_and_physical_sciences__Biology
    • Natural_and_physical_sciences__Earth_sciences
    • Natural_and_physical_sciences__Nature
    • Natural_and_physical_sciences__Physical_sciences
  • Philosophy
  • Philosophy_and_thinking
    • Philosophy_and_thinking__Philosophy
    • Philosophy_and_thinking__Thinking
  • Religion_and_belief_systems
    • Religion_and_belief_systems__Allah
    • Religion_and_belief_systems__Belief_systems
    • Religion_and_belief_systems__Major_beliefs_of_the_world
  • Society_and_social_sciences
    • Society_and_social_sciences__Social_sciences
    • Society_and_social_sciences__Society
  • Technology_and_applied_sciences
    • Technology_and_applied_sciences__Agriculture
    • Technology_and_applied_sciences__Computing
    • Technology_and_applied_sciences__Engineering
    • Technology_and_applied_sciences__Transport
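
The names in this list follow a `Parent__Child` pattern, with a double underscore separating a top-level category from its subdomain. As an illustrative sketch (the helper `group_by_parent` is a hypothetical name, not part of the dataset or of the `datasets` library), the hierarchy can be rebuilt from a flat list of names:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_parent(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group flat domain names by the part before the first double underscore."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        parent, sep, child = name.partition("__")
        if sep:
            groups[parent].append(child)
        else:
            # A name without "__" is a top-level category; record it with no child.
            groups.setdefault(parent, [])
    return dict(groups)


# A few names copied from the list, used only to demonstrate the grouping.
sample = [
    "Mathematics_and_logic",
    "Mathematics_and_logic__Logic",
    "Mathematics_and_logic__Mathematics",
    "Philosophy",
]
print(group_by_parent(sample))
```

Splitting on the first `__` only keeps single underscores inside a name, such as those in `Mathematics_and_logic`, intact.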

License

  • cc-by-nc-4.0

Citation

@article{reid2022m2d2,
  title = {M2D2: A Massively Multi-domain Language Modeling Dataset},
  author = {Machel Reid and Victor Zhong and Suchin Gururangan and Luke Zettlemoyer},
  year = {2022},
  journal = {arXiv preprint arXiv: Arxiv-2210.07370}
}
