machelreid/m2d2
Source: Hugging Face. Updated: 2022-10-25. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/machelreid/m2d2
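Overview:
M2D2 is a large-scale, multi-domain language modeling dataset. It covers domains including culture, the arts, health, history, mathematics, natural sciences, philosophy, religion, social sciences, and technology, and is intended to provide rich text data for multi-domain language modeling tasks.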
Provider:
machelreid
Summary of original information
Dataset overview
Dataset name
- M2D2: A Massively Multi-domain Language Modeling Dataset
Dataset source
- From the paper "M2D2: A Massively Multi-domain Language Modeling Dataset" by Reid et al., published at EMNLP 2022.
Loading the dataset
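import datasets

# Load one domain configuration; "cs.CL" can be replaced with any other domain name.
dataset = datasets.load_dataset("machelreid/m2d2", "cs.CL")

# Print the text of the first training example.
print(dataset["train"][0]["text"])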
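To work with several domains at once, the same call can be repeated for each configuration name. The sketch below is a minimal illustration, assuming the `datasets` library is installed and that every name passed in is a valid configuration of `machelreid/m2d2`. The helper `load_domains` and its `loader` parameter are hypothetical names introduced here; they are not part of the dataset or of the library.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def load_domains(
    names: Iterable[str],
    loader: Optional[Callable[[str, str], Any]] = None,
) -> Dict[str, Any]:
    """Load one M2D2 configuration per domain name (hypothetical helper)."""
    if loader is None:
        # Imported lazily so the helper can be exercised with a stand-in loader.
        import datasets

        loader = datasets.load_dataset
    # Each entry maps a domain name to whatever the loader returns for it.
    return {name: loader("machelreid/m2d2", name) for name in names}
```

A call such as `load_domains(["cs.CL", "Philosophy"])` would download both configurations, after which `result["cs.CL"]["train"][0]["text"]` reads the first training example of that domain. The `loader` argument exists only so that a different loading function can be swapped in, for example in tests.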
Dataset domains
- Culture_and_the_arts
- Culture_and_the_arts__Culture_and_Humanities
- Culture_and_the_arts__Games_and_Toys
- Culture_and_the_arts__Mass_media
- Culture_and_the_arts__Performing_arts
- Culture_and_the_arts__Sports_and_Recreation
- Culture_and_the_arts__The_arts_and_Entertainment
- Culture_and_the_arts__Visual_arts
- General_referece
- General_referece__Further_research_tools_and_topics
- General_referece__Reference_works
- Health_and_fitness
- Health_and_fitness__Exercise
- Health_and_fitness__Health_science
- Health_and_fitness__Human_medicine
- Health_and_fitness__Nutrition
- Health_and_fitness__Public_health
- Health_and_fitness__Self_care
- History_and_events
- History_and_events__By_continent
- History_and_events__By_period
- History_and_events__By_region
- Human_activites
- Human_activites__Human_activities
- Human_activites__Impact_of_human_activity
- Mathematics_and_logic
- Mathematics_and_logic__Fields_of_mathematics
- Mathematics_and_logic__Logic
- Mathematics_and_logic__Mathematics
- Natural_and_physical_sciences
- Natural_and_physical_sciences__Biology
- Natural_and_physical_sciences__Earth_sciences
- Natural_and_physical_sciences__Nature
- Natural_and_physical_sciences__Physical_sciences
- Philosophy
- Philosophy_and_thinking
- Philosophy_and_thinking__Philosophy
- Philosophy_and_thinking__Thinking
- Religion_and_belief_systems
- Religion_and_belief_systems__Allah
- Religion_and_belief_systems__Belief_systems
- Religion_and_belief_systems__Major_beliefs_of_the_world
- Society_and_social_sciences
- Society_and_social_sciences__Social_sciences
- Society_and_social_sciences__Society
- Technology_and_applied_sciences
- Technology_and_applied_sciences__Agriculture
- Technology_and_applied_sciences__Computing
- Technology_and_applied_sciences__Engineering
- Technology_and_applied_sciences__Transport
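The names above appear to follow a two-level scheme in which a double underscore separates a top-level domain from its sub-domain. Under that assumption (inferred from the list itself, not stated by the source), the names can be grouped as sketched below. `group_by_top_level` is a hypothetical helper, and the sample names are copied from the list above.

```python
from typing import Dict, Iterable, List


def group_by_top_level(names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each top-level domain to the sub-domains listed under it."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        # Assumption: "__" separates the top-level domain from the sub-domain.
        top, _, sub = name.partition("__")
        subs = groups.setdefault(top, [])
        if sub:
            subs.append(sub)
    return groups


names = [
    "Health_and_fitness",
    "Health_and_fitness__Exercise",
    "Health_and_fitness__Nutrition",
    "Philosophy",
]
print(group_by_top_level(names))
# {'Health_and_fitness': ['Exercise', 'Nutrition'], 'Philosophy': []}
```

The full name, including the top-level prefix, is still what would be passed as the configuration name when loading a sub-domain.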
License
- cc-by-nc-4.0
Citation
@article{reid2022m2d2,
  title = {M2D2: A Massively Multi-domain Language Modeling Dataset},
  author = {Machel Reid and Victor Zhong and Suchin Gururangan and Luke Zettlemoyer},
  year = {2022},
  journal = {arXiv preprint arXiv:2210.07370}
}



