---
license: apache-2.0
task_categories:
- zero-shot-classification
- summarization
- text-generation
tags:
- news
- campus
size_categories:
- 100K<n<1M
language:
- zh
---
一些高校校园新闻,约 65k * 3(类任务) 条,**稍微做了一点点脱敏,尽可能地遮盖了作者名等**。数据已经整理成了指令的形式,格式如下:
```json
{
"id": <id>,
"category": "(title_summarize|news_classify|news_generate)",
"instruction": <对应的具体指令>,
"input": <空>,
"output": <指令对应的输出>
}
```
总共三类任务:**标题总结**、**栏目分类**、**新闻生成**,本质上是利用新闻元数据中的标题、栏目、内容排列组合生成的,所以可以保证数据完全准确。每个字段内容已经整理成了**单行**的格式。下面是三类任务的样例:
```json
// 标题总结
{
"id": 22106,
"category": "title_summarize",
"instruction": "请你给下面的新闻取一则标题:\n点击图片观看视频 科技创新潜质人才是海院自主选拔的一种新尝试,根据规定,凡在高中阶段获得省级科技创新大赛一等奖或以上的应届高中毕业生,都可申请报名。今年苏浙沪三地共有300多名学生,申请以科技创新潜质人才身份入学海城海院。经过专家甄选,今天上午,有82名学生参加了考试。 上午的面试不考语数外,考生要靠自己的科技成果来证明自己是具有科技创新潜质的人才。记者发现,学生们带来的创新成果五花八门,涉及生物、微电子等多种前沿学科。经过上午的筛选,下午只剩下一半的学生进入了特色测试部分,这部分仍不考语数外,重点考察学生的发散性思维,及兴趣爱好和潜质,海院将根据测试结果,最终确定获得科技创新潜质人才的自主选拔认定名单,进入名单的考生,高考成绩只要达到当地的一本线,就能被海院录取。",
"input": "",
"output": "该新闻的标题可以是:《不考语数外 海院自主选拔创新潜质人才》"
}
```
```json
// 栏目分类
{
"id": 134781,
"category": "news_classify",
"instruction": "根据下面的新闻,归纳出它的分类:\n## 英国格拉斯哥大学医学院副院长访问海城海院生命学院\n\n11月20日,英国格拉斯哥大学(University of Glasgow,以下简称格大)医学院副院长Chris Loughrey一行访问海城海院生命科学技术学院,生命学院副院长张晓君出席会议,魏芳、王晓雷、徐沁等多位教师与会参加交流。格大和海城海院同为国际大学交流组织Universitas 21的成员,在生命和医学领域合作紧密。双方互相介绍了两校在生命科学领域的学科专业、课程体系、学生培养等发展情况。在相互交流中,双方希望在这个宏观的合作框架下,逐步增进相互了解,在人才培养、科学研究等方面形成更多更实质性的合作。格拉斯哥大学是英联邦地区建校历史最悠久的四所大学之一。研究经费及教育质量均排名英国前十,出过7位诺奖获得者及3位英国首相。格大下设四大学院,其中医学院有教职员工2000余人,学生5000余人,医学院下设七个研究中心,涉及分子/细胞/系统生物学、植物、微生物、免疫、癌症、生物多样性、神经科学、心血管医学等领域。",
"input": "",
"output": "本则新闻的分类栏目是:综合新闻"
}
```
```json
// 新闻生成
{
"id": 61402,
"category": "news_generate",
"instruction": "帮我写一则综合新闻栏目的新闻,标题是机械与动力工程学院举办心理健康月系列活动之“与压力一路同行”",
"input": "",
"output": "《机械与动力工程学院举办心理健康月系列活动之“与压力一路同行”》:\n\n5月27日,机械与动力工程学院心理健康月系列活动之“与压力一路同行”讲座在高田会堂举行。活动特邀海城铁路学院兼职心理咨询师李毅作为主讲嘉宾,与大家分享如何正确认识压力以及积极应对压力 。本次活动由机械与动力工程学院王睿瑞主持。相较于受到心理教育较为全面的本科生同学,研究生同学面临更多的现实问题,更加需要心理健康的知识,来实现压力与情绪的合理管理。李毅首先结合心理学知识,从生理学、心理学等多个角度带领同学们认识了压力的本质,并指出并非所有压力都是有害的,负性的慢性压力对人产生危害,合理规避有害的压力尤为重要,关键是找到自己的最佳压力水平。李毅进一步讲述了心理压力与认知、情绪、行为之间的关系。通过对“情绪的身体地图”的说明,使同学们更加深刻地体会到了“情绪丰富了体验,情绪述说着需求,情绪激发出行为,情绪联结了他人”的含义。最后,李毅从时间管理、简化生活、认知需求、活在当下、友善待人等方面给出了应对压力的建议,并寄语大家“与并不完美的自我共处,与压力一路同行”。机械与动力工程学院心理健康月系列活动已举办4年,覆盖2700余名研究生,力求以多样的活动形式帮助学生和导师了解更多的心理知识,引导大家关爱自己、关心他人。"
}
```
这里也给出数据的分位点(`instruction` + `output` 的长度),可按需修剪:
```
count 196101.000000
mean 1059.199912
std 571.623822
min 32.000000
0% 32.000000
5% 369.000000
10% 471.000000
15% 539.000000
20% 596.000000
25% 649.000000
30% 700.000000
35% 753.000000
40% 806.000000
45% 862.000000
50% 920.000000
55% 983.000000
60% 1053.000000
65% 1132.000000
70% 1225.000000
75% 1337.000000
80% 1473.000000
85% 1652.000000
90% 1898.000000
95% 2270.000000
max 3054.000000
```
This dataset contains university campus news, roughly 65,000 entries for each of three task types. **It has been lightly de-identified: author names and similar details were masked wherever possible.** The data is already in instruction format, using the schema below:
```json
{
"id": <id>,
"category": "(title_summarize|news_classify|news_generate)",
"instruction": <corresponding specific instruction>,
"input": "",
"output": <output corresponding to the instruction>
}
```
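As a minimal sketch of how records in this schema might be read and checked: the code below assumes the data is stored as JSON Lines (one object per line), which the card does not state, and the helper names `parse_records` and `count_by_category` are hypothetical.

```python
import json
from collections import Counter

# The three category values listed in the schema above.
VALID_CATEGORIES = {"title_summarize", "news_classify", "news_generate"}
REQUIRED_FIELDS = ("id", "category", "instruction", "input", "output")


def parse_records(lines):
    """Parse an iterable of JSON Lines strings into validated records.

    Assumption: one JSON object per line. Blank lines are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {record.get('id')} is missing fields: {missing}")
        if record["category"] not in VALID_CATEGORIES:
            raise ValueError(f"unknown category: {record['category']!r}")
        records.append(record)
    return records


def count_by_category(records):
    """Count how many records belong to each task category."""
    return Counter(record["category"] for record in records)
```

With a real file, `parse_records(open(path, encoding="utf-8"))` would work the same way, since a file object iterates line by line.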
There are three task types: **title summarization**, **news category classification**, and **news generation**. Each record is built by recombining the title, column, and content from the news metadata, so every instruction and output pair matches the source article exactly. The content of each field has been flattened to a **single line**. Below are examples of the three task types, translated into English from the Chinese originals:
```json
// Title Summarization
{
"id": 22106,
"category": "title_summarize",
"instruction": "Please write a headline for the following news article:\nClick the image to watch the video Admitting students with innovative potential in science and technology is a new experiment in independent admissions at Haicheng Maritime University. According to the rules, any graduating high school student who has won first prize or higher in a provincial-level science and technology innovation competition during high school can apply. This year, more than 300 students from Jiangsu, Zhejiang, and Shanghai applied to enroll at Haicheng Maritime University as students with innovative potential in science and technology. After expert selection, 82 students took the exam this morning. The morning interview does not test Chinese, Math, or English; candidates must prove their innovative potential in science and technology through their own technological achievements. Reporters found that the innovative achievements brought by the students are diverse, covering multiple cutting-edge disciplines such as biology and microelectronics. After the morning screening, only half of the students remained for the special test in the afternoon, which also does not test Chinese, Math, or English and instead focuses on students' divergent thinking, interests, and potential. Haicheng Maritime University will use the test results to finalize the list of students granted independent admission status for innovative potential in science and technology. Students on the list will be admitted to the university as long as their college entrance exam scores reach the local first-tier admission threshold.",
"input": "",
"output": "A suitable headline for this news article is: \"Exempt from Chinese, Math, and English Exams: Haicheng Maritime University Selects Talents with Innovative Potential\""
}
```
```json
// News Category Classification
{
"id": 134781,
"category": "news_classify",
"instruction": "Summarize the category of the following news article:\n## Vice Dean of the University of Glasgow's School of Medicine Visits the School of Life Sciences at Haicheng Maritime University\n\nOn November 20th, a delegation led by Chris Loughrey, Vice Dean of the University of Glasgow's School of Medicine, visited the School of Life Science and Technology at Haicheng Maritime University. Zhang Xiaojun, Vice Dean of the School of Life Sciences, attended the meeting, and several faculty members including Wei Fang, Wang Xiaolei, and Xu Qin took part in the exchange. Both the University of Glasgow and Haicheng Maritime University are members of Universitas 21, an international university exchange organization, and cooperate closely in the fields of life sciences and medicine. The two sides introduced their respective disciplines, curriculum systems, and student training in the field of life sciences. During the exchange, both sides expressed hope to gradually enhance mutual understanding under this broad cooperation framework and to form more substantial cooperation in talent training, scientific research, and other areas. The University of Glasgow is one of the four oldest universities in the Commonwealth. Its research funding and education quality both rank among the top 10 in the UK, and it has produced 7 Nobel laureates and 3 British Prime Ministers. The university has four schools, among which the School of Medicine has more than 2,000 faculty and staff and more than 5,000 students. The School of Medicine has seven research centers covering molecular/cellular/systems biology, botany, microbiology, immunology, cancer, biodiversity, neuroscience, cardiovascular medicine, and other fields.",
"input": "",
"output": "The category of this news article is: Comprehensive News"
}
```
```json
// News Generation
{
"id": 61402,
"category": "news_generate",
"instruction": "Help me write a news article for the Comprehensive News column with the headline \"School of Mechanical and Power Engineering Holds 'Walk with Pressure' Series Activities for Mental Health Month\"",
"input": "",
"output": "\"School of Mechanical and Power Engineering Holds 'Walk with Pressure' Series Activities for Mental Health Month\":\n\nOn May 27th, the \"Walk with Pressure\" lecture, part of the Mental Health Month series organized by the School of Mechanical and Power Engineering, was held at the Gaotian Auditorium. Li Yi, a part-time psychological counselor from Haicheng Railway College, was invited as the keynote speaker to share how to correctly recognize stress and actively cope with it. The event was hosted by Wang Ruirui from the School of Mechanical and Power Engineering. Compared with undergraduate students, who have received relatively comprehensive psychological education, graduate students face more practical problems and have a greater need for mental health knowledge to manage stress and emotions properly. Li Yi first drew on psychological knowledge to help students understand the essence of stress from multiple perspectives such as physiology and psychology, and pointed out that not all stress is harmful: negative chronic stress is harmful, it is particularly important to reasonably avoid harmful stress, and the key is to find one's own optimal stress level. Li Yi further explained the relationship between psychological stress and cognition, emotion, and behavior. Through the explanation of the \"body map of emotions\", students came to understand more deeply the meaning of \"emotions enrich experience, emotions express needs, emotions trigger behaviors, and emotions connect with others\". Finally, Li Yi gave suggestions for coping with stress from aspects such as time management, simplifying life, cognitive needs, living in the moment, and being kind to others, and encouraged everyone to \"get along with the imperfect self and walk with pressure\". The Mental Health Month series of the School of Mechanical and Power Engineering has been held for 4 years and has reached more than 2,700 graduate students, aiming to help students and advisors learn more about psychology through varied activities and to guide everyone to care for themselves and others."
}
```
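The three examples above suggest how one article's metadata could be recombined into three records. The following is only an illustrative sketch: the function names `to_single_line` and `build_tasks`, the input keys `title`, `column`, and `content`, and the whitespace-collapsing rule are all assumptions, and the templates are the English renderings from the examples rather than the Chinese prompts used in the actual data. Record ids are omitted because the card does not say how they are assigned.

```python
def to_single_line(text):
    """Collapse all whitespace runs (including newlines) into single spaces.

    Assumption: this approximates the single-line formatting the card describes.
    """
    return " ".join(text.split())


def build_tasks(article):
    """Build one record per task type from a hypothetical article dict.

    `article` is assumed to have the keys "title", "column", and "content".
    """
    title = to_single_line(article["title"])
    column = to_single_line(article["column"])
    content = to_single_line(article["content"])
    return [
        {
            # Content in, title out.
            "category": "title_summarize",
            "instruction": f"Please write a headline for the following news article:\n{content}",
            "input": "",
            "output": f'A suitable headline for this news article is: "{title}"',
        },
        {
            # Title and content in, column out.
            "category": "news_classify",
            "instruction": f"Summarize the category of the following news article:\n## {title}\n\n{content}",
            "input": "",
            "output": f"The category of this news article is: {column}",
        },
        {
            # Column and title in, content out.
            "category": "news_generate",
            "instruction": f'Help me write a news article for the {column} column with the headline "{title}"',
            "input": "",
            "output": f'"{title}":\n\n{content}',
        },
    ]
```

Because each output is copied from the metadata rather than written separately, the labels cannot drift from the source article, which is the accuracy property described earlier.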
Here are the quantiles of the combined length of `instruction` + `output` for optional trimming:
```
count 196101.000000
mean 1059.199912
std 571.623822
min 32.000000
0% 32.000000
5% 369.000000
10% 471.000000
15% 539.000000
20% 596.000000
25% 649.000000
30% 700.000000
35% 753.000000
40% 806.000000
45% 862.000000
50% 920.000000
55% 983.000000
60% 1053.000000
65% 1132.000000
70% 1225.000000
75% 1337.000000
80% 1473.000000
85% 1652.000000
90% 1898.000000
95% 2270.000000
max 3054.000000
```
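One way to use these quantiles for trimming is sketched below. It assumes the lengths above are character counts as given by Python's `len()` (the card does not state the unit), and the helper names and the default cutoff of 2270 (the 95% value in the table) are illustrative choices, not part of the dataset.

```python
def combined_length(record):
    """Length of `instruction` plus `output`, measured in characters (assumed unit)."""
    return len(record["instruction"]) + len(record["output"])


def trim_by_length(records, min_len=0, max_len=2270):
    """Keep records whose combined length falls within [min_len, max_len].

    The default max_len of 2270 is the 95% quantile from the table above.
    """
    return [r for r in records if min_len <= combined_length(r) <= max_len]
```

For example, `trim_by_length(records, min_len=369, max_len=2270)` would keep roughly the middle 90% of the data, assuming the length unit matches the one used for the table.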