
HUST-Student-Handbook2

ModelScope, updated 2025-11-27, indexed 2025-07-19
Download link:
https://modelscope.cn/datasets/mars1441/HUST-Student-Handbook2
Resource description:

Dataset file metadata and data files are available on the "Dataset Files" page.

#### Download Methods

:modelscope-code[]{type="sdk"}
:modelscope-code[]{type="git"}

<div id="top" align="center">

# Huazhong University of Science and Technology 2024 Student Handbook - LORA Fine-tuning QA Dataset

</div>

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

## Dataset Overview

- **Dataset Name**: Huazhong University of Science and Technology 2024 Student Handbook - LORA Fine-tuning QA Dataset
- **Version**: 1.0
- **License**: [Apache License 2.0](LICENSE)

## Dataset Introduction

This dataset is a corpus of QA pairs built from the Huazhong University of Science and Technology (HUST) Student Handbook and designed for LORA fine-tuning of Large Language Models (LLMs). It contains QA pairs about study, daily life, and administrative regulations at HUST, covering the rules that apply to both undergraduate and postgraduate students.

The dataset aims to help LLMs understand and answer questions about the HUST Student Handbook, improving QA quality and accuracy in this specific domain.

## Dataset Characteristics

- **Domain Focus**: Concentrates on university student management and study regulations, in particular the specific rules of HUST
- **QA Format**: Uses a user question / assistant answer dialogue format that matches common LLM training inputs
- **Comprehensive Content**: Covers degree awarding, student status management, training plans, academic norms, international exchanges, and more
- **Structured Answers**: Answers are clearly structured, which helps the model learn how to organize and present information
- **Real-world Scenarios**: Questions are built from actual student needs and stay close to real application scenarios

## Dataset Metadata

- **Task Type**: Dialogue/QA
- **Application Scenarios**: Intelligent QA in education, campus service assistants, student affairs consultation
- **Data Format**: JSONL
- **Data Scale**: Over 700 QA pairs
- **Language**: Chinese
- **Update Date**: November 2024

## Dataset Structure

The dataset is stored in JSONL format: each line is an independent JSON object containing a single dialogue. Each dialogue consists of one user question and one assistant answer.

### Data Format

```json
{
  "conversations": [
    {
      "role": "user",
      "content": "User's question content"
    },
    {
      "role": "assistant",
      "content": "Assistant's answer content"
    }
  ]
}
```

### Data Statistics

- **Dataset Entries**: Over 700 QA pairs
- **File Format**: JSONL (JSON Lines)
- **File Size**: Approximately 2MB

## Dataset Content Scope

The dataset covers the following main content areas:

1. **Degree Awarding**: Degree application requirements, dissertation requirements, defense procedures, etc.
2. **Student Status Management**: Enrollment, registration, suspension of studies, withdrawal, major transfer, etc.
3. **Training Plans**: Course structure, credit requirements, training stages, etc.
4. **Academic Norms**: Academic ethics, handling of academic misconduct, etc.
5. **International Exchanges**: Overseas (cross-border) exchanges, international academic exchange funds, etc.
6. **Rewards and Punishments System**: Scholarships, disciplinary regulations, etc.
7. **Campus Life**: Dormitory management, campus safety, etc.
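The record layout described under Data Format can be checked before training with a small validator. The following is a minimal sketch, not part of the dataset's own tooling: `validate_record` is a hypothetical helper name, and it assumes every record holds exactly one user turn followed by one assistant turn, as the format description states.

```python
import json


def validate_record(line: str) -> bool:
    """Return True if one JSONL line matches the two-turn layout (hypothetical helper)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False

    turns = obj.get("conversations")
    # Assumption: exactly two turns per record, user first and assistant second.
    if not isinstance(turns, list) or len(turns) != 2:
        return False

    for turn, expected_role in zip(turns, ("user", "assistant")):
        if not isinstance(turn, dict) or turn.get("role") != expected_role:
            return False
        content = turn.get("content")
        # Reject missing, non-string, or whitespace-only content.
        if not isinstance(content, str) or not content.strip():
            return False
    return True


# Illustrative record built in memory; it is not taken from the dataset.
sample = json.dumps(
    {
        "conversations": [
            {"role": "user", "content": "Q"},
            {"role": "assistant", "content": "A"},
        ]
    },
    ensure_ascii=False,
)
print(validate_record(sample))
```

Running a check like this over each line of the file before fine-tuning surfaces malformed records early instead of during training.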
## Usage

### Data Loading

#### Local Loading

```python
# Example of loading the dataset with Python
import json

# Load the dataset: one JSON object per line
data = []
with open('lora_hust_student_handbookt.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # skip blank lines
            data.append(json.loads(line))

# Check the dataset size
print(f"Dataset contains {len(data)} QA pairs")

# View the first QA pair
print("User Question:", data[0]["conversations"][0]["content"])
print("Assistant Answer:", data[0]["conversations"][1]["content"])
```

#### Loading via ModelScope

```python
# Load the dataset with ModelScope
from modelscope.msdatasets import MsDataset

# Load the dataset (replace with the actual dataset ID and namespace)
dataset = MsDataset.load('HUST-Student-Handbook', namespace='your-namespace')

# View dataset information
print(dataset.keys())

# Select the training split
train_dataset = dataset['train']
print(f"Dataset contains {len(train_dataset)} QA pairs")
```

### For LORA Fine-tuning

This dataset is suitable for LORA fine-tuning of LLMs, helping the model understand and answer questions related to the HUST Student Handbook.

```python
# Example code for preparing training data
def prepare_training_data(data):
    # Convert each dialogue into an input/output pair
    training_data = []
    for item in data:
        conversations = item["conversations"]
        training_data.append({
            "input": conversations[0]["content"],
            "output": conversations[1]["content"]
        })
    return training_data

# Prepare the training data
training_data = prepare_training_data(data)
```

### Usage on ModelScope

This dataset has been uploaded to ModelScope and can be used directly for model training and inference there:

1. Visit the ModelScope dataset page
2. Search for "Huazhong University of Science and Technology Student Handbook LORA Fine-tuning QA Dataset"
3. Click the "Use Dataset" button
4. Select this dataset for model training or inference tasks

## Dataset Examples

Here are several sample QA pairs from the dataset:

**Example 1: Dissertation Requirements**

```
Q: What basic elements should be included in the abstract of a dissertation?
A: The abstract of a dissertation should include the following basic elements:
1. Purpose: Clarify the premise, purpose and task of the research, and define the scope of the research subject.
2. Method: Describe in detail the theoretical basis, experimental conditions, research objects, etc. used in the research.
3. Results: Systematically present specific data, determined laws and relationships, etc. obtained through experiments, research or surveys.
4. Conclusion: Conduct in-depth analysis, comparison and evaluation of the research results, and point out the practical application value of the results.
5. Other important information: Include additional discoveries and insights with important academic value that are not the main research objectives.
```

**Example 2: International Exchanges**

```
Q: What materials need to be submitted when applying for a short-term overseas study program?
A: The materials required when applying for a short-term overseas study program include:
1. Application Form for HUST Postgraduate Short-term Study Program: Must be filled out and submitted via the postgraduate training system.
2. Overseas Invitation Letter: A formal invitation letter issued by a well-known university or research institution in the country to be visited.
3. Study Plan: A specific, feasible study plan that clarifies the study or research arrangements during the visit.
4. Other relevant supporting materials: Depending on the project requirements, additional supporting materials such as language proficiency certificates may also be required.
```

**Example 3: Academic Norms**

```
Q: Under what circumstances can anonymous reports of academic misconduct be accepted?
A: Anonymous reports of academic misconduct can be accepted under the following circumstances:
1. Clear Facts: The facts of the academic misconduct described in the report are clear, with no ambiguity or contradictions.
2. Sufficient Evidence: The report provides conclusive evidence of the academic misconduct that can support the investigation without further supplementation.
3. Clear Clues: Even if the evidence is not fully sufficient, the investigation clues provided are clear and specific enough to guide subsequent verification work.
```

## License

This dataset is licensed under the [Apache License 2.0](LICENSE).

## Contributions

Improvements and extensions to the dataset are welcome. You can contribute in the following ways:

1. Submit issue reports
2. Submit new QA pairs
3. Improve the quality of existing QA pairs

## Citation

If you use this dataset in your research or project, please cite it in the following format:

```
@dataset{hust_student_handbook_2024,
  title={Huazhong University of Science and Technology Student Handbook LORA Fine-tuning QA Dataset},
  author={HUST-CSDC-Team},
  year={2024},
  howpublished={{https://www.modelscope.cn/datasets/alleyf/HUST-Student-Handbook}},
  home={{https://github.com/Alleyf}},
}
```

## Contact

If you have any questions or suggestions, please contact us through the following channels:

- ModelScope Issues: [Submit Feedback](https://www.modelscope.cn/datasets/alleyf/HUST-Student-Handbook/feedback)
- Email: csfan@hust.edu.cn
Provider:
maas
Created:
2025-07-15