RoversX/Samantha-data-single-line-Mixed-V1

Name: RoversX/Samantha-data-single-line-Mixed-V1
Creator: RoversX
Published: 2023-08-11 00:58:24
License: No description available

Source: Hugging Face. Updated 2023-08-11. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/RoversX/Samantha-data-single-line-Mixed-V1

Resource description:

---
task_categories:
- text-generation
language:
- en
- zh
---

```
import json

# Load the provided data
with open("path_to_your_original_file.jsonl", "r", encoding="utf-8") as file:
    mixed_data = [json.loads(line) for line in file.readlines()]

# Convert the mixed data by extracting all possible Q&A pairs from each conversation
reformatted_data_complete = []

for conversation in mixed_data:
    text = conversation['text']
    # Split the text into segments based on the prefixes
    segments = [segment for segment in text.split("###") if segment.strip()]

    questions = []
    answers = []

    for segment in segments:
        if "Human:" in segment:
            questions.append(segment.replace("Human:", "").strip())
        elif "Assistant:" in segment:
            answers.append(segment.replace("Assistant:", "").strip())

    # Pair up the questions and answers
    for q, a in zip(questions, answers):
        reformatted_data_complete.append({
            'text': f"### Human: {q}### Assistant: {a}"
        })

# Save the completely reformatted data as JSONL
reformatted_complete_jsonl = "\n".join(json.dumps(item, ensure_ascii=False) for item in reformatted_data_complete)
with open("path_to_save_reformatted_file.jsonl", "w", encoding="utf-8") as file:
    file.write(reformatted_complete_jsonl)
```
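To make the effect of the script above concrete, here is a minimal sketch that applies the same splitting and pairing logic to one in-memory record instead of a file. The helper name `split_into_pairs` and the sample conversation are made up for illustration; they are not part of the dataset or the original script.

```python
def split_into_pairs(text):
    """Split one multi-turn '### Human: ...### Assistant: ...' string
    into single-turn records, mirroring the dataset card's script."""
    # Split on the "###" prefix and drop empty fragments
    segments = [s for s in text.split("###") if s.strip()]

    questions = []
    answers = []
    for segment in segments:
        if "Human:" in segment:
            questions.append(segment.replace("Human:", "").strip())
        elif "Assistant:" in segment:
            answers.append(segment.replace("Assistant:", "").strip())

    # zip pairs the i-th question with the i-th answer and silently
    # drops any unmatched trailing turn
    return [
        {"text": f"### Human: {q}### Assistant: {a}"}
        for q, a in zip(questions, answers)
    ]


# Hypothetical two-turn conversation (illustrative data only)
sample = "### Human: Hi### Assistant: Hello!### Human: 你好### Assistant: 你好！"
pairs = split_into_pairs(sample)
for record in pairs:
    print(record["text"])
```

Each multi-turn conversation becomes one record per question and answer pair, which appears to be what "single-line" in the dataset name refers to. Because the pairing is positional, a conversation with an unanswered final question simply loses that last turn.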

Provider:

RoversX

Original information summary

Dataset overview

Task category

Text generation

Language

English
Chinese
