
RoversX/Samantha-data-single-line-Mixed-V1

Source: Hugging Face. Updated 2023-08-11. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/RoversX/Samantha-data-single-line-Mixed-V1
Resource description:
---
task_categories:
- text-generation
language:
- en
- zh
---

```
import json

# Load the provided data
with open("path_to_your_original_file.jsonl", "r", encoding="utf-8") as file:
    mixed_data = [json.loads(line) for line in file.readlines()]

# Convert the mixed data by extracting all possible Q&A pairs from each conversation
reformatted_data_complete = []
for conversation in mixed_data:
    text = conversation['text']
    # Split the text into segments based on the prefixes
    segments = [segment for segment in text.split("###") if segment.strip()]
    questions = []
    answers = []
    for segment in segments:
        if "Human:" in segment:
            questions.append(segment.replace("Human:", "").strip())
        elif "Assistant:" in segment:
            answers.append(segment.replace("Assistant:", "").strip())
    # Pair up the questions and answers
    for q, a in zip(questions, answers):
        reformatted_data_complete.append({
            'text': f"### Human: {q}### Assistant: {a}"
        })

# Save the completely reformatted data as JSONL
reformatted_complete_jsonl = "\n".join(json.dumps(item, ensure_ascii=False) for item in reformatted_data_complete)
with open("path_to_save_reformatted_file.jsonl", "w", encoding="utf-8") as file:
    file.write(reformatted_complete_jsonl)
```
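To see what the splitting step does to a single record, here is a minimal sketch that applies the same logic to one in-memory string. The helper name `split_pairs` and the sample conversation are hypothetical, introduced only for illustration; they are not part of the dataset or its card.

```python
def split_pairs(text):
    """Split one '### Human: ... ### Assistant: ...' string into single-turn records."""
    questions, answers = [], []
    for segment in text.split("###"):
        if not segment.strip():
            continue  # skip the empty piece before the first "###"
        if "Human:" in segment:
            questions.append(segment.replace("Human:", "").strip())
        elif "Assistant:" in segment:
            answers.append(segment.replace("Assistant:", "").strip())
    # zip() pairs turns by position and silently drops any unmatched trailing turn
    return [
        {"text": f"### Human: {q}### Assistant: {a}"}
        for q, a in zip(questions, answers)
    ]


# Hypothetical sample record (assumed format, not taken from the dataset)
sample = (
    "### Human: Hi### Assistant: Hello!"
    "### Human: How are you?### Assistant: Doing well."
)
pairs = split_pairs(sample)
for record in pairs:
    print(record["text"])
```

Because the pairing is positional, a conversation with two consecutive `Human:` turns would shift every later answer by one; whether the source data ever contains such records is not stated on this page.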
Provided by:
RoversX
Summary of original information

Dataset overview

Task categories

  • Text generation

Languages

  • English
  • Chinese