RoversX/Samantha-data-single-line-Mixed-V1
Source: Hugging Face | Updated 2023-08-11 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/RoversX/Samantha-data-single-line-Mixed-V1
Resource description:
---
task_categories:
- text-generation
language:
- en
- zh
---
```
import json

# Load the provided data (one JSON object per line; blank lines are skipped)
with open("path_to_your_original_file.jsonl", "r", encoding="utf-8") as file:
    mixed_data = [json.loads(line) for line in file if line.strip()]

# Convert the mixed data by extracting all possible Q&A pairs from each conversation
reformatted_data_complete = []
for conversation in mixed_data:
    text = conversation['text']
    # Split the text into segments based on the "###" prefixes
    segments = [segment for segment in text.split("###") if segment.strip()]
    questions = []
    answers = []
    for segment in segments:
        if "Human:" in segment:
            questions.append(segment.replace("Human:", "").strip())
        elif "Assistant:" in segment:
            answers.append(segment.replace("Assistant:", "").strip())
    # Pair up the questions and answers in order of appearance
    for q, a in zip(questions, answers):
        reformatted_data_complete.append({
            'text': f"### Human: {q}### Assistant: {a}"
        })

# Save the completely reformatted data as JSONL
reformatted_complete_jsonl = "\n".join(json.dumps(item, ensure_ascii=False) for item in reformatted_data_complete)
with open("path_to_save_reformatted_file.jsonl", "w", encoding="utf-8") as file:
    file.write(reformatted_complete_jsonl)
```
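To make the effect of the script above easier to see, here is a minimal sketch that applies the same splitting and pairing logic to a single in-memory string instead of a file. The function name `split_conversation` and the sample conversation are hypothetical and chosen only for illustration; they are not part of the dataset or the original script.

```python
def split_conversation(text):
    """Split one multi-turn '### Human: ...### Assistant: ...' string
    into single-turn records (hypothetical helper for illustration)."""
    segments = [s for s in text.split("###") if s.strip()]
    questions, answers = [], []
    for segment in segments:
        if "Human:" in segment:
            questions.append(segment.replace("Human:", "").strip())
        elif "Assistant:" in segment:
            answers.append(segment.replace("Assistant:", "").strip())
    # zip stops at the shorter list, so an unanswered final question is dropped
    return [
        {"text": f"### Human: {q}### Assistant: {a}"}
        for q, a in zip(questions, answers)
    ]


# Made-up two-turn conversation in the same prefix format
sample = "### Human: Hi### Assistant: Hello!### Human: How are you?### Assistant: I'm well."
records = split_conversation(sample)
for record in records:
    print(record["text"])
```

Each question is paired with the answer at the same position, so one two-turn conversation becomes two single-line records. Because pairing is positional, a conversation with a missing or extra turn may produce mismatched pairs, which is worth checking before training on the output.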
Provider:
RoversX
Summary of original information
Dataset overview
Task categories
- Text generation
Languages
- English
- Chinese



