REILX/extracted_tagengo_gpt4
Hugging Face | Updated 2024-05-03 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/REILX/extracted_tagengo_gpt4
Resource description:
---
license: apache-2.0
---
### Original dataset.
- https://huggingface.co/datasets/lightblue/tagengo-gpt4
### Python Code
Use the following Python code to iterate over each line of the tagengo-gpt4 jsonl file, extract the human and gpt turns of each conversation, and save them to a new jsonl file under the keys `instruction` and `output`:
```python
import json

input_file_path = 'tagengo-gpt4.jsonl'
output_file_path = 'extracted_tagengo_gpt4.jsonl'

with open(input_file_path, 'r', encoding='utf-8') as input_file, \
        open(output_file_path, 'w', encoding='utf-8') as output_file:
    for line in input_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which json.loads would reject
        data = json.loads(line)
        conversations = data.get('conversations', [])
        # If a conversation has several turns, the last human/gpt values win.
        extraction = {'instruction': '', 'output': ''}
        for conv in conversations:
            if conv['from'] == 'human':
                extraction['instruction'] = conv['value']
            elif conv['from'] == 'gpt':
                extraction['output'] = conv['value']
        # ensure_ascii=False keeps non-ASCII text readable in the output file.
        output_file.write(json.dumps(extraction, ensure_ascii=False) + '\n')

print(f"The conversation content has been extracted, and the results have been saved to '{output_file_path}'.")
```
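
The script above writes one record per input line, so if a conversation contained more than one human/gpt exchange, only the last values of each would be kept. Whether tagengo-gpt4 contains such multi-turn conversations is not stated here. As a minimal sketch under that assumption, the hypothetical helper `extract_pairs` (not part of the original code) returns one record per exchange instead:

```python
def extract_pairs(conversations):
    """Return one {'instruction', 'output'} dict per human turn answered by a gpt turn.

    Hypothetical helper for illustration; the original script keeps a single
    record per conversation.
    """
    pairs = []
    pending = None  # most recent human message that has not been answered yet
    for conv in conversations:
        role = conv.get('from')
        if role == 'human':
            pending = conv.get('value', '')
        elif role == 'gpt' and pending is not None:
            pairs.append({'instruction': pending, 'output': conv.get('value', '')})
            pending = None
    return pairs
```

To use it, each dict it returns would be written to the output file as its own line, in place of the single `extraction` dict.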
Provided by:
REILX
Summary of original information
Dataset overview
Dataset name
- tagengo-gpt4
Dataset source
- https://huggingface.co/datasets/lightblue/tagengo-gpt4
Dataset file format
- jsonl
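Dataset content extraction method
- Python code iterates over each line of the jsonl file and extracts the dialogue content between the human and GPT.
- The extracted information is saved to a new jsonl file under the keys "instruction" and "output".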
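
A quick way to sanity-check the extracted file is to read it back and count records where either key is empty. The following is only a sketch: the function names `load_extracted` and `count_incomplete` are hypothetical and not part of the dataset's own code.

```python
import json


def load_extracted(path):
    """Read a JSONL file and return its records as a list of dicts."""
    records = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            records.append(json.loads(line))
    return records


def count_incomplete(records):
    """Count records whose 'instruction' or 'output' is missing or empty."""
    return sum(
        1 for r in records
        if not r.get('instruction') or not r.get('output')
    )
```

For example, `count_incomplete(load_extracted('extracted_tagengo_gpt4.jsonl'))` would report how many conversations lacked a human or gpt turn.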
Dataset license
- Apache-2.0



