Zhaoming213/baike_qa
Hugging Face · updated 2026-03-16 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Zhaoming213/baike_qa
Resource summary:
---
license: apache-2.0
---
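## Dataset description
This is a community question-answering dialogue dataset. The original data was lightly cleaned and converted into a JSONL file in which each line is a JSON object of the form `{"text": ...}`. The `text` value is the question title followed directly by its answer, and only pairs whose combined length is under 512 characters are kept.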
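Because every line of the output is a standalone JSON object, the file can be consumed one record at a time. A minimal sketch, assuming the `baikeqa.jsonl` file produced by the cleaning code; `iter_texts` is a hypothetical helper, not something shipped with the dataset.

```python
import json
from typing import Iterable, Iterator


def iter_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "text" value from each non-empty JSONL line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines, e.g. an extra newline at the end
            continue
        # Each line is an independent JSON object of the form {"text": ...}
        yield json.loads(line)["text"]
```

For example, `with open("baikeqa.jsonl", encoding="UTF-8") as f: texts = list(iter_texts(f))`. Accepting any iterable of lines rather than a path lets the same helper work on file objects and on in-memory strings.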
## Cleaning code
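```python
import json

# Read the original data: baike_qa_train.json holds one JSON object per line.
datas = []
with open("baike_qa_train.json", "r", encoding="UTF-8") as f:
    for line in f:
        datas.append(json.loads(line))

# Write the cleaned data as JSONL, one {"text": ...} object per line.
with open("baikeqa.jsonl", "w", encoding="UTF-8") as f:
    for record in datas:
        # Keep only pairs whose title + answer is under 512 characters
        # (len() counts Unicode code points, not tokens or bytes).
        if len(record['title']) + len(record['answer']) < 512:
            # Concatenate the question title and the answer with no separator.
            sections = {"text": f"{record['title']}{record['answer']}"}
            # ensure_ascii=False keeps Chinese characters readable in the output.
            f.write(json.dumps(sections, ensure_ascii=False) + "\n")
```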
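The script above loads the entire input into memory before filtering. A minimal streaming sketch of the same rule processes one record at a time instead; `clean_record`, `clean_file` and `MAX_LEN` are hypothetical names, and it assumes, as the original code does, that each input line is a JSON object with string `title` and `answer` fields.

```python
import json
from typing import Optional

# Same threshold as the original script: keep if len(title) + len(answer) < 512.
MAX_LEN = 512


def clean_record(record: dict) -> Optional[dict]:
    """Return {"text": title + answer} if the pair is short enough, else None."""
    title, answer = record["title"], record["answer"]
    if len(title) + len(answer) < MAX_LEN:
        return {"text": f"{title}{answer}"}
    return None


def clean_file(src: str, dst: str) -> int:
    """Stream src (one JSON object per line) into dst as JSONL; return records kept."""
    kept = 0
    with open(src, "r", encoding="UTF-8") as fin, \
            open(dst, "w", encoding="UTF-8") as fout:
        for line in fin:
            if not line.strip():  # skip blank lines instead of failing on them
                continue
            cleaned = clean_record(json.loads(line))
            if cleaned is not None:
                fout.write(json.dumps(cleaned, ensure_ascii=False) + "\n")
                kept += 1
    return kept
```

Calling `clean_file("baike_qa_train.json", "baikeqa.jsonl")` should produce the same records, in the same order, as the original script. One deliberate difference: blank input lines are skipped rather than raising `json.JSONDecodeError`.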
Provided by:
Zhaoming213



