
Japanese OKWAVE Q&A Dataset for Large Language Model Training

Datatang, indexed 2025-06-14
Download link:
https://www.datatang.com/dataset/1840
Resource overview:
A large-scale Japanese text corpus sourced from OKWAVE, a well-known Japanese Q&A platform. As of April 2025, it contains 8.4 million questions (2.3 billion characters), 27 million answers (7.6 billion characters), 15.5 million thank-you messages in which the asker thanks the answerer (1.7 billion characters), and 2.1 million supplementary notes (360 million characters). The data fields are complete: question, answer, category, date, author, thank-you message, and supplementary note. The data has been professionally cleaned and is offered as a high-quality corpus for training large language models (LLMs) aimed at the Japanese market and for improving Q&A and dialogue systems.
Provider:
Datatang