vnu-llm2023-ftdata/QA_web_crawl_data
Source: Hugging Face. Updated 2025-08-08; indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/vnu-llm2023-ftdata/QA_web_crawl_data
Description:
This dataset contains question-answer (Q&A) pairs generated automatically from the news website of the University of Engineering and Technology, Vietnam National University, Hanoi. The pairs are intended for validating LLMs in educational and admissions contexts and for evaluating how well they retrieve, cite, and answer accurately. The source is the university's official news page, and experts defined the question topics manually from the news subjects. The data was produced by crawling the news text, defining question prompts, generating the pairs with the Gemini 2.0 Flash model, and then post-processing the output to filter and clean it. The dataset is provided in JSON format and includes the question, answer, reference text, URL, difficulty, question type, and topic.
Provider:
vnu-llm2023-ftdata
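
The description above lists the kinds of information each JSON record carries and mentions a post-processing step that filters and cleans the generated pairs. The following is a minimal Python sketch of what such a step might look like. The key names (`question`, `answer`, `reference_text`, `url`, `difficulty`, `question_type`, `topic`), the filtering rules (drop incomplete records and repeated questions), and the sample records are all assumptions made for illustration; the listing does not state the dataset's actual keys or the authors' cleaning rules.

```python
import json

# Hypothetical key names: the listing names the kinds of information,
# not the exact JSON keys used in the dataset.
REQUIRED_FIELDS = (
    "question", "answer", "reference_text", "url",
    "difficulty", "question_type", "topic",
)


def is_complete(record):
    """Return True if every required field is a non-empty string."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return True


def clean_records(records):
    """Drop incomplete records and repeated questions (assumed rules)."""
    seen_questions = set()
    kept = []
    for record in records:
        if not is_complete(record):
            continue
        key = record["question"].strip().lower()
        if key in seen_questions:
            continue
        seen_questions.add(key)
        kept.append(record)
    return kept


# Placeholder records for illustration only; not taken from the dataset.
raw = json.loads("""
[
  {"question": "Q1?", "answer": "A1", "reference_text": "text",
   "url": "https://example.com/news/1", "difficulty": "easy",
   "question_type": "factual", "topic": "admissions"},
  {"question": "q1?", "answer": "A1 again", "reference_text": "text",
   "url": "https://example.com/news/1", "difficulty": "easy",
   "question_type": "factual", "topic": "admissions"},
  {"question": "Q2?", "answer": "", "reference_text": "text",
   "url": "https://example.com/news/2", "difficulty": "hard",
   "question_type": "reasoning", "topic": "admissions"}
]
""")

cleaned = clean_records(raw)
print(len(raw), "records in,", len(cleaned), "records kept")
```

Comparing questions case-insensitively is a deliberately simple duplicate check; a real pipeline might also compare answers or reference text before discarding a record.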



