Evaluation Scores Dataset: LLM and Human Assessment of Halal Tourism Websites for Knowledge Graph Construction

DataCite Commons; updated 2024-12-18; indexed 2025-01-06
Download link:
https://figshare.com/articles/dataset/Evaluation_Scores_Dataset_LLM_and_Human_Assessment_of_Halal_Tourism_Websites_for_Knowledge_Graph_Construction/28050476
Description:
This dataset contains the raw evaluation scores used in the research paper "Enhancing Knowledge Graph Construction with Automated Source Evaluation Using Large Language Models", published in the Journal of Universal Computer Science (2024). The data are comparative evaluations of Halal tourism websites in Japan, assessed by both Large Language Models (LLMs) and human experts.

The dataset includes:
1. LLM evaluation scores for four Halal tourism websites across six criteria (credibility, relevance, content quality, coverage, comprehensiveness, and accessibility), produced by several LLMs including GPT-3.5-turbo, GPT-4, Google Gemini Pro, and Mixtral-8x7B-Instruct-v0.1.
2. Human expert evaluation scores from 22 respondents, using the same criteria framework.

The evaluations used a weighted criteria system (credibility 30%, relevance 24%, content quality 21%, coverage 8%, comprehensiveness 8%, accessibility 9%) to assess how suitable each website is as a knowledge source for constructing a domain-specific knowledge graph.

The data underpin the paper's comparison of LLM and human evaluations, which the authors use to demonstrate the effectiveness of automated source evaluation in knowledge graph construction. The dataset may be useful to researchers studying automated website evaluation, knowledge graph construction, or the use of LLMs in quality-assessment tasks.
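The weighted criteria system can be illustrated with a short Python sketch. This is a minimal example, not the authors' code. It assumes that every criterion is scored on one common numeric scale. The criterion keys, the helper names (`weighted_score`, `average_respondents`, `rank_sites`), and the step of averaging the human respondents before weighting are hypothetical. The description does not state the score scale, the file layout, or how the paper aggregated responses.

```python
from statistics import mean

# Criterion weights as stated in the dataset description (they sum to 1.0).
# The dictionary keys are hypothetical names, not column names from the files.
WEIGHTS = {
    "credibility": 0.30,
    "relevance": 0.24,
    "content_quality": 0.21,
    "coverage": 0.08,
    "comprehensiveness": 0.08,
    "accessibility": 0.09,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores for one website into a weighted total.

    Assumes every criterion in WEIGHTS is present and that all scores
    share the same scale (the actual scale is not given in the description).
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)


def average_respondents(responses: list[dict[str, float]]) -> dict[str, float]:
    """Average each criterion across respondents (e.g. the 22 human experts).

    Hypothetical aggregation step; the paper may combine responses differently.
    """
    return {c: mean(r[c] for r in responses) for c in WEIGHTS}


def rank_sites(totals: dict[str, float]) -> list[str]:
    """Order site identifiers from highest to lowest weighted total."""
    return sorted(totals, key=totals.get, reverse=True)
```

As a worked example, a site scored 5, 4, 4, 3, 3, 4 on the six criteria (in the order listed in `WEIGHTS`) gets a weighted total of 0.30 x 5 + 0.24 x 4 + 0.21 x 4 + 0.08 x 3 + 0.08 x 3 + 0.09 x 4 = 4.14. Because the weights sum to 1, the total stays on the same scale as the individual criterion scores. Rankings from `rank_sites` for each LLM and for the averaged human scores can then be compared side by side.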

Provider: figshare
Created: 2024-12-18