Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models

Name: Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models
Creator: 北京市丰台区第一小学; 北京行动者教育咨询有限公司; Jingyi Li; 北京市东城区和平里第九小学; 北京市丰台区方庄小学
Published: 2025-01-17 00:00:00
License: No description available

Science Data Bank: updated 2025-01-17, indexed 2026-04-23

Download link:

https://www.scidb.cn/detail?dataSetId=d45836d61eed4162beb0a4ec231b3eaa

Description:

Analyzing children's drawings has proven to be an effective way to understand their cognitive and conceptual learning outcomes. However, existing research has two major problems: (1) the drawing tasks and the coding rules used to analyze the drawings depend heavily on each study's research objectives, which lowers the ecological validity of the conclusions and makes the results hard to apply in frontline teaching; (2) interpretation of drawings varies considerably between individuals, so at least two researchers must identify each drawing independently and run a consistency test (such as the Kappa coefficient). This study therefore establishes a norm for children's scientific drawing that (1) provides baseline drawing characteristics of children without specific tasks, for use in related research, and (2) has all drawings recognized by the same Large Language Model (LLM), effectively avoiding the consistency problem. The study focuses on three questions: (1) Do children have consistent representation preferences when drawing the same theme (for example, do most children use light path diagrams to represent the formation of a solar eclipse)? (2) Is there a relationship between this preference (if any) and the accuracy of the LLM in recognizing the drawings? (3) What factors, if any, may drive this preference? The study included 1420 children's drawings covering 9 scientific themes/concepts. The drawing prompt was: 'We have learned the knowledge of..., please draw it out.' No other prompts were given, to avoid cueing or misleading the students. The LLM (ChatGLM 4.0) was then asked to guess the content of each drawing without being given any hints. The LLM's recognition results and stated reasons were processed for semantic similarity, mainly by using semi-supervised word2vec (semi-supervised algorithm: TF-IDF) to generate averaged word vectors for each sentence and then computing the cosine similarity between them.
Finally, we analyzed several factors that may affect representation consistency using Kendall's rank correlation coefficient (Kendall's tau). This dataset includes the following four parts: 1. the LLM's image recognition reasons (.txt); 2. all word vectors generated by word2vec (Excel); 3. semantic similarity matrices (Excel files named yuyi -...); 4. results of the correlation factor analysis. All file names use the pinyin (Chinese phonetic spelling) of the theme/concept: "chuan" (increasing the carrying capacity of a boat), "dianheci" (electromagnetics), "feiting" (boiling), "fuli" (buoyancy), "jiandandianlu" (circuit), "rishi" (solar eclipses), "taiyangxi" (solar system), "wuli&huaxue" (physical and chemical changes), "zhiwushengzhang" (life history of a plant).
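
The processing steps described above (TF-IDF weighting, averaged word vectors, cosine similarity, Kendall's tau) can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The toy word vectors, the pre-tokenized example texts, the smoothed IDF formula, and the tau-a variant of Kendall's coefficient are all assumptions made for illustration; the dataset's actual word2vec model and its semi-supervised step are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
WORD_VECTORS = {
    "sun": [1.0, 0.0, 0.0],
    "moon": [0.8, 0.6, 0.0],
    "shadow": [0.0, 1.0, 0.0],
    "battery": [0.0, 0.0, 1.0],
    "wire": [0.0, 0.2, 0.9],
}

def tfidf_weights(docs):
    """Return one {word: tf-idf weight} dict per tokenized document.

    Uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1 (an assumed variant).
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return weights

def sentence_vector(doc_weights, vectors):
    """TF-IDF-weighted average of the word vectors of one document."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in doc_weights.items():
        if word not in vectors:  # skip out-of-vocabulary words
            continue
        for i, value in enumerate(vectors[word]):
            total[i] += weight * value
        weight_sum += weight
    return [value / weight_sum for value in total] if weight_sum else total

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (product > 0) - (product < 0)  # tied pairs contribute 0
    return score / (n * (n - 1) / 2)

# Hypothetical tokenized recognition texts: two about eclipses, one about circuits.
docs = [
    ["sun", "moon", "shadow"],
    ["moon", "shadow", "sun", "sun"],
    ["battery", "wire"],
]
vectors = [sentence_vector(w, WORD_VECTORS) for w in tfidf_weights(docs)]
similarity = [[cosine_similarity(a, b) for b in vectors] for a in vectors]
```

In this toy example the two eclipse texts score much closer to each other than either does to the circuit text, which is the kind of pattern a semantic similarity matrix makes visible. For real data, a library implementation such as `scipy.stats.kendalltau` (which computes tau-b by default and handles ties differently) would normally replace the hand-written `kendall_tau`.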

Provided by:

北京市丰台区第一小学; 北京行动者教育咨询有限公司; Jingyi Li; 北京市东城区和平里第九小学; 北京市丰台区方庄小学

Created:

2024-07-17
