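Wheat Raw Material Quality Monitoring and Analysis Dataset for Jiang-Flavor Liquor Enterprises
Source: Guizhou Provincial Data Intellectual Property Registration Platform | Updated 2025-12-11 | Indexed 2025-12-12
Download link: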
https://gzdipp.gzsis.cn:12020/noticeDetail?id=1940&type=1
Description:
Preprocessing follows strict rules. For missing physicochemical indicators, a sample whose missing rate is at most 3% is filled with the median, and a sample whose missing rate exceeds 3% is discarded. Outliers are identified with the Z-score method, and whether each one is retained is decided against the practical requirements of Jiang-flavor liquor qu making (starter production). For analysis, principal component analysis (PCA) extracts the core factors that influence wheat qu-making quality; the K-means++ clustering algorithm divides the wheat raw materials into four grades (premium, first, second, and third); and a random forest model analyzes how strongly planting environment and physicochemical indicators are associated with the suitability of wheat for qu making. Algorithm parameters were tuned through repeated cross-validation and iterative optimization, with the aim of ensuring accurate and reliable results.
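The preprocessing rules described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the provider's implementation: the function names `preprocess` and `flag_outliers`, the `MISSING` marker, the toy data, and the |z| > 3 cutoff are all hypothetical (the listing gives the 3% missing-rate rule but no Z-score threshold). Outliers are only flagged, since the listing says retention is decided case by case. The PCA, K-means++, and random forest steps are not covered here.

```python
from statistics import mean, median, pstdev

MISSING = None  # assumed marker for a missing measurement


def preprocess(samples, max_missing_rate=0.03):
    """Drop samples whose missing rate exceeds the limit, then
    median-fill the remaining gaps indicator by indicator."""
    n_features = len(samples[0])
    kept = [
        row for row in samples
        if sum(v is MISSING for v in row) / n_features <= max_missing_rate
    ]
    if not kept:
        return []
    # Medians come from observed values of the retained samples only.
    medians = []
    for j in range(n_features):
        observed = [row[j] for row in kept if row[j] is not MISSING]
        if not observed:
            raise ValueError(f"indicator {j} has no observed values")
        medians.append(median(observed))
    return [
        [medians[j] if v is MISSING else v for j, v in enumerate(row)]
        for row in kept
    ]


def flag_outliers(rows, z_threshold=3.0):
    """Return (row, column, z) for values with |z| above the threshold.
    Values are flagged for review, not removed."""
    flags = []
    for j in range(len(rows[0])):
        col = [row[j] for row in rows]
        mu, sigma = mean(col), pstdev(col)
        if sigma == 0:
            continue  # constant indicator: no z-score defined
        for i, v in enumerate(col):
            z = (v - mu) / sigma
            if abs(z) > z_threshold:
                flags.append((i, j, z))
    return flags


# Toy data: 20 samples x 40 indicators (hypothetical values).
samples = [[10 + (i % 5) * 0.1 + j for j in range(40)] for i in range(20)]
samples[0][3] = MISSING                  # 1/40 = 2.5% missing: filled
samples[1][0] = samples[1][1] = MISSING  # 2/40 = 5% missing: dropped
samples[5][2] = 1000.0                   # gross outlier

cleaned = preprocess(samples)
flags = flag_outliers(cleaned)
print(len(cleaned), [(i, j) for i, j, _ in flags])
```

Note that with a 3% per-sample limit, a sample can only be median-filled when it has at least 34 indicators per missing value, which is why the toy data uses 40 columns.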
Provider:
Guizhou Jiangjiu Group Co., Ltd. (贵州酱酒集团有限公司)
Created:
2025-12-09
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
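This dataset was generated in-house by Guizhou Jiangjiu Group Co., Ltd. It is about 10 GB in size, focuses on quality monitoring and analysis of wheat raw materials at Jiang-flavor liquor enterprises, and has no update cycle. Using principal component analysis, clustering, and a random forest model, it divides wheat raw materials into four grades. It is intended for raw material procurement, cultivation optimization, industry regulation, and scientific research, with the aim of improving the efficiency of selecting qu-making raw materials and their suitability.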
The content above was collected and summarized by 遇见数据集.



