Big data and machine learning-based future chemical management strategy
Source: China Scientific Data | Updated: 2026-04-13 | Indexed: 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.1360/TB-2025-0221
Resource description:
Chemicals are inevitably released into the environment throughout their life cycle as development is fostered and livelihoods are enhanced, posing significant risks to human health. The vast number of chemicals makes traditional management methods insufficient, highlighting the need for big data-based chemical management strategies to protect human health and promote sustainable development.

Big data is characterized by multi-source and multimodal heterogeneity. Multi-source heterogeneity refers to the integration of data from diverse sources, technological platforms, or time points; multimodal heterogeneity refers to the variety of data modalities, including numerical vectors, text sequences, and others. The integration of multi-source and multimodal heterogeneous data, including chemical structures, production volumes, exposure, and hazards, provides a robust foundation for sound chemical management. Such integration is essential to achieving intelligent and precise management throughout the entire life cycle of chemicals, thereby enabling process optimization, pollution control, and carbon mitigation. However, accurately extracting and using information from big data remains a pressing challenge.

Consequently, a strategy that combines artificial intelligence (AI) with big data is pivotal for the sustainable development of industries involved in the application and production of chemicals. The strategy can be divided into three stages: initial, middle, and final.

The initial stage focuses on big data collection, which calls for the application of large language models. These models, built upon corpus construction, pre-training, and domain knowledge fine-tuning, facilitate the integration of chemical data dispersed across reports, literature, and databases, overcoming the limitations of manual data collection and annotation. Furthermore, the information in scientific studies is not limited to text but is also presented in other modalities, such as images. With rapid technological advances, multimodal large models have the potential to offer innovative approaches for the structured extraction and aggregation of multimodal data, thereby building a more comprehensive foundational database for the management of chemicals.

The middle stage is big data analysis, achieved through the development of both discriminative and generative AI models. Discriminative AI models facilitate end-to-end learning from big data. They can integrate multimodal data to screen and predict the properties, hazards, and environmental risks of chemicals, thereby providing a scientific basis for management decisions. Generative AI models can generate optimal chemical management strategies under different constraints, such as the molecular design of green alternatives. Moreover, AI agents developed by integrating discriminative and generative AI models are expected to simulate expert decision-making processes, offering intelligent recommendations for chemical management.

The final stage involves the creation of a digital twin system, which establishes a mapping between physical entities and their digital counterparts in chemical management. Using the data acquired in the initial stage and the analytical parameters generated during the middle stage, digital twins can be constructed to simulate and interact with real-world environmental impacts in a virtual environment. The digital twins can in turn guide data acquisition priorities and model optimization, forming an intelligent closed loop for the management of chemicals.

The AI-driven chemical management strategy, powered by big data, aims to enhance the intelligence of chemical management processes, propelling the chemical industry towards an efficient, safe, and sustainable development path. This transformation requires not only continuous technical advancements but also the establishment of a robust support system, encompassing cross-domain collaboration, infrastructure, and other essential aspects.
Created:
2025-06-17



