Dataset after preprocessing for agricultural knowledge extraction experiment
Source: DataCite Commons | Updated: 2025-12-02 | Indexed: 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=08a72a66ff604942921aa5deb4cb5cf2
Description:
This dataset, "Preprocessed Dataset for Agricultural Knowledge Extraction Experiment," is a text dataset in the field of agriculture. It systematically compiles agricultural knowledge for four major crops, including cotton, rice, and corn. The content was crawled and organized from agricultural technology guidance literature and related websites, then extracted and structured into text records by manual and semi-automatic methods. During processing, text extraction tools or scripts converted the raw agricultural technology material into plain text files with a unified format and a consistent expression structure.
The documentation does not clearly state the spatial or temporal scope of the data. The content reflects agricultural knowledge from the promotion stage of modern agricultural technology, as applied to the cultivation of major crops in China.
The data is stored as UTF-8 encoded plain text (.txt), with a file size of approximately 1.48 MB. It can be opened with any text editor (such as Notepad++, VS Code, or Sublime Text) or with common office software (such as Microsoft Word or WPS).
The dataset contains over 8000 independent entries, each of which can be viewed as a "data row". Its "column" structure is implicit in the text and includes fields such as crop type, pest and disease name, control method type (agricultural, physical, biological, or chemical control), crop growth stage, and specific operating instructions. The measurement units commonly used for chemical control, such as "dosage per acre" and "kilograms of water", follow actual usage in Chinese agriculture. The data is largely complete, with no obvious missing entries.
This dataset is suitable for natural language processing tasks such as building agricultural knowledge bases, training intelligent question answering systems, and developing recommendation models for disease and pest control, and it has practical value for industry applications.
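Because the "column" structure is implicit in the text, a first step in working with the file is usually to split it into entries and tag each one with the fields it mentions. The following is a minimal sketch of that idea, not the authors' processing pipeline. It assumes one entry per non-empty line and assumes the four control method types appear under their usual Chinese names; the file name in the usage note is hypothetical, and the actual layout of the dataset may differ.

```python
from collections import Counter
from pathlib import Path

# Assumed Chinese terms for the four control method types named in the
# description; the wording actually used in the dataset may differ.
CONTROL_KEYWORDS = {
    "agricultural": "农业防治",
    "physical": "物理防治",
    "biological": "生物防治",
    "chemical": "化学防治",
}


def load_text(path):
    """Read the dataset file as UTF-8, the encoding stated in the description."""
    return Path(path).read_text(encoding="utf-8")


def split_entries(text):
    """Treat each non-empty line as one entry (an assumption about the layout)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def control_types(entry):
    """Return the control method types whose assumed keyword occurs in the entry."""
    return [name for name, keyword in CONTROL_KEYWORDS.items() if keyword in entry]


def summarize(text):
    """Count how many entries mention each control method type."""
    counts = Counter()
    for entry in split_entries(text):
        counts.update(control_types(entry))
    return counts
```

With a local copy of the file, `summarize(load_text("agri_knowledge.txt"))` (hypothetical file name) would give a rough distribution of control method types. Entries that match no keyword are simply not counted, which makes it easy to check how well the keyword assumption fits the real data.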
Provider:
Science Data Bank
Created:
2025-12-02



