CN-121353282-B - Method and device for detecting installation normalization, electronic equipment and storage medium

Abstract

The invention provides an installation normalization detection method and device, an electronic device, and a storage medium, belonging to the technical field of artificial intelligence. The method comprises: retrieving specification clauses related to a detection instruction; decomposing, by using a language model, the specification clauses into at least one visual detection subtask for inspecting an installation image, to obtain a structured task plan; executing the visual detection subtasks on the installation image by using a visual model to obtain a structured visual detection result; and inputting the structured visual detection result and the specification clauses into the language model to obtain a judgment result of whether the product installation is compliant. The cognitive framework constructed by the invention enables the language model to interpret the specification clauses in the knowledge base in real time and dynamically convert them into a detection plan executable by the visual model. This collaboration between knowledge-driven planning and visual execution addresses the reliance on hard-coded logic and the poor extensibility of conventional schemes, responds quickly and at extremely low cost to changes in business specifications, and offers high accuracy and high agility.
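
The flow summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `retrieve_clauses`, `plan_subtasks`, `run_vision`, and `judge` are hypothetical stand-ins for the knowledge base, the language model, and the visual model, and the "image" is a dictionary of pre-annotated observations rather than pixel data.

```python
from dataclasses import dataclass

# Hypothetical in-memory knowledge base; a real system would query a vector database.
KNOWLEDGE_BASE = {
    "ground wire": "The ground wire must be yellow-green two-color.",
    "refrigerant pipe": "The refrigerant pipe must be fully wrapped.",
}


@dataclass
class Finding:
    task_id: str
    description: str
    observed: str


def retrieve_clauses(instruction: str) -> list[str]:
    """Stand-in for retrieval: return clauses whose key appears in the instruction."""
    text = instruction.lower()
    return [clause for key, clause in KNOWLEDGE_BASE.items() if key in text]


def plan_subtasks(clauses: list[str]) -> list[dict]:
    """Stand-in for the language model: emit one visual subtask per clause."""
    return [
        {"task_id": f"T{i}", "description": f"Check: {clause}"}
        for i, clause in enumerate(clauses, start=1)
    ]


def run_vision(image: dict, plan: list[dict]) -> list[Finding]:
    """Stand-in for the visual model: look up the fake observation for each subtask."""
    return [
        Finding(t["task_id"], t["description"], image.get(t["task_id"], "unknown"))
        for t in plan
    ]


def judge(findings: list[Finding], clauses: list[str]) -> dict:
    """Stand-in for the language model's final compliance judgment."""
    compliant = all(f.observed == "ok" for f in findings)
    return {
        "verdict": "compliant" if compliant else "non-compliant",
        "clauses": clauses,
    }


def detect(image: dict, instruction: str) -> dict:
    """Retrieve clauses, plan subtasks, run visual checks, then judge."""
    clauses = retrieve_clauses(instruction)
    plan = plan_subtasks(clauses)
    findings = run_vision(image, plan)
    return judge(findings, clauses)
```

In this sketch, adding a clause to `KNOWLEDGE_BASE` changes the generated plan without touching `detect`, which mirrors the stated goal of avoiding hard-coded detection logic.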

Inventors

CHEN ZHAOJI
WU HANG
YIN DEBIN
LI BAILI

Assignees

GD Midea Air-Conditioning Equipment Co., Ltd. (广东美的制冷设备有限公司)
Midea Group Co., Ltd. (美的集团股份有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-18

Claims (11)

1. An installation normalization detection method, comprising: receiving an installation image and a detection instruction, wherein the installation image is captured in a product installation scene, and the detection instruction is an instruction to check whether the product installation conforms to specification; retrieving at least one specification clause related to the detection instruction from a preset installation specification knowledge base, wherein the installation specification knowledge base is a pre-built electronic knowledge collection storing standards, rules, and operation manuals related to product installation; decomposing, by using a language model, the specification clause into at least one visual detection subtask for inspecting the installation image, and generating a structured task plan containing all of the visual detection subtasks; executing the visual detection subtasks on the installation image by using a visual model to obtain a structured visual detection result, wherein the structured visual detection result comprises visual feature information corresponding to each visual detection subtask; and inputting the structured visual detection result and the specification clause into the language model, and obtaining a judgment result, output by the language model, of whether the product installation is compliant; wherein the decomposing the specification clause into at least one visual detection subtask and generating the structured task plan comprises: constructing structured prompt information containing the specification clause and the detection instruction; and inputting the structured prompt information into the language model to obtain the structured task plan output by the language model; wherein the inputting the structured visual detection result and the specification clause into the language model and obtaining the judgment result comprises: determining whether an abnormal item exists in the structured visual detection result, wherein the abnormal item comprises a detection result whose confidence is lower than a preset threshold or whose state is marked as uncertain; if no abnormal item exists, obtaining, from the language model, the logical matching status between the structured visual detection result and the specification clause, and directly outputting a qualified or unqualified judgment result; and if an abnormal item exists, executing a re-judging mechanism for the abnormal item; and wherein the executing the re-judging mechanism for the abnormal item comprises: generating, by using the language model, a re-judging instruction for a target area corresponding to the abnormal item, wherein the re-judging instruction comprises an image enhancement requirement or a detail checking requirement for the target area; inputting the re-judging instruction into the visual model to perform secondary detection on a local image corresponding to the target area in the installation image, and obtaining a re-judging detection result; and updating, by the language model, the structured visual detection result based on the re-judging detection result, and regenerating the judgment result.
2. The installation normalization detection method according to claim 1, wherein the structured prompt information comprises role definition information, background knowledge information, task instruction information, and chain-of-thought guidance information, and the constructing structured prompt information containing the specification clause and the detection instruction comprises: determining the role definition information based on a preset system expert identity configuration; determining the background knowledge information based on the specification clause; determining the task instruction information based on the detection instruction and a preset task decomposition requirement; and determining the chain-of-thought guidance information based on a preset logical reasoning paradigm, wherein the chain-of-thought guidance information demonstrates a reasoning process that converts the detection instruction into specific operation steps.
3. The method of claim 1, wherein the structured task plan comprises a task list; each item in the task list records a task identification and task description of one of the visual inspection subtasks.
4. The method of claim 3, wherein the executing the visual detection subtasks on the installation image by using a visual model to obtain a structured visual detection result comprises: traversing the task list, and constructing, for each visual detection subtask, a visual query instruction containing the task description; inputting the installation image and the visual query instruction into the visual model, and obtaining a single detection result output by the visual model for each visual detection subtask; and aggregating all of the single detection results to generate the structured visual detection result.
5. The installation normalization detection method according to claim 4, wherein the single detection result comprises position coordinate information of a detection target located by the visual model in the installation image, and recognized attribute state information of the detection target; and the structured visual detection result is evidence chain data formed by aggregating all of the single detection results.
6. The installation normalization detection method according to claim 1, wherein the visual model is trained based on the following steps: obtaining a general visual language model as a base model; constructing a sample data set in the field of household appliance installation, wherein the sample data set comprises a plurality of sample images with installation component labels and installation defect descriptions; keeping the pre-trained parameters of the base model frozen and introducing trainable adapter parameters into the base model; and training and updating the adapter parameters by using the sample data set, and determining the base model containing the trained adapter parameters as the visual model.
7. The installation normalization detection method according to claim 1, wherein the installation specification knowledge base is created based on the following steps: acquiring an original document containing product installation specification content; extracting text from the original document and dividing the obtained text content into a plurality of independent semantic knowledge blocks; vectorizing each semantic knowledge block by using a text embedding model to obtain a semantic vector corresponding to each semantic knowledge block; and storing the text content of each semantic knowledge block in association with its corresponding semantic vector in a vector database to construct the installation specification knowledge base.
8. The installation normalization detection method according to claim 7, wherein the retrieving at least one specification clause related to the detection instruction from a preset installation specification knowledge base comprises: vectorizing the detection instruction by using the text embedding model to obtain a query vector; calculating the similarity between the query vector and each semantic vector in the installation specification knowledge base; and selecting from the vector database, in descending order of similarity, at least one semantic knowledge block most semantically relevant to the detection instruction, and taking the selected semantic knowledge block as the specification clause.
9. An installation normalization detection device, comprising: an information input unit, configured to receive an installation image and a detection instruction, wherein the installation image is captured in a product installation scene, and the detection instruction is an instruction to check whether the product installation conforms to specification; a specification retrieval unit, configured to retrieve at least one specification clause related to the detection instruction from a preset installation specification knowledge base, wherein the installation specification knowledge base is a pre-built electronic knowledge collection storing standards, rules, and operation manuals related to product installation; a task planning unit, configured to decompose, by using a language model, the specification clause into at least one visual detection subtask for inspecting the installation image, and to generate a structured task plan containing all of the visual detection subtasks, wherein the decomposing and generating comprises: constructing structured prompt information containing the specification clause and the detection instruction; and inputting the structured prompt information into the language model to obtain the structured task plan output by the language model; a visual detection unit, configured to execute the visual detection subtasks on the installation image by using a visual model to obtain a structured visual detection result, wherein the structured visual detection result comprises visual feature information corresponding to each visual detection subtask; and a result judging unit, configured to input the structured visual detection result and the specification clause into the language model, and to obtain a judgment result, output by the language model, of whether the product installation is compliant, wherein the inputting and obtaining comprises: determining whether an abnormal item exists in the structured visual detection result, wherein the abnormal item comprises a detection result whose confidence is lower than a preset threshold or whose state is marked as uncertain; if no abnormal item exists, obtaining, from the language model, the logical matching status between the structured visual detection result and the specification clause, and directly outputting a qualified or unqualified judgment result; and if an abnormal item exists, executing a re-judging mechanism for the abnormal item; wherein the executing the re-judging mechanism for the abnormal item comprises: generating, by using the language model, a re-judging instruction for a target area corresponding to the abnormal item, wherein the re-judging instruction comprises an image enhancement requirement or a detail checking requirement for the target area; inputting the re-judging instruction into the visual model to perform secondary detection on a local image corresponding to the target area in the installation image, and obtaining a re-judging detection result; and updating, by the language model, the structured visual detection result based on the re-judging detection result, and regenerating the judgment result.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the installation normalization detection method according to any one of claims 1 to 8 when executing the computer program.
11. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the installation normalization detection method according to any one of claims 1 to 8.
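
The abnormal-item handling recited in claims 1 and 9 can be sketched as follows. This is a minimal sketch under stated assumptions: the result fields (`task_id`, `status`, `confidence`, `region`), the threshold value 0.6, the instruction wording, and the `recheck` callable standing in for the visual model's secondary detection are all hypothetical and are not specified in the claims.

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.6  # assumed value; the claims only recite a "preset threshold"


def find_abnormal(results: list[dict], threshold: float = CONFIDENCE_THRESHOLD) -> list[dict]:
    """Return results with confidence below the threshold or a status of 'uncertain'."""
    return [
        r for r in results
        if r["confidence"] < threshold or r["status"] == "uncertain"
    ]


def build_rejudge_instruction(item: dict) -> dict:
    """Stand-in for the language model: request a closer look at the target area."""
    return {
        "task_id": item["task_id"],
        "region": item["region"],  # assumed (x1, y1, x2, y2) box of the target area
        "requirement": "enhance the image and re-inspect details",
    }


def rejudge(results: list[dict], recheck: Callable[[dict], dict]) -> list[dict]:
    """Run secondary detection on abnormal items and merge the updated fields."""
    updated = {r["task_id"]: dict(r) for r in results}  # copy; inputs stay untouched
    for item in find_abnormal(results):
        instruction = build_rejudge_instruction(item)
        updated[item["task_id"]].update(recheck(instruction))
    return list(updated.values())


def final_verdict(results: list[dict]) -> str:
    """Stand-in for the language model's logical matching against the clause."""
    return "qualified" if all(r["status"] == "pass" for r in results) else "unqualified"


def judge_with_rejudge(results: list[dict], recheck: Callable[[dict], dict]) -> str:
    """Judge directly when nothing is abnormal; otherwise re-judge first."""
    if find_abnormal(results):
        results = rejudge(results, recheck)
    return final_verdict(results)
```

A caller would pass the structured visual detection result as `results` and a function wrapping the visual model as `recheck`; only items flagged by `find_abnormal` trigger a second pass, so confident results are judged without extra model calls.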

Description

Method and device for detecting installation normalization, electronic equipment and storage medium

Technical Field

The present invention relates to the field of artificial intelligence technologies, and in particular to an installation normalization detection method and device, an electronic device, and a storage medium.

Background

In many fields, such as smart home, industrial manufacturing, and equipment operation and maintenance, the quality of the installation performed after a product or piece of equipment is delivered directly affects the safety, stability, and user experience of its subsequent use. Determining from photos of the installation site whether the installation work meets the established installation specifications is therefore a critical quality inspection step and has become a common technical requirement across the industry.

Various automated detection schemes based on image recognition have been proposed in the prior art. The mainstream scheme analyzes the installation image using dedicated, small computer vision models. For example, for the detection item "ground wire color non-compliance", a YOLO (You Only Look Once) series object detection model can be trained to locate the ground wire in the image, and a separate image classification model is then used to determine whether its color is "yellow-green two-color". Such a scheme typically requires training one or more dedicated models for each specific detection element or specification clause; these separate models are combined and chained to build a complete detection flow, and the decision logic of the flow is typically hard-coded in the program code of the business system.
On the one hand, extensibility is poor. Because the detection logic is fixed in the program code and the model structure, adding a new detection item or adjusting an existing specification (for example, a change to the wrapping standard for the refrigerant pipe of a certain air conditioner model) requires redeveloping, retraining, and redeploying the corresponding dedicated model and modifying the business code. The whole process is slow and costly, making it difficult to respond quickly to dynamic business changes.

On the other hand, the robustness and generalization ability of the system are insufficient. In a scheme built from combined small models, each model can handle only the specific task it was trained for and lacks a comprehensive understanding of complex and variable installation site environments. Under non-ideal conditions such as a poor shooting angle, local illumination that is too strong or too dark, or partial occlusion of the target object, a single model can easily fail to recognize its target, which interrupts the whole detection chain or causes misjudgment.

Disclosure of Invention

The invention provides an installation normalization detection method and device, an electronic device, and a storage medium, to solve the problems that installation normalization detection schemes in the prior art have poor extensibility, are difficult to adapt flexibly to dynamic changes in specification clauses, and are insufficiently robust in complex and variable real installation scenes.
The invention provides an installation normalization detection method, comprising: receiving an installation image and a detection instruction, wherein the installation image is captured in a product installation scene, and the detection instruction is an instruction to check whether the product installation conforms to specification; retrieving at least one specification clause related to the detection instruction from a preset installation specification knowledge base; decomposing, by using a language model, the specification clause into at least one visual detection subtask for inspecting the installation image, and generating a structured task plan containing all of the visual detection subtasks; executing the visual detection subtasks on the installation image by using a visual model to obtain a structured visual detection result, wherein the structured visual detection result comprises visual feature information corresponding to each visual detection subtask; and inputting the structured visual detection result and the specification clause into the language model, and obtaining a judgment result, output by the language model, of whether the product installation is compliant.

According to the installation normalization detection method provided by the invention, the decomposing, by using a language model, the specification clause into at least one visual detection subtask for inspecting the installation image, and generating a structured task plan containing all of the visual detection subtasks, comprises: constructing structured prompt information containing the specification clause and the detection instr