CN-117390161-B - Flame detection method based on visual text large-scale pre-training model

Abstract

The invention discloses a flame detection method based on a visual text large-scale pre-training model. The method obtains a visual text large-scale pre-training model and fine-tunes it on an image description generation data set, a visual question-answer data set and a visual text retrieval data set to obtain, respectively, a description generation model, a visual question-answer model and a visual text retrieval model. The three models are integrated in a multi-expert manner into a multi-expert flame detection model, which judges whether flame exists in an image to be detected by fusing the output results of the three models, so that the flame detection task is completed with high quality. With the visual text large-scale pre-training model at its core, the multi-expert flame detection model has strong generalization, context understanding and reasoning capabilities, performs well in task scenes without training on scene-specific flame data, and has broad application prospects.
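
The fusion of the three model outputs described above can be sketched as a simple vote count. This is a minimal illustration, not the patented implementation: the `Expert` callable interface, the function name `fuse_expert_votes`, the default of two agreeing experts, and the `>=` comparison are all assumptions made for the example.

```python
from typing import Callable, Sequence

# Hypothetical interface: an expert takes an image and returns True
# when it judges that flame is present.
Expert = Callable[[object], bool]


def fuse_expert_votes(image: object, experts: Sequence[Expert], min_votes: int = 2) -> bool:
    """Count the experts that report flame and compare with an assumed vote threshold."""
    votes = sum(1 for expert in experts if expert(image))
    # Assumption: "at least min_votes" is used here; a strict comparison is equally possible.
    return votes >= min_votes


if __name__ == "__main__":
    # Stand-in experts for illustration only; real ones would wrap the
    # description generation, question-answer and retrieval models.
    stub_experts = [lambda img: True, lambda img: False, lambda img: True]
    print(fuse_expert_votes("frame.jpg", stub_experts))
```

Keeping each expert behind a plain boolean interface means the fusion step does not need to know how any individual model reaches its decision.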

Inventors

LI XI
MIAO PEIHAN
LI XUEWEI

Assignees

Zhejiang University (浙江大学)
Beijing Zhipu Huazhang Technology Co., Ltd. (北京智谱华章科技有限公司)

Dates

Publication Date: 20260508
Application Date: 20231024

Claims (10)

1. A flame detection method based on a visual text large-scale pre-training model, characterized by comprising the following steps:
S1, fine-tuning a visual text large-scale pre-training model on an image description generation data set to obtain a description generation model for flame detection, wherein the judgment flow of the description generation model is as follows: an image to be detected is input into the description generation model, the description generation model outputs N text descriptions, the N text descriptions are checked for keywords related to flame, and whether flame exists in the image to be detected is judged according to the detection result;
S2, fine-tuning a visual text large-scale pre-training model on a visual question-answer data set to obtain a visual question-answer model for flame detection, wherein the judgment flow of the visual question-answer model is as follows: an image to be detected and O text queries are input into the visual question-answer model together, each text query being a question asking whether flame exists in the image, the visual question-answer model outputs the corresponding O answer results, and whether flame exists in the image to be detected is judged according to the answer results;
S3, fine-tuning a visual text large-scale pre-training model on a visual text retrieval data set to obtain a visual text retrieval model for flame detection, wherein the judgment flow of the visual text retrieval model is as follows: an image to be detected and Q text descriptions are input into the visual text retrieval model together, each text description being a keyword related to flame, the visual text retrieval model outputs the visual feature of the image and the text feature of each text description, the similarity between the visual feature and each text feature is calculated, and whether flame exists in the image to be detected is judged by a threshold method;
S4, integrating the description generation model, the visual question-answer model and the visual text retrieval model in a multi-expert manner to obtain a multi-expert flame detection model, and finally judging whether flame exists in the image to be detected by fusing the output results of the three models.
2. The flame detection method based on a visual text large-scale pre-training model according to claim 1, wherein the visual text large-scale pre-training model adopts a BLIP model.
3. The flame detection method based on a visual text large-scale pre-training model according to claim 1, wherein the keywords related to flames are "fire", "flame", "fire" or "flag".
4. The flame detection method based on a visual text large-scale pre-training model according to claim 1, wherein in S1, whether flame exists in the image to be detected is judged according to the detection result as follows: if any of the text descriptions contains a keyword related to flame, flame is considered to exist in the image to be detected.
5. The flame detection method based on a visual text large-scale pre-training model according to claim 1, wherein in S2, whether flame exists in the image to be detected is judged according to the answer results as follows: if any one of the answer results indicates that flame exists, flame is considered to exist in the image to be detected.
6. The flame detection method based on a visual text large-scale pre-training model according to claim 1, wherein in S3, the similarity $s_i$ between the visual feature $v$ and each text feature $t_i$ is calculated as follows:
$$s_i = \frac{v \cdot t_i}{\lVert v \rVert_2 \times \lVert t_i \rVert_2}$$
wherein $\cdot$ represents the dot product; $\times$ represents the product; and $\lVert \cdot \rVert_2$ represents the L2 norm.
7. The flame detection method based on a visual text large-scale pre-training model according to claim 1, wherein in S3, the threshold method for judging whether flame exists in the image to be detected is to compare the similarity value between the visual feature and each text feature with a set similarity threshold, and flame is considered to exist in the image to be detected if any one of the similarity values is higher than the set similarity threshold.
8. The flame detection method based on a visual text large-scale pre-training model according to claim 7, wherein the similarity threshold is 0.2.
9. The flame detection method based on a visual text large-scale pre-training model according to claim 1, wherein in S4, the multi-expert flame detection model fuses the output results of the three models as follows: if the number of experts judging that flame exists exceeds a set number threshold, flame is finally considered to exist in the image to be detected.
10. The flame detection method based on a visual text large-scale pre-training model according to claim 9, wherein in S4, the number threshold is 2.
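
The per-expert decision rules of claims 3 to 8 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names, the two-word keyword tuple, the substring matching, and the treatment of an answer beginning with "yes" as an affirmative answer are all hypothetical choices made for the example. Only the 0.2 similarity threshold and the "any one" rules come from the claims, and the model outputs (captions, answers, feature vectors) are assumed to be already available.

```python
import math
from typing import Sequence

FLAME_KEYWORDS = ("fire", "flame")   # assumed keyword list for illustration
SIMILARITY_THRESHOLD = 0.2           # value stated in claim 8


def caption_expert_decision(captions: Sequence[str],
                            keywords: Sequence[str] = FLAME_KEYWORDS) -> bool:
    """Claim 4 rule: flame is present if any caption contains a flame keyword."""
    # Simplification: plain substring matching, so "campfire" also matches "fire".
    return any(k in caption.lower() for caption in captions for k in keywords)


def vqa_expert_decision(answers: Sequence[str]) -> bool:
    """Claim 5 rule: flame is present if any answer is affirmative."""
    # Assumption: an affirmative answer starts with "yes".
    return any(answer.strip().lower().startswith("yes") for answer in answers)


def cosine_similarity(v: Sequence[float], t: Sequence[float]) -> float:
    """Claim 6 formula: dot product divided by the product of the L2 norms."""
    # Assumes both vectors are non-zero and of equal length.
    dot = sum(a * b for a, b in zip(v, t))
    norm_product = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in t))
    return dot / norm_product


def retrieval_expert_decision(visual_feature: Sequence[float],
                              text_features: Sequence[Sequence[float]],
                              threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Claim 7 rule: flame is present if any similarity is higher than the threshold."""
    return any(cosine_similarity(visual_feature, t) > threshold for t in text_features)
```

Each function returns a single boolean, which is the form of output that the expert-counting rule of claims 9 and 10 operates on.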

Description

Flame detection method based on visual text large-scale pre-training model

Technical Field

The invention relates to the field of computer vision, in particular to a flame detection method based on a visual text large-scale pre-training model.

Background

Flames are extremely dangerous and destructive: they can quickly ignite combustible material and spread, causing extensive damage. Economically, the damage and loss from fires are enormous, repair and reconstruction costs are high, and businesses and personal property suffer heavy losses, with negative effects on commerce and employment. Environmentally, a fire may release large amounts of smoke, toxic gases and dust, which seriously affect air quality and the environment and exacerbate global warming and climate change. Preventing flames in order to reduce fire hazards is therefore important.

In recent years, flame detection methods based on target detection have attracted extensive research. Such methods use a mature target detection framework such as YOLO to accomplish flame detection. In practice, however, these methods face few learnable samples, difficult sample collection, and complex and variable samples, so the trained model tends to overfit and generalizes poorly. As a result, flame detection methods based on target detection are difficult to apply to real scenes. Recently, the advent of visual text large-scale pre-training models has made it possible to alleviate these challenges.
Specifically, a visual text large-scale pre-training model is trained on massive visual and text data, has strong context understanding and reasoning capabilities, copes better with complex scenes and semantic contexts, generalizes well, and can achieve good performance in task scenes without training on scene-specific data. The problem to be solved is therefore how to implement flame detection using a visual text large-scale pre-training model.

Disclosure of Invention

Aiming at the above problems, the invention provides a flame detection method based on a visual text large-scale pre-training model. The technical scheme adopted by the invention is as follows:

A flame detection method based on a visual text large-scale pre-training model comprises the following steps:
S1, fine-tuning a visual text large-scale pre-training model on an image description generation data set to obtain a description generation model for flame detection, wherein the judgment flow of the description generation model is as follows: an image to be detected is input into the description generation model, the description generation model outputs N text descriptions, the N text descriptions are checked for keywords related to flame, and whether flame exists in the image to be detected is judged according to the detection result;
S2, fine-tuning a visual text large-scale pre-training model on a visual question-answer data set to obtain a visual question-answer model for flame detection, wherein the judgment flow of the visual question-answer model is as follows: an image to be detected and O text queries are input into the visual question-answer model together, each text query being a question asking whether flame exists in the image, the visual question-answer model outputs the corresponding O answer results, and whether flame exists in the image to be detected is judged according to the answer results;
S3, fine-tuning a visual text large-scale pre-training model on a visual text retrieval data set to obtain a visual text retrieval model for flame detection, wherein the judgment flow of the visual text retrieval model is as follows: an image to be detected and Q text descriptions are input into the visual text retrieval model together, each text description being a keyword related to flame, the visual text retrieval model outputs the visual feature of the image and the text feature of each text description, the similarity between the visual feature and each text feature is calculated, and whether flame exists in the image to be detected is judged by a threshold method;
S4, integrating the description generation model, the visual question-answer model and the visual text retrieval model in a multi-expert manner to obtain a multi-expert flame detection model, and finally judging whether flame exists in the image to be detected by fusing the output results of the three models.

Preferably, the visual text large-scale pre-training model adopts a BLIP model.
Preferably, the keywords related to flame are synonyms of "fire" and "flame", or names of substances capable of emitting light and burning.
Preferably, in the step S1, the method for judging whether the flame ex