CN-122024434-A - Lightweight system and method based on image snapshot and multi-mode AI identification

CN122024434A

Abstract

The invention relates to the technical field of intelligent monitoring and safe operation, and in particular to a lightweight system and method based on image snapshot and multi-mode AI identification. The terminal comprises a monitoring module that monitors motion information, positioning information, and environment information; an acquisition module that acquires sound information and image information; and a main control module that controls the opening/closing and working state of each module and is provided with a two-stage wake-up mechanism, which finally controls the terminal to enter a working mode, with analysis performed at the cloud end. The scheme integrates monitoring and analysis of the operation process over multi-source data, performs multi-aspect scene sensing, and reduces power consumption and network dependency.

Inventors

  • CHEN ZAIXIN
  • HUANG CHENGXIN
  • WANG YUAN
  • TANG HONGBO
  • CHEN KUN
  • ZHANG XIANGRUI
  • ZHAO YUJIA

Assignees

  • State Grid Sichuan Electric Power Company, Nanchong Power Supply Company (国网四川省电力公司南充供电公司)

Dates

Publication Date
2026-05-12
Application Date
2026-04-07

Claims (10)

  1. A lightweight system based on image snapshot and multi-mode AI recognition, characterized by comprising a terminal and a cloud end; the terminal comprises a main control module, a monitoring module, an acquisition module, and a communication module; the monitoring module is used for monitoring motion information, positioning information, and environment information; the acquisition module is used for acquiring sound information and image information; the communication module is used for uploading the motion information, environment information, sound information, and image information to the cloud end; the main control module is used for controlling the opening/closing and working states of the monitoring module, the acquisition module, and the communication module, and is provided with a two-stage wake-up mechanism comprising: a first-stage wake-up step of starting the monitoring module and judging, from the motion information and positioning information it acquires, whether a buffer zone of a preset operation area has been entered, and if so, executing the second-stage wake-up step; and a second-stage wake-up step of starting the acquisition module, periodically acquiring image information at a first preset acquisition frequency, and uploading it to the cloud end through the communication module; the cloud end is used for analyzing the current scene according to the image information and, if the current scene is a preset working scene, controlling the terminal to enter a working mode; the cloud end is further used for analyzing the operation state and risk level according to the uploaded image, sound, motion, and environment information, and for triggering the terminal to perform early warning according to the operation state and the risk level.
  2. The lightweight system based on image snapshot and multi-mode AI recognition of claim 1, wherein the cloud end is further used for carrying out state analysis on the motion information to obtain a personnel motion state, the personnel motion state comprising resting, walking, and ascending; and wherein a second preset acquisition frequency is adjusted according to the personnel motion state: the adjusted second preset acquisition frequency is obtained from an adjustment coefficient and an action factor, the action factor being determined according to the personnel motion state.
  3. The lightweight system based on image snapshot and multi-mode AI recognition of claim 1, wherein the cloud deploys a multi-mode AI model comprising a target detection model, a text conversion model, a speech emotion recognition model, a multi-modal fusion layer, a job status analysis model, and a risk level analysis model; the target detection model detects targets in the image information, outputs target bounding boxes and category confidences, and simultaneously extracts a scene semantic feature vector with a corresponding confidence, wherein the targets include power-utility-type targets and personnel-behavior-type targets; the text conversion model converts the voice information into text, performs keyword matching, and outputs a text feature; and the speech emotion recognition model detects emotion in the voice information, outputs an emotion label and confidence, and forms an emotion feature.
  4. The lightweight system based on image snapshot and multi-mode AI recognition of claim 3, wherein the risk level comprises a first-level, a second-level, a third-level, and a fourth-level risk level: the first-level risk level indicates no violation and standard operation; the second-level risk level indicates a slight violation; the third-level risk level indicates a general violation; and the fourth-level risk level indicates a serious violation; a base weight is assigned to each violation behavior through a predefined violation behavior library; different job states have different tolerance to violations, and risk coefficients for the different job states are defined; if the confidence output by the image recognition model is lower than a confidence threshold, the violation is ignored; if the number of violations by the same user within the detection period exceeds a preset count threshold, the risk levels are accumulated and a penalty factor is set accordingly; and the risk level is analyzed from the detected violations and the operation state using a weighted risk value method comprising the following steps: for each currently identified violation, calculating a single risk value; taking the maximum single risk value and adjusting it by the risk coefficient and the penalty factor; and mapping the adjusted maximum single risk value to a discrete level according to thresholds corresponding to the risk levels.
  5. The lightweight system based on image snapshot and multi-mode AI recognition of claim 4, wherein the cloud is further configured to extract a voltage level from a work ticket and to calculate a near-electricity early warning distance from it; if the personnel motion state is ascending, an ascending early warning height, in meters, is set; the cloud is also used for pushing the operation state, risk level, voltage level, near-electricity early warning distance, and ascending early warning height to the terminal; and the terminal is used for performing early warning according to one or more of the operation state, risk level, voltage level, near-electricity early warning distance, and ascending early warning height.
  6. The lightweight system based on image snapshot and multi-mode AI recognition of claim 5, wherein the terminal further comprises a storage module for storing the motion information, environment information, sound information, and image information, and also for storing the operation state, risk level, voltage level, near-electricity early warning distance, and ascending early warning height; the near-electricity early warning distance and the ascending early warning height serve as current thresholds and are compared with the actual distance and the relative height respectively: if the actual distance is smaller than the near-electricity early warning distance for a preset duration, an alarm is triggered, and if the relative height is greater than the ascending early warning height, an alarm is triggered.
  7. The lightweight system based on image snapshot and multi-mode AI recognition of claim 1, wherein the terminal is further configured to determine, according to the motion information, whether the wearer of the terminal has fallen, and if so, to upload the result to the cloud; the cloud end is further used for analyzing the specific injury position and injury type of the wearer by combining the judgment result with the collected motion, sound, and image information, generating movement advice, and pushing the movement advice to the terminal.
  8. The lightweight system based on image snapshot and multi-mode AI recognition of claim 7, wherein the terminal calculates a resultant acceleration and a resultant angular velocity from the acceleration and angular velocity in the motion information, and detects whether the wearer has fallen by applying a fall detection model to the resultant acceleration and resultant angular velocity; the fall detection model is a two-stage trigger model: the first stage performs a fast threshold judgment, proceeding to the second stage if the resultant acceleration exceeds a preset acceleration threshold, or calculating the current attitude angle and proceeding to the second stage if the attitude angle changes by more than a preset angle within a preset time interval after the impact; the second stage performs a machine-learning judgment, extracting motion features within a window of the preset time interval after the impact and inputting them into a classifier to judge whether a true fall has occurred; if the classification result is a fall, an emergency mode is immediately triggered: high-frequency image acquisition and continuous sound acquisition are performed on terminals within a preset distance range of the wearer's terminal, and the most recent preset duration of motion information, image information, and sound information is uploaded to the cloud.
  9. The lightweight system based on image snapshot and multi-mode AI recognition of claim 8, wherein the cloud extracts image pose features and target detection results from the image information; splices the image pose features, target detection results, audio features, and motion features into a vector, inputs the vector into a multi-modal fusion network, and outputs the injured part and injury type; the cloud end is preconfigured with a rescue knowledge base, takes the injured part, injury type, and positioning information as input, matches rules of the knowledge base, generates an advice text, and pushes the advice text to the terminal for broadcast to assist rescue.
  10. A lightweight method based on image snapshot and multi-mode AI recognition, characterized in that the lightweight system based on image snapshot and multi-mode AI recognition of any one of claims 1-9 is used.
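The two-stage fall detection model of claim 8 can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold values, the feature window, and the classifier are hypothetical placeholders, since the patent leaves them unspecified.

```python
import math

# Hypothetical thresholds; the patent does not state the actual values.
ACC_THRESHOLD = 2.5      # resultant-acceleration threshold, in g
ANGLE_THRESHOLD = 45.0   # attitude-angle change threshold, in degrees

def resultant(ax, ay, az):
    """Resultant (magnitude) of a 3-axis reading, as in claim 8."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def stage_one(acc_sample, angle_change_deg):
    """Fast threshold judgment: trigger stage two on a strong impact,
    or on a large attitude-angle change shortly after the impact."""
    return (resultant(*acc_sample) > ACC_THRESHOLD
            or angle_change_deg > ANGLE_THRESHOLD)

def stage_two(window_features, classifier):
    """Machine-learning judgment on motion features extracted from a
    post-impact window; `classifier` is any callable returning True
    for a genuine fall."""
    return classifier(window_features)

def detect_fall(acc_sample, angle_change_deg, window_features, classifier):
    """Two-stage trigger: cheap threshold gate first, classifier second."""
    if not stage_one(acc_sample, angle_change_deg):
        return False
    return stage_two(window_features, classifier)

# Example: a hard impact confirmed by a classifier stub.
always_fall = lambda feats: True
print(detect_fall((0.1, 0.2, 3.0), 10.0, [], always_fall))  # True
print(detect_fall((0.0, 0.0, 1.0), 5.0, [], always_fall))   # False
```

The cheap first stage keeps the classifier off the hot path, matching the patent's low-power emphasis: most samples are rejected by two comparisons before any feature extraction runs.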

Description

Lightweight system and method based on image snapshot and multi-mode AI identification

Technical Field

The invention relates to the technical field of intelligent monitoring and safe operation, and in particular to a lightweight system and method based on image snapshot and multi-mode AI identification.

Background

With the deepening of intelligent transformation in the power industry, on-site operation safety monitoring has become a key link in guaranteeing the stable operation of the power system. The electric power operation environment is complex and its risk factors are many, while the traditional safety supervision mode relies mainly on manual inspection and fixed-position video monitoring, and therefore suffers from large supervision blind areas, response lag, and difficulty in covering mobile, scattered operation points. To solve these problems, intelligent wearable monitoring equipment (such as smart safety helmets) integrating video acquisition, positioning, and communication functions has gradually been applied in high-risk industries such as electric power and construction, realizing real-time monitoring and recording of the operation process. Existing intelligent wearable monitoring equipment falls mainly into two types: the first adopts continuous video recording and uploads the whole operation process to a monitoring center in real time; the second adopts local storage with post-hoc analysis, exporting video for manual or semi-automatic review after the operation is finished.
Both modes have defects. The first mode has high power consumption and short endurance: continuous video recording or high-frequency image transmission causes extremely high device power consumption, battery endurance is usually only 2-4 hours, the all-day (more than 8 hours) monitoring requirement of electric power operation is difficult to meet, and frequent charging or battery replacement seriously affects practical usability and operation efficiency. The second mode lacks scene perception capability: the existing scheme performs all-day indiscriminate acquisition, cannot intelligently identify whether personnel have entered an operation area or which operation stage they are in, and always acquires and transmits data at a fixed frequency; this blind acquisition mode not only wastes energy but also generates a large amount of redundant data and increases the background analysis burden. Therefore, a need exists for a lightweight system and method based on image snapshot and multi-mode AI identification that can integrate monitoring and analysis of the operation process over multi-source data, perform multi-aspect scene sensing, and reduce power consumption and network dependency, thereby solving the problems in the prior art.

Disclosure of Invention

One purpose of the invention is to provide a lightweight system based on image snapshot and multi-mode AI recognition that can integrate monitoring and analysis of the operation process over multi-source data, perform multi-aspect scene perception, and reduce power consumption and network dependence.
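The fixed-frequency blind acquisition criticized above is what the motion-state-adaptive frequency of claim 2 addresses. The exact adjustment formula is not reproduced in this text (it was rendered as an image in the source), so the sketch below assumes a simple multiplicative form: the adjusted frequency is the base frequency scaled by an adjustment coefficient and a motion-state-dependent action factor. All numeric values are illustrative.

```python
# Assumed action factors per personnel motion state (claim 2 lists
# resting, walking, and ascending); the actual values are not given.
ACTION_FACTORS = {
    "resting": 0.25,   # sparse snapshots while at rest
    "walking": 1.0,    # baseline rate while moving
    "ascending": 2.0,  # densest sampling during climbing work
}

def adjusted_frequency(base_freq_hz, adjust_coeff, motion_state):
    """Second preset acquisition frequency after adjustment (claim 2),
    under the assumed multiplicative form."""
    return base_freq_hz * adjust_coeff * ACTION_FACTORS[motion_state]

print(adjusted_frequency(0.2, 1.0, "resting"))    # 0.05
print(adjusted_frequency(0.2, 1.0, "ascending"))  # 0.4
```

Whatever the exact formula, the design intent is clear from the claims: sampling density follows activity, so a resting worker costs a fraction of the snapshots (and radio wake-ups) of one climbing a pole.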
The invention provides a first basic scheme: a lightweight system based on image snapshot and multi-mode AI identification, comprising a terminal and a cloud end. The terminal comprises a main control module, a monitoring module, an acquisition module, and a communication module. The monitoring module is used for monitoring motion information, positioning information, and environment information; the acquisition module is used for acquiring sound information and image information; the communication module is used for uploading the motion information, environment information, sound information, and image information to the cloud end. The main control module is used for controlling the opening/closing and working states of the monitoring module, the acquisition module, and the communication module, and is provided with a two-stage wake-up mechanism comprising: a first-stage wake-up step of starting the monitoring module and judging, from the motion information and positioning information it acquires, whether a buffer zone of a preset operation area has been entered, and if so, executing the second-stage wake-up step; and a second-stage wake-up step of starting the acquisition module, periodically acquiring image information at a first preset acquisition frequency, and uploading it to the cloud end through the communication module. The cloud end is used for analyzing the current scene according to the image information and, if the current scene is a preset working scene, controlling the terminal to enter a working mode. The cloud end is further used for analyzing the operation state and risk level according to the uploaded image information, sound information, motion information, and environment information, and for triggering the terminal to perform early warning according to the operation state and the risk level.
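The cloud-side risk analysis mentioned above is detailed in claim 4 as a weighted risk value method. The source omits the exact formulas (they were images in the original), so the sketch below assumes plausible forms: single risk value = base weight times detection confidence; adjusted value = maximum single value times the job-state risk coefficient times the repeat-penalty factor; the result is then mapped to a discrete level by fixed thresholds. All weights, coefficients, and thresholds are illustrative.

```python
# Assumed violation behavior library with base weights (claim 4); the
# behaviors and values here are examples, not from the patent.
BASE_WEIGHTS = {
    "no_helmet": 0.9,
    "no_insulated_gloves": 0.7,
    "smoking": 0.5,
}
# Assumed job-state risk coefficients (tolerance to violations).
RISK_COEFFICIENTS = {
    "live_line_work": 1.5,
    "maintenance": 1.0,
}
CONF_THRESHOLD = 0.6                 # ignore low-confidence detections
LEVEL_THRESHOLDS = [0.3, 0.6, 0.9]   # boundaries between levels 1-4

def risk_level(violations, job_state, repeat_penalty=1.0):
    """violations: list of (behavior, confidence) pairs from the detector.
    Returns a discrete risk level 1 (compliant) .. 4 (serious)."""
    singles = [BASE_WEIGHTS[b] * c for b, c in violations
               if c >= CONF_THRESHOLD]        # drop uncertain detections
    if not singles:
        return 1                              # level 1: no violation
    adjusted = max(singles) * RISK_COEFFICIENTS[job_state] * repeat_penalty
    level = 1
    for threshold in LEVEL_THRESHOLDS:        # count thresholds exceeded
        if adjusted > threshold:
            level += 1
    return level

print(risk_level([("smoking", 0.4)], "maintenance"))       # 1 (ignored)
print(risk_level([("no_helmet", 0.8)], "live_line_work"))  # 4
```

Taking the maximum single risk value (rather than a sum) means one serious violation dominates the level, while the repeat-penalty factor lets accumulated violations by the same user escalate it, matching the structure claim 4 describes.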