CN-122027773-A - Real-time campus bullying monitoring and alarm method based on audio-video dual modality


Abstract

A real-time campus bullying monitoring and alarm method based on audio-video dual modality, belonging to the technical field of intelligent security and artificial-intelligence computer vision. The method comprises: collecting a video image sequence and a real-time audio stream of a monitored area; locating human body frames in the video frames with a human body feature extraction algorithm, and raising a video anomaly alarm if bullying behavior is judged to be present based on the human body frames; at the same time, matching the collected audio stream against a sensitive-word lexicon containing different threat levels, and triggering the corresponding audio alarm according to the level of the matched word; recording the timestamps at which the video anomaly alarm and the audio alarm are triggered; and, if the difference between the two occurrence times falls within a preset association time window, triggering an audio-video dual severe alarm and locking the interface in a severe alarm state. The method offers multi-modal fusion, fewer false alarms, high-sensitivity action detection, rapid response, graded speech recognition and coverage of blind spots.
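
The association step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the class name `DualModeFusion`, the 5-second window and the `release` method are assumptions (the source gives no window value; manual release is taken from claim 6).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DualModeFusion:
    """Correlates video and audio alarm timestamps (in seconds)."""
    window_s: float = 5.0              # assumed association window; not given in the source
    last_video: Optional[float] = None
    last_audio: Optional[float] = None
    locked: bool = False               # True once a severe alarm has been raised

    def on_alarm(self, kind: str, timestamp: float) -> str:
        """Record one alarm and return the resulting alarm level."""
        if kind not in ("video", "audio"):
            raise ValueError(f"unknown alarm kind: {kind}")
        if self.locked:
            return "severe"            # stay locked until manually released
        if kind == "video":
            self.last_video = timestamp
        else:
            self.last_audio = timestamp
        both_seen = self.last_video is not None and self.last_audio is not None
        if both_seen and abs(self.last_video - self.last_audio) <= self.window_s:
            self.locked = True
            return "severe"
        return kind

    def release(self) -> None:
        """Manual review confirmed: clear state and resume normal detection."""
        self.locked = False
        self.last_video = None
        self.last_audio = None
```

For example, a video alarm at t = 100 s followed by an audio alarm at t = 103 s would escalate to the severe state under the assumed 5-second window, and every later alarm would keep reporting "severe" until `release()` is called.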

Inventors

DU GAO
Su Haocheng
CUI CHEN
Chen Diefan
Huang Jianuan

Assignees

Zhejiang Police College (浙江警察学院)

Dates

Publication Date: 20260512
Application Date: 20260320

Claims (6)

1. A real-time campus bullying monitoring and alarm method based on audio-video dual modality, characterized by comprising the following steps: step one, collecting a video image sequence and a real-time audio stream of a monitored area; step two, locating human body frames in the video frames with a human body feature extraction algorithm, and raising a video anomaly alarm if bullying behavior is judged to be present based on the human body frames; and step three, recording the timestamps at which the video anomaly alarm and the audio alarm are triggered, and, if the difference between the occurrence times of the video anomaly alarm and the audio alarm falls within a preset association time window, triggering an audio-video dual severe alarm and locking the interface in a severe alarm state.
2. The real-time campus bullying monitoring and alarm method based on audio-video dual modality according to claim 1, characterized in that the human body feature extraction algorithm in step two scales the video frame and then detects human body frames with the HOG and SVM algorithms.
3. The real-time campus bullying monitoring and alarm method based on audio-video dual modality according to claim 1, characterized in that the bullying behavior in step two is an abnormal behavior of dragging, punching or kicking, wherein dragging is judged by calculation based on the spatial distance between human body frames, and punching or kicking is judged by calculation based on the moving speed of a specific part of a human body frame across consecutive frames.
4. The real-time campus bullying monitoring and alarm method based on audio-video dual modality according to claim 2, characterized in that, in step two, when the number of human body frames is greater than or equal to 2, a drag alarm is triggered if the minimum distance between the geometric centers of any two human body frames is calculated to be smaller than a set threshold and this lasts longer than a drag judgment threshold; when the human body frames of consecutive video frames are compared, a punch alarm is triggered if the moving speed of the geometric center of the upper half of a human body frame is calculated to be greater than a punch speed threshold and this lasts longer than a set value, and a kick alarm is triggered if the moving speed of the geometric center of the lower half of a human body frame is calculated to be greater than a kick speed threshold and this lasts longer than a set value.
5. The real-time campus bullying monitoring and alarm method based on audio-video dual modality according to claim 1, characterized in that, in step two, speech recognition is performed on the audio with a Vosk model to convert it into text, spaces are removed, and the text is then matched against a sensitive-word lexicon; the sensitive-word lexicon is divided into a level-one lethal threat class, a level-two abuse class and a level-three distress class; if a level-one or level-three word is matched, the audio alarm is triggered immediately; if a level-two word is matched, the audio alarm is triggered only when the cumulative number of matches within a set period reaches a preset value; and a cooldown period that prevents repeated alarms is applied after the audio alarm is triggered.
6. The real-time campus bullying monitoring and alarm method based on audio-video dual modality according to claim 1, characterized in that, in step three, after the audio-video dual severe alarm is triggered, the system enters a state-lock mode, the interface performs the alarm interaction, and normal detection does not resume until a release instruction confirmed by manual review is received.
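
The geometric rules in claims 3 and 4 can be illustrated with a small sketch. It assumes each human body frame is an axis-aligned bounding box given as (x, y, width, height) in pixels with y growing downward; the helper names (`half_center`, `min_center_distance`, `DurationGate`) are hypothetical, and matching the same person across consecutive video frames is left out.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (x, y, width, height), assumed layout
Point = Tuple[float, float]


def center(box: Box) -> Point:
    """Geometric center of a whole body frame."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def half_center(box: Box, upper: bool) -> Point:
    """Geometric center of the upper (punch) or lower (kick) half of a body frame."""
    x, y, w, h = box
    top = y if upper else y + h / 2.0
    return (x + w / 2.0, top + h / 4.0)


def min_center_distance(boxes: List[Box]) -> Optional[float]:
    """Smallest center-to-center distance; None when fewer than 2 frames exist."""
    if len(boxes) < 2:
        return None
    return min(math.dist(center(a), center(b)) for a, b in combinations(boxes, 2))


def speed(p0: Point, p1: Point, dt: float) -> float:
    """Moving speed of a point between two video frames dt seconds apart."""
    return math.dist(p0, p1) / dt


class DurationGate:
    """Fires once a condition has held continuously for longer than hold_s seconds."""

    def __init__(self, hold_s: float):
        self.hold_s = hold_s
        self.since: Optional[float] = None   # time at which the condition became true

    def update(self, condition: bool, now: float) -> bool:
        if not condition:
            self.since = None                # condition broken: restart timing
            return False
        if self.since is None:
            self.since = now
        return (now - self.since) > self.hold_s
```

Under these assumptions, a drag alarm would correspond to feeding `min_center_distance(boxes) < threshold` into one `DurationGate`, while punch and kick alarms would feed `speed(...)` of the upper-half and lower-half centers into their own gates. All threshold values are left to the caller because the claims do not state them.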

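The graded word matching in claim 5 can be sketched in a similar way. This is a rough illustration under stated assumptions: the transcript is taken to arrive as plain text from a speech recognizer (the Vosk call itself is not shown), and the class name `TieredWordAlarm`, the sample lexicon and all numeric defaults are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional


class TieredWordAlarm:
    """Level 1 and 3 words alarm at once; level 2 words alarm after repeated hits."""

    def __init__(self, lexicon: Dict[str, int], level2_count: int = 3,
                 level2_period_s: float = 30.0, cooldown_s: float = 10.0):
        self.lexicon = lexicon                    # word -> level (1, 2 or 3)
        self.level2_count = level2_count          # assumed preset value
        self.level2_period_s = level2_period_s    # assumed counting period
        self.cooldown_s = cooldown_s              # assumed cooldown after an alarm
        self.level2_hits: Deque[float] = deque()  # timestamps of level-2 matches
        self.last_alarm: Optional[float] = None

    def feed(self, text: str, now: float) -> bool:
        """Return True if this transcript triggers an audio alarm."""
        compact = "".join(text.split())           # remove spaces before matching
        levels = [lvl for word, lvl in self.lexicon.items() if word in compact]
        self.level2_hits.extend(now for lvl in levels if lvl == 2)
        # Drop level-2 hits that fall outside the counting period.
        while self.level2_hits and now - self.level2_hits[0] > self.level2_period_s:
            self.level2_hits.popleft()
        immediate = any(lvl in (1, 3) for lvl in levels)
        accumulated = len(self.level2_hits) >= self.level2_count
        if not (immediate or accumulated):
            return False
        if self.last_alarm is not None and now - self.last_alarm < self.cooldown_s:
            return False                          # cooling down: suppress repeat alarm
        self.last_alarm = now
        self.level2_hits.clear()
        return True
```

One simplification to note: a lexicon word is counted at most once per transcript, even if it occurs several times in it. Whether the claimed method counts every occurrence is not stated in the source.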
Description

Real-time campus bullying monitoring and alarm method based on audio-video dual modality

Technical Field

The invention belongs to the technical field of intelligent security and artificial-intelligence computer vision, and particularly relates to a real-time campus bullying monitoring and alarm method based on audio-video dual modality.

Background

Campus bullying has long been a matter of wide public concern. It takes many forms, including physical conflict, verbal abuse, threats and intimidation, and social exclusion, and it causes serious harm to the physical and mental health of victims. In recent years, with the frequent occurrence of campus security incidents, establishing an effective campus security monitoring system has become an urgent need for education authorities and school administrators. Most traditional campus security monitoring systems rely on video recording alone and depend heavily on staff watching the screens in real time. This monitoring mode has the following obvious drawbacks in practice:

First, conventional monitoring systems often lack audio acquisition and analysis capabilities and cannot identify covert bullying behaviors such as verbal abuse, intimidation or calls for help. Because bullying is often sudden and concealed, it is difficult to capture complete information about a bullying event from video alone; missed detections are especially likely when the bullying occurs in a surveillance blind spot or the physical conflict is not obvious.

Second, when traditional behavior detection algorithms face fighting actions of small amplitude or short duration (such as punching, kicking and charging), their sensitivity is often insufficient, leading to serious missed detections and delays.
Existing video analysis technology has limited accuracy in recognizing fast, complex human actions and struggles to detect and warn of a bullying event at the earliest moment.

Third, a single visual or audio alarm is prone to false alarms caused by environmental interference and lacks cross-validation from multi-dimensional information. For example, environmental noise may lead to audio misjudgment, and changes in light and shadow may affect the accuracy of video analysis. A single-modality alarm mechanism cannot effectively distinguish bullying from normal student horseplay, physical contact during sports, or sudden loud noises, so alarm accuracy falls short of practical requirements.

In view of the above, there is a need for a campus bullying monitoring scheme that fuses audio and video multi-modal information and provides high-sensitivity behavior recognition, so as to remedy the shortcomings of the prior art and genuinely improve campus security. The inventors have made useful designs to this end, and the technical solution described below was produced against this background.

Disclosure of Invention

The invention aims to provide a real-time campus bullying monitoring and alarm method based on audio-video dual modality, so as to solve the problems of monitoring lag, low single-modality recognition rate, lack of detection of covert verbal violence, and high false alarm rate in the prior art.
This aim is achieved by a real-time campus bullying monitoring and alarm method based on audio-video dual modality, which comprises the following steps: step one, collecting a video image sequence and a real-time audio stream of a monitored area; step two, locating human body frames in the video frames with a human body feature extraction algorithm, and raising a video anomaly alarm if bullying behavior is judged to be present based on the human body frames; and step three, recording the timestamps at which the video anomaly alarm and the audio alarm are triggered, and, if the difference between the occurrence times of the video anomaly alarm and the audio alarm falls within a preset association time window, triggering an audio-video dual severe alarm and locking the interface in a severe alarm state.

In a specific embodiment of the present invention, the human body feature extraction algorithm in step two scales the video frame and then detects human body frames with the HOG and SVM algorithms.

In another specific embodiment of the present invention, the bullying behavior in step two is an abnormal behavior of dragging, punching or kicking, wherein dragging is judged by calculation based on the spatial distance between human body frames, and punching or kicking is judged by calculation based on the moving speed of a specific part of a human body frame across consecutive frames.

In a further specific embodiment of the present invention, in the second step, when the number of human frame