CN-121982635-A - Event stream monitoring and early warning method based on multidimensional feature fusion video analysis
Abstract
The application relates to the technical field of computer vision and discloses an event stream monitoring and early warning method based on multidimensional feature fusion video analysis. The method comprises: acquiring original video streams; receiving the acquired multipath high-definition original video streams and processing them in parallel through a first path and a second path; and, based on a comprehensive risk index given by a quantitative risk model fusing the multidimensional features, outputting interpretable crowd fluid-state semantics and risk grades to a user through a four-level early warning decision mechanism. The application comprehensively employs core visual algorithms such as dense optical flow computation, crowd density estimation with a deep convolutional neural network, and crowd behavior modeling based on statistical fluid mechanics, so as to realize real-time quantitative perception of, and dynamic risk early warning for, the microscopic states of dense crowds in scenes such as large stadiums and event venues.
Inventors
- LI WEI
Assignees
- Qingdao University of Technology (青岛理工大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-05
Claims (8)
- 1. An event stream monitoring and early warning method based on multidimensional feature fusion video analysis, characterized by comprising the following steps: acquiring an original video stream, wherein the original video stream is high-definition video captured from a key area of a stadium; receiving the acquired multipath high-definition original video streams and processing them in parallel through a first path and a second path respectively; calculating, through the first path, a dense optical flow field using a CUDA-accelerated Farneback algorithm and performing physical index calculation on the optical flow field computed for each frame to obtain a crowd kinetic energy field and a crowd entropy field; receiving low-frame-rate key frame images through the second path and adopting a high-precision static density estimation method based on an optimized CNN, wherein an improved CSRNet network with VGG-16 as the backbone extracts a high-level feature map, feature fusion is carried out through multi-scale dilated convolution branches to obtain a fused feature map, and a high-precision density map is obtained through density estimation, the floating-point value of each pixel in the high-precision density map representing the crowd density contribution at that position; constructing a quantitative risk model fusing the multidimensional features based on the obtained crowd kinetic energy field, crowd entropy field and high-precision density map to realize risk decision; and outputting interpretable crowd fluid-state semantics and risk grades to a user through a four-level early warning decision mechanism based on the comprehensive risk index given by the quantitative risk model.
- 2. The method of claim 1, wherein acquiring the original video stream comprises: collecting crowd flow video through high-definition network cameras deployed at stadium entrances and exits, stand passageways and squares, the high-definition network cameras serving as intelligent sensing nodes that crop primary regions of interest and output optimized video streams.
- 3. The method of claim 1, wherein the CUDA-accelerated Farneback algorithm comprises: performing dedicated tuning of the Farneback algorithm parameters, namely configuring a multi-level image pyramid with scaling factor pyr_scale = 0.5 and number of levels levels = 3 so as to capture motion at different speeds, and setting the smoothing window size winsize = 15 for the polynomial expansion so as to balance noise smoothing against detail preservation.
- 4. A method according to any one of claims 1-3, wherein calculating a dense optical flow field using a CUDA-accelerated Farneback algorithm and performing a physical index calculation on the optical flow field calculated for each frame to obtain a crowd kinetic energy field and a crowd entropy field comprises: first stage, optimized dense optical flow field construction: inputting two consecutive gray-scale frames I_t and I_(t+1); eliminating camera shake by global motion compensation (GMC); and obtaining a continuous, dense velocity vector field V(x, y) = (u(x, y), v(x, y)) through the multi-level image pyramid and the CUDA-accelerated Farneback algorithm, wherein u(x, y) and v(x, y) respectively represent the horizontal and vertical motion speed of the pixel at position (x, y); second stage, real-time calculation of physical indexes: calculating the crowd kinetic energy field E_k: E_k(x, y) = (1/2)·ρ(x, y)·(u(x, y)² + v(x, y)²), wherein ρ(x, y) is the density map; calculating the crowd entropy field: in a local image region, the direction angles of the optical flow vectors are counted and quantized into 8 direction bins to form a direction histogram with probability distribution P = (p_1, …, p_8), wherein P represents the probability distribution of the optical flow direction histogram and p_i represents the probability that the optical flow direction falls in the i-th direction bin; calculating the Shannon entropy of the probability distribution P: H(P) = -Σ_(i=1..8) p_i·log₂ p_i; and normalizing H(P) to obtain S = H(P)/log₂ 8, wherein S represents the normalized Shannon entropy, i.e. the entropy value of the local image region, with value range [0, 1].
- 5. The method of claim 4, wherein extracting the high-level feature map through the improved CSRNet network with VGG-16 as the backbone network, performing feature fusion through the multi-scale dilated convolution branches to obtain a fused feature map, and obtaining a high-precision density map through density estimation comprises: taking 1 fps key frame images as input and performing image preprocessing, comprising size normalization and perspective correction, to obtain processed images, which are input into the improved CSRNet network; in the improved CSRNet network, VGG-16 serves as the backbone network to perform feature extraction on the input image, obtaining a high-level feature map that is processed in parallel by several multi-scale dilated convolution branches; the dilation rates of the first, second, third and fourth dilated convolutions are 2, 4, 8 and 16 respectively; the features output by the second and third dilated convolutions are fused, and the features output by the first and fourth dilated convolutions are fused, to obtain a fused feature map; a 1×1 convolution and up-sampling operations are then performed, and the density map finally obtained through density estimation serves as the output.
- 6. The method of claim 5, wherein constructing a quantitative risk model fusing multidimensional features based on the obtained crowd kinetic energy field, crowd entropy field and high-precision density map comprises: constructing a risk situation function R: R = w1·ρ_n + w2·E_n + w3·f(S); wherein ρ_n is the region-normalized density, E_n is the normalized kinetic energy, f(S) is a mapping function of the entropy value S, and w1, w2, w3 are weight coefficients adjusted according to the scene.
- 7. The method of claim 6, wherein the four-level early warning decision mechanism comprises: Level IV, attention level: density rises but order remains good; managers are reminded to pay attention; Level III, warning level: high density and slow movement; congestion is pre-warned and activation of a crowd-dispersal plan is suggested; Level II, severe level: kinetic energy rises sharply and directions are highly consistent; a possible crowd surge or escape is pre-warned, requiring immediate verification and preparation for intervention; and Level I, critical level: extremely high density, very large kinetic energy and extreme disorder; judged to be a trampling critical point, automatically triggering the highest-level linkage response, including video locking, forced broadcast cut-in and notification of security personnel.
- 8. An event stream monitoring and early warning system based on multidimensional feature fusion video analysis, characterized by comprising: a front-end perception layer provided with an acquisition module for acquiring an original video stream, wherein the original video stream is high-definition video captured from a key area of a stadium; an edge computing layer for receiving the acquired multipath high-definition original video streams and processing them in parallel through a first path and a second path respectively, the edge computing layer being provided with a fluid dynamics modeling module based on improved Farneback dense optical flow for calculating, through the first path, a dense optical flow field using a CUDA-accelerated Farneback algorithm and performing physical index calculation on the optical flow field computed for each frame to obtain a crowd kinetic energy field and a crowd entropy field; a cloud end provided with a high-precision static density estimation module based on an optimized CNN for receiving low-frame-rate key frame images through the second path and adopting a high-precision static density estimation method based on the optimized CNN, wherein an improved CSRNet network with VGG-16 as the backbone extracts a high-level feature map, feature fusion is carried out through multi-scale dilated convolution branches to obtain a fused feature map, and a high-precision density map is obtained through density estimation, the floating-point value of each pixel in the high-precision density map representing the crowd density contribution at that position; the cloud end being further provided with a multidimensional risk fusion and early warning decision module based on phase-plane analysis for constructing a quantitative risk model fusing the multidimensional features based on the obtained crowd kinetic energy field, crowd entropy field and high-precision density map to realize risk decision; and a display and linkage layer for outputting interpretable crowd fluid-state semantics and risk grades to a user through a four-level early warning decision mechanism based on the comprehensive risk index given by the quantitative risk model.
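The per-frame physical indexes of claims 3-4 can be sketched as follows. This is a minimal numpy sketch under stated assumptions: in practice the flow field (u, v) would come from OpenCV's cv2.calcOpticalFlowFarneback with the claimed parameters (pyr_scale=0.5, levels=3, winsize=15); the 16×16 local window and the stand-in density map here are illustrative choices, while the 8 direction bins and log₂-8 normalization follow the claim.

```python
import numpy as np

def kinetic_energy_field(u, v, density):
    """Crowd kinetic energy field E_k = 1/2 * rho * (u^2 + v^2) per claim 4."""
    return 0.5 * density * (u ** 2 + v ** 2)

def entropy_field(u, v, win=16, bins=8):
    """Normalized Shannon entropy of the optical-flow direction histogram,
    computed over non-overlapping win x win local regions (claim 4)."""
    angles = np.arctan2(v, u)                    # direction of each flow vector
    edges = np.linspace(-np.pi, np.pi, bins + 1) # 8 direction bins
    h, w = angles.shape
    out = np.zeros((h // win, w // win))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = angles[i * win:(i + 1) * win, j * win:(j + 1) * win]
            hist, _ = np.histogram(block, bins=edges)
            p = hist / hist.sum()
            p = p[p > 0]                         # avoid log2(0)
            H = -np.sum(p * np.log2(p))          # Shannon entropy H(P)
            out[i, j] = H / np.log2(bins)        # normalize to [0, 1]
    return out

# Uniform rightward flow: perfectly ordered motion, so entropy should be 0.
u = np.ones((64, 64))
v = np.zeros((64, 64))
rho = np.full((64, 64), 2.0)                     # stand-in density map
Ek = kinetic_energy_field(u, v, rho)
S = entropy_field(u, v)
print(Ek[0, 0], S.max())                         # 1.0 0.0
```

Disordered motion drives S toward 1, which is what lets the entropy field separate "ordered flow" from "chaotic turbulence" in the claims' terminology.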
Description
Event stream monitoring and early warning method based on multidimensional feature fusion video analysis

Technical Field

The application belongs to the technical field of computer vision and intelligent video analysis, and particularly relates to an event stream monitoring and early warning method based on multidimensional feature fusion video analysis.

Background

Currently, security monitoring for dense scenes such as large-scale sports events mainly relies on traditional video surveillance and back-end analysis based on computer vision. These methods have obvious defects in three respects. First, data hysteresis: relying on gate counting or polling-type video analysis, the instantaneous density of crowds retained in key areas cannot be sensed in real time. Second, loss of state perception: most methods can only count people or recognize preset specific abnormal behaviors (such as running or falling), and lack the capability to quantitatively describe and understand the overall physical motion state of the crowd (such as ordered flow, chaotic turbulence or static congestion). Third, a passive response mode: because perception depth and real-time performance are insufficient, it is difficult for the system to issue forward-looking early warnings based on a physical model before safety accidents (such as trampling) occur. It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the application and thus may include information that does not constitute prior art already known to those of ordinary skill in the art.
Disclosure of Invention

In order to solve or at least alleviate one or more of the above problems, a method for monitoring and early warning of event streams based on multidimensional feature fusion video analysis is provided. By fusing static density estimation from a deep convolutional neural network (CNN) with dynamic hydrodynamic modeling based on dense optical flow, the method realizes, on an end-edge-cloud cooperative architecture, synchronous, real-time and quantitative perception of microscopic regional crowd density, macroscopic crowd kinetic energy and degree of order (entropy) within a venue, and constructs an interpretable risk assessment model therefrom, so as to achieve accurate, second-level automatic identification and early warning of high-risk states such as congestion, turbulence, counter-flow impact and escape, thereby establishing an active intelligent public security control system capable of pre-event early warning, in-process investigation and post-event traceability.
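The interpretable risk assessment and graded early warning described above can be sketched as follows. The weighted-fusion form of the risk index and the four-level decision follow claims 6-7, but the concrete weights, thresholds and the entropy mapping f(S) below are illustrative assumptions; the patent leaves them scene-adjustable.

```python
# Hypothetical weights and thresholds: claim 6 prescribes a weighted fusion of
# normalized density, normalized kinetic energy and an entropy mapping, and
# claim 7 a four-level decision; the concrete values here are illustrative.
WEIGHTS = (0.4, 0.35, 0.25)      # w1 (density), w2 (kinetic energy), w3 (entropy)
THRESHOLDS = (0.3, 0.55, 0.75)   # boundaries between levels IV / III / II / I

def risk_index(density_n, energy_n, entropy_n, weights=WEIGHTS):
    """Comprehensive risk index R = w1*rho_n + w2*E_n + w3*f(S), inputs in [0, 1]."""
    w1, w2, w3 = weights
    f_s = entropy_n ** 2         # assumed mapping f(S): emphasize high disorder
    return w1 * density_n + w2 * energy_n + w3 * f_s

def warning_level(r, thresholds=THRESHOLDS):
    """Map the comprehensive risk index to the four-level mechanism of claim 7."""
    t1, t2, t3 = thresholds
    if r < t1:
        return "IV"   # attention: density rising, order still good
    if r < t2:
        return "III"  # warning: high density, slow movement
    if r < t3:
        return "II"   # severe: kinetic energy surging, directions aligned
    return "I"        # critical: trampling critical point, full linkage response

print(warning_level(risk_index(0.2, 0.1, 0.1)))   # calm scene -> "IV"
print(warning_level(risk_index(0.9, 0.9, 0.95)))  # extreme scene -> "I"
```

Because R is monotone in each normalized input, raising any one of density, kinetic energy or disorder can only hold or escalate the warning level, which matches the semantics of the four grades.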
To achieve the above object, according to a first aspect of the present application, there is provided an event stream monitoring and early warning method based on multidimensional feature fusion video analysis, including: acquiring an original video stream, wherein the original video stream is high-definition video captured from a key area of a stadium; receiving the acquired multipath high-definition original video streams and processing them in parallel through a first path and a second path respectively; calculating, through the first path, a dense optical flow field using a CUDA-accelerated Farneback algorithm and performing physical index calculation on the optical flow field computed for each frame to obtain a crowd kinetic energy field and a crowd entropy field; receiving low-frame-rate key frame images through the second path and adopting a high-precision static density estimation method based on an optimized CNN, wherein an improved CSRNet network with VGG-16 as the backbone extracts a high-level feature map, feature fusion is carried out through multi-scale dilated convolution branches to obtain a fused feature map, and a high-precision density map is obtained through density estimation, the floating-point value of each pixel in the high-precision density map representing the crowd density contribution at that position; constructing a quantitative risk model fusing the multidimensional features based on the obtained crowd kinetic energy field, crowd entropy field and high-precision density map to realize risk decision; and outputting interpretable crowd fluid-state semantics and risk grades to a user through a four-level early warning decision mechanism based on the comprehensive risk index given by the quantitative risk model.
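The multi-scale dilated-convolution fusion in the density path can be sketched as follows. This numpy sketch keeps the claimed dilation rates (2, 4, 8, 16) and pairwise fusion, but everything else is a simplifying assumption: the patent's network uses a VGG-16 backbone with learned multi-channel kernels, whereas here a single-channel averaging kernel, summation fusion and a scalar stand-in for the 1×1 convolution are used purely for illustration.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded 2-D cross-correlation (CNN convention) with a
    zero-inserted (dilated) kernel."""
    kh, kw = kernel.shape
    dk = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1))
    dk[::rate, ::rate] = kernel                  # insert (rate-1) zeros between taps
    ph, pw = dk.shape[0] // 2, dk.shape[1] // 2
    p = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(p[i:i + dk.shape[0], j:j + dk.shape[1]] * dk)
    return out

# Four parallel dilated branches (rates 2, 4, 8, 16) over a stand-in feature
# map, fused pairwise (2nd+3rd, 1st+4th) and summed, then a scalar weight as a
# stand-in for the 1x1 convolution that produces the density map.
feat = np.random.default_rng(1).random((32, 32))
k = np.full((3, 3), 1.0 / 9.0)                   # averaging kernel as a stand-in
branches = [dilated_conv2d(feat, k, r) for r in (2, 4, 8, 16)]
fused = (branches[1] + branches[2]) + (branches[0] + branches[3])
density_map = 0.25 * fused                       # scalar stand-in for 1x1 conv
print(density_map.shape)                         # (32, 32)
```

The zero-insertion step is the whole point of dilation: a 3×3 kernel at rate 16 covers a 33×33 receptive field at the cost of nine multiplies per pixel, which is why the branches can pick up crowd structure at widely different scales.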
To achieve the above object, according to a second aspect of the present application, there is provided an event stream monitoring and early warning system based on multidimensional feature fusion video analysis, the event stream monitoring and early warning system comprising: the front-end perception layer is provided with an acquisition module for acquiring an original video stream, wherein the original video stream is a high-definition video of an acquired key area of a stadium; the edge calculation layer is used for receiving the acquired multipath high-definition original video stream, and processing the original video stream in parallel through a firs