CN-121999015-A - Police target tracking system based on deep learning


Abstract

The invention discloses a deep learning-based target tracking system for police use and relates to the field of police video surveillance. The system comprises a video sensing and front-end processing unit, edge intelligent tracking servers, and a cloud model optimization and command platform. The video sensing and front-end processing unit collects and processes visible light and thermal infrared video streams; each edge intelligent tracking server generates stable trajectories with identity marks; and the cloud model optimization and command platform aggregates data from all edge nodes based on a federated learning framework, iteratively optimizes a global model, and performs command and dispatch. The system improves the robustness and real-time performance of tracking in complex environments and the accuracy of multi-target identity maintenance, meeting the operational needs of police work.

Inventors

Zhu Yuchengxi
HU CHUN
XU YIBO
CHEN QI
BAI RIGUANG

Assignees

头流(杭州)网络科技有限公司
昆奇智能科技(浙江)有限公司

Dates

Publication Date: 20260508
Application Date: 20260408

Claims (6)

1. A police target tracking system based on deep learning, characterized by comprising a video sensing and front-end processing unit, a plurality of edge intelligent tracking servers, and a cloud model optimization and command platform; the video sensing and front-end processing unit is deployed in a key monitoring area, integrates a multispectral image sensor and a preprocessing module, acquires an original multispectral video stream composed of a visible light video stream and a thermal infrared video stream, processes the original multispectral video stream, and outputs the processed multispectral video stream; each edge intelligent tracking server is connected with the video sensing and front-end processing unit through a network, receives the processed multispectral video stream, and generates an optimal tracking instruction set; the cloud model optimization and command platform is deployed in a remote data center and communicates bidirectionally with each edge intelligent tracking server through a secure private network; the platform is responsible for maintaining a global model and, based on a federated learning framework, iteratively optimizing and issuing the global model, the global model comprising the core network parameters of a lightweight self-adaptive detection module and of a multi-target tracking module based on space-time attention; the platform aggregates tracking data, difficult samples, and performance indexes from a plurality of edge nodes, performs incremental training and optimization based on the global model, and provides global situation visualization, tracking task issuing, police dispatch, and collaborative command functions.
2. The deep learning-based police target tracking system of claim 1, wherein each edge intelligent tracking server integrates the following modules: a multispectral feature fusion module, which performs space-time alignment and registration on the processed multispectral video stream and generates a multi-modal target feature tensor to be tracked by means of a channel attention-guided cross-modal feature fusion network; a lightweight self-adaptive detection module, which takes the multi-modal target feature tensor to be tracked as input, uses a lightweight convolutional neural network generated by neural architecture search as its backbone network, and combines a self-adaptive spatial feature pyramid to output a target bounding box sequence and its initial class confidence; a multi-target tracking module based on space-time attention, which receives the target bounding box sequence and its initial class confidence, associates them with historical tracking trajectories, calculates the degree of association between the detections of the current frame and the historical trajectories through an appearance attention mechanism and a motion attention mechanism, performs optimal matching with the Hungarian algorithm to achieve identity-consistent tracking, and outputs a stable target trajectory sequence with unique identity marks; and a tracking instruction scheduling module, which takes the stable target trajectory sequence with unique identity marks as input and generates an optimal tracking instruction set according to a preset police plan rule base combined with behavior analysis of the trajectory sequence.
3. The deep learning-based police target tracking system according to claim 2, wherein the multispectral feature fusion module processes the multispectral video stream with the channel attention-guided cross-modal feature fusion network to generate the multi-modal target feature tensor to be tracked, specifically comprising the following steps: Step S1, synchronizing the hardware timestamps of the visible light video stream and the thermal infrared video stream, and applying an image registration algorithm based on SIFT feature points and an affine transformation model to achieve pixel-level spatial alignment of the dual-stream images and generate aligned multispectral video frame pairs; Step S2, taking the aligned multispectral video frame pairs as input, constructing the channel attention-guided cross-modal feature fusion network, and extracting a visible light primary feature map and a thermal infrared primary feature map through two convolutional neural networks that share the same structure but have independent parameters; Step S3, inputting the visible light primary feature map and the thermal infrared primary feature map into a feature fusion core that implements a dual-path channel attention mechanism, calculating a channel weight vector for each of the two feature maps and applying the weights to obtain weighted bimodal feature maps, and concatenating the weighted bimodal feature maps along the channel dimension to obtain a concatenated feature map; Step S4, feeding the concatenated feature map into a lightweight Transformer cross-attention layer, which lets the feature information originating from the visible light primary feature map and the thermal infrared primary feature map interact and integrate, and outputting a deeply fused feature map; and Step S5, taking the deeply fused feature map as input, performing dimension reduction and refinement through a convolution layer, and finally outputting the multi-modal target feature tensor to be tracked.
4. The deep learning-based police target tracking system according to claim 2, wherein the lightweight self-adaptive detection module processes the multi-modal target feature tensor to be tracked and outputs the target bounding box sequence and its initial class confidence, specifically comprising the following steps: Step D1, constructing a lightweight convolutional neural network backbone based on neural architecture search, and extracting multi-level general feature maps with the multi-modal target feature tensor to be tracked as input; Step D2, introducing a self-adaptive spatial feature pyramid structure that receives the multi-level general feature maps, dynamically generates spatial weight maps from the input content through a weight prediction network, applies the spatial weight maps to the fusion of feature maps from different levels, and outputs enhanced feature maps; Step D3, attaching a classification sub-network and a regression sub-network to the enhanced feature maps to obtain coarsely predicted dense bounding boxes; and Step D4, filtering the dense bounding boxes with a non-maximum suppression algorithm, and outputting the target bounding box sequence and the corresponding initial class confidence.
5. The deep learning-based police target tracking system according to claim 2, wherein the multi-target tracking module based on space-time attention associates the target bounding box sequence with the historical tracking trajectories and outputs the stable target trajectory sequence with unique identity marks, specifically comprising the following steps: Step T1, for each historical tracking trajectory, predicting its position state in the current frame with a Kalman filter based on a constant-velocity motion model; screening new, unmatched target bounding boxes from the target bounding box sequence, initializing a new trajectory for each of them, and assigning it a globally unique identity; Step T2, extracting deep appearance features for each target bounding box of the current frame, comparing them with the appearance features cached for the historical tracking trajectories, and constructing an appearance similarity matrix through cosine similarity calculation; Step T3, calculating the intersection over union (IoU) between the trajectory position predicted by the Kalman filter and the position of each currently detected target bounding box, predicting the short-term motion trend from the historical trajectory points of the target with a small recurrent neural network, and fusing this trend with the Kalman filter prediction by weighting to obtain an accurate motion prediction box, so as to construct a motion association matrix; Step T4, linearly weighting and fusing the appearance similarity matrix and the motion association matrix according to preset weights to generate a comprehensive association matrix, and solving the matrix with the Hungarian algorithm to obtain the optimal matching pairs of current detections and existing trajectories, together with lists of unmatched detections and unmatched historical trajectories; and Step T5, for each optimal matching pair, updating the Kalman filter state and the feature cache of the corresponding trajectory with the new position and appearance features of the matched detection; initializing new trajectories for the unmatched detections; marking as invalid any trajectory that remains unmatched for a preset number of consecutive frames; and outputting the stable target trajectory sequence with unique identity marks.
6. The deep learning-based police target tracking system according to claim 1, wherein the cloud model optimization and command platform performs incremental training and optimization of the core network based on the federated learning framework, specifically comprising the following steps: Step F1, the cloud model optimization and command platform is initialized and issues a global model to each edge intelligent tracking server; Step F2, each edge intelligent tracking server collects difficult samples during local operation; Step F3, each edge node performs incremental training on the global model downloaded from the cloud model optimization and command platform using its locally cached difficult sample set; Step F4, after the incremental training is finished, the edge node calculates the update to its local model parameters, encrypts it, and uploads it to the cloud model optimization and command platform; Step F5, the cloud model optimization and command platform collects the encrypted updates and the corresponding sample counts from the edge nodes, performs weighted aggregation, and calculates the aggregated global model update; and Step F6, the aggregated global model update is applied to the current global model parameters to generate a new-generation global model, which is issued to each edge intelligent tracking server after evaluation.
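
Illustrative sketch for claim 3, step S3: the dual-path channel weighting and channel-wise concatenation can be pictured roughly as follows. This is only a minimal sketch under loud assumptions. The claim does not say how the channel weight vectors are computed, so the learned attention layers are replaced here by a parameter-free stand-in (a sigmoid of each channel's global average). The names `channel_weights`, `reweight`, and `fuse` are hypothetical, and feature maps are plain nested lists of shape channels x height x width.

```python
import math


def channel_weights(fmap):
    """Return one weight in (0, 1) per channel.

    Assumption: sigmoid of the channel's global average stands in for the
    learned channel attention layers, which the claim does not specify.
    """
    weights = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        weights.append(1.0 / (1.0 + math.exp(-mean)))
    return weights


def reweight(fmap, weights):
    """Scale every value of each channel by that channel's weight."""
    return [[[w * v for v in row] for row in channel]
            for channel, w in zip(fmap, weights)]


def fuse(visible, thermal):
    """Weight each modality with its own channel weights, then concatenate
    the two weighted maps along the channel axis."""
    weighted_visible = reweight(visible, channel_weights(visible))
    weighted_thermal = reweight(thermal, channel_weights(thermal))
    return weighted_visible + weighted_thermal


# Hypothetical toy inputs: two visible light channels, one thermal channel.
visible = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
thermal = [[[1.0, 1.0], [1.0, 1.0]]]
fused = fuse(visible, thermal)
```

In a real network the weights would come from trained layers and the concatenated map would then pass to the cross-attention layer of step S4, which this sketch does not model.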
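
Illustrative sketch for claim 4, step D4: greedy non-maximum suppression over dense bounding boxes might look like the following. The box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, and the 0.5 threshold are assumptions chosen for illustration; the claim specifies none of them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of surviving boxes
```

Here the second box is suppressed because its overlap with the higher-scoring first box exceeds the threshold, while the distant third box survives.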
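
Illustrative sketch for claim 5, steps T2 to T4: the appearance and motion scores can be fused linearly and the resulting matrix solved as an assignment problem. The sketch below is a simplified illustration, not the patented implementation. The weight `w_app`, the acceptance threshold `min_score`, the padding of the matrix to a square, and all function names are assumptions; the motion score is plain IoU against an already predicted box, so the Kalman filter and recurrent network of step T3 are not modelled.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - iw * ih
    return iw * ih / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment (Hungarian algorithm with potentials) for an
    n x m matrix with n <= m. Returns the assigned column for each row."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    rows = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            rows[p[j] - 1] = j - 1
    return rows


def associate(track_feats, track_boxes, det_feats, det_boxes,
              w_app=0.6, min_score=0.3):
    """Return (matches, unmatched_tracks, unmatched_detections)."""
    n_t, n_d = len(track_boxes), len(det_boxes)
    size = max(n_t, n_d)
    score = [[0.0] * size for _ in range(size)]  # padded cells score 0
    for i in range(n_t):
        for j in range(n_d):
            score[i][j] = (w_app * cosine(track_feats[i], det_feats[j])
                           + (1 - w_app) * iou(track_boxes[i], det_boxes[j]))
    cols = hungarian([[1.0 - s for s in row] for row in score]) if size else []
    matches = [(i, j) for i, j in enumerate(cols)
               if i < n_t and j < n_d and score[i][j] >= min_score]
    hit_t, hit_d = {i for i, _ in matches}, {j for _, j in matches}
    return (matches, [i for i in range(n_t) if i not in hit_t],
            [j for j in range(n_d) if j not in hit_d])


# Hypothetical example: two tracks, three detections.
matches, lost_tracks, new_dets = associate(
    [[1, 0], [0, 1]], [(0, 0, 10, 10), (20, 20, 30, 30)],
    [[0, 1], [1, 0], [1, 1]], [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)])
```

The unmatched detections returned here correspond to the candidates for new trajectories in step T5, and the unmatched tracks to those whose miss counter would be incremented.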
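
Illustrative sketch for claim 6, steps F5 and F6: sample-weighted aggregation of edge updates, followed by application to the global parameters, could be written along the following lines. This is a FedAvg-style illustration under assumptions: parameters are flat lists of floats, the updates are taken as already decrypted, the evaluation gate before redistribution is omitted, and the names `aggregate_updates` and `apply_update` are hypothetical.

```python
def aggregate_updates(updates, sample_counts):
    """Average per-node parameter updates, weighted by each node's sample count."""
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("at least one training sample is required")
    aggregated = [0.0] * len(updates[0])
    for delta, count in zip(updates, sample_counts):
        for k, value in enumerate(delta):
            aggregated[k] += (count / total) * value
    return aggregated


def apply_update(global_params, global_delta):
    """Add the aggregated update to the current global parameters."""
    return [w + d for w, d in zip(global_params, global_delta)]


# Hypothetical round with two edge nodes holding 100 and 300 difficult samples.
global_params = [1.0, 2.0]
updates = [[0.5, -1.0], [1.5, 1.0]]
sample_counts = [100, 300]
delta = aggregate_updates(updates, sample_counts)
new_params = apply_update(global_params, delta)
```

Weighting by sample count means a node that trained on more difficult samples has a proportionally larger influence on the new global model.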

Description

Police target tracking system based on deep learning

Technical Field

The invention relates to the field of police video surveillance, and in particular to a police target tracking system based on deep learning.

Background

Existing police target tracking systems mostly rely on a fixed camera network and a centralized cloud server processing architecture. When a target moves quickly or enters a monitoring blind spot, communication delay and computing power bottlenecks easily cause the system to lose track of the target; the response lags, and the requirements of real-time police pursuit and deployment-and-control operations cannot be met. Most tracking algorithms use a single visible light video stream, so feature extraction capability drops sharply and robustness is insufficient in complex conditions such as night, fog, strong backlight, or partial occlusion of the target. Meanwhile, existing systems mostly use pre-trained general-purpose object detection models and cannot adaptively optimize for the specific characteristics of police scenes, such as varied clothing, postures, and carried objects; model generalization is therefore limited, and false alarm and missed detection rates are high. In addition, in dense crowds or scenes where multiple targets cross paths, existing systems lack an efficient identity maintenance and trajectory management mechanism, so identity switches (ID Switch) occur easily, targets are confused with one another, and tracking continuity is poor. It is therefore necessary to design a police target tracking system that integrates multispectral sensing, edge intelligent computing, and cloud model optimization, so as to achieve highly robust, real-time, high-precision continuous target tracking in complex scenes and improve the efficiency of target search and control in police operations.
Disclosure of Invention

To address the poor environmental adaptability, insufficient real-time tracking performance, weak model generalization, and disordered multi-target identity management of existing systems, the invention provides a deep learning-based police target tracking system that builds a cooperative framework of 'front-end multispectral perception-edge intelligent tracking-cloud optimization iteration', so as to achieve comprehensive extraction of target features, real-time generation of tracking decisions, and continuous evolution of model capability.

The invention provides a police target tracking system based on deep learning, which comprises a video sensing and front-end processing unit, a plurality of edge intelligent tracking servers, and a cloud model optimization and command platform; the video sensing and front-end processing unit is deployed in a key monitoring area, integrates a multispectral image sensor and a preprocessing module, acquires an original multispectral video stream composed of a visible light video stream and a thermal infrared video stream, processes the original multispectral video stream, and outputs the processed multispectral video stream; each edge intelligent tracking server is connected with the video sensing and front-end processing unit through a network, receives the processed multispectral video stream, and generates an optimal tracking instruction set; the multispectral feature fusion module performs space-time alignment and registration on the processed multispectral video stream and generates a highly discriminative multi-modal target feature tensor to be tracked by means of a channel attention-guided cross-modal feature fusion network; the lightweight self-adaptive detection module takes the multi-modal target feature tensor to be tracked as input, uses a lightweight convolutional neural network generated by neural architecture search as its backbone network, combines a self-adaptive spatial feature pyramid to detect targets of different scales accurately, and outputs a target bounding box sequence and its initial class confidence; the multi-target tracking module based on space-time attention receives the target bounding box sequence and its initial class confidence, associates them with historical tracking trajectories, calculates the appearance similarity between each current detection and the historical tracking trajectories through an appearance attention mechanism, predicts the positional correlation between the trajectories and the current detections through a Kalman filter-based motion attention mechanism, fuses the two attention scores, performs optimal matching with the Hungarian algorithm to complete the association and identity maintenance of new and existing targets, and outputs a stable target trajectory sequence with unique identity marks; the tracking instruction scheduling module takes the stable target trajectory sequence with unique identity marks as input, generates control instructions aiming at differen