
CN-122023818-A - Multi-element example decoupling crowd counting method and system based on density grade segmentation


Abstract

The application discloses a multi-element example decoupling crowd counting method and system based on density level segmentation, in the technical field of crowd counting. The method comprises: obtaining a target image; inputting the target image into a segmentation mask generator, trained by means of a Gaussian distribution mask generator, to obtain a density level mask; inputting the target image into a multi-element example counting module to obtain a box position information map, a point position information map and a density position information map; inputting the density level mask and the three position information maps into a result fusion module to obtain a composite result map; and summing over the composite result map to obtain the total number of people in the target image. The method improves crowd counting accuracy.
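The abstract says the segmentation mask generator is trained by means of a Gaussian distribution mask generator, but not how its target masks are built. One plausible reading, sketched here purely as an assumption, is to place a normalized Gaussian at each annotated head point and quantize the resulting density into levels. The function names, `sigma` and the `thresholds` values are hypothetical and do not come from the application.

```python
import math


def gaussian_density(points, height, width, sigma=2.0):
    """Hypothetical: sum of per-head Gaussians, each normalized to sum to 1.

    points are (row, col) head annotations; the result sums to len(points).
    """
    density = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        weights = [[math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2))
                    for c in range(width)] for r in range(height)]
        total = sum(map(sum, weights))  # renormalize so truncation at the border loses no mass
        for r in range(height):
            for c in range(width):
                density[r][c] += weights[r][c] / total
    return density


def density_level_mask(density, thresholds):
    """Quantize each pixel: level = number of (ascending) thresholds it reaches.

    With thresholds [t1, t2] this yields 0 (sparse), 1 (medium) or 2 (dense).
    """
    return [[sum(v >= t for t in thresholds) for v in row] for row in density]


density = gaussian_density([(5, 5)], height=11, width=11, sigma=2.0)
mask = density_level_mask(density, thresholds=[0.001, 0.02])
print(round(sum(map(sum, density)), 6), mask[5][5], mask[0][0])  # 1.0 2 0
```

Quantizing per pixel keeps the sketch short; a real generator might instead threshold local head counts over a window.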

Inventors

  • HU LONG
  • WANG RUI
  • GUO YUQI
  • HAO YIXUE
  • LI XIANZHI

Assignees

  • Huazhong University of Science and Technology (华中科技大学)

Dates

Publication Date
2026-05-12
Application Date
2025-12-19

Claims (10)

  1. A multi-element example decoupling crowd counting method based on density level segmentation, the method comprising: acquiring a target image, and inputting the target image into a segmentation mask generator trained by means of a Gaussian distribution mask generator to obtain a density level mask, wherein the segmentation mask generator comprises an image encoder and a density level segmentation module; inputting the target image into a multi-element example counting module to obtain a box position information map, a point position information map and a density position information map; inputting the density level mask, the box position information map, the point position information map and the density position information map into a result fusion module to obtain a composite result map; and summing over the composite result map to obtain the total number of people in the target image.
  2. The multi-element example decoupling crowd counting method based on density level segmentation of claim 1, wherein inputting the target image into the segmentation mask generator trained by means of the Gaussian distribution mask generator to obtain the density level mask comprises: inputting the target image into an image encoder for image encoding and position encoding to obtain encoded features, wherein the image encoder comprises a block (patch) embedding Transformer encoding module and a position encoding module; and inputting the encoded features into a density level segmentation module, which performs multi-stage feature enhancement on them to obtain the density level mask.
  3. The multi-element example decoupling crowd counting method based on density level segmentation of claim 2, wherein inputting the target image into the image encoder for image encoding and position encoding to obtain the encoded features comprises: inputting the target image into the block embedding Transformer encoding module, which divides the image into a plurality of image blocks by block embedding; flattening the image blocks into a plurality of image block vectors; applying position encoding to the image block vectors to obtain an image block sequence; and inputting the image block sequence into a plurality of Transformer layers, which extract global and local features through multi-head self-attention (MHSA) and a feedforward neural network to obtain the encoded features.
  4. The method of claim 2, wherein inputting the encoded features into the density level segmentation module for multi-stage feature enhancement to obtain the density level mask comprises: inputting the encoded features into a transposed convolution module for upsampling, restoring their spatial dimensions to obtain preliminary upsampled features; inputting the preliminary upsampled features into an activation layer to further extract and refine local context features, obtaining locally enhanced features; inputting the locally enhanced features into a multi-head self-attention module, which models global dependencies across their two-dimensional spatial dimensions and captures context information, obtaining globally aware features; and inputting the globally aware features into a two-dimensional convolution feature mapping module, which maps the channel dimensions and predicts the density level, obtaining the density level mask.
  5. The multi-element example decoupling crowd counting method based on density level segmentation of claim 1, wherein inputting the target image into the multi-element example counting module to obtain the box position information map, the point position information map and the density position information map comprises: inputting the target image into a box detection module, extracting features through a first feature extractor to obtain a plurality of feature maps at different scales, inputting each of these feature maps into a top-down feature modulation module, and integrating them through a scale feedback mechanism to obtain the box position information map; inputting the target image into a point localization module, extracting a point localization feature map through a second feature extractor, upsampling it, and inputting the upsampled point localization feature map into a convolution module for deep convolution to obtain the point position information map; and inputting the target image into a density map generation module, performing feature extraction through a third feature extractor, and performing upsampling and focus localization through a focus density map regression head to obtain the density position information map; wherein the multi-element example counting module comprises the box detection module, the point localization module and the density map generation module, and the density map generation module comprises the third feature extractor and the focus density map regression head.
  6. The multi-element example decoupling crowd counting method based on density level segmentation of claim 5, wherein inputting the target image into the density map generation module, performing feature extraction through the third feature extractor, and performing upsampling and focus localization through the focus density map regression head to obtain the density position information map comprises: inputting the target image into the density map generation module and extracting features through the third feature extractor to obtain a first, a second, a third and a fourth multi-scale feature map; inputting the four multi-scale feature maps into a transition layer for feature fusion to obtain a multi-scale fused feature map; inputting the multi-scale fused feature map into a convolution layer and a normalization layer for depth convolution and normalization to obtain a first convolution feature map; inputting the first convolution feature map into an activation layer to obtain a second convolution feature map; and inputting the second convolution feature map into a two-dimensional transposed convolution layer and extracting candidate local maxima to obtain the density position information map.
  7. The multi-element example decoupling crowd counting method based on density level segmentation of claim 1, wherein inputting the density level mask, the box position information map, the point position information map and the density position information map into the result fusion module to obtain the composite result map comprises: representing the density level mask, according to density region size, as a box binary mask, a point binary mask and a density binary mask; multiplying the box binary mask element-wise with the box position information map to obtain a filtered box position information map; multiplying the point binary mask element-wise with the point position information map to obtain a filtered point position information map; multiplying the density binary mask element-wise with the density position information map to obtain a filtered density position information map; and summing the filtered box, point and density position information maps pixel-wise to obtain the composite result map.
  8. A multi-element example decoupling crowd counting system based on density level segmentation, implemented using the multi-element example decoupling crowd counting method based on density level segmentation of any one of claims 1-7, the system comprising: an acquisition module, configured to acquire a target image and input it into a segmentation mask generator trained by means of a Gaussian distribution mask generator to obtain a density level mask, wherein the segmentation mask generator comprises an image encoder and a density level segmentation module; a processing module, configured to input the target image into the multi-element example counting module to obtain a box position information map, a point position information map and a density position information map; a fusion module, configured to input the density level mask, the box position information map, the point position information map and the density position information map into the result fusion module to obtain a composite result map; and a counting module, configured to sum over the composite result map to obtain the total number of people in the target image.
  9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the multi-element example decoupling crowd counting method based on density level segmentation of any one of claims 1 to 7.
  10. A non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the multi-element example decoupling crowd counting method based on density level segmentation of any one of claims 1 to 7.
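Claim 3 describes splitting the image into blocks, flattening them and adding position encoding before the Transformer layers. The sketch below illustrates only that front end, under stated assumptions: a single-channel image, the sinusoidal encoding of the original Transformer (the claim does not name the scheme), and no learned linear projection. The function names are hypothetical.

```python
import math


def split_into_patches(image, patch):
    """Divide an H x W single-channel image (list of rows) into non-overlapping
    patch x patch blocks, scanned row-major, and flatten each block into a vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    return [[image[r][c] for r in range(top, top + patch) for c in range(left, left + patch)]
            for top in range(0, h, patch) for left in range(0, w, patch)]


def sinusoidal_position(index, dim):
    """Assumed scheme (original Transformer):
    PE[2k] = sin(pos / 10000^(2k/dim)), PE[2k+1] = cos(pos / 10000^(2k/dim))."""
    return [math.sin(index / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(index / 10000 ** ((i - 1) / dim))
            for i in range(dim)]


def image_block_sequence(image, patch):
    """Flattened block vectors plus position encoding. The learned linear projection
    that ViT-style encoders normally apply first is omitted to keep the sketch small."""
    vectors = split_into_patches(image, patch)
    dim = patch * patch
    return [[v + p for v, p in zip(vec, sinusoidal_position(i, dim))]
            for i, vec in enumerate(vectors)]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
seq = image_block_sequence(image, patch=2)
print(len(seq), len(seq[0]))  # 4 4
print(seq[0])                 # [0.0, 2.0, 4.0, 6.0]
```

The resulting sequence is what the MHSA and feedforward layers of claim 3 would consume; those layers are not reproduced here.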
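Claim 4 begins with transposed-convolution upsampling, passes through an activation layer and ends with a 2D convolution that maps channels to density level predictions. A minimal single-channel sketch of the first and last steps follows. Assumptions: ReLU stands in for the unspecified activation, the final channels are read as one score map per density level, and the MHSA stage is omitted. All names are hypothetical.

```python
def transposed_conv2d(x, kernel, stride=2):
    """Single-channel 2D transposed convolution without padding: every input
    pixel scatters a kernel-weighted copy into the output, whose size is
    (H - 1) * stride + k."""
    h, w, k = len(x), len(x[0]), len(kernel)
    out = [[0.0] * ((w - 1) * stride + k) for _ in range((h - 1) * stride + k)]
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


def relu(x):
    """Stand-in for the unspecified activation layer."""
    return [[max(0.0, v) for v in row] for row in x]


def argmax_levels(logits):
    """logits: one H x W score map per density level (the channels produced by
    the final 2D convolution). Returns the per-pixel index of the highest score."""
    h, w = len(logits[0]), len(logits[0][0])
    return [[max(range(len(logits)), key=lambda lv: logits[lv][r][c]) for c in range(w)]
            for r in range(h)]


up = relu(transposed_conv2d([[1, -2], [3, 4]], [[1, 1], [1, 1]], stride=2))
print(up[0])  # [1.0, 1.0, 0.0, 0.0]
print(argmax_levels([[[0.1, 0.9]], [[0.5, 0.2]]]))  # [[1, 0]]
```

With a 2 x 2 all-ones kernel and stride 2, the transposed convolution reduces to nearest-neighbour upsampling, which makes the output easy to check by hand.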
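Claim 5 does not say how the box detection output becomes a "box position information map" that can be fused pixel-wise with the point and density maps. One simple assumption, shown below, is to add one count at the centre pixel of each sufficiently confident box. The `(x1, y1, x2, y2, score)` box format, the threshold and the function name are hypothetical.

```python
def box_position_map(boxes, height, width, score_threshold=0.5):
    """boxes: iterable of (x1, y1, x2, y2, score) in pixel coordinates.
    Each box scoring at least score_threshold adds one count at its centre
    pixel (clamped to the image), so the map sums to the number of kept boxes."""
    m = [[0.0] * width for _ in range(height)]
    for x1, y1, x2, y2, score in boxes:
        if score < score_threshold:
            continue
        cx = min(width - 1, max(0, int((x1 + x2) / 2)))
        cy = min(height - 1, max(0, int((y1 + y2) / 2)))
        m[cy][cx] += 1.0
    return m


detections = [(0, 0, 4, 4, 0.9), (6, 6, 8, 8, 0.3), (5, 1, 9, 3, 0.7)]
m = box_position_map(detections, height=10, width=10)
print(sum(map(sum, m)), m[2][2], m[2][7])  # 2.0 1.0 1.0
```

Encoding each head as a unit count keeps the box map on the same scale as a point map, so summing the filtered maps later yields a head count.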
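The final step of claim 6 extracts candidate local maxima after the transposed convolution. A common way to do this is a 3 x 3 neighbourhood maximum test with a score threshold. The sketch below uses that technique as an assumption; the application does not give the window size or threshold, and the names are hypothetical.

```python
def local_maxima(density, threshold):
    """Candidate local maxima: pixels above threshold that are >= all of their
    (up to 8) neighbours. Plateaus of equal values yield several candidates,
    which a real system would suppress further."""
    h, w = len(density), len(density[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = density[r][c]
            if v <= threshold:
                continue
            neighbours = [density[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v >= n for n in neighbours):
                peaks.append((r, c))
    return peaks


def peaks_to_map(peaks, height, width):
    """Mark each candidate with a unit count, giving a position map."""
    m = [[0.0] * width for _ in range(height)]
    for r, c in peaks:
        m[r][c] = 1.0
    return m


grid = [[0.2] * 5 for _ in range(5)]
grid[1][1], grid[3][3] = 0.9, 0.8
print(local_maxima(grid, threshold=0.5))  # [(1, 1), (3, 3)]
```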
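The fusion of claim 7 is simple enough to sketch directly: split the density level mask into three disjoint binary masks, multiply each with its position map, add the results pixel-wise, and sum the composite map to get the count. The level coding (0 = sparse, handled by boxes; 1 = medium, by points; 2 = dense, by the density map) is an assumption. The claim only says the masks are derived "according to the density region size".

```python
SPARSE, MEDIUM, DENSE = 0, 1, 2  # hypothetical coding: sparse->box, medium->point, dense->density


def binary_mask(level_mask, level):
    """1.0 where the pixel belongs to the given density level, else 0.0."""
    return [[1.0 if v == level else 0.0 for v in row] for row in level_mask]


def multiply(a, b):
    """Element-wise product of two equally sized maps."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(level_mask, box_map, point_map, density_map):
    """Filter each position map by its binary mask, then add the results pixel-wise."""
    filtered = [multiply(binary_mask(level_mask, SPARSE), box_map),
                multiply(binary_mask(level_mask, MEDIUM), point_map),
                multiply(binary_mask(level_mask, DENSE), density_map)]
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*filtered)]


def total_count(composite):
    """Sum every pixel of the composite result map."""
    return sum(map(sum, composite))


level_mask = [[0, 1], [2, 2]]
composite = fuse(level_mask,
                 box_map=[[1, 1], [1, 1]],
                 point_map=[[0, 1], [0, 0]],
                 density_map=[[0.5, 0.5], [0.25, 0.75]])
print(composite, total_count(composite))  # [[1.0, 1.0], [0.25, 0.75]] 3.0
```

Because the three binary masks are disjoint, each pixel is counted by exactly one counting example. This is how the decoupling avoids counting the same person twice.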
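Claim 8 describes the system as four modules chained in a fixed order. A minimal wiring sketch is given below, assuming each module can be modelled as a plain callable. The class and field names are hypothetical stand-ins, not interfaces defined by the application.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Grid = List[List[float]]


@dataclass
class CrowdCountingSystem:
    """Hypothetical wiring of the four modules in claim 8; each field is a
    stand-in callable, not an API defined by the application."""
    acquire: Callable[[object], Tuple[object, Grid]]        # image -> (image, density level mask)
    process: Callable[[object], Tuple[Grid, Grid, Grid]]    # image -> (box, point, density) maps
    fuse: Callable[[Grid, Grid, Grid, Grid], Grid]          # mask + three maps -> composite map
    count: Callable[[Grid], float]                          # composite map -> total people

    def run(self, raw_image) -> float:
        image, level_mask = self.acquire(raw_image)
        box_map, point_map, density_map = self.process(image)
        composite = self.fuse(level_mask, box_map, point_map, density_map)
        return self.count(composite)


system = CrowdCountingSystem(
    acquire=lambda img: (img, [[0]]),
    process=lambda img: ([[2.0]], [[0.0]], [[0.0]]),
    fuse=lambda mask, b, p, d: b,
    count=lambda g: sum(map(sum, g)),
)
print(system.run("frame.png"))  # 2.0
```

Injecting the modules as callables keeps the orchestration separate from the models, so any module can be replaced without touching the others.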

Description

Technical Field

The application belongs to the technical field of crowd counting, and particularly relates to a multi-element example decoupling crowd counting method and system based on density level segmentation.

Background

Crowd counting is a core research topic in computer vision and is of growing practical value in public safety monitoring, large-scale event management, emergency response and similar scenarios. As urbanization accelerates and crowd gatherings in public places become routine, using intelligent technology to accurately estimate and analyze crowd density has become a key means of preventing congestion and stampedes, optimizing resource scheduling and improving urban management efficiency. Most existing research regresses the head count directly, building a mapping from global image features to the total number of people. This approach is of limited use in practice because it provides no spatial distribution information and is easily disturbed by complex backgrounds. To improve counting precision and obtain richer scene information, three mainstream technical examples have been derived: box detection, point localization and density map generation.
As application scenes grow more complex, single-example technical schemes still have the following shortcomings. First, real-world surveillance images often show significant perspective distortion, so regions with vastly different crowd densities coexist in the same picture, and a single counting example struggles to cope with such uneven density. Second, the different counting examples differ both in how well they handle scenes of different density and in how rich the position information they provide is, and existing methods have not reconciled these differences. Third, although existing density-aware methods try to make the model perceive regional crowding by generating density masks through an attention mechanism, they usually only adapt different model parameters or branches within the same counting example, and so do not fundamentally overcome the inherent limitations of a single example. In short, most existing crowd counting methods adopt a single counting example (such as density map regression, point localization or box detection) and have difficulty adapting simultaneously to regions of different density within one image, which leads to low counting accuracy, insufficient adaptability and a large computational load.

Disclosure of Invention

The present application aims to solve at least one of the technical problems in the prior art. To this end, the application provides a multi-element example decoupling crowd counting method and system based on density level segmentation, which improves the accuracy of crowd counting.
In a first aspect, the present application provides a multi-element example decoupling crowd counting method based on density level segmentation, the method comprising: acquiring a target image, and inputting the target image into a segmentation mask generator trained by means of a Gaussian distribution mask generator to obtain a density level mask, wherein the segmentation mask generator comprises an image encoder and a density level segmentation module; inputting the target image into a multi-element example counting module to obtain a box position information map, a point position information map and a density position information map; inputting the density level mask, the box position information map, the point position information map and the density position information map into a result fusion module to obtain a composite result map; and summing over the composite result map to obtain the total number of people in the target image.

According to one embodiment of the present application, inputting the target image into the segmentation mask generator trained by means of the Gaussian distribution mask generator to obtain the density level mask comprises: inputting the target image into an image encoder for image encoding and position encoding to obtain encoded features, wherein the image encoder comprises a block (patch) embedding Transformer encoding module and a position encoding module; and inputting the encoded features into a density level segmentation module, which performs multi-stage feature enhancement on them to obtain the density level mask.

According to one embodiment of the present application, the inputting the target image in