CN-121982744-A - Pedestrian detection method and apparatus, device, storage medium, and program product with adversarial defense capability

Abstract

The disclosure relates to a pedestrian detection method and apparatus, device, storage medium, and program product with adversarial defense capability. The method comprises: converting a target image into a grayscale image; performing adversarial-region detection on the grayscale image in the frequency domain to obtain a plurality of groups of adversarial-region mask maps of n-level frequency bands in the frequency domain, wherein the adversarial-region mask map of any level of frequency band in any group indicates the adversarial region of the grayscale image in the frequency domain at that level of frequency band; fusing, for each group separately, the adversarial-region mask maps of the n-level frequency bands of that group to obtain a plurality of target mask maps; blurring the adversarial region in the target image based on each target mask map to obtain a plurality of preprocessed images; performing pedestrian detection on each of the preprocessed images with a pedestrian detection model; and fusing the detection results by non-maximum suppression to obtain a target detection result. In this way, various types of adversarial attack can be effectively defended against, and the adversarial defense capability of pedestrian detection is improved.

Inventors

HU XIAOLIN
ZHANG WEI

Assignees

Tsinghua University (清华大学)

Dates

Publication Date: 20260505
Application Date: 20260107

Claims (10)

1. A pedestrian detection method, characterized by comprising: acquiring a target image to be detected, and converting the target image into a grayscale image; performing adversarial-region detection on the grayscale image in the frequency domain to obtain a plurality of groups of adversarial-region mask maps of n-level frequency bands in the frequency domain, wherein the adversarial-region mask map of any level of frequency band in any group indicates the adversarial region of the grayscale image in the frequency domain at that level of frequency band, the adversarial region represents a region that acts adversarially against pedestrian detection, and the value of n differs between the adversarial-region mask maps of different groups of n-level frequency bands; fusing, for each group separately, the adversarial-region mask maps of the n-level frequency bands of that group to obtain a plurality of target mask maps, and blurring the adversarial region in the target image based on each target mask map to obtain a plurality of preprocessed images, wherein each target mask map indicates the adversarial region in the target image; and performing pedestrian detection on each of the plurality of preprocessed images with a pedestrian detection model to obtain a target detection result, wherein the target detection result indicates the region where a pedestrian is located in the target image.
2. The method according to claim 1, wherein performing adversarial-region detection on the grayscale image in the frequency domain to obtain the plurality of groups of adversarial-region mask maps of n-level frequency bands in the frequency domain comprises: for each value of n among a plurality of preset values of n, performing an n-level discrete wavelet transform on the grayscale image, to obtain a plurality of groups of spectral detail coefficient maps of n-level frequency bands in the frequency domain, wherein the spectral detail coefficient map of any level of frequency band comprises a horizontal detail coefficient map, a vertical detail coefficient map, and a diagonal detail coefficient map; for the spectral detail coefficient maps of the k-th group of n-level frequency bands among the plurality of groups, stacking the spectral detail coefficient maps of each level of frequency band in the k-th group into a three-channel feature map, computing the L2 norm of the three-channel feature map along the channel dimension to obtain a single-channel spectral intensity map for each level of frequency band in the k-th group, and smoothing the single-channel spectral intensity map of each level of frequency band in the k-th group to obtain a target spectral intensity map for each level of frequency band in the k-th group; and treating regions of the target spectral intensity map of each level of frequency band in the k-th group whose intensity is greater than a preset threshold as adversarial regions, to obtain the adversarial-region mask map of each level of frequency band in the k-th group of n-level frequency bands, wherein the adversarial-region mask maps of the k-th group of n-level frequency bands comprise the adversarial-region mask maps from the 1st-level frequency band to the n-th-level frequency band, and the frequency of the 1st-level frequency band is higher than that of the n-th-level frequency band.
3. The method according to claim 1, wherein fusing, for each group separately, the adversarial-region mask maps of the n-level frequency bands of that group to obtain the plurality of target mask maps comprises: for the adversarial-region mask maps of any group of n-level frequency bands, spatially up-sampling the adversarial-region mask map of each level of frequency band in the group to obtain initial mask maps of the group of n-level frequency bands that have the same size as the target image; and performing superposition or union on the initial mask maps of the group of n-level frequency bands to obtain a target mask map.
4. The method according to claim 1, wherein performing pedestrian detection on each of the plurality of preprocessed images with the pedestrian detection model to obtain the target detection result comprises: performing pedestrian detection on each of the plurality of preprocessed images with the pedestrian detection model to obtain an initial detection result for each preprocessed image, wherein the initial detection result of any preprocessed image indicates, by means of one or more predicted bounding boxes, the region where a pedestrian is located in that preprocessed image; and combining the initial detection results of the plurality of preprocessed images and then performing non-maximum suppression to obtain the target detection result.
5. The method according to any one of claims 1 to 4, wherein the pedestrian detection model is a pedestrian detection model obtained by adversarial training.
6. The method according to any one of claims 1 to 4, wherein the value of n is in the range 1 ≤ n ≤ log[min(H_0, W_0)], where H_0 represents the height of the target image, W_0 represents the width of the target image, min(H_0, W_0) represents the smaller of the height and the width of the target image, and log[min(H_0, W_0)] represents the logarithm of that minimum.
7. A pedestrian detection apparatus, characterized by comprising: an acquisition and conversion module, configured to acquire a target image to be detected and convert the target image into a grayscale image; an adversarial-region detection module, configured to perform adversarial-region detection on the grayscale image in the frequency domain to obtain a plurality of groups of adversarial-region mask maps of n-level frequency bands in the frequency domain, wherein the adversarial-region mask map of any level of frequency band in any group indicates the adversarial region of the grayscale image in the frequency domain at that level of frequency band, the adversarial region represents a region that acts adversarially against pedestrian detection, and the value of n differs between the adversarial-region mask maps of different groups of n-level frequency bands; a fusion and blurring module, configured to fuse, for each group separately, the adversarial-region mask maps of the n-level frequency bands of that group to obtain a plurality of target mask maps, and to blur the adversarial region in the target image based on each target mask map to obtain a plurality of preprocessed images, wherein each target mask map indicates the adversarial region in the target image; and a pedestrian detection module, configured to perform pedestrian detection on each of the plurality of preprocessed images with a pedestrian detection model to obtain a target detection result, wherein the target detection result indicates the region where a pedestrian is located in the target image.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method according to any one of claims 1 to 6.
9. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, or a non-transitory computer readable storage medium carrying a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
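
The frequency-domain detection of claim 2 and the mask fusion of claim 3 can be illustrated with a minimal NumPy sketch. The claims do not name a wavelet family, a smoothing filter, a threshold value, or an up-sampling method, so everything specific here is an assumption made for illustration: a Haar wavelet, a 3×3 mean filter, a threshold of 0.1, nearest-neighbour up-sampling, union as the fusion rule, and the helper names `haar_dwt2`, `box_smooth`, `band_masks`, and `fuse_masks`. The sketch also assumes the image height and width are divisible by 2^n.

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar transform (assumed wavelet; the claims name none)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    horiz = (a + b - c - d) / 2.0   # difference between adjacent rows
    vert = (a - b + c - d) / 2.0    # difference between adjacent columns
    diag = (a - b - c + d) / 2.0    # diagonal difference
    return approx, (horiz, vert, diag)

def box_smooth(x, k=3):
    """k x k mean filter with edge padding (stand-in for the unspecified smoothing)."""
    pad = k // 2
    p = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def band_masks(gray, n, thresh):
    """Masks for levels 1..n of one group (level 1 is the highest-frequency band)."""
    if gray.shape[0] % 2 ** n or gray.shape[1] % 2 ** n:
        raise ValueError("sketch assumes height and width divisible by 2**n")
    masks, approx = [], gray.astype(float)
    for _ in range(n):
        approx, details = haar_dwt2(approx)
        stacked = np.stack(details, axis=0)          # three-channel feature map
        intensity = np.linalg.norm(stacked, axis=0)  # L2 norm over the channel axis
        masks.append(box_smooth(intensity) > thresh)
    return masks

def fuse_masks(masks, shape):
    """Up-sample each level's mask to image size and take the union."""
    target = np.zeros(shape, dtype=bool)
    for level, m in enumerate(masks, start=1):
        f = 2 ** level
        up = np.repeat(np.repeat(m, f, axis=0), f, axis=1)  # nearest neighbour
        target |= up[:shape[0], :shape[1]]
    return target

# Synthetic example: a flat image with one noisy, high-frequency square.
rng = np.random.default_rng(0)
gray = np.full((64, 64), 0.5)
gray[16:32, 16:32] = rng.random((16, 16))

# One target mask per preset value of n (the set {1, 2, 3} is an assumption).
target_masks = {n: fuse_masks(band_masks(gray, n, thresh=0.1), gray.shape)
                for n in (1, 2, 3)}
```

Each entry of `target_masks` corresponds to one group of n-level frequency bands. Larger values of n add coarser bands, so the resulting masks can cover textures of different spatial scales.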

Description

Pedestrian detection method and apparatus, device, storage medium, and program product with adversarial defense capability

Technical Field

The present disclosure relates to the field of object detection, and more particularly, to a pedestrian detection method and apparatus, device, storage medium, and program product with adversarial defense capability.

Background

Pedestrian detection refers to accurately identifying pedestrians in complex scenes by means of a computer algorithm. Current pedestrian detection algorithms are mainly based on deep learning and are therefore vulnerable to adversarial attacks, which severely threaten the safety and reliability of pedestrian detectors. In particular, two types of adversarial attack, patch-type and texture-type, typically produce adversarial examples that can be physically realized in the real world. This physical realizability makes them a real threat to safety-critical applications such as automated surveillance and pedestrian detection in autonomous driving systems, since an attacker can hold up a piece of cardboard (a patch-type adversarial example) or wear adversarial clothing (a texture-type adversarial example) to fool the detection system.

The defenses adopted in existing pedestrian detection systems mainly target patch-type adversarial attacks, whose size and shape are relatively fixed, and perform poorly against texture-type adversarial attacks, whose size and shape vary. As a result, the reliability and accuracy of existing pedestrian detection systems are low.

Disclosure of Invention

In view of this, the present disclosure proposes a pedestrian detection method and apparatus, device, storage medium, and program product with adversarial defense capability, which can effectively defend against various types of adversarial attack (in particular, texture-type adversarial attacks), thereby improving the adversarial defense capability of pedestrian detection, that is, improving the accuracy and reliability of pedestrian detection.

According to one aspect of the disclosure, a pedestrian detection method is provided. The method comprises: acquiring a target image to be detected, and converting the target image into a grayscale image; performing adversarial-region detection on the grayscale image in the frequency domain to obtain a plurality of groups of adversarial-region mask maps of n-level frequency bands in the frequency domain, wherein the adversarial-region mask map of any level of frequency band in any group indicates the adversarial region of the grayscale image in the frequency domain at that level of frequency band, the adversarial region represents a region that acts adversarially against pedestrian detection, and the value of n differs between the adversarial-region mask maps of different groups of n-level frequency bands; fusing, for each group separately, the adversarial-region mask maps of the n-level frequency bands of that group to obtain a plurality of target mask maps, and blurring the adversarial region in the target image based on each target mask map to obtain a plurality of preprocessed images, wherein each target mask map indicates the adversarial region in the target image; and performing pedestrian detection on each of the plurality of preprocessed images with a pedestrian detection model to obtain a target detection result, wherein the target detection result indicates the region where a pedestrian is located in the target image.

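The last two steps of the method described above, blurring the adversarial region under each target mask map and fusing the per-image detections by non-maximum suppression, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the mean blur, the kernel size, the IoU threshold of 0.5, and the names `blur_masked`, `nms`, and `detect_with_defense` are all hypothetical, and `detector` stands for any callable that returns boxes as (x1, y1, x2, y2) rows together with scores.

```python
import numpy as np

def blur_masked(image, mask, k=5):
    """Replace pixels where mask is True with a k x k mean-blurred copy (assumed blur)."""
    pad = k // 2
    img = image.astype(float)
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    blurred = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            blurred += padded[dy:dy + h, dx:dx + w]
    blurred /= k * k
    return np.where(mask[..., None], blurred, img)

def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

def detect_with_defense(image, target_masks, detector):
    """Run the detector on one blurred image per target mask, then fuse with NMS."""
    all_boxes, all_scores = [], []
    for mask in target_masks:
        boxes, scores = detector(blur_masked(image, mask))
        all_boxes.append(boxes)
        all_scores.append(scores)
    boxes = np.concatenate(all_boxes)
    scores = np.concatenate(all_scores)
    keep = nms(boxes, scores)
    return boxes[keep], scores[keep]
```

Because every preprocessed image is a view of the same scene, the same pedestrian tends to be detected several times with slightly different boxes. Pooling the boxes before suppression keeps the highest-scoring box for each pedestrian.
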
In one possible implementation, performing adversarial-region detection on the grayscale image in the frequency domain to obtain the plurality of groups of adversarial-region mask maps of n-level frequency bands in the frequency domain comprises: for each value of n among a plurality of preset values of n, performing an n-level discrete wavelet transform on the grayscale image to obtain a plurality of groups of spectral detail coefficient maps of n-level frequency bands in the frequency domain, wherein the spectral detail coefficient map of any level of frequency band comprises a horizontal detail coefficient map, a vertical detail coefficient map, and a diagonal detail coefficient map; for the spectral detail coefficient maps of the k-th group of n-level frequency bands, stacking the spectral detail coefficient maps of each level of frequency band in the k-th group into a three-channel feature map, computing the L2 norm of the three-channel feature map along the channel dimension to obtain a single-channel spectral intensity map for each level of frequency band in the k-th group, and smoothing the single-channel spectral intensity map of each level of frequency band in the k-th group to obtain a target spectral intensity map for each level of frequency band in the k-th group; and treating regions of the target spectral intensity map whose intensity is greater than a preset threshold as adversarial regions, to obtain the adversarial-region mask map of each level of frequency band in the k-th group of n-level frequency bands, which comprises the adversarial-region mask maps from the