CN-121982336-A - Image content-based self-adaptive multi-scale feature extraction method

CN 121982336 A

Abstract

The invention provides an image-content-adaptive multi-scale feature extraction method. The method performs multi-scale gradient computation on an image according to an initial multi-scale configuration set to obtain a gradient magnitude map sequence at each scale, and generates a local complexity distribution map from the cross-scale gradient variance. The mean of all pixel variances is defined as a dynamic threshold; the local region variance is compared with this dynamic threshold, and a region whose variance is higher than the threshold is marked as a high-detail region, yielding a high-detail region mask. Scale weights are added to these regions to obtain a weighted multi-scale configuration set, according to which multi-branch convolutional neural network feature extraction is performed on the received image, generating a corresponding multi-layer feature map sequence. The result is a more discriminative optimized feature representation that effectively improves the accuracy and robustness of tasks such as object detection and segmentation in complex scenes, and achieves efficient allocation of computing resources and a stable improvement in recognition accuracy.
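As a rough illustration of the claimed pipeline (multi-scale gradient magnitudes, per-pixel cross-scale variance as a local complexity map, the mean variance as a dynamic threshold, and a dilated high-detail mask), here is a minimal sketch using standard SciPy operations. The function name, parameter names, and the choice of Gaussian scales are assumptions for illustration, not taken from the patent:

```python
import numpy as np
from scipy import ndimage

def high_detail_mask(image, sigmas=(1.0, 2.0, 4.0)):
    """Sketch: multi-scale gradient magnitudes -> cross-scale variance
    (local complexity) -> mean-valued dynamic threshold -> dilated
    binary high-detail mask. Names and scales are illustrative."""
    mags = []
    for s in sigmas:
        smoothed = ndimage.gaussian_filter(image, sigma=s)
        gx = ndimage.sobel(smoothed, axis=1)
        gy = ndimage.sobel(smoothed, axis=0)
        mags.append(np.hypot(gx, gy))       # gradient magnitude map per scale
    stack = np.stack(mags)                  # (n_scales, H, W)
    complexity = stack.var(axis=0)          # cross-scale variance per pixel
    threshold = complexity.mean()           # dynamic threshold (mean variance)
    mask = complexity > threshold           # preliminary high-detail mask
    return ndimage.binary_dilation(mask)    # morphological expansion of seeds
```

In this sketch the "scales" are Gaussian smoothing levels; edge-rich pixels show large magnitude variation across scales and so exceed the mean-variance threshold, while flat regions do not.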

Inventors

  • LUAN MENGJIE
  • ZHANG YANPENG
  • TANG ZHIGUO
  • QI XIULI
  • DING HUI
  • SUN WEI
  • YI YANG

Assignees

  • Suihua University (绥化学院)

Dates

Publication Date
2026-05-05
Application Date
2026-01-12

Claims (9)

  1. An image-content-based adaptive multi-scale feature extraction method, characterized by comprising the following steps: processing a received image, determining an initial multi-scale configuration set, and performing multi-scale gradient computation on the image according to the initial multi-scale configuration set to obtain a gradient magnitude map sequence at each scale; computing the cross-scale gradient variance pixel by pixel over the gradient magnitude map sequence to generate a local complexity distribution map, and calculating the mean of all pixel variances of the local complexity distribution map, the mean being defined as a dynamic threshold; comparing the local region variance of the local complexity distribution map with the dynamic threshold, and if the local region variance is higher than the dynamic threshold, marking the region as a high-detail region to obtain a high-detail region mask, and adding a scale weight to the region to obtain a weighted multi-scale configuration set; and performing multi-branch convolutional neural network feature extraction on the received image according to the weighted multi-scale configuration set, and generating a multi-layer feature map sequence corresponding to the weighted multi-scale configuration set.
  2. The image-content-based adaptive multi-scale feature extraction method according to claim 1, wherein processing the received image and determining an initial multi-scale configuration set specifically comprises: receiving an image to be processed, calculating the global information entropy of the image to obtain an overall complexity value, and if the overall complexity value is higher than the entropy median obtained from whole-image pixel distribution statistics, expanding the range of multi-scale pyramid levels to determine the initial multi-scale configuration set.
  3. The image-content-based adaptive multi-scale feature extraction method according to claim 2, wherein expanding the range of multi-scale pyramid levels and determining the initial multi-scale configuration set specifically comprises: if the overall complexity value is higher than the entropy median obtained from pixel distribution statistics, expanding the range of multi-scale pyramid levels to obtain an adjusted hierarchical structure; generating an initial configuration set according to the adjusted hierarchical structure, determining the basic parameters of the multi-scale analysis, extracting the parameters of each level, obtaining the corresponding image decomposition mode, and constructing a multi-scale pyramid; extracting image details at each level according to the multi-scale pyramid to obtain feature information at different scales; and determining the initial multi-scale configuration set by integrating and analyzing the feature information at different scales and assessing the performance difference of the image at each level.
  4. The image-content-based adaptive multi-scale feature extraction method according to claim 1, wherein performing multi-scale gradient computation on the image according to the initial multi-scale configuration set to obtain a gradient magnitude map sequence at each scale comprises: decomposing the image through the initial multi-scale configuration set to obtain image layer data at different scales, and calculating the gradient value corresponding to each scale by a gradient analysis method to obtain gradient distribution information at each scale; performing magnitude extraction on the gradient distribution information to obtain gradient magnitude data corresponding to each scale, and if the gradient magnitude data contains noise interference, smoothing the gradient magnitude data to obtain an optimized gradient magnitude set; and constructing a magnitude map at each scale from the optimized gradient magnitude set to obtain a preliminary gradient magnitude map sequence, and unifying the formats to obtain the final gradient magnitude map sequence.
  5. The image-content-based adaptive multi-scale feature extraction method according to claim 1, wherein computing the cross-scale gradient variance pixel by pixel over the gradient magnitude map sequence to generate a local complexity distribution map, and calculating the mean of all pixel variances of the local complexity distribution map, defined as a dynamic threshold, comprises: for the gradient magnitude values at the same pixel position in the gradient magnitude map sequence, obtaining the magnitude sequence of that pixel position, then calculating the variance of the magnitude sequence pixel by pixel to obtain a variance statistic, the variance statistic being defined as a local complexity value; arranging the local complexity values of all pixels to generate the local complexity distribution map; obtaining a contrast-enhanced complexity distribution map by applying histogram equalization to the local complexity distribution map; and calculating the mean of all pixel complexities in the local complexity distribution map to obtain the dynamic threshold.
  6. The image-content-based adaptive multi-scale feature extraction method according to claim 1, wherein comparing the local region variance of the local complexity distribution map with the dynamic threshold and, if the local region variance is higher than the dynamic threshold, marking the region as a high-detail region to obtain a high-detail region mask, comprises: comparing each pixel in the local complexity distribution map with the dynamic threshold, setting the pixel value to 1 if the complexity of the current pixel is greater than the dynamic threshold and to 0 otherwise, to obtain a preliminary binary mask; obtaining the set of pixel coordinates with value 1 in the binary mask to form a high-detail seed point set, and expanding the adjacent pixels around the high-detail seed point set by morphological dilation to obtain a dilated high-detail candidate region; computing the mean pixel complexity of the original local complexity distribution map within the dilated high-detail candidate region, denoted the region complexity mean; if the region complexity mean is still higher than the dynamic threshold, retaining the candidate region, and otherwise removing the corresponding region, to obtain refined high-detail regions; and calculating a bounding rectangle for each high-detail connected component to generate the final binary image of the high-detail region mask.
  7. The image-content-based adaptive multi-scale feature extraction method according to claim 1, wherein adding a scale weight to the region to obtain a weighted multi-scale configuration set comprises: acquiring the distribution information of the high-detail regions, and partitioning it according to different levels of regional detail to obtain a preliminary high-detail region classification result; generating corresponding region mask values from the high-detail region classification result, marking the ranges requiring weight adjustment in the mask coverage area, determining the boundary information of the mask coverage area, scanning the initial configuration set according to that boundary information, and extracting the data structure of multi-scale weights to obtain the multi-scale weight set to be adjusted; calculating a weight increment for the multi-scale weight set by a weight distribution method, and if the weight increment exceeds a preset threshold range, limiting the scale adjustment amount of the region to obtain the weight interval to be optimized; and processing the scale adjustment amount of the weight interval layer by layer with a configuration optimization method, updating the multi-scale weights in the mask coverage area to obtain an optimized weighted configuration set, checking the detail performance of each region in combination with the scale matching degree, and if the matching degree does not reach the preset standard, recalculating the mask value of the region, determining the final configuration result, and generating the complete weighted multi-scale configuration set.
  8. The image-content-based adaptive multi-scale feature extraction method according to claim 1, wherein performing multi-branch convolutional neural network feature extraction on the received image according to the weighted multi-scale configuration set and generating a multi-layer feature map sequence corresponding to the weighted multi-scale configuration set comprises: loading all preset scales and their corresponding weights from the weighted multi-scale configuration set, constructing as many independent branches as there are scales through a convolutional neural network, each branch fixedly adopting its corresponding scale configuration; convolving the image in all independent branches to obtain the raw feature maps of each branch, and weighting the raw feature maps element by element with the weight of the corresponding branch to obtain a weighted feature map sequence; ordering the weighted feature map sequence from large scale to small scale to form an ordered multi-layer feature map sequence, and uniformly resizing all weighted feature maps to the resolution of the received image to obtain an aligned feature map sequence; and stacking the aligned feature map sequence channel by channel to obtain the fused multi-layer feature map sequence.
  9. The image-content-based adaptive multi-scale feature extraction method according to claim 1, wherein generating the multi-layer feature map sequence corresponding to the weighted multi-scale configuration set further comprises: calculating the activation density of each feature map channel in the multi-layer feature map sequence according to the high-detail region mask to obtain a channel density distribution vector; and applying attention-mechanism weighted fusion to the multi-layer feature map sequence through the channel density distribution vector, assigning high weights to obtain the optimized feature representation.
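The multi-branch structure of claim 8 (one branch per scale/weight pair, element-wise weighting, alignment to the input resolution, channel-wise stacking) can be sketched as follows. Box filters stand in for the CNN branches of the claim; the function name, the `config` format, and the kernel sizes are assumptions for illustration:

```python
import numpy as np
from scipy import ndimage

def multibranch_features(image, config):
    """Sketch of claim 8: one filtering branch per (scale, weight) entry,
    element-wise weighting, large-to-small scale ordering, and
    channel-wise stacking. Box filters replace the claim's CNN branches."""
    feature_maps = []
    # order branches from large scale to small scale, as the claim requires
    for scale, weight in sorted(config, reverse=True):
        # each branch fixedly uses a kernel size derived from its scale
        feat = ndimage.uniform_filter(image, size=scale)
        feature_maps.append(weight * feat)     # element-wise weighting
    # these branches already share the input resolution, so "alignment"
    # is a no-op here; a real CNN would upsample smaller feature maps
    return np.stack(feature_maps)              # (n_branches, H, W) fused stack
```

A real implementation would replace each box filter with a learned convolutional branch, but the data flow (weight, order, align, stack) matches the claim's description.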

Description

Image content-based self-adaptive multi-scale feature extraction method

Technical Field

The invention relates to the field of information technology, and in particular to an adaptive multi-scale feature extraction method based on image content.

Background

In the field of image recognition, multi-scale feature fusion has become a core technical path for improving recognition accuracy, because target objects in real scenes often exhibit both significant size differences and varying detail richness; in a city street-view image, for example, there are both distant building outlines and nearby pedestrian face textures, and handling both directly determines whether a method can remain robust in complex environments, such as outdoor surveillance scenes with haze or illumination changes. Most existing multi-scale fusion schemes can extract features from different receptive fields, but they generally adopt a uniform scale selection and sampling density configuration, so the feature fusion process is difficult to match to the actual content characteristics of each input image. This mismatch causes redundant computation or loss of key detail in some images, and when such methods run on mobile devices with limited resources, the computational overhead grows further. The drawbacks of this unified configuration strategy are also exposed by the vast differences in image content complexity. The amount of information contained in different images is very unevenly distributed: some regions are smooth with sparse targets, others are densely textured with overlapping targets, and a uniform number of scales and sampling density cannot accommodate both.
When an image of low complexity is assigned too many scales, a large amount of irrelevant noise is introduced; for example, when processing an indoor self-portrait against a plain background, the redundant fine-scale samples capture background noise rather than useful information and degrade the overall feature quality. Conversely, an image of high complexity suffers blurred target boundaries or submerged small-object features when too few scales are used, so the adaptation difficulty caused by differences in image complexity becomes the key bottleneck limiting the fusion effect. At a deeper level, it is difficult to judge accurately, from global statistics alone, how many scales an image needs and how densely each level should be sampled; there is still no reliable decision basis. Although the information entropy of an image can reflect its overall complexity, it cannot finely describe the intensity of local gradient distributions or target aggregation characteristics. In a landscape containing both a calm lake surface and a rushing waterfall, for example, the information entropy may give a medium value, yet the severe gradient changes of the waterfall region require dense sampling while the lake surface needs only coarse processing; it is precisely these factors that determine whether feature extraction should devote more computing resources to certain regions, such as preferentially assigning fine scales to high-density target regions to avoid feature confusion among overlapping objects. As a result, existing strategies tend to require distinctly different optimal scale configurations for images with similar information entropy, and the same parameter set behaves very unstably under different local characteristics, leading to dramatic quality fluctuations in the fused features.
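The entropy limitation described above can be shown with a short numeric example: two images with identical grey-level histograms, one with a single edge and one with many, have exactly the same global entropy despite very different local gradient structure. This is a standard Shannon-entropy computation, not code from the patent:

```python
import numpy as np

def global_entropy(image, bins=256):
    """Shannon entropy (in bits) of the grey-level histogram, the
    global complexity measure the background refers to."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256), density=True)
    p = hist[hist > 0]                       # drop empty bins
    return float(-(p * np.log2(p)).sum())

# one edge vs. a checkerboard: same histogram, same entropy (1 bit),
# but radically different local gradient distributions
half = np.zeros((8, 8)); half[:, 4:] = 255
checker = np.indices((8, 8)).sum(axis=0) % 2 * 255
```

Both images yield an entropy of exactly 1 bit, yet the checkerboard would demand far denser sampling, which is the gap the patent's cross-scale gradient variance is meant to close.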
Therefore, how to dynamically and accurately determine the most suitable number of scales and sampling density configuration range from the information entropy, gradient distribution variance, and target density clustering characteristics of an input image during multi-scale feature fusion, so as to achieve efficient allocation of computing resources and stable improvement of recognition accuracy, has become a key problem to be solved in current image recognition methods based on multi-scale feature fusion. The problem also highlights a technical contradiction in practice: pursuing universality to cover various scenes while ignoring the individual characteristics of images sacrifices both efficiency and accuracy, so system robustness is hard to guarantee on edge computing devices or over large-scale datasets; in high-precision applications such as medical diagnosis or security monitoring in particular, this contradiction directly affects user trust and deployment cost.

Disclosure of Invention

The invention provides an image content-based self-adaptive multi-scale feature extraction method, which mainly comprises the following steps: processing a received image, determin