CN-121982670-A - Road vanishing point detection method based on multi-scale supervision and self-adaptive weighting
Abstract
The invention discloses a road vanishing point detection method based on multi-scale supervision and self-adaptive weighting, relating to the technical field of high-resolution network recognition. The method comprises: introducing a double-pooling coordinate attention mechanism into the residual module of an attention-enhanced uncertainty-adaptive multi-scale high-resolution network, so as to better model coordinate-related context information and strengthen the response to strongly directional structures such as road convergence lines and curves; adopting multi-scale deep supervision, in which independent supervision is applied to each resolution branch to provide a direct gradient constraint and to promote balanced and consistent representation learning across scales; and introducing an uncertainty-based self-adaptive weighting strategy in joint training, in which the contribution of each scale's loss is dynamically adjusted through a learnable uncertainty parameter. The method thereby addresses the gradient conflict and training instability caused by fixed weights and obtains higher precision in road vanishing point detection.
Inventors
- FAN XUE
- GAO ZHENGKANG
- XU RUOTONG
- FENG ZHIQUAN
- YANG XIAOHUI
- XU TAO
Assignees
- University of Jinan (济南大学)
Dates
- Publication Date: 20260505
- Application Date: 20260126
Claims (10)
- 1. A road vanishing point detection method based on multi-scale supervision and self-adaptive weighting, characterized by comprising the following steps: S1, acquiring a road scene image; S2, inputting the road scene image into an attention-enhanced uncertainty-adaptive multi-scale high-resolution network for road vanishing point detection; wherein the attention-enhanced uncertainty-adaptive multi-scale high-resolution network comprises: a. introducing a double-pooling coordinate attention mechanism into the residual module of the high-resolution network HRNet, so as to perform feature extraction on the road scene image and obtain a multi-scale feature map; b. integrating the multi-scale feature map into a plurality of prediction branches by means of a multi-scale deep supervision mechanism, so as to predict vanishing point heat maps, the vanishing point heat maps comprising a main-task output branch heat map and auxiliary output branch heat maps at the remaining scales; performing coordinate regression and confidence estimation on the main-task output branch heat map to obtain a main loss; computing the mean square error on the auxiliary output branch heat maps at the remaining scales to obtain auxiliary losses; and adaptively weighting the main loss and the auxiliary losses by means of a learnable log variance to obtain a total loss; c. updating the parameters of the attention-enhanced uncertainty-adaptive multi-scale high-resolution network by back-propagating the total loss.
- 2. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 1, wherein the double-pooling coordinate attention mechanism jointly models average pooling and maximum pooling to perform double-pooling enhancement, and the feature extraction on the road scene image to obtain the multi-scale feature map specifically comprises: defining an input feature map; performing average pooling and maximum pooling on the input feature map in the horizontal direction and in the vertical direction, respectively, to obtain four directional feature maps (horizontal average-pooled, horizontal max-pooled, vertical average-pooled and vertical max-pooled), where avg and max denote average pooling and maximum pooling, respectively, and h and w denote the horizontal and vertical directions, respectively; concatenating the average-pooled and max-pooled features along the channel dimension, separately for the horizontal and vertical directions, to obtain a horizontal-direction feature map and a vertical-direction feature map that each contain both kinds of statistical information; concatenating the horizontal-direction feature map and the vertical-direction feature map along the spatial dimension to form a fusion feature; compressing the fusion feature with a 1×1 convolution that maps the 2C channels to a lower dimension m = max(8, C/r), where r is the compression ratio; applying batch normalization and a hard-swish activation function to the convolution output, thereby completing the double-pooling enhancement and obtaining the fused feature; splitting the fused feature into a horizontal part and a vertical part, and generating a horizontal attention weight map and a vertical attention weight map through two independent 1×1 convolutions, respectively; and, through the double-pooling coordinate attention mechanism, weighting the input feature map channel by channel with the horizontal and vertical attention weight maps and outputting the multi-scale feature map.
- 3. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 2, wherein the horizontal-direction feature map, the vertical-direction feature map and the fusion feature are each defined by an expression [expressions not reproduced in the source text], in which Concat denotes the concatenation operation, C denotes the number of channels, H denotes the height, W denotes the width, and ℝ denotes the real number field.
- 4. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 2, wherein the fused feature is given by an expression [expression not reproduced in the source text], in which Conv denotes a convolution operation, BN denotes a batch normalization layer, and the remaining operator is the activation function.
- 5. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 2, wherein the horizontal attention weight map and the vertical attention weight map are given by expressions [expressions not reproduced in the source text], in which σ denotes the Sigmoid function and Conv denotes a convolution operation.
- 6. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 2, wherein the multi-scale feature map is given by an expression [expression not reproduced in the source text] whose symbols denote, respectively, element-wise multiplication, the input feature map, and the weighted feature map.
- 7. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 1, wherein integrating the multi-scale feature map into a plurality of prediction branches by means of the multi-scale deep supervision mechanism in step b to predict the vanishing point heat maps specifically comprises: defining the backbone of the attention-enhanced uncertainty-adaptive multi-scale high-resolution network as outputting a set of feature maps at the 1/4, 1/8, 1/16 and 1/32 scales [expression of the set not reproduced in the source text]; applying a 1×1 convolution to the highest-resolution 1/4 branch to generate a heat map output outl at 1/4 resolution; concatenating outl with the original 1/4-resolution feature along the channel dimension and feeding the result into an up-sampling module to obtain an up-sampled feature representation h1; applying a 1×1 convolution to h1 to obtain a 1/2-resolution heat map outh, and concatenating outh with h1 along the channel dimension to form a joint feature map chord; down-sampling outh by bilinear interpolation to match the spatial resolution of each scale's feature; concatenating the resized outh with the corresponding scale's feature along the channel dimension to obtain a feature map that fuses the high-resolution context; and configuring an independent multi-scale prediction module for the feature map of each scale to generate a prediction result.
- 8. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 7, wherein the independent multi-scale prediction module configured for the feature map of each scale generates the prediction result according to expressions [expressions not reproduced in the source text], whose terms correspond, in order, to: the main-task output branch; the auxiliary output branches; the features used for coordinate regression and confidence prediction; the heat maps at the different scales; the concatenation operation; the i-th scale input feature map; the 1/2-resolution heat map; and the i-th scale fused feature map incorporating the high-resolution context information.
- 9. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 1, wherein adaptively weighting the main loss and the auxiliary losses by means of the learnable log variance to obtain the total loss specifically comprises: introducing, for each auxiliary loss, a learnable log-variance parameter [definition not reproduced in the source text]; constructing an uncertainty-weighted term for the i-th auxiliary loss [expression not reproduced in the source text], in which one factor is an adaptive weight that dynamically scales the gradient contribution of the auxiliary loss, and the other is an uncertainty-related penalty term that prevents the log-variance parameter from growing without bound and degrading the weight, thereby ensuring stable weight learning; and forming the total loss [expression not reproduced in the source text].
- 10. The road vanishing point detection method based on multi-scale supervision and self-adaptive weighting according to claim 9, wherein, in the training phase of the attention-enhanced uncertainty-adaptive multi-scale high-resolution network, the log-variance parameters are added to the optimizer as trainable parameters together with the network parameters; and in each iteration, back-propagation is performed based on the total loss L, jointly updating the network parameters and the log-variance parameters, thereby realizing adaptive learning of the auxiliary loss weights.
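The weighting scheme of claims 9 and 10 can be illustrated with a minimal Python sketch. It is not the patented implementation. The weighted term is assumed to take the common form exp(-s_i) * L_i + s_i, with exp(-s_i) as the adaptive weight factor and s_i as the penalty term, and the main loss is assumed to enter the total loss unweighted. The function names, the learning rate, the example loss values and the plain gradient-descent update (standing in for the optimizer of claim 10) are all hypothetical.

```python
import math


def uncertainty_weighted_total(main_loss, aux_losses, log_vars):
    """Return main_loss + sum_i (exp(-s_i) * L_i + s_i).

    Assumed form: exp(-s_i) is the adaptive weight on auxiliary loss L_i,
    and the additive s_i penalizes large log-variance values.
    """
    if len(aux_losses) != len(log_vars):
        raise ValueError("one log-variance parameter is needed per auxiliary loss")
    total = main_loss
    for loss_i, s_i in zip(aux_losses, log_vars):
        total += math.exp(-s_i) * loss_i + s_i
    return total


def log_var_gradients(aux_losses, log_vars):
    """Gradient of the total loss with respect to each s_i.

    d/ds_i [exp(-s_i) * L_i + s_i] = 1 - exp(-s_i) * L_i
    """
    return [1.0 - math.exp(-s_i) * loss_i
            for loss_i, s_i in zip(aux_losses, log_vars)]


def fit_log_vars(aux_losses, log_vars, lr=0.1, steps=500):
    """Plain gradient descent on s_i with the auxiliary losses held fixed.

    In real training the losses change every iteration and the network
    parameters are updated in the same step; they are frozen here only to
    show where the log-variance parameters settle.
    """
    s = list(log_vars)
    for _ in range(steps):
        grads = log_var_gradients(aux_losses, s)
        s = [s_i - lr * g for s_i, g in zip(s, grads)]
    return s


if __name__ == "__main__":
    aux = [0.25, 4.0]  # hypothetical auxiliary losses at two scales
    s = fit_log_vars(aux, [0.0, 0.0])
    print("log-variances:", s)
    print("adaptive weights:", [math.exp(-s_i) for s_i in s])
    print("total loss:", uncertainty_weighted_total(1.0, aux, s))
```

Under the assumed form, the gradient with respect to s_i vanishes when exp(-s_i) * L_i = 1, so with the losses frozen each s_i settles at ln L_i and the weight exp(-s_i) approaches 1/L_i. A branch with a persistently large loss is therefore down-weighted, while the penalty term s_i keeps the parameter from growing without bound.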
Description
Road vanishing point detection method based on multi-scale supervision and self-adaptive weighting
Technical Field
The invention relates to the technical field of high-resolution network recognition, in particular to a road vanishing point detection method based on multi-scale supervision and self-adaptive weighting.
Background
Road vanishing point detection is a key task in road geometry understanding and provides important perspective-geometry constraints for road direction estimation, vehicle navigation, camera calibration and the like in automatic driving. Research has shifted from traditional methods based on explicit geometric priors and hand-crafted features to end-to-end methods dominated by deep learning. Early methods rely mainly on edge detection, the Hough transform and geometric projection principles, and locate the vanishing point by extracting structural features such as lane lines and edge lines. Such methods are accurate on structured roads but unstable under illumination changes, broken textures, severe occlusion, or unstructured road scenes. With the development of deep learning, convolutional neural networks have been widely used for vanishing point detection. End-to-end feature learning can learn road geometric features automatically from large-scale data, which markedly improves detection accuracy and robustness. A representative algorithm is HRNet, which is based on a multi-resolution parallel branch structure and a cross-scale feature fusion mechanism, performs end-to-end modeling with multi-scale features, and achieves good results in vanishing point detection on unstructured roads.
Existing vanishing point (VP) detection methods for unstructured road scenes are generally based on HRNet with multi-scale supervised training. Under complex background interference, sparse directional cues and dynamically changing noise, however, these methods suffer from insufficient spatial-structure characterization and unstable multi-scale optimization, so that the response in key regions fluctuates, predictions drift, and positioning accuracy ultimately suffers.
First, the ability to model directional structural context is insufficient. Although HRNet provides multi-resolution parallel representations, it lacks explicit modeling of coordinate-related context information and has difficulty stably capturing the dependencies between different spatial positions. In addition, the traditional residual block aggregates features mainly through channel-dimension weighting and lacks an explicit description of spatial direction information, so the model's response at strongly directional structures such as road boundaries, curve curvature and lane convergence is easily disturbed, causing false detection and mislocalization of vanishing points and limiting the accuracy and stability of the overall detection result.
Second, multi-scale loss fusion with fixed-coefficient weighting has difficulty adapting to dynamic changes during training. In multi-scale training, the branch losses differ in magnitude, converge at different speeds, and have noise levels that change over time. With fixed-coefficient linear weighting it is difficult to maintain a reasonable balance across training stages, and auxiliary losses may dominate or be neglected for long periods, leading to unbalanced learning across scales and unstable training.
Furthermore, when the prediction at a certain scale is affected by noise, observation uncertainty, or labeling bias, fixed-coefficient weighting keeps injecting the error into back-propagation at a constant proportion, which accumulates gradient disturbance, weakens learning of the main task, and reduces overall robustness. Therefore, a road vanishing point detection method based on multi-scale supervision and self-adaptive weighting is needed to address the above problems.
Disclosure of Invention
To solve the above problems, the application provides a road vanishing point detection method based on multi-scale supervision and self-adaptive weighting, which retains the multi-resolution representation advantages of the high-resolution network HRNet and improves the stability and cross-scene robustness of vanishing point detection through the cooperation of direction- and coordinate-aware modeling, explicit multi-scale supervision and uncertainty-adaptive weighting. The method comprises the following steps: S1, acquiring a road scene image; S2, inputting the road scene image into an attention-enhanced uncertainty-adaptive multi-scale high-resolution network for road vanishing point detection; the attention-enhanced uncertainty adaptive multi-scale high resolu