CN-116704199-B - Multi-task traffic panoramic sensing method and device and computer equipment


Abstract

The application relates to a multi-task traffic panoramic sensing method, a device and computer equipment. In the method, a feature extraction model performs shared feature extraction on an image to be processed to obtain multi-layer image features, multi-layer semantic features and multi-layer positioning features. A target detection model then determines target driving area information, target lane line information and target object positioning information from the multi-layer image features, the multi-layer semantic features and the multi-layer positioning features. Target objects are selected and tracked based on the target object positioning information, and the motion state of each selected target object is obtained from the tracking result. Traffic panoramic sensing information is determined from the target driving area information, the target lane line information, the target object positioning information and the motion state of the target object. The method can improve the sensing speed and expand the sensing dimension.

Inventors

HAN CHENG

Assignees

中汽创智科技有限公司

Dates

Publication Date: 20260508
Application Date: 20230605

Claims (13)

1. A multi-task traffic panoramic sensing method, the method comprising:
performing shared feature extraction on an image to be processed through a feature extraction model to obtain multi-layer image features, multi-layer semantic features and multi-layer positioning features, wherein the multi-layer image features are shallow feature information, and the multi-layer semantic features and the multi-layer positioning features are deep feature information; the feature extraction model comprises a feature map extraction network, a feature pyramid network and a path aggregation network, each of which comprises a multi-channel fusion layer; the feature map extraction network comprises a plurality of sequentially connected first downsampling layers, the feature pyramid network comprises a plurality of sequentially connected upsampling layers, and the path aggregation network comprises a plurality of sequentially connected second downsampling layers; each upsampling layer has a corresponding first downsampling layer in the feature map extraction network and a corresponding second downsampling layer in the path aggregation network; and at least one first downsampling layer in the feature map extraction network and at least one upsampling layer in the feature pyramid network are implemented using the multi-channel fusion layer;
determining, through a target detection model, target driving area information, target lane line information and target object positioning information according to the multi-layer image features, the multi-layer semantic features and the multi-layer positioning features, wherein the target detection model comprises a region segmentation network, a lane line detection network and a target object detection network; the region segmentation network is configured to detect the target driving area information based on the multi-layer image features; the lane line detection network is configured to detect the target lane line information based on the multi-layer image features and the multi-layer semantic features; the target object detection network is configured to detect the target object positioning information based on the multi-layer positioning features; and the region segmentation network, the lane line detection network and the target object detection network each comprise the multi-channel fusion layer;
wherein the multi-channel fusion layer of the lane line detection network is configured to perform two-channel segmentation on input data to obtain first two-channel data and second two-channel data, perform two-channel segmentation on the second two-channel data to obtain first four-channel data and second four-channel data, perform two-channel segmentation on the second four-channel data to obtain first eight-channel data and second eight-channel data, and perform feature aggregation on the first two-channel data, the first four-channel data, the first eight-channel data and the second eight-channel data to obtain output data of the multi-channel fusion layer of the lane line detection network;
selecting and tracking a target object based on the target object positioning information, and acquiring the motion state of the selected target object according to the tracking result; and
determining traffic panoramic sensing information according to the target driving area information, the target lane line information, the target object positioning information and the motion state of the target object.
2. The method according to claim 1, wherein performing shared feature extraction on the image to be processed through the feature extraction model to obtain the multi-layer image features, the multi-layer semantic features and the multi-layer positioning features comprises:
through each first downsampling layer in the feature map extraction network, performing downsampling processing on the image to be processed multiple times based on the input data of each first downsampling layer to obtain the multi-layer image features output by each first downsampling layer, wherein the input data of the first of the first downsampling layers is the image to be processed, and the input data of each other first downsampling layer is the multi-layer image features output by the previous first downsampling layer;
through each upsampling layer in the feature pyramid network, performing upsampling processing on the multi-layer image features multiple times based on the input data of each upsampling layer to obtain the multi-layer semantic features output by each upsampling layer, wherein the input data of the first upsampling layer is the multi-layer image features output by the first downsampling layer corresponding to the first upsampling layer, and the input data of each other upsampling layer is the multi-layer image features output by the first downsampling layer corresponding to that upsampling layer together with the multi-layer semantic features output by the previous upsampling layer; and
through each second downsampling layer in the path aggregation network, performing downsampling processing on the multi-layer semantic features multiple times based on the input data of each second downsampling layer to obtain the multi-layer positioning features output by each second downsampling layer, wherein the input data of the first of the second downsampling layers is the multi-layer semantic features output by the upsampling layer corresponding to that second downsampling layer, and the input data of each other second downsampling layer is the multi-layer semantic features output by the upsampling layer corresponding to that second downsampling layer together with the multi-layer positioning features output by the previous second downsampling layer.
3. The method of claim 1, wherein the region segmentation network further comprises a channel attention layer and an upsampling network, and detecting, by the region segmentation network, the target driving area information based on the multi-layer image features comprises:
performing attention-based adaptive feature optimization on the multi-layer image features through the channel attention layer to obtain optimized image features;
performing feature fusion processing on the optimized image features through the multi-channel fusion layer to obtain fused image features;
performing upsampling processing on the fused image features through the upsampling network, and outputting a two-channel driving area grayscale map; and
acquiring the target driving area information according to the driving area grayscale map.
4. The method of claim 3, wherein the lane line detection network further comprises a channel attention layer and a deconvolution sampling network, and detecting, by the lane line detection network, the target lane line information based on the multi-layer image features and the multi-layer semantic features comprises:
performing attention-based adaptive feature optimization on the multi-layer image features and the multi-layer semantic features through the channel attention layer to obtain optimized semantic features;
performing feature fusion processing on the optimized semantic features through the multi-channel fusion layer to obtain fused semantic features;
performing deconvolution upsampling processing on the fused semantic features through the deconvolution sampling network, and outputting a single-channel lane line grayscale map; and
acquiring the target lane line information according to the single-channel lane line grayscale map.
5. The method of claim 4, wherein:
if the multi-channel fusion layer serves as the first downsampling layer, the input data of the multi-channel fusion layer is the input data of the first downsampling layer, and the output data of the multi-channel fusion layer is the multi-layer image features output by the first downsampling layer;
if the multi-channel fusion layer serves as the upsampling layer, the input data of the multi-channel fusion layer is the input data of the upsampling layer, and the output data of the multi-channel fusion layer is the multi-layer semantic features output by the upsampling layer;
if the multi-channel fusion layer is located in the region segmentation network, the input data of the multi-channel fusion layer is the optimized image features, and the output data is the fused image features; and
if the multi-channel fusion layer is located in the lane line detection network, the input data of the multi-channel fusion layer is the optimized semantic features, and the output data is the fused semantic features.
6. The method of claim 4, wherein the channel attention layer is configured to perform feature classification processing on input data to obtain refined feature channel information and channel weight allocation information, and to perform attention-based adaptive feature optimization according to the refined feature channel information and the channel weight allocation information to obtain output data;
if the channel attention layer is located in the region segmentation network, the input data of the channel attention layer is the multi-layer image features, and the output data is the optimized image features; and
if the channel attention layer is located in the lane line detection network, the input data of the channel attention layer is the multi-layer image features and the multi-layer semantic features, and the output data is the optimized semantic features.
7. The method of claim 1, wherein the target object detection network comprises a weight distribution layer, a target positioning layer and a target screening layer, and detecting, by the target object detection network, the target object positioning information based on the multi-layer positioning features comprises:
performing weight distribution on the multi-layer positioning features through the weight distribution layer to obtain multi-layer weighted features;
determining, through the target positioning layer, initial positioning information of the target object based on the multi-layer weighted features, wherein the target object includes pedestrians, non-motor vehicles and vehicles; and
screening the initial positioning information of the target object through the target screening layer using a non-maximum suppression algorithm to obtain final positioning information of the target object.
8. The method of claim 1, wherein the region segmentation network is optimized based on a binary cross-entropy loss.
9. The method of claim 1, wherein the lane line detection network is optimized based on a combination of focal loss, intersection-over-union (IoU) loss, and Tversky loss.
10. The method of claim 1, wherein the target object detection network is optimized based on a combination of confidence loss, classification loss, and intersection-over-union (IoU) loss.
11. A multi-task traffic panoramic sensing apparatus, the apparatus comprising:
a feature extraction module, configured to perform shared feature extraction on an image to be processed through a feature extraction model to obtain multi-layer image features, multi-layer semantic features and multi-layer positioning features, wherein the multi-layer image features are shallow feature information, and the multi-layer semantic features and the multi-layer positioning features are deep feature information; the feature extraction model comprises a feature map extraction network, a feature pyramid network and a path aggregation network, each of which comprises a multi-channel fusion layer; the feature map extraction network comprises a plurality of sequentially connected first downsampling layers, the feature pyramid network comprises a plurality of sequentially connected upsampling layers, and the path aggregation network comprises a plurality of sequentially connected second downsampling layers; each upsampling layer has a corresponding first downsampling layer in the feature map extraction network and a corresponding second downsampling layer in the path aggregation network; and at least one first downsampling layer in the feature map extraction network and at least one upsampling layer in the feature pyramid network are implemented using the multi-channel fusion layer;
a target detection module, configured to determine, through a target detection model, target driving area information, target lane line information and target object positioning information according to the multi-layer image features, the multi-layer semantic features and the multi-layer positioning features, wherein the target detection model comprises a region segmentation network, a lane line detection network and a target object detection network; the region segmentation network is configured to detect the target driving area information based on the multi-layer image features; the lane line detection network is configured to detect the target lane line information based on the multi-layer image features and the multi-layer semantic features; the target object detection network is configured to detect the target object positioning information based on the multi-layer positioning features; the region segmentation network, the lane line detection network and the target object detection network each comprise the multi-channel fusion layer; and the multi-channel fusion layer of the lane line detection network is configured to perform two-channel segmentation on input data to obtain first two-channel data and second two-channel data, perform two-channel segmentation on the second two-channel data to obtain first four-channel data and second four-channel data, perform two-channel segmentation on the second four-channel data to obtain first eight-channel data and second eight-channel data, and perform feature aggregation on the first two-channel data, the first four-channel data, the first eight-channel data and the second eight-channel data to obtain output data of the multi-channel fusion layer of the lane line detection network;
a tracking module, configured to select and track a target object based on the target object positioning information, and to acquire the motion state of the selected target object according to the tracking result; and
a perception module, configured to determine traffic panoramic sensing information according to the target driving area information, the target lane line information, the target object positioning information and the motion state of the target object.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 10 when the computer program is executed.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 10.
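The successive channel splitting recited in claim 1 for the multi-channel fusion layer of the lane line detection network can be illustrated with a minimal sketch. It rests on several assumptions that the claims do not state: each "two-channel segmentation" is taken to mean halving along the channel axis, "feature aggregation" is taken to be plain concatenation, and any learned operations applied to the groups are omitted. The names `Channel`, `split_in_two`, `fusion_groups` and `aggregate` are hypothetical.

```python
from typing import List, Sequence, Tuple

Channel = List[float]  # hypothetical: one flattened feature channel


def split_in_two(channels: Sequence[Channel]) -> Tuple[List[Channel], List[Channel]]:
    """Split along the channel axis into a first and a second half (assumed meaning)."""
    mid = len(channels) // 2
    return list(channels[:mid]), list(channels[mid:])


def fusion_groups(channels: Sequence[Channel]) -> List[List[Channel]]:
    """Return the four groups that the claimed layer aggregates."""
    first_two, second_two = split_in_two(channels)          # first split
    first_four, second_four = split_in_two(second_two)      # second split
    first_eight, second_eight = split_in_two(second_four)   # third split
    return [first_two, first_four, first_eight, second_eight]


def aggregate(groups: Sequence[Sequence[Channel]]) -> List[Channel]:
    """Assumed aggregation: concatenate the groups along the channel axis."""
    return [channel for group in groups for channel in group]


# 16 dummy channels, each holding its own index as a single value.
features = [[float(i)] for i in range(16)]
groups = fusion_groups(features)
group_sizes = [len(group) for group in groups]
fused = aggregate(groups)
print(group_sizes, len(fused))
```

Because no learned operation is applied to the groups here, concatenation simply reassembles the input; the sketch shows only how the input is partitioned before aggregation, with each later group covering a progressively smaller share of the channels.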

Description

Multi-task traffic panoramic sensing method and device and computer equipment

Technical Field

The application relates to the technical field of autonomous driving, and in particular to a multi-task traffic panoramic sensing method, a device and computer equipment.

Background

As an important part of autonomous driving, traffic panoramic perception is becoming increasingly mature in both functional scope and performance. Traffic panoramic perception means that a vehicle uses its perception system to process, in real time, the large amount of road surface and road information contained in a panoramic image, which helps the vehicle make safe and reasonable decisions while driving. Autonomous driving generally uses a neural network model for traffic panoramic perception. The model analyzes visual image information extracted from vehicle-mounted sensors and interprets the traffic scene from those images, so as to assist the decision-making system in controlling the vehicle and achieve safe driving.

Real-time perception models in the conventional art include single-task perception models and multi-task perception models. A single-task perception model focuses only on a single task such as target detection, so it extracts only a limited part of the information contained in the traffic panoramic image. Multi-task perception is often implemented by connecting several perception models in series and performing the tasks one by one. However, each model in the series performs its own feature extraction, and the repeated feature extraction consumes limited vehicle-mounted computing resources and increases the response time of the equipment. Conventional multi-task traffic panoramic perception is still limited in its analysis of complex road conditions, and its performance needs further improvement.

Disclosure of Invention

Based on the foregoing, it is necessary to provide a multi-task traffic panoramic sensing method, a device and computer equipment that improve the performance of the perception model.

In a first aspect, the present application provides a multi-task traffic panoramic sensing method. The method comprises the following steps:
performing shared feature extraction on an image to be processed through a feature extraction model to obtain multi-layer image features, multi-layer semantic features and multi-layer positioning features;
determining, through a target detection model, target driving area information, target lane line information and target object positioning information according to the multi-layer image features, the multi-layer semantic features and the multi-layer positioning features;
selecting and tracking a target object based on the target object positioning information, and acquiring the motion state of the selected target object according to the tracking result; and
determining traffic panoramic sensing information according to the target driving area information, the target lane line information, the target object positioning information and the motion state of the target object.
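As an illustration of the third step, the following minimal sketch derives a motion state from the tracking result of one selected target object. This passage does not specify how the motion state is computed, so everything here is an assumption: the track is taken to be a list of bounding boxes at a fixed frame interval, velocity is estimated from the displacement of the box center between the first and last frames, and the names `Box`, `MotionState`, `box_center`, `motion_state` and the `moving_threshold` parameter are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x1, y1, x2, y2) in pixels


@dataclass
class MotionState:
    velocity: Tuple[float, float]  # pixels per second along x and y
    speed: float                   # magnitude of the velocity
    moving: bool                   # True if speed exceeds the threshold


def box_center(box: Box) -> Tuple[float, float]:
    """Center point of a bounding box."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0


def motion_state(track: List[Box], frame_interval_s: float,
                 moving_threshold: float = 1.0) -> MotionState:
    """Estimate velocity from the first and last box centers of a track."""
    if len(track) < 2:
        # A single observation carries no displacement information.
        return MotionState((0.0, 0.0), 0.0, False)
    (x0, y0), (x1, y1) = box_center(track[0]), box_center(track[-1])
    elapsed = frame_interval_s * (len(track) - 1)
    vx, vy = (x1 - x0) / elapsed, (y1 - y0) / elapsed
    speed = hypot(vx, vy)
    return MotionState((vx, vy), speed, speed > moving_threshold)


# Three frames, 0.5 s apart, of one tracked object drifting right and down.
track = [(0.0, 0.0, 10.0, 10.0), (3.0, 4.0, 13.0, 14.0), (6.0, 8.0, 16.0, 18.0)]
state = motion_state(track, frame_interval_s=0.5)
print(state)
```

The result is in image coordinates; converting it to a physical speed would require camera calibration, which is outside the scope of this sketch.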
In one embodiment, the feature extraction model comprises a feature map extraction network, a feature pyramid network and a path aggregation network, wherein the feature map extraction network comprises a plurality of sequentially connected first downsampling layers, the feature pyramid network comprises a plurality of sequentially connected upsampling layers, and the path aggregation network comprises a plurality of sequentially connected second downsampling layers; performing shared feature extraction on the image to be processed through the feature extraction model to obtain the multi-layer image features, the multi-layer semantic features and the multi-layer positioning features comprises: through each first downsampling layer in the feature map extraction network, performing downsampling processing on the image to be processed multiple times based on the input data of each first downsampling layer to obtain the multi-layer image features output by each first downsampling layer, wherein the input data of the first of the first downsampling layers is the image to be processed, and the input data of each other first downsampling layer is the multi-layer image features output by the previous first downsampling layer; through each upsampling layer in the feature pyramid network, performing upsampling processing on the multi-layer image features multiple times based on the input data of each upsampling layer to obtain the multi-layer semantic features output by each upsampling layer, wherein the input data of the first upsampling layer is the multi-layer image features output by the first downsampling layer corresponding to the first upsampling layer, and the input data of each other upsampling layer is the multi-layer image