CN-121999458-A - Model training and reasoning method, control device and automatic driving vehicle

CN 121999458 A

Abstract

The disclosure provides a model training and reasoning method, a control device, and an autonomous driving vehicle, relating to the field of artificial intelligence and in particular to autonomous driving. The model training method comprises: extracting feature information from first sensor data collected under a first sensor configuration; determining spatiotemporal information of the first sensor data; and training a task model for an autonomous driving scene with the feature information and spatiotemporal information of the first sensor data as training data, so that the task model learns a first association relation between the feature information and spatiotemporal information of sensor data and a task prediction result, and automatically adjusts the task prediction result when the spatiotemporal information changes due to a change in sensor configuration. This reduces the migration cost of the model and improves the model's compatibility with different sensor configurations.

Inventors

  • ZHENG YE

Assignees

  • Beijing Jingdong Qianshi Technology Co., Ltd. (北京京东乾石科技有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-01-29

Claims (19)

  1. A model training method compatible with sensor configurations, comprising: extracting feature information from first sensor data acquired under a first sensor configuration; determining spatiotemporal information of the first sensor data; and training a task model for an autonomous driving scene with the feature information and spatiotemporal information of the first sensor data as training data, so that the task model learns a first association relation between the feature information and spatiotemporal information of sensor data and a task prediction result, wherein the first association relation enables the task model to automatically adjust the task prediction result when the spatiotemporal information changes due to a change in sensor configuration.
  2. The method of claim 1, wherein the spatiotemporal information comprises a spatial position code, and determining the spatiotemporal information of the first sensor data comprises: projecting position information of the first sensor data in a sensor coordinate system into a unified coordinate system through a projection matrix from the sensor coordinate system to the unified coordinate system, to obtain position information of the first sensor data in the unified coordinate system; and encoding the position information of the first sensor data in the unified coordinate system to obtain the spatial position code of the first sensor data.
  3. The method of claim 2, wherein the first sensor data comprises at least one of image data of a camera and point cloud data of a radar, and projecting the position information of the first sensor data in the sensor coordinate system into the unified coordinate system comprises at least one of: for each pixel in the image data of the camera, sampling a plurality of position points along the ray direction through the optical center of the camera, and projecting the position coordinates of the plurality of position points from a camera coordinate system into the unified coordinate system; and for each voxel in the point cloud data of the radar, projecting the position coordinates of the voxel from a radar coordinate system into the unified coordinate system.
  4. The method of claim 2, wherein encoding the position information of the first sensor data in the unified coordinate system to obtain the spatial position code of the first sensor data comprises: encoding the position information of the first sensor data in the unified coordinate system using a trigonometric function to obtain a first code of the first sensor data; and encoding the first code of the first sensor data using a first depth network to obtain a second code of the first sensor data, the second code serving as the spatial position code of the first sensor data.
  5. The method of claim 4, wherein encoding the position information of the first sensor data in the unified coordinate system using a trigonometric function to obtain the first code of the first sensor data comprises: generating a frequency for each coding dimension according to a preset temperature parameter; determining the response of the position information of the first sensor data in the unified coordinate system to the frequency of each coding dimension; selecting a sine function or a cosine function to encode the response of each coding dimension according to the parity of that coding dimension; and concatenating the codes of all coding dimensions to form the first code of the first sensor data.
  6. The method of claim 5, wherein encoding the position information of the first sensor data in the unified coordinate system using a trigonometric function further comprises: before determining the response, scaling the position information of the first sensor data in the unified coordinate system by a scaling factor of the trigonometric function.
  7. The method of any of claims 1-6, wherein the spatiotemporal information comprises a temporal code, and determining the spatiotemporal information of the first sensor data comprises: calculating the relative time of each two adjacent frames according to the timestamp of each frame in the first sensor data; and encoding the relative time using a second depth network to obtain the temporal code of the first sensor data (a minimal sketch of this step follows the claims).
  8. The method of any of claims 1-6, further comprising encoding attribute data of a first sensor that acquired the first sensor data, wherein training the task model comprises: training the task model with the feature information and spatiotemporal information of the first sensor data and the code of the attribute data of the first sensor as training data, so that the task model learns a second association relation between, on one hand, the feature information and spatiotemporal information of sensor data together with the code of the sensor attribute data and, on the other hand, the task prediction result, wherein the second association relation enables the task model to automatically adjust the task prediction result when the spatiotemporal information or the sensor attribute data changes due to a change in sensor configuration.
  9. The method of any of claims 1-6, wherein the first sensor configuration comprises at least one of a mounting configuration and attribute data of a first sensor that collects the first sensor data.
  10. The method of any of claims 1-6, wherein the task model comprises a target detection model, a space occupancy detection model, or a trajectory prediction model.
  11. A model reasoning method compatible with sensor configurations, comprising: extracting feature information from second sensor data acquired under a second sensor configuration; determining spatiotemporal information of the second sensor data; and performing inference on the feature information and spatiotemporal information of the second sensor data using the first association relation learned by a task model to obtain a first task prediction result, wherein the task model is trained using the method of any one of claims 1-7 and 9-10.
  12. A model reasoning method compatible with sensor configurations, comprising: extracting feature information from second sensor data acquired under a second sensor configuration; determining spatiotemporal information of the second sensor data; encoding attribute data of a second sensor that collects the second sensor data; and performing inference on the feature information and spatiotemporal information of the second sensor data and the code of the attribute data of the second sensor using the second association relation learned by a task model to obtain a second task prediction result, wherein the task model is trained using the method of claim 8.
  13. The method of claim 11 or 12, wherein the spatiotemporal information comprises a spatial position code, and determining the spatiotemporal information of the second sensor data comprises: projecting position information of the second sensor data in a sensor coordinate system into the unified coordinate system through a projection matrix from the sensor coordinate system to the unified coordinate system, to obtain position information of the second sensor data in the unified coordinate system; and encoding the position information of the second sensor data in the unified coordinate system to obtain the spatial position code of the second sensor data.
  14. The method of claim 11 or 12, wherein the spatiotemporal information comprises a temporal code, and determining the spatiotemporal information of the second sensor data comprises: calculating the relative time of each two adjacent frames according to the timestamp of each frame in the second sensor data; and encoding the relative time using a second depth network to obtain the temporal code of the second sensor data.
  15. A control device comprising one or more modules that perform the method of any of claims 1-14.
  16. A control device comprising a memory and a processor coupled to the memory, the processor configured to perform the method of any of claims 1-14 based on instructions stored in the memory.
  17. An autonomous vehicle comprising a control device configured to perform the method of any of claims 1-14.
  18. A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any of claims 1-14.
  19. A computer program product comprising computer instructions which, when executed by a processor, implement the method of any one of claims 1-14.
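
Claims 7 and 14 describe the temporal code only abstractly. The following is a minimal Python/PyTorch sketch of that step, assuming a small MLP as a stand-in for the unspecified "second depth network" and a 64-dimensional code; all names and sizes here are illustrative assumptions, not taken from the patent.

    import torch
    import torch.nn as nn

    # Stand-in for the "second depth network" of claims 7 and 14 (assumed MLP).
    time_encoder = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 64))

    def temporal_encoding(timestamps: torch.Tensor) -> torch.Tensor:
        """Encode per-frame capture times (shape (T,), in seconds) into temporal codes.

        Per claims 7/14: compute the relative time between each two adjacent
        frames, then encode it with a depth network.
        """
        rel_t = timestamps[1:] - timestamps[:-1]        # (T-1,) relative times
        return time_encoder(rel_t.unsqueeze(-1))        # (T-1, 64) temporal codes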

Description

Model training and reasoning method, control device and automatic driving vehicle

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the field of autonomous driving, and more particularly to a model training and reasoning method compatible with sensor configurations, a control device, and an autonomous driving vehicle.

Background

An autonomous driving vehicle collects environmental information through various sensors assembled on the vehicle, such as cameras and radars, and performs intelligent computation and inference through various task models, such as target detection models, to realize autonomous driving control. A task model must be trained before it can be applied for inference. During training, the sensor configuration, such as the sensor's model and parameters, its installation position and angle, and the vehicle type on which it is assembled, must be fixed; sensor data are collected under this specific configuration as training data, and the task model is trained on them. The trained task model therefore depends heavily on the sensor configuration used at training time and generalizes poorly. Once the sensor configuration at inference time deviates from the configuration at training time, the detection performance of the task model degrades. In practice, the sensor configuration actually realized at vehicle production often deviates from the vehicle design, which affects the detection performance of the vehicle's task model. Some related techniques retrain the task model on data from the actual sensor configuration, which incurs a relatively high model migration cost. Another related technique corrects the acquired sensor data based on a virtual-sensor approach before feeding it to the task model for inference. However, this method requires that the sensor's model and parameters, the vehicle type, and the installation position be substantially identical to those used in training, tolerating only small manufacturing or installation errors; otherwise the method fails, so its compatibility with varying sensor configurations is very limited.

Disclosure of Invention

To reduce model migration cost and improve the compatibility of a model with different sensor configurations, embodiments of the present disclosure provide a model training and reasoning scheme. Some embodiments of the present disclosure provide a model training method compatible with sensor configurations, comprising: extracting feature information from first sensor data acquired under a first sensor configuration; determining spatiotemporal information of the first sensor data; and training a task model for an autonomous driving scene with the feature information and spatiotemporal information of the first sensor data as training data, so that the task model learns a first association relation between the feature information and spatiotemporal information of sensor data and a task prediction result, wherein the first association relation enables the task model to automatically adjust the task prediction result when the spatiotemporal information changes due to a change in sensor configuration.
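
To make the training step concrete, here is a minimal PyTorch sketch of a task model that consumes feature information together with a spatiotemporal code. The fusion-by-concatenation design, the layer sizes, and names such as TaskModel are assumptions for illustration, since the disclosure does not fix an architecture.

    import torch
    import torch.nn as nn

    class TaskModel(nn.Module):
        """Toy task head that conditions its predictions on a spatiotemporal code."""
        def __init__(self, feat_dim: int, code_dim: int, num_classes: int):
            super().__init__()
            self.fuse = nn.Linear(feat_dim + code_dim, 256)
            self.head = nn.Linear(256, num_classes)

        def forward(self, features: torch.Tensor, st_code: torch.Tensor) -> torch.Tensor:
            # Concatenating the spatiotemporal code lets the model associate
            # "where/when the data was captured" with the prediction, so a changed
            # sensor configuration changes the code instead of breaking the model.
            x = torch.cat([features, st_code], dim=-1)
            return self.head(torch.relu(self.fuse(x)))

    model = TaskModel(feat_dim=128, code_dim=192, num_classes=10)
    optim = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Dummy batch standing in for extracted features, spatiotemporal codes, labels.
    features = torch.randn(32, 128)
    st_code = torch.randn(32, 192)
    labels = torch.randint(0, 10, (32,))

    optim.zero_grad()
    loss = nn.functional.cross_entropy(model(features, st_code), labels)
    loss.backward()
    optim.step()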
In some embodiments, the spatiotemporal information comprises a spatial position code, and determining the spatiotemporal information of the first sensor data comprises: projecting the position information of the first sensor data in the sensor coordinate system into a unified coordinate system through a projection matrix from the sensor coordinate system to the unified coordinate system, to obtain the position information of the first sensor data in the unified coordinate system; and encoding the position information of the first sensor data in the unified coordinate system to obtain the spatial position code of the first sensor data. In some embodiments, the first sensor data comprises at least one of image data of a camera and point cloud data of a radar; projecting the position information of the first sensor data in the sensor coordinate system into the unified coordinate system comprises at least one of: for each pixel in the image data of the camera, sampling a plurality of position points along the ray direction through the optical center of the camera, and projecting the position coordinates of these position points from the camera coordinate system into the unified coordinate system; and for each voxel in the point cloud data of the radar, projecting the position coordinates of the voxel from the radar coordinate system into the unified coordinate system. In some embodiments, encoding the position information of the first sensor data in the unified coordinate system to obtain the spatial position code of the first sensor data comprises: encoding the position information of the first sensor data in the unified coordinate system using a trigonometric function to obtain a first code of the first sensor data; and encoding the first code using a first depth network to obtain a second code, which serves as the spatial position code of the first sensor data.
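
The projection and trigonometric encoding described above can be sketched in Python/NumPy as follows. The function names, the 64 coding dimensions, the temperature of 10000, and the 2*pi scaling factor are illustrative assumptions, as the disclosure leaves these unspecified; a "first depth network" would then map the first code to the final spatial position code.

    import numpy as np

    def to_unified(points_sensor: np.ndarray, T_sensor_to_unified: np.ndarray) -> np.ndarray:
        """Project (N, 3) sensor-frame points into the unified frame via a 4x4 matrix."""
        pts_h = np.hstack([points_sensor, np.ones((points_sensor.shape[0], 1))])
        return (pts_h @ T_sensor_to_unified.T)[:, :3]

    def spatial_position_encoding(xyz: np.ndarray, num_dims: int = 64,
                                  temperature: float = 10000.0,
                                  scale: float = 2 * np.pi) -> np.ndarray:
        """First code of the sensor data: sinusoidal encoding of unified-frame positions.

        xyz: (N, 3) positions in the unified coordinate system.
        Returns (N, 3 * num_dims): each coordinate axis is expanded into
        num_dims sinusoidal channels and the per-axis codes are concatenated.
        """
        xyz = xyz * scale                                   # pre-scale positions
        dim_idx = np.arange(num_dims)
        # One frequency per coding dimension, generated from the temperature.
        freqs = temperature ** (2 * (dim_idx // 2) / num_dims)
        response = xyz[..., None] / freqs                   # (N, 3, num_dims)
        # Sine for even coding dimensions, cosine for odd ones.
        code = np.where(dim_idx % 2 == 0, np.sin(response), np.cos(response))
        return code.reshape(xyz.shape[0], -1)               # concatenate per-axis codes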