CN-122024019-A - Method and device for encoding an object to be input into an autopilot model


Abstract

The present disclosure relates to the field of autopilot, and in particular to a method and apparatus for encoding perception information about a target input into an autopilot model, as well as a computer-readable storage medium and a computer program product containing computer programs/instructions for implementing the method. In the method, the number of data frames to be processed is set dynamically according to the first data frame, and the data frames are processed in order from nearest to farthest in time, which improves coding efficiency and target expressive power under the constraint that the total number of tokens is fixed.

Inventors

REN SHAOQING
CHENG JIN
CHENG ZHENGXIN
LIU GUOYI
SHE XIAOLI
FU XIAOXIN
DENG HAOPING
WANG ZHUO

Assignees

安徽蔚来智驾科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260415

Claims (15)

1. A method for encoding perception information about a target input into an autopilot model, wherein the target comprises a dynamic target and a static target, and the input of the autopilot model comprises a fixed number of tokens, the method comprising: determining the number of a plurality of data frames based on a set number of tokens for encoding dynamic targets and the number of dynamic targets to be encoded in a first data frame, wherein the plurality of data frames includes the first data frame and a second data frame preceding the first data frame; and sequentially performing encoding processing on the dynamic targets of the plurality of data frames in order from nearest to farthest in time.
2. The method of claim 1, wherein the method further comprises: determining the set number based on the fixed number and the number of static targets in the first data frame.
3. The method of claim 1, wherein the dynamic targets comprise moving targets and stationary targets, and the number of the plurality of data frames is determined in the following manner: if condition A is satisfied: [formula missing in source], the number of the plurality of data frames is determined to be a maximum value; if condition B is satisfied: [formula missing in source], the number of the plurality of data frames is determined to be [formula missing in source]; and if neither condition A nor condition B is satisfied, the number of the plurality of data frames is determined to be a minimum value, wherein static_count is the number of stationary targets to be encoded in the first data frame, dynamic_count is the number of moving targets to be encoded in the first data frame, n_state is the set number, h is the number of the plurality of data frames, and max_h and min_h are the maximum value and the minimum value of the number of the plurality of data frames, respectively.
4. The method of claim 3, wherein sequentially performing the encoding processing on the dynamic targets of the plurality of data frames comprises processing the first data frame in the following manner: if condition A or condition B is satisfied, all dynamic targets to be encoded in the first data frame are encoded; otherwise, the dynamic targets to be encoded that are closer to the own vehicle are encoded preferentially.
5. The method of claim 4, wherein sequentially performing the encoding processing on the dynamic targets of the plurality of data frames comprises processing the second data frame in the following manner: if a dynamic target to be encoded in the second data frame currently being processed has already appeared in a previously processed data frame and is a moving target, the dynamic target is encoded; if a dynamic target to be encoded in the second data frame currently being processed has already appeared in a previously processed data frame and is a stationary target, the dynamic target is not encoded; and if dynamic targets to be encoded in the second data frame currently being processed have not appeared in a previously processed data frame, those closer to the own vehicle are encoded preferentially.
6. The method of any of claims 1-5, wherein the dynamic target to be encoded is determined based on a direction of motion or a position of the dynamic target relative to the own vehicle.
7. An apparatus for encoding perception information about a target input into an autopilot model, wherein the target comprises a dynamic target and a static target, and the input of the autopilot model comprises a fixed number of tokens, the apparatus comprising: at least one processor; at least one memory; and a computer program/instructions stored on the memory which, when run on the processor, cause the following operations: determining the number of a plurality of data frames based on a set number of tokens allocatable to dynamic target encoding and the number of dynamic targets to be encoded in a first data frame, wherein the plurality of data frames includes the first data frame and a second data frame preceding the first data frame; and sequentially performing encoding processing on the dynamic targets of the plurality of data frames in order from nearest to farthest in time.
8. The apparatus of claim 7, wherein running the computer program/instructions on the processor further causes the following operation: determining the set number based on the fixed number and the number of static targets in the first data frame.
9. The apparatus of claim 7, wherein the dynamic targets comprise moving targets and stationary targets, and the number of the plurality of data frames is determined in the following manner: if condition A is satisfied: [formula missing in source], the number of the plurality of data frames is determined to be a maximum value; if condition B is satisfied: [formula missing in source], the number of the plurality of data frames is determined to be [formula missing in source]; and if neither condition A nor condition B is satisfied, the number of the plurality of data frames is determined to be a minimum value, wherein static_count is the number of stationary targets to be encoded in the first data frame, dynamic_count is the number of moving targets to be encoded in the first data frame, n_state is the set number, h is the number of the plurality of data frames, and max_h and min_h are the maximum value and the minimum value of the number of the plurality of data frames, respectively.
10. The apparatus of claim 9, wherein sequentially performing the encoding processing on the dynamic targets of the plurality of data frames comprises processing the first data frame in the following manner: if condition A or condition B is satisfied, all dynamic targets to be encoded in the first data frame are encoded; otherwise, the dynamic targets to be encoded that are closer to the own vehicle are encoded preferentially.
11. The apparatus of claim 10, wherein sequentially performing the encoding processing on the dynamic targets of the plurality of data frames comprises processing the second data frame in the following manner: if a dynamic target to be encoded in the second data frame currently being processed has already appeared in a previously processed data frame and is a moving target, the dynamic target is encoded; if a dynamic target to be encoded in the second data frame currently being processed has already appeared in a previously processed data frame and is a stationary target, the dynamic target is not encoded; and if dynamic targets to be encoded in the second data frame currently being processed have not appeared in a previously processed data frame, those closer to the own vehicle are encoded preferentially.
12. The apparatus of any of claims 7-11, wherein the dynamic target to be encoded is determined based on a direction of motion or a position of the dynamic target relative to the own vehicle.
13. A non-transitory computer readable storage medium having stored thereon a computer program/instructions which, when executed by a processor, implement the steps of the method of any of claims 1-6.
14. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method according to any of claims 1-6.
15. A vehicle comprising an apparatus as claimed in any one of claims 7 to 12.
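
The per-frame selection rules of claims 4 and 5 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the `TrackedObject` type, the `track_id` field used to recognise a target across frames, the one-token-per-encoded-target cost, and the choice to place already-known moving targets ahead of newly appearing ones in an earlier frame are all hypothetical and do not come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedObject:
    track_id: int      # hypothetical tracker ID linking a target across frames
    distance: float    # distance to the own vehicle (units assumed, e.g. metres)
    moving: bool       # True for a moving target, False for a stationary one


def select_for_encoding(frames, n_state, encode_all_first):
    """Return (frame_index, track_id) pairs chosen for encoding.

    frames: list of lists of TrackedObject, newest frame first.
    n_state: token budget for dynamic targets (one token per pair, assumed).
    encode_all_first: True when condition A or B holds for the newest frame.
    """
    selected = []
    seen = set()  # track IDs present in the frames processed so far
    for index, frame in enumerate(frames):
        if index == 0:
            # Newest frame: keep everything, or rank by proximity if over budget.
            if encode_all_first:
                candidates = list(frame)
            else:
                candidates = sorted(frame, key=lambda o: o.distance)
        else:
            # Earlier frame: re-encode known moving targets, skip known
            # stationary ones, and rank newly appearing targets by proximity.
            known_moving = [o for o in frame if o.track_id in seen and o.moving]
            new = sorted((o for o in frame if o.track_id not in seen),
                         key=lambda o: o.distance)
            candidates = known_moving + new  # this ordering is an assumption
        for obj in candidates:
            if len(selected) >= n_state:
                return selected  # token budget exhausted
            selected.append((index, obj.track_id))
        seen.update(o.track_id for o in frame)
    return selected


frames = [
    [TrackedObject(1, 5.0, True), TrackedObject(2, 12.0, False),
     TrackedObject(3, 30.0, True)],                       # newest frame
    [TrackedObject(1, 6.0, True), TrackedObject(2, 12.0, False),
     TrackedObject(4, 8.0, True)],                        # earlier frame
]
print(select_for_encoding(frames, n_state=10, encode_all_first=True))
# → [(0, 1), (0, 2), (0, 3), (1, 1), (1, 4)]
```

In the example, the stationary target 2 is encoded once in the newest frame and skipped in the earlier frame, while the moving target 1 is encoded in both, which is the token saving the claims describe.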

Description

Method and device for encoding an object to be input into an autopilot model

Technical Field

The present disclosure relates to the field of autopilot, and in particular to a method and apparatus for encoding perception information about a target input into an autopilot model, a computer-readable storage medium and a computer program product containing a computer program/instructions for implementing the method, and a vehicle containing the apparatus.

Background

Currently, in an end-to-end autopilot model, a perception module generates perception information about various targets (including static targets and dynamic targets) based on sensed data (e.g., image data and point cloud data). The perception information describes the position and size of the targets in physical space and is encoded into a fixed number of tokens before it is input into the model. Dynamic targets generally refer to targets whose position or size may vary, including, for example, pedestrians, non-motor vehicles, and motor vehicles. In many cases, the number of dynamic targets contained in the sensed data acquired at different times varies, which increases the complexity of target encoding.

Disclosure of Invention

One embodiment of the present disclosure relates to a method for encoding perception information about a target input into an autopilot model, wherein the target comprises a dynamic target and a static target, and the input of the autopilot model has a fixed number of tokens, the method comprising: determining the number of a plurality of data frames based on a set number of tokens for encoding dynamic targets and the number of dynamic targets to be encoded in a first data frame, wherein the plurality of data frames includes the first data frame and a second data frame preceding the first data frame; and sequentially performing encoding processing on the dynamic targets of the plurality of data frames in order from nearest to farthest in time.
Optionally, in the above method, the types of dynamic target include motor vehicles, non-motor vehicles, and pedestrians, and the types of static target include lane lines, road edges, guardrails, roadblocks, and static obstacles.
Optionally, the method further comprises: determining the set number based on the fixed number and the number of static targets in the first data frame.
Optionally, in the above method, the first data frame and the second data frame are video frames acquired by an image sensor, or are generated by fusing sensing data from a plurality of sensors, the plurality of sensors being any combination of image sensors, lidars, millimeter-wave radars, and ultrasonic radars.
Optionally, in the above method, the dynamic targets include moving targets and stationary targets, and the number of the plurality of data frames is determined in the following manner: if condition A is satisfied: [formula missing in source], the number of the plurality of data frames is determined to be a maximum value; if condition B is satisfied: [formula missing in source], the number of the plurality of data frames is determined to be [formula missing in source]; and if neither condition A nor condition B is satisfied, the number of the plurality of data frames is determined to be a minimum value, wherein static_count is the number of stationary targets to be encoded in the first data frame, dynamic_count is the number of moving targets to be encoded in the first data frame, n_state is the set number, h is the number of the plurality of data frames, and max_h and min_h are the maximum value and the minimum value of the number of the plurality of data frames, respectively.
Optionally, in the above method, sequentially performing the encoding processing on the dynamic targets of the plurality of data frames includes processing the first data frame in the following manner: if condition A or condition B is satisfied, all dynamic targets to be encoded in the first data frame are encoded; otherwise, the dynamic targets to be encoded that are closer to the own vehicle are encoded preferentially.
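
The budget and frame-count steps described above can be sketched in Python as follows. The formulas for conditions A and B are not reproduced in this text, so the inequalities below are assumptions chosen only to be consistent with the surrounding description: each stationary target is assumed to cost one token in total (it is not re-encoded in earlier frames), each moving target is assumed to cost one token per frame, and each static target is assumed to consume one token of the fixed input. The function names `dynamic_token_budget` and `choose_frame_count` are hypothetical.

```python
def dynamic_token_budget(fixed_tokens: int, n_static_targets: int) -> int:
    """Set number n_state: tokens left for dynamic targets.

    Assumes one token per static target (lane line, road edge, ...).
    """
    return max(fixed_tokens - n_static_targets, 0)


def choose_frame_count(static_count: int, dynamic_count: int, n_state: int,
                       min_h: int, max_h: int) -> int:
    """Pick the number of frames h under the ASSUMED cost model
    tokens(h) = static_count + dynamic_count * h.
    """
    # Assumed condition A: even max_h frames fit within the budget.
    if static_count + dynamic_count * max_h <= n_state:
        return max_h
    # Assumed condition B: at least min_h frames fit; take the largest h that fits.
    if static_count + dynamic_count * min_h <= n_state:
        # dynamic_count > 0 here, otherwise condition A would already hold.
        h = (n_state - static_count) // dynamic_count
        return max(min_h, min(h, max_h))
    # Neither condition holds: fall back to the minimum frame count.
    return min_h


n_state = dynamic_token_budget(fixed_tokens=64, n_static_targets=24)
print(n_state)                                    # → 40
print(choose_frame_count(4, 3, n_state, 2, 8))    # → 8  (assumed condition A)
print(choose_frame_count(4, 6, n_state, 2, 8))    # → 6  (assumed condition B)
print(choose_frame_count(10, 20, n_state, 2, 8))  # → 2  (neither condition)
```

Under this assumed model, a sparse scene gets the longest history, a moderately busy scene trades history length for coverage of every target in the first data frame, and a crowded scene keeps the minimum history and relies on proximity ranking.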
Optionally, in the above method, sequentially performing the encoding processing on the dynamic targets of the plurality of data frames includes processing the second data frame in the following manner: if a dynamic target to be encoded in the second data frame currently being processed has already appeared in a previously processed data frame and is a moving target, the dynamic target is encoded; if a dynamic target to be encoded in the second data frame currently being processed has already appeared in a previously processed data frame and is a stationary target, the dynamic target is not encoded; and if dynamic targets to be encoded in the second data frame currently being processed have not appeared in a previously processed data frame, those closer to the own vehicle are encoded preferentially.
Optionally, in the above method, the dynamic target to be encoded is determined based on a direction of motion or a position of the dynamic target relative to the own vehicle.
Another embodiment of the present disclosure relates to an apparatus for encoding perception information a