CN-121989294-A - Adaptation method for visual navigation test and related equipment
Abstract
The embodiments of this application provide a test-time adaptation method for visual navigation and related equipment, belonging to the fields of robot navigation and machine learning. At each decision step, before selecting an action from the current environmental observation, the robot actively simulates future states for the candidate actions: it predicts the future observation likely to be obtained after executing each action and evaluates the decision uncertainty of the navigation policy in that future state. By fusing each action's current probability with its predicted future entropy, the method selects an action that guides the robot into a low-uncertainty future state. At the same time, a self-supervised loss function is constructed from the computed future entropy, and the navigation model parameters are updated online during the test phase, so that the model learns to make more far-sighted decisions. The method overcomes the limitation of traditional approaches that optimize only current certainty, effectively prevents the robot from becoming trapped in locally optimal paths, and markedly improves the robustness and task success rate of long-range navigation in unknown, dynamic environments.
Inventors
- TAN MINGKUI
- WANG ZIXU
- ZHANG HUAQUAN
- XIONG ZHENDONG
- HU JINWU
- WEN ZHIQUAN
- DU QING
Assignees
- Super Robot Research Institute (Huangpu)
- South China University of Technology
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-31
Claims (10)
- 1. A visual navigation test-time adaptation method, the method comprising the steps of: acquiring a current environmental observation of the environment by a mobile robot at the current time step; generating at least one candidate action based on the current environmental observation according to a navigation strategy model; for each candidate action, processing the current environmental observation according to the candidate action and predicting the future environmental observation the mobile robot will obtain after executing the candidate action; inputting each predicted future environmental observation into the navigation strategy model to obtain a corresponding future action probability distribution, and calculating a future decision uncertainty measure for each future action probability distribution; determining a target action to be performed from the candidate actions based at least on the future decision uncertainty measure; controlling the mobile robot to execute the target action; and constructing a self-supervision loss function based on the future decision uncertainty measure and updating the parameters of the navigation strategy model according to the self-supervision loss function.
- 2. The method of claim 1, wherein processing the current environmental observation according to the candidate action to predict the future environmental observation the mobile robot will obtain after executing the candidate action comprises at least one of: driving an image acquisition device on the robot to perform physical movement so as to acquire an actual future environmental observation; generating a simulated future environmental observation by rendering from the current environmental observation with a neural rendering model; generating a simulated future environmental observation by transforming the current environmental observation with a geometric transformation and image restoration model; or generating a representation of the future environment from the current environmental observation and the candidate action with a pre-trained world model.
- 3. The method of claim 1, wherein the future decision uncertainty measure is information entropy.
- 4. The method of claim 1, wherein determining the target action to be performed from the candidate actions based at least on the future decision uncertainty measure comprises: for each candidate action a, acquiring the current execution probability p(a) given by the navigation strategy model based on the current environmental observation; calculating a composite score S(a) for each candidate action a as S(a) = −log p(a) + λ·H(a), wherein H(a) denotes the future decision uncertainty measure computed on the predicted future environmental observation corresponding to candidate action a, and λ is a balance coefficient; and selecting the candidate action with the minimum composite score as the target action.
- 5. The method of claim 1, wherein the self-supervision loss function L includes at least a future entropy loss term L_FE, calculated as L_FE = Σ_a H(a), wherein the summation traverses all candidate actions a.
- 6. The method of claim 5, wherein the self-supervision loss function L further includes at least one of the following regularization terms: a parameter regularization term L_reg, for constraining the difference between the updated model parameters θ and the initial model parameters θ₀ before the test phase begins; and a knowledge distillation regularization term L_KD, for constraining the difference in prediction distribution between the updated navigation strategy model and a teacher model with fixed parameters.
- 7. The method of claim 6, wherein the parameter regularization term L_reg is calculated as L_reg = ‖θ − θ₀‖², and the knowledge distillation regularization term L_KD is calculated as L_KD = KL(p_teacher ‖ p_student), wherein p_student denotes the output distribution of the current policy and p_teacher denotes the output distribution of the teacher policy.
- 8. The method of claim 1, wherein, when updating the parameters of the navigation strategy model, only the parameters of the normalization layers in the model are updated.
- 9. An electronic device comprising a memory storing a computer program and a processor, wherein the processor implements the method of any one of claims 1 to 8 when executing the computer program.
- 10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method of any one of claims 1 to 8.
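As an illustrative sketch of the decision rule in claims 1 and 4, the action selection can be written as follows. The concrete score form −log p(a) + λ·H(a), the toy linear policy, and the one-step world model are assumptions for demonstration, not details taken from the patent text:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a discrete probability distribution.
    return float(-(p * np.log(p + 1e-12)).sum())

def select_action(policy, world_model, obs, lam=1.0):
    """Fuse each candidate action's current log-probability with the
    policy's entropy at the predicted future observation, and return
    the action with the lowest composite score."""
    probs = softmax(policy(obs))
    scores = {}
    for a in range(len(probs)):
        future_obs = world_model(obs, a)  # predicted observation after action a
        h_future = entropy(softmax(policy(future_obs)))
        scores[a] = -np.log(probs[a] + 1e-12) + lam * h_future
    best = min(scores, key=scores.get)
    return best, scores

# Toy usage: a fixed linear policy over 4 discrete actions and a
# stand-in world model that shifts the observation per action.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
policy = lambda o: W @ o
world_model = lambda o, a: np.roll(o, a) * 0.9
obs = rng.normal(size=8)
best, scores = select_action(policy, world_model, obs, lam=0.5)
print("selected action:", best)
```

A larger λ weights the predicted future uncertainty more heavily relative to the policy's current preference, which is the "balance coefficient" role described in claim 4.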
Description
Adaptation Method for Visual Navigation Testing and Related Equipment

Technical Field

The application relates to the technical fields of robotics, deep learning, computer vision, and autonomous robot navigation, and in particular to a test-time adaptation method for visual navigation and related equipment.

Background

With the development of deep learning, vision-based navigation has become a key capability of mobile robots. Typically, a robot relies on a deep learning model (i.e., a policy model) pre-trained on large amounts of simulated or real data to output motion instructions (e.g., move forward, turn) in real time from current sensor observations (e.g., RGB or RGB-D images). However, when deployed in open real-world scenes, robots often encounter dynamic changes not seen during training, such as changes in lighting conditions, moved furniture, or newly appearing objects. This distribution shift between the training environment and the testing environment can significantly degrade the performance of the pre-trained model.

To address this problem, Test-Time Adaptation (TTA) techniques have been proposed. TTA allows the model to use only the test data stream during the test phase (i.e., the actual deployment phase) and to adjust its own parameters online to accommodate the new environment without ground-truth labels. Among these, Test-Time Entropy Minimization is a mainstream and effective TTA method. Its core idea is to make the model's predictions in the new environment more certain and credible by minimizing the information entropy of the model's prediction on the current input, thereby achieving self-supervised online adaptation. However, in sequential decision tasks such as visual navigation, such methods suffer from a fundamental drawback: decision myopia.
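The entropy-minimization baseline discussed in the Background can be sketched as follows; this is a minimal illustration with a toy linear policy and a hand-derived gradient, not the patent's implementation. For p = softmax(z), the entropy gradient with respect to the logits is ∂H/∂z_j = −p_j (log p_j + H):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_min_step(W, obs, lr=0.2):
    """One test-time entropy-minimization step on a linear policy
    z = W @ obs: descend the entropy of softmax(z) with respect to W."""
    z = W @ obs
    p = softmax(z)
    h = float(-(p * np.log(p + 1e-12)).sum())
    # dH/dz_j = -p_j * (log p_j + H); chain rule: dH/dW = dH/dz outer obs.
    dh_dz = -p * (np.log(p + 1e-12) + h)
    W_new = W - lr * np.outer(dh_dz, obs)
    return W_new, h

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8)) * 0.1   # near-uniform initial predictions
obs = rng.normal(size=8)
entropies = []
for _ in range(50):
    W, h = entropy_min_step(W, obs)
    entropies.append(h)
print("entropy decreased:", entropies[-1] < entropies[0])
```

Note that the update sharpens the prediction for the current observation only, which is exactly the myopia discussed next: nothing in this objective accounts for the states the chosen action leads to.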
Entropy minimization of this kind optimizes only the certainty of the decision at the current moment and entirely ignores the effect of the currently selected action on future states. As a result, the robot may select a path that appears clear in the short term but leads it, in the long term, into a dead end or a highly cluttered area (e.g., entering a narrow corridor whose entrance offers a wide field of view but which becomes impassable after a few steps). This myopic decision process can severely compromise the robot's long-range navigation robustness and overall task success rate in complex unknown environments.

In the prior art, some efforts have been made to improve the performance of TTA in navigation tasks, for example by balancing adaptation speed and stability through a fast-slow update mechanism, or by introducing an active learning mechanism to obtain human feedback. These approaches do not address the core problem of decision myopia, because they do not explicitly account for the long-term consequences of actions during decision making. There is therefore an urgent need for a visual navigation method that can adapt online during testing and that possesses "far-sighted" decision-making capability, so as to overcome the deficiencies of the prior art.

Disclosure of Invention

The embodiments of this application mainly aim to provide a test-time adaptation method for visual navigation based on future-state action entropy minimization, together with an electronic device, a storage medium, and a program product. Before making a decision, the method simulates and evaluates the long-term effects of different actions and uses this information to guide both the current action selection and the model parameter update, thereby markedly improving the robot's navigation success rate and robustness in unknown and dynamic environments.
To achieve the above object, one aspect of the embodiments of this application provides a visual navigation test-time adaptation method, which includes: acquiring a current environmental observation of the environment by the mobile robot at the current time step; generating at least one candidate action based on the current environmental observation according to a navigation strategy model; for each candidate action, processing the current environmental observation according to the candidate action and predicting the future environmental observation the mobile robot will obtain after executing the candidate action; inputting each predicted future environmental observation into the navigation strategy model to obtain a corresponding future action probability distribution, and calculating a future decision uncertainty measure for each future action probability distribution; determining a target action to be performed from the candidate actions based at least on the future decision uncertainty measure; controlling the mobile robot to execute the target action; and constructing a self-supervision loss function based on the future decision uncertainty measure and updating the parameters of the navigation strategy model according to the self-supervision loss function.
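The self-supervision loss described above (and in claims 5 to 7) can be sketched as follows. The concrete forms L = L_FE + α‖θ − θ₀‖² + β·KL(p_teacher ‖ p_student), with the frozen pre-test parameters serving as the teacher, are assumed instantiations for illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def kl(p, q):
    # KL divergence KL(p || q) between discrete distributions.
    return float((p * np.log((p + 1e-12) / (q + 1e-12))).sum())

def self_supervised_loss(theta, theta0, obs, future_obs_list,
                         alpha=1.0, beta=1.0):
    """Test-time loss: future-entropy term summed over all candidate
    actions' predicted observations, plus a parameter-regularization
    term and a knowledge-distillation term against the frozen teacher."""
    # Future entropy loss (claim 5): sum over all candidate actions.
    l_fe = sum(entropy(softmax(theta @ o)) for o in future_obs_list)
    # Parameter regularization (claim 7): distance to initial parameters.
    l_reg = float(((theta - theta0) ** 2).sum())
    # Knowledge distillation (claim 7): teacher uses the initial parameters.
    p_teacher = softmax(theta0 @ obs)
    p_student = softmax(theta @ obs)
    l_kd = kl(p_teacher, p_student)
    return l_fe + alpha * l_reg + beta * l_kd

rng = np.random.default_rng(2)
theta0 = rng.normal(size=(4, 8))
obs = rng.normal(size=8)
future = [rng.normal(size=8) for _ in range(4)]
# With unchanged parameters, only the future-entropy term is nonzero.
print(self_supervised_loss(theta0, theta0, obs, future))
```

Claim 8 further restricts the online update to normalization-layer parameters only; the toy linear policy here has no such layers, so the whole θ stands in for the adapted parameter subset.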