CN-121190529-B - Optical flow estimation method and system based on hybrid expert network


Abstract

The application discloses an optical flow estimation method and system based on a hybrid expert (mixture-of-experts, MoE) network, belonging to the field of computer vision and artificial intelligence. The method comprises: preprocessing two input frames and extracting low-resolution features through a hybrid expert feature extractor (MoEE) whose sparse activation mechanism reduces redundant computation; performing a dot product operation on the low-resolution feature maps to construct a 4D correlation volume that captures the motion matching relationship between pixels; using a hybrid expert updater (MoEU) to iteratively update the hidden state and regress the residual flow increment through a dynamic expert selection mechanism, thereby refining the optical flow; and finally reconstructing a high-resolution optical flow field through a multi-scale upsampling module to output a high-precision optical flow result. The method achieves dynamic resource allocation through the MoE architecture, significantly reduces computational cost while maintaining optical flow estimation accuracy, and is suitable for resource-constrained scenarios such as autonomous driving and unmanned aerial vehicles, offering efficient inference and flexible deployment.

Inventors

ZHOU ZHIHU
Zhong Sisi
YE WEIPING
WANG DANJUN
SUN YUAN
ZHAO YONGBIAO
MAO KEJI

Assignees

杭州飞引科技有限公司

Dates

Publication Date: 20260512
Application Date: 20251124

Claims (5)

1. A hybrid expert network-based optical flow estimation method, the method comprising:
preprocessing two input frames and extracting low-resolution features through a hybrid expert feature extractor (MoEE) to obtain low-resolution feature maps, wherein the preprocessing comprises normalizing the two input frames to the range [0, 1] and adjusting the image size to a multiple of 8, the hybrid expert feature extractor (MoEE) comprises a depthwise separable convolution layer, a layer normalization layer, and a hybrid expert (MoE) layer, the hybrid expert (MoE) layer is provided with shared and routed experts, computing resources are dynamically allocated through a Top-K sparse activation strategy, and low-resolution feature maps at a preset resolution are obtained by the extraction;
performing a dot product operation on the low-resolution feature maps of the two frames to construct a 4D correlation volume that captures the motion matching relationship between pixels, wherein the low-resolution feature maps of the two frames are denoted $f_1$ and $f_2$ respectively, and the 4D correlation volume is calculated by the formula $\mathrm{corr}(f_1, f_2)(i, j, k, l) = \sum_{d=1}^{D} f_1(i, j, d) \cdot f_2(k, l, d)$, wherein $(i, j)$ is a pixel coordinate of feature map $f_1$, $(k, l)$ is a pixel coordinate of feature map $f_2$, $D$ is the feature channel dimension, and $\mathrm{corr}(f_1, f_2)(i, j, k, l)$ is the pixel matching value at the corresponding position in the 4D correlation volume;
in the training and inference stages, updating the hidden state with a hybrid expert updater (MoEU) through a dynamic expert selection mechanism and regressing a residual flow increment, wherein a hidden state $h_0$ and an initial optical flow field $\mu_0$ are initialized and an optimization step is executed iteratively a preset number of times, the optimization step comprising: looking up matching features in the 4D correlation volume according to the current optical flow field; computing motion features through a motion encoder; inputting the motion features, the current hidden state, and the context features of the first frame into the hybrid expert updater (MoEU); adaptively selecting experts through a dynamic routing strategy to update the hidden state; processing the updated hidden state through an optical flow head network to regress the residual flow increment; and updating the optical flow field based on the residual flow increment; wherein the dynamic routing strategy adopts a gating mechanism governed by the formula $G(x) = \mathrm{Softmax}(\mathrm{TopK}(W_g x, k))$, wherein $W_g$ is the routing weight matrix, $x$ is the feature input to the hybrid expert updater (MoEU), $k$ is the number of activated experts, the TopK function selects the $k$ experts with the largest weights, and the Softmax function normalizes the weights of the selected experts; and
upsampling the optimized low-resolution optical flow field through a multi-scale upsampling module, reconstructing a high-resolution optical flow field, and outputting a final high-precision optical flow estimation result.
2. The method as recited in claim 1, wherein: the hybrid expert updater (MoEU) adopts a ConvNeXt backbone network combined with a dynamic routing strategy, and adaptively selects the most relevant experts for optical flow optimization according to the motion features and the context information of the input features along the temporal dimension; and the multi-scale upsampling module combines bilinear interpolation with a convolution layer, performing preliminary upsampling of the low-resolution optical flow field by bilinear interpolation, refining the features of the preliminarily upsampled optical flow field through the convolution layer, and progressively restoring the high-resolution optical flow field.
3. A hybrid expert network-based optical flow estimation system for implementing the hybrid expert network-based optical flow estimation method of claim 1 or 2, comprising:
a preprocessing and feature extraction module, configured to preprocess two input frames, extract low-resolution features through the hybrid expert feature extractor (MoEE), and output low-resolution feature maps, and further configured to normalize the two input frames to the range [0, 1] and adjust the image size to a multiple of 8, wherein the hybrid expert feature extractor (MoEE) comprises a depthwise separable convolution layer, a layer normalization layer, and a hybrid expert (MoE) layer, the hybrid expert (MoE) layer is provided with shared and routed experts, and computing resources are dynamically allocated through a Top-K sparse activation strategy to extract low-resolution feature maps at a preset resolution;
a correlation volume construction module, configured to perform a dot product operation on the low-resolution feature maps of the two frames to construct a 4D correlation volume, and further configured to denote the low-resolution feature maps of the two frames as $f_1$ and $f_2$ respectively and to calculate the 4D correlation volume by the formula $\mathrm{corr}(f_1, f_2)(i, j, k, l) = \sum_{d=1}^{D} f_1(i, j, d) \cdot f_2(k, l, d)$, wherein $(i, j)$ is a pixel coordinate of feature map $f_1$, $(k, l)$ is a pixel coordinate of feature map $f_2$, $D$ is the feature channel dimension, and $\mathrm{corr}(f_1, f_2)(i, j, k, l)$ is the pixel matching value at the corresponding position in the 4D correlation volume;
an optical flow optimization module, configured to iteratively optimize the optical flow with a hybrid expert updater (MoEU) in the training and inference stages, updating the hidden state through a dynamic expert selection mechanism and regressing a residual flow increment, and further configured to initialize a hidden state $h_0$ and an initial optical flow field $\mu_0$ and to execute an optimization step iteratively a preset number of times, the optimization step comprising: looking up matching features in the 4D correlation volume according to the current optical flow field; computing motion features through a motion encoder; inputting the motion features, the current hidden state, and the context features of the first frame into the hybrid expert updater (MoEU); adaptively selecting experts through a dynamic routing strategy to update the hidden state; processing the updated hidden state through an optical flow head network to regress the residual flow increment; and updating the optical flow field based on the residual flow increment; wherein the dynamic routing strategy adopts a gating mechanism governed by the formula $G(x) = \mathrm{Softmax}(\mathrm{TopK}(W_g x, k))$, wherein $W_g$ is the routing weight matrix, $x$ is the feature input to the hybrid expert updater (MoEU), $k$ is the number of activated experts, the TopK function selects the $k$ experts with the largest weights, and the Softmax function normalizes the weights of the selected experts; and
a high-resolution reconstruction module, configured to upsample the optimized low-resolution optical flow field through the multi-scale upsampling module, reconstruct the high-resolution optical flow field, and output a final high-precision optical flow estimation result.
4. The system of claim 3, further comprising a hardware adaptation module configured to adjust the number of experts in the hybrid expert network and the number of activated experts in the Top-K sparse activation strategy according to the computing power of different hardware devices, so that the method can be deployed on hardware of differing performance.
5. A computer readable storage medium, characterized in that it has stored therein a computer program that is loaded and executed by a processor to implement the hybrid expert network based optical flow estimation method according to claim 1 or 2.
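
For illustration only, the two computations recited in claims 1 and 3 (the dot-product 4D correlation volume and the Top-K gating G(x) = Softmax(TopK(W_g x, k))) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `correlation_volume` and `topk_gate`, the nested-list layout [H][W][D] for feature maps, and the choice to give non-selected experts a gate weight of zero are all hypothetical and do not come from the patent text. The correlation is taken as an unnormalized sum over the D feature channels, which is one reading of "dot product operation".

```python
import math


def correlation_volume(f1, f2):
    """Build corr[i][j][k][l] as the dot product of f1[i][j] and f2[k][l].

    f1, f2: feature maps stored as nested lists with layout [H][W][D]
    (an assumed layout, chosen only for this sketch).
    """
    h1, w1 = len(f1), len(f1[0])
    h2, w2 = len(f2), len(f2[0])
    return [[[[sum(a * b for a, b in zip(f1[i][j], f2[k][l]))
               for l in range(w2)]
              for k in range(h2)]
             for j in range(w1)]
            for i in range(h1)]


def topk_gate(w_g, x, k):
    """Return one gate weight per expert for G(x) = Softmax(TopK(W_g x, k)).

    w_g: routing weight matrix as [num_experts][dim]; x: input feature [dim].
    Experts outside the top k get weight 0.0 (an assumption of this sketch).
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in w_g]
    # Indices of the k experts with the largest routing logits.
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    # Softmax over the selected logits only, shifted by the max for stability.
    m = max(logits[e] for e in top)
    exps = {e: math.exp(logits[e] - m) for e in top}
    z = sum(exps.values())
    return [exps[e] / z if e in exps else 0.0 for e in range(len(logits))]


# Tiny usage example: 1x2 feature maps with D = 2 channels.
f1 = [[[1.0, 0.0], [0.0, 1.0]]]
f2 = [[[1.0, 0.0], [0.5, 0.5]]]
corr = correlation_volume(f1, f2)
print(corr[0][0])  # [[1.0, 0.5]]
print(corr[0][1])  # [[0.0, 0.5]]

# Three experts, two activated: routing logits are [2.0, 1.0, 3.0].
gates = topk_gate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 1.0], k=2)
print(gates)  # approximately [0.269, 0.0, 0.731]
```

In this sketch the gate vector sums to 1 over the selected experts, so an expert layer could weight each activated expert's output by its gate value and skip the others entirely, which is where the computational saving of sparse activation would come from.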

Description

Optical flow estimation method and system based on hybrid expert network

Technical Field

The embodiments of the application relate to the field of computer vision and artificial intelligence, and in particular to a method and a system for estimating optical flow based on a hybrid expert network.

Background

The invention relates to the field of computer vision and artificial intelligence, and in particular addresses the need for efficient, high-precision modeling in optical flow estimation tasks for resource-constrained scenarios such as autonomous driving, robot vision, and unmanned aerial vehicle perception. Optical flow estimation is a fundamental task of computer vision. It aims to calculate pixel-level motion vectors (the optical flow field) between adjacent video frames and provides core support for downstream motion analysis, target tracking, and scene understanding. In recent years, deep learning-based optical flow estimation methods (such as RAFT and its improved variants) have improved estimation accuracy by constructing a 4D correlation volume and using a recurrent update mechanism. However, they require large amounts of computing resources and memory, have many parameters and dense floating-point operations, and are difficult to deploy on edge devices with limited computing power (such as in-vehicle terminals and unmanned aerial vehicle main controllers).
To balance accuracy and efficiency, the prior art has tried two optimization paths. The first designs lightweight networks (such as FastFlowNet and LiteFlowNet) that accelerate inference by reducing parameters and operations, but this is often accompanied by a significant drop in accuracy. The second adopts Transformer variants or diffusion models (such as MambaFlow and FlowDiffuser) to improve modeling capability, but still does not solve the problem of unreasonable global allocation of computing resources: most methods adopt a "static unified processing" mode that allocates the same computing resources to all image regions, which results in considerable redundant computation and poor scalability, and cannot adapt to the differing computing power of different hardware. The hybrid expert network (MoE) realizes dynamic computation through a "sparse expert activation" mechanism, so that computational cost can be reduced while accuracy is maintained (as in the efficient resource allocation capability demonstrated by DeepSeekMoE). However, the prior art has not effectively integrated such dynamic computation into the whole optical flow estimation pipeline. Because MoE module designs tailored to optical flow tasks are lacking (such as an expert cooperation mechanism for feature extraction and iterative optimization), no end-to-end MoE architecture covering "feature extraction, correlation matching, optical flow optimization, and upsampling" has been formed, and optical flow estimation in resource-constrained scenarios still faces the twin bottlenecks of "insufficient accuracy" and "low efficiency". Therefore, there is a need for an optical flow estimation algorithm and system based on a hybrid expert network that combines dynamic resource allocation with lightweight modules, so as to maintain high accuracy while reducing computational cost and to enable efficient deployment in resource-constrained scenarios.

Disclosure of Invention

The embodiments of the application provide a method and a system for estimating optical flow based on a hybrid expert network. The technical solution is as follows:

According to one aspect of the present application, there is provided a hybrid expert network-based optical flow estimation method, including:
preprocessing two input frames and extracting low-resolution features through a hybrid expert feature extractor (MoEE) to obtain low-resolution feature maps;
performing a dot product operation on the low-resolution feature maps of the two frames to construct a 4D correlation volume that captures the motion matching relationship between pixels;
in the training and inference stages, iteratively optimizing the optical flow with a hybrid expert updater (MoEU), updating the hidden state through a dynamic expert selection mechanism and regressing the residual flow increment;
upsampling the optimized low-resolution optical flow field through a multi-scale upsampling module, reconstructing a high-resolution optical flow field, and outputting a final high-precision optical flow estimation result.

Optionally, preprocessing the two input frames and extracting low-resolution features through the hybrid expert feature extractor (MoEE) includes:
normalizing the two input frames to the range [0, 1] and adjusting the image size to a multiple of 8;
the hybrid expert feature extractor (MoEE) comprises a depthwise separable convolution layer, a layer normalization layer, and a hybrid expert (MoE) layer, wherein the hybrid expert (MoE) layer is provided with