US-20260127755-A1 - METHOD AND APPARATUS FOR ESTIMATING DEPTH OF MOVING OBJECT, ELECTRONIC DEVICE, AND STORAGE MEDIUM

US 20260127755 A1

Abstract

Disclosed are a method and apparatus for estimating a depth of a moving object, an electronic device, and a storage medium. The method for estimating a depth of a moving object includes: determining a video processing type; determining a target processing mode for estimating the depth of the moving object based on the video processing type; and determining an estimated depth value of the moving object in a to-be-processed video frame based on the target processing mode.

Inventors

  • Jiawei Wen
  • Xiaodong Song
  • Hengkai GUO

Assignees

  • BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.

Dates

Publication Date: 2026-05-07
Application Date: 2023-08-24
Priority Date: 2022-09-22

Claims (20)

  1. A method for estimating a depth of a moving object, comprising: determining a video processing type; determining a target processing mode for estimating the depth of the moving object based on the video processing type; and determining an estimated depth value of the moving object in a to-be-processed video frame based on the target processing mode.
  2. The method according to claim 1, wherein the video processing type comprises a real-time processing type and a post-processing type.
  3. The method according to claim 2, wherein the target processing mode comprises a depth mean estimation mode corresponding to the real-time processing type or an inverse depth estimation mode corresponding to the post-processing type.
  4. The method according to claim 1, wherein the target processing mode comprises a depth mean estimation mode, and wherein determining the estimated depth value of the moving object in the to-be-processed video frame based on the target processing mode comprises: determining a capturing parameter corresponding to the to-be-processed video frame and a pixel point parameter of the moving object; determining a target pixel point based on the capturing parameter, the pixel point parameter, and a constraint condition; and determining the estimated depth value of the moving object based on point cloud data of the target pixel point.
  5. The method according to claim 4, wherein determining the target pixel point based on the capturing parameter, the pixel point parameter, and the constraint condition comprises: performing triangulation processing on the capturing parameter and the pixel point parameter, to obtain the point cloud data corresponding to the pixel point parameter; determining a backprojection pixel parameter based on the point cloud data and the constraint condition; and determining the target pixel point based on the pixel point parameter and the backprojection pixel parameter.
  6. The method according to claim 4, wherein determining the estimated depth value of the moving object based on point cloud data of the target pixel point comprises: determining at least two to-be-used video frames to which the target pixel point belongs based on the point cloud data of the target pixel point; and determining the estimated depth value of the moving object based on depth values of the target pixel point in the at least two to-be-used video frames.
  7. The method according to claim 1, wherein the target processing mode comprises an inverse depth estimation mode, and wherein determining the estimated depth value of the moving object in the to-be-processed video frame based on the target processing mode comprises: performing triangulation processing on each to-be-processed video frame in a target video, to obtain an inverse depth value of each pixel point in each to-be-processed video frame; and determining the estimated depth value of the moving object by clustering a plurality of inverse depth values in a same to-be-processed video frame.
  8. The method according to claim 7, wherein determining the estimated depth value of the moving object by clustering the plurality of inverse depth values in the same to-be-processed video frame comprises: determining, based on ranking the plurality of inverse depth values, a depth difference between two adjacent inverse depth values; and acquiring two target inverse depth values having a maximum depth difference, and determining the estimated depth value of the moving object based on multiple inverse depth values greater than the target inverse depth values.
  9. The method according to claim 8, wherein before determining the estimated depth value of the moving object based on the multiple inverse depth values greater than the target inverse depth values, the method further comprises: in response to a ratio between a number of inverse depth values greater than or less than the target inverse depth values and a total number of the inverse depth values being less than a preset ratio, deleting the inverse depth values greater than or less than the target inverse depth values, and re-performing an operation of determining target inverse depth values.
  10. The method according to claim 8, wherein determining the estimated depth value of the moving object based on the multiple inverse depth values greater than the target inverse depth values comprises: performing averaging processing on the multiple inverse depth values greater than the target inverse depth values, to obtain an inverse depth average value, and determining the estimated depth value of the moving object based on the inverse depth average value.
  11. (canceled)
  12. An electronic device, comprising: at least one processor; and a storage apparatus configured to store at least one program which, when executed by the at least one processor, configures the at least one processor to: determine a video processing type; determine a target processing mode for estimating a depth of a moving object based on the video processing type; and determine an estimated depth value of the moving object in a to-be-processed video frame based on the target processing mode.
  13. (canceled)
  14. A computer program product comprising a computer program carried on a non-transitory computer-readable medium, wherein the computer program comprises a program code configured to: determine a video processing type; determine a target processing mode for estimating a depth of a moving object based on the video processing type; and determine an estimated depth value of the moving object in a to-be-processed video frame based on the target processing mode.
  15. The electronic device according to claim 12, wherein the video processing type comprises a real-time processing type and a post-processing type.
  16. The electronic device according to claim 15, wherein the target processing mode comprises a depth mean estimation mode corresponding to the real-time processing type or an inverse depth estimation mode corresponding to the post-processing type.
  17. The electronic device according to claim 12, wherein the target processing mode comprises a depth mean estimation mode, and wherein, to determine the estimated depth value of the moving object in the to-be-processed video frame based on the target processing mode, the at least one processor is configured to: determine a capturing parameter corresponding to the to-be-processed video frame and a pixel point parameter of the moving object; determine a target pixel point based on the capturing parameter, the pixel point parameter, and a constraint condition; and determine the estimated depth value of the moving object based on point cloud data of the target pixel point.
  18. The electronic device according to claim 17, wherein, to determine the target pixel point based on the capturing parameter, the pixel point parameter, and the constraint condition, the at least one processor is configured to: perform triangulation processing on the capturing parameter and the pixel point parameter, to obtain the point cloud data corresponding to the pixel point parameter; determine a backprojection pixel parameter based on the point cloud data and the constraint condition; and determine the target pixel point based on the pixel point parameter and the backprojection pixel parameter.
  19. The electronic device according to claim 17, wherein, to determine the estimated depth value of the moving object based on point cloud data of the target pixel point, the at least one processor is configured to: determine at least two to-be-used video frames to which the target pixel point belongs based on the point cloud data of the target pixel point; and determine the estimated depth value of the moving object based on depth values of the target pixel point in the at least two to-be-used video frames.
  20. The electronic device according to claim 12, wherein the target processing mode comprises an inverse depth estimation mode, and wherein, to determine the estimated depth value of the moving object in the to-be-processed video frame based on the target processing mode, the at least one processor is configured to: perform triangulation processing on each to-be-processed video frame in a target video, to obtain an inverse depth value of each pixel point in each to-be-processed video frame; and determine the estimated depth value of the moving object by clustering a plurality of inverse depth values in a same to-be-processed video frame.
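The gap-based clustering of claims 8–10 can be read as a one-dimensional split on sorted inverse depths. The sketch below is one illustrative interpretation, not the patented implementation: the constant MIN_CLUSTER_RATIO stands in for the unspecified "preset ratio" of claim 9, and taking the reciprocal of the inverse-depth average to obtain a depth is an assumption of this sketch.

```python
MIN_CLUSTER_RATIO = 0.2  # hypothetical stand-in for the "preset ratio" of claim 9

def estimate_depth_from_inverse_depths(inv_depths):
    """Cluster one frame's per-pixel inverse depth values and estimate the
    depth of the larger-inverse-depth (nearer) cluster."""
    values = sorted(inv_depths)
    while len(values) > 1:
        # Claim 8: rank the values and find the maximum gap between neighbors.
        gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
        split = max(range(len(gaps)), key=gaps.__getitem__)
        lower, upper = values[:split + 1], values[split + 1:]
        # Claim 9: if either side is too small a fraction of the whole,
        # delete it and determine the target inverse depth values again.
        if len(upper) / len(values) < MIN_CLUSTER_RATIO:
            values = lower
        elif len(lower) / len(values) < MIN_CLUSTER_RATIO:
            values = upper
        else:
            break
    # Claim 10: average the inverse depths above the split; converting the
    # mean inverse depth to a depth via its reciprocal is an assumption here.
    cluster = upper if len(values) > 1 else values
    mean_inv = sum(cluster) / len(cluster)
    return 1.0 / mean_inv
```

For example, with inverse depths `[0.1, 0.11, 0.12, 0.5, 0.52, 0.54]`, the largest gap separates the two groups, and the estimate is the reciprocal of the mean of the upper group, `1 / 0.52`.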

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)

The disclosure claims priority to Chinese Patent Application No. 202211160924.9, filed with the Chinese Patent Office on Sep. 22, 2022, which is incorporated herein by reference in its entirety.

FIELD

The disclosure relates to the technical field of image processing, for example, to a method and apparatus for estimating a depth of a moving object, an electronic device, and a storage medium.

BACKGROUND

With the development of computer vision technology, simultaneous localization and mapping (SLAM) algorithms have been applied in a wide range of fields, such as augmented reality, virtual reality, autonomous driving, and the localization and navigation of robots and unmanned aerial vehicles. In the related art, an image is input into a SLAM system, the SLAM system extracts scene depth information from the image, and the depth of an object in the image is estimated based on that depth information. However, such a depth estimation method is applicable only to static objects and can hardly estimate the depths of dynamic objects in a video effectively.

SUMMARY

The disclosure provides a method and apparatus for estimating a depth of a moving object, an electronic device, and a storage medium, to accurately estimate depth information of a moving object in a video.

In a first aspect, the disclosure provides a method for estimating a depth of a moving object. The method includes: determining a video processing type; determining a target processing mode for estimating the depth of the moving object based on the video processing type; and determining an estimated depth value of the moving object in a to-be-processed video frame based on the target processing mode.

In a second aspect, the disclosure further provides an apparatus for estimating a depth of a moving object.
The apparatus includes: a video processing type determination module configured to determine a video processing type; a target processing mode determination module configured to determine a target processing mode for estimating the depth of the moving object based on the video processing type; and an estimated depth value determination module configured to determine an estimated depth value of the moving object in a to-be-processed video frame based on the target processing mode.

In a third aspect, the disclosure further provides an electronic device. The electronic device includes: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above method for estimating a depth of a moving object.

In a fourth aspect, the disclosure further provides a storage medium. The storage medium includes computer-executable instructions which are configured to, when executed by a computer processor, execute the above method for estimating a depth of a moving object.

In a fifth aspect, the disclosure further provides a computer program product. The computer program product includes a computer program carried on a non-transitory computer-readable medium, where the computer program includes a program code configured to execute the above method for estimating a depth of a moving object.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of a method for estimating a depth of a moving object according to an embodiment of the disclosure; FIG. 2 is a schematic flowchart of another method for estimating a depth of a moving object according to an embodiment of the disclosure; FIG. 3 is a schematic structural diagram of an apparatus for estimating a depth of a moving object according to an embodiment of the disclosure; and FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
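The method of the first aspect and the apparatus modules of the second aspect share the same control flow: the video processing type selects the target processing mode, which is then applied to the to-be-processed frame. A minimal structural sketch follows; the type names are assumptions, and the two estimator functions are placeholders rather than the disclosed algorithms.

```python
from enum import Enum, auto

class VideoProcessingType(Enum):
    REAL_TIME = auto()        # live stream: depth mean estimation mode
    POST_PROCESSING = auto()  # full video available: inverse depth estimation mode

def depth_mean_estimation(frame):
    # Placeholder: triangulate tracked pixels and average their depths.
    return 2.0

def inverse_depth_estimation(frame):
    # Placeholder: cluster per-pixel inverse depths across the target video.
    return 3.0

# The "target processing mode determination" step reduces to a lookup.
MODE_BY_TYPE = {
    VideoProcessingType.REAL_TIME: depth_mean_estimation,
    VideoProcessingType.POST_PROCESSING: inverse_depth_estimation,
}

def estimate_moving_object_depth(frame, processing_type):
    """Determine the target processing mode from the video processing type,
    then estimate the moving object's depth in the given frame."""
    mode = MODE_BY_TYPE[processing_type]
    return mode(frame)
```

The point of the sketch is only the dispatch structure: a real-time constraint rules out whole-video clustering, so the type-to-mode mapping is fixed up front and the rest of the pipeline is mode-agnostic.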
DETAILED DESCRIPTION OF EMBODIMENTS

Embodiments of the disclosure will be described below in conjunction with the accompanying drawings. Although some embodiments of the disclosure are shown in the accompanying drawings, the disclosure can be implemented in various forms; these embodiments are provided to aid understanding of the disclosure. The accompanying drawings and the embodiments of the disclosure are merely illustrative. The steps described in a method embodiment of the disclosure can be executed in different orders and/or in parallel. In addition, a method embodiment can include additional steps and/or omit some of the steps shown. The scope of the disclosure is not limited in this respect.

The terms “comprise”, “include”, and their variations used herein indicate open-ended inclusions, i.e., “comprise, but is not limited to” and “include, but is not limited to”. The term “based on” means “at least partially based on”. The term “an embodiment” indicates “at least one embodiment”. The term “another embodiment” indicates “at least one further embodiment”. The term “some embodiments” indicates “at least