CN-115731293-B - Image data processing method, apparatus, storage medium, and program product
Abstract
In response to a first operation of a user on a target space area, first image information of the target space area and first pose information corresponding to the first image information are acquired. Position estimation information of the three-dimensional space points in the target space area that correspond to the pixel points in the first image information is determined according to the first pose information and the first image information; distance information between each three-dimensional space point and the surface of a target object in the target space area is determined according to the first image information and the first pose information; depth information of each three-dimensional space point in the target space area is determined according to the distance information and the position estimation information; and the pixel points corresponding to the three-dimensional space points are displayed on an interactive interface in a first mode according to the depth information. The method and the device make it possible to determine which points in the image information have been scanned, and to highlight the scanned points, without requiring a depth map.
Inventors
- LI CHANGLIN
- LIU KUILONG
Assignees
- Alibaba (China) Co., Ltd. (阿里巴巴(中国)有限公司)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2022-11-28
Claims (10)
- 1. A method for processing image data, the method comprising: in response to a first operation of a user on a target space region, acquiring first image information of the target space region and first pose information corresponding to the first image information, wherein the first image information at least comprises a group of pixel points obtained by shooting some of the three-dimensional space points in the target space region, the pixel points being two-dimensional; projecting each pixel point in the first image information into three-dimensional space according to the first pose information, the resulting three-dimensional projection points serving as position estimation information of the corresponding three-dimensional space points in the target space region; inputting the first image information and the first pose information into a preset prediction model, which outputs distance information between each three-dimensional space point corresponding to a pixel point in the first image information and the surface of a target object in the target space region; determining a vector difference between the vector of the position estimation information and the vector of the distance information, the resulting vector difference serving as depth information of each three-dimensional space point in the target space region; and storing the three-dimensional space points corresponding to the depth information in a scanned point set, and displaying the pixel points corresponding to those three-dimensional space points on the first image information in a first mode; wherein the prediction model is trained by: acquiring multichannel features of a sample image and pose information of a preset frame image in the sample image, wherein the preset frame image is an intermediate frame image among the sample image frames; mapping the multichannel features to a three-dimensional feature space according to the pose information of the preset frame image to obtain a three-dimensional feature map of the sample image; and training a preset neural network with the three-dimensional feature map to obtain the prediction model.
- 2. The method as recited in claim 1, further comprising: in response to a second operation of the user on the target space region, acquiring second image information of the target space region and second pose information corresponding to the second image information; determining depth information of the three-dimensional space points in the target space region that correspond to the pixel points to be searched in the second image information, according to the second image information and the second pose information; and determining the scanning state of each pixel point to be searched according to the depth information, and displaying the scanned points and the non-scanned points in the second image information distinguishably on an interactive interface.
- 3. The method according to claim 2, wherein determining the scanning state of each pixel point to be searched in the second image information according to the depth information, and displaying the scanned points and the non-scanned points in the second image information on the interactive interface, comprises: judging, according to the depth information, whether each pixel point to be searched in the second image information is stored in a scanned point set; and displaying, in the first mode on the interactive interface, the first points to be searched in the second image information that are stored in the scanned point set.
- 4. The method according to claim 3, wherein determining the scanning state of each pixel point to be searched in the second image information according to the depth information, and displaying the scanned points and the non-scanned points in the second image information on the interactive interface, further comprises: displaying, in a second mode on the interactive interface, the second points to be searched in the second image information that are not stored in the scanned point set, wherein the second mode is different from the first mode.
- 5. The method according to claim 3, wherein judging whether each pixel point to be searched in the second image information is stored in the scanned point set according to the depth information comprises: searching the scanned point set for each pixel point to be searched in the second image information according to its depth information; if a point whose depth information is identical to that of the current pixel point to be searched is found, determining that the current pixel point to be searched is stored in the scanned point set, and otherwise determining that it is not.
- 6. The method according to claim 1 or 2, further comprising: processing all pixel points in the image information in parallel.
- 7. An image data processing method, comprising: in response to a shooting operation of a user on an indoor scene, acquiring first image information of the indoor scene and first pose information corresponding to the first image information; performing three-dimensional reconstruction of the indoor scene according to the first image information and the first pose information; and processing each pixel point in the first image information by the method according to any one of claims 1-6, and displaying the scanned points and the non-scanned points in the first image information distinguishably.
- 8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the electronic device to perform the method of any one of claims 1-7.
- 9. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor implement the method of any of claims 1-7.
- 10. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
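The per-pixel pipeline of claim 1 (back-project each pixel with the pose to get a position estimate, subtract the predicted surface distance to obtain a depth vector, and record it in a scanned point set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the camera intrinsics `K`, the nominal ray depth, the idea that the predicted distance is measured along the viewing ray, and the quantization resolution used to hash depth vectors are all assumptions the claims do not specify.

```python
import numpy as np

def backproject_pixels(pixels, K, pose, est_depth=1.0):
    """Project 2D pixels (N,2) into 3D using camera intrinsics K (3x3) and a
    4x4 camera-to-world pose -- the 'position estimation' step of claim 1.
    est_depth is an assumed nominal depth along each viewing ray."""
    ones = np.ones((pixels.shape[0], 1))
    rays_cam = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T * est_depth
    pts_h = np.hstack([rays_cam, ones])       # homogeneous camera coordinates
    return (pose @ pts_h.T).T[:, :3]          # world-frame position estimates

def depth_from_distance(position_est, predicted_dist, cam_center):
    """Depth per claim 1: the vector difference between the position-estimate
    vector and the distance vector (here the predicted surface distance taken
    along the unit ray from the camera center -- an assumption)."""
    rays = position_est - cam_center
    units = rays / np.linalg.norm(rays, axis=1, keepdims=True)
    surface_pts = position_est - units * predicted_dist[:, None]
    return surface_pts - cam_center           # per-point depth vectors

# Scanned-point bookkeeping per claims 1 and 5; depth vectors are quantized
# (resolution `res`, an assumption) so they can be hashed into a set.
scanned = set()

def mark_scanned(depths, res=0.01):
    for d in depths:
        scanned.add(tuple(np.round(d / res).astype(int)))

def is_scanned(depth, res=0.01):
    return tuple(np.round(depth / res).astype(int)) in scanned
```

A point whose quantized depth vector is already in `scanned` would be rendered in the first mode (highlighted); others in the second mode, matching the claim-5 lookup by identical depth information.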
Description
Image data processing method, apparatus, storage medium, and program product Technical Field The present application relates to the field of data processing technologies, and in particular, to an image data processing method, apparatus, storage medium, and program product. Background In the decorative painting industry, about 500 billion pictures are consumed nationwide each year, and on average 4-5 pictures are hung per 100 square meters of floor space. As living standards improve, the demand for replacing decorative pictures keeps growing. Traditional decorative painting replacement requires users to purchase the painting on site, which is time-consuming and laborious. With the development of information technology, purchasing pictures online has become a new choice thanks to its convenience and speed. An online shopping platform generally displays the optional decorative pictures and their description information, and users can choose and purchase by themselves. When selecting a picture, a user may shoot candidate hanging areas such as an indoor wall and upload the shooting result to the platform. During shooting, the user has to judge the hanging position, memorize which specific positions have already been shot, and decide whether shooting is complete; if it is not, the user has to manually re-shoot the missed areas. This process cannot guarantee shooting completeness, offers poor interactivity, and is time-consuming and laborious.
Disclosure of Invention The main purpose of the embodiments of the present application is to provide an image data processing method, device, storage medium and program product, which can obtain the depth information of three-dimensional space points in an image without a depth map being input, and which prompt the user about missed shots by displaying the scanned points in a preset pattern, so that no manual judgment by the user is needed and terminal interactivity is improved. In a first aspect, an embodiment of the present application provides an image data processing method, which includes: acquiring first image information of a target space area and first pose information corresponding to the first image information in response to a first operation of a user on the target space area, where the first image information at least includes a group of pixel points obtained by shooting some of the three-dimensional space points in the target space area; determining position estimation information of the three-dimensional space points in the target space area that correspond to the pixel points in the first image information, according to the first pose information and the first image information; determining distance information between each three-dimensional space point and the surface of a target object in the target space area according to the first image information and the first pose information; determining depth information of each three-dimensional space point in the target space area according to the distance information and the position estimation information; and displaying the pixel points corresponding to the three-dimensional space points in a first mode on an interactive interface according to the depth information.
In an embodiment, determining the position estimation information of the three-dimensional space points corresponding to the pixel points in the first image information in the target space region according to the first pose information and the first image information includes: performing three-dimensional projection of each pixel point in the first image information according to the first pose information, the resulting three-dimensional projection points serving as the position estimation information of the three-dimensional space points in the target space region. In an embodiment, determining the distance information between each three-dimensional space point and the surface of the target object in the target space region according to the first image information and the first pose information includes: inputting the first image information and the first pose information into a preset prediction model, which outputs the distance information between each three-dimensional space point corresponding to a pixel point in the first image information and the surface of the target object in the target space region. In an embodiment, training the prediction model includes: obtaining multichannel features of a sample image and pose information of a preset frame image in the sample image; mapping the multichannel features to a three-dimensional feature space according to the pose information of the preset frame image to obtain a three-dimensional feature map of the sample image; and training a preset neural network with the three-dimensional feature map to obtain the prediction model.
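The training step described above (mapping multichannel 2D features into a three-dimensional feature space using the pose of the intermediate frame) might be sketched as below. The voxel-grid layout, the camera intrinsics `K`, and the nearest-neighbor sampling are illustrative assumptions; the patent does not specify how the mapping is performed.

```python
import numpy as np

def build_feature_volume(feat2d, K, pose_mid, grid):
    """Map multichannel 2D features feat2d (C,H,W) into a 3D feature volume
    by projecting each voxel center into the intermediate frame's image plane
    and sampling the feature there (nearest neighbor -- an assumption).
    grid: (D,Hv,Wv,3) array of voxel centers in world coordinates;
    pose_mid: 4x4 camera-to-world pose of the intermediate frame."""
    C, H, W = feat2d.shape
    D, Hv, Wv = grid.shape[:3]
    vol = np.zeros((C, D, Hv, Wv), dtype=feat2d.dtype)
    world2cam = np.linalg.inv(pose_mid)
    centers = grid.reshape(-1, 3)
    pts_h = np.hstack([centers, np.ones((centers.shape[0], 1))])
    cam = (world2cam @ pts_h.T).T[:, :3]
    valid = cam[:, 2] > 1e-6                  # keep voxels in front of the camera
    uvw = (K @ cam.T).T
    uv = np.round(uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)).astype(int)
    inside = (valid & (uv[:, 0] >= 0) & (uv[:, 0] < W)
              & (uv[:, 1] >= 0) & (uv[:, 1] < H))
    flat = vol.reshape(C, -1)                 # view into vol; writes propagate
    idx = np.nonzero(inside)[0]
    flat[:, idx] = feat2d[:, uv[idx, 1], uv[idx, 0]]
    return vol
```

The resulting `(C, D, Hv, Wv)` volume plays the role of the three-dimensional feature map on which the preset neural network would then be trained.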