US-12626367-B2 - Video image processing method and apparatus


Abstract

Embodiments of this application disclose a video image processing method and apparatus. A specific solution is as follows: obtaining identity information and location information of each subject in an ith video image frame; determining M main subjects from the ith video image frame based on identity information of subjects in N video image frames before the ith video image frame, where the identity information of the subjects in the N video image frames includes identity information of the M main subjects; cropping the ith video image frame based on location information of the main subjects, where a cropped ith video image frame includes the M main subjects; and scaling down or scaling up the cropped ith video image frame, so that a display displays the cropped ith video image frame based on a preset display specification.
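The main-subject step can be illustrated with a minimal sketch. It assumes that each subject's identity information is already available as a string ID per frame, and it uses the simplest criterion found in claim 2: a subject in the ith frame counts as a main subject if it appears in at least a preset number of the N preceding frames. The class `MainSubjectSelector`, its parameters, and the sliding-window bookkeeping are hypothetical illustrations, not the patented implementation; the behaviour when no subject qualifies is also left open here.

```python
from collections import Counter, deque
from typing import Deque, List, Set


class MainSubjectSelector:
    """Hypothetical helper: picks main subjects using identities seen in the last N frames."""

    def __init__(self, n_frames: int, threshold: int) -> None:
        if n_frames < 1 or threshold < 1:
            raise ValueError("n_frames and threshold must be >= 1")
        # Sliding window holding the identity sets of the N most recent frames.
        self.history: Deque[Set[str]] = deque(maxlen=n_frames)
        self.threshold = threshold

    def select(self, current_ids: Set[str]) -> List[str]:
        """Return main-subject IDs for the current (ith) frame, then record that frame."""
        # Count in how many of the previous N frames each identity appears
        # (each frame contributes at most once per identity because frames are sets).
        counts = Counter(pid for frame in self.history for pid in frame)
        main = sorted(pid for pid in current_ids if counts[pid] >= self.threshold)
        # The current frame becomes part of the history used for the next frame.
        self.history.append(set(current_ids))
        return main


if __name__ == "__main__":
    selector = MainSubjectSelector(n_frames=5, threshold=3)
    frames = [
        {"alice"}, {"alice", "bob"}, {"alice"}, {"alice", "carol"},
        {"alice", "bob"}, {"alice", "bob", "dave"}, {"alice", "bob"},
    ]
    for i, ids in enumerate(frames, start=1):
        print(i, selector.select(ids))
```

With N = 5 and a threshold of 3, "alice" becomes a main subject from frame 4 on, "bob" only in frame 7 (after appearing in 3 of the previous 5 frames), and the briefly visible "carol" and "dave" never qualify. This is how a history of identities can filter out subjects that pass through the picture.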

Inventors

Yong Wu
Houqiang ZHAO
Wei Song

Assignees

HUAWEI TECHNOLOGIES CO., LTD.

Dates

Publication Date: 20260512
Application Date: 20240917
Priority Date: 20190831

Claims (19)

1. A video image processing method comprising: obtaining first information and location information of each subject of a plurality of subjects in an ith video image frame, wherein i is greater than 1, wherein the first information is used to uniquely identify the subject; determining M main subjects from the ith video image frame based on the first information of each subject in N video image frames before the ith video image frame, wherein M and N are greater than or equal to 1, and the first information of each subject in the N video image frames comprises first information of the M main subjects; cropping the ith video image frame based on location information of the main subjects, wherein a cropped ith video image frame comprises the M main subjects; and scaling down or scaling up the cropped ith video image frame, so that a display displays the cropped ith video image frame based on a preset display specification.
2. The method according to claim 1, wherein the determining M main subjects from the ith video image frame comprises: determining that among the N video image frames a quantity of image frames comprising a subject is greater than or equal to a first preset threshold.
3. The method according to claim 1, wherein the N video image frames are collected by one or more cameras, and the cameras are from one or more electronic devices.
4. The method according to claim 1, wherein the method further comprises: dividing the ith video image frame into Y areas, and configuring a preset threshold corresponding to each of the Y areas, wherein a preset threshold corresponding to a kth area of the Y areas is a kth preset threshold, the kth area is any area in the Y areas, Y is greater than or equal to 2, and k is greater than or equal to 1 and less than or equal to Y; and the determining M main subjects from the ith video image frame comprises: determining that among the N video image frames a quantity of video image frames comprising a subject is greater than or equal to preset thresholds corresponding to areas in which the subject is located.
5. The method according to claim 1, wherein the method further comprises: obtaining subject information of each subject of the plurality of subjects in the ith video image frame, wherein the subject information comprises one or more of the following information: information about whether that subject speaks, or priority information; and the determining M main subjects from the ith video image frame comprises: determining that among the N video image frames a quantity of video image frames comprising the subject speaking is greater than or equal to a second preset threshold; or determining that among the N video image frames priority information of the subject is greater than a third preset threshold.
6. The method according to claim 1, wherein the cropping the ith video image frame comprises: determining a cropping box comprising a minimum external rectangular frame of the M main subjects; and cropping the ith video image frame by using the cropping box.
7. The method according to claim 6, wherein the determining the cropping box comprises: obtaining a distance between a center point of a first to-be-selected cropping box and a center point of a cropping box of a previous video image frame, wherein the first to-be-selected cropping box comprises the minimum external rectangular frame of the M main subjects; determining a second cropping box based on the distance being greater than or equal to a distance threshold, wherein a center point of the second cropping box is the center point of the cropping box of the previous video image frame plus an offset, and a size of the second cropping box is the same as a size of the cropping box of the previous video image frame; and when the second cropping box comprises the minimum external rectangular frame, using a third cropping box as the cropping box, wherein the third cropping box is the second cropping box, or the third cropping box is a cropping box obtained by narrowing the second cropping box to comprise the minimum external rectangular frame; or when the second cropping box does not comprise the minimum external rectangular frame, enlarging the second cropping box to comprise the minimum external rectangular frame, and using an enlarged second cropping box as the cropping box.
8. The method according to claim 1, further comprising: displaying the cropped ith video image frame based on the preset display specification.
9. A non-transitory computer-readable storage medium comprising: computer software instructions, wherein when the computer software instructions are run on an electronic device, the electronic device is enabled to perform the video image processing method according to claim 1.
10. A computer comprising a computer program product, wherein when the computer program product is run on the computer, the computer is enabled to perform the video image processing method according to claim 1.
11. A video image processing apparatus comprising: an obtaining circuit, configured to obtain first information and location information of each subject of a plurality of subjects in an ith video image frame, wherein i is greater than 1, wherein the first information is used to uniquely identify the subject; a determining circuit, configured to determine M main subjects from the ith video image frame based on first information of each subject in N video image frames before the ith video image frame, wherein M and N are greater than or equal to 1; a cropping circuit, configured to crop the ith video image frame based on location information that is of the main subjects and that is determined by the determining circuit, wherein a cropped ith video image frame comprises the M main subjects, and the first information of each subject in the N video image frames comprises first information of the M main subjects; and a scaling circuit, configured to scale down or scale up the cropped ith video image frame, so that a display displays the cropped ith video image frame based on a preset display specification.
12. The apparatus according to claim 11, wherein the determining circuit is configured to: determine that a subject of the plurality of subjects is one of the M main subjects by determining that among the N video image frames a quantity of image frames comprising the subject is greater than or equal to a first preset threshold.
13. The apparatus according to claim 11, wherein the N video image frames are collected by one or more cameras, and the cameras are from one or more electronic devices.
14. The apparatus according to claim 11, wherein the determining circuit is configured to: divide the ith video image frame into Y areas, and configure a preset threshold corresponding to each of the Y areas, wherein a preset threshold corresponding to a kth area of the Y areas is a kth preset threshold, the kth area is any area in the Y areas, Y is greater than or equal to 2, and k is greater than or equal to 1 and less than or equal to Y; and determine a subject of the plurality of subjects is one of the M main subjects by determining that among the N video image frames a quantity of video image frames comprising the subject is greater than or equal to preset thresholds corresponding to areas in which the subject is located.
15. The apparatus according to claim 11, wherein the obtaining circuit is further configured to: obtain subject information of each subject of the plurality of subjects, wherein the subject information comprises one or more of the following information: information about whether that subject speaks and priority information; and the determining circuit is configured to: determine a subject of the plurality of subjects in the ith video image frame is one of the M main subjects by determining that among the N video image frames a quantity of video image frames comprising the subject speaking is greater than or equal to a second preset threshold; or determine that the subject of the plurality of subjects in the ith video image frame is one of the M main subjects by determining that among the N video image frames priority information of the subject is greater than a third preset threshold.
16. The apparatus according to claim 11, wherein the cropping circuit is configured to: determine a cropping box comprising a minimum external rectangular frame of the M main subjects; and crop the ith video image frame by using the cropping box.
17. The apparatus according to claim 16, wherein the cropping circuit is configured to: obtain a distance between a center point of a first to-be-selected cropping box and a center point of a cropping box of a previous video image frame, wherein the first to-be-selected cropping box comprises the minimum external rectangular frame of the M main subjects; determine a second cropping box based on the distance being greater than or equal to a distance threshold, wherein a center point of the second cropping box is the center point of the cropping box of the previous video image frame plus an offset, and a size of the second cropping box is the same as a size of the cropping box of the previous video image frame; and when the second cropping box comprises the minimum external rectangular frame, use a third cropping box as the cropping box, wherein the third cropping box is the second cropping box, or the third cropping box is a cropping box obtained by narrowing the second cropping box to comprise the minimum external rectangular frame; or when the second cropping box does not comprise the minimum external rectangular frame, enlarge the second cropping box to comprise the minimum external rectangular frame, and use an enlarged second cropping box as the cropping box.
18. The apparatus according to claim 11, wherein the apparatus further comprises: a display, configured to display the cropped ith video image frame based on the preset display specification.
19. An electronic device comprising: a processor coupled to a memory storing computer program code comprising computer instructions, and when the computer instructions are executed by the processor, the electronic device is enabled to perform a video image processing method comprising: obtaining first information and location information of each subject of a plurality of subjects in an ith video image frame, wherein i is greater than 1, wherein the first information is used to uniquely identify the subject; determining M main subjects from the ith video image frame based on first information of each subject in N video image frames before the ith video image frame, wherein M and N are greater than or equal to 1, and the first information of each subject in the N video image frames comprises first information of the M main subjects; cropping the ith video image frame based on location information of the main subjects, wherein a cropped ith video image frame comprises the M main subjects; and scaling down or scaling up the cropped ith video image frame, so that a display displays the cropped ith video image frame based on a preset display specification.
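Claims 6 and 7 describe how the cropping box is kept stable from frame to frame. The sketch below is one possible reading under stated assumptions: boxes are `(x0, y0, x1, y1)` tuples; the first to-be-selected cropping box is taken to be exactly the minimum enclosing rectangle of the main subjects; because the claims do not define the offset, the previous box is assumed to move toward the candidate center by at most a hypothetical `max_step`; the optional narrowing of claim 7 is not used; and when the distance is below the threshold (a case the claims leave open), the previous box is assumed to be kept. All function names are illustrative.

```python
import math
from typing import Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x1 > x0 and y1 > y0


def min_enclosing_box(boxes: Iterable[Box]) -> Box:
    """Smallest axis-aligned rectangle containing every box ('minimum external rectangular frame')."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one main subject is required")
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def next_crop_box(subject_boxes: Iterable[Box], prev_box: Optional[Box],
                  dist_threshold: float, max_step: float) -> Box:
    """Hypothetical smoothing rule loosely following claims 6 and 7."""
    min_rect = min_enclosing_box(subject_boxes)
    if prev_box is None:  # first frame: no previous cropping box to smooth against
        return min_rect
    (cx, cy), (px, py) = center(min_rect), center(prev_box)
    dist = math.hypot(cx - px, cy - py)
    if dist >= dist_threshold and dist > 0:
        # Second cropping box: same size as the previous box, center shifted by an
        # offset (assumed: toward the candidate center, at most max_step pixels).
        step = min(max_step, dist)
        dx, dy = (cx - px) / dist * step, (cy - py) / dist * step
        box = (prev_box[0] + dx, prev_box[1] + dy, prev_box[2] + dx, prev_box[3] + dy)
    else:
        box = prev_box  # assumption: small moves are ignored to suppress jitter
    if contains(box, min_rect):
        return box  # third cropping box = second cropping box (no narrowing)
    # Otherwise enlarge the box just enough to include all main subjects.
    return min_enclosing_box([box, min_rect])
```

For example, with `prev_box=(0, 0, 100, 100)`, one main subject at `(150, 40, 170, 60)`, `dist_threshold=20` and `max_step=30`, the previous box first slides 30 pixels to the right to `(30, 0, 130, 100)` and is then enlarged to `(30, 0, 170, 100)` so that the subject stays in view, while a subject that moves only a few pixels leaves the box unchanged.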

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 17/680,889, filed on Feb. 25, 2022, which is a continuation of International Application No. PCT/CN2020/087634, filed on Apr. 28, 2020, which claims priority to Chinese Patent Application No. 201910819774.X, filed on Aug. 31, 2019. All of the aforementioned patent applications are hereby incorporated by reference in their entireties.
TECHNICAL FIELD
This application relates to the field of image processing, and in particular, to a video image processing method and apparatus.
BACKGROUND
With the rapid development of image technologies, users have higher requirements for how video pictures are displayed, for example, during video calls and in surveillance scenarios. A conventional video collection and display process is as follows: a collection device collects a video image, crops and scales the collected video image according to a display specification, encodes the video image, and sends the encoded image to a display device for display. Usually, collection and display are implemented on a fixed hardware platform, and the collection camera captures video images with a fixed field of view. When the location of a subject on the collection side changes, the collection camera does not perceive the subject, so the picture on the display side is always shown with the same fixed field of view. As a result, the effect of “a picture moves with a subject” cannot be achieved, and user experience is poor. In view of this, the industry applies subject sensing technology to the image collection and display process.
A specific solution is as follows: a camera collects video at a high resolution with a fixed field of view, performs human body detection and tracking on the collected video images by using subject sensing technology, and locates the subject in real time. When the subject moves, the high-resolution video image is cropped and scaled based on the subject's real-time location (its location after the movement) to obtain a low-resolution image that fits the display specification and in which the subject is located in a specific area of the image. The displayed picture is thereby adjusted in real time based on the subject's location, achieving the effect of “a picture moves with a subject”. However, when the environment on the collection side is complex (for example, the background is cluttered, or other subjects frequently enter or leave the picture), this method may produce false detections and missed detections, so that the subject location is inaccurate in some frames. The cropped and scaled low-resolution image then fails to display the subject, or displays it only partially, and the presented pictures of the main subject are not continuous.
SUMMARY
This application provides a video image processing method and apparatus that keep the effect of “a picture moves with a subject” continuous in the displayed pictures during a video call. To achieve the foregoing objective, the following technical solutions are used in this application. According to a first aspect, a video image processing method is provided.
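The crop-and-scale step described above can be sketched as follows. The sketch assumes that the crop window is the smallest rectangle with the display's aspect ratio that contains the subject's bounding box, centered on the subject and shifted, if necessary, so that it stays inside the high-resolution frame. The function name `fit_crop_window` and its parameters are hypothetical; the returned factor is greater than 1 when the cropped image must be scaled up, and less than 1 when it must be scaled down, to match the display specification.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def fit_crop_window(frame_w: int, frame_h: int, subject: Box,
                    disp_w: int, disp_h: int) -> Tuple[Box, float]:
    """Return (crop window, scale factor) mapping the subject area onto a disp_w x disp_h display."""
    x0, y0, x1, y1 = subject
    w, h = x1 - x0, y1 - y0
    if w <= 0 or h <= 0:
        raise ValueError("subject box must have positive width and height")
    # Grow one side so that w:h equals disp_w:disp_h (cross-multiplied to limit rounding).
    if w * disp_h < h * disp_w:
        w = h * disp_w / disp_h
    else:
        h = w * disp_h / disp_w
    # If the window exceeds the frame, shrink it with the same aspect ratio;
    # the subject may then be partially clipped.
    shrink = min(1.0, frame_w / w, frame_h / h)
    w, h = w * shrink, h * shrink
    # Center the window on the subject, then clamp it inside the frame.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = min(max(cx - w / 2, 0.0), frame_w - w)
    top = min(max(cy - h / 2, 0.0), frame_h - h)
    return (left, top, left + w, top + h), disp_w / w
```

For a 3840 x 2160 frame, a subject box `(1800, 900, 2040, 1260)` and a 1920 x 1080 display, the window widens to `(1600, 900, 2240, 1260)` and the factor is 3.0, so the cropped region is scaled up three times; a subject touching the frame border yields a window shifted inward rather than one that extends past the edge.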
The method may include: obtaining identity information and location information of each subject in an ith video image frame, where i is greater than 1; determining M main subjects from the ith video image frame based on identity information of subjects in N video image frames before the ith video image frame, where M and N are greater than or equal to 1; cropping the ith video image frame based on location information of the main subjects, where a cropped ith video image frame includes the M main subjects; and scaling down or scaling up the cropped ith video image frame, so that a display displays the cropped ith video image frame based on a preset display specification. According to the video image processing method provided in this application, the main subject of a video image is determined by combining the identity information of the subjects in the current image frame with the identity information of the subjects in the N video image frames before the current frame. This greatly improves the accuracy of the subject sensing process and, correspondingly, the accuracy of the determined location of the main subject. In this way, the main subject can be completely displayed in the low-resolution image obtained by cropping and scaling based on the main subject, the presented pictures of the main subject remain continuous, and the continuous effect of “a picture moves with a subject” is implemented in software in the image collection and display process. The identity information of the subject is used to uniquely indicate a same subje