CN-122002029-A - Video encoding method, video encoding device, electronic device, storage medium and program product


Abstract

The present disclosure relates to a video encoding method, a video encoding apparatus, an electronic device, a storage medium, and a program product. The video encoding method comprises: sampling consecutive frames in a video stream to be processed to obtain sampled frames and non-sampled frames; performing region of interest (ROI) detection on the sampled frames based on touch driving and/or content perception, and identifying the ROI region in each sampled frame; reusing or fine-tuning the ROI region in the non-sampled frames through lightweight inter-frame difference analysis and ROI state judgment; and, during video encoding, encoding the ROI regions and the non-ROI regions in the sampled frames and the non-sampled frames differentially. The method and apparatus can preserve video quality in key regions while significantly reducing the overall computation and encoding cost.

Inventors

MA HAISHOU
YU WENQING
CHEN ZHENXU
ZHOU SHAOTAO
WANG ZIHUI
GAO WEIJIE
ZHONG FENGFENG
LI KUANG
CHEN MANTING

Assignees

China Mobile Internet Co., Ltd. (中移互联网有限公司)
China Mobile Communications Group Co., Ltd. (中国移动通信集团有限公司)

Dates

Publication Date: 20260508
Application Date: 20251225

Claims (11)

1. A video encoding method, comprising: sampling consecutive frames in a video stream to be processed to obtain sampled frames and non-sampled frames; performing region of interest (ROI) detection on the sampled frames based on touch driving and/or content perception, and identifying an ROI region in each sampled frame; reusing or fine-tuning the ROI region in the non-sampled frames through lightweight inter-frame difference analysis and ROI state judgment; and, during video encoding, encoding the ROI regions and the non-ROI regions in the sampled frames and the non-sampled frames differentially.
2. The method of claim 1, wherein sampling the consecutive frames in the video stream to be processed comprises: determining a sampling interval according to the user interaction frequency, the image frame difference, the encoding bit rate, and the currently available bandwidth; and sampling the consecutive frames in the video stream to be processed according to the sampling interval.
3. The method of claim 1, wherein performing ROI detection on the sampled frame based on touch driving and/or content perception and identifying the ROI region in the sampled frame comprises: in the case that the client has not captured a touch event, performing ROI detection on the sampled frame based on high-information-density region detection using image entropy and/or motion region detection using pixel differences, to obtain a content-aware first candidate ROI region as the ROI region in the sampled frame; or, in the case that the client has captured a touch event, acquiring the touch event captured by the client, performing coordinate mapping, event type matching, and region generation according to the touch event to obtain a touch-based second candidate ROI region, performing ROI detection on the sampled frame based on high-information-density region detection using image entropy and/or motion region detection using pixel differences to obtain a content-aware third candidate ROI region, and determining the ROI region in the sampled frame according to the second candidate ROI region and the third candidate ROI region.
4. The method of claim 1, wherein reusing or fine-tuning the ROI region in the non-sampled frame through lightweight inter-frame difference analysis and ROI state judgment comprises: determining a global frame difference intensity index of the non-sampled frame based on the frame difference between the non-sampled frame and its previous frame; in the case that the global frame difference intensity index is greater than a first threshold and less than or equal to a second threshold, and the ROI region of the previous frame is determined to be valid according to the ROI state, starting an optical flow algorithm to fine-tune the ROI region of the previous frame, and taking the fine-tuned ROI region as the ROI region of the non-sampled frame, wherein the fine-tuning comprises optical flow translation and/or size scaling; and, in the case that the global frame difference intensity index is less than or equal to the first threshold, reusing the ROI region of the previous frame as the ROI region of the non-sampled frame.
5. The method of claim 4, further comprising: determining the entire region of the non-sampled frame to be a non-ROI region in the case that the global frame difference intensity index is greater than a third threshold, or is greater than the second threshold and less than or equal to the third threshold.
6. The method of claim 1 or 4, further comprising: updating the ROI state of the video frames in the video stream to be processed according to the inter-frame global frame difference intensity index and the result of an ROI temporal consistency check, wherein the ROI temporal consistency check verifies the temporal continuity of the ROI so that the ROI remains stable across frames, the ROI state comprises a valid state, an invalid state, and a delay state, and the delay state indicates that the ROI region is not reusable.
7. The method of claim 1, further comprising: embedding ROI information into the supplemental enhancement information (SEI) of a video frame in the video stream to be processed for transmission, wherein the ROI information comprises position information and/or size information of the ROI region.
8. A video encoding apparatus, comprising: a frame analysis module configured to sample consecutive frames in a video stream to be processed to obtain sampled frames and non-sampled frames; an ROI identification module configured to perform region of interest (ROI) detection on the sampled frames based on touch driving and/or content perception and to identify an ROI region in each sampled frame, the ROI identification module being further configured to reuse or fine-tune the ROI region in the non-sampled frames through lightweight inter-frame difference analysis and ROI state judgment; and an encoding module configured to encode the ROI regions and the non-ROI regions in the sampled frames and the non-sampled frames differentially during video encoding.
9. An electronic device, comprising: one or more processors; wherein the electronic device is configured to perform the method of any one of claims 1-7.
10. A storage medium storing instructions that, when executed on an electronic device, cause the electronic device to perform the method of any one of claims 1-7.
11. A program product comprising at least one of a program and instructions, wherein the at least one of a program and instructions, when executed by an electronic device, implements the steps of the method of any one of claims 1-7.
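
The three-way decision for a non-sampled frame described in claims 4 and 5 can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the source does not define how the global frame difference intensity index is computed, so a normalised mean absolute pixel difference is assumed here, and the names `frame_diff_index`, `decide`, `Action`, and the threshold values `t1` and `t2` are hypothetical. The optical-flow fine-tuning itself is not implemented.

```python
from enum import Enum
from typing import List

Frame = List[List[int]]  # grayscale frame, pixel values 0..255 (assumed format)


class Action(Enum):
    REUSE = "reuse"          # copy the previous frame's ROI unchanged
    FINE_TUNE = "fine_tune"  # adjust the previous ROI (e.g. by optical flow)
    NO_ROI = "no_roi"        # treat the whole frame as non-ROI


def frame_diff_index(prev: Frame, cur: Frame) -> float:
    """Assumed metric: mean absolute pixel difference, normalised to [0, 1]."""
    total = 0
    count = 0
    for row_prev, row_cur in zip(prev, cur):
        for a, b in zip(row_prev, row_cur):
            total += abs(a - b)
            count += 1
    return total / (255 * count) if count else 0.0


def decide(index: float, prev_roi_valid: bool,
           t1: float = 0.02, t2: float = 0.10) -> Action:
    """Map the index to an action; t1 and t2 are illustrative thresholds."""
    if index <= t1:
        # Small change: reuse the previous ROI (claim 4).
        return Action.REUSE
    if index <= t2 and prev_roi_valid:
        # Moderate change and a valid previous ROI: fine-tune it (claim 4).
        return Action.FINE_TUNE
    # Large change (claim 5). Falling back to NO_ROI when the previous ROI
    # is not valid in the middle band is an assumption of this sketch.
    return Action.NO_ROI
```

For example, `decide(frame_diff_index(prev, cur), prev_roi_valid=True)` yields `Action.REUSE` for two identical frames. Because claim 5 assigns the same outcome on both sides of the third threshold, the sketch needs only two thresholds.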

Description

Video encoding method, video encoding device, electronic device, storage medium and program product

Technical Field

The present disclosure relates to the field of cloud computing and video processing, and in particular to a video encoding method, apparatus, electronic device, storage medium, and program product.

Background

With the development of cloud computing and virtualization technologies, cloud mobile phone services are becoming an emerging mobile application solution. A cloud mobile phone migrates the storage, computation, and rendering of a local phone to the cloud, runs the operating system and applications in the cloud, and can be controlled from multiple kinds of clients, such as an app (application), an H5 page (HTML5, here meaning a mobile interactive web page), or a mini program. The user can install and run various applications on the cloud mobile phone, such as games, video playback, and news, and can quickly access the cloud mobile phone's resources from a mobile device.

In existing cloud mobile phone systems, the cloud mobile phone and the user terminal interact through video streams: typically, the cloud mobile phone encodes a video stream and sends it to the user terminal. However, the cloud mobile phone video stream encoding schemes in the related art generally suffer from high overall computation overhead and high encoding response latency, and are difficult to adapt to real deployment environments that require a high frame rate and low latency.

Disclosure of Invention

The present disclosure provides a video encoding method, a video encoding apparatus, an electronic device, a storage medium, and a program product, which can address the high overall computation overhead and high encoding response latency of the cloud mobile phone video stream encoding schemes in the related art.
In a first aspect, an embodiment of the present disclosure provides a video encoding method, including: sampling consecutive frames in a video stream to be processed to obtain sampled frames and non-sampled frames; performing region of interest (ROI) detection on the sampled frames based on touch driving and/or content perception, and identifying an ROI region in each sampled frame; reusing or fine-tuning the ROI region in the non-sampled frames through lightweight inter-frame difference analysis and ROI state judgment; and, during video encoding, encoding the ROI regions and the non-ROI regions in the sampled frames and the non-sampled frames differentially.

In a second aspect, an embodiment of the present disclosure provides a video encoding apparatus, including: a frame analysis module configured to sample consecutive frames in a video stream to be processed to obtain sampled frames and non-sampled frames; an ROI identification module configured to perform region of interest (ROI) detection on the sampled frames based on touch driving and/or content perception and to identify an ROI region in each sampled frame, the ROI identification module being further configured to reuse or fine-tune the ROI region in the non-sampled frames through lightweight inter-frame difference analysis and ROI state judgment; and an encoding module configured to encode the ROI regions and the non-ROI regions in the sampled frames and the non-sampled frames differentially during video encoding.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; wherein the electronic device is configured to perform the method of the first aspect.

In a fourth aspect, embodiments of the present disclosure provide a storage medium storing instructions that, when executed on an electronic device, cause the electronic device to perform the method of the first aspect.
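
As an illustration of the content-perception branch of the method of the first aspect, claim 3 names high-information-density region detection based on image entropy. A minimal sketch of one way to do this is given below: it computes the Shannon entropy of each block of a grayscale frame and returns the bounding box of the blocks whose entropy reaches a threshold. The block size, the threshold `min_entropy`, and the function names `block_entropy` and `entropy_roi` are assumptions made for this example and do not come from the disclosure.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale frame, pixel values 0..255 (assumed)
Roi = Tuple[int, int, int, int]  # (x, y, width, height)


def block_entropy(frame: Frame, x0: int, y0: int, size: int) -> float:
    """Shannon entropy (bits) of the pixel values in one block."""
    height, width = len(frame), len(frame[0])
    values = [frame[y][x]
              for y in range(y0, min(y0 + size, height))
              for x in range(x0, min(x0 + size, width))]
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_roi(frame: Frame, block: int = 4,
                min_entropy: float = 1.0) -> Optional[Roi]:
    """Bounding box of all blocks with entropy >= min_entropy, or None."""
    height, width = len(frame), len(frame[0])
    hits = [(x, y)
            for y in range(0, height, block)
            for x in range(0, width, block)
            if block_entropy(frame, x, y, block) >= min_entropy]
    if not hits:
        return None  # flat frame: no high-information region found
    left = min(x for x, _ in hits)
    top = min(y for _, y in hits)
    right = min(width, max(x for x, _ in hits) + block)
    bottom = min(height, max(y for _, y in hits) + block)
    return (left, top, right - left, bottom - top)
```

A single bounding box is the simplest possible output; a real detector would more likely keep several regions or merge the result with the motion-based and touch-based candidates that claim 3 describes.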
In a fifth aspect, embodiments of the present disclosure provide a program product comprising at least one of a program and instructions which, when executed by an electronic device, implements the steps of the method of the first aspect.

With the technical solution of the present disclosure, complex image content analysis need not be performed on every frame of the video stream, so repeated execution of high-load image analysis is avoided, the low-latency requirement of the cloud mobile phone scenario can be met, and stable ROI tracking can be achieved in combination with the ROI state.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a flowchart illustrating a video encoding method accordin