CN-121985153-A - Video picture processing terminal and video picture processing method

Abstract

The application provides a video picture processing terminal comprising an image processing module, a visual processing module, a voice processing module, a decision processing module and a picture processing module. The image processing module extracts picture characteristic information from image picture information, wherein the picture characteristic information comprises text information and object information. The visual processing module extracts visual characteristic information from a real-time local image. The voice processing module extracts voice characteristic information from real-time voice information. The decision processing module decides the updated content of a basic image picture according to the picture characteristic information, the visual characteristic information and the voice characteristic information. The picture processing module generates an enhanced image picture according to the updated content and the basic image picture, and the enhanced image picture is used as a video picture to be transmitted to an external remote device.
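The data flow described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the application: the dataclasses, the `decide_update` and `render_enhanced` functions, the `"highlight"` and `"pointer"` operation names, and the rule that a spoken instruction takes priority over a pointer position are assumptions made only to show how three feature streams might feed one decision step.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class PictureFeatures:
    """Picture characteristic information: text and object information."""
    texts: List[str]
    objects: List[str]


@dataclass
class VisualFeatures:
    """Visual characteristic information from the real-time local image."""
    pointer_xy: Optional[Tuple[int, int]] = None
    gesture: Optional[str] = None


@dataclass
class VoiceFeatures:
    """Voice characteristic information after semantic analysis."""
    intent: Optional[str] = None
    target: Optional[str] = None


def decide_update(picture: PictureFeatures,
                  visual: VisualFeatures,
                  voice: VoiceFeatures) -> Optional[Dict[str, Any]]:
    """Decide the updated content (assumed rule: voice first, then pointer)."""
    if voice.intent == "highlight" and voice.target in picture.texts:
        return {"op": "highlight", "target": voice.target}
    if visual.pointer_xy is not None:
        return {"op": "pointer", "xy": visual.pointer_xy}
    return None


def render_enhanced(base_picture: Dict[str, Any],
                    update: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the enhanced picture without modifying the basic picture."""
    enhanced = {
        "content": base_picture["content"],
        "overlays": list(base_picture.get("overlays", [])),
    }
    if update is not None:
        enhanced["overlays"].append(update)
    return enhanced
```

In this sketch the basic image picture is a plain dictionary and the enhanced image picture is a copy with an overlay list; a real terminal would composite pixels instead.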

Inventors

ZHANG ZHIXIONG
YANG GUIRONG
YANG QIAOYIN

Assignees

昆山联滔电子有限公司

Dates

Publication Date: 2026-05-05
Application Date: 2026-03-20

Claims (20)

1. A video picture processing terminal, comprising: an image processing module that extracts picture characteristic information from image picture information, wherein the picture characteristic information comprises text information and object information; a visual processing module that extracts visual characteristic information from a real-time local image; a voice processing module that extracts voice characteristic information from real-time voice information; a decision processing module for deciding updated content of a basic image picture according to the picture characteristic information, the visual characteristic information and the voice characteristic information; and a picture processing module that generates an enhanced image picture according to the updated content and the basic image picture, wherein the enhanced image picture is used as a video picture to be transmitted to an external remote device.
2. The video picture processing terminal of claim 1, wherein the image picture information is an application display picture of an electronic device, and the image picture information includes the basic image picture.
3. The video picture processing terminal of claim 1, wherein the image picture information is the real-time local image, the image processing module receives the real-time local image, and an image picture range and the text information and the object information within the image picture range are recognized from the real-time local image.
4. The video picture processing terminal of claim 1, wherein the image processing module corrects a plurality of pieces of the text information in the image picture information.
5. The video picture processing terminal of claim 1, wherein the visual processing module is configured to determine a feature type of the real-time local image and confirm an indication coordinate or a gesture command corresponding to the feature type, and the visual characteristic information includes the indication coordinate and/or the gesture command.
6. The video picture processing terminal of claim 1, wherein the voice processing module is configured to convert the real-time voice information into text content and perform semantic analysis on the text content to generate the voice characteristic information.
7. The video picture processing terminal of claim 1, wherein the picture processing module is configured to confirm primitive coordinates of an indication point according to the updated content, adjust a parameter configuration of the indication point, and generate the enhanced image picture including the indication point based on the primitive coordinates and the parameter configuration.
8. The video picture processing terminal of claim 7, wherein the picture processing module is configured to enable a history coordinate cache to generate transition coordinates from linear differences, the transition coordinates being the primitive coordinates.
9. The video picture processing terminal of claim 1, wherein the picture processing module is configured to determine positioning information and an operation type according to the updated content, determine the corresponding text information and/or object information according to the positioning information, and adjust a visual effect of the text information and/or the object information according to the operation type to generate the enhanced image picture.
10. The video picture processing terminal of claim 1, wherein the picture processing module generates the basic image picture from the text information and the object information.
11. The video picture processing terminal of claim 1, wherein the video picture processing terminal is coupled to the electronic device via a standard interface.
12. A video picture processing method, comprising: extracting picture characteristic information from image picture information, wherein the picture characteristic information comprises text information and object information; extracting visual characteristic information from a real-time local image; extracting voice characteristic information from real-time voice information; determining updated content of a basic image picture according to the picture characteristic information, the visual characteristic information and the voice characteristic information; and generating an enhanced image picture according to the updated content and the basic image picture, wherein the enhanced image picture is used as a video picture to be transmitted to an external remote device.
13. The video picture processing method of claim 12, wherein the image picture information is an application display picture of an electronic device, and the image picture information includes the basic image picture.
14. The video picture processing method of claim 12, wherein when the image picture information is the real-time local image, an image picture range and the text information and the object information within the image picture range are recognized from the real-time local image.
15. The video picture processing method of claim 12, wherein the basic image picture is generated based on the text information and the object information.
16. The video picture processing method of claim 12, wherein a feature type of the real-time local image is determined, and an indication coordinate or a gesture command corresponding to the feature type is confirmed, and the visual characteristic information includes the indication coordinate and/or the gesture command.
17. The video picture processing method of claim 12, wherein the real-time voice information is converted into text content and the text content is semantically analyzed to generate the voice characteristic information.
18. The video picture processing method of claim 12, wherein the video picture processing method further comprises: confirming primitive coordinates of an indication point according to the updated content; adjusting a parameter configuration of the indication point; and generating the enhanced image picture comprising the indication point based on the primitive coordinates and the parameter configuration.
19. The video picture processing method of claim 18, wherein a history coordinate cache is enabled to generate transition coordinates from linear differences, the transition coordinates being the primitive coordinates.
20. The video picture processing method of claim 12, wherein the video picture processing method further comprises: determining positioning information and an operation type according to the updated content; determining the corresponding text information and/or object information according to the positioning information; and updating the basic image picture according to the operation type to generate the enhanced image picture.
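The transition coordinates recited in claims 8 and 19 can be illustrated with a small sketch. It assumes that "linear differences" refers to linear interpolation between the last cached coordinate and the newly decided one, which the claims do not state explicitly. The `PointerSmoother` class, the `steps` and `history` parameters, and their default values are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, float]


class PointerSmoother:
    """Generates transition coordinates from a history coordinate cache."""

    def __init__(self, steps: int = 4, history: int = 8) -> None:
        self.steps = steps                              # points per transition
        self.cache: Deque[Point] = deque(maxlen=history)  # recent coordinates

    def update(self, target: Point) -> List[Point]:
        """Return the coordinates to draw while moving to `target`."""
        if not self.cache:
            # No history yet: draw the indication point at the target directly.
            self.cache.append(target)
            return [target]
        x0, y0 = self.cache[-1]
        x1, y1 = target
        # Linear interpolation (assumed meaning of "linear differences").
        path = [
            (x0 + (x1 - x0) * i / self.steps, y0 + (y1 - y0) * i / self.steps)
            for i in range(1, self.steps + 1)
        ]
        self.cache.append(target)
        return path
```

Each returned coordinate would serve as the primitive coordinates of the indication point for one output picture, so the point appears to glide rather than jump between positions.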

Description

Video picture processing terminal and video picture processing method

Technical Field

The present application relates to the field of video picture processing, and more particularly, to a video picture processing terminal and a video picture processing method.

Background

At present, remote conferences and remote teaching are often conducted by sharing a screen or playing a presentation, supplemented by a live image of the presenter, so that remote participants can follow the demonstrated content. Typically, during sharing, the presenter adds marks or indications to the shared content (e.g., a presentation or file) with a mouse or another input tool to prompt the remote participants. However, because the presenter must operate the input tool while sharing, the presenter cannot concentrate on the content itself, which makes explaining the displayed material inconvenient.

In addition, in the case of live projection, the presenter may indicate the projected content with a laser pointer or a pointing stick. However, remote participants can only roughly judge the indicated location from the live image and cannot accurately determine the content the presenter is indicating.

Therefore, providing a terminal and a method that increase interactivity and the clarity of indications is one of the problems to be solved in the art.

Disclosure of Invention

An embodiment of the application provides a video picture processing terminal comprising an image processing module, a visual processing module, a voice processing module, a decision processing module and a picture processing module. The image processing module extracts picture characteristic information from image picture information, wherein the picture characteristic information comprises text information and object information. The visual processing module extracts visual characteristic information from a real-time local image. The voice processing module extracts voice characteristic information from real-time voice information. The decision processing module decides the updated content of a basic image picture according to the picture characteristic information, the visual characteristic information and the voice characteristic information. The picture processing module generates an enhanced image picture according to the updated content and the basic image picture, and the enhanced image picture is used as a video picture to be transmitted to an external remote device.

An embodiment of the application provides a video picture processing method comprising: extracting picture characteristic information from image picture information, wherein the picture characteristic information comprises text information and object information; extracting visual characteristic information from a real-time local image; extracting voice characteristic information from real-time voice information; determining updated content of a basic image picture according to the picture characteristic information, the visual characteristic information and the voice characteristic information; and generating an enhanced image picture according to the updated content and the basic image picture, wherein the enhanced image picture is used as a video picture to be transmitted to an external remote device.

An embodiment of the application provides a video picture processing terminal comprising a memory, an image capturing device and a microprocessor. The memory stores a program of a video picture processing system. The image capturing device obtains a real-time local image. The microprocessor is coupled to the memory and the image capturing device and executes the program of the video picture processing system to implement a corresponding video picture processing method. The video picture processing method comprises: extracting picture characteristic information from image picture information, wherein the picture characteristic information comprises text information and object information; extracting visual characteristic information from the real-time local image; extracting voice characteristic information from real-time voice information; determining updated content of a basic image picture according to the picture characteristic information, the visual characteristic information and the voice characteristic information; and generating an enhanced image picture according to the updated content and the basic image picture, wherein the enhanced image picture is used as a video picture to be transmitted to an external remote device.

According to the embodiments of the application, the updated content of the basic image picture is determined according to the picture characteristic information, the visual characteristic information and the voice characteristic information, and the enhanced image picture is automatically generated