CN-121979465-A - Display device control method, display device and computer program product
Abstract
The application relates to the technical field of display devices and provides a display device control method, a display device, and a computer program product. The method comprises: recognizing a received voice control instruction to obtain an intention recognition result; determining a matching path from a constructed control path instruction database based on the intention recognition result, wherein each path data record stored in the database comprises an application name, a function description, an operation path, a coordinate sequence, and a semantic index; and executing the operation corresponding to the matching path. Because image recognition is no longer needed during the real-time voice-control interaction stage, and the matching path is instead determined from the control path instruction database, recognition delay is markedly reduced. Moreover, since the matching path contains a coordinate sequence, operations can be performed through coordinates rather than interface calls, replacing interface control and enabling stable cross-platform adaptation. The application thus solves the prior-art problems of difficult cross-platform adaptation and recognition delay.
Inventors
- Bian Zhonglin
- Xiao Zhou
- Xiao Daocan
- Ding Chongbin
Assignees
- Shenzhen Absen Optoelectronic Co., Ltd. (深圳市艾比森光电股份有限公司)
- Huizhou Absen Optoelectronic Co., Ltd. (惠州市艾比森光电有限公司)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2025-12-29
Claims (10)
- 1. A display device control method, characterized by comprising: recognizing a received voice control instruction to obtain an intention recognition result; determining a matching path from a constructed control path instruction database based on the intention recognition result, wherein the control path instruction database is a structured database storing a plurality of path data records, and each path data record comprises an application name, a function description, an operation path, a coordinate sequence, and a semantic index; and executing the operation corresponding to the matching path to control the display device.
- 2. The display device control method according to claim 1, wherein determining a matching path from the constructed control path instruction database based on the intention recognition result comprises: generating a retrieval request based on the intention recognition result; invoking a first large language model and controlling it to retrieve a plurality of candidate paths from the control path instruction database based on the retrieval request, wherein each candidate path carries a confidence level; and taking the candidate path with the highest confidence as the matching path.
- 3. The display device control method according to claim 1, wherein executing the operation corresponding to the matching path comprises: extracting a target operation path and a target coordinate sequence related to the operation from the matching path; and generating an executable script based on the target operation path and the target coordinate sequence, and executing the executable script.
- 4. The display device control method of claim 3, wherein executing the executable script comprises: simulating, in the step execution order defined by the executable script, a control operation on each interactable object located at the corresponding target coordinate, thereby controlling the display device, wherein the step execution order is determined by the operation path and each target coordinate is determined by the target coordinate sequence.
- 5. The display device control method according to any one of claims 1 to 4, characterized by further comprising, before recognizing the received voice control instruction to obtain the intention recognition result: in a non-real-time stage, acquiring a plurality of different application interface images of the display device; inputting each application interface image in turn into a visual language model for processing to obtain an interface analysis result corresponding to each application interface image, wherein the interface analysis result comprises interface interactable objects, object attribute information, and object coordinate information, the object attribute information comprises a semantic annotation of each interface interactable object, and the object coordinate information describes the coordinates of each interface interactable object in the corresponding application interface image; generating each operation path based on the semantic annotations and coordinate information of the interactable objects; and generating respective path data records based on the respective operation paths and storing them in the control path instruction database.
- 6. The display device control method according to claim 5, wherein generating respective path data based on the respective operation paths comprises: for any operation path, extracting semantic features from the semantic annotation corresponding to the operation path to obtain a function description of the interactable object; performing semantic expansion on the function description to obtain a semantic index of the interactable object; and associating the operation path with the function description and the semantic index to generate a corresponding path data record.
- 7. The display device control method of claim 5, wherein acquiring a plurality of different application interface images of the display device comprises: capturing, with a camera device, the graphical user interfaces of the display device in different scenarios to obtain an image sequence; and correspondingly, inputting each application interface image in turn into the visual language model for processing to obtain the interface analysis result corresponding to each application interface image comprises: inputting the image sequence into the visual language model for processing to reconstruct the hierarchical structure of the graphical user interface; and obtaining the interface analysis result based on that hierarchical structure.
- 8. The display device control method according to claim 1, further comprising, before the received voice control instruction is recognized: upon detecting an operation-learning flow request, performing a screen recording operation on the display device, wherein the screen recording operation captures the process of a user performing graphical-user-interface operations on the display device; stopping the screen recording operation when the graphical-user-interface operation is detected to be complete, and sending the recorded video to a server; obtaining target path data produced by the server processing the video with a multimodal large model, wherein the multimodal large model is a deep learning model trained on multiple data types such as text, images, video, and audio; and updating the control path instruction database based on the target path data.
- 9. A display device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the display device control method according to any one of claims 1 to 8 when executing the computer program.
- 10. A computer program product comprising a computer program which, when run, implements the display device control method of any one of claims 1 to 8.
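The path data record of claim 1 and the confidence-ranked retrieval of claim 2 can be sketched as follows. This is an illustrative sketch only: the class and function names are hypothetical, and the keyword-overlap scoring is a stand-in for the first large language model that claim 2 describes.

```python
from dataclasses import dataclass

# One path-data record as described in claim 1: application name, function
# description, operation path, coordinate sequence, and semantic index.
@dataclass
class PathData:
    app_name: str
    function_description: str
    operation_path: list       # e.g. ["open settings", "drag slider"]
    coordinate_sequence: list  # [(x, y), ...] screen coordinates, one per step
    semantic_index: set        # expanded keywords used for retrieval (claim 6)

def retrieve_candidates(database, query_terms):
    """Score each stored path by overlap between the query terms and its
    semantic index, returning (confidence, path) pairs sorted best-first.
    Claim 2 performs this retrieval with a large language model; the
    set-overlap score here is only a simple stand-in."""
    scored = []
    for path in database:
        overlap = len(query_terms & path.semantic_index)
        if overlap:
            scored.append((overlap / len(query_terms), path))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def match_path(database, query_terms):
    """Take the candidate path with the highest confidence (claim 2)."""
    candidates = retrieve_candidates(database, query_terms)
    return candidates[0][1] if candidates else None
```

A query whose terms overlap one record's semantic index returns that record; a query matching nothing returns `None`, which a real system would surface as a matching failure.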
Description
Display device control method, display device and computer program product

Technical Field

The present application relates to the field of display devices, and in particular to a display device control method, a display device, and a computer program product.

Background

Currently, when controlling a display device (such as a light-emitting diode (LED)/liquid crystal display (LCD) conference all-in-one machine, a smart television, or another device with a graphical interface), a user can generally do so by voice. However, existing voice control methods need to call a system API and/or perform image recognition in real time after speech recognition, so cross-platform adaptation is difficult and recognition delay exists.

Disclosure of Invention

The embodiments of the present application provide a display device control method, a display device, and a computer program product to solve the prior-art problems of difficult cross-platform adaptation and recognition delay, which arise because a system API (application programming interface) must be called and/or image recognition must be performed in real time after speech recognition. In a first aspect, an embodiment of the present application provides a display device control method, comprising: recognizing a received voice control instruction to obtain an intention recognition result; determining a matching path from a constructed control path instruction database based on the intention recognition result, wherein the control path instruction database is a structured database storing a plurality of path data records, and each path data record comprises an application name, a function description, an operation path, a coordinate sequence, and a semantic index; and executing the operation corresponding to the matching path to control the display device.
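The first-aspect flow above (recognize intent, determine the matching path, execute via the coordinate sequence) can be sketched end-to-end as below. All names are hypothetical, the stopword-based `recognize_intent` is a stand-in for a real speech/NLU pipeline, and `tap` stands for whatever touch-injection mechanism the target display device provides.

```python
def recognize_intent(voice_text):
    """Stand-in for intent recognition: lowercase the transcribed
    instruction and keep its content words (a real system would use a
    speech-recognition and intent-understanding stack)."""
    stopwords = {"the", "a", "to", "my", "please", "set"}
    return {w for w in voice_text.lower().split() if w not in stopwords}

def execute_path(path, tap):
    """Execute the matching path by simulating a tap at each coordinate in
    sequence; no per-application API is called."""
    for _step, (x, y) in zip(path["operation_path"], path["coordinate_sequence"]):
        tap(x, y)  # inject one touch event per operation-path step

def control_display(database, voice_text, tap):
    """End-to-end flow: intent terms -> best-matching path -> execution.
    Returns the chosen path record, or None when nothing matches."""
    terms = recognize_intent(voice_text)
    best, best_score = None, 0
    for path in database:
        score = len(terms & path["semantic_index"])
        if score > best_score:
            best, best_score = path, score
    if best is not None:
        execute_path(best, tap)
    return best
```

Because the real-time stage only matches terms against the prebuilt database and replays stored coordinates, no image recognition runs while the user waits, which is the delay reduction the paragraph above claims.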
Compared with the prior art, the embodiments of the application have the following beneficial effects. The display device control method provided by the embodiments obtains an intention recognition result by recognizing a received voice control instruction, determines a matching path from a constructed control path instruction database based on that result (the database being a structured store of path data records, each comprising an application name, a function description, an operation path, a coordinate sequence, and a semantic index), and executes the operation corresponding to the matching path. Unlike the prior art, the application no longer performs image recognition during the real-time voice-control interaction stage; instead, after intention recognition of the voice control instruction, it determines the matching path directly from the control path instruction database, markedly reducing computation and response delay. Meanwhile, traditional voice control relies on the standardized interfaces (such as API interfaces) provided by applications, and interface specifications differ greatly across operating systems and application developers, so an interface-calling module must be developed separately for each application during adaptation. By contrast, the matching path in the present application contains a coordinate sequence, so operations can be performed through coordinates, directly simulating a person tapping the screen: no application interface needs to be called, dependence on application interfaces is eliminated entirely, interface control is replaced, and stable cross-platform adaptation is achieved.
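To illustrate how a stored coordinate sequence replaces per-application interface calls, the sketch below turns coordinates into shell-level touch injections. The use of Android's `input tap` shell command is an assumption about the target platform made for illustration only; a display running another OS would need a different injector, while the stored coordinates stay the same.

```python
def taps_to_shell_commands(coordinate_sequence):
    """Convert a stored coordinate sequence into shell commands that
    simulate screen taps. `input tap X Y` is the Android shell tool for
    injecting a touch event at (X, Y); the same coordinates could feed a
    different injector on another platform."""
    return [f"input tap {x} {y}" for x, y in coordinate_sequence]
```

Since only coordinates are replayed, no interface-calling module has to be written per application, which is the cross-platform benefit described above.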
In an implementation of the first aspect, determining a matching path from the constructed control path instruction database based on the intention recognition result comprises: generating a retrieval request based on the intention recognition result; invoking a first large language model and controlling it to retrieve a plurality of candidate paths from the control path instruction database based on the retrieval request, wherein each candidate path carries a confidence level; and taking the candidate path with the highest confidence as the matching path. In the above embodiment, note that conventional path retrieval mostly uses keyword matching, which can only recognize words in the instruction that exactly match the path's semantic index; it cannot understand the core intent and context of the user's instruction, so differences in expression (such as colloquial phrasing or synonym substitution) easily cause matching deviations or failures. Compared with the prior art, when the first large langua