CN-121996117-A - Man-machine interaction method, electronic equipment and storage medium
Abstract
The application provides a man-machine interaction method, electronic equipment and a storage medium, relates to the technical field of man-machine interaction, and is used for reducing user operation steps and improving user experience. The method comprises: in response to detecting a user's data selection operation on the current display screen, acquiring the data content selected by the user and receiving an operation instruction input by the user; invoking a first large language model to perform the processing operation indicated by the operation instruction on the data content, to obtain a processing result of the data content; and displaying the processing result on the current display screen.
Inventors
- MIAO SHUMENG
- YE ZEPING
- ZHU ZIQUN
- WANG YINI
Assignees
- 南航数智科技(广东)有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-15
Claims (10)
- 1. A man-machine interaction method, applied to a user terminal, the method comprising: in response to detecting a user's selection operation on data on the current display screen, acquiring the data content selected by the user, and receiving an operation instruction input by the user, wherein receiving the operation instruction input by the user comprises: displaying a toolbar window in the current display screen, wherein the toolbar window comprises a plurality of operation controls and each operation control corresponds to one operation instruction, receiving the user's selection of a target operation control among the plurality of operation controls, and determining the operation instruction corresponding to the target operation control as the operation instruction input by the user; or, with different keys corresponding to different operation instructions, acquiring the working states of the keys during the selection operation, and determining the operation instruction corresponding to the key matched during the selection operation as the operation instruction input by the user; invoking a first large language model to perform the processing operation indicated by the operation instruction on the data content, to obtain a processing result of the data content; and displaying the processing result on the current display screen.
- 2. The method of claim 1, wherein invoking the first large language model to perform the processing operation indicated by the operation instruction on the data content, to obtain the processing result of the data content, comprises: constructing a structured prompt based on the operation instruction and the data content, wherein the structured prompt indicates the processing operation to be performed on the data content; and inputting the structured prompt into the first large language model to obtain the processing result of the first large language model on the data content.
- 3. The method of claim 1, wherein acquiring the user-selected data content comprises: determining that the user has performed a valid selection operation; and either calling an accessibility interface provided by the operating system to acquire the data content covered by the selection operation; or sending a simulated copy keyboard message to the operating system's global message queue, receiving the data content with which the clipboard is updated in response to the simulated copy keyboard message, and determining the clipboard's updated data content as the data content of the selection operation; or taking a screenshot of the region covered by the selection operation and recognizing the screenshot to obtain the selected data content.
- 4. The method according to claim 3, wherein determining that the user has performed a valid selection operation comprises: continuously detecting the coordinate displacement data and the cursor shape state of a mouse; and when the cursor shape state of the mouse is detected to be the selection shape state, and the coordinate displacement data indicate that the accumulated physical displacement of the mouse exceeds a preset distance, determining that the user has performed a valid selection operation.
- 5. The method of claim 3, wherein before sending the simulated copy keyboard message to the operating system's global message queue, the method further comprises: storing all content currently in the clipboard; wherein sending the simulated copy keyboard message to the operating system's global message queue, receiving the data content with which the clipboard is updated in response to the simulated copy keyboard message, and determining the clipboard's updated data content as the data content of the selection operation comprises: sending the simulated copy keyboard message to the operating system's global message queue and marking a preset trigger flag bit with a first value, wherein the simulated copy keyboard message indicates a copy operation, and the first value indicates that the copy operation corresponding to the current clipboard content was initiated by the user terminal; detecting a clipboard change event and checking whether the preset trigger flag bit holds the first value; and if the clipboard has changed and the preset trigger flag bit holds the first value, reading the data content from the clipboard and determining the data content read from the clipboard as the data content of the selection operation; and wherein after determining the data content read from the clipboard as the data content of the selection operation, the method further comprises: restoring the clipboard content stored before the simulated copy keyboard message was sent to the operating system's global message queue; and marking the preset trigger flag bit with a second value, wherein the second value indicates that the copy operation corresponding to the current clipboard content was not initiated by the user terminal.
- 6. The method of claim 1, wherein the toolbar window is located at the top layer of the desktop in the current display screen, or the toolbar window is embedded in the display interface of the currently active application; and wherein, in a case where the toolbar window is located at the top layer of the desktop in the current display screen, displaying the toolbar window in the current display screen comprises: determining the start point and the end point of the selection operation; calculating the vertical displacement difference between the start point and the end point; if the displacement difference is positive, determining the sum of the end point's ordinate and a preset variable as the anchor ordinate of the toolbar window, and displaying the toolbar window in the current display screen based on the anchor ordinate of the toolbar window, a preset window size and the display-screen boundary coordinates; and if the displacement difference is negative, determining the difference between the end point's ordinate and a target value as the anchor ordinate of the toolbar window, and displaying the toolbar window in the current display screen based on the anchor ordinate of the toolbar window, the preset window size and the display-screen boundary coordinates, wherein the target value is the sum of the toolbar window's height in the preset window size and the preset variable.
- 7. The method according to claim 2, wherein inputting the structured prompt into the first large language model to obtain the processing result of the first large language model on the data content comprises: determining, from a plurality of models, a first large language model that matches the structured prompt; and inputting the structured prompt into the first large language model to obtain the processing result of the first large language model on the data content.
- 8. The method of any of claims 1-7, wherein displaying the processing result on the current display screen comprises: creating a single-answer window on the current display screen, wherein the single-answer window comprises a replace-instruction control and a follow-up-question control; and displaying the processing result in the single-answer window; the method further comprising: upon receiving the user's selection of the replace-instruction control, closing the single-answer window so as to re-receive an operation instruction input by the user; upon receiving the user's selection of the follow-up-question control, creating a multi-round dialogue window, wherein the multi-round dialogue window comprises a display area and an input area, the display area displays historical dialogue information, and the historical dialogue information comprises the data content of the selection operation, the operation instruction input by the user and the processing result; and receiving a question input by the user in the input area, and invoking a second large language model to answer the question in combination with the historical dialogue information.
- 9. An electronic device, comprising a memory and a processor, the memory being coupled to the processor; wherein the memory stores computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any of claims 1-8.
- 10. A computer readable storage medium comprising computer instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 8.
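The valid-selection test of claim 4 (I-beam cursor plus accumulated drag distance above a preset threshold) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the cursor-shape constant and the pixel threshold are assumptions, and real code would query the OS for the cursor handle and mouse events.

```python
from dataclasses import dataclass

# Hypothetical cursor-shape label; a real implementation would compare
# the OS cursor handle against the text-selection (I-beam) cursor.
IBEAM_CURSOR = "ibeam"

@dataclass
class SelectionDetector:
    """Accumulates mouse displacement and decides whether a drag is a
    valid text-selection gesture: the cursor must be in the selection
    shape and total travel must exceed a preset distance."""
    min_distance: float = 5.0   # preset distance threshold (assumed, in pixels)
    _travelled: float = 0.0

    def on_move(self, dx: float, dy: float) -> None:
        # Accumulate physical displacement regardless of direction.
        self._travelled += (dx * dx + dy * dy) ** 0.5

    def is_valid_selection(self, cursor_shape: str) -> bool:
        return cursor_shape == IBEAM_CURSOR and self._travelled > self.min_distance
```

Accumulating displacement (rather than comparing only start and end points) matches the claim's "accumulated physical displacement" wording and avoids rejecting a long drag that happens to end near its start.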
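The clipboard round-trip of claim 5 (back up the clipboard, mark the trigger flag, send a simulated copy keystroke, read the selection on the next clipboard-change event, then restore and reset) can be sketched as below. The clipboard backend and keystroke sender are injected callables because the OS-specific parts (global message queue, Ctrl+C synthesis) are outside this sketch; their names are assumptions.

```python
class ClipboardCapture:
    """Sketch of the claim-5 flow. `clipboard` is any object with
    read()/write(); `send_copy_keystroke` is a callable standing in for
    posting a simulated copy keyboard message to the OS message queue."""

    def __init__(self, clipboard, send_copy_keystroke):
        self._clipboard = clipboard
        self._send_copy = send_copy_keystroke
        self._backup = None
        self._triggered = False        # the claim's preset trigger flag bit

    def request_selection(self) -> None:
        self._backup = self._clipboard.read()   # store current clipboard content
        self._triggered = True                  # first value: copy initiated by us
        self._send_copy()                       # simulated copy keyboard message

    def on_clipboard_change(self):
        # Only treat the change as our selection if we initiated the copy;
        # otherwise it was an ordinary user copy and must be ignored.
        if not self._triggered:
            return None
        content = self._clipboard.read()
        self._clipboard.write(self._backup)     # restore the saved clipboard
        self._triggered = False                 # second value: no pending copy
        return content
```

Restoring the backup after the read is what keeps the mechanism transparent to the user: the clipboard holds the same content after the capture as before it.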
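The anchor-ordinate rule of claim 6 can be written out as a small function, assuming screen coordinates where y grows downward (so a positive vertical displacement difference means a downward drag). The clamping to the screen boundary is an assumption of this sketch; the claim only states that the boundary coordinates are used.

```python
def toolbar_anchor_y(start_y: int, end_y: int, window_h: int,
                     margin: int, screen_h: int) -> int:
    """Computes the toolbar window's anchor ordinate per claim 6.
    `margin` plays the role of the claim's "preset variable".
    A downward drag places the toolbar just below the selection end;
    an upward drag places it just above (end minus window height plus
    margin). The result is clamped so the window stays on screen."""
    if end_y - start_y >= 0:               # downward (or level) drag
        anchor = end_y + margin
    else:                                   # upward drag
        anchor = end_y - (window_h + margin)
    return max(0, min(anchor, screen_h - window_h))
```

Placing the window on the side of the selection the drag is moving toward keeps it from covering the text the user just selected.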
Description
Man-machine interaction method, electronic equipment and storage medium

Technical Field

The present application relates to the field of man-machine interaction technologies, and in particular to a man-machine interaction method, an electronic device, and a storage medium.

Background

With the rapid development of artificial intelligence technology, large language models have become highly capable at natural language processing tasks, and how to use a large language model efficiently and conveniently to assist daily work and study has become an important subject in the technical field of man-machine interaction. When a user reads data content (such as a document, a web page, a picture or an electronic book) and wants a large language model's interpretation, translation, summary or extended information for specific content, a series of tedious operations is generally required: manually selecting the text, copying it, switching to a large language model application or browser tab, pasting the content, formulating the question and submitting it, before the processing result of the data content can finally be obtained. This process not only interrupts the user's existing workflow but also reduces the efficiency of information acquisition. At present, some large language model service providers offer browser plug-ins that allow users to invoke specific functions through a right-click menu after selecting text in a web page, which reduces the number of switching steps to some extent; however, these functions depend on a specific browser environment, cannot be used globally across all desktop applications of an operating system, and are therefore limited in application scenarios.
Large language model application software also supports invoking a floating window via a global shortcut key, so that the user can type a question into an interactive window; but this scheme still requires manual input or pasting by the user, is not connected to the user's data selection action, and lacks naturalness in man-machine interaction. A scheme is therefore needed that reduces the user's operation steps and improves the smoothness of human-computer interaction.

Disclosure of Invention

The application aims to provide a man-machine interaction method, electronic equipment and a storage medium for reducing user operation steps and improving user experience. In a first aspect, a human-computer interaction method is provided, applied to a user terminal. The method comprises: in response to detecting the user's selection operation on the current display screen, acquiring the data content selected by the user and receiving an operation instruction input by the user, wherein receiving the operation instruction input by the user comprises: displaying a toolbar window in the current display screen, the toolbar window comprising a plurality of operation controls with each operation control corresponding to one operation instruction, receiving the user's selection of a target operation control among the plurality of operation controls, and determining the operation instruction corresponding to the target operation control as the operation instruction input by the user; or, with different keys corresponding to different operation instructions, acquiring the working states of the keys during the selection operation, and determining the operation instruction corresponding to the key matched during the selection operation as the operation instruction input by the user; invoking the first large language model to perform the processing operation indicated by the operation instruction on the data content, to obtain a processing result of the data content; and displaying the processing result on the current display screen.

The technical scheme provided by the application has at least the following beneficial effects. The whole flow, from selecting data to displaying the processing result, is completed within the current display screen; the user does not need to perform multi-step cross-interface operations, and a closed feedback loop from selecting data to processing and display is formed, which avoids interrupting the user's existing workflow, improves the user's information-acquisition efficiency, improves the fluency of man-machine interaction with the large language model, and improves the user experience. The toolbar window and its operation controls provide the user with a visual interaction choice, lowering the operation threshold: no complex rules need to be memorized, and the large language model can be invoked efficiently. The selected operation and the key position are mutually matched, so that the user operation using t
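The structured-prompt construction of claim 2 can be sketched as a simple template lookup keyed by the operation instruction. The template texts and instruction names below are assumptions for illustration; the actual templates are not disclosed in the claims.

```python
# Hypothetical mapping from toolbar operation instructions to prompt
# templates; one operation control corresponds to one instruction.
PROMPT_TEMPLATES = {
    "translate": "Translate the following content into English:\n{content}",
    "summarize": "Summarize the following content concisely:\n{content}",
    "explain":   "Explain the following content in plain language:\n{content}",
}

def build_structured_prompt(instruction: str, content: str) -> str:
    """Builds a structured prompt from the operation instruction and the
    selected data content, per claim 2: the prompt indicates which
    processing operation the model should perform on the content."""
    template = PROMPT_TEMPLATES.get(instruction)
    if template is None:
        raise ValueError(f"unknown operation instruction: {instruction!r}")
    return template.format(content=content)
```

The resulting string is what would be fed to the first large language model; claim 7 then adds a model-routing step in which the model best matching the structured prompt is chosen from several candidates.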