CN-122027381-A - Device binding guidance method and apparatus based on a multimodal model, and storage medium

Abstract

The application discloses a device binding guidance method and apparatus based on a multimodal model, and a storage medium, relating to the technical field of smart home/smart family. In the method, a client acquires an operation instruction input by a user, parses the operation instruction to obtain interface information and/or voice information, and sends the interface information and/or voice information to a server. The server inputs the interface information and/or voice information into a multimodal large language model to obtain a text prompt, converts the text prompt into a voice prompt, and sends the text prompt and the voice prompt to the client. The client receives the text prompt and the voice prompt sent by the server and uses them to guide the user through the device binding operation. The method provides real-time, multimodal, and personalized guidance feedback for binding Internet of Things devices, making the binding process more convenient and smoother, reducing misoperation, and markedly improving efficiency.

Inventors

HUANG TAO
WANG ZHONGFENG
CHEN MOHAN

Assignees

海尔优家智能科技（北京）有限公司
青岛海尔科技有限公司
海尔智家股份有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2024-11-11

Claims (10)

1. A device binding guidance method based on a multimodal model, characterized in that the method is applied to a server and comprises: acquiring interface information and/or voice information sent by a client; inputting the interface information and/or the voice information into a multimodal large language model to obtain a text prompt, wherein the multimodal large language model is obtained by training according to a fixed device binding flow; converting the text prompt to obtain a voice prompt; and sending the text prompt and the voice prompt to the client, wherein the text prompt and the voice prompt are used to guide a user in performing a device binding operation.
2. The method of claim 1, wherein before inputting the interface information and/or the voice information into the multimodal large language model, the method further comprises: acquiring a user binding flow of a device to be bound; decomposing the binding flow to obtain a plurality of step keywords of the binding flow and the connection relations between the steps; and training a large language model according to the step keywords and the connection relations to obtain the multimodal large language model.
3. The method of claim 1, wherein converting the text prompt to obtain a voice prompt comprises: converting the text prompt to obtain semantic information; determining a matching template according to the semantic information and preset sentence templates, wherein the matching template is the preset sentence template that matches the semantic information; and synthesizing the semantic information and the matching template into the voice prompt using a speech synthesis service.
4. A device binding guidance method based on a multimodal model, characterized in that the method is applied to a client and comprises: acquiring an operation instruction input by a user, wherein the operation instruction comprises a voice instruction and/or a touch instruction; performing a binding flow check on the operation instruction and, if the operation instruction conforms to the binding flow, parsing the operation instruction to obtain interface information and/or voice information; sending the interface information and/or the voice information to a server, so that the server generates a text prompt and a voice prompt for the device binding flow based on the interface information and/or the voice information; and receiving the text prompt and the voice prompt sent by the server, wherein the text prompt and the voice prompt are used to guide the user in performing a device binding operation.
5. The method of claim 4, wherein parsing the operation instruction to obtain interface information and voice information comprises: denoising the voice instruction to obtain the voice information; determining the position and action of the touch instruction according to touch parameters of the touch instruction, wherein the touch parameters are coordinate parameters and pressure parameters of the touch instruction; and matching the position and the action against interface elements of a display interface to determine the interface information of the touch instruction, wherein the display interface is the interface on which the user inputs the touch instruction.
6. The method of claim 4, wherein the method further comprises: monitoring content on the display interface in real time and, when the binding flow is abnormal, acquiring the cause of the abnormality; and converting the cause of the abnormality to obtain interface information, and sending the interface information to the server.
7. A device binding guidance apparatus based on a multimodal model, applied to a server, comprising: an acquisition module, configured to acquire interface information and/or voice information sent by a client; an input module, configured to input the interface information and/or the voice information into a multimodal large language model to obtain a text prompt, wherein the multimodal large language model is obtained by training according to a fixed device binding flow; a processing module, configured to convert the text prompt to obtain a voice prompt; and a sending module, configured to send the text prompt and the voice prompt to the client, wherein the text prompt and the voice prompt are used to guide a user in performing a device binding operation.
8. A device binding guidance apparatus based on a multimodal model, applied to a client, comprising: an acquisition module, configured to acquire an operation instruction input by a user, wherein the operation instruction comprises a voice instruction and/or a touch instruction; a processing module, configured to perform a binding flow check on the operation instruction and, if the operation instruction conforms to the binding flow, parse the operation instruction to obtain interface information and/or voice information; a sending module, configured to send the interface information and/or the voice information to a server, so that the server generates a text prompt and a voice prompt for the device binding flow based on the interface information and/or the voice information; and a receiving module, configured to receive the text prompt and the voice prompt sent by the server, wherein the text prompt and the voice prompt are used to guide the user in performing a device binding operation.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program when run performs the method of any one of claims 1 to 6.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of claims 1 to 6 by means of the computer program.
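The client-side steps of claims 4 and 5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Element` layout, the `step` field, the `PRESS_THRESHOLD` value, and every function name are hypothetical, and the claims do not say how pressure maps to an action. The sketch also folds the binding flow check into element matching, because the touched element is needed to know which step the touch belongs to; voice denoising is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One interface element of the display interface (hypothetical layout)."""
    name: str
    x: int
    y: int
    width: int
    height: int
    step: str  # binding step this element belongs to (assumed field)

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


# Hypothetical elements of a binding screen.
ELEMENTS = [
    Element("select_device_button", 0, 0, 200, 80, "device selection"),
    Element("wifi_password_field", 0, 100, 200, 80, "network configuration"),
]

PRESS_THRESHOLD = 0.6  # assumed cut-off between a light tap and a firm press


def parse_touch(x, y, pressure):
    """Derive position and action from the coordinate and pressure parameters."""
    action = "press" if pressure >= PRESS_THRESHOLD else "tap"
    return (x, y), action


def match_element(position, elements):
    """Return the first element containing the position, or None."""
    for element in elements:
        if element.contains(*position):
            return element
    return None


def build_interface_info(x, y, pressure, expected_step):
    """Return interface information, or None if the touch fails the flow check."""
    position, action = parse_touch(x, y, pressure)
    element = match_element(position, ELEMENTS)
    if element is None or element.step != expected_step:
        return None  # does not conform to the binding flow; nothing is sent
    return {"element": element.name, "action": action, "step": element.step}
```

In this sketch a touch that lands outside every element, or on an element belonging to a different step than the one currently expected, produces no interface information, which mirrors the condition in claim 4 that parsing happens only when the instruction conforms to the binding flow.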

Description

Device binding guidance method and apparatus based on a multimodal model, and storage medium

Technical Field

The application relates to the field of smart home/smart family, and in particular to a device binding guidance method and apparatus based on a multimodal model, and a storage medium.

Background

With the rapid development of Internet of Things technology, more and more devices need to access a network, and users place increasingly high demands on the convenience and ease of use of device binding. Continued progress in voice technology has made voice guidance a more intuitive and natural mode of interaction. The binding process for an Internet of Things device generally includes device selection, network configuration, device setting, and final binding confirmation. It requires the user to input a large amount of information and to confirm repeatedly between steps, and connection or setting errors may occur at any time. Most existing technical solutions rely on static instructions or simple in-app prompts; these cannot adapt in real time to the specific operation the user is performing and have difficulty correcting user errors promptly. As a result, the installation and use experience is often poor, and misoperation is likely.

Disclosure of Invention

The application provides a device binding guidance method and apparatus based on a multimodal model, and a storage medium, to solve the problems in the prior art that the Internet of Things device binding process is highly complex and prone to misoperation, that users must repeatedly consult the instruction manual, and that the operation steps are cumbersome.
In a first aspect, the present application provides a device binding guidance method based on a multimodal model, applied to a server, including: acquiring interface information and/or voice information sent by a client; inputting the interface information and/or the voice information into a multimodal large language model to obtain a text prompt, wherein the multimodal large language model is obtained by training according to a fixed device binding flow; converting the text prompt to obtain a voice prompt; and sending the text prompt and the voice prompt to the client, wherein the text prompt and the voice prompt are used to guide a user in performing a device binding operation.

Optionally, before the interface information and/or the voice information is input into the multimodal large language model, the method further includes: acquiring a user binding flow of a device to be bound; decomposing the binding flow to obtain a plurality of step keywords of the binding flow and the connection relations between the steps; and training a large language model according to the step keywords and the connection relations to obtain the multimodal large language model.

Optionally, converting the text prompt to obtain a voice prompt includes: converting the text prompt to obtain semantic information; determining a matching template according to the semantic information and preset sentence templates, wherein the matching template is the preset sentence template that matches the semantic information; and synthesizing the semantic information and the matching template into the voice prompt using a speech synthesis service.
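The server-side steps of the first aspect can be sketched in Python as follows. This is only an illustration under stated assumptions: the trained multimodal large language model is replaced by a plain lookup over the connection relations, the speech synthesis service is replaced by a placeholder that returns encoded text, and the flow stages, sentence templates, and all function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fixed binding flow, using the four stages named in the Background.
BINDING_FLOW = [
    "device selection",
    "network configuration",
    "device setting",
    "binding confirmation",
]

# Hypothetical preset sentence templates, keyed by intent.
SENTENCE_TEMPLATES = {
    "next_step": "Please continue with {target}.",
    "finished": "The device is now bound.",
}


@dataclass
class Prompts:
    text: str
    voice: bytes


def decompose_flow(flow):
    """Return the step keywords and the connection relation (step -> next step)."""
    keywords = list(flow)
    connections = dict(zip(flow, flow[1:]))
    return keywords, connections


def generate_text_prompt(interface_info, connections):
    """Stand-in for the multimodal large language model: a lookup, not a model."""
    target = connections.get(interface_info["step"])
    return f"Next step: {target}" if target else "Binding complete"


def extract_semantics(text_prompt):
    """Convert the text prompt into semantic information (intent and target)."""
    prefix = "Next step: "
    if text_prompt.startswith(prefix):
        return {"intent": "next_step", "target": text_prompt[len(prefix):]}
    return {"intent": "finished"}


def fake_tts(sentence):
    """Placeholder for a speech synthesis service; returns UTF-8 bytes, not audio."""
    return sentence.encode("utf-8")


def to_voice_prompt(text_prompt, synthesize=fake_tts):
    """Pick the matching template, fill it in, and pass the sentence to synthesis."""
    semantic = extract_semantics(text_prompt)
    template = SENTENCE_TEMPLATES[semantic["intent"]]
    return synthesize(template.format(**semantic))


def handle_client_message(interface_info):
    """Produce the text prompt and voice prompt for one client message."""
    _, connections = decompose_flow(BINDING_FLOW)
    text = generate_text_prompt(interface_info, connections)
    return Prompts(text=text, voice=to_voice_prompt(text))
```

Passing the synthesis function as a parameter keeps the template-matching logic independent of whichever speech service is actually used; a real deployment would substitute a call to that service for `fake_tts`.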
In a second aspect, the present application provides a device binding guidance method based on a multimodal model, applied to a client, including: acquiring an operation instruction input by a user, wherein the operation instruction comprises a voice instruction and/or a touch instruction; performing a binding flow check on the operation instruction and, if the operation instruction conforms to the binding flow, parsing the operation instruction to obtain interface information and/or voice information; sending the interface information and/or the voice information to a server, so that the server generates a text prompt and a voice prompt for the device binding flow based on the interface information and/or the voice information; and receiving the text prompt and the voice prompt sent by the server, wherein the text prompt and the voice prompt are used to guide the user in performing a device binding operation.

Optionally, parsing the operation instruction to obtain interface information and voice information includes: denoising the voice instruction to obtain the voice information; determining the position and action of the touch instruction according to touch parameters of the touch instruction, wherein the touch parameters are coordinate parameters and pressure parameters of the touch instruction; and matching the position and the action against interface elements of a display interface to determine the interface information of the touch instruction, wherein the display interface is the interface on which the user inputs the touch instruction.

Optionally, the method further comprises: monitoring the cont