CN-121996191-A - Voice interaction method and electronic equipment

CN 121996191 A

Abstract

A voice interaction method and an electronic device are provided for improving the convenience of voice-based editing and improving user experience. The method comprises: displaying a voice input entry on a display screen; detecting a first selection operation of a user on the voice input entry, displaying an editing entry, and detecting first voice information input by the user to obtain a first input result; after detecting a second selection operation of the user on the editing entry, detecting second voice information input by the user to obtain a first editing instruction; editing a second input result according to the first editing instruction to obtain a third input result, wherein the second input result comprises the first input result and/or a historical input result; and displaying the third input result on the display screen.

Inventors

  • WU SIJU
  • LI RUI
  • YANG HUIXIONG
  • ZHOU WEI
  • LIN CHONGXI
  • Manpreet Singh Taka
  • Umapuret Singh

Assignees

  • Huawei Technologies Co., Ltd. (华为技术有限公司)

Dates

Publication Date
2026-05-08
Application Date
2024-11-05

Claims (18)

  1. A voice interaction method, applied to an electronic device having a display screen, comprising: displaying a voice input entry on the display screen; detecting a first selection operation of a user on the voice input entry, displaying an editing entry, and detecting first voice information input by the user to obtain a first input result; after detecting a second selection operation of the user on the editing entry, detecting second voice information input by the user to obtain a first editing instruction; editing a second input result according to the first editing instruction to obtain a third input result, wherein the second input result comprises the first input result and/or a historical input result; and displaying the third input result on the display screen.
  2. The method of claim 1, wherein the second voice information includes information of an original keyword in the second input result and/or information of a target keyword in the third input result; wherein the information of the original keyword comprises at least one of the following: the pinyin of the original keyword, a common character combination of the original keyword, information of the component parts of the original keyword, or a reference word of the original keyword; and the information of the target keyword includes at least one of: the pinyin of the target keyword, a common character combination of the target keyword, information of the constituent parts of the target keyword, or a reference word of the target keyword.
  3. The method of claim 2, wherein the information of the original keyword includes a reference word of the original keyword; and the method further comprises: determining the original keyword from the second input result according to the reference word of the original keyword.
  4. The method of claim 2, wherein the second voice information includes information of the target keyword; the method further comprises: determining the target keyword according to the information of the target keyword, and determining the original keyword according to the type and/or semantics of the target keyword; and the editing the second input result according to the first editing instruction to obtain a third input result includes: modifying the original keyword in the second input result into the target keyword to obtain the third input result.
  5. The method of any one of claims 2-4, further comprising: determining a plurality of candidate results of a keyword according to the second voice information, wherein the keyword comprises the original keyword and/or the target keyword; displaying the plurality of candidate results on the display screen; detecting a third selection operation of the user on a target candidate result among the plurality of candidate results; and determining the keyword according to the target candidate result.
  6. The method of claim 5, wherein the second voice information includes the pinyin of the keyword and information of the components of the keyword, the information of the components of the keyword including radical information; and the determining a plurality of candidate results of the keyword according to the second voice information includes: determining the radical of the keyword according to the radical information of the keyword; and determining the plurality of candidate results of the keyword according to the radical of the keyword and the pinyin of the keyword.
  7. The method of claim 6, wherein the information of the components of the keyword further includes pinyin of a non-radical part; and the determining a plurality of candidate results of the keyword according to the radical of the keyword and the pinyin of the keyword comprises: determining the plurality of candidate results of the keyword according to the radical of the keyword, the pinyin of the non-radical part, and the pinyin of the keyword.
  8. The method of any one of claims 2-7, wherein the original keyword comprises discontinuous text in the second input result; and the editing the second input result according to the first editing instruction to obtain a third input result includes: editing the discontinuous text and text information located between the discontinuous text according to the first editing instruction to obtain the third input result.
  9. The method of claim 8, wherein the text information between two pieces of the discontinuous text is a first type of symbol, the second voice information includes information indicating editing of the discontinuous text, and the first type of symbol is used to segment text information within a sentence; or the second voice information includes information for editing the discontinuous text and the text information located between the discontinuous text.
  10. The method of any one of claims 2-9, wherein the information of the original keyword includes the pinyin of the original keyword, the method further comprising: determining the keyword from the second input result according to the pinyin of the keyword and a cursor position.
  11. The method of claim 10, wherein the determining the keyword from the second input result according to the pinyin of the keyword and the cursor position comprises: determining the keyword within n1 characters located before the cursor position in the second input result according to the pinyin of the keyword, wherein n1 is a positive integer; and/or determining the keyword within n2 characters located after the cursor position in the second input result according to the pinyin of the keyword, wherein n2 is a positive integer.
  12. The method of any one of claims 2-11, wherein the information of the original keyword includes the pinyin of the original keyword, the method further comprising: determining the keyword according to the pinyin of the keyword and a voice input record and/or a pinyin input record of the second input result.
  13. The method of any one of claims 1-12, wherein the detecting the second voice information input by the user to obtain the first editing instruction comprises: after the second voice information is detected, recognizing the second voice information to obtain an input result corresponding to the second voice information; and performing semantic analysis on the input result corresponding to the second voice information to obtain the first editing instruction.
  14. The method of claim 13, wherein the performing semantic analysis on the input result corresponding to the second voice information to obtain the first editing instruction includes: determining one or more word segments in the input result corresponding to the second voice information through a word segmentation model; determining classification information corresponding to the word segments through an editing-parameter classification model, wherein the classification information of any word segment is used to indicate the attribute of the word segment in the first editing instruction; and determining the first editing instruction according to the word segments and the classification information.
  15. The method of claim 13, wherein the performing semantic analysis on the input result corresponding to the second voice information to obtain the first editing instruction includes: determining, through a large language model, the attribute in the first editing instruction of one or more word segments in the input result corresponding to the second voice information; and determining the first editing instruction according to the word segments and the attribute.
  16. An electronic device, comprising: a processor, a memory, and one or more programs; wherein the one or more programs are stored in the memory and comprise instructions which, when executed by the processor, cause the electronic device to perform the method of any one of claims 1-15.
  17. A computer-readable storage medium for storing a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1-15.
  18. A computer program product comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1-15.
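
As an informal illustration of the window search in claim 11 (not the patent's implementation), the sketch below looks for characters matching a pinyin cue within n1 characters before and n2 characters after the cursor position. The `PINYIN` table and all names are illustrative stand-ins for a real pinyin dictionary.

```python
# Toy sketch of claim 11: find candidate characters for a pinyin cue within a
# window of n1 characters before and n2 characters after the cursor position.
# PINYIN is an illustrative lookup table, not a real pinyin dictionary.
PINYIN = {"张": "zhang", "章": "zhang", "明": "ming", "天": "tian"}

def find_near_cursor(text: str, cursor: int, pinyin: str, n1: int = 5, n2: int = 5) -> list:
    """Return indices of characters near the cursor whose pinyin matches the cue."""
    lo, hi = max(0, cursor - n1), min(len(text), cursor + n2)
    return [i for i in range(lo, hi) if PINYIN.get(text[i]) == pinyin]

print(find_near_cursor("我叫章明", 4, "zhang"))  # [2]
```

Restricting the search to the cursor's neighborhood is what lets the method disambiguate homophones such as 张/章, both pronounced "zhang", by position rather than by pronunciation alone.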

Description

Voice interaction method and electronic equipment

Technical Field

The present application relates to the field of electronic devices, and in particular, to a voice interaction method and an electronic device.

Background

At present, voice input has become a main interaction mode of large-screen devices such as smart phones and tablet computers. However, under the influence of homophones, user accents, or environmental noise, conversion errors often occur when an electronic device recognizes a user's voice input and converts it into text. The user is then required to edit manually to correct the erroneous text, which is inconvenient. Therefore, how to improve the convenience of the voice editing mode, and thereby the user experience, remains an important problem to be solved.

Disclosure of Invention

The application provides a voice interaction method and an electronic device for improving the convenience of a voice editing mode and improving user experience.

In a first aspect, a voice interaction method is provided, which is applied to an electronic device with a display screen; the electronic device may be, for example, a mobile phone. The method comprises: displaying a voice input entry on the display screen; detecting a first selection operation of a user on the voice input entry, displaying an editing entry, and detecting first voice information input by the user to obtain a first input result; after detecting a second selection operation of the user on the editing entry, detecting second voice information input by the user to obtain a first editing instruction; editing a second input result according to the first editing instruction to obtain a third input result, wherein the second input result comprises the first input result and/or a historical input result; and displaying the third input result on the display screen.
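The interaction flow of the first aspect can be sketched informally as follows. All class and method names are hypothetical; a real implementation would involve speech recognition and UI components rather than plain strings.

```python
# Hedged sketch of the first-aspect flow: selecting the voice input entry
# starts dictation; selecting the editing entry makes the next utterance an
# editing instruction. All names are illustrative, not the patent's code.
class VoiceSession:
    def __init__(self):
        self.result = ""        # second input result (current and/or history)
        self.edit_mode = False

    def select_voice_entry(self):   # first selection operation
        self.edit_mode = False      # dictation mode; the editing entry is shown

    def select_edit_entry(self):    # second selection operation
        self.edit_mode = True       # next voice input is an editing instruction

    def on_voice(self, text, original=None, target=None):
        if self.edit_mode and original is not None:
            # apply the first editing instruction -> third input result
            self.result = self.result.replace(original, target)
        else:
            self.result += text     # append the first input result

s = VoiceSession()
s.select_voice_entry()
s.on_voice("明天去北京")
s.select_edit_entry()
s.on_voice("把北京改成上海", original="北京", target="上海")
print(s.result)  # 明天去上海
```

The key design point is that the editing entry switches the interpretation of the next utterance: the same microphone input is treated either as dictated content or as a command over the existing result.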
Based on the method of the first aspect, after detecting the first selection operation of the user, the mobile phone obtains the voice information input by the user and displays the editing entry on the display screen. After detecting the second selection operation of the user on the editing entry, the mobile phone can also acquire the second voice information input by the user, where the second voice information is the voice of an editing instruction. The electronic device can then derive the editing instruction from the second voice information and edit the input result accordingly. In this editing mode, the user does not need to modify the input result manually and only needs to input the editing instruction by voice, which improves the convenience of the voice editing mode and improves the user experience.

In one possible design, the second voice information includes information of the original keyword in the second input result and/or information of the target keyword in the third input result. The information of the original keyword comprises at least one of: the pinyin of the original keyword, a common character combination of the original keyword, information of the components of the original keyword, semantic description information of the original keyword, or a reference word, paraphrase, synonym, or similar word of the original keyword. This information supports the electronic device in recognizing the original keyword accurately and efficiently from the voice information describing it.
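One of the cue types listed above is a reference word (e.g. "the name", "the place") rather than the keyword itself. The toy sketch below shows the idea; the hard-coded `ENTITY_TAGS` table stands in for a real entity recognizer, and all data and names are illustrative only.

```python
# Toy sketch of resolving the original keyword from a reference word such as
# "name" or "place": ENTITY_TAGS is a hard-coded stand-in for a real entity
# recognizer; all data and names are illustrative only.
ENTITY_TAGS = {"小章": "name", "上海": "place"}  # toy entity annotations

def resolve_by_reference(second_input_result: str, reference: str):
    """Return the span in the input result whose entity type matches the reference word."""
    for span, tag in ENTITY_TAGS.items():
        if tag == reference and span in second_input_result:
            return span
    return None

print(resolve_by_reference("明天和小章去上海", "name"))  # 小章
```

This matches the design goal stated above: the keyword can be located even when the user's editing utterance contains no pinyin for it, only a description of what kind of thing it is.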
The information of the target keyword comprises at least one of: the pinyin of the target keyword, a common character combination of the target keyword, information of the components of the target keyword, semantic description information of the target keyword, or a reference word, paraphrase, synonym, or similar word of the target keyword. This information supports the electronic device in recognizing the target keyword accurately and efficiently from the voice information describing it.

In one possible design, the information of the original keyword includes a reference word of the original keyword, and accordingly the electronic device may determine the original keyword from the second input result according to that reference word. Based on this design, the keyword can be determined accurately and efficiently even when the second voice information does not contain the pinyin of the original keyword.

In one possible design, the second voice information includes information of the target keyword; accordingly, the electronic device may determine the target keyword according to the information of the target keyword in the second voice information, and determine the original keyword from the second input result according to the type and/or semantics of the target keyword. Therefore, the accurate and efficient determination of the original keywords