CN-121996227-A - Graphical configuration method, device, equipment and storage medium for voice function

Abstract

The application discloses a graphical configuration method, device, equipment and storage medium for a voice function, relating to the technical field of intelligent voice interaction. The method comprises: in response to a user's selection of a standardized component in a graphical interface, adding the selected standardized component to a flow canvas, where the flow canvas is a visual area that displays the logical relationships between standardized components and a standardized component is a preset voice function module; converting the standardized components into a configuration model of the corresponding voice function flow based on the user's connection operations on the flow canvas; converting the configuration model into corresponding executable code through a code generator; and integrating the executable code into a runtime support platform, which is an environment that supports the voice function flow corresponding to the executable code. By combining a graphical interface with automatic code generation, the method significantly lowers the technical threshold for configuring voice functions.

Inventors

HUANG TAO
LI QINGQING
YIN DESHUAI
WANG MIAO

Assignees

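Qingdao Haier Technology Co., Ltd. (青岛海尔科技有限公司)
Haier Uplus Intelligent Technology (Beijing) Co., Ltd. (海尔优家智能科技（北京）有限公司)
Qingdao Haier Smart Home Appliance Technology Co., Ltd. (青岛海尔智能家电科技有限公司)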

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-25

Claims (10)

1. A method for graphically configuring a voice function, comprising: in response to a user's selection operation on a standardized component in a graphical interface, adding the selected standardized component to a flow canvas, wherein the flow canvas is a visual area for displaying the logical relationships between standardized components, and the standardized component is a preset voice function module; converting the standardized components into a configuration model of a corresponding voice function flow based on a connection operation of the user on the flow canvas; converting the configuration model into corresponding executable code by means of a code generator; and integrating the executable code into a runtime support platform, wherein the runtime support platform is a platform supporting the voice function flow corresponding to the executable code.
2. The method of claim 1, wherein converting the standardized components into a configuration model of a corresponding voice function flow based on the user's connection operation on the flow canvas comprises: based on the connection operation, acquiring, from the standardized components, a source standardized component and a corresponding target standardized component, wherein the source standardized component is the standardized component whose output port is triggered by the connection operation, and the target standardized component is the standardized component whose input port is triggered by the connection operation; establishing a target logical relationship between the standardized components based on the source standardized component and the corresponding target standardized component; and obtaining the configuration model of the voice function flow based on the standardized components and the target logical relationship.
3. The method of claim 2, wherein, before obtaining the configuration model of the voice function flow based on the standardized components and the target logical relationship, the method further comprises: determining a context feature vector according to component information of the standardized components moved onto the flow canvas; inputting the context feature vector into a pre-trained context-aware model to predict the component type of the associated standardized component to be configured next; acquiring at least one recommended standardized component according to the predicted component type, and displaying the at least one recommended standardized component on the graphical interface; and in response to a user's selection operation on the at least one recommended standardized component, adding the corresponding recommended standardized component to the flow canvas.
4. The method of any one of claims 1 to 3, wherein, before adding the selected standardized component to the flow canvas in response to the user's selection operation on the standardized component in the graphical interface, the method further comprises: determining a voice function flow to be created according to a configuration request input by the user; querying a voice function model library to determine whether a reference voice function model associated with the voice function flow exists, wherein the voice function models in the voice function model library are determined based on historical executable code generated from historical configuration requests; if so, invoking the reference voice function model and obtaining a target voice function model in response to the user's adjustment operation on the reference voice function model; and integrating the target voice function model into the runtime support platform.
5. The method of claim 2, wherein, after integrating the executable code into the runtime support platform, the method further comprises: when a user's modification operation on the executable code is detected, parsing the modification operation into a changed configuration model and generating a corresponding code change log; storing the code change log and the changed configuration model to obtain a version chain, wherein the version chain is used to roll back to the initial configuration model when the changed configuration model contains configuration errors; comparing the configuration model with the changed configuration model to obtain difference data; and displaying the configuration model and the changed configuration model on the flow canvas, with the difference data highlighted according to preset parameters.
6. The method of claim 1, wherein, after integrating the executable code into the runtime support platform, the method further comprises: when the executable code is integrated into the runtime support platform for the first time, or when an adjustment to the executable code is detected, obtaining a corresponding test voice instruction; inputting the test voice instruction through a test simulator and executing the test voice instruction based on the executable code to obtain a corresponding test result; and comparing the test result with an expected result corresponding to the voice function flow, and outputting prompt information when the comparison indicates that they are inconsistent.
7. The method of claim 6, wherein obtaining the corresponding test voice instruction comprises: acquiring an initial test voice instruction corresponding to the executable code or the adjusted executable code, wherein the initial test voice instruction is a noise-free digital voice signal; selecting at least one noise type from a preset environmental noise library; and performing spectral superposition of the noise signal corresponding to the selected noise type onto the initial test voice instruction at a preset signal-to-noise ratio, to generate a noisy test voice instruction.
8. A graphical configuration device for a voice function, comprising an acquisition module, a conversion module and a test module, wherein: the acquisition module is configured to add, in response to a user's selection operation on a standardized component in a graphical interface, the selected standardized component to a flow canvas, wherein the flow canvas is a visual area for displaying the logical relationships between standardized components, and the standardized component is a preset voice function module; the acquisition module is further configured to convert the standardized components into a configuration model of a corresponding voice function flow based on a connection operation of the user on the flow canvas; the conversion module is configured to convert the configuration model into corresponding executable code by means of a code generator; and the test module is configured to integrate the executable code into a runtime support platform, wherein the runtime support platform is an environment supporting the voice function flow corresponding to the executable code.
9. An electronic device, comprising a processor and a memory communicatively coupled to the processor, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the graphical configuration method for a voice function according to any one of claims 1 to 7.
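Claim 3 leaves the "pre-trained context-aware model" unspecified. The sketch below swaps in a deliberately simple stand-in, a first-order transition-count model fitted on hypothetical historical flows, to show how a context feature vector (here, per-type component counts over an assumed vocabulary `VOCAB`) can drive next-component recommendations. `VOCAB`, `TransitionModel` and the history data are illustrative assumptions, not the applicant's model.

```python
from collections import Counter, defaultdict

# Hypothetical vocabulary of standardized component types.
VOCAB = ["wake_word", "asr", "intent", "dialog", "tts", "device_control"]


def context_vector(canvas_types):
    # Context feature vector: how many components of each type are already on the canvas.
    counts = Counter(canvas_types)
    return [counts[t] for t in VOCAB]


class TransitionModel:
    """Stand-in for a pre-trained context-aware model: next-type counts from past flows."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def fit(self, historical_flows):
        for flow in historical_flows:
            for current, nxt in zip(flow, flow[1:]):
                self.transitions[current][nxt] += 1
        return self

    def predict(self, canvas_types, top_k=2):
        # Rank candidate next types by how often they followed the last-placed type,
        # skipping types that the context vector shows are already on the canvas.
        if not canvas_types:
            return []
        vector = context_vector(canvas_types)
        ranked = sorted(self.transitions[canvas_types[-1]].items(), key=lambda kv: (-kv[1], kv[0]))
        return [t for t, _ in ranked if vector[VOCAB.index(t)] == 0][:top_k]


history = [
    ["wake_word", "asr", "intent", "tts"],
    ["wake_word", "asr", "intent", "device_control", "tts"],
    ["wake_word", "asr", "intent", "dialog", "tts"],
    ["wake_word", "asr", "intent", "device_control"],
]
recommender = TransitionModel().fit(history)
print(recommender.predict(["wake_word", "asr", "intent"]))  # ['device_control', 'dialog']
```

In an interface, the returned types would populate a recommendation panel, and selecting one would go through the same placement routine as a manual selection.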
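The reference-model lookup of claim 4 might look like the following. The claim does not say how a configuration request is matched against the voice function model library; this sketch assumes keyword tags on each library entry and swaps in Jaccard similarity with an arbitrary threshold. `MODEL_LIBRARY`, its entries, `find_reference_model` and `adjust_model` are hypothetical.

```python
import copy

# Hypothetical library: each entry is assumed to be distilled from previously generated code.
MODEL_LIBRARY = [
    {"name": "light_control", "tags": {"light", "lamp", "switch"},
     "model": {"intent.domain": "lighting", "tts.voice": "female"}},
    {"name": "weather_query", "tags": {"weather", "forecast", "temperature"},
     "model": {"intent.domain": "weather", "tts.voice": "male"}},
]


def find_reference_model(request, library=MODEL_LIBRARY, min_score=0.2):
    # Score each entry by Jaccard similarity between request words and entry tags.
    words = set(request.lower().split())
    best, best_score = None, 0.0
    for entry in library:
        score = len(words & entry["tags"]) / len(words | entry["tags"])
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= min_score else None


def adjust_model(reference, changes):
    # The user's adjustments are applied to a copy so the library entry stays untouched.
    target = copy.deepcopy(reference["model"])
    target.update(changes)
    return target


reference = find_reference_model("switch on the light")
if reference is not None:
    target = adjust_model(reference, {"tts.voice": "male"})
    print(reference["name"], target)  # light_control {'intent.domain': 'lighting', 'tts.voice': 'male'}
```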
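A version chain with change logs, rollback to the initial model and difference highlighting (claim 5) can be sketched over flat configuration models. This assumes models are flat dictionaries keyed by dotted setting names, and that parsing a code modification back into a changed model has already happened; `VersionChain`, `diff_models` and `render_diff` are illustrative names, and the `>>` marker stands in for the unspecified "preset parameters" used for highlighting.

```python
import copy
from dataclasses import dataclass, field


def diff_models(old, new):
    # Key-level difference between two flat configuration models; absent keys show as None.
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


def render_diff(diff, marker=">>"):
    # Highlight each changed setting with a preset marker for display on the canvas.
    return [f"{marker} {key}: {before!r} -> {after!r}" for key, (before, after) in diff.items()]


@dataclass
class VersionChain:
    versions: list = field(default_factory=list)    # snapshots of configuration models
    change_log: list = field(default_factory=list)  # one log entry per committed change

    def commit(self, model, note):
        previous = self.versions[-1] if self.versions else {}
        self.versions.append(copy.deepcopy(model))
        self.change_log.append(
            {"version": len(self.versions) - 1, "note": note, "diff": diff_models(previous, model)}
        )

    def rollback_to_initial(self):
        # Used when a changed model turns out to contain configuration errors.
        initial, current = self.versions[0], self.versions[-1]
        self.change_log.append(
            {"version": 0, "note": "rollback to initial", "diff": diff_models(current, initial)}
        )
        del self.versions[1:]
        return copy.deepcopy(initial)


chain = VersionChain()
initial = {"asr.language": "zh-CN", "tts.voice": "female", "tts.speed": 1.0}
chain.commit(initial, "generated from canvas")
edited = {**initial, "tts.voice": "male", "tts.speed": 1.5}
chain.commit(edited, "user edited generated code")
for line in render_diff(diff_models(initial, edited)):
    print(line)
restored = chain.rollback_to_initial()
```

Keeping the rollback itself as a log entry, rather than silently truncating history, leaves an audit trail of why the configuration changed.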
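The regression check of claim 6 can be sketched as follows. For brevity it assumes the test simulator has already turned each test voice instruction into a transcript, so test cases are (transcript, expected result) pairs; an adjustment to the executable code is detected by comparing SHA-256 digests of its source. `RegressionTrigger`, `run_voice_tests` and `fake_flow` are hypothetical.

```python
import hashlib


class RegressionTrigger:
    """Signals a test run on first integration or whenever the generated source changes."""

    def __init__(self):
        self.last_digest = None

    def should_test(self, source):
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        changed = digest != self.last_digest  # True on the first call, since last_digest is None
        self.last_digest = digest
        return changed


def run_voice_tests(execute, cases):
    # Execute each test instruction and collect a prompt for every mismatch.
    prompts = []
    for instruction, expected in cases:
        actual = execute(instruction)
        if actual != expected:
            prompts.append(f"Mismatch for {instruction!r}: expected {expected!r}, got {actual!r}")
    return prompts


def fake_flow(utterance):
    # Stand-in for the integrated executable code.
    return "light_on" if "light" in utterance else "unknown"


cases = [("turn on the light", "light_on"), ("open the curtain", "curtain_open")]
trigger = RegressionTrigger()
if trigger.should_test("def run_flow(): ..."):
    for prompt in run_voice_tests(fake_flow, cases):
        print(prompt)  # Mismatch for 'open the curtain': expected 'curtain_open', got 'unknown'
```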
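Claim 7's noise augmentation can be illustrated with NumPy: scale a noise signal so the mix reaches the preset signal-to-noise ratio, then add the two spectra and transform back. Because the FFT is linear, summing spectra gives the same result as summing waveforms; the spectral form is kept only to mirror the claim's wording. The 440 Hz tone standing in for a clean test instruction, the 16 kHz sample rate and the two-entry `NOISE_LIBRARY` are assumptions for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate in Hz
rng = np.random.default_rng(0)

# Hypothetical environmental noise library: name -> generator of n samples.
NOISE_LIBRARY = {
    "white": lambda n: rng.standard_normal(n),
    "mains_hum": lambda n: np.sin(2 * np.pi * 50 * np.arange(n) / SAMPLE_RATE),
}


def add_noise_at_snr(clean, noise, snr_db):
    """Superimpose the noise spectrum on the clean spectrum at the requested SNR (dB)."""
    clean = np.asarray(clean, dtype=float)
    noise = np.resize(np.asarray(noise, dtype=float), clean.shape)  # loop or trim to length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # Choose scale so that 10 * log10(p_clean / (scale**2 * p_noise)) == snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    spectrum = np.fft.rfft(clean) + np.fft.rfft(scale * noise)
    return np.fft.irfft(spectrum, n=clean.size)


def measured_snr_db(clean, noisy):
    return 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))


t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clean = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for a noise-free test instruction
for noise_type in ["white", "mains_hum"]:
    noisy = add_noise_at_snr(clean, NOISE_LIBRARY[noise_type](clean.size), snr_db=10)
    print(noise_type, round(float(measured_snr_db(clean, noisy)), 2))  # 10.0 for both types
```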

Description

Graphical configuration method, device, equipment and storage medium for voice function

Technical Field

The application relates to the technical field of intelligent voice interaction, and in particular to a graphical configuration method, device, equipment and storage medium for a voice function.

Background

In scenarios such as intelligent customer service systems, voice assistant development, smart home control, in-vehicle voice systems and multilingual voice services, users or developers need to build and debug voice functions quickly, for example by defining the recognition logic of voice instructions, designing voice interaction flows and configuring speech synthesis parameters. In the traditional development mode, a developer must hand-write code in a programming language (such as Python or JavaScript) against a voice SDK (such as the Google Assistant SDK or Azure Speech Services), implementing the logic of the voice function line by line. This requires strong programming skills as well as a deep understanding of underlying technologies such as speech recognition and natural language processing (NLP). In addition, during debugging the developer must repeatedly modify the code and redeploy the system, which lengthens the development cycle and raises costs. Existing voice function configuration schemes therefore rely on code: they require writing large amounts of code or maintaining complex configuration files, are unfriendly to non-technical users, and have a high threshold of use.
Disclosure of Invention

The application provides a graphical configuration method, device, equipment and storage medium for a voice function, to solve the problem that existing code-dependent voice function configuration schemes have a high threshold of use because they require writing large amounts of code or maintaining complex configuration files.

In a first aspect, the present application provides a method for graphically configuring a voice function, including: in response to a user's selection operation on a standardized component in a graphical interface, adding the selected standardized component to a flow canvas, wherein the flow canvas is a visual area for displaying the logical relationships between standardized components, and the standardized component is a preset voice function module; converting the standardized components into a configuration model of a corresponding voice function flow based on a connection operation of the user on the flow canvas; converting the configuration model into corresponding executable code by means of a code generator; and integrating the executable code into a runtime support platform, wherein the runtime support platform is an environment supporting the voice function flow corresponding to the executable code.
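The four steps of the first aspect (place components, connect them, generate code, integrate) can be sketched as below. This is a minimal illustration under stated assumptions, not the applicant's implementation: the component names in `COMPONENT_LIBRARY`, the port names, the dictionary layout of the configuration model, and the helpers `generate_code` and `integrate` are all hypothetical. The code generator orders components with Kahn's topological sort and emits a Python function; integration is stood in for by compiling that source with `exec`. It assumes a single linear chain in which each component's output feeds the next.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical palette of preset voice function modules ("standardized components")
# with the input/output ports each one exposes.
COMPONENT_LIBRARY = {
    "wake_word": {"inputs": [], "outputs": ["audio"]},
    "asr": {"inputs": ["audio"], "outputs": ["text"]},
    "intent": {"inputs": ["text"], "outputs": ["intent"]},
    "tts": {"inputs": ["intent"], "outputs": []},
}


@dataclass
class FlowCanvas:
    nodes: dict = field(default_factory=dict)  # node id -> component type
    edges: list = field(default_factory=list)  # (source id, output port, target id, input port)

    def add_component(self, node_id, component_type):
        # Selection operation: place a preset component on the canvas.
        if component_type not in COMPONENT_LIBRARY:
            raise ValueError(f"unknown component: {component_type}")
        self.nodes[node_id] = component_type

    def connect(self, source, out_port, target, in_port):
        # Connection operation: source must expose the output port, target the input port.
        if out_port not in COMPONENT_LIBRARY[self.nodes[source]]["outputs"]:
            raise ValueError(f"{source} has no output port {out_port!r}")
        if in_port not in COMPONENT_LIBRARY[self.nodes[target]]["inputs"]:
            raise ValueError(f"{target} has no input port {in_port!r}")
        self.edges.append((source, out_port, target, in_port))

    def to_config_model(self):
        # Configuration model: the components plus the logical links between them.
        return {
            "components": [{"id": n, "type": t} for n, t in self.nodes.items()],
            "links": [{"from": s, "out": o, "to": t, "in": i} for s, o, t, i in self.edges],
        }


def topological_order(model):
    # Kahn's algorithm: emit a component only after all of its upstream components.
    ids = [c["id"] for c in model["components"]]
    indegree = {i: 0 for i in ids}
    successors = {i: [] for i in ids}
    for link in model["links"]:
        successors[link["from"]].append(link["to"])
        indegree[link["to"]] += 1
    queue = deque(i for i in ids if indegree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(ids):
        raise ValueError("flow contains a cycle")
    return order


def generate_code(model):
    # Toy code generator: one handler call per component, in dependency order.
    types = {c["id"]: c["type"] for c in model["components"]}
    lines = ["def run_flow(handlers, payload):"]
    for node in topological_order(model):
        lines.append(f"    payload = handlers[{types[node]!r}](payload)")
    lines.append("    return payload")
    return "\n".join(lines)


def integrate(source):
    # Stand-in for deployment to a runtime support platform: compile into a namespace.
    namespace = {}
    exec(compile(source, "<generated_flow>", "exec"), namespace)
    return namespace["run_flow"]


canvas = FlowCanvas()
for node_id, ctype in [("n1", "wake_word"), ("n2", "asr"), ("n3", "intent"), ("n4", "tts")]:
    canvas.add_component(node_id, ctype)
canvas.connect("n1", "audio", "n2", "audio")
canvas.connect("n2", "text", "n3", "text")
canvas.connect("n3", "intent", "n4", "intent")
model = canvas.to_config_model()
source = generate_code(model)
run_flow = integrate(source)
handlers = {name: (lambda tag: lambda x: f"{x}>{tag}")(name) for name in COMPONENT_LIBRARY}
print(run_flow(handlers, "audio"))  # audio>wake_word>asr>intent>tts
```

Emitting source text and then compiling it keeps the generated artifact inspectable, which matters for the later steps that version and test the executable code; a real runtime support platform would deploy the code rather than `exec` it in-process.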
In one possible implementation, converting the standardized components into the configuration model of the corresponding voice function flow based on the user's connection operation on the flow canvas includes: based on the connection operation, acquiring, from the standardized components, a source standardized component and a corresponding target standardized component, wherein the source standardized component is the standardized component whose output port is triggered by the connection operation, and the target standardized component is the standardized component whose input port is triggered by the connection operation; establishing a target logical relationship between the standardized components based on the source standardized component and the corresponding target standardized component; and obtaining the configuration model of the voice function flow based on the standardized components and the target logical relationship.

In one possible implementation, before obtaining the configuration model of the voice function flow based on the standardized components and the target logical relationship, the method further includes: determining a context feature vector according to component information of the standardized components moved onto the flow canvas; inputting the context feature vector into a pre-trained context-aware model to predict the component type of the associated standardized component to be configured next; acquiring at least one recommended standardized component according to the predicted component type, and displaying it on the graphical interface; and in response to a user's selection operation on the at least one recommended standardized component, adding the corresponding recommended standardized component to the flow canvas.

In one possible implementation, before the adding the selected standardized co