CN-121999774-A - Voice data processing method, system, electronic equipment and storage medium

Abstract

The invention provides a voice data processing method, a voice data processing system, electronic equipment and a storage medium. If the voice data to be processed is not successfully transmitted to the server, a voice escape result of the voice data to be processed is determined by using a first voice model deployed on the terminal equipment. A function matching the voice escape result is then looked up in the local cache of the terminal equipment. This reduces the dependence on the network when voice data is processed and improves the usability of the intelligent voice model.

Inventors

LIU XIN
LAI ZHEN
BAO ZHENWEN

Assignees

深圳市元征科技股份有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-02-06

Claims (10)

1. A method for processing voice data, the method comprising: acquiring voice data to be processed; transmitting the voice data to be processed to a server through a data transmission path selected through screening; if the voice data to be processed is not successfully transmitted to the server, determining a voice escape result of the voice data to be processed by using a first voice model deployed on terminal equipment; and querying, from a local cache of the terminal equipment, a function matching the voice escape result of the voice data to be processed, wherein the local cache of the terminal equipment stores at least a plurality of voice escape results, the IDs of the voice escape results, and the functions matching the voice escape results.
2. The method of claim 1, wherein transmitting the voice data to be processed to a server through the data transmission path selected through screening comprises: optimizing data transmission paths; selecting, from the optimized data transmission paths, the data transmission path with the optimal network quality; and sending the voice data to be processed to the server through the data transmission path with the optimal network quality.
3. The method according to claim 1, further comprising, after transmitting the voice data to be processed to the server through the data transmission path selected through screening: if the voice data to be processed is successfully sent to the server, receiving a processing result fed back by the server, wherein the processing result comprises either the voice escape result of the voice data to be processed together with the ID of that voice escape result, or only the ID of the voice escape result, the voice escape result and its ID being obtained by processing the voice data to be processed with a second voice model deployed on the server, and the second voice model being larger in scale than the first voice model; when the processing result comprises the voice escape result and its ID, executing the step of querying the function matching the voice escape result of the voice data to be processed; and when the processing result comprises only the ID of the voice escape result, searching the local cache of the terminal equipment for the voice escape result of the voice data to be processed according to that ID, and then executing the step of querying the function matching the voice escape result of the voice data to be processed.
4. The method according to claim 3, further comprising: when the processing result comprises the voice escape result of the voice data to be processed and its ID, storing the processing result in the local cache of the terminal equipment.
5. The method according to any one of claims 1-4, further comprising, after querying the function matching the voice escape result of the voice data to be processed: when a function matching the voice escape result of the voice data to be processed is found, executing that function so as to respond to the voice data to be processed; and when no function matching the voice escape result of the voice data to be processed is found, not responding to the voice data to be processed.
6. A system for processing voice data, the system comprising: an acquisition unit for acquiring voice data to be processed; a transmitting unit for transmitting the voice data to be processed to a server through a data transmission path selected through screening; a determining unit for determining, if the voice data to be processed is not successfully transmitted to the server, a voice escape result of the voice data to be processed by using a first voice model deployed on terminal equipment; and a query unit for querying, from a local cache of the terminal equipment, a function matching the voice escape result of the voice data to be processed, wherein the local cache of the terminal equipment stores at least a plurality of voice escape results, the IDs of the voice escape results, and the functions matching the voice escape results.
7. The system of claim 6, wherein the transmitting unit comprises: an optimizing module for optimizing data transmission paths; a selection module for selecting, from the optimized data transmission paths, the data transmission path with the optimal network quality; and a sending module for sending the voice data to be processed to the server through the data transmission path with the optimal network quality.
8. The system of claim 6, further comprising: a receiving unit for receiving a processing result fed back by the server if the voice data to be processed is successfully sent to the server, wherein the processing result comprises either the voice escape result of the voice data to be processed together with the ID of that voice escape result, or only the ID of the voice escape result, the voice escape result and its ID being obtained by processing the voice data to be processed with a second voice model deployed on the server, and the second voice model being larger in scale than the first voice model; and wherein, when the processing result comprises only the ID of the voice escape result, the voice escape result of the voice data to be processed is searched for in the local cache of the terminal equipment according to that ID, and the query unit is then executed.
9. A computer device comprising a processor and a memory, the processor and the memory being connected by a bus, wherein the processor is configured to call and execute a program stored in the memory, and wherein the memory is configured to store a program for implementing the method for processing voice data according to any one of claims 1 to 5.
10. A storage medium having stored therein computer-executable instructions for performing the method for processing voice data according to any one of claims 1-5.

Description

Voice data processing method, system, electronic equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular to a method and system for processing voice data, an electronic device, and a storage medium.

Background

With the development of internet technology, intelligent voice AI is widely used in various industries. An intelligent voice AI agent is usually hosted on a server and depends on the network when performing voice recognition and voice escape. When network latency or network jitter occurs, the agent may fail to output content or may output wrong content, so its availability is not stable enough.

Disclosure of Invention

In view of this, the embodiments of the present invention provide a method, a system, an electronic device, and a storage medium for processing voice data, so as to improve the usability of intelligent voice AI.

In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:

The first aspect of the embodiments of the invention discloses a method for processing voice data, the method comprising: acquiring voice data to be processed; transmitting the voice data to be processed to a server through a data transmission path selected through screening; if the voice data to be processed is not successfully transmitted to the server, determining a voice escape result of the voice data to be processed by using a first voice model deployed on terminal equipment; and querying, from a local cache of the terminal equipment, a function matching the voice escape result of the voice data to be processed, wherein the local cache of the terminal equipment stores at least a plurality of voice escape results, the IDs of the voice escape results, and the functions matching the voice escape results.
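The failure branch of the first aspect can be illustrated with a minimal Python sketch. It is not the implementation described by the invention. All names here (`LocalCache`, `Outcome`, `process_voice`, the `send` and `local_model` callables) are hypothetical, the first voice model is stood in for by a plain callable, and an unsuccessful transmission is assumed to show up either as a `False` return value or as an `OSError`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LocalCache:
    """Hypothetical local cache holding escape results, their IDs, and matched functions."""
    results_by_id: Dict[str, str] = field(default_factory=dict)
    functions_by_result: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def find_function(self, escape_result: str) -> Optional[Callable[[], str]]:
        # Returns None when no function matches the escape result.
        return self.functions_by_result.get(escape_result)


@dataclass
class Outcome:
    sent: bool                      # True if the data reached the server
    escape_result: Optional[str]    # set only on the local fallback path
    response: Optional[str]         # output of the matched function, if any


def process_voice(
    voice_data: bytes,
    send: Callable[[bytes], bool],
    local_model: Callable[[bytes], str],
    cache: LocalCache,
) -> Outcome:
    """Try the server first; fall back to the on-device model if sending fails."""
    try:
        sent = send(voice_data)
    except OSError:
        # Assumption: network failures surface as OSError (e.g. ConnectionError).
        sent = False

    if sent:
        # Handling of the server's processing result is outside this sketch.
        return Outcome(sent=True, escape_result=None, response=None)

    # Fallback: the first (on-device) voice model produces the escape result.
    escape_result = local_model(voice_data)
    func = cache.find_function(escape_result)
    response = func() if func is not None else None
    return Outcome(sent=False, escape_result=escape_result, response=response)
```

In this sketch the cache lookup is a dictionary access keyed by the escape result; the source does not specify how the local cache is organized, so any keyed store would serve equally well.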
Preferably, transmitting the voice data to be processed to the server through the data transmission path selected through screening comprises: optimizing data transmission paths; selecting, from the optimized data transmission paths, the data transmission path with the optimal network quality; and sending the voice data to be processed to the server through the data transmission path with the optimal network quality.

Preferably, after the voice data to be processed is transmitted to the server through the data transmission path selected through screening, the method further comprises: if the voice data to be processed is successfully sent to the server, receiving a processing result fed back by the server, wherein the processing result comprises either the voice escape result of the voice data to be processed together with the ID of that voice escape result, or only the ID of the voice escape result, the voice escape result and its ID being obtained by processing the voice data to be processed with a second voice model deployed on the server, and the second voice model being larger in scale than the first voice model; when the processing result comprises the voice escape result and its ID, executing the step of querying the function matching the voice escape result of the voice data to be processed; and when the processing result comprises only the ID of the voice escape result, searching the local cache of the terminal equipment for the voice escape result of the voice data to be processed according to that ID, and then executing the step of querying the function matching the voice escape result of the voice data to be processed.
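The two preferred refinements above can be sketched in Python under explicit assumptions. The source does not say how network quality is measured or how a path is optimized, so the latency-plus-loss score below is purely illustrative. The names `PathStats`, `quality_score`, `select_best_path`, `ProcessingResult`, and `resolve_escape_result` are hypothetical, and the local cache is reduced to a dictionary from ID to voice escape result.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PathStats:
    name: str
    latency_ms: float
    loss_rate: float  # fraction of lost packets, 0.0 to 1.0


def quality_score(path: PathStats) -> float:
    """Illustrative score (lower is better); the weighting is an assumption."""
    return path.latency_ms + 1000.0 * path.loss_rate


def select_best_path(paths: List[PathStats]) -> PathStats:
    """Pick the candidate path with the best (lowest) score."""
    if not paths:
        raise ValueError("no candidate data transmission paths")
    return min(paths, key=quality_score)


@dataclass
class ProcessingResult:
    result_id: str
    escape_result: Optional[str] = None  # None models an ID-only result


def resolve_escape_result(
    result: ProcessingResult, cached_results: Dict[str, str]
) -> Optional[str]:
    """Return the escape result, looking it up by ID when the server sent only an ID."""
    if result.escape_result is not None:
        return result.escape_result
    # ID-only case: the escape result must already be in the local cache.
    return cached_results.get(result.result_id)
```

For example, `select_best_path([PathStats("wifi", 40.0, 0.05), PathStats("cellular", 60.0, 0.0)])` picks the `cellular` entry under this scoring, because the packet-loss penalty outweighs the lower latency of `wifi`. An ID-only result whose ID is missing from the cache resolves to `None`; the source does not state what the terminal should do in that case.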
Preferably, the method further comprises: when the processing result comprises the voice escape result of the voice data to be processed and its ID, storing the processing result in the local cache of the terminal equipment.

Preferably, after querying the function matching the voice escape result of the voice data to be processed, the method further comprises: when a function matching the voice escape result of the voice data to be processed is found, executing that function so as to respond to the voice data to be processed; and when no function matching the voice escape result of the voice data to be processed is found, not responding to the voice data to be processed.

A second aspect of the embodiments of the present invention discloses a system for processing voice data, the system comprising: an acquisition unit for acquiring voice data to be processed; a transmitting unit for transmitting the voice data to be processed to a server through a data transmission path selected through screening; a determining unit for determining a voice escape