CN-121303125-B - College voice consultation data processing method, device, equipment and medium


Abstract

A method, device, equipment and medium for processing college voice consultation data, relating to the field of data processing. The method comprises: obtaining a voice consultation request input by a user; performing voice recognition processing on the voice consultation request to obtain an initial text sequence corresponding to the request and a parallel phoneme sequence aligned with the initial text sequence word by word; performing two-way parallel vocabulary processing on the initial text sequence and the parallel phoneme sequence, based on a preset college knowledge base, to obtain a first candidate vocabulary and a second candidate vocabulary; merging the first candidate vocabulary and the second candidate vocabulary to obtain a merged vocabulary; performing context disambiguation on the merged vocabulary to obtain a normalized query text; and performing information retrieval in the preset college knowledge base according to the normalized query text to generate a corresponding consultation answer. Implementing the technical scheme provided by the application improves the accuracy of college voice consultation services.

Inventors

FENG DEQI
JI YANXIN

Assignees

Beijing Zhonglian Beifang Information Technology Co., Ltd. (北京中联北方信息技术有限公司)

Dates

Publication Date: 20260512
Application Date: 20251028

Claims (9)

1. A method for processing voice consultation data of a college, the method comprising: acquiring a voice consultation request input by a user; performing voice recognition processing on the voice consultation request to obtain an initial text sequence corresponding to the voice consultation request and a parallel phoneme sequence aligned with the initial text sequence word by word; based on a preset college knowledge base, performing two-way parallel vocabulary processing on the initial text sequence and the parallel phoneme sequence to obtain a first candidate vocabulary and a second candidate vocabulary; merging the first candidate vocabulary and the second candidate vocabulary to obtain a merged vocabulary; performing context disambiguation on the merged vocabulary to obtain a normalized query text; and performing information retrieval in the preset college knowledge base according to the normalized query text to generate a corresponding consultation answer; wherein the performing two-way parallel vocabulary processing on the initial text sequence and the parallel phoneme sequence based on a preset college knowledge base to obtain a first candidate vocabulary and a second candidate vocabulary specifically comprises: performing character string matching between the initial text sequence and an alias set in the preset college knowledge base, and taking each successfully matched first alias and its corresponding formal name as the first candidate vocabulary; for each word element region in the initial text sequence that is not matched by the character string matching, extracting the corresponding phoneme subsequence from the parallel phoneme sequence as a phoneme sequence to be matched; calculating, based on a preset phoneme confusion matrix, a weighted edit distance between the phoneme sequence to be matched and the standard phoneme sequence of each standard entry in the preset college knowledge base to obtain an acoustic similarity score corresponding to each standard entry; determining, according to the length of the phoneme sequence to be matched, a similarity threshold corresponding to that length; and taking each target standard entry whose acoustic similarity score is higher than the similarity threshold, together with the target formal name corresponding to the target standard entry, as the second candidate vocabulary.
2. The method according to claim 1, wherein the determining, according to the length of the phoneme sequence to be matched, a similarity threshold corresponding to the length specifically comprises: constructing an evaluation set comprising positive sample pairs and negative sample pairs, wherein the positive sample pairs are composed of phoneme sequences with different pronunciations and identical word senses, and the negative sample pairs are composed of phoneme sequences with identical pronunciations and different word senses; calculating the acoustic similarity scores of the positive sample pairs and negative sample pairs for phoneme sequences of each length in the evaluation set to form a score distribution for each phoneme sequence length; for each length, determining a score critical point through receiver operating characteristic (ROC) curve analysis, and taking the score critical point as the initial similarity threshold corresponding to that length; performing function fitting on the initial similarity thresholds of all lengths to generate a mapping function describing the correspondence between phoneme sequence length and similarity threshold; and determining the similarity threshold corresponding to the length based on the mapping function.
3. The method of claim 1, wherein the performing context disambiguation on the merged vocabulary to obtain a normalized query text specifically comprises: for a vocabulary to be arbitrated that has a plurality of candidate formal names in the merged vocabulary, constructing a corresponding disambiguation context, wherein the disambiguation context consists of all word elements in the initial text sequence other than the vocabulary to be arbitrated; calculating and assigning a corresponding context importance weight to each word element; calculating, based on the context importance weights, a semantic association strength total score between each candidate formal name of the vocabulary to be arbitrated and the disambiguation context; and selecting, from the plurality of candidate formal names, the target candidate formal name with the highest semantic association strength total score to replace the vocabulary to be arbitrated, so as to obtain the normalized query text.
4. The method according to claim 3, wherein the calculating, based on the context importance weights, a semantic association strength total score between each candidate formal name of the vocabulary to be arbitrated and the disambiguation context comprises: acquiring a preset vocabulary co-occurrence relationship graph, wherein nodes of the graph represent vocabularies and edges of the graph represent co-occurrence relationships among vocabularies; for each candidate formal name, determining, in the preset vocabulary co-occurrence relationship graph, a start node corresponding to the candidate formal name and an end node corresponding to a target word element, wherein the target word element is any one of the plurality of word elements included in the disambiguation context; calculating the shortest path distance from the start node to the end node, and taking the reciprocal of the shortest path distance as a basic semantic association score between the candidate formal name and the target word element; multiplying the basic semantic association score corresponding to each word element in the disambiguation context by the context importance weight corresponding to that word element to obtain a weighted association score for the word element; and summing the weighted association scores of all word elements in the disambiguation context to generate the semantic association strength total score of the candidate formal name.
5. The method of claim 1, wherein the performing voice recognition processing on the voice consultation request to obtain an initial text sequence corresponding to the voice consultation request and a parallel phoneme sequence aligned with the initial text sequence word by word specifically comprises: extracting acoustic features of the voice consultation request to generate an acoustic feature sequence; inputting the acoustic feature sequence into a preset acoustic model to generate a phoneme probability distribution sequence aligned frame by frame with the acoustic feature sequence; generating, through decoding based on the phoneme probability distribution sequence, the initial text sequence and a word element time boundary corresponding to each word element in the initial text sequence; determining an optimal phoneme path based on the phoneme probability distribution sequence, and merging consecutive identical phonemes in the optimal phoneme path to generate a compressed phoneme sequence; and segmenting the compressed phoneme sequence according to the word element time boundaries to generate the parallel phoneme sequence aligned with the initial text sequence word element by word element.
6. The method of claim 1, wherein the merging the first candidate vocabulary and the second candidate vocabulary to obtain a merged vocabulary specifically comprises: performing a union operation on the first candidate vocabulary and the second candidate vocabulary to obtain a union vocabulary set; for each union vocabulary contained in the union vocabulary set, extracting position information corresponding to the union vocabulary, wherein the position information comprises a start position and an end position of the union vocabulary in the initial text sequence; and based on the position information, combining a first union vocabulary and a second union vocabulary in the union vocabulary set to obtain the merged vocabulary, and adding the remaining vocabulary to the merged vocabulary, wherein the first union vocabulary and the second union vocabulary are any two union vocabularies in the union vocabulary set that have a crossing relation or a containing relation, and the remaining vocabulary is the union vocabulary in the union vocabulary set other than the first union vocabulary and the second union vocabulary.
7. A college voice consultation data processing apparatus for performing the method of any one of claims 1-6, the apparatus comprising a voice request acquisition module (301), a voice recognition processing module (302), a parallel vocabulary processing module (303), a combined vocabulary module (304), a context disambiguation module (305), and a consultation reply generation module (306), wherein: the voice request acquisition module (301) is configured to acquire a voice consultation request input by a user; the voice recognition processing module (302) is configured to perform voice recognition processing on the voice consultation request to obtain an initial text sequence corresponding to the voice consultation request and a parallel phoneme sequence aligned with the initial text sequence word by word; the parallel vocabulary processing module (303) is configured to perform two-way parallel vocabulary processing on the initial text sequence and the parallel phoneme sequence based on a preset college knowledge base, so as to obtain a first candidate vocabulary and a second candidate vocabulary; the combined vocabulary module (304) is configured to merge the first candidate vocabulary and the second candidate vocabulary to obtain a merged vocabulary; the context disambiguation module (305) is configured to perform context disambiguation on the merged vocabulary to obtain a normalized query text; and the consultation reply generation module (306) is configured to perform information retrieval in the preset college knowledge base according to the normalized query text to generate a corresponding consultation reply.
8. An electronic device (400) comprising a processor (401), a memory (405), a user interface (403) and a network interface (404), the memory (405) being configured to store instructions, the user interface (403) and the network interface (404) being configured to communicate with other devices, and the processor (401) being configured to execute the instructions stored in the memory (405) to cause the electronic device (400) to perform the method according to any one of claims 1-6.
9. A computer readable storage medium storing instructions which, when executed, perform the method of any one of claims 1-6.
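
For illustration only, the phonetic matching path recited in claim 1 can be sketched in Python as follows. The claims do not specify the confusion costs, how the weighted edit distance is converted into an acoustic similarity score, or the threshold values, so the cost table, the length normalization, the `length_threshold` function and the entry names used here are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical confusion costs: easily confused phoneme pairs are cheap to substitute.
CONFUSION_COST: Dict[Tuple[str, str], float] = {
    ("n", "l"): 0.3,
    ("in", "ing"): 0.2,
    ("z", "zh"): 0.3,
}


def sub_cost(a: str, b: str) -> float:
    """Substitution cost looked up symmetrically; unknown pairs cost 1.0."""
    if a == b:
        return 0.0
    return CONFUSION_COST.get((a, b), CONFUSION_COST.get((b, a), 1.0))


def weighted_edit_distance(src: Sequence[str], dst: Sequence[str]) -> float:
    """Edit distance with unit insert/delete cost and confusion-weighted substitution."""
    prev = [float(j) for j in range(len(dst) + 1)]
    for i, a in enumerate(src, start=1):
        curr = [float(i)]
        for j, b in enumerate(dst, start=1):
            curr.append(min(prev[j] + 1.0,                # delete a
                            curr[j - 1] + 1.0,            # insert b
                            prev[j - 1] + sub_cost(a, b)))  # substitute a -> b
        prev = curr
    return prev[-1]


def acoustic_similarity(src: Sequence[str], dst: Sequence[str]) -> float:
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(src), len(dst))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(src, dst) / longest


def length_threshold(length: int) -> float:
    """Hypothetical length-dependent threshold (stricter for short sequences)."""
    if length <= 3:
        return 0.9
    if length <= 6:
        return 0.8
    return 0.7


def second_candidates(query: Sequence[str],
                      entries: Dict[str, List[str]]) -> List[str]:
    """Return formal names whose similarity exceeds the threshold for len(query)."""
    threshold = length_threshold(len(query))
    return [name for name, phonemes in entries.items()
            if acoustic_similarity(query, phonemes) > threshold]
```

With a query such as `["x", "in", "y", "uan"]` and a knowledge base entry whose standard phonemes are `["x", "ing", "y", "uan"]`, the only difference is the cheap `in`/`ing` confusion, so the entry survives the threshold while an acoustically distant entry does not.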
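
The threshold calibration of claim 2 can likewise be sketched under stated assumptions. The claim only requires that a score critical point be found by ROC curve analysis and that a function be fitted over the per-length thresholds; choosing the point that maximizes Youden's J statistic (true positive rate minus false positive rate) and fitting a straight line by least squares are assumptions made here for illustration, as are the function names.

```python
from typing import Callable, List, Sequence, Tuple


def roc_threshold(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Pick the score that maximizes TPR - FPR (Youden's J), lowest score on ties."""
    best_t, best_j = None, -1.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t


def fit_linear(points: List[Tuple[int, float]]) -> Callable[[int], float]:
    """Least-squares line through (length, threshold) points (needs >= 2 distinct lengths)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda length: slope * length + intercept
```

In use, `roc_threshold` would be called once per phoneme sequence length on that length's positive and negative pair scores, and the resulting `(length, threshold)` points passed to `fit_linear` to obtain the mapping function.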
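
The scoring of claims 3 and 4 can be sketched as a breadth-first search over an unweighted co-occurrence graph. The graph contents, the context weights and the function names below are hypothetical. The claims do not say how to treat a context word element that is unreachable from the candidate or identical to it (distance zero, where the reciprocal is undefined); this sketch assumes such word elements contribute nothing.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set


def shortest_path_length(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search; returns the hop count, or None if no path exists."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def total_score(graph: Dict[str, Set[str]], candidate: str,
                context_weights: Dict[str, float]) -> float:
    """Sum over context word elements of weight * (1 / shortest path distance)."""
    total = 0.0
    for token, weight in context_weights.items():
        dist = shortest_path_length(graph, candidate, token)
        if dist:  # assumption: None (unreachable) or 0 (same node) adds nothing
            total += weight / dist
    return total


def disambiguate(graph: Dict[str, Set[str]], candidates: Iterable[str],
                 context_weights: Dict[str, float]) -> str:
    """Select the candidate formal name with the highest total score."""
    return max(candidates, key=lambda c: total_score(graph, c, context_weights))
```

For example, if a context word element such as "antenna" lies two hops from one candidate formal name and cannot be reached from the other, the first candidate receives half of that word element's weight and is selected.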
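
The phoneme alignment steps of claim 5 can be sketched as follows. Taking the per-frame most probable phoneme as the optimal phoneme path, dropping a blank symbol `"_"`, and assigning each merged phoneme to the word element whose time boundary contains the phoneme's first frame are assumptions; the claim does not fix these details, and the acoustic model itself is outside the sketch.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Run = Tuple[str, int, int]        # (phoneme, start_frame, end_frame), end exclusive
Boundary = Tuple[str, int, int]   # (word element, start_frame, end_frame), end exclusive


def best_path(frame_probs: Sequence[Dict[str, float]]) -> List[str]:
    """Assumed greedy decoding: most probable phoneme in every frame."""
    return [max(probs, key=probs.get) for probs in frame_probs]


def collapse(path: Sequence[str], blank: str = "_") -> List[Run]:
    """Merge consecutive identical phonemes, keeping the frame span of each run."""
    runs: List[Run] = []
    frame = 0
    for phoneme, group in groupby(path):
        length = len(list(group))
        if phoneme != blank:
            runs.append((phoneme, frame, frame + length))
        frame += length
    return runs


def segment(runs: Sequence[Run],
            boundaries: Sequence[Boundary]) -> List[Tuple[str, List[str]]]:
    """Assign each merged phoneme to the word element containing its first frame."""
    aligned = []
    for token, start, end in boundaries:
        phonemes = [p for p, run_start, _ in runs if start <= run_start < end]
        aligned.append((token, phonemes))
    return aligned
```

Keeping the frame span of each merged phoneme is what allows the compressed sequence to be cut at the word element time boundaries afterwards.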
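
The merging step of claim 6 can be sketched as an interval merge over candidate positions. The `Candidate` and `MergedSpan` types are hypothetical, and the sketch assumes that candidates whose positions cross or contain one another are combined into a single span that retains every candidate formal name for later disambiguation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    start: int        # start position in the initial text sequence
    end: int          # end position, exclusive
    formal_name: str


@dataclass
class MergedSpan:
    start: int
    end: int
    formal_names: List[str]


def merge_candidates(first: Iterable[Candidate],
                     second: Iterable[Candidate]) -> List[MergedSpan]:
    """Union both candidate sets, then combine crossing or containing spans."""
    union = sorted(set(first) | set(second),
                   key=lambda c: (c.start, c.end, c.formal_name))
    merged: List[MergedSpan] = []
    for cand in union:
        if merged and cand.start < merged[-1].end:
            # Overlaps the previous span: extend it and keep the extra formal name.
            last = merged[-1]
            last.end = max(last.end, cand.end)
            if cand.formal_name not in last.formal_names:
                last.formal_names.append(cand.formal_name)
        else:
            # No overlap: the candidate is carried over unchanged.
            merged.append(MergedSpan(cand.start, cand.end, [cand.formal_name]))
    return merged
```

A span that ends up with more than one formal name corresponds to a vocabulary to be arbitrated in the sense of claim 3.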

Description

College voice consultation data processing method, device, equipment and medium

Technical Field

The application relates to the field of data processing, and in particular to a method, a device, equipment and a medium for processing college voice consultation data.

Background

With the rapid development of higher education institutions and the deepening of their informatization, each college must handle a large number of consultation requests every year from applicants, students, parents and the public, covering admission policies, program offerings, campus life, academic administration and other matters. Traditional interactive voice response (IVR) systems have fixed functions and can only navigate through key presses. They cannot understand complex or personalized questions posed in natural language, so the path to information is long and user satisfaction is low. How to process massive unstructured voice consultation data efficiently and accurately, and thereby raise the intelligence level of college consultation services, has therefore become a technical problem to be solved in college informatization.

To address the inability of conventional consultation modes to understand the natural language intent of the user, the prior art adopts an intelligent processing scheme that combines automatic speech recognition with natural language understanding. In this scheme, an automatic speech recognition engine converts the user's voice query into a text sequence in real time; a natural language understanding model then analyzes the converted text and extracts the key intent and entity information it contains; finally, the recognized intent and entities are used to search and match against a back-end knowledge base, and the query result is read out to the user through speech synthesis. This scheme largely overcomes the defects of the traditional mode, can handle some natural language queries, and achieves preliminary consultation automation.

However, the above scheme still has a technical shortcoming when applied to college voice consultation. A college is a distinctive body of knowledge and administration with a large number of proprietary terms, abbreviations and short names that are highly specific and ambiguous. For example, the same short name may refer to the "School of Information Science and Engineering" at one college and the "School of Software and Information" at another. Because a general speech recognition model lacks training for these specific acoustic scenarios, such short names and terms are easily misrecognized during transcription. This skews intent understanding, so that the user receives wrong information or no effective answer, which greatly limits the answer accuracy of the technology in college scenarios.

Disclosure of Invention

The application provides a method, a device, equipment and a medium for processing college voice consultation data, which improve the accuracy of college voice consultation services.
The application provides a college voice consultation data processing method, comprising: obtaining a voice consultation request input by a user; performing voice recognition processing on the voice consultation request to obtain an initial text sequence corresponding to the voice consultation request and a parallel phoneme sequence aligned with the initial text sequence word by word; performing two-way parallel vocabulary processing on the initial text sequence and the parallel phoneme sequence based on a preset college knowledge base to obtain a first candidate vocabulary and a second candidate vocabulary; merging the first candidate vocabulary and the second candidate vocabulary to obtain a merged vocabulary; performing context disambiguation on the merged vocabulary to obtain a normalized query text; and performing information retrieval in the preset college knowledge base according to the normalized query text to generate a corresponding consultation response. With this technical scheme, the voice consultation request input by the user is acquired and subjected to voice recognition processing to obtain an initial text sequence and a parallel phoneme sequence aligned with the initial text sequence word by word. The user's voice input is thereby accurately converted into text form and the correspondence between word elements and phonemes is established, which lays a foundation for subsequent vocabulary processing. And then, performing two-way parallel vocabulary processing on the initial t