CN-121999807-A - Pronunciation error diagnosis and correction method for Chinese learners based on PCA acoustic feature reconstruction

Abstract

The invention discloses a method for diagnosing and correcting the pronunciation errors of Chinese learners based on PCA acoustic feature reconstruction, and belongs to the technical field of speech signal processing and pronunciation evaluation. The method comprises an offline training stage and an online evaluation stage. In the offline training stage, a speech corpus of Chinese learners is built that contains standard pronunciations and typical mispronunciations of the target phonemes; high-dimensional acoustic features are extracted to form a feature matrix; principal component analysis (PCA) retains the principal components that explain more than 90% of the total variance, yielding a low-dimensional acoustic subspace; and a deep neural network pronunciation error classifier is trained. In the online evaluation stage, the same high-dimensional features are extracted from the speech under evaluation, projected into the low-dimensional subspace, and fed to the classifier to obtain an error diagnosis, from which personalized correction feedback is generated. By reconstructing the acoustic feature space with PCA, the invention filters irrelevant noise and highlights the features that distinguish mispronunciations. This improves the recognition accuracy for phoneme-specific pronunciation errors of Chinese learners and addresses the low diagnostic precision caused by redundant high-dimensional features in the prior art. The method is practical and suitable for wide adoption.

Inventors

PAN YANYAN
Han Ruiqian
LIU YINGQIN

Assignees

Shandong Management University (山东管理学院)

Dates

Publication Date: 20260508
Application Date: 20260212

Claims (3)

1. A pronunciation error diagnosis and correction method for Chinese learners based on PCA acoustic feature reconstruction, characterized by comprising an offline training stage and an online evaluation stage:
[Offline training stage]
a1) Collecting a speech corpus of Chinese learners that contains, for the target phoneme, standard pronunciations and at least one typical mispronunciation;
a2) Extracting a high-dimensional acoustic feature vector from each speech sample in the corpus to form an acoustic feature matrix;
a3) Performing principal component analysis (PCA) on the acoustic feature matrix of step a2) to obtain a principal component transformation matrix, retaining the principal components that explain more than 90% of the total variance, and constructing a low-dimensional acoustic subspace for the target phoneme;
a4) Projecting the acoustic feature vectors of all speech samples into the low-dimensional acoustic subspace, and using the projected feature vectors as input to train a deep neural network pronunciation error classifier;
[Online evaluation stage]
b1) Receiving the speech of a learner to be evaluated;
b2) Extracting from the speech to be evaluated the same high-dimensional acoustic feature vector as in step a2);
b3) Using the principal component transformation matrix obtained in step a3) to project the high-dimensional acoustic feature vector of step b2) into the low-dimensional acoustic subspace;
b4) Inputting the projected low-dimensional feature vector into the deep neural network pronunciation error classifier trained in step a4), and outputting a diagnosis of the pronunciation error type;
b5) Generating and pushing personalized pronunciation correction guidance according to the diagnosis.
2. The method of claim 1, wherein the target phonemes include, but are not limited to, the English phonemes /l/, /r/, /θ/, and /ð/.
3. The method of claim 1, wherein the high-dimensional acoustic feature vector comprises Mel-frequency cepstral coefficients (MFCC), fundamental frequency (F0), and short-time energy.
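
Steps a3), a4) and b3) of claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names fit_pca_subspace and project, the 39-dimensional synthetic feature matrix, and the use of SVD to compute the principal components are all assumptions made for illustration. The sketch keeps the fewest leading components whose cumulative explained variance exceeds 90%, then applies the same mean and transformation matrix to a new sample, as step b3) requires.

```python
import numpy as np


def fit_pca_subspace(features, variance_threshold=0.90):
    """Fit PCA on an (n_samples, n_features) matrix.

    Returns the feature mean and an (n_features, k) transformation matrix,
    where k is the smallest number of leading principal components whose
    cumulative explained-variance ratio exceeds variance_threshold.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2
    cumulative_ratio = np.cumsum(variances) / variances.sum()
    # side="right" yields the first index whose ratio is strictly above the threshold.
    k = int(np.searchsorted(cumulative_ratio, variance_threshold, side="right")) + 1
    k = min(k, vt.shape[0])
    return mean, vt[:k].T


def project(features, mean, components):
    """Project high-dimensional feature vectors into the low-dimensional subspace."""
    return (np.asarray(features, dtype=float) - mean) @ components


# Synthetic stand-in (an assumption) for the acoustic feature matrix of step a2):
# 200 samples, 39 dimensions, driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 39))
train_features = latent @ mixing + 0.05 * rng.normal(size=(200, 39))

mean, components = fit_pca_subspace(train_features)         # step a3)
train_low_dim = project(train_features, mean, components)   # classifier input, step a4)

# Step b3): a new utterance's feature vector reuses the training mean and matrix.
new_sample = rng.normal(size=(1, 3)) @ mixing
new_low_dim = project(new_sample, mean, components)
print(components.shape, train_low_dim.shape, new_low_dim.shape)
```

Reusing the training-time mean and matrix at evaluation time, rather than refitting PCA on the new utterance, is what keeps the classifier's input space identical across the two stages.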

Description

Pronunciation error diagnosis and correction method for Chinese learners based on PCA acoustic feature reconstruction

Technical Field

The invention belongs to the technical fields of speech signal processing, pronunciation evaluation, and intelligent education, and in particular relates to a pronunciation error diagnosis and correction method for Chinese learners based on PCA acoustic feature reconstruction.

Background

With accelerating globalization, English serves as an international common language, and accurate pronunciation is key to improving the English communication ability of Chinese learners. However, because the phonological systems of Chinese and English differ significantly, Chinese learners readily confuse specific phonemes (such as /l/, /r/, /θ/, and /ð/) when speaking English and make typical pronunciation errors. Learners find it hard to identify the type and cause of these errors on their own, and traditional one-on-one pronunciation coaching is inefficient, costly, and limited in reach.
Existing pronunciation error diagnosis methods recognize errors by extracting acoustic features from speech and feeding them to a machine learning model, but they have the following shortcomings. First, the extracted acoustic features are high-dimensional and contain a large amount of irrelevant noise and redundant information, so model training is inefficient and diagnostic precision is insufficient. Second, no feature optimization scheme is designed for the pronunciation characteristics of Chinese learners, so the subtle differences between standard pronunciation and typical mispronunciation are hard to distinguish. Third, features are mostly processed with simple preprocessing; dimensionality reduction is not deeply combined with discriminative optimization, so the core features of mispronunciation are not highlighted, which weakens the pertinence and effectiveness of correction feedback.

Principal component analysis (PCA) is a mature dimensionality-reduction method for high-dimensional data that retains the core information of the data and filters redundant noise through a linear transformation. In the prior art, however, PCA is used only as an auxiliary preprocessing step: it is not tied to the core requirement of pronunciation error diagnosis, no low-dimensional discriminative acoustic subspace is constructed for a specific phoneme, and PCA's ability to bring out distinguishing pronunciation features is not fully exploited. The problem of low diagnostic precision for the pronunciation errors of Chinese learners therefore remains unsolved.
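
For reference, the linear transformation that PCA performs, together with the variance-retention rule stated in the abstract and claims, can be written as follows. The notation (x_i, μ, C, W, λ_j, z) is introduced here for illustration and does not appear in the patent text.

```latex
% x_i \in \mathbb{R}^d: feature vector of sample i; n samples in total
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
C = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top} = W \Lambda W^{\top},
\quad \Lambda = \operatorname{diag}(\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0)

% keep the smallest k whose cumulative explained variance exceeds 90%
k = \min\Bigl\{ m : \frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{d} \lambda_j} > 0.90 \Bigr\}

% projection of a feature vector x into the low-dimensional subspace
z = W_k^{\top}(x - \mu) \in \mathbb{R}^{k}, \qquad W_k = [\,w_1, \dots, w_k\,]
```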
Disclosure of Invention

Object of the Invention

In view of the shortcomings of the prior art, the invention provides a pronunciation error diagnosis and correction method for Chinese learners based on PCA acoustic feature reconstruction. It aims to solve the problems of high-dimensional feature redundancy, heavy noise interference, low diagnostic precision, and poorly targeted correction feedback in existing pronunciation error diagnosis; to improve the recognition accuracy and correction of phoneme-specific pronunciation errors of Chinese learners; and to reduce the cost of pronunciation coaching while improving learning efficiency.

Technical Solution

To achieve this object, the invention adopts the following technical solution: a pronunciation error diagnosis and correction method for Chinese learners based on PCA acoustic feature reconstruction, comprising an offline training stage and an online evaluation stage, with the following specific steps.

Offline training stage:
a1) Collecting a speech corpus of Chinese learners that contains standard pronunciations and at least one typical mispronunciation for the target phonemes, which include, but are not limited to, the English phonemes /l/, /r/, /θ/, and /ð/;
a2) Extracting a high-dimensional acoustic feature vector from each speech sample in the corpus to form an acoustic feature matrix, wherein the high-dimensional acoustic feature vector comprises Mel-frequency cepstral coefficients (MFCC), fundamental frequency (F0), and short-time energy;
a3) Performing principal component analysis (PCA) on the acoustic feature matrix of step a2) to obtain a principal component transformation matrix, retaining the principal components that explain more than 90% of the total variance, and constructing a low-dimensional acoustic subspace for the target phoneme;
a4) Projecting the acoustic feature vectors of all speech samples into the low-dimensional acoustic subspace, and training a deep neural network pronunciation error classifier with the projected feature vectors as input.

Online evaluation stage:
b1) Receiving the speech of a learner to be evaluated;
b2) Extracting the high-dimensional acoustic charac